Tags¶
- Metadata: #topic
- Part of: Artificial Intelligence, Machine learning, Risks of artificial intelligence, Risks
- Related:
- Includes:
- Additional:
Significance¶
Intuitive summaries¶
Definitions¶
Technical summaries¶
Main resources¶
Landscapes¶
- Methods
- Mechanistic interpretability
- Red teaming
- Evaluating dangerous capabilities
- Process supervision
- [[Artificial Intelligence governance]]
- Mechanistic interpretability
-
- Mechanistic interpretability
- Agent foundations
- [[Cognitive Emulation]] - build predictably boundable systems (Cognitive Emulation: A Naive AI Safety Proposal — LessWrong)
- Shard theory
- [[Infrabayesianism]] - Infra-Bayesianism - LessWrong
- [[Eliciting latent knowledge]] - How can we train this model to report its latent knowledge of off-screen events? (Eliciting latent knowledge. How can we train an AI to honestly tell… | by Paul Christiano | AI Alignment)
Contents¶
Deep dives¶
Brain storming¶
Additional resources¶
Related¶
Related resources¶
AI¶
Additional metadata¶
-
processed #processing #toprocess #important #short #long #casual #focus¶
- Unfinished: #metadata #tags