Artificial intelligence safety - Burny website

Artificial intelligence safety

Tags

Metadata: #topic
Part of: Artificial Intelligence, Machine learning, Risks of artificial intelligence, Risks
Related:
Includes:
Additional:

Significance

Intuitive summaries

Definitions

Technical summaries

Main resources

Landscapes

Methods
- Mechanistic interpretability
- Red teaming
- Evaluating dangerous capabilities
- Process supervision
- [[Artificial Intelligence governance]]
Alex Turner’s landscape
  - Mechanistic interpretability
  - Agent foundations
  - [[Cognitive Emulation]] - build predictably boundable systems (Cognitive Emulation: A Naive AI Safety Proposal — LessWrong)
  - Shard theory
  - [[Infrabayesianism]] - Infra-Bayesianism - LessWrong
  - [[Eliciting latent knowledge]] - How can we train this model to report its latent knowledge of off-screen events? (Eliciting latent knowledge. How can we train an AI to honestly tell… | by Paul Christiano | AI Alignment)

Contents

Deep dives

Brainstorming

Additional resources

AI

Additional metadata

#processed #processing #toprocess #important #short #long #casual #focus
Unfinished: #metadata #tags