Readteaming - Burny website

Skip to content

Burny website

Readteaming

Tags¶

Metadata: #topic
Part of: OpenAI [[DeepMind]] [[Anthropic]] Artificial intelligence safety
Related:
Includes:
Additional:

Significance¶

Intuitive summaries¶

Definitions¶

Form of evaluation that elicits model vulnerabilities that might lead to undesirable behaviors. The goal of red-teaming language models is to craft a prompt that would trigger the model to generate text that is likely to cause harm

Technical summaries¶

Main resources¶

Red-Teaming Large Language Models

Landscapes¶

Contents¶

Deep dives¶

Brain storming¶

Additional resources¶

AI¶

Additional metadata¶

processed #processing #toprocess #important #short #long #casual #focus¶
Unfinished: #metadata #tags