Tags
- Metadata: #topic
- Part of: [[OpenAI]] [[DeepMind]] [[Anthropic]] [[Artificial intelligence safety]]
- Related:
- Includes:
- Additional:
Significance
Intuitive summaries
Definitions
- A form of evaluation that elicits model vulnerabilities which could lead to undesirable behaviors. The goal of red-teaming language models is to craft prompts that trigger the model to generate text likely to cause harm.
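The definition above can be sketched as a simple loop: feed candidate prompts to a target model and flag responses a harm classifier marks as unsafe. This is a minimal illustration; `target_model` and `is_harmful` are hypothetical toy stand-ins for a real LM endpoint and a trained harm classifier.

```python
def target_model(prompt: str) -> str:
    # Toy stand-in for a real language model API call.
    canned = {
        "How do I pick a lock?": "Step 1: insert a tension wrench...",
        "Tell me a joke.": "Why did the chicken cross the road?",
    }
    return canned.get(prompt, "I can't help with that.")

def is_harmful(text: str) -> bool:
    # Toy keyword check; real red-teaming uses a trained harm classifier.
    blocklist = ("tension wrench", "explosive", "bypass the filter")
    return any(phrase in text.lower() for phrase in blocklist)

def red_team(candidate_prompts):
    """Return (prompt, response) pairs that elicited flagged output."""
    findings = []
    for prompt in candidate_prompts:
        response = target_model(prompt)
        if is_harmful(response):
            findings.append((prompt, response))
    return findings

print(red_team(["Tell me a joke.", "How do I pick a lock?"]))
```

In practice the candidate prompts are themselves generated and refined (by humans or by another model) based on which attacks succeed, rather than drawn from a fixed list as here.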
Technical summaries
Main resources
Landscapes
Contents
Deep dives
Brainstorming
Additional resources
Related
Related resources
AI
Additional metadata
-
processed #processing #toprocess #important #short #long #casual #focus
- Unfinished: #metadata #tags