Skip to content

Tags

Significance

Intuitive summaries

Definitions

  • Form of evaluation that elicits model vulnerabilities that might lead to undesirable behaviors. The goal of red-teaming language models is to craft a prompt that would trigger the model to generate text that is likely to cause harm

Technical summaries

Main resources

Landscapes

Contents

Deep dives

Brain storming

Additional resources

AI

Additional metadata

  • processed #processing #toprocess #important #short #long #casual #focus

  • Unfinished: #metadata #tags