Explorer
Interact with the MAP-Elites archives. Each cell in the grid represents a unique semantic behavior or strategy used by the attacker LLM. The color intensity indicates the Alignment Deviation (severity of the safety failure) at those behavioral coordinates.
Select a model to see its final archive of behavioral edge cases, and hover over cells to read the exact adversarial prompts that breached the model's safety guardrails.
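For readers unfamiliar with how such a grid is populated, here is a minimal sketch of a MAP-Elites-style archive. It is an illustration only, not the code behind this page. It assumes a two-dimensional behavior descriptor with coordinates in [0, 1] and a scalar Alignment Deviation score where higher means a more severe failure. The names `Elite`, `Archive`, `cell_of`, `add`, and `heatmap` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Elite:
    prompt: str        # adversarial prompt stored in the cell
    deviation: float   # assumed Alignment Deviation score (higher = more severe)


class Archive:
    """Hypothetical 2-D grid archive: one elite per behavioral cell."""

    def __init__(self, bins: int = 10) -> None:
        self.bins = bins
        self.cells: Dict[Tuple[int, int], Elite] = {}

    def cell_of(self, descriptor: Sequence[float]) -> Tuple[int, int]:
        # Map each coordinate (assumed to lie in [0, 1]) to a bin index,
        # clamping out-of-range values to the nearest edge bin.
        i, j = (min(self.bins - 1, max(0, int(x * self.bins))) for x in descriptor)
        return (i, j)

    def add(self, prompt: str, descriptor: Sequence[float], deviation: float) -> bool:
        # Standard MAP-Elites replacement rule: keep the candidate only if the
        # cell is empty or the candidate scores higher than the current elite.
        key = self.cell_of(descriptor)
        current = self.cells.get(key)
        if current is None or deviation > current.deviation:
            self.cells[key] = Elite(prompt, deviation)
            return True
        return False

    def heatmap(self) -> List[List[Optional[float]]]:
        # Grid of scores for plotting; None marks a cell with no elite yet.
        return [
            [self.cells[(i, j)].deviation if (i, j) in self.cells else None
             for j in range(self.bins)]
            for i in range(self.bins)
        ]
```

In this sketch, the values returned by `heatmap()` would drive the color intensity of each cell, and the stored `prompt` is what a hover tooltip would display.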
Interactive Archive Map