Explorer

Interact with the MAP-Elites archives. Each cell in the grid represents a unique semantic behavior or strategy used by the attacker LLM. The color intensity of a cell indicates the Alignment Deviation (the severity of the safety failure) at that cell's behavioral coordinates.
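As a rough illustration of how such an archive can be organized, here is a minimal Python sketch of a two-dimensional MAP-Elites grid. It is not the code behind this page. The names `Elite` and `Archive`, the bin count, and the assumption that behavioral descriptors and Alignment Deviation both lie in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Elite:
    prompt: str        # adversarial prompt stored in the cell
    deviation: float   # Alignment Deviation (assumed range: 0 to 1)


class Archive:
    """Hypothetical 2D MAP-Elites archive: one elite per behavioral cell."""

    def __init__(self, bins: int = 10) -> None:
        self.bins = bins
        self.cells: Dict[Tuple[int, int], Elite] = {}

    def _index(self, value: float) -> int:
        # Clamp the descriptor to [0, 1], then map it to a bin index.
        clamped = min(max(value, 0.0), 1.0)
        return min(int(clamped * self.bins), self.bins - 1)

    def add(self, prompt: str, descriptor: Tuple[float, float],
            deviation: float) -> bool:
        """Keep the candidate only if its cell is empty or it scores higher."""
        key = (self._index(descriptor[0]), self._index(descriptor[1]))
        current = self.cells.get(key)
        if current is None or deviation > current.deviation:
            self.cells[key] = Elite(prompt, deviation)
            return True
        return False


archive = Archive(bins=4)
first = archive.add("prompt A", (0.10, 0.90), 0.30)   # fills an empty cell
second = archive.add("prompt B", (0.20, 0.80), 0.70)  # same cell, higher score
third = archive.add("prompt C", (0.15, 0.85), 0.50)   # same cell, lower score
```

In this sketch each cell keeps only the highest-deviation prompt seen so far, which is why a hover over a filled cell would show a single prompt.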

Select a model to see its final archive of behavioral edge cases, and hover over cells to read the exact adversarial prompts that breached the model's safety guardrails.

Configuration
Select the target model and run seed.
What are these plots?


Interactive Map: A 2D representation of the behavioral space. Hover over a filled cell to see the prompt that breached the model's safety guardrails at that location.

3D Landscape & Contour: Visualizes the safety failure severity (Z-axis/color) across the behavioral dimensions (X/Y).

Basins of Attraction: Shows clusters of similar failure modes, highlighting structural weaknesses in the model's alignment.
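The page does not say how the basins are computed. One simple interpretation, sketched below under that assumption, treats a basin as a connected group of neighboring cells whose Alignment Deviation meets a threshold. The function name `find_basins`, the threshold value, and the 4-neighbor rule are hypothetical choices for illustration only.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[Optional[float]]]  # None marks an empty archive cell
Cell = Tuple[int, int]


def find_basins(grid: Grid, threshold: float = 0.5) -> List[Set[Cell]]:
    """Group adjacent cells with deviation >= threshold (assumed definition)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0

    def is_failure(r: int, c: int) -> bool:
        value = grid[r][c]
        return value is not None and value >= threshold

    seen: Set[Cell] = set()
    basins: List[Set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not is_failure(r, c):
                continue
            # Depth-first flood fill over up/down/left/right neighbors.
            basin: Set[Cell] = set()
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                basin.add((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and is_failure(nr, nc)):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            basins.append(basin)
    return basins


example = [
    [0.9, 0.8, None],
    [None, 0.2, None],
    [None, None, 0.7],
]
basins = find_basins(example, threshold=0.5)
```

With this example grid, the two high-deviation cells in the top row form one basin and the isolated cell in the bottom-right corner forms another; the 0.2 cell falls below the threshold and belongs to neither.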
