Leaderboard
Standard AI safety benchmarks often report a single "jailbreak success rate." This leaderboard instead uses Quality-Diversity (QD) metrics to show where each model fails, how often, and how badly.
- Coverage: the percentage of the behavioral space where an attacker found at least one vulnerability.
- Diversity: the absolute number of unique attack strategies that succeeded.
- Peak Alignment Deviation: the highest Alignment Deviation (AD) observed during the entire search, i.e. the worst-case failure (1.0 = full compliance with a dangerous request).
- QD-Score: the sum of AD across every discovered vulnerability. A high QD-Score means the model fails both frequently and severely.
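As a rough illustration of how these four numbers relate, the sketch below computes them from a toy archive. It is not the leaderboard's actual pipeline. It assumes the behavioral space is a grid of `total_cells` cells, that `archive` maps each explored cell to the highest Alignment Deviation (AD) found there, that a cell counts as a vulnerability when its AD exceeds a hypothetical `threshold`, and that each vulnerable cell stands for one unique attack strategy. The names `QDMetrics` and `qd_metrics` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QDMetrics:
    coverage_pct: float  # % of cells with at least one vulnerability
    diversity: int       # number of vulnerable cells (one strategy per cell, by assumption)
    peak_ad: float       # highest AD seen anywhere in the archive
    qd_score: float      # sum of AD over vulnerable cells


def qd_metrics(archive, total_cells, threshold=0.5):
    """Compute the four metrics from a {cell_id: best_AD} mapping.

    `threshold` is a hypothetical cutoff for calling a cell a vulnerability;
    the real benchmark may define success differently.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    vulnerable = [ad for ad in archive.values() if ad > threshold]
    return QDMetrics(
        coverage_pct=100.0 * len(vulnerable) / total_cells,
        diversity=len(vulnerable),
        peak_ad=max(archive.values(), default=0.0),
        qd_score=sum(vulnerable),
    )


# Toy 4x4 behavioral grid with made-up AD values (not leaderboard data).
archive = {(0, 0): 0.9, (0, 1): 0.2, (1, 1): 0.6, (2, 3): 1.0}
metrics = qd_metrics(archive, total_cells=16)
print(metrics)
```

In this toy case three of the sixteen cells exceed the cutoff, so Coverage and Diversity describe the same three cells from different angles (a percentage versus a raw count), while QD-Score additionally weights each of them by severity.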
Every metric below is the mean of 3 independent runs; standard deviations are shown inline so you can judge reproducibility at a glance.
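A minimal sketch of how a "mean ± standard deviation" entry could be produced from three runs is shown here. Whether the leaderboard uses the sample (n-1) or population standard deviation is not stated, so the sample version is an assumption, as are the helper name `summarize` and the example values.

```python
from statistics import mean, stdev


def summarize(runs, digits=2):
    """Format per-run values as 'mean ± std' using the sample standard deviation (n-1)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return f"{mean(runs):.{digits}f} ± {stdev(runs):.{digits}f}"


# Made-up Coverage values from three hypothetical runs (not leaderboard data).
coverage_runs = [40.0, 42.0, 44.0]
print(summarize(coverage_runs))  # prints "42.00 ± 2.00"
```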