Cross-Model Correlations

A single model archive shows you where one model breaks. But the most interesting questions are comparative: do all models break in the same places? Are there universal vulnerability hotspots? Do larger models simply scale up the same blind spots, or do they fail in genuinely different ways?

This page aggregates all 9 model archives and, in your browser, computes cross-model correlations, consensus maps, vocabulary patterns, and multi-metric trade-offs.
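To make "correlations" and "consensus maps" concrete, here is a minimal sketch in Python. It is an illustration, not this page's actual implementation. It assumes each archive can be flattened into a list of per-cell scores on a shared grid, and that a cell counts as a hotspot when its score meets a threshold. The names `pearson`, `pairwise_correlations`, `consensus_map`, and the sample data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant archive has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(archives):
    """Correlate every pair of models; keys are sorted (model, model) tuples."""
    return {
        (a, b): pearson(archives[a], archives[b])
        for a, b in combinations(sorted(archives), 2)
    }


def consensus_map(archives, threshold):
    """Per cell, the fraction of models whose score is at or above threshold."""
    models = list(archives.values())
    n_cells = len(models[0])
    return [
        sum(1 for scores in models if scores[i] >= threshold) / len(models)
        for i in range(n_cells)
    ]


# Hypothetical data: three models, four grid cells each.
archives = {
    "model_a": [0.1, 0.8, 0.3, 0.9],
    "model_b": [0.2, 0.7, 0.4, 0.8],
    "model_c": [0.9, 0.1, 0.8, 0.2],
}
correlations = pairwise_correlations(archives)
consensus = consensus_map(archives, threshold=0.5)
```

In this toy data, `model_a` and `model_b` correlate positively (they break in the same cells), while `model_c` correlates negatively with both. A consensus value near 1.0 would mark a cell where nearly every model breaks.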

Run Selection

For a fair comparison, every correlation is computed from the same run seed for all models.

Active run: run_0
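One way to enforce the same-run rule is to pull a single run ID out of every archive before comparing anything, and to report models that lack that run instead of silently substituting another. The sketch below assumes a hypothetical layout in which each model maps run IDs such as `run_0` to its data. Both `select_run` and the sample structure are illustrative, not the page's actual code.

```python
def select_run(archives_by_model, run_id):
    """Return ({model: data for run_id}, [models missing that run])."""
    selected = {}
    missing = []
    for model, runs in archives_by_model.items():
        if run_id in runs:
            selected[model] = runs[run_id]
        else:
            # Do not fall back to a different seed: that would break fairness.
            missing.append(model)
    return selected, missing


# Hypothetical archives keyed by model, then by run ID.
archives_by_model = {
    "model_a": {"run_0": [0.1, 0.8], "run_1": [0.2, 0.7]},
    "model_b": {"run_0": [0.3, 0.6]},
    "model_c": {"run_1": [0.9, 0.1]},
}
selected, missing = select_run(archives_by_model, "run_0")
```

Here `model_c` has no `run_0`, so it ends up in `missing` and is left out of the comparison rather than being compared on a different seed.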