Leaderboards

Curated benchmark baskets for specific model-comparison views.

2 leaderboards

Leaderboard Registry

BenchmarkList General (benchmarklist_general)
Benchmarks: 35
Models: 123
Top models: Claude Mythos Preview, Claude Opus 4.8, Qwen3.7 Max
Description: Default general-purpose frontier-model leaderboard basket across reasoning, coding, writing, agentic, multimodal, and structured-output benchmarks.

Legal AI (legal)
Benchmarks: 9
Models: 60
Top models: Claude Opus 4.7, GPT-5.5, GPT-5
Description: Legal reasoning, contract, legal-agent, and jurisdiction-specific benchmark basket.