BenchBench
BenchBench is a Benchmark Agreement Testing (BAT) leaderboard. It aggregates model scores across benchmarks and analyzes how strongly benchmarks agree (correlate) with one another under a standardized BAT methodology.
Rows: 137
Primary metric: aggregate_score
Sampled: 2026-05-06
Aggregate Score
| Rank | Subject | Aggregate Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt_4o_2024_05_13 | 0.98 | GPT-4o (2024-05-13) openai-gpt-4o-2024-05-13 | Imported | 2026-05-06 |
| 2 | chatgpt_4o_latest | 0.98 | — | Imported | 2026-05-06 |
| 3 | gpt_4o_2024_08_06 | 0.97 | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Imported | 2026-05-06 |
| 4 | claude_3_5_sonnet_20240620 | 0.96 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 5 | gemini_1_5_pro_exp_0801 | 0.95 | — | Imported | 2026-05-06 |
| 6 | llama3_1_70b_instruct | 0.93 | Llama 3.1 70B Instruct meta-llama-llama-3.1-70b-instruct | Imported | 2026-05-06 |
| 7 | gpt_4_turbo_2024_04_09 | 0.91 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-06 |
| 8 | claude_3_opus_20240229 | 0.88 | — | Imported | 2026-05-06 |
| 9 | yi_large_preview | 0.87 | — | Imported | 2026-05-06 |
| 10 | llama3_1_405b_instruct | 0.86 | — | Imported | 2026-05-06 |
| 11 | gpt_4_0125_preview | 0.85 | GPT-4 Turbo Preview openai-gpt-4-turbo-preview | Imported | 2026-05-06 |
| 12 | hermes_3_llama3_1_70b | 0.85 | Hermes 3 70B Instruct nousresearch-hermes-3-llama-3.1-70b | Imported | 2026-05-06 |
| 13 | zephyr_orpo_141b_a35b_v0_1 | 0.84 | — | Imported | 2026-05-06 |
| 14 | mistral_large_2407 | 0.84 | Mistral Large 2407 mistralai-mistral-large-2407 | Imported | 2026-05-06 |
| 15 | gpt_4o_mini_2024_07_18 | 0.83 | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-06 |
| 16 | claude_2_0 | 0.83 | — | Imported | 2026-05-06 |
| 17 | smaug_qwen2_72b_instruct | 0.83 | — | Imported | 2026-05-06 |
| 18 | gemini_1_5_pro_api_0514 | 0.83 | — | Imported | 2026-05-06 |
| 19 | llama3_70b_instruct | 0.82 | Llama 3 70B Instruct meta-llama-llama-3-70b-instruct | Imported | 2026-05-06 |
| 20 | llama3_70b | 0.81 | — | Imported | 2026-05-06 |
| 21 | gemma_2_9b_it_dpo | 0.81 | — | Imported | 2026-05-06 |
| 22 | llama3_instruct_8b_simpo | 0.80 | — | Imported | 2026-05-06 |
| 23 | yi_large | 0.79 | — | Imported | 2026-05-06 |
| 24 | gemma_2_27b_it | 0.78 | Gemma 2 27B google-gemma-2-27b-it | Imported | 2026-05-06 |
| 25 | qwen2_72b_instruct | 0.77 | — | Imported | 2026-05-06 |
| 26 | qwen1_5_32b | 0.77 | — | Imported | 2026-05-06 |
| 27 | gpt_4_0613 | 0.76 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 28 | phi_3_5_moe_instruct | 0.76 | — | Imported | 2026-05-06 |
| 29 | qwen1_5_110b_chat | 0.74 | — | Imported | 2026-05-06 |
| 30 | mixtral_8x22b_v0_1 | 0.74 | — | Imported | 2026-05-06 |
| 31 | gemma_2_9b_it_simpo | 0.73 | — | Imported | 2026-05-06 |
| 32 | gemini_pro | 0.73 | — | Imported | 2026-05-06 |
| 33 | llama_2_70b | 0.73 | — | Imported | 2026-05-06 |
| 34 | gemini_1_5_flash_api_0514 | 0.73 | — | Imported | 2026-05-06 |
| 35 | yi_34b | 0.72 | — | Imported | 2026-05-06 |
| 36 | deepseek_coder_v2 | 0.71 | — | Imported | 2026-05-06 |
| 37 | nous_hermes_2_mixtral_8x7b_dpo | 0.71 | — | Imported | 2026-05-06 |
| 38 | gpt_3_5_turbo_0613 | 0.69 | GPT-3.5 Turbo (older v0613) openai-gpt-3.5-turbo-0613 | Imported | 2026-05-06 |
| 39 | claude_2_1 | 0.67 | — | Imported | 2026-05-06 |
| 40 | yi_1_5_34b_chat | 0.67 | — | Imported | 2026-05-06 |
| 41 | mistral_medium | 0.66 | — | Imported | 2026-05-06 |
| 42 | phi_3_small_128k_instruct | 0.66 | — | Imported | 2026-05-06 |
| 43 | infinity_instruct_3m_0625_llama3_8b | 0.65 | — | Imported | 2026-05-06 |
| 44 | claude_instant_1_2 | 0.65 | — | Imported | 2026-05-06 |
| 45 | mistral_v0_1_7b | 0.62 | — | Imported | 2026-05-06 |
| 46 | command_r_plus | 0.62 | — | Imported | 2026-05-06 |
| 47 | phi_3_5_mini_instruct | 0.61 | — | Imported | 2026-05-06 |
| 48 | llama3_1_8b_instruct | 0.61 | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Imported | 2026-05-06 |
| 49 | gemma_2_9b_it | 0.60 | — | Imported | 2026-05-06 |
| 50 | yi_1_5_9b_chat | 0.60 | — | Imported | 2026-05-06 |
| 51 | claude_3_sonnet_20240229 | 0.60 | — | Imported | 2026-05-06 |
| 52 | mixtral_8x22b_instruct_v0_1 | 0.59 | — | Imported | 2026-05-06 |
| 53 | qwen1_5_14b | 0.58 | — | Imported | 2026-05-06 |
| 54 | llama_65b | 0.58 | — | Imported | 2026-05-06 |
| 55 | deepseek_llm_67b_chat | 0.57 | — | Imported | 2026-05-06 |
| 56 | qwen1_5_32b_chat | 0.57 | — | Imported | 2026-05-06 |
| 57 | wizardlm_70b | 0.56 | — | Imported | 2026-05-06 |
| 58 | yi_34b_chat | 0.56 | — | Imported | 2026-05-06 |
| 59 | qwen1_5_72b_chat | 0.55 | — | Imported | 2026-05-06 |
| 60 | dbrx_instructruct | 0.54 | — | Imported | 2026-05-06 |
| 61 | jurassic_2_jumbo_178b | 0.53 | — | Imported | 2026-05-06 |
| 62 | mixtral_8x7b_v0_1 | 0.53 | — | Imported | 2026-05-06 |
| 63 | openchat_3_5 | 0.53 | — | Imported | 2026-05-06 |
| 64 | mistral_large_2402 | 0.51 | — | Imported | 2026-05-06 |
| 65 | solar_10_7b_instruct_v1_0 | 0.50 | — | Imported | 2026-05-06 |
| 66 | qwen2_7b_instruct | 0.50 | — | Imported | 2026-05-06 |
| 67 | phi_3_medium_4k_instruct | 0.49 | — | Imported | 2026-05-06 |
| 68 | dolphin_2_2_1_mistral_7b | 0.48 | — | Imported | 2026-05-06 |
| 69 | mistral_small_2402 | 0.48 | — | Imported | 2026-05-06 |
| 70 | glm_4_9b_chat | 0.48 | — | Imported | 2026-05-06 |
| 71 | dbrx_instruct | 0.47 | — | Imported | 2026-05-06 |
| 72 | qwen1_5_14b_chat | 0.45 | — | Imported | 2026-05-06 |
| 73 | claude_3_haiku_20240307 | 0.45 | Claude 3 Haiku anthropic-claude-3-haiku | Imported | 2026-05-06 |
| 74 | gemma_7b | 0.45 | — | Imported | 2026-05-06 |
| 75 | llama3_8b_instruct | 0.44 | Llama 3 8B Instruct meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 76 | llama3_8b | 0.44 | — | Imported | 2026-05-06 |
| 77 | wizardlm_13b | 0.43 | — | Imported | 2026-05-06 |
| 78 | starling_lm_7b_alpha | 0.43 | — | Imported | 2026-05-06 |
| 79 | jurassic_2_grande_17b | 0.42 | — | Imported | 2026-05-06 |
| 80 | mistral_7b_v0_3 | 0.42 | — | Imported | 2026-05-06 |
| 81 | llama_2_13b | 0.41 | — | Imported | 2026-05-06 |
| 82 | llama_2_70b_chat | 0.41 | — | Imported | 2026-05-06 |
| 83 | phi_3_mini_4k_instruct | 0.40 | — | Imported | 2026-05-06 |
| 84 | openhermes_2_5_mistral_7b | 0.40 | — | Imported | 2026-05-06 |
| 85 | llama_2_13b_chat | 0.39 | — | Imported | 2026-05-06 |
| 86 | guanaco_33b | 0.38 | — | Imported | 2026-05-06 |
| 87 | phi_3_mini_128k_instruct | 0.38 | — | Imported | 2026-05-06 |
| 88 | mistral_7b_v0_2 | 0.38 | — | Imported | 2026-05-06 |
| 89 | internlm2_chat_20b | 0.37 | — | Imported | 2026-05-06 |
| 90 | starling_lm_7b_beta | 0.36 | — | Imported | 2026-05-06 |
| 91 | gpt_3_5_turbo_0125 | 0.36 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 92 | tulu_2_dpo_70b | 0.36 | — | Imported | 2026-05-06 |
| 93 | qwen1_5_7b | 0.35 | — | Imported | 2026-05-06 |
| 94 | falcon_40b | 0.35 | — | Imported | 2026-05-06 |
| 95 | yi_1_5_6b_chat | 0.34 | — | Imported | 2026-05-06 |
| 96 | zephyr_7b_alpha | 0.34 | — | Imported | 2026-05-06 |
| 97 | command_r | 0.33 | Command R (08-2024) cohere-command-r-08-2024 | Imported | 2026-05-06 |
| 98 | luminous_supreme_70b | 0.33 | — | Imported | 2026-05-06 |
| 99 | yi_6b | 0.30 | — | Imported | 2026-05-06 |
| 100 | zephyr_7b_beta | 0.29 | — | Imported | 2026-05-06 |
| 101 | mixtral_8x7b_instruct_v0_1 | 0.28 | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 102 | qwen_14b_chat | 0.28 | — | Imported | 2026-05-06 |
| 103 | gemma_2_2b_it | 0.28 | — | Imported | 2026-05-06 |
| 104 | phi_3_small_8k_instruct | 0.27 | — | Imported | 2026-05-06 |
| 105 | gemma_1_1_7b_it | 0.26 | — | Imported | 2026-05-06 |
| 106 | llama_2_7b | 0.25 | — | Imported | 2026-05-06 |
| 107 | mistral_7b_instruct_v0_2 | 0.25 | — | Imported | 2026-05-06 |
| 108 | mistral_7b_instruct_v0_3 | 0.25 | — | Imported | 2026-05-06 |
| 109 | qwen1_5_7b_chat | 0.24 | — | Imported | 2026-05-06 |
| 110 | alpaca_7b | 0.23 | — | Imported | 2026-05-06 |
| 111 | luminous_extended_30b | 0.23 | — | Imported | 2026-05-06 |
| 112 | llama_13b | 0.22 | — | Imported | 2026-05-06 |
| 113 | phi_2 | 0.20 | — | Imported | 2026-05-06 |
| 114 | qwen2_1_5b_instruct | 0.20 | — | Imported | 2026-05-06 |
| 115 | yi_6b_chat | 0.19 | — | Imported | 2026-05-06 |
| 116 | vicuna_7b | 0.19 | — | Imported | 2026-05-06 |
| 117 | gemma_7b_it | 0.19 | — | Imported | 2026-05-06 |
| 118 | olmo_7b_instruct | 0.16 | — | Imported | 2026-05-06 |
| 119 | vicuna_7b_v1_5 | 0.15 | — | Imported | 2026-05-06 |
| 120 | vicuna_13b | 0.15 | — | Imported | 2026-05-06 |
| 121 | gpt_neox_20b | 0.14 | — | Imported | 2026-05-06 |
| 122 | falcon_40b_instruct | 0.13 | — | Imported | 2026-05-06 |
| 123 | qwen1_5_4b_chat | 0.13 | — | Imported | 2026-05-06 |
| 124 | falcon_7b | 0.11 | — | Imported | 2026-05-06 |
| 125 | llama_2_7b_chat | 0.11 | — | Imported | 2026-05-06 |
| 126 | gpt_j_6b | 0.10 | — | Imported | 2026-05-06 |
| 127 | luminous_base_13b | 0.08 | — | Imported | 2026-05-06 |
| 128 | gemma_2b_it | 0.08 | — | Imported | 2026-05-06 |
| 129 | gemma_1_1_2b_it | 0.07 | — | Imported | 2026-05-06 |
| 130 | olmo_7b | 0.06 | — | Imported | 2026-05-06 |
| 131 | qwen1_5_1_8b_chat | 0.06 | — | Imported | 2026-05-06 |
| 132 | qwen2_0_5b_instruct | 0.06 | — | Imported | 2026-05-06 |
| 133 | pythia_12b | 0.05 | — | Imported | 2026-05-06 |
| 134 | chatglm2_6b | 0.03 | — | Imported | 2026-05-06 |
| 135 | pythia_6_9b | 0.02 | — | Imported | 2026-05-06 |
| 136 | qwen1_5_0_5b_chat | 0.01 | — | Imported | 2026-05-06 |
| 137 | falcon_7b_instruct | 0.01 | — | Imported | 2026-05-06 |
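
Benchmark agreement is commonly quantified with a rank correlation between the scores two benchmarks assign to the same models. The sketch below computes Kendall's tau-a in plain Python. Whether this leaderboard uses exactly this variant is an assumption. The `aggregate` values are copied from the table above, while `placeholder_benchmark` holds made-up illustrative numbers, not measurements of these models.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Pairs tied on either variable count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Aggregate scores copied from the leaderboard table.
aggregate = {
    "gpt_4o_2024_05_13": 0.98,
    "claude_3_5_sonnet_20240620": 0.96,
    "llama3_1_70b_instruct": 0.93,
    "gpt_4_turbo_2024_04_09": 0.91,
    "mistral_large_2407": 0.84,
}

# HYPOTHETICAL placeholder scores for an imaginary benchmark.
# These are invented for illustration and are not real results.
placeholder_benchmark = {
    "gpt_4o_2024_05_13": 71.0,
    "claude_3_5_sonnet_20240620": 74.0,
    "llama3_1_70b_instruct": 60.0,
    "gpt_4_turbo_2024_04_09": 65.0,
    "mistral_large_2407": 58.0,
}

# Align both score lists on the same model order before comparing.
models = sorted(aggregate)
tau = kendall_tau_a(
    [aggregate[m] for m in models],
    [placeholder_benchmark[m] for m in models],
)
print(f"Kendall tau-a vs. aggregate: {tau:.2f}")
```

A tau near 1 means the two rankings order the models almost identically, and a tau near -1 means they are nearly reversed. Many rows in the table share the same two-decimal score, so on the full table a tie-aware variant such as tau-b may be a better fit than tau-a.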