RewardBench
Reward model benchmark evaluating preference models across chat, hard chat, safety, reasoning, and prior preference-evaluation sets.
188 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics
Score, Chat, Chat Hard, Safety, Reasoning, Prior Sets (0.5 weight)
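The "(0.5 weight)" note on Prior Sets suggests that the overall Score is a weighted mean of the section scores. The sketch below illustrates that reading. The only weight stated on this page is 0.5 for Prior Sets; the weight of 1.0 for the other four sections, the function name `overall_score`, and the rule of skipping sections with no reported value are all assumptions made for illustration, and the sample numbers are hypothetical rather than taken from the table.

```python
from typing import Mapping, Optional

# Assumed section weights. Only the 0.5 on "Prior Sets" comes from the
# metrics list; 1.0 for the remaining sections is an assumption.
SECTION_WEIGHTS = {
    "Chat": 1.0,
    "Chat Hard": 1.0,
    "Safety": 1.0,
    "Reasoning": 1.0,
    "Prior Sets": 0.5,
}


def overall_score(sections: Mapping[str, Optional[float]]) -> float:
    """Weighted mean of the section scores that are present.

    Sections that are missing or None are skipped, and the weights of the
    remaining sections are renormalised (an assumed convention).
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for name, weight in SECTION_WEIGHTS.items():
        value = sections.get(name)
        if value is None:
            continue
        weighted_total += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no section scores provided")
    return weighted_total / weight_sum


# Hypothetical section scores, not taken from the leaderboard.
example = {
    "Chat": 90.0,
    "Chat Hard": 80.0,
    "Safety": 70.0,
    "Reasoning": 60.0,
    "Prior Sets": 50.0,
}
print(round(overall_score(example), 2))  # (90 + 80 + 70 + 60 + 0.5 * 50) / 4.5 -> 72.22
```

Under this reading, a model with no Prior Sets result would be averaged over the four remaining sections only, so its Score is not directly penalised for the missing section.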
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | infly/INF-ORM-Llama3.1-70B | 95.11 | — | Imported | 2026-05-06 |
| 2 | ShikaiChen/LDL-Reward-Gemma-2-27B-v0.1 | 94.99 | — | Imported | 2026-05-06 |
| 3 | nicolinho/QRM-Gemma-2-27B | 94.44 | — | Imported | 2026-05-06 |
| 4 | Skywork/Skywork-Reward-Gemma-2-27B-v0.2 | 94.26 | — | Imported | 2026-05-06 |
| 5 | nvidia/Llama-3.1-Nemotron-70B-Reward * | 94.11 | — | Imported | 2026-05-06 |
| 6 | Skywork/Skywork-Reward-Gemma-2-27B ⚠️ | 93.80 | — | Imported | 2026-05-06 |
| 7 | SF-Foundation/TextEval-Llama3.1-70B * ⚠️ | 93.48 | — | Imported | 2026-05-06 |
| 8 | meta-metrics/MetaMetrics-RM-v1.0 | 93.42 | — | Imported | 2026-05-06 |
| 9 | Skywork/Skywork-Critic-Llama-3.1-70B ⚠️ | 93.31 | — | Imported | 2026-05-06 |
| 10 | nicolinho/QRM-Llama3.1-8B-v2 | 93.14 | — | Imported | 2026-05-06 |
| 11 | Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | 93.13 | — | Imported | 2026-05-06 |
| 12 | nicolinho/QRM-Llama3.1-8B ⚠️ | 93.06 | — | Imported | 2026-05-06 |
| 13 | LxzGordon/URM-LLaMa-3.1-8B ⚠️ | 92.94 | — | Imported | 2026-05-06 |
| 14 | Salesforce/SFR-LLaMa-3.1-70B-Judge-r * | 92.72 | — | Imported | 2026-05-06 |
| 15 | R-I-S-E/RISE-Judge-Qwen2.5-32B | 92.66 | — | Imported | 2026-05-06 |
| 16 | Skywork/Skywork-Reward-Llama-3.1-8B ⚠️ | 92.52 | — | Imported | 2026-05-06 |
| 17 | AtlaAI/Selene-1 | 92.41 | — | Imported | 2026-05-06 |
| 18 | general-preference/GPM-Llama-3.1-8B ⚠️ | 92.24 | — | Imported | 2026-05-06 |
| 19 | nvidia/Nemotron-4-340B-Reward * | 92.00 | — | Imported | 2026-05-06 |
| 20 | Ray2333/GRM-Llama3-8B-rewardmodel-ft ⚠️ | 91.54 | — | Imported | 2026-05-06 |
| 21 | nicolinho/QRM-Llama3-8B ⚠️ | 91.10 | — | Imported | 2026-05-06 |
| 22 | SF-Foundation/TextEval-OffsetBias-12B * | 91.05 | — | Imported | 2026-05-06 |
| 23 | Ray2333/GRM-llama3.2-3B-rewardmodel-ft | 90.92 | — | Imported | 2026-05-06 |
| 24 | Salesforce/SFR-nemo-12B-Judge-r * | 90.27 | — | Imported | 2026-05-06 |
| 25 | internlm/internlm2-20b-reward | 90.16 | — | Imported | 2026-05-06 |
| 26 | Skywork/Skywork-VL-Reward-7B | 90.07 | — | Imported | 2026-05-06 |
| 27 | facebook/Self-taught-evaluator-llama3.1-70B * | 90.01 | — | Imported | 2026-05-06 |
| 28 | LxzGordon/URM-LLaMa-3-8B | 89.91 | — | Imported | 2026-05-06 |
| 29 | NCSOFT/Llama-3-OffsetBias-RM-8B | 89.42 | — | Imported | 2026-05-06 |
| 30 | AtlaAI/Selene-1-Mini-Llama-3.1-8B | 89.13 | — | Imported | 2026-05-06 |
| 31 | Skywork/Skywork-Critic-Llama-3.1-8B | 88.96 | — | Imported | 2026-05-06 |
| 32 | nvidia/Llama3-70B-SteerLM-RM * | 88.77 | — | Imported | 2026-05-06 |
| 33 | Salesforce/SFR-LLaMa-3.1-8B-Judge-r * | 88.65 | — | Imported | 2026-05-06 |
| 34 | facebook/Self-taught-Llama-3-70B * | 88.63 | — | Imported | 2026-05-06 |
| 35 | RLHFlow/ArmoRM-Llama3-8B-v0.1 | 88.60 | — | Imported | 2026-05-06 |
| 36 | Ray2333/GRM-gemma2-2B-rewardmodel-ft | 88.39 | — | Imported | 2026-05-06 |
| 37 | google/gemini-1.5-pro-0514 * | 88.20 | — | Imported | 2026-05-06 |
| 38 | R-I-S-E/RISE-Judge-Qwen2.5-7B | 88.19 | — | Imported | 2026-05-06 |
| 39 | Cohere May 2024 * | 88.16 | — | Imported | 2026-05-06 |
| 40 | google/flame-1.0-24B-july-2024 * | 87.81 | — | Imported | 2026-05-06 |
| 41 | internlm/internlm2-7b-reward | 87.59 | — | Imported | 2026-05-06 |
| 42 | ZiyiYe/Con-J-Qwen2-7B ⚠️ | 87.12 | — | Imported | 2026-05-06 |
| 43 | google/gemini-1.5-pro-0924 | 86.78 | — | Imported | 2026-05-06 |
| 44 | openai/gpt-4o-2024-08-06 | 86.73 | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Imported | 2026-05-06 |
| 45 | RLHFlow/pair-preference-model-LLaMA3-8B | 85.75 | — | Imported | 2026-05-06 |
| 46 | Ray2333/GRM-llama3-8B-sftreg | 85.42 | — | Imported | 2026-05-06 |
| 47 | opencompass/CompassJudger-1-32B-Instruct | 85.22 | — | Imported | 2026-05-06 |
| 48 | Cohere March 2024 * | 85.11 | — | Imported | 2026-05-06 |
| 49 | Ray2333/GRM-llama3-8B-distill | 84.64 | — | Imported | 2026-05-06 |
| 50 | Ray2333/GRM-Gemma-2B-rewardmodel-ft ⚠️ | 84.47 | — | Imported | 2026-05-06 |
| 51 | openai/gpt-4-0125-preview | 84.34 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 52 | mattshumer/Reflection-70B | 84.22 | — | Imported | 2026-05-06 |
| 53 | Anthropic/claude-3-5-sonnet-20240620 | 84.17 | — | Imported | 2026-05-06 |
| 54 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 84.12 | — | Imported | 2026-05-06 |
| 55 | opencompass/CompassJudger-1-14B-Instruct | 84.09 | — | Imported | 2026-05-06 |
| 56 | meta-llama/Meta-Llama-3.1-70B-Instruct | 84.05 | — | Imported | 2026-05-06 |
| 57 | NCSOFT/Llama-3-OffsetBias-8B | 83.97 | — | Imported | 2026-05-06 |
| 58 | openai/gpt-4-turbo-2024-04-09 | 83.95 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-06 |
| 59 | sfairXC/FsfairX-LLaMA3-RM-v0.1 | 83.38 | — | Imported | 2026-05-06 |
| 60 | openai/gpt-4o-2024-05-13 | 83.27 | GPT-4o (2024-05-13) openai-gpt-4o-2024-05-13 | Imported | 2026-05-06 |
| 61 | opencompass/CompassJudger-1-7B-Instruct | 83.17 | — | Imported | 2026-05-06 |
| 62 | internlm/internlm2-1_8b-reward | 82.17 | — | Imported | 2026-05-06 |
| 63 | CIR-AMS/BTRM_Qwen2_7b_0613 | 81.72 | — | Imported | 2026-05-06 |
| 64 | openbmb/Eurus-RM-7b | 81.59 | — | Imported | 2026-05-06 |
| 65 | Nexusflow/Starling-RM-34B | 81.33 | — | Imported | 2026-05-06 |
| 66 | google/gemma-2-27b-it | 80.90 | Gemma 2 27B google-gemma-2-27b-it | Imported | 2026-05-06 |
| 67 | google/gemini-1.5-flash-001 | 80.54 | — | Imported | 2026-05-06 |
| 68 | Ray2333/Gemma-2B-rewardmodel-ft ⚠️ | 80.48 | — | Imported | 2026-05-06 |
| 69 | allenai/tulu-v2.5-13b-preference-mix-rm | 80.27 | — | Imported | 2026-05-06 |
| 70 | Anthropic/claude-3-opus-20240229 | 80.08 | — | Imported | 2026-05-06 |
| 71 | openai/gpt-4o-mini-2024-07-18 | 80.07 | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-06 |
| 72 | weqweasdas/RM-Mistral-7B | 79.82 | — | Imported | 2026-05-06 |
| 73 | NousResearch/Hermes-3-Llama-3.1-70B | 78.47 | Hermes 3 70B Instruct nousresearch-hermes-3-llama-3.1-70b | Imported | 2026-05-06 |
| 74 | hendrydong/Mistral-RM-for-RAFT-GSHF-v0 | 78.47 | — | Imported | 2026-05-06 |
| 75 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 78.08 | — | Imported | 2026-05-06 |
| 76 | Ray2333/reward-model-Mistral-7B-instruct-Unifie... | 76.61 | — | Imported | 2026-05-06 |
| 77 | Ahjeong/MMPO_Gemma_7b_gamma1.1_epoch3 | 76.52 | — | Imported | 2026-05-06 |
| 78 | stabilityai/stablelm-2-12b-chat | 76.42 | — | Imported | 2026-05-06 |
| 79 | meta-llama/Meta-Llama-3-70B-Instruct | 76.27 | Llama 3 70B Instruct meta-llama-llama-3-70b-instruct | Imported | 2026-05-06 |
| 80 | allenai/tulu-2-dpo-70b | 76.21 | — | Imported | 2026-05-06 |
| 81 | gemini-1.5-flash-8b | 76.01 | — | Imported | 2026-05-06 |
| 82 | Ahjeong/MMPO_Gemma_7b | 75.87 | — | Imported | 2026-05-06 |
| 83 | PoLL/gpt-3.5-turbo-0125_claude-3-sonnet-2024022... | 75.78 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 84 | allenai/llama-3-tulu-2-dpo-70b | 74.96 | — | Imported | 2026-05-06 |
| 85 | NousResearch/Nous-Hermes-2-Mistral-7B-DPO | 74.81 | — | Imported | 2026-05-06 |
| 86 | Anthropic/claude-3-sonnet-20240229 | 74.58 | — | Imported | 2026-05-06 |
| 87 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 74.55 | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 88 | prometheus-eval/prometheus-8x7b-v2.0 | 74.51 | — | Imported | 2026-05-06 |
| 89 | Ray2333/GRM-Gemma-2B-sftreg | 74.51 | — | Imported | 2026-05-06 |
| 90 | general-preference/GPM-Gemma-2B | 74.49 | — | Imported | 2026-05-06 |
| 91 | 0-hero/Matter-0.1-7B-boost-DPO-preview | 74.48 | — | Imported | 2026-05-06 |
| 92 | allenai/tulu-v2.5-70b-uf-rm | 73.98 | — | Imported | 2026-05-06 |
| 93 | HuggingFaceH4/zephyr-7b-alpha | 73.92 | — | Imported | 2026-05-06 |
| 94 | upstage/SOLAR-10.7B-Instruct-v1.0 | 73.91 | — | Imported | 2026-05-06 |
| 95 | allenai/tulu-2-dpo-13b | 73.68 | — | Imported | 2026-05-06 |
| 96 | opencompass/CompassJudger-1-1.5B-Instruct | 73.44 | — | Imported | 2026-05-06 |
| 97 | allenai/llama-3-tulu-2-8b-uf-mean-rm | 73.42 | — | Imported | 2026-05-06 |
| 98 | HuggingFaceH4/starchat2-15b-v0.1 | 73.22 | — | Imported | 2026-05-06 |
| 99 | Ray2333/Gemma-2B-rewardmodel-baseline | 72.90 | — | Imported | 2026-05-06 |
| 100 | Anthropic/claude-3-haiku-20240307 | 72.89 | — | Imported | 2026-05-06 |
| 101 | HuggingFaceH4/zephyr-7b-beta | 72.81 | — | Imported | 2026-05-06 |
| 102 | allenai/llama-3-tulu-2-dpo-8b | 72.75 | — | Imported | 2026-05-06 |
| 103 | 0-hero/Matter-0.1-7B-DPO-preview | 72.47 | — | Imported | 2026-05-06 |
| 104 | jondurbin/bagel-dpo-34b-v0.5 | 72.15 | — | Imported | 2026-05-06 |
| 105 | allenai/tulu-2-dpo-7b | 72.12 | — | Imported | 2026-05-06 |
| 106 | prometheus-eval/prometheus-7b-v2.0 | 72.04 | — | Imported | 2026-05-06 |
| 107 | stabilityai/stablelm-zephyr-3b | 71.46 | — | Imported | 2026-05-06 |
| 108 | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 71.38 | — | Imported | 2026-05-06 |
| 109 | ai2/tulu-2-7b-rm-v0-nectar-binarized-700k.json | 71.27 | — | Imported | 2026-05-06 |
| 110 | berkeley-nest/Starling-RM-7B-alpha | 71.13 | — | Imported | 2026-05-06 |
| 111 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 70.58 | — | Imported | 2026-05-06 |
| 112 | CohereForAI/c4ai-command-r-plus | 70.57 | — | Imported | 2026-05-06 |
| 113 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 70.19 | — | Imported | 2026-05-06 |
| 114 | allenai/llama-3-tulu-2-70b-uf-mean-rm | 70.19 | — | Imported | 2026-05-06 |
| 115 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 70.08 | — | Imported | 2026-05-06 |
| 116 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 70.04 | — | Imported | 2026-05-06 |
| 117 | weqweasdas/RM-Gemma-7B | 69.67 | — | Imported | 2026-05-06 |
| 118 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 69.45 | — | Imported | 2026-05-06 |
| 119 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 69.24 | — | Imported | 2026-05-06 |
| 120 | weqweasdas/RM-Gemma-7B-4096 | 69.22 | — | Imported | 2026-05-06 |
| 121 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 69.05 | — | Imported | 2026-05-06 |
| 122 | openbmb/UltraRM-13b | 69.03 | — | Imported | 2026-05-06 |
| 123 | OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 | 69.01 | — | Imported | 2026-05-06 |
| 124 | openbmb/Eurus-7b-kto | 69.00 | — | Imported | 2026-05-06 |
| 125 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 68.95 | — | Imported | 2026-05-06 |
| 126 | Qwen/Qwen1.5-14B-Chat | 68.64 | — | Imported | 2026-05-06 |
| 127 | ai2/tulu-2-7b-rm-v0-nectar-binarized-3.8m-check... | 68.08 | — | Imported | 2026-05-06 |
| 128 | RLHFlow/LLaMA3-iterative-DPO-final | 67.83 | — | Imported | 2026-05-06 |
| 129 | HuggingFaceH4/zephyr-7b-gemma-v0.1 | 67.58 | — | Imported | 2026-05-06 |
| 130 | ai2/tulu-2-7b-rm-v0-nectar-binarized.json | 67.56 | — | Imported | 2026-05-06 |
| 131 | Qwen/Qwen1.5-7B-Chat | 67.50 | — | Imported | 2026-05-06 |
| 132 | openbmb/MiniCPM-2B-dpo-fp32 | 67.30 | — | Imported | 2026-05-06 |
| 133 | mightbe/Better-PairRM | 67.30 | — | Imported | 2026-05-06 |
| 134 | allenai/OLMo-7B-Instruct | 67.27 | — | Imported | 2026-05-06 |
| 135 | Qwen/Qwen1.5-72B-Chat | 67.23 | — | Imported | 2026-05-06 |
| 136 | ai2/tulu-2-7b-rm-v0.json | 66.55 | — | Imported | 2026-05-06 |
| 137 | Qwen/Qwen1.5-MoE-A2.7B-Chat | 66.44 | — | Imported | 2026-05-06 |
| 138 | RLHFlow/RewardModel-Mistral-7B-for-DPA-v1 | 66.33 | — | Imported | 2026-05-06 |
| 139 | stabilityai/stablelm-2-zephyr-1_6b | 65.74 | — | Imported | 2026-05-06 |
| 140 | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 65.65 | — | Imported | 2026-05-06 |
| 141 | weqweasdas/RM-Gemma-2B | 65.49 | — | Imported | 2026-05-06 |
| 142 | openai/gpt-3.5-turbo-0125 | 65.34 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 143 | allenai/tulu-v2.5-70b-preference-mix-rm | 65.16 | — | Imported | 2026-05-06 |
| 144 | wenbopan/Faro-Yi-9B-DPO | 64.61 | — | Imported | 2026-05-06 |
| 145 | meta-llama/Meta-Llama-3-8B-Instruct | 64.50 | Llama 3 8B Instruct meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 146 | ai2/llama-2-chat-ultrafeedback-60k.jsonl | 64.40 | — | Imported | 2026-05-06 |
| 147 | IDEA-CCNL/Ziya-LLaMA-7B-Reward | 63.78 | — | Imported | 2026-05-06 |
| 148 | PKU-Alignment/beaver-7b-v2.0-reward | 63.66 | — | Imported | 2026-05-06 |
| 149 | stabilityai/stable-code-instruct-3b | 62.16 | — | Imported | 2026-05-06 |
| 150 | OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1 | 61.50 | — | Imported | 2026-05-06 |
| 151 | OpenAssistant/reward-model-deberta-v3-large-v2 | 61.26 | — | Imported | 2026-05-06 |
| 152 | llm-blender/PairRM-hf | 60.87 | — | Imported | 2026-05-06 |
| 153 | PKU-Alignment/beaver-7b-v2.0-cost | 59.57 | — | Imported | 2026-05-06 |
| 154 | ContextualAI/archangel_sft-kto_llama13b | 59.52 | — | Imported | 2026-05-06 |
| 155 | ContextualAI/archangel_sft-kto_llama30b | 59.01 | — | Imported | 2026-05-06 |
| 156 | Qwen/Qwen1.5-1.8B-Chat | 58.90 | — | Imported | 2026-05-06 |
| 157 | ai2/llama-2-chat-7b-nectar-3.8m.json | 58.43 | — | Imported | 2026-05-06 |
| 158 | PKU-Alignment/beaver-7b-v1.0-cost | 57.98 | — | Imported | 2026-05-06 |
| 159 | ContextualAI/archangel_sft-dpo_llama30b | 56.18 | — | Imported | 2026-05-06 |
| 160 | ContextualAI/archangel_sft-kto_pythia1-4b | 55.81 | — | Imported | 2026-05-06 |
| 161 | ContextualAI/archangel_sft-kto_pythia6-9b | 55.61 | — | Imported | 2026-05-06 |
| 162 | ContextualAI/archangel_sft-kto_pythia2-8b | 54.97 | — | Imported | 2026-05-06 |
| 163 | Qwen/Qwen1.5-4B-Chat | 54.77 | — | Imported | 2026-05-06 |
| 164 | ContextualAI/archangel_sft-dpo_llama13b | 54.00 | — | Imported | 2026-05-06 |
| 165 | ContextualAI/archangel_sft-kto_llama7b | 53.88 | — | Imported | 2026-05-06 |
| 166 | ContextualAI/archangel_sft-dpo_llama7b | 53.04 | — | Imported | 2026-05-06 |
| 167 | Qwen/Qwen1.5-0.5B-Chat | 52.98 | — | Imported | 2026-05-06 |
| 168 | ContextualAI/archangel_sft-dpo_pythia2-8b | 52.86 | — | Imported | 2026-05-06 |
| 169 | my_model/ | 52.67 | — | Imported | 2026-05-06 |
| 170 | ContextualAI/archangel_sft-dpo_pythia6-9b | 52.63 | — | Imported | 2026-05-06 |
| 171 | ai2/llama-2-chat-nectar-180k.json | 52.35 | — | Imported | 2026-05-06 |
| 172 | ContextualAI/archangel_sft-dpo_pythia1-4b | 52.33 | — | Imported | 2026-05-06 |
| 173 | stanfordnlp/SteamSHP-flan-t5-xl | 51.35 | — | Imported | 2026-05-06 |
| 174 | SultanR/SmolTulu-1.7b-RM | 50.94 | — | Imported | 2026-05-06 |
| 175 | ContextualAI/archangel_sft-kto_pythia12-0b | 50.53 | — | Imported | 2026-05-06 |
| 176 | weqweasdas/hh_rlhf_rm_open_llama_3b | 50.27 | — | Imported | 2026-05-06 |
| 177 | ContextualAI/archangel_sft-dpo_pythia12-0b | 50.09 | — | Imported | 2026-05-06 |
| 178 | random | 50.00 | — | Imported | 2026-05-06 |
| 179 | stanfordnlp/SteamSHP-flan-t5-large | 49.62 | — | Imported | 2026-05-06 |
| 180 | allenai/tulu-v2.5-13b-uf-rm | 48.06 | — | Imported | 2026-05-06 |
| 181 | PKU-Alignment/beaver-7b-v1.0-reward | 47.27 | — | Imported | 2026-05-06 |
| 182 | allenai/Llama-3.1-70B-Instruct-RM-RB2 | 90.21 | — | Imported | 2026-05-06 |
| 183 | allenai/Llama-3.1-8B-Instruct-RM-RB2 | 88.85 | — | Imported | 2026-05-06 |
| 184 | allenai/Llama-3.1-8B-Base-RM-RB2 | 84.63 | — | Imported | 2026-05-06 |
| 185 | allenai/Llama-3.1-Tulu-3-8B-SFT-RM-RB2 | 85.51 | — | Imported | 2026-05-06 |
| 186 | allenai/Llama-3.1-Tulu-3-8B-DPO-RM-RB2 | 84.31 | — | Imported | 2026-05-06 |
| 187 | allenai/Llama-3.1-Tulu-3-8B-RL-RM-RB2 | 83.69 | — | Imported | 2026-05-06 |
| 188 | allenai/Llama-3.1-Tulu-3-70B-SFT-RM-RB2 | 88.92 | — | Imported | 2026-05-06 |