PandaBench
Comprehensive LLM safety benchmark for jailbreak attacks, defense mechanisms, judges, and safety-capability tradeoffs, aggregating attack success rates and AlpacaEval capability scores by model and defense method.
490 rows
Primary metric: robustness_score
Sampled: 2026-05-06
Metadata
Metrics
Robustness Score, Mean Attack Success Rate (lower is better), GCG Attack Success Rate (lower is better), PAIR GPT-4o Judge ASR (lower is better), PAIR Qwen Judge ASR (lower is better), PAIR Llama Judge ASR (lower is better), AlpacaEval Win Rate, AlpacaEval LC Win Rate, Aggregated Rows
| Rank | Subject | Robustness Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude-3-5-sonnet + ICL | 98.25 | — | Imported | 2026-05-06 |
| 2 | Claude-3-5-sonnet + SelfReminder | 98.09 | — | Imported | 2026-05-06 |
| 3 | Claude-3-5-haiku + SelfReminder | 97.99 | — | Imported | 2026-05-06 |
| 4 | Claude-3-5-sonnet + GoalPriority | 97.70 | — | Imported | 2026-05-06 |
| 5 | Claude-3-5-sonnet + SelfDefense | 97.65 | — | Imported | 2026-05-06 |
| 6 | Qwen3-30B-A3B + SelfDefense | 96.68 | — | Imported | 2026-05-06 |
| 7 | GPT-4o-11-20 + SelfReminder | 96.67 | — | Imported | 2026-05-06 |
| 8 | Claude-3-5-sonnet + SmoothLLM | 96.39 | — | Imported | 2026-05-06 |
| 9 | Claude-3-5-haiku + GoalPriority | 96.35 | — | Imported | 2026-05-06 |
| 10 | Claude-3-5-haiku + ICL | 96.24 | — | Imported | 2026-05-06 |
| 11 | Qwen3-14B + SelfDefense | 96.18 | — | Imported | 2026-05-06 |
| 12 | GPT-4o-11-20 + ICL | 96.11 | — | Imported | 2026-05-06 |
| 13 | Claude-3-5-haiku + SelfDefense | 96.10 | — | Imported | 2026-05-06 |
| 14 | Llama-3.2-3B + SelfDefense | 96.00 | — | Imported | 2026-05-06 |
| 15 | Claude-3-7-sonnet + SelfDefense | 95.94 | — | Imported | 2026-05-06 |
| 16 | o3-mini + SelfDefense | 95.93 | — | Imported | 2026-05-06 |
| 17 | Qwen3-8B + SelfDefense | 95.80 | — | Imported | 2026-05-06 |
| 18 | Doubao-pro + SelfDefense | 95.77 | — | Imported | 2026-05-06 |
| 19 | Qwen3-30B-A3B + GoalPriority | 95.73 | — | Imported | 2026-05-06 |
| 20 | Claude-3-5-sonnet + Baseline | 95.73 | — | Imported | 2026-05-06 |
| 21 | Qwen3-32B + SelfDefense | 95.68 | — | Imported | 2026-05-06 |
| 22 | GPT-4o-11-20 + SelfDefense | 95.66 | — | Imported | 2026-05-06 |
| 23 | Claude-3-5-sonnet + PerplexityFilter | 95.56 | — | Imported | 2026-05-06 |
| 24 | Qwen2-72B + SelfDefense | 95.49 | — | Imported | 2026-05-06 |
| 25 | Llama-3.1-Tulu-3-70B + SelfDefense | 95.44 | — | Imported | 2026-05-06 |
| 26 | GPT-4o-11-20 + GoalPriority | 95.35 | — | Imported | 2026-05-06 |
| 27 | Doubao-lite + SelfDefense | 95.23 | — | Imported | 2026-05-06 |
| 28 | Claude-3-7-sonnet + SelfReminder | 95.20 | — | Imported | 2026-05-06 |
| 29 | Claude-3-5-haiku + RPO | 95.16 | — | Imported | 2026-05-06 |
| 30 | Doubao-pro + GoalPriority | 95.14 | — | Imported | 2026-05-06 |
| 31 | Qwen3-14B + GoalPriority | 95.13 | — | Imported | 2026-05-06 |
| 32 | Claude-3-5-sonnet + RPO | 95.02 | — | Imported | 2026-05-06 |
| 33 | Doubao-pro + SelfReminder | 95.01 | — | Imported | 2026-05-06 |
| 34 | Qwen3-30B-A3B + SelfReminder | 94.99 | — | Imported | 2026-05-06 |
| 35 | DS-Llama-70b + SelfDefense | 94.91 | — | Imported | 2026-05-06 |
| 36 | Qwen3-8B + GoalPriority | 94.85 | — | Imported | 2026-05-06 |
| 37 | Claude-3-7-sonnet + GoalPriority | 94.76 | — | Imported | 2026-05-06 |
| 38 | Doubao-1.5-pro + SelfReminder | 94.72 | — | Imported | 2026-05-06 |
| 39 | Doubao-lite + GoalPriority | 94.67 | — | Imported | 2026-05-06 |
| 40 | Llama-3.2-1B + SelfDefense | 94.66 | — | Imported | 2026-05-06 |
| 41 | GPT-4o-08-06 + SelfDefense | 94.43 | — | Imported | 2026-05-06 |
| 42 | o3-mini + GoalPriority | 94.31 | — | Imported | 2026-05-06 |
| 43 | Claude-3-7-sonnet + RPO | 94.30 | — | Imported | 2026-05-06 |
| 44 | Doubao-lite + SelfReminder | 94.26 | — | Imported | 2026-05-06 |
| 45 | Qwen3-8B + SelfReminder | 94.05 | — | Imported | 2026-05-06 |
| 46 | Claude-3-7-sonnet + ICL | 93.94 | — | Imported | 2026-05-06 |
| 47 | Qwen2-72B + SelfReminder | 93.90 | — | Imported | 2026-05-06 |
| 48 | o3-mini + SelfReminder | 93.89 | — | Imported | 2026-05-06 |
| 49 | Llama-3.1-Tulu-3-8B + SelfDefense | 93.85 | — | Imported | 2026-05-06 |
| 50 | Qwen3-4B + SelfDefense | 93.81 | — | Imported | 2026-05-06 |
| 51 | DS-Llama-70b + GoalPriority | 93.75 | — | Imported | 2026-05-06 |
| 52 | GPT-4o-mini + SelfDefense | 93.70 | — | Imported | 2026-05-06 |
| 53 | Qwen3-32B + GoalPriority | 93.69 | — | Imported | 2026-05-06 |
| 54 | Qwen3-14B + SelfReminder | 93.66 | — | Imported | 2026-05-06 |
| 55 | Claude-3-5-haiku + SmoothLLM | 93.63 | — | Imported | 2026-05-06 |
| 56 | GPT-4o-08-06 + SelfReminder | 93.61 | — | Imported | 2026-05-06 |
| 57 | Doubao-lite + ICL | 93.57 | — | Imported | 2026-05-06 |
| 58 | Llama-3.2-1B + GoalPriority | 93.57 | — | Imported | 2026-05-06 |
| 59 | Qwen3-32B + SelfReminder | 93.53 | — | Imported | 2026-05-06 |
| 60 | Gemini-2.0-flash + GoalPriority | 93.47 | — | Imported | 2026-05-06 |
| 61 | o3-mini + ICL | 93.44 | — | Imported | 2026-05-06 |
| 62 | Doubao-1.5-pro + SelfDefense | 93.34 | — | Imported | 2026-05-06 |
| 63 | Llama-3.2-3B + GoalPriority | 93.32 | — | Imported | 2026-05-06 |
| 64 | Doubao-1.5-pro + GoalPriority | 93.26 | — | Imported | 2026-05-06 |
| 65 | Claude-3-5-haiku + Baseline | 93.23 | — | Imported | 2026-05-06 |
| 66 | Claude-3-5-haiku + PerplexityFilter | 93.16 | — | Imported | 2026-05-06 |
| 67 | Llama-3.2-3B + SelfReminder | 93.09 | — | Imported | 2026-05-06 |
| 68 | Qwen2-72B + GoalPriority | 92.95 | — | Imported | 2026-05-06 |
| 69 | Llama-3.1-Tulu-3-70B + GoalPriority | 92.92 | — | Imported | 2026-05-06 |
| 70 | Gemini-2.0-flash + SelfDefense | 92.84 | — | Imported | 2026-05-06 |
| 71 | GPT-4o-mini + GoalPriority | 92.83 | — | Imported | 2026-05-06 |
| 72 | Gemma-2-2b-it + SelfDefense | 92.80 | — | Imported | 2026-05-06 |
| 73 | GPT-4o-08-06 + GoalPriority | 92.75 | — | Imported | 2026-05-06 |
| 74 | Doubao-1.5-lite + SelfDefense | 92.69 | — | Imported | 2026-05-06 |
| 75 | DS-v3-0324 + SelfReminder | 92.60 | — | Imported | 2026-05-06 |
| 76 | Qwen3-1.7B + SelfDefense | 92.48 | — | Imported | 2026-05-06 |
| 77 | Llama-3.1-Tulu-3-70B + SelfReminder | 92.39 | — | Imported | 2026-05-06 |
| 78 | Doubao-1.5-lite + SelfReminder | 92.28 | — | Imported | 2026-05-06 |
| 79 | Llama-3-1-405B + SelfDefense | 92.26 | — | Imported | 2026-05-06 |
| 80 | Qwen2.5-14B + SelfDefense | 92.23 | — | Imported | 2026-05-06 |
| 81 | Doubao-pro + ICL | 92.17 | — | Imported | 2026-05-06 |
| 82 | Llama-3.2-3B + RPO | 92.15 | — | Imported | 2026-05-06 |
| 83 | Qwen2.5-72B + SelfDefense | 92.08 | — | Imported | 2026-05-06 |
| 84 | Qwen2.5-32B + SelfDefense | 92.08 | — | Imported | 2026-05-06 |
| 85 | Llama-3-1-405B + SelfReminder | 92.07 | — | Imported | 2026-05-06 |
| 86 | Kimi-latest + SelfDefense | 92.02 | — | Imported | 2026-05-06 |
| 87 | o3-mini + PerplexityFilter | 91.97 | — | Imported | 2026-05-06 |
| 88 | Llama-3.1-8B + SelfDefense | 91.95 | — | Imported | 2026-05-06 |
| 89 | o3-mini + RPO | 91.94 | — | Imported | 2026-05-06 |
| 90 | DS-Llama-70b + ICL | 91.84 | — | Imported | 2026-05-06 |
| 91 | Claude-3-7-sonnet + SmoothLLM | 91.78 | — | Imported | 2026-05-06 |
| 92 | GPT-4o-mini + SelfReminder | 91.77 | — | Imported | 2026-05-06 |
| 93 | Gemini-2.0-flash + SelfReminder | 91.77 | — | Imported | 2026-05-06 |
| 94 | DS-v3-0324 + SelfDefense | 91.72 | — | Imported | 2026-05-06 |
| 95 | Phi-3-mini + SelfDefense | 91.69 | — | Imported | 2026-05-06 |
| 96 | Doubao-pro + PerplexityFilter | 91.68 | — | Imported | 2026-05-06 |
| 97 | Qwen2-7B + SelfDefense | 91.68 | — | Imported | 2026-05-06 |
| 98 | Llama-3.2-1B + RPO | 91.63 | — | Imported | 2026-05-06 |
| 99 | o3-mini + Baseline | 91.60 | — | Imported | 2026-05-06 |
| 100 | Phi-3-5-MoE + SelfDefense | 91.59 | — | Imported | 2026-05-06 |
| 101 | Qwen3-4B + GoalPriority | 91.52 | — | Imported | 2026-05-06 |
| 102 | GPT-4o-11-20 + RPO | 91.52 | — | Imported | 2026-05-06 |
| 103 | Doubao-lite + SmoothLLM | 91.39 | — | Imported | 2026-05-06 |
| 104 | GLM-4-plus + SelfDefense | 91.38 | — | Imported | 2026-05-06 |
| 105 | Doubao-pro + Baseline | 91.38 | — | Imported | 2026-05-06 |
| 106 | Doubao-lite + PerplexityFilter | 91.34 | — | Imported | 2026-05-06 |
| 107 | Llama-3.1-70B + SelfDefense | 91.33 | — | Imported | 2026-05-06 |
| 108 | Qwen3-30B-A3B + ICL | 91.31 | — | Imported | 2026-05-06 |
| 109 | GPT-4o-11-20 + PerplexityFilter | 91.19 | — | Imported | 2026-05-06 |
| 110 | Llama-3.2-3B + ICL | 91.09 | — | Imported | 2026-05-06 |
| 111 | Doubao-1.5-lite + GoalPriority | 91.08 | — | Imported | 2026-05-06 |
| 112 | GPT-4o-mini + ICL | 91.06 | — | Imported | 2026-05-06 |
| 113 | Llama-3.2-1B + SelfReminder | 91.06 | — | Imported | 2026-05-06 |
| 114 | Llama-3.2-3B + PerplexityFilter | 91.02 | — | Imported | 2026-05-06 |
| 115 | Doubao-lite + Baseline | 91.02 | — | Imported | 2026-05-06 |
| 116 | Llama-3-1-405B + GoalPriority | 90.95 | — | Imported | 2026-05-06 |
| 117 | GPT-4o-08-06 + ICL | 90.93 | — | Imported | 2026-05-06 |
| 118 | GPT-4o-11-20 + Baseline | 90.91 | — | Imported | 2026-05-06 |
| 119 | Qwen3-4B + SelfReminder | 90.90 | — | Imported | 2026-05-06 |
| 120 | Llama-3.1-70B + GoalPriority | 90.76 | — | Imported | 2026-05-06 |
| 121 | Qwen2-72B + ICL | 90.74 | — | Imported | 2026-05-06 |
| 122 | Doubao-pro + RPO | 90.55 | — | Imported | 2026-05-06 |
| 123 | Qwen2.5-7B + SelfDefense | 90.51 | — | Imported | 2026-05-06 |
| 124 | GPT-4o-08-06 + RPO | 90.47 | — | Imported | 2026-05-06 |
| 125 | DS-v3-0324 + GoalPriority | 90.33 | — | Imported | 2026-05-06 |
| 126 | o3-mini + SmoothLLM | 90.30 | — | Imported | 2026-05-06 |
| 127 | Llama-3.3-70B + SelfDefense | 90.22 | — | Imported | 2026-05-06 |
| 128 | Claude-3-7-sonnet + Baseline | 90.17 | — | Imported | 2026-05-06 |
| 129 | Qwen2.5-1.5B + SelfDefense | 90.15 | — | Imported | 2026-05-06 |
| 130 | Qwen2-72B + SmoothLLM | 90.15 | — | Imported | 2026-05-06 |
| 131 | Doubao-1.5-lite + ICL | 90.14 | — | Imported | 2026-05-06 |
| 132 | Llama-3.1-Tulu-3-70B + ICL | 89.97 | — | Imported | 2026-05-06 |
| 133 | Gemini-2.0-flash-lite + SelfDefense | 89.95 | — | Imported | 2026-05-06 |
| 134 | DS-r1 + SelfDefense | 89.85 | — | Imported | 2026-05-06 |
| 135 | Qwen2.5-3B + SelfDefense | 89.82 | — | Imported | 2026-05-06 |
| 136 | Claude-3-7-sonnet + PerplexityFilter | 89.78 | — | Imported | 2026-05-06 |
| 137 | Phi-3-5-MoE + ICL | 89.77 | — | Imported | 2026-05-06 |
| 138 | GPT-4o-11-20 + SmoothLLM | 89.72 | — | Imported | 2026-05-06 |
| 139 | Kimi-latest + SelfReminder | 89.66 | — | Imported | 2026-05-06 |
| 140 | Qwen3-14B + ICL | 89.65 | — | Imported | 2026-05-06 |
| 141 | Kimi-latest + GoalPriority | 89.64 | — | Imported | 2026-05-06 |
| 142 | Qwen3-8B + ICL | 89.52 | — | Imported | 2026-05-06 |
| 143 | Gemini-2.0-pro + SelfReminder | 89.51 | — | Imported | 2026-05-06 |
| 144 | Llama-3.1-70B + SelfReminder | 89.47 | — | Imported | 2026-05-06 |
| 145 | GPT-4o-11-20 + Semantic SmoothLLM | 89.35 | — | Imported | 2026-05-06 |
| 146 | Doubao-1.5-pro + ICL | 89.34 | — | Imported | 2026-05-06 |
| 147 | Doubao-pro + SmoothLLM | 89.27 | — | Imported | 2026-05-06 |
| 148 | Llama-3.2-3B + SmoothLLM | 89.24 | — | Imported | 2026-05-06 |
| 149 | DS-v3 + SelfDefense | 89.23 | — | Imported | 2026-05-06 |
| 150 | GLM-4-flash + SelfDefense | 89.20 | — | Imported | 2026-05-06 |
| 151 | Claude-3-5-haiku + Paraphrase | 89.17 | — | Imported | 2026-05-06 |
| 152 | GPT-4o-mini + RPO | 89.13 | — | Imported | 2026-05-06 |
| 153 | Qwen3-30B-A3B + Paraphrase | 88.95 | — | Imported | 2026-05-06 |
| 154 | Kimi-latest + PerplexityFilter | 88.94 | — | Imported | 2026-05-06 |
| 155 | Qwen2-72B + Baseline | 88.92 | — | Imported | 2026-05-06 |
| 156 | Gemini-2.0-pro + SelfDefense | 88.89 | — | Imported | 2026-05-06 |
| 157 | Llama-3.1-8B + SelfReminder | 88.74 | — | Imported | 2026-05-06 |
| 158 | Llama-3.1-Tulu-3-70B + RPO | 88.73 | — | Imported | 2026-05-06 |
| 159 | Phi-3-5-MoE + Baseline | 88.72 | — | Imported | 2026-05-06 |
| 160 | GLM-4-plus + SelfReminder | 88.69 | — | Imported | 2026-05-06 |
| 161 | Llama-3-1-405B + ICL | 88.67 | — | Imported | 2026-05-06 |
| 162 | Llama-3.2-1B + ICL | 88.67 | — | Imported | 2026-05-06 |
| 163 | Qwen2-72B + PerplexityFilter | 88.57 | — | Imported | 2026-05-06 |
| 164 | Phi-3-5-MoE + GoalPriority | 88.55 | — | Imported | 2026-05-06 |
| 165 | Gemini-2.0-pro + GoalPriority | 88.55 | — | Imported | 2026-05-06 |
| 166 | Qwen2.5-72B + GoalPriority | 88.54 | — | Imported | 2026-05-06 |
| 167 | Qwen2.5-72B + SelfReminder | 88.50 | — | Imported | 2026-05-06 |
| 168 | Qwen2-72B + RPO | 88.47 | — | Imported | 2026-05-06 |
| 169 | GPT-4o-mini + PerplexityFilter | 88.47 | — | Imported | 2026-05-06 |
| 170 | Qwen2.5-0.5B + SelfDefense | 88.42 | — | Imported | 2026-05-06 |
| 171 | Phi-3-mini + SelfReminder | 88.38 | — | Imported | 2026-05-06 |
| 172 | GPT-4o-mini + SmoothLLM | 88.37 | — | Imported | 2026-05-06 |
| 173 | Llama-3.1-Tulu-3-8B + GoalPriority | 88.31 | — | Imported | 2026-05-06 |
| 174 | Qwen3-32B + ICL | 88.23 | — | Imported | 2026-05-06 |
| 175 | Qwen3-0.6B + SelfDefense | 88.23 | — | Imported | 2026-05-06 |
| 176 | Phi-3-mini + GoalPriority | 88.21 | — | Imported | 2026-05-06 |
| 177 | Qwen3-8B + Paraphrase | 88.19 | — | Imported | 2026-05-06 |
| 178 | Llama-3.2-1B + SmoothLLM | 88.14 | — | Imported | 2026-05-06 |
| 179 | GLM-4-plus + GoalPriority | 88.12 | — | Imported | 2026-05-06 |
| 180 | Doubao-lite + RPO | 88.08 | — | Imported | 2026-05-06 |
| 181 | GPT-4o-08-06 + SmoothLLM | 88.05 | — | Imported | 2026-05-06 |
| 182 | Llama-3.1-8B + GoalPriority | 88.02 | — | Imported | 2026-05-06 |
| 183 | GPT-4o-08-06 + PerplexityFilter | 87.88 | — | Imported | 2026-05-06 |
| 184 | Qwen3-14B + Paraphrase | 87.69 | — | Imported | 2026-05-06 |
| 185 | Llama-3.2-1B + PerplexityFilter | 87.64 | — | Imported | 2026-05-06 |
| 186 | Doubao-lite + Paraphrase | 87.64 | — | Imported | 2026-05-06 |
| 187 | GPT-4o-mini + Semantic SmoothLLM | 87.58 | — | Imported | 2026-05-06 |
| 188 | Phi-3-5-MoE + Semantic SmoothLLM | 87.53 | — | Imported | 2026-05-06 |
| 189 | GPT-4o-mini + Baseline | 87.52 | — | Imported | 2026-05-06 |
| 190 | Llama-3.1-Tulu-3-8B + SelfReminder | 87.39 | — | Imported | 2026-05-06 |
| 191 | GPT-4o-08-06 + Baseline | 87.38 | — | Imported | 2026-05-06 |
| 192 | Qwen3-4B + Paraphrase | 87.27 | — | Imported | 2026-05-06 |
| 193 | Doubao-1.5-pro + Baseline | 87.24 | — | Imported | 2026-05-06 |
| 194 | Phi-3-5-MoE + PerplexityFilter | 87.22 | — | Imported | 2026-05-06 |
| 195 | GPT-4o-08-06 + Semantic SmoothLLM | 87.14 | — | Imported | 2026-05-06 |
| 196 | Llama-3.1-Tulu-3-70B + Baseline | 87.14 | — | Imported | 2026-05-06 |
| 197 | DS-2-1212 + SelfDefense | 87.14 | — | Imported | 2026-05-06 |
| 198 | Claude-3-5-sonnet + Paraphrase | 87.13 | — | Imported | 2026-05-06 |
| 199 | DS-v3-0324 + ICL | 87.11 | — | Imported | 2026-05-06 |
| 200 | Phi-3-5-MoE + SelfReminder | 87.10 | — | Imported | 2026-05-06 |
| 201 | Phi-3-5-MoE + RPO | 87.07 | — | Imported | 2026-05-06 |
| 202 | Doubao-1.5-pro + PerplexityFilter | 87.06 | — | Imported | 2026-05-06 |
| 203 | Llama-3.1-Tulu-3-70B + PerplexityFilter | 87.03 | — | Imported | 2026-05-06 |
| 204 | Doubao-1.5-lite + PerplexityFilter | 87.00 | — | Imported | 2026-05-06 |
| 205 | Qwen2.5-72B + ICL | 86.95 | — | Imported | 2026-05-06 |
| 206 | Qwen3-32B + Paraphrase | 86.94 | — | Imported | 2026-05-06 |
| 207 | Qwen3-30B-A3B + SmoothLLM | 86.92 | — | Imported | 2026-05-06 |
| 208 | Llama-3.2-3B + Baseline | 86.90 | — | Imported | 2026-05-06 |
| 209 | Gemini-2.0-flash-lite + GoalPriority | 86.83 | — | Imported | 2026-05-06 |
| 210 | Qwen2.5-14B + GoalPriority | 86.67 | — | Imported | 2026-05-06 |
| 211 | Doubao-1.5-lite + Baseline | 86.60 | — | Imported | 2026-05-06 |
| 212 | Qwen2.5-72B + SmoothLLM | 86.40 | — | Imported | 2026-05-06 |
| 213 | Doubao-1.5-pro + RPO | 86.36 | — | Imported | 2026-05-06 |
| 214 | Phi-3-mini + ICL | 86.36 | — | Imported | 2026-05-06 |
| 215 | Qwen3-30B-A3B + RPO | 86.30 | — | Imported | 2026-05-06 |
| 216 | Phi-3-5-MoE + SmoothLLM | 86.19 | — | Imported | 2026-05-06 |
| 217 | Llama-3.1-Tulu-3-70B + Paraphrase | 86.03 | — | Imported | 2026-05-06 |
| 218 | DS-v3 + GoalPriority | 86.02 | — | Imported | 2026-05-06 |
| 219 | Qwen3-8B + Semantic SmoothLLM | 85.99 | — | Imported | 2026-05-06 |
| 220 | Llama-3.2-1B + Baseline | 85.81 | — | Imported | 2026-05-06 |
| 221 | Doubao-1.5-lite + RPO | 85.80 | — | Imported | 2026-05-06 |
| 222 | DS-Llama-70b + SelfReminder | 85.72 | — | Imported | 2026-05-06 |
| 223 | Qwen2.5-14B + SelfReminder | 85.67 | — | Imported | 2026-05-06 |
| 224 | Gemma-2-2b-it + SmoothLLM | 85.52 | — | Imported | 2026-05-06 |
| 225 | Qwen2.5-32B + SelfReminder | 85.51 | — | Imported | 2026-05-06 |
| 226 | Qwen3-1.7B + Paraphrase | 85.48 | — | Imported | 2026-05-06 |
| 227 | Qwen2.5-72B + PerplexityFilter | 85.48 | — | Imported | 2026-05-06 |
| 228 | Llama-3.1-Tulu-3-8B + RPO | 85.43 | — | Imported | 2026-05-06 |
| 229 | Qwen2.5-32B + GoalPriority | 85.42 | — | Imported | 2026-05-06 |
| 230 | Qwen3-30B-A3B + Semantic SmoothLLM | 85.20 | — | Imported | 2026-05-06 |
| 231 | Qwen2.5-72B + Baseline | 85.18 | — | Imported | 2026-05-06 |
| 232 | Qwen2.5-72B + RPO | 85.17 | — | Imported | 2026-05-06 |
| 233 | Qwen2.5-14B + SmoothLLM | 85.10 | — | Imported | 2026-05-06 |
| 234 | Phi-3-mini + Baseline | 85.04 | — | Imported | 2026-05-06 |
| 235 | Phi-3-mini + Paraphrase | 85.02 | — | Imported | 2026-05-06 |
| 236 | Llama-3.1-Tulu-3-70B + SmoothLLM | 85.00 | — | Imported | 2026-05-06 |
| 237 | Qwen3-4B + ICL | 84.92 | — | Imported | 2026-05-06 |
| 238 | Claude-3-7-sonnet + Paraphrase | 84.90 | — | Imported | 2026-05-06 |
| 239 | Kimi-latest + ICL | 84.82 | — | Imported | 2026-05-06 |
| 240 | Phi-3-mini + SmoothLLM | 84.77 | — | Imported | 2026-05-06 |
| 241 | Llama-3.1-70B + RPO | 84.74 | — | Imported | 2026-05-06 |
| 242 | Phi-3-mini + RPO | 84.61 | — | Imported | 2026-05-06 |
| 243 | Qwen2.5-14B + RPO | 84.58 | — | Imported | 2026-05-06 |
| 244 | Qwen3-32B + SmoothLLM | 84.52 | — | Imported | 2026-05-06 |
| 245 | Doubao-1.5-pro + SmoothLLM | 84.52 | — | Imported | 2026-05-06 |
| 246 | Qwen3-30B-A3B + Baseline | 84.44 | — | Imported | 2026-05-06 |
| 247 | Phi-3-mini + PerplexityFilter | 84.44 | — | Imported | 2026-05-06 |
| 248 | Qwen3-30B-A3B + PerplexityFilter | 84.39 | — | Imported | 2026-05-06 |
| 249 | Llama-3.1-Tulu-3-8B + Semantic SmoothLLM | 84.24 | — | Imported | 2026-05-06 |
| 250 | Qwen2.5-1.5B + SelfReminder | 84.23 | — | Imported | 2026-05-06 |
| 251 | Phi-3-mini + Semantic SmoothLLM | 84.23 | — | Imported | 2026-05-06 |
| 252 | Qwen2.5-14B + PerplexityFilter | 84.21 | — | Imported | 2026-05-06 |
| 253 | Qwen2.5-14B + ICL | 84.20 | — | Imported | 2026-05-06 |
| 254 | Llama-3-1-405B + PerplexityFilter | 84.17 | — | Imported | 2026-05-06 |
| 255 | Llama-3.1-Tulu-3-8B + ICL | 84.17 | — | Imported | 2026-05-06 |
| 256 | Qwen3-14B + SmoothLLM | 84.17 | — | Imported | 2026-05-06 |
| 257 | Llama-3.1-Tulu-3-8B + Paraphrase | 84.16 | — | Imported | 2026-05-06 |
| 258 | GLM-4-plus + ICL | 84.15 | — | Imported | 2026-05-06 |
| 259 | Qwen2-72B + Semantic SmoothLLM | 84.14 | — | Imported | 2026-05-06 |
| 260 | Llama-3.1-70B + ICL | 84.10 | — | Imported | 2026-05-06 |
| 261 | GLM-4-plus + RPO | 84.02 | — | Imported | 2026-05-06 |
| 262 | DS-v3 + SelfReminder | 83.98 | — | Imported | 2026-05-06 |
| 263 | Qwen2.5-1.5B + Paraphrase | 83.88 | — | Imported | 2026-05-06 |
| 264 | Qwen3-14B + PerplexityFilter | 83.86 | — | Imported | 2026-05-06 |
| 265 | Gemini-2.0-flash-lite + SelfReminder | 83.81 | — | Imported | 2026-05-06 |
| 266 | DS-Llama-70b + PerplexityFilter | 83.78 | — | Imported | 2026-05-06 |
| 267 | Qwen2.5-14B + Baseline | 83.77 | — | Imported | 2026-05-06 |
| 268 | Gemma-2-2b-it + GoalPriority | 83.74 | — | Imported | 2026-05-06 |
| 269 | DS-Llama-70b + Paraphrase | 83.67 | — | Imported | 2026-05-06 |
| 270 | Llama-3.1-Tulu-3-8B + PerplexityFilter | 83.67 | — | Imported | 2026-05-06 |
| 271 | DS-r1 + GoalPriority | 83.65 | — | Imported | 2026-05-06 |
| 272 | Qwen3-14B + RPO | 83.59 | — | Imported | 2026-05-06 |
| 273 | Qwen2-7B + GoalPriority | 83.50 | — | Imported | 2026-05-06 |
| 274 | Qwen2.5-72B + Semantic SmoothLLM | 83.48 | — | Imported | 2026-05-06 |
| 275 | Qwen2.5-1.5B + RPO | 83.45 | — | Imported | 2026-05-06 |
| 276 | Qwen2.5-32B + SmoothLLM | 83.41 | — | Imported | 2026-05-06 |
| 277 | Claude-3-5-haiku + Semantic SmoothLLM | 83.35 | — | Imported | 2026-05-06 |
| 278 | Gemma-2-2b-it + SelfReminder | 83.31 | — | Imported | 2026-05-06 |
| 279 | GLM-4-plus + PerplexityFilter | 83.30 | — | Imported | 2026-05-06 |
| 280 | Kimi-latest + Baseline | 83.30 | — | Imported | 2026-05-06 |
| 281 | Kimi-latest + SmoothLLM | 83.22 | — | Imported | 2026-05-06 |
| 282 | Qwen2.5-32B + ICL | 83.20 | — | Imported | 2026-05-06 |
| 283 | Qwen3-32B + RPO | 83.18 | — | Imported | 2026-05-06 |
| 284 | Claude-3-5-sonnet + Semantic SmoothLLM | 83.14 | — | Imported | 2026-05-06 |
| 285 | Qwen2.5-1.5B + SmoothLLM | 83.09 | — | Imported | 2026-05-06 |
| 286 | DS-Llama-70b + RPO | 83.05 | — | Imported | 2026-05-06 |
| 287 | Qwen2.5-1.5B + GoalPriority | 83.03 | — | Imported | 2026-05-06 |
| 288 | Qwen2.5-1.5B + ICL | 83.03 | — | Imported | 2026-05-06 |
| 289 | Llama-3.1-Tulu-3-8B + Baseline | 83.00 | — | Imported | 2026-05-06 |
| 290 | Kimi-latest + RPO | 82.99 | — | Imported | 2026-05-06 |
| 291 | Llama-3.1-Tulu-3-8B + SmoothLLM | 82.97 | — | Imported | 2026-05-06 |
| 292 | Doubao-1.5-lite + SmoothLLM | 82.95 | — | Imported | 2026-05-06 |
| 293 | DS-v3-0324 + RPO | 82.93 | — | Imported | 2026-05-06 |
| 294 | Qwen2.5-1.5B + Semantic SmoothLLM | 82.88 | — | Imported | 2026-05-06 |
| 295 | Llama-3.3-70B + ICL | 82.86 | — | Imported | 2026-05-06 |
| 296 | o3-mini + Paraphrase | 82.85 | — | Imported | 2026-05-06 |
| 297 | GLM-4-plus + Baseline | 82.80 | — | Imported | 2026-05-06 |
| 298 | Gemma-2-2b-it + RPO | 82.76 | — | Imported | 2026-05-06 |
| 299 | Llama-3-1-405B + SmoothLLM | 82.69 | — | Imported | 2026-05-06 |
| 300 | Qwen2.5-0.5B + ICL | 82.63 | — | Imported | 2026-05-06 |
| 301 | GLM-4-flash + GoalPriority | 82.61 | — | Imported | 2026-05-06 |
| 302 | Qwen3-0.6B + Paraphrase | 82.56 | — | Imported | 2026-05-06 |
| 303 | Llama-3-1-405B + RPO | 82.49 | — | Imported | 2026-05-06 |
| 304 | Llama-3.1-70B + SmoothLLM | 82.48 | — | Imported | 2026-05-06 |
| 305 | Llama-3.1-8B + ICL | 82.42 | — | Imported | 2026-05-06 |
| 306 | Qwen3-8B + SmoothLLM | 82.34 | — | Imported | 2026-05-06 |
| 307 | Qwen2.5-32B + RPO | 82.32 | — | Imported | 2026-05-06 |
| 308 | GPT-4o-11-20 + Paraphrase | 82.27 | — | Imported | 2026-05-06 |
| 309 | Qwen3-14B + Baseline | 82.27 | — | Imported | 2026-05-06 |
| 310 | DS-r1 + SelfReminder | 82.16 | — | Imported | 2026-05-06 |
| 311 | Qwen2-7B + SelfReminder | 82.14 | — | Imported | 2026-05-06 |
| 312 | Qwen3-1.7B + GoalPriority | 82.13 | — | Imported | 2026-05-06 |
| 313 | Qwen3-14B + Semantic SmoothLLM | 82.13 | — | Imported | 2026-05-06 |
| 314 | DS-Llama-70b + SmoothLLM | 82.11 | — | Imported | 2026-05-06 |
| 315 | Doubao-pro + Paraphrase | 82.06 | — | Imported | 2026-05-06 |
| 316 | Gemini-2.0-flash + Paraphrase | 82.05 | — | Imported | 2026-05-06 |
| 317 | Qwen2.5-0.5B + Paraphrase | 81.93 | — | Imported | 2026-05-06 |
| 318 | Qwen2.5-1.5B + PerplexityFilter | 81.86 | — | Imported | 2026-05-06 |
| 319 | Qwen2.5-14B + Semantic SmoothLLM | 81.74 | — | Imported | 2026-05-06 |
| 320 | Llama-3.1-Tulu-3-70B + Semantic SmoothLLM | 81.73 | — | Imported | 2026-05-06 |
| 321 | Qwen3-8B + PerplexityFilter | 81.71 | — | Imported | 2026-05-06 |
| 322 | Llama-3.1-8B + RPO | 81.70 | — | Imported | 2026-05-06 |
| 323 | DS-2-1212 + GoalPriority | 81.67 | — | Imported | 2026-05-06 |
| 324 | Qwen3-8B + RPO | 81.57 | — | Imported | 2026-05-06 |
| 325 | Llama-3.3-70B + SelfReminder | 81.56 | — | Imported | 2026-05-06 |
| 326 | Gemma-2-2b-it + ICL | 81.49 | — | Imported | 2026-05-06 |
| 327 | Qwen3-32B + Semantic SmoothLLM | 81.47 | — | Imported | 2026-05-06 |
| 328 | Qwen2.5-32B + Semantic SmoothLLM | 81.42 | — | Imported | 2026-05-06 |
| 329 | Qwen2.5-32B + PerplexityFilter | 81.38 | — | Imported | 2026-05-06 |
| 330 | Gemma-2-2b-it + Semantic SmoothLLM | 81.32 | — | Imported | 2026-05-06 |
| 331 | Qwen2-7B + PerplexityFilter | 81.29 | — | Imported | 2026-05-06 |
| 332 | Qwen2.5-1.5B + Baseline | 81.23 | — | Imported | 2026-05-06 |
| 333 | DS-v3-0324 + SmoothLLM | 81.23 | — | Imported | 2026-05-06 |
| 334 | Gemini-2.0-pro + ICL | 81.23 | — | Imported | 2026-05-06 |
| 335 | Qwen3-8B + Baseline | 81.20 | — | Imported | 2026-05-06 |
| 336 | Qwen2-7B + Baseline | 81.20 | — | Imported | 2026-05-06 |
| 337 | DS-2-1212 + SelfReminder | 81.19 | — | Imported | 2026-05-06 |
| 338 | DS-Llama-70b + Baseline | 81.16 | — | Imported | 2026-05-06 |
| 339 | Qwen2-7B + RPO | 81.10 | — | Imported | 2026-05-06 |
| 340 | Qwen2.5-32B + Baseline | 81.02 | — | Imported | 2026-05-06 |
| 341 | Doubao-1.5-pro + Paraphrase | 80.97 | — | Imported | 2026-05-06 |
| 342 | GLM-4-plus + SmoothLLM | 80.92 | — | Imported | 2026-05-06 |
| 343 | Qwen3-4B + Semantic SmoothLLM | 80.88 | — | Imported | 2026-05-06 |
| 344 | Gemma-2-2b-it + PerplexityFilter | 80.69 | — | Imported | 2026-05-06 |
| 345 | Claude-3-7-sonnet + Semantic SmoothLLM | 80.67 | — | Imported | 2026-05-06 |
| 346 | DS-Llama-70b + Semantic SmoothLLM | 80.66 | — | Imported | 2026-05-06 |
| 347 | Qwen2-7B + Semantic SmoothLLM | 80.64 | — | Imported | 2026-05-06 |
| 348 | DS-v3-0324 + Baseline | 80.60 | — | Imported | 2026-05-06 |
| 349 | Llama-3.3-70B + SmoothLLM | 80.60 | — | Imported | 2026-05-06 |
| 350 | Qwen2-7B + ICL | 80.60 | — | Imported | 2026-05-06 |
| 351 | Phi-3-5-MoE + Paraphrase | 80.58 | — | Imported | 2026-05-06 |
| 352 | Qwen2.5-3B + SmoothLLM | 80.57 | — | Imported | 2026-05-06 |
| 353 | Doubao-1.5-lite + Paraphrase | 80.50 | — | Imported | 2026-05-06 |
| 354 | Kimi-latest + Semantic SmoothLLM | 80.48 | — | Imported | 2026-05-06 |
| 355 | Llama-3.3-70B + GoalPriority | 80.41 | — | Imported | 2026-05-06 |
| 356 | Llama-3.1-8B + SmoothLLM | 80.33 | — | Imported | 2026-05-06 |
| 357 | Gemma-2-2b-it + Paraphrase | 80.32 | — | Imported | 2026-05-06 |
| 358 | GPT-4o-08-06 + Paraphrase | 80.32 | — | Imported | 2026-05-06 |
| 359 | Qwen3-32B + PerplexityFilter | 80.28 | — | Imported | 2026-05-06 |
| 360 | Gemma-2-2b-it + Baseline | 80.27 | — | Imported | 2026-05-06 |
| 361 | Qwen2.5-3B + ICL | 80.25 | — | Imported | 2026-05-06 |
| 362 | Qwen2.5-7B + Semantic SmoothLLM | 80.22 | — | Imported | 2026-05-06 |
| 363 | Qwen2.5-3B + Semantic SmoothLLM | 80.20 | — | Imported | 2026-05-06 |
| 364 | Qwen2.5-7B + SelfReminder | 80.20 | — | Imported | 2026-05-06 |
| 365 | Qwen2-7B + SmoothLLM | 80.20 | — | Imported | 2026-05-06 |
| 366 | Qwen2.5-3B + RPO | 80.18 | — | Imported | 2026-05-06 |
| 367 | Qwen3-32B + Baseline | 80.17 | — | Imported | 2026-05-06 |
| 368 | Qwen2.5-7B + SmoothLLM | 80.10 | — | Imported | 2026-05-06 |
| 369 | Qwen2.5-7B + RPO | 80.03 | — | Imported | 2026-05-06 |
| 370 | Llama-3.1-8B + PerplexityFilter | 79.99 | — | Imported | 2026-05-06 |
| 371 | Gemini-2.0-flash + Baseline | 79.94 | — | Imported | 2026-05-06 |
| 372 | Qwen2-72B + Paraphrase | 79.78 | — | Imported | 2026-05-06 |
| 373 | Qwen2.5-0.5B + SelfReminder | 79.77 | — | Imported | 2026-05-06 |
| 374 | Llama-3.1-70B + PerplexityFilter | 79.76 | — | Imported | 2026-05-06 |
| 375 | Kimi-latest + Paraphrase | 79.76 | — | Imported | 2026-05-06 |
| 376 | Qwen3-1.7B + SelfReminder | 79.75 | — | Imported | 2026-05-06 |
| 377 | Doubao-lite + Semantic SmoothLLM | 79.75 | — | Imported | 2026-05-06 |
| 378 | Llama-3-1-405B + Baseline | 79.72 | — | Imported | 2026-05-06 |
| 379 | Qwen2.5-7B + GoalPriority | 79.65 | — | Imported | 2026-05-06 |
| 380 | Doubao-1.5-lite + Semantic SmoothLLM | 79.64 | — | Imported | 2026-05-06 |
| 381 | Gemini-2.0-pro + Baseline | 79.58 | — | Imported | 2026-05-06 |
| 382 | Gemini-2.0-flash-lite + Paraphrase | 79.55 | — | Imported | 2026-05-06 |
| 383 | Llama-3.3-70B + RPO | 79.50 | — | Imported | 2026-05-06 |
| 384 | Qwen2-7B + Paraphrase | 79.48 | — | Imported | 2026-05-06 |
| 385 | Qwen3-4B + SmoothLLM | 79.41 | — | Imported | 2026-05-06 |
| 386 | Qwen2.5-3B + Paraphrase | 79.39 | — | Imported | 2026-05-06 |
| 387 | GLM-4-flash + SelfReminder | 79.27 | — | Imported | 2026-05-06 |
| 388 | DS-r1 + ICL | 79.26 | — | Imported | 2026-05-06 |
| 389 | Doubao-1.5-pro + Semantic SmoothLLM | 79.24 | — | Imported | 2026-05-06 |
| 390 | Qwen2.5-72B + Paraphrase | 79.18 | — | Imported | 2026-05-06 |
| 391 | o3-mini + Semantic SmoothLLM | 79.17 | — | Imported | 2026-05-06 |
| 392 | Doubao-pro + Semantic SmoothLLM | 79.07 | — | Imported | 2026-05-06 |
| 393 | Qwen2.5-7B + ICL | 79.03 | — | Imported | 2026-05-06 |
| 394 | Llama-3.1-70B + Baseline | 79.02 | — | Imported | 2026-05-06 |
| 395 | Qwen2.5-7B + PerplexityFilter | 78.99 | — | Imported | 2026-05-06 |
| 396 | DS-v3 + ICL | 78.97 | — | Imported | 2026-05-06 |
| 397 | Gemini-2.0-flash + RPO | 78.97 | — | Imported | 2026-05-06 |
| 398 | Qwen2.5-14B + Paraphrase | 78.95 | — | Imported | 2026-05-06 |
| 399 | Gemini-2.0-flash + SmoothLLM | 78.90 | — | Imported | 2026-05-06 |
| 400 | Qwen2.5-3B + SelfReminder | 78.76 | — | Imported | 2026-05-06 |
| 401 | Qwen2.5-0.5B + PerplexityFilter | 78.69 | — | Imported | 2026-05-06 |
| 402 | DS-r1 + Paraphrase | 78.59 | — | Imported | 2026-05-06 |
| 403 | Llama-3.2-3B + Paraphrase | 78.57 | — | Imported | 2026-05-06 |
| 404 | Gemini-2.0-flash + ICL | 78.51 | — | Imported | 2026-05-06 |
| 405 | Qwen3-1.7B + ICL | 78.50 | — | Imported | 2026-05-06 |
| 406 | Qwen2.5-0.5B + RPO | 78.36 | — | Imported | 2026-05-06 |
| 407 | Qwen2.5-7B + Baseline | 78.31 | — | Imported | 2026-05-06 |
| 408 | Qwen2.5-7B + Paraphrase | 78.31 | — | Imported | 2026-05-06 |
| 409 | Qwen2.5-32B + Paraphrase | 78.18 | — | Imported | 2026-05-06 |
| 410 | Qwen3-1.7B + Semantic SmoothLLM | 78.11 | — | Imported | 2026-05-06 |
| 411 | DS-v3-0324 + Paraphrase | 78.09 | — | Imported | 2026-05-06 |
| 412 | Qwen3-4B + RPO | 78.06 | — | Imported | 2026-05-06 |
| 413 | Llama-3.1-8B + Semantic SmoothLLM | 78.05 | — | Imported | 2026-05-06 |
| 414 | Gemini-2.0-pro + RPO | 78.00 | — | Imported | 2026-05-06 |
| 415 | Gemini-2.0-flash + Semantic SmoothLLM | 78.00 | — | Imported | 2026-05-06 |
| 416 | Qwen2.5-3B + PerplexityFilter | 77.94 | — | Imported | 2026-05-06 |
| 417 | Qwen2.5-0.5B + SmoothLLM | 77.93 | — | Imported | 2026-05-06 |
| 418 | Qwen2.5-0.5B + Semantic SmoothLLM | 77.90 | — | Imported | 2026-05-06 |
| 419 | GPT-4o-mini + Paraphrase | 77.83 | — | Imported | 2026-05-06 |
| 420 | Llama-3.1-8B + Baseline | 77.80 | — | Imported | 2026-05-06 |
| 421 | Gemini-2.0-pro + Paraphrase | 77.77 | — | Imported | 2026-05-06 |
| 422 | Llama-3.2-1B + Paraphrase | 77.68 | — | Imported | 2026-05-06 |
| 423 | Qwen2.5-0.5B + GoalPriority | 77.66 | — | Imported | 2026-05-06 |
| 424 | Llama-3.1-8B + Paraphrase | 77.65 | — | Imported | 2026-05-06 |
| 425 | Gemini-2.0-pro + SmoothLLM | 77.63 | — | Imported | 2026-05-06 |
| 426 | Qwen2.5-3B + Baseline | 77.45 | — | Imported | 2026-05-06 |
| 427 | Qwen2.5-0.5B + Baseline | 77.44 | — | Imported | 2026-05-06 |
| 428 | DS-v3-0324 + Semantic SmoothLLM | 77.24 | — | Imported | 2026-05-06 |
| 429 | Qwen2.5-3B + GoalPriority | 77.14 | — | Imported | 2026-05-06 |
| 430 | Gemini-2.0-flash + PerplexityFilter | 77.13 | — | Imported | 2026-05-06 |
| 431 | DS-v3 + RPO | 77.10 | — | Imported | 2026-05-06 |
| 432 | Qwen3-4B + Baseline | 76.83 | — | Imported | 2026-05-06 |
| 433 | Qwen3-0.6B + Semantic SmoothLLM | 76.81 | — | Imported | 2026-05-06 |
| 434 | Qwen3-0.6B + SelfReminder | 76.74 | — | Imported | 2026-05-06 |
| 435 | Llama-3.2-3B + Semantic SmoothLLM | 76.56 | — | Imported | 2026-05-06 |
| 436 | Gemini-2.0-pro + Semantic SmoothLLM | 76.54 | — | Imported | 2026-05-06 |
| 437 | GLM-4-flash + Paraphrase | 76.51 | — | Imported | 2026-05-06 |
| 438 | Llama-3.2-1B + Semantic SmoothLLM | 76.49 | — | Imported | 2026-05-06 |
| 439 | GLM-4-flash + PerplexityFilter | 76.43 | — | Imported | 2026-05-06 |
| 440 | GLM-4-flash + ICL | 76.33 | — | Imported | 2026-05-06 |
| 441 | Gemini-2.0-flash-lite + Baseline | 76.25 | — | Imported | 2026-05-06 |
| 442 | Qwen3-0.6B + GoalPriority | 76.15 | — | Imported | 2026-05-06 |
| 443 | Gemini-2.0-flash-lite + RPO | 76.14 | — | Imported | 2026-05-06 |
| 444 | GLM-4-plus + Paraphrase | 76.13 | — | Imported | 2026-05-06 |
| 445 | Llama-3-1-405B + Semantic SmoothLLM | 76.10 | — | Imported | 2026-05-06 |
| 446 | Gemini-2.0-flash-lite + Semantic SmoothLLM | 76.08 | — | Imported | 2026-05-06 |
| 447 | Qwen3-1.7B + SmoothLLM | 76.07 | — | Imported | 2026-05-06 |
| 448 | DS-r1 + Semantic SmoothLLM | 76.02 | — | Imported | 2026-05-06 |
| 449 | Llama-3.3-70B + PerplexityFilter | 75.97 | — | Imported | 2026-05-06 |
| 450 | GLM-4-flash + RPO | 75.95 | — | Imported | 2026-05-06 |
| 451 | DS-v3 + Semantic SmoothLLM | 75.84 | — | Imported | 2026-05-06 |
| 452 | Qwen3-0.6B + ICL | 75.78 | — | Imported | 2026-05-06 |
| 453 | GLM-4-flash + Baseline | 75.73 | — | Imported | 2026-05-06 |
| 454 | Llama-3.3-70B + Baseline | 75.64 | — | Imported | 2026-05-06 |
| 455 | Qwen3-4B + PerplexityFilter | 75.61 | — | Imported | 2026-05-06 |
| 456 | GLM-4-flash + Semantic SmoothLLM | 75.58 | — | Imported | 2026-05-06 |
| 457 | Llama-3-1-405B + Paraphrase | 75.56 | — | Imported | 2026-05-06 |
| 458 | DS-v3 + Paraphrase | 75.44 | — | Imported | 2026-05-06 |
| 459 | GLM-4-plus + Semantic SmoothLLM | 75.42 | — | Imported | 2026-05-06 |
| 460 | Llama-3.3-70B + Paraphrase | 75.40 | — | Imported | 2026-05-06 |
| 461 | DS-v3 + SmoothLLM | 75.39 | — | Imported | 2026-05-06 |
| 462 | DS-v3 + PerplexityFilter | 75.36 | — | Imported | 2026-05-06 |
| 463 | DS-r1 + RPO | 75.25 | — | Imported | 2026-05-06 |
| 464 | GLM-4-flash + SmoothLLM | 75.21 | — | Imported | 2026-05-06 |
| 465 | DS-2-1212 + Paraphrase | 75.19 | — | Imported | 2026-05-06 |
| 466 | Gemini-2.0-flash-lite + SmoothLLM | 75.11 | — | Imported | 2026-05-06 |
| 467 | Llama-3.1-70B + Paraphrase | 75.05 | — | Imported | 2026-05-06 |
| 468 | Llama-3.3-70B + Semantic SmoothLLM | 74.94 | — | Imported | 2026-05-06 |
| 469 | Gemini-2.0-flash-lite + PerplexityFilter | 74.75 | — | Imported | 2026-05-06 |
| 470 | DS-r1 + SmoothLLM | 74.56 | — | Imported | 2026-05-06 |
| 471 | Gemini-2.0-flash-lite + ICL | 74.55 | — | Imported | 2026-05-06 |
| 472 | DS-v3 + Baseline | 74.44 | — | Imported | 2026-05-06 |
| 473 | Qwen3-0.6B + SmoothLLM | 74.39 | — | Imported | 2026-05-06 |
| 474 | Llama-3.1-70B + Semantic SmoothLLM | 74.03 | — | Imported | 2026-05-06 |
| 475 | DS-v3-0324 + PerplexityFilter | 73.25 | — | Imported | 2026-05-06 |
| 476 | DS-r1 + PerplexityFilter | 73.08 | — | Imported | 2026-05-06 |
| 477 | Qwen3-1.7B + RPO | 72.73 | — | Imported | 2026-05-06 |
| 478 | DS-2-1212 + Semantic SmoothLLM | 72.68 | — | Imported | 2026-05-06 |
| 479 | DS-r1 + Baseline | 72.48 | — | Imported | 2026-05-06 |
| 480 | Qwen3-0.6B + PerplexityFilter | 72.00 | — | Imported | 2026-05-06 |
| 481 | Qwen3-1.7B + PerplexityFilter | 71.36 | — | Imported | 2026-05-06 |
| 482 | Qwen3-1.7B + Baseline | 70.91 | — | Imported | 2026-05-06 |
| 483 | Qwen3-0.6B + RPO | 70.65 | — | Imported | 2026-05-06 |
| 484 | Qwen3-0.6B + Baseline | 70.42 | — | Imported | 2026-05-06 |
| 485 | DS-2-1212 + RPO | 70.22 | — | Imported | 2026-05-06 |
| 486 | DS-2-1212 + SmoothLLM | 68.32 | — | Imported | 2026-05-06 |
| 487 | DS-2-1212 + ICL | 67.80 | — | Imported | 2026-05-06 |
| 488 | DS-2-1212 + PerplexityFilter | 67.72 | — | Imported | 2026-05-06 |
| 489 | DS-2-1212 + Baseline | 66.36 | — | Imported | 2026-05-06 |
| 490 | Gemini-2.0-pro + PerplexityFilter | 65.00 | — | Imported | 2026-05-06 |
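
Each Subject in the table pairs a model with a defense method, so one way to read it is to compare each defended row with the same model's Baseline row. The following is a minimal Python sketch of that comparison, using a few scores copied from the table. The helper names `split_subject` and `gains_over_baseline` are hypothetical, and the page does not state how the robustness score itself is computed, so the sketch only subtracts the published values.

```python
from collections import defaultdict

# A handful of (subject, robustness_score) pairs copied from the table.
ROWS = [
    ("Claude-3-5-sonnet + ICL", 98.25),
    ("Claude-3-5-sonnet + Baseline", 95.73),
    ("GPT-4o-11-20 + SelfReminder", 96.67),
    ("GPT-4o-11-20 + Semantic SmoothLLM", 89.35),
    ("GPT-4o-11-20 + Baseline", 90.91),
    ("DS-2-1212 + SelfDefense", 87.14),
    ("DS-2-1212 + Baseline", 66.36),
]


def split_subject(subject):
    """Split 'Model + Defense' at the first ' + ' separator (hypothetical helper)."""
    model, _, defense = subject.partition(" + ")
    return model, defense


def gains_over_baseline(rows):
    """Return {(model, defense): score minus that model's Baseline score}."""
    by_model = defaultdict(dict)
    for subject, score in rows:
        model, defense = split_subject(subject)
        by_model[model][defense] = score

    gains = {}
    for model, scores in by_model.items():
        base = scores.get("Baseline")
        if base is None:
            continue  # no undefended row to compare against
        for defense, score in scores.items():
            if defense != "Baseline":
                gains[(model, defense)] = round(score - base, 2)
    return gains


if __name__ == "__main__":
    # List the largest improvements over Baseline first.
    for (model, defense), gain in sorted(
        gains_over_baseline(ROWS).items(), key=lambda kv: -kv[1]
    ):
        print(f"{model:20s} {defense:20s} {gain:+.2f}")
```

On these sample rows a defense can lower the score as well as raise it: GPT-4o-11-20 with Semantic SmoothLLM sits below its own Baseline row, while DS-2-1212 gains the most from SelfDefense.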