IFEval
IFEval evaluates instruction following using prompts that contain verifiable instructions, reporting strict and loose accuracy at both the prompt level and the instruction level.
Rows: 33
Primary metric: final_score
Sampled: 2026-05-28
Metadata
Metrics
Final Score, Strict Prompt Score, Strict Inst Score, Loose Prompt Score, Loose Inst Score
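As a rough illustration of how the four component metrics relate, the sketch below computes prompt-level and instruction-level accuracy from per-instruction pass/fail results, and contrasts a strict check with a loose one. It is a minimal sketch, not the benchmark's reference implementation: the function names and the example checker are hypothetical, and the loose variants (dropping the first line, dropping the last line, removing `*` characters) only approximate the relaxations typically described for IFEval. How Final Score is derived from the four component scores is not stated here, so it is not computed.

```python
from itertools import product
from typing import Callable, List


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts in which every instruction was followed."""
    return sum(all(r) for r in results) / len(results)


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    flat = [ok for r in results for ok in r]
    return sum(flat) / len(flat)


def loose_variants(response: str) -> List[str]:
    """Relaxed versions of a response (assumed transformations, for illustration)."""
    lines = response.split("\n")
    variants = []
    for drop_first, drop_last, strip_stars in product([False, True], repeat=3):
        kept = lines[1:] if drop_first else lines
        kept = kept[:-1] if drop_last else kept
        text = "\n".join(kept).strip()
        if strip_stars:
            text = text.replace("*", "")
        variants.append(text)
    return variants


def follows_strict(response: str, check: Callable[[str], bool]) -> bool:
    """Strict: the checker must pass on the response as written."""
    return check(response)


def follows_loose(response: str, check: Callable[[str], bool]) -> bool:
    """Loose: the checker may pass on any relaxed variant of the response."""
    return any(check(v) for v in loose_variants(response))


# Hypothetical data: three prompts with 2, 2, and 1 instructions each.
results = [[True, True], [True, False], [True]]
print(prompt_level_accuracy(results))
print(instruction_level_accuracy(results))

# Hypothetical instruction: "start your answer with the word Dear".
starts_with_dear = lambda text: text.startswith("Dear")
response = "Sure, here is the letter:\nDear Sam, thank you for your note."
print(follows_strict(response, starts_with_dear))
print(follows_loose(response, starts_with_dear))
```

Prompt-level accuracy is never higher than instruction-level accuracy on the same results, since a single missed instruction fails the whole prompt, and loose scores are never lower than strict ones because the unmodified response is among the variants tried.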
Showing 2 latest source slices.
| Rank | Subject | Final Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GLM-5.1 Thinking | 94.5% | GLM 5.1 z-ai-glm-5.1 | Self-reported | 2026-05-28 |
| 2 | Kimi K2.6 Thinking | 94.5% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Self-reported | 2026-05-28 |
| 3 | Qwen3.6 Plus | 94.3% | Qwen3.6 Plus qwen-qwen3.6-plus | Self-reported | 2026-05-28 |
| 4 | Qwen3.7 Max | 94.3% | Qwen3.7 Max qwen-qwen3.7-max | Self-reported | 2026-05-28 |
| 5 | Claude Opus 4.6 Max | 91.9% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Self-reported | 2026-05-28 |
| 6 | DeepSeek V4 Pro Max | 91.9% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Self-reported | 2026-05-28 |
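
Source slice sampled 2026-05-06:

| Rank | Subject | Final Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | perplexity/llama-3-sonar-large-32k-chat (Openrouter) | 0.83 | — | Imported | 2026-05-06 |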
| 2 | llama3-70b-8192 (Groq) | 0.82 | — | Imported | 2026-05-06 |
| 3 | qwen/qwen-2-72b-instruct (Openrouter) | 0.81 | — | Imported | 2026-05-06 |
| 4 | QuantFactory/NeuralDaredevil-8B-abliterated-GGUF | 0.79 | — | Imported | 2026-05-06 |
| 5 | cohere/command-r-plus (Openrouter) | 0.77 | — | Imported | 2026-05-06 |
| 6 | failspy/Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF | 0.76 | — | Imported | 2026-05-06 |
| 7 | NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS-GGUF | 0.74 | — | Imported | 2026-05-06 |
| 8 | perplexity/llama-3-sonar-small-32k-chat (Openrouter) | 0.74 | — | Imported | 2026-05-06 |
| 9 | MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 0.74 | — | Imported | 2026-05-06 |
| 10 | mistralai/mixtral-8x22b-instruct (Openrouter) | 0.73 | Mistral: Mixtral 8x22B Instruct mistralai-mixtral-8x22b-instruct | Imported | 2026-05-06 |
| 11 | bartowski/Llama-3-SauerkrautLM-8b-Instruct-GGUF | 0.71 | — | Imported | 2026-05-06 |
| 12 | mradermacher/LLaMa-3-CursedStock-v1.6-8B-i1-GGUF | 0.70 | — | Imported | 2026-05-06 |
| 13 | mradermacher/c4ai-command-r-v01-GGUF | 0.70 | — | Imported | 2026-05-06 |
| 14 | Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix | 0.69 | — | Imported | 2026-05-06 |
| 15 | MaziyarPanahi/Llama-3-8B-Instruct-v0.9-GGUF | 0.68 | — | Imported | 2026-05-06 |
| 16 | bartowski/Phi-3-medium-4k-instruct-GGUF | 0.67 | — | Imported | 2026-05-06 |
| 17 | cognitivecomputations/dolphin-mixtral-8x22b (Openrouter) | 0.66 | — | Imported | 2026-05-06 |
| 18 | TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF | 0.61 | — | Imported | 2026-05-06 |
| 19 | bartowski/Codestral-22B-v0.1-GGUF | 0.61 | — | Imported | 2026-05-06 |
| 20 | qwp4w3hyb/Nous-Hermes-2-Mixtral-8x7B-DPO-iMat-GGUF | 0.60 | — | Imported | 2026-05-06 |
| 21 | nousresearch/nous-hermes-2-mistral-7b-dpo (Openrouter) | 0.60 | — | Imported | 2026-05-06 |
| 22 | mixtral-8x7b-32768 (Groq) | 0.59 | — | Imported | 2026-05-06 |
| 23 | bartowski/openchat-3.6-8b-20240522-GGUF | 0.58 | — | Imported | 2026-05-06 |
| 24 | MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 0.57 | — | Imported | 2026-05-06 |
| 25 | bartowski/Hermes-2-Pro-Llama-3-8B-GGUF | 0.57 | — | Imported | 2026-05-06 |
| 26 | mlabonne/AlphaMonarch-7B | 0.54 | — | Imported | 2026-05-06 |
| 27 | bartowski/dolphin-2.9-llama3-8b-GGUF | 0.52 | — | Imported | 2026-05-06 |
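
The Final Score column mixes percentages (e.g. `94.5%`) with fractions (e.g. `0.83`). If the two source slices need to be compared, one option is to normalize both to a 0 to 1 scale first. The helper below is a hypothetical sketch and assumes that the two notations express the same metric on different scales, which the page does not confirm; the slices also differ in provenance (self-reported vs. imported), so the normalized values may still not be directly comparable.

```python
def to_fraction(score: str) -> float:
    """Convert a score cell such as '94.5%' or '0.83' to a 0-1 fraction.

    Assumption: values ending in '%' are percentages and all other
    values are already fractions.
    """
    text = score.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


for cell in ["94.5%", "0.83"]:
    print(cell, "->", to_fraction(cell))
```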