IFEval

IFEval evaluates instruction following using prompts that contain verifiable instructions, reporting strict and loose accuracy at both the prompt level and the instruction level.
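The difference between the four sub-scores is easier to see in code. The following is a minimal sketch, not the benchmark's reference implementation: the names `loose_variants`, `follows`, and `score`, and the toy checkers, are hypothetical. It assumes that prompt-level accuracy counts a prompt only when every instruction in it is followed, that instruction-level accuracy counts each instruction separately, and that loose scoring accepts a response if any relaxed variant (first or last line dropped, `*` markers removed) passes.

```python
from typing import Callable, List, Tuple

# A checker takes a response string and returns True if its constraint holds.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Relaxed forms of a response (assumed loose-scoring transformations)."""
    lines = response.split("\n")
    bases = [
        response,                # unchanged
        "\n".join(lines[1:]),    # first line dropped (e.g. "Sure, here it is:")
        "\n".join(lines[:-1]),   # last line dropped
        "\n".join(lines[1:-1]),  # both dropped
    ]
    # Each base is also tried with markdown emphasis markers removed.
    return bases + [b.replace("*", "") for b in bases]


def follows(response: str, check: Checker, loose: bool) -> bool:
    """Strict: check the raw response. Loose: any non-empty variant may pass."""
    if not loose:
        return check(response)
    return any(v.strip() and check(v) for v in loose_variants(response))


def score(samples: List[Tuple[str, List[Checker]]], loose: bool = False):
    """Return (prompt_level_accuracy, instruction_level_accuracy)."""
    prompt_hits = inst_hits = inst_total = 0
    for response, checkers in samples:
        results = [follows(response, c, loose) for c in checkers]
        prompt_hits += all(results)   # prompt counts only if all pass
        inst_hits += sum(results)     # each instruction counts on its own
        inst_total += len(results)
    return prompt_hits / len(samples), inst_hits / inst_total


# Toy data: two prompts with two instructions each (hypothetical checkers).
samples = [
    ("Sure, here it is:\nHELLO WORLD",
     [lambda r: r == r.upper(), lambda r: len(r.split()) <= 5]),
    ("hello",
     [lambda r: "hello" in r, lambda r: r.endswith(".")]),
]

strict_prompt, strict_inst = score(samples)
loose_prompt, loose_inst = score(samples, loose=True)
print(strict_prompt, strict_inst, loose_prompt, loose_inst)
```

In this toy data the first response fails both strict checks only because of its preamble line, so loose scoring recovers it, while the second response still misses the trailing period under either mode.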

Rows: 33
Primary metric: final_score
Sampled: 2026-05-28

Metadata

Metrics

Final Score, Strict Prompt Score, Strict Inst Score, Loose Prompt Score, Loose Inst Score

Showing 2 latest source slices.

Latest Results

Provider-published Qwen3.7-Max comparison scores. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

| Rank | Subject | Final Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | GLM-5.1 Thinking | 94.5% | GLM GLM 5.1 (z-ai-glm-5.1) | Self-reported | 2026-05-28 |
| 2 | Kimi K2.6 Thinking | 94.5% | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Self-reported | 2026-05-28 |
| 3 | Qwen3.6 Plus | 94.3% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Self-reported | 2026-05-28 |
| 4 | Qwen3.7 Max | 94.3% | Qwen3.7 Max (qwen-qwen3.7-max) | Self-reported | 2026-05-28 |
| 5 | Claude Opus 4.6 Max | 91.9% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Self-reported | 2026-05-28 |
| 6 | DeepSeek V4 Pro Max | 91.9% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Self-reported | 2026-05-28 |

| Rank | Subject | Final Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | perplexity/llama-3-sonar-large-32k-chat (Openrouter) | 0.83 | | Imported | 2026-05-06 |
| 2 | llama3-70b-8192 (Groq) | 0.82 | | Imported | 2026-05-06 |
| 3 | qwen/qwen-2-72b-instruct (Openrouter) | 0.81 | | Imported | 2026-05-06 |
| 4 | QuantFactory/NeuralDaredevil-8B-abliterated-GGUF | 0.79 | | Imported | 2026-05-06 |
| 5 | cohere/command-r-plus (Openrouter) | 0.77 | | Imported | 2026-05-06 |
| 6 | failspy/Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF | 0.76 | | Imported | 2026-05-06 |
| 7 | NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS-GGUF | 0.74 | | Imported | 2026-05-06 |
| 8 | perplexity/llama-3-sonar-small-32k-chat (Openrouter) | 0.74 | | Imported | 2026-05-06 |
| 9 | MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 0.74 | | Imported | 2026-05-06 |
| 10 | mistralai/mixtral-8x22b-instruct (Openrouter) | 0.73 | Mistral: Mixtral 8x22B Instruct (mistralai-mixtral-8x22b-instruct) | Imported | 2026-05-06 |
| 11 | bartowski/Llama-3-SauerkrautLM-8b-Instruct-GGUF | 0.71 | | Imported | 2026-05-06 |
| 12 | mradermacher/LLaMa-3-CursedStock-v1.6-8B-i1-GGUF | 0.70 | | Imported | 2026-05-06 |
| 13 | mradermacher/c4ai-command-r-v01-GGUF | 0.70 | | Imported | 2026-05-06 |
| 14 | Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix | 0.69 | | Imported | 2026-05-06 |
| 15 | MaziyarPanahi/Llama-3-8B-Instruct-v0.9-GGUF | 0.68 | | Imported | 2026-05-06 |
| 16 | bartowski/Phi-3-medium-4k-instruct-GGUF | 0.67 | | Imported | 2026-05-06 |
| 17 | cognitivecomputations/dolphin-mixtral-8x22b (Openrouter) | 0.66 | | Imported | 2026-05-06 |
| 18 | TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF | 0.61 | | Imported | 2026-05-06 |
| 19 | bartowski/Codestral-22B-v0.1-GGUF | 0.61 | | Imported | 2026-05-06 |
| 20 | qwp4w3hyb/Nous-Hermes-2-Mixtral-8x7B-DPO-iMat-GGUF | 0.60 | | Imported | 2026-05-06 |
| 21 | nousresearch/nous-hermes-2-mistral-7b-dpo (Openrouter) | 0.60 | | Imported | 2026-05-06 |
| 22 | mixtral-8x7b-32768 (Groq) | 0.59 | | Imported | 2026-05-06 |
| 23 | bartowski/openchat-3.6-8b-20240522-GGUF | 0.58 | | Imported | 2026-05-06 |
| 24 | MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 0.57 | | Imported | 2026-05-06 |
| 25 | bartowski/Hermes-2-Pro-Llama-3-8B-GGUF | 0.57 | | Imported | 2026-05-06 |
| 26 | mlabonne/AlphaMonarch-7B | 0.54 | | Imported | 2026-05-06 |
| 27 | bartowski/dolphin-2.9-llama3-8b-GGUF | 0.52 | | Imported | 2026-05-06 |
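
The two slices above report Final Score on different scales: percentages such as 94.5% for the self-reported rows and fractions such as 0.83 for the imported rows. A reader who wants to put them on one axis could normalize them as sketched below. This is only an illustration: `normalize_score` is a hypothetical helper, and it assumes both slices measure the same underlying metric on a 0 to 1 range, which the page does not confirm. Provenance is kept next to each score because self-reported and imported numbers are not directly comparable claims.

```python
def normalize_score(raw: str) -> float:
    """Convert '94.5%' or '0.83' into a fraction between 0 and 1."""
    text = raw.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    value = float(text)
    if not 0.0 <= value <= 1.0:
        # Refuse to guess the scale of an unmarked value outside [0, 1].
        raise ValueError(f"ambiguous score scale: {raw!r}")
    return value


# A few rows copied from the tables above: (subject, final score, provenance).
rows = [
    ("llama3-70b-8192 (Groq)", "0.82", "Imported"),
    ("GLM-5.1 Thinking", "94.5%", "Self-reported"),
    ("Claude Opus 4.6 Max", "91.9%", "Self-reported"),
]

# Sort by normalized score, highest first, keeping provenance visible.
merged = sorted(
    ((subject, normalize_score(score), provenance)
     for subject, score, provenance in rows),
    key=lambda row: row[1],
    reverse=True,
)

for subject, score, provenance in merged:
    print(f"{score:.3f}  {subject}  [{provenance}]")
```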