Structured Output Benchmark
SOB evaluates how accurately language models produce schema-compliant and value-correct JSON from normalized text contexts spanning text QA, OCR-derived documents, and meeting transcripts.
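To make the distinction between "schema-compliant" and "value-correct" concrete, here is a minimal sketch in Python. It is not SOB's scoring code: the `SCHEMA` mapping, the `check_output` helper, and the invoice fields are all hypothetical, chosen only to show that an output can parse and match the expected structure while still carrying a wrong value.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
# This is an assumption, not the schema format used by the benchmark.
SCHEMA = {"invoice_id": str, "total": float, "paid": bool}


def check_output(raw: str, schema: dict, expected: dict) -> dict:
    """Return separate verdicts for parsing, schema compliance, and value correctness."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Output that is not valid JSON fails every later check as well.
        return {"parses": False, "schema_ok": False, "values_ok": False}

    # Schema compliance here means: an object with exactly the expected keys,
    # each holding a value of the expected type.
    schema_ok = (
        isinstance(data, dict)
        and data.keys() == schema.keys()
        and all(isinstance(data[key], typ) for key, typ in schema.items())
    )
    # Value correctness additionally requires every value to equal the reference.
    values_ok = schema_ok and data == expected
    return {"parses": True, "schema_ok": schema_ok, "values_ok": values_ok}


expected = {"invoice_id": "A-17", "total": 120.5, "paid": True}

correct = '{"invoice_id": "A-17", "total": 120.5, "paid": true}'
wrong_value = '{"invoice_id": "A-17", "total": 99.0, "paid": true}'
wrong_type = '{"invoice_id": "A-17", "total": "120.5", "paid": true}'
truncated = '{"invoice_id": "A-17", '

for label, raw in [
    ("correct", correct),
    ("wrong_value", wrong_value),
    ("wrong_type", wrong_type),
    ("truncated", truncated),
]:
    print(label, check_output(raw, SCHEMA, expected))
```

In this sketch the `wrong_value` case passes the structural check but fails on values, which is the kind of gap a benchmark that scores both properties is meant to expose.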
Rows: 28
Primary metric: Overall
Sampled: 2026-05-06
Metadata
Metrics: Overall, Value Accuracy, Faithfulness, JSON Pass, Path Recall, Structure Coverage, Type Safety, Perfect Response
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.4 | 87 | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-06 |
| 2 | Gemini-3.1-Pro | 86.90 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-06 |
| 3 | GLM-5.1 | 86.60 | GLM 5.1 (`z-ai-glm-5.1`) | Imported | 2026-05-06 |
| 4 | Claude-Opus-4.7 | 86.40 | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-06 |
| 5 | GLM-4.7 | 86.10 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-06 |
| 6 | Qwen3.5-35B | 86.10 | — | Imported | 2026-05-06 |
| 7 | GPT-5.5 | 86 | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-06 |
| 8 | Gemini-2.5-Flash | 86 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-06 |
| 9 | Qwen3-235B | 85.70 | Qwen3 235B A22B (`qwen-qwen3-235b-a22b`) | Imported | 2026-05-06 |
| 10 | Interfaze-Beta | 85.50 | — | Imported | 2026-05-06 |
| 11 | Claude-Sonnet-4.6 | 85.40 | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-06 |
| 12 | Claude-Opus-4.6 | 85.30 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Imported | 2026-05-06 |
| 13 | DeepSeek-V4-Pro | 85.30 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 14 | Kimi-2.6 | 85.30 | — | Imported | 2026-05-06 |
| 15 | GPT-4.1 | 85 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-06 |
| 16 | GPT-5 | 84.90 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-06 |
| 17 | Gemma-3-27B | 84.70 | Gemma 3 27B (`google-gemma-3-27b-it`) | Imported | 2026-05-06 |
| 18 | Qwen3-30B | 84.20 | — | Imported | 2026-05-06 |
| 19 | Nemotron-3-Nano-30B | 84.10 | — | Imported | 2026-05-06 |
| 20 | GPT-5-Mini | 83.50 | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-06 |
| 21 | Gemma-4-31B | 83.30 | Gemma 4 31B (`google-gemma-4-31b-it`) | Imported | 2026-05-06 |
| 22 | Gemini-3-Flash-Preview | 83.30 | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-06 |
| 23 | Schematron-8B | 83.20 | — | Imported | 2026-05-06 |
| 24 | IBM-Granite-4.0 | 83.20 | — | Imported | 2026-05-06 |
| 25 | Phi-4 | 83.10 | Phi 4 (`microsoft-phi-4`) | Imported | 2026-05-06 |
| 26 | DS-R1-Distill-32B | 82.70 | — | Imported | 2026-05-06 |
| 27 | Ministral-3-14B | 77.80 | — | Imported | 2026-05-06 |
| 28 | GPT-OSS-20B | 73.20 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-06 |