AA-Omniscience
Artificial Analysis knowledge and hallucination benchmark measuring factual recall, abstention, and hallucination across economically relevant domains.
28 rows
Primary metric: omniscience_index
Sampled: 2026-05-11
Metadata
Metrics: AA-Omniscience Index, Accuracy, Attempt Rate, Hallucination Rate (lower is better)
| Rank | Subject | AA-Omniscience Index | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 32.93 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-11 |
| 2 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 26.17 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-11 |
| 3 | GPT-5.5 (xhigh) | 20.07 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 4 | Grok 4.3 | 18.32 | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-11 |
| 5 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | 12.37 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-11 |
| 6 | Gemini 3 Flash Preview (Reasoning) | 11.57 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-11 |
| 7 | Qwen3.6 Max Preview | 10.2 | Qwen3.6 Max Preview qwen-qwen3.6-max-preview | Imported | 2026-05-11 |
| 8 | Kimi K2.6 | 6.42 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-11 |
| 9 | GPT-5.4 (xhigh) | 5.65 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-11 |
| 10 | Muse Spark | 4.08 | — | Imported | 2026-05-11 |
| 11 | MiMo-V2.5-Pro | 3.6 | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-11 |
| 12 | GLM-5.1 (Reasoning) | 1.93 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-11 |
| 13 | MiniMax-M2.7 | 0.68 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-11 |
| 14 | Claude 4.5 Haiku (Reasoning) | -4.22 | — | Imported | 2026-05-11 |
| 15 | DeepSeek V4 Pro (Reasoning, Max Effort) | -10.02 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-11 |
| 16 | Llama 3.1 Instruct 405B | -17.3 | — | Imported | 2026-05-11 |
| 17 | GPT-5.4 mini (xhigh) | -18.68 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-11 |
| 18 | DeepSeek V3.2 (Reasoning) | -20.88 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 19 | DeepSeek V4 Flash (Reasoning, Max Effort) | -22.9 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-11 |
| 20 | Qwen3.5 397B A17B (Reasoning) | -29.78 | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-11 |
| 21 | Mistral Small 4 (Reasoning) | -29.9 | Mistral: Mistral Small 4 mistralai-mistral-small-2603 | Imported | 2026-05-11 |
| 22 | K2 Think V2 | -33.92 | — | Imported | 2026-05-11 |
| 23 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | -42.07 | Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-11 |
| 24 | Gemma 4 31B (Reasoning) | -45.42 | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-11 |
| 25 | Nova 2.0 Pro Preview (medium) | -48.05 | — | Imported | 2026-05-11 |
| 26 | gpt-oss-120B (high) | -50.05 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-11 |
| 27 | Solar Pro 3 | -53.78 | Solar Pro 3 upstage-solar-pro-3 | Imported | 2026-05-11 |
| 28 | gpt-oss-20B (high) | -63.92 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-11 |
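The negative index values in the table suggest a score that penalizes wrong answers rather than only counting right ones. This page does not define the four metrics, so the sketch below works from assumed definitions: the index is taken as correct minus incorrect answers over all questions, scaled to the range -100 to 100, and the hallucination rate is taken as the share of incorrect answers among all responses that were not correct. The function name `omniscience_metrics` and the grade labels are hypothetical, chosen for illustration only.

```python
from collections import Counter


def omniscience_metrics(grades):
    """Compute illustrative knowledge/hallucination metrics from graded answers.

    `grades` is a list of labels, one per question. The labels
    ("correct", "incorrect", "partial", "abstain") and every formula
    below are assumptions, not definitions taken from this page.
    """
    total = len(grades)
    if total == 0:
        raise ValueError("grades must not be empty")

    counts = Counter(grades)
    correct = counts["correct"]
    incorrect = counts["incorrect"]
    abstained = counts["abstain"]
    non_correct = total - correct  # incorrect + partial + abstained

    return {
        # Assumed: reward correct, penalize incorrect, ignore abstentions.
        "index": 100.0 * (correct - incorrect) / total,
        # Assumed: share of all questions answered correctly.
        "accuracy": 100.0 * correct / total,
        # Assumed: share of questions where the model did not abstain.
        "attempt_rate": 100.0 * (total - abstained) / total,
        # Assumed: incorrect answers as a share of all non-correct responses.
        "hallucination_rate": (
            100.0 * incorrect / non_correct if non_correct else 0.0
        ),
    }


# Toy data: 5 correct, 3 incorrect, 2 abstentions out of 10 questions.
example = ["correct"] * 5 + ["incorrect"] * 3 + ["abstain"] * 2
print(omniscience_metrics(example))
```

Under these assumptions, a model that answers wrongly more often than correctly scores below zero, while abstaining costs nothing. That would explain how a model could rank low on the index despite reasonable accuracy.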