AA-Omniscience

A knowledge and hallucination benchmark from Artificial Analysis that measures factual recall, abstention, and hallucination across economically relevant domains.

Rows: 28
Primary metric: omniscience_index
Sampled: 2026-05-11

Metadata

Metrics

AA-Omniscience Index, Accuracy, Attempt Rate, Hallucination Rate (lower is better)

Latest Results

Rows are parsed from the public Artificial Analysis Next.js RSC (React Server Components) defaultData payload and ranked in descending order by the configured primary metric, omniscience_index.

| Rank | Subject | AA-Omniscience Index | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 32.93 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-11 |
| 2 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 26.17 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-11 |
| 3 | GPT-5.5 (xhigh) | 20.07 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-11 |
| 4 | Grok 4.3 | 18.32 | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-11 |
| 5 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | 12.37 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-11 |
| 6 | Gemini 3 Flash Preview (Reasoning) | 11.57 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-11 |
| 7 | Qwen3.6 Max Preview | 10.2 | Qwen3.6 Max Preview (qwen-qwen3.6-max-preview) | Imported | 2026-05-11 |
| 8 | Kimi K2.6 | 6.42 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-11 |
| 9 | GPT-5.4 (xhigh) | 5.65 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-11 |
| 10 | Muse Spark | 4.08 | | Imported | 2026-05-11 |
| 11 | MiMo-V2.5-Pro | 3.6 | MiMo-V2.5-Pro (xiaomi-mimo-v2.5-pro) | Imported | 2026-05-11 |
| 12 | GLM-5.1 (Reasoning) | 1.93 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-11 |
| 13 | MiniMax-M2.7 | 0.68 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-11 |
| 14 | Claude 4.5 Haiku (Reasoning) | -4.22 | | Imported | 2026-05-11 |
| 15 | DeepSeek V4 Pro (Reasoning, Max Effort) | -10.02 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-11 |
| 16 | Llama 3.1 Instruct 405B | -17.3 | | Imported | 2026-05-11 |
| 17 | GPT-5.4 mini (xhigh) | -18.68 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-11 |
| 18 | DeepSeek V3.2 (Reasoning) | -20.88 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-11 |
| 19 | DeepSeek V4 Flash (Reasoning, Max Effort) | -22.9 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-11 |
| 20 | Qwen3.5 397B A17B (Reasoning) | -29.78 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-11 |
| 21 | Mistral Small 4 (Reasoning) | -29.9 | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-11 |
| 22 | K2 Think V2 | -33.92 | | Imported | 2026-05-11 |
| 23 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | -42.07 | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Imported | 2026-05-11 |
| 24 | Gemma 4 31B (Reasoning) | -45.42 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-11 |
| 25 | Nova 2.0 Pro Preview (medium) | -48.05 | | Imported | 2026-05-11 |
| 26 | gpt-oss-120B (high) | -50.05 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-11 |
| 27 | Solar Pro 3 | -53.78 | Solar Pro 3 (upstage-solar-pro-3) | Imported | 2026-05-11 |
| 28 | gpt-oss-20B (high) | -63.92 | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-11 |
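
The ranking step described under Latest Results can be illustrated with a minimal Python sketch. It assumes rows have already been extracted into simple records; the `Row` class and `rank_rows` function are hypothetical names introduced here, not part of any Artificial Analysis tooling. Payload parsing is left out because the page does not describe the payload's structure. The sample values are taken from the results listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical record shape; field names are assumptions for illustration.
    subject: str
    omniscience_index: float


def rank_rows(rows, metric="omniscience_index"):
    """Sort rows by the given metric, highest first, and number them from 1."""
    ordered = sorted(rows, key=lambda r: getattr(r, metric), reverse=True)
    return list(enumerate(ordered, start=1))


# A few rows from the results above, deliberately out of order.
rows = [
    Row("GPT-5.5 (xhigh)", 20.07),
    Row("gpt-oss-20B (high)", -63.92),
    Row("Gemini 3.1 Pro Preview", 32.93),
    Row("Claude Opus 4.7 (Adaptive Reasoning, Max Effort)", 26.17),
]

ranked = rank_rows(rows)
for rank, row in ranked:
    print(f"{rank:>2}  {row.subject:<50} {row.omniscience_index:>7.2f}")
```

Python's `sorted` is stable, so rows with equal scores keep their input order. A real importer might want an explicit tie-break rule instead.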