AA-Omniscience | BenchmarkList

Metadata

ID: aa_omniscience
Category: Factuality
Release: Unknown
Source: Source page
Snapshot: Snapshot source
Post: Announcement post

Metrics

AA-Omniscience Index, Accuracy, Attempt Rate, Hallucination Rate (lower is better)

Rank	Subject	AA-Omniscience Index	Model Match	Provenance	Sampled
1	Gemini 3.1 Pro Preview	32.93	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-11
2	Claude Opus 4.7 (Adaptive Reasoning, Max Effort)	26.17	Claude Opus 4.7 anthropic-claude-opus-4.7	Imported	2026-05-11
3	GPT-5.5 (xhigh)	20.07	GPT-5.5 openai-gpt-5.5	Imported	2026-05-11
4	Grok 4.3	18.32	Grok 4.3 x-ai-grok-4.3	Imported	2026-05-11
5	Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)	12.37	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Imported	2026-05-11
6	Gemini 3 Flash Preview (Reasoning)	11.57	Gemini 3 Flash Preview google-gemini-3-flash-preview	Imported	2026-05-11
7	Qwen3.6 Max Preview	10.2	Qwen3.6 Max Preview qwen-qwen3.6-max-preview	Imported	2026-05-11
8	Kimi K2.6	6.42	MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6	Imported	2026-05-11
9	GPT-5.4 (xhigh)	5.65	GPT-5.4 openai-gpt-5.4	Imported	2026-05-11
10	Muse Spark	4.08	—	Imported	2026-05-11
11	MiMo-V2.5-Pro	3.6	MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro	Imported	2026-05-11
12	GLM-5.1 (Reasoning)	1.93	GLM 5.1 z-ai-glm-5.1	Imported	2026-05-11
13	MiniMax-M2.7	0.68	MiniMax M2.7 minimax-minimax-m2.7	Imported	2026-05-11
14	Claude 4.5 Haiku (Reasoning)	-4.22	—	Imported	2026-05-11
15	DeepSeek V4 Pro (Reasoning, Max Effort)	-10.02	DeepSeek V4 Pro deepseek-deepseek-v4-pro	Imported	2026-05-11
16	Llama 3.1 Instruct 405B	-17.3	—	Imported	2026-05-11
17	GPT-5.4 mini (xhigh)	-18.68	GPT-5.4 Mini openai-gpt-5.4-mini	Imported	2026-05-11
18	DeepSeek V3.2 (Reasoning)	-20.88	DeepSeek V3.2 deepseek-deepseek-v3.2	Imported	2026-05-11
19	DeepSeek V4 Flash (Reasoning, Max Effort)	-22.9	DeepSeek V4 Flash deepseek-deepseek-v4-flash	Imported	2026-05-11
20	Qwen3.5 397B A17B (Reasoning)	-29.78	Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b	Imported	2026-05-11
21	Mistral Small 4 (Reasoning)	-29.9	Mistral: Mistral Small 4 mistralai-mistral-small-2603	Imported	2026-05-11
22	K2 Think V2	-33.92	—	Imported	2026-05-11
23	NVIDIA Nemotron 3 Super 120B A12B (Reasoning)	-42.07	Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b	Imported	2026-05-11
24	Gemma 4 31B (Reasoning)	-45.42	Gemma 4 31B google-gemma-4-31b-it	Imported	2026-05-11
25	Nova 2.0 Pro Preview (medium)	-48.05	—	Imported	2026-05-11
26	gpt-oss-120B (high)	-50.05	gpt-oss-120b openai-gpt-oss-120b	Imported	2026-05-11
27	Solar Pro 3	-53.78	Solar Pro 3 upstage-solar-pro-3	Imported	2026-05-11
28	gpt-oss-20B (high)	-63.92	gpt-oss-20b openai-gpt-oss-20b	Imported	2026-05-11
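
The table above is tab-separated, so it can be loaded with a standard delimited-text reader. The following is a minimal sketch, assuming the rows are available as plain text with the same header names; the `Row` class, the `parse_rows` helper, and the three-row `SAMPLE` excerpt are illustrative names and not part of the source page. It treats a dash in the Model Match column as "no matched model".

```python
import csv
import io
from dataclasses import dataclass

# Three rows copied from the table above (tab-separated, same header names).
# "\u2014" is the dash the page shows when no model match exists.
SAMPLE = (
    "Rank\tSubject\tAA-Omniscience Index\tModel Match\tProvenance\tSampled\n"
    "1\tGemini 3.1 Pro Preview\t32.93\t"
    "Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview\tImported\t2026-05-11\n"
    "13\tMiniMax-M2.7\t0.68\tMiniMax M2.7 minimax-minimax-m2.7\tImported\t2026-05-11\n"
    "14\tClaude 4.5 Haiku (Reasoning)\t-4.22\t\u2014\tImported\t2026-05-11\n"
)


@dataclass
class Row:
    """One leaderboard entry (hypothetical structure for this sketch)."""
    rank: int
    subject: str
    index: float
    matched: bool


def parse_rows(text: str) -> list[Row]:
    """Parse tab-separated leaderboard text into Row objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        Row(
            rank=int(rec["Rank"]),
            subject=rec["Subject"],
            index=float(rec["AA-Omniscience Index"]),
            matched=rec["Model Match"].strip() != "\u2014",
        )
        for rec in reader
    ]


rows = parse_rows(SAMPLE)
positive = [r.subject for r in rows if r.index > 0]    # index above zero
unmatched = [r.subject for r in rows if not r.matched]  # no Model Match entry
print(positive)
print(unmatched)
```

Run against the full table, the same filter would separate the entries with a positive index (ranks 1 to 13) from those with a negative one (ranks 14 to 28).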
