OTIS Mock AIME 2024-2025

Competition-level problems from the OTIS Mock AIME that evaluate olympiad-level math.

38 rows
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score (higher is better), Standard error (lower is better)

Latest Results

Rows parsed from the public leaderboard table.

Rank  Subject                        Score  Model Match                                                 Provenance  Sampled
1     GPT-5.2                        96.11  GPT-5.2 [openai-gpt-5.2]                                    Imported    2026-05-06
2     Gemini 3 Pro                   92.78  Gemini 3 [google-gemini-3]                                  Imported    2026-05-06
3     GPT-OSS 120B                   88.89  gpt-oss-120b [openai-gpt-oss-120b]                          Imported    2026-05-06
4     DeepSeek V3                    87.82  DeepSeek V3 [deepseek-deepseek-chat]                        Imported    2026-05-06
5     Qwen 3 235B                    86.67  Qwen3 235B A22B [qwen-qwen3-235b-a22b]                      Imported    2026-05-06
6     Claude Opus 4.5                86.11  Claude Opus 4.5 [anthropic-claude-opus-4.5]                 Imported    2026-05-06
7     Gemini 2.5 Pro (Jun 2025)      84.72  Gemini 2.5 Pro [google-gemini-2.5-pro]                      Imported    2026-05-06
8     Grok 4                         84     Grok 4 [x-ai-grok-4]                                        Imported    2026-05-06
9     o3                             83.89  o3 [openai-o3]                                              Imported    2026-05-06
10    kimi-k2-thinking (official)    83.06  MoonshotAI: Kimi K2 Thinking [moonshotai-kimi-k2-thinking]  Imported    2026-05-06
11    o4-mini (high)                 81.67  o4 Mini High [openai-o4-mini-high]                          Imported    2026-05-06
12    Grok-3 mini                    77.78  Grok 3 Mini [x-ai-grok-3-mini]                              Imported    2026-05-06
13    Claude Sonnet 4.5              77.78  Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5]             Imported    2026-05-06
14    Qwen3-Max-Instruct             73.33  Qwen3 Max [qwen-qwen3-max]                                  Imported    2026-05-06
15    o1                             73.33  o1 [openai-o1]                                              Imported    2026-05-06
16    Claude Haiku 4.5               66.67  Claude Haiku 4.5 [anthropic-claude-haiku-4.5]               Imported    2026-05-06
17    Gemini 2.0 Flash Thinking Exp  57.78  Gemini 2.0 Flash [google-gemini-2.0-flash]                  Imported    2026-05-06
18    Claude 3.7 Sonnet              57.78  Claude 3.7 Sonnet [anthropic-claude-3.7-sonnet]             Imported    2026-05-06
19    DeepSeek R1                    53.33  R1 [deepseek-r1]                                            Imported    2026-05-06
20    GPT-4.1 mini                   44.72  GPT-4.1 Mini [openai-gpt-4.1-mini]                          Imported    2026-05-06
21    GPT-4.1                        38.33  GPT-4.1 [openai-gpt-4.1]                                    Imported    2026-05-06
22    Mistral Large                  32.22  Mistral Large [mistralai-mistral-large]                     Imported    2026-05-06
23    Gemini 1.5 Flash               23.06  -                                                           Imported    2026-05-06
24    Llama 4 Maverick (FP8)         20.56  Llama 4 Maverick [meta-llama-4-maverick]                    Imported    2026-05-06
25    Gemma 3 27B                    19.72  Gemma 3 27B [google-gemma-3-27b-it]                         Imported    2026-05-06
26    Qwen Plus                      17.78  Qwen-Plus [qwen-qwen-plus]                                  Imported    2026-05-06
27    Qwen2.5-Max                    16.11  -                                                           Imported    2026-05-06
28    Phi-4                          13.75  Phi 4 [microsoft-phi-4]                                     Imported    2026-05-06
29    Llama 3.1 405B                 9.72   -                                                           Imported    2026-05-06
30    Llama 4 Scout                  7.78   Llama 4 Scout [meta-llama-llama-4-scout]                    Imported    2026-05-06
31    gpt-4o-mini-2024-07-18         6.94   GPT-4o-mini (2024-07-18) [openai-gpt-4o-mini-2024-07-18]    Imported    2026-05-06
32    GPT-4 Turbo                    6.67   GPT-4 Turbo [openai-gpt-4-turbo]                            Imported    2026-05-06
33    GPT-4o                         6.39   GPT-4o [openai-gpt-4o]                                      Imported    2026-05-06
34    Llama 3.3 70B                  5.14   Llama 3.3 70B Instruct [meta-llama-llama-3.3-70b-instruct]  Imported    2026-05-06
35    Claude 3 Opus                  4.72   -                                                           Imported    2026-05-06
36    Claude 3.5 Haiku               4.31   Claude 3.5 Haiku [anthropic-claude-3.5-haiku]               Imported    2026-05-06
37    Meta-Llama-3-8B-Instruct       4.31   Llama 3 8B Instruct [meta-llama-llama-3-8b-instruct]        Imported    2026-05-06
38    Llama-2-7b                     0      -                                                           Imported    2026-05-06