WeirdML

Unusual machine-learning tasks designed to test model adaptability.

31 rows
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score (higher is better; rows are ranked by it); Standard error (lower is better)

Latest Results

Rows parsed from the public leaderboard table.

Rank | Subject | Score | Model match | Match slug | Provenance | Sampled
---|---|---|---|---|---|---
1 | GPT-5.2 | 72.20 | GPT-5.2 | openai-gpt-5.2 | Imported | 2026-05-06
2 | Gemini 3 Pro | 69.93 | Gemini 3 | google-gemini-3 | Imported | 2026-05-06
3 | Claude Opus 4.5 | 63.70 | Claude Opus 4.5 | anthropic-claude-opus-4.5 | Imported | 2026-05-06
4 | o3 | 58.21 | o3 | openai-o3 | Imported | 2026-05-06
5 | Gemini 2.5 Pro (Jun 2025) | 54.03 | Gemini 2.5 Pro | google-gemini-2.5-pro | Imported | 2026-05-06
6 | o4-mini (high) | 52.56 | o4 Mini High | openai-o4-mini-high | Imported | 2026-05-06
7 | GPT-OSS 120B | 48.17 | gpt-oss-120b | openai-gpt-oss-120b | Imported | 2026-05-06
8 | o1 | 47.56 | o1 | openai-o1 | Imported | 2026-05-06
9 | Grok 4 | 45.73 | Grok 4 | x-ai-grok-4 | Imported | 2026-05-06
10 | kimi-k2-thinking (official) | 42.79 | MoonshotAI: Kimi K2 Thinking | moonshotai-kimi-k2-thinking | Imported | 2026-05-06
11 | Grok-3 mini | 42.58 | Grok 3 Mini | x-ai-grok-3-mini | Imported | 2026-05-06
12 | DeepSeek V3 | 41.63 | DeepSeek V3 | deepseek-deepseek-chat | Imported | 2026-05-06
13 | Qwen3-Max-Instruct | 41.17 | Qwen3 Max | qwen-qwen3-max | Imported | 2026-05-06
14 | Qwen 3 235B | 41.04 | Qwen3 235B A22B | qwen-qwen3-235b-a22b | Imported | 2026-05-06
15 | Claude 3.7 Sonnet | 39.97 | Claude 3.7 Sonnet | anthropic-claude-3.7-sonnet | Imported | 2026-05-06
16 | GPT-4.1 | 39.37 | GPT-4.1 | openai-gpt-4.1 | Imported | 2026-05-06
17 | GPT-4.1 mini | 37.61 | GPT-4.1 Mini | openai-gpt-4.1-mini | Imported | 2026-05-06
18 | DeepSeek R1 | 36.49 | R1 | deepseek-r1 | Imported | 2026-05-06
19 | Grok Code Fast 1 | 35.06 | Grok Code Fast 1 | x-ai-grok-code-fast-1 | Imported | 2026-05-06
20 | Mistral Large | 33.13 | Mistral Large | mistralai-mistral-large | Imported | 2026-05-06
21 | Claude 3.5 Haiku | 30.73 | Claude 3.5 Haiku | anthropic-claude-3.5-haiku | Imported | 2026-05-06
22 | GPT-4o | 25.12 | GPT-4o | openai-gpt-4o | Imported | 2026-05-06
23 | Gemini 1.5 Flash | 24.87 |  |  | Imported | 2026-05-06
24 | Llama-4-Maverick-17B-128E-Instruct | 24.47 | Llama 4 Maverick | meta-llama-4-maverick | Imported | 2026-05-06
25 | Claude 3 Opus | 23.18 |  |  | Imported | 2026-05-06
26 | Llama 3.1 405B | 21.38 |  |  | Imported | 2026-05-06
27 | GPT-4 Turbo | 18.01 | GPT-4 Turbo | openai-gpt-4-turbo | Imported | 2026-05-06
28 | Llama 3.3 70B | 14.44 | Llama 3.3 70B Instruct | meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06
29 | gpt-4o-mini-2024-07-18 | 11.76 | GPT-4o-mini (2024-07-18) | openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-06
30 | gpt-3.5-turbo-1106 | 3.48 | GPT-3.5 Turbo | openai-gpt-3.5-turbo | Imported | 2026-05-06
31 | Mixtral-8x7B-v0.1 | 3.17 |  |  | Imported | 2026-05-06
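The Rank column follows from sorting on the primary metric, highest score first. The sketch below illustrates that relationship using four (subject, score) pairs copied from the table. The `rank_by_score` helper and the tuple layout are hypothetical, not part of the leaderboard's own tooling, and ties are not handled specially (a stable sort simply keeps their input order).

```python
# Hypothetical representation: a few (subject, score) rows copied from the table above,
# deliberately listed out of order.
rows = [
    ("o3", 58.21),
    ("GPT-5.2", 72.20),
    ("Claude Opus 4.5", 63.70),
    ("Gemini 3 Pro", 69.93),
]


def rank_by_score(rows):
    """Return (rank, subject, score) tuples, highest score first (rank 1 = best).

    Hypothetical helper: ties keep their input order because sorted() is stable.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, subject, score) for i, (subject, score) in enumerate(ordered, start=1)]


ranked = rank_by_score(rows)
for rank, subject, score in ranked:
    print(f"{rank:>2}  {subject:<16} {score:6.2f}")
```

Sorting in descending order matches the table's convention that a higher score is better, so the recovered ranks (GPT-5.2, Gemini 3 Pro, Claude Opus 4.5, o3) agree with rows 1 to 4 above.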