MCP Atlas

Evaluating real-world tool use through the Model Context Protocol (MCP)

Rows: 38
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Showing 5 latest source slices.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Claude Opus 4.8 | 82.2% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Self-reported | 2026-05-28
2 | Claude Opus 4.7 | 79.1% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Self-reported | 2026-05-28
3 | Gemini 3.1 Pro Preview | 78.2% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Self-reported | 2026-05-28
4 | GPT-5.5 | 75.3% | GPT-5.5 (openai-gpt-5.5) | Self-reported | 2026-05-28

1 | Qwen3.7 Max | 76.4% | Qwen3.7 Max (qwen-qwen3.7-max) | Self-reported | 2026-05-28
2 | Claude Opus 4.6 Max | 75.8% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Self-reported | 2026-05-28
3 | Qwen3.6 Plus | 74.1% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Self-reported | 2026-05-28
4 | DeepSeek V4 Pro Max | 73.6% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Self-reported | 2026-05-28
5 | GLM-5.1 Thinking | 71.8% | GLM GLM 5.1 (z-ai-glm-5.1) | Self-reported | 2026-05-28
6 | Kimi K2.6 Thinking | 66.6% | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Self-reported | 2026-05-28

1 | Muse Spark | 82.20 | - | Imported | 2026-05-06
1 | claude-opus-4-7 (max) | 79.10 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-06
1 | gemini-3.1-pro-preview (high) | 78.20 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06
2 | claude-opus-4-6 (max) | 76.80 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06
2 | glm-5p1 | 75.60 | GLM GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-06
2 | gpt-5.5 (xhigh) | 75.30 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06
5 | gpt-5.4 (xhigh) | 70.60 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
5 | gemini-3-pro-preview | 70.30 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
6 | claude-opus-4-5 (high) | 69.80 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
7 | claude-sonnet-4-6 | 69.50 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-06
7 | gpt-5.2 (xhigh) | 67.60 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
9 | kimi-k2p5 | 64.40 | - | Imported | 2026-05-06
11 | gemini-3-flash-preview | 62 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-06
12 | claude-sonnet-4-5 (thinking) | 59.50 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
13 | glm-4p7 | 58.10 | GLM GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06
13 | gemini-3.1-flash-lite (high) | 57.10 | - | Imported | 2026-05-06
13 | gpt-5.4-mini (xhigh) | 56.70 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-06
18 | gpt-5.1 (high) | 50.10 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06
18 | o3-pro | 44.50 | o3 Pro (openai-o3-pro) | Imported | 2026-05-06
19 | claude-haiku-4-5 | 40.20 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06

1 | Claude Opus 4.7 | 79.1% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Launch post | 2026-04-23
2 | Gemini 3.1 Pro Preview | 78.2% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-23
3 | GPT-5.5 | 75.3% | GPT-5.5 (openai-gpt-5.5) | Launch post | 2026-04-23
4 | GPT-5.4 | 70.6% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-23

1 | Claude Opus 4.7 | 77.3% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Launch post | 2026-04-16
2 | Claude Opus 4.6 | 75.8% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Launch post | 2026-04-16
3 | Gemini 3.1 Pro Preview | 73.9% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-16
4 | GPT-5.4 | 68.1% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-16
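
Because ranks restart in each source slice, a rank only means something within its own slice, and the same model can carry different scores in different slices. The Python sketch below is one possible way to line up a single model's scores across slices using the Model Match identifier. It is an illustration only: the `Row` type, its field names, and the `slice_index` value (the slice's position on this page, 1 to 5) are assumptions introduced here, not fields published by the leaderboard. Only a handful of rows are copied in.

```python
from typing import NamedTuple


class Row(NamedTuple):
    slice_index: int   # hypothetical: position of the slice on this page (1-5)
    model_id: str      # Model Match identifier as shown in the table
    score: float       # score as listed (percent)
    provenance: str
    sampled: str       # ISO date, so string order equals date order


# A few rows copied from the slices above.
ROWS = [
    Row(1, "anthropic-claude-opus-4.7", 79.1, "Self-reported", "2026-05-28"),
    Row(3, "anthropic-claude-opus-4.7", 79.1, "Imported", "2026-05-06"),
    Row(4, "anthropic-claude-opus-4.7", 79.1, "Launch post", "2026-04-23"),
    Row(5, "anthropic-claude-opus-4.7", 77.3, "Launch post", "2026-04-16"),
    Row(3, "openai-gpt-5.4", 70.6, "Imported", "2026-05-06"),
    Row(4, "openai-gpt-5.4", 70.6, "Launch post", "2026-04-23"),
    Row(5, "openai-gpt-5.4", 68.1, "Launch post", "2026-04-16"),
]


def score_history(rows, model_id):
    """Return (sampled, provenance, score) tuples for one model, oldest first."""
    hits = [(r.sampled, r.provenance, r.score) for r in rows if r.model_id == model_id]
    return sorted(hits)


def rows_in_slice(rows, slice_index):
    """Return the rows of one slice, highest score first."""
    picked = [r for r in rows if r.slice_index == slice_index]
    return sorted(picked, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    for sampled, provenance, score in score_history(ROWS, "openai-gpt-5.4"):
        print(sampled, provenance, score)
```

Keying on the slice rather than on provenance and date is deliberate: two of the slices above share both "Self-reported" and 2026-05-28, so that pair alone would not separate them.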