MCP Atlas
Evaluating real-world tool use through the Model Context Protocol (MCP)
38 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Confidence Interval Upper, Max Score
Showing the 5 latest source slices. Ranks restart at 1 within each slice.
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 82.2% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 79.1% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Self-reported | 2026-05-28 |
| 3 | Gemini 3.1 Pro Preview | 78.2% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Self-reported | 2026-05-28 |
| 4 | GPT-5.5 | 75.3% | GPT-5.5 openai-gpt-5.5 | Self-reported | 2026-05-28 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | 76.4% | Qwen3.7 Max qwen-qwen3.7-max | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.6 Max | 75.8% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Self-reported | 2026-05-28 |
| 3 | Qwen3.6 Plus | 74.1% | Qwen3.6 Plus qwen-qwen3.6-plus | Self-reported | 2026-05-28 |
| 4 | DeepSeek V4 Pro Max | 73.6% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Self-reported | 2026-05-28 |
| 5 | GLM-5.1 Thinking | 71.8% | GLM 5.1 z-ai-glm-5.1 | Self-reported | 2026-05-28 |
| 6 | Kimi K2.6 Thinking | 66.6% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Self-reported | 2026-05-28 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Muse Spark | 82.20 | — | Imported | 2026-05-06 |
| 1 | claude-opus-4-7 (max) | 79.10 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 1 | gemini-3.1-pro-preview (high) | 78.20 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 2 | claude-opus-4-6 (max) | 76.80 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 2 | glm-5p1 | 75.60 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 2 | gpt-5.5 (xhigh) | 75.30 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 5 | gpt-5.4 (xhigh) | 70.60 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 5 | gemini-3-pro-preview | 70.30 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 6 | claude-opus-4-5 (high) | 69.80 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 7 | claude-sonnet-4-6 | 69.50 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 7 | gpt-5.2 (xhigh) | 67.60 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 9 | kimi-k2p5 | 64.40 | — | Imported | 2026-05-06 |
| 11 | gemini-3-flash-preview | 62.00 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 12 | claude-sonnet-4-5 (thinking) | 59.50 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 13 | glm-4p7 | 58.10 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 13 | gemini-3.1-flash-lite (high) | 57.10 | — | Imported | 2026-05-06 |
| 13 | gpt-5.4-mini (xhigh) | 56.70 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 18 | gpt-5.1 (high) | 50.10 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 18 | o3-pro | 44.50 | o3 Pro openai-o3-pro | Imported | 2026-05-06 |
| 19 | claude-haiku-4-5 | 40.20 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 79.1% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-23 |
| 2 | Gemini 3.1 Pro Preview | 78.2% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
| 3 | GPT-5.5 | 75.3% | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 4 | GPT-5.4 | 70.6% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 77.3% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-16 |
| 2 | Claude Opus 4.6 | 75.8% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Launch post | 2026-04-16 |
| 3 | Gemini 3.1 Pro Preview | 73.9% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-16 |
| 4 | GPT-5.4 | 68.1% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-16 |