MCP-Universe

Benchmark that evaluates LLMs and agents on tasks served by real-world Model Context Protocol (MCP) servers in six domains: location navigation, repository management, financial analysis, 3D design, browser automation, and web search.

28 rows
Primary metric: overall_success_rate
Sampled: 2026-05-06

Metadata

Metrics

Overall Success Rate, Location Navigation, Repository Management, Financial Analysis, 3D Designing, Browser Automation, Web Searching, Average Evaluator Score, Average Steps (lower is better)
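
The metric list mixes directions: Average Steps is marked as lower-is-better, while the remaining metrics are presumably higher-is-better. A minimal sketch of how a consumer of this data might encode that distinction follows. The snake_case key `average_steps` and the helper `better` are hypothetical names chosen for illustration; only `overall_success_rate` appears on the page.

```python
# Hypothetical metric keys; only "overall_success_rate" is named on the page.
# Assumption: every metric not listed here is higher-is-better.
LOWER_IS_BETTER = {"average_steps"}


def better(metric: str, a: float, b: float) -> bool:
    """Return True if value `a` beats value `b` for the given metric."""
    if metric in LOWER_IS_BETTER:
        return a < b
    return a > b
```

For example, `better("overall_success_rate", 44.16, 43.72)` is true, and so is `better("average_steps", 5.0, 7.0)`.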

Latest Results

Rows are ranked by Overall Success Rate, highest first.

| Rank | Subject | Overall Success Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5-High | 44.16 | GPT-5 / openai-gpt-5 | Imported | 2026-05-06 |
| 2 | GPT-5-Medium | 43.72 | GPT-5 / openai-gpt-5 | Imported | 2026-05-06 |
| 3 | Grok-4 | 33.33 | Grok 4 / x-ai-grok-4 | Imported | 2026-05-06 |
| 4 | Claude-4.0-Sonnet-Thinking | 30.30 | | Imported | 2026-05-06 |
| 5 | Claude-4.1-Opus | 29.44 | | Imported | 2026-05-06 |
| 6 | Claude-4.0-Sonnet | 29.44 | | Imported | 2026-05-06 |
| 7 | Claude-4.0-Opus | 28.14 | | Imported | 2026-05-06 |
| 8 | Grok-4-Fast | 27.27 | Grok 4 Fast / x-ai-grok-4-fast | Imported | 2026-05-06 |
| 9 | Grok-Code-Fast-1 | 26.41 | Grok Code Fast 1 / x-ai-grok-code-fast-1 | Imported | 2026-05-06 |
| 10 | o3-Medium | 26.41 | | Imported | 2026-05-06 |
| 11 | o4-mini-Medium | 25.97 | | Imported | 2026-05-06 |
| 12 | GLM-4.6 | 25.97 | GLM 4.6 / z-ai-glm-4.6 | Imported | 2026-05-06 |
| 13 | GLM-4.5 | 24.68 | GLM 4.5 / z-ai-glm-4.5 | Imported | 2026-05-06 |
| 14 | Claude-3.7-Sonnet | 24.24 | Claude 3.7 Sonnet / anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 15 | Qwen3-Coder-480B-A35B-Instruct | 22.94 | Qwen3 Coder 480B A35B / qwen-qwen3-coder | Imported | 2026-05-06 |
| 16 | Gemini-2.5-Pro | 22.08 | Gemini 2.5 Pro / google-gemini-2.5-pro | Imported | 2026-05-06 |
| 17 | DeepSeek-V3.1 | 22.08 | DeepSeek V3.1 / deepseek-deepseek-chat-v3.1 | Imported | 2026-05-06 |
| 18 | Gemini-2.5-Flash | 21.65 | Gemini 2.5 Flash / google-gemini-2.5-flash | Imported | 2026-05-06 |
| 19 | DeepSeek-V3.1-Terminus | 21.65 | DeepSeek V3.1 Terminus / deepseek-deepseek-v3.1-terminus | Imported | 2026-05-06 |
| 20 | DeepSeek-V3.2-Exp | 19.91 | DeepSeek V3.2 Exp / deepseek-deepseek-v3.2-exp | Imported | 2026-05-06 |
| 21 | Kimi-K2-0905 | 19.91 | MoonshotAI: Kimi K2 0905 / moonshotai-kimi-k2-0905 | Imported | 2026-05-06 |
| 22 | GLM-4.5-Air | 19.48 | GLM 4.5 Air / z-ai-glm-4.5-air | Imported | 2026-05-06 |
| 23 | Kimi-K2-0711 | 19.05 | | Imported | 2026-05-06 |
| 24 | GPT-4.1 | 18.18 | GPT-4.1 / openai-gpt-4.1 | Imported | 2026-05-06 |
| 25 | Qwen3-Max-Preview (Instruct) | 18.18 | | Imported | 2026-05-06 |
| 26 | Qwen3-235B-A22B-Instruct-2507 | 18.18 | Qwen3 235B A22B Instruct 2507 / qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-06 |
| 27 | GPT-4o-2024-08-06 | 15.58 | GPT-4o (2024-08-06) / openai-gpt-4o-2024-08-06 | Imported | 2026-05-06 |
| 28 | DeepSeek-V3 | 14.29 | DeepSeek V3 / deepseek-deepseek-chat | Imported | 2026-05-06 |
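
The listing assigns consecutive ranks even when scores tie (ranks 5 and 6 both show 29.44, for instance). The page does not say how ties are broken, so the sketch below simply assumes input order is preserved among equal scores; the `rank_rows` helper and the four-row sample are illustrative, not the site's actual implementation.

```python
# Sample rows copied from the listing: (subject, overall success rate).
rows = [
    ("GPT-5-High", 44.16),
    ("Claude-4.1-Opus", 29.44),
    ("Claude-4.0-Sonnet", 29.44),
    ("Grok-4", 33.33),
]


def rank_rows(rows):
    """Sort by score, highest first, and assign consecutive 1-based ranks.

    sorted() is stable (also with reverse=True), so subjects with equal
    scores keep their input order. That tie-break is an assumption.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


ranked = rank_rows(rows)
for position, name, score in ranked:
    print(position, name, score)
```

With this sample, the two 29.44 entries land at positions 3 and 4 in the order they were supplied.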