MCPMark

MCP-based agent benchmark evaluating language-model agents across real-world tool environments including filesystem, GitHub, Notion, browser automation, and Postgres tasks.

45 rows
pass_at_1_avg primary metric
2026-05-28 sampled

Metadata

Metrics

Pass@1, Pass@1 std (lower is better), Pass@4, Pass^4, Avg agent execution time (lower is better), Avg turns (lower is better), Avg input tokens (lower is better), Avg output tokens (lower is better), Avg total tokens (lower is better), Per-run cost (lower is better), Filesystem Pass@1, Github Pass@1, Notion Pass@1, Playwright Pass@1, Postgres Pass@1
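
The three pass metrics listed above can be illustrated with a short sketch. It assumes the commonly used definitions, which this page does not spell out: Pass@1 is the per-run success rate averaged over runs, Pass@4 counts a task as solved if any of 4 runs succeeds, and Pass^4 counts it only if all 4 runs succeed. The function name `pass_metrics`, the output keys other than `pass_at_1_avg`, and the example data are hypothetical.

```python
def pass_metrics(outcomes: dict[str, list[bool]]) -> dict[str, float]:
    """Compute pass metrics from per-task run outcomes.

    `outcomes` maps a task id to the pass/fail result of each independent run.
    Definitions are assumptions, not taken from the leaderboard page.
    """
    runs_per_task = {len(runs) for runs in outcomes.values()}
    if len(runs_per_task) != 1:
        raise ValueError("every task needs the same number of runs")
    k = runs_per_task.pop()
    n_tasks = len(outcomes)

    # Success rate of each run taken on its own, then averaged across runs.
    per_run = [sum(runs[i] for runs in outcomes.values()) / n_tasks for i in range(k)]

    return {
        "pass_at_1_avg": sum(per_run) / k,
        # Solved in at least one of the k runs.
        f"pass_at_{k}": sum(any(runs) for runs in outcomes.values()) / n_tasks,
        # Solved in every one of the k runs.
        f"pass_pow_{k}": sum(all(runs) for runs in outcomes.values()) / n_tasks,
    }


# Made-up outcomes for four tasks, four runs each.
example = {
    "task_a": [True, True, True, True],
    "task_b": [True, False, False, True],
    "task_c": [False, False, False, False],
    "task_d": [False, True, False, False],
}
metrics = pass_metrics(example)
print(metrics)
```

Under these definitions Pass^4 can never exceed Pass@1, and Pass@1 can never exceed Pass@4, so the gap between Pass@4 and Pass^4 gives a rough sense of how consistent an agent is across runs.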

Showing 2 latest source slices.

Latest Results

Provider-published Qwen3.7-Max comparison scores. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Rank | Subject | Pass@1 | Model Match | Provenance | Sampled
1 | Qwen3.7 Max | 60.8% | Qwen3.7 Max (qwen-qwen3.7-max) | Self-reported | 2026-05-28
2 | GLM-5.1 Thinking | 57.5% | GLM 5.1 (z-ai-glm-5.1) | Self-reported | 2026-05-28
3 | DeepSeek V4 Pro Max | 57.1% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Self-reported | 2026-05-28
4 | Claude Opus 4.6 Max | 56.7% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Self-reported | 2026-05-28
5 | Kimi K2.6 Thinking | 55.9% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Self-reported | 2026-05-28
6 | Qwen3.6 Plus | 48.2% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Self-reported | 2026-05-28

Rank | Subject | Pass@1 | Model Match | Provenance | Sampled
1 | gpt-5-2-high | 0.57 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
2 | gemini-3-pro-high | 0.54 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
3 | gpt-5-medium | 0.53 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
4 | gpt-5-high | 0.52 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
5 | gemini-3-pro-low | 0.51 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
6 | gpt-5-low | 0.47 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
7 | claude-opus-4-5-high | 0.42 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
8 | deepseek-v3-2-thinking | 0.37 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06
9 | claude-sonnet-4-5 | 0.32 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
10 | grok-4 | 0.32 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-06
11 | gpt-5-mini-high | 0.30 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06
12 | claude-opus-4-1 | 0.30 | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-06
13 | deepseek-v3-2-chat | 0.30 | | Imported | 2026-05-06
14 | claude-sonnet-4-high | 0.28 | | Imported | 2026-05-06
15 | claude-sonnet-4 | 0.28 | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-06
16 | claude-sonnet-4-low | 0.27 | | Imported | 2026-05-06
17 | gpt-5-mini-medium | 0.27 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06
18 | o3 | 0.25 | o3 (openai-o3) | Imported | 2026-05-06
19 | qwen-3-coder-plus | 0.25 | Qwen3 Coder Plus (qwen-qwen3-coder-plus) | Imported | 2026-05-06
20 | grok-4-fast | 0.24 | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-06
21 | kimi-k2-0905 | 0.22 | MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Imported | 2026-05-06
22 | deepseek-v3-1-terminus-thinking | 0.21 | | Imported | 2026-05-06
23 | grok-code-fast-1 | 0.20 | Grok Code Fast 1 (x-ai-grok-code-fast-1) | Imported | 2026-05-06
24 | kimi-k2-0711 | 0.19 | | Imported | 2026-05-06
25 | qwen-3-max | 0.18 | Qwen3 Max (qwen-qwen3-max) | Imported | 2026-05-06
26 | o4-mini | 0.17 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06
27 | deepseek-chat | 0.17 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06
28 | deepseek-v3-1-terminus | 0.17 | DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus) | Imported | 2026-05-06
29 | gemini-2-5-pro | 0.16 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06
30 | glm-4-5 | 0.16 | GLM 4.5 (z-ai-glm-4.5) | Imported | 2026-05-06
31 | gemini-2-5-flash | 0.09 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06
32 | gpt-5-mini-low | 0.08 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06
33 | gpt-4-1 | 0.08 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06
34 | gpt-5-nano-medium | 0.06 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06
35 | gpt-5-nano-high | 0.05 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06
36 | gpt-oss-120b | 0.05 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06
37 | gpt-5-nano-low | 0.04 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06
38 | gpt-4-1-mini | 0.04 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06
39 | gpt-4-1-nano | 0 | GPT-4.1 Nano (openai-gpt-4.1-nano) | Imported | 2026-05-06
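
The two source slices report Pass@1 on different scales: the self-reported rows use percentages (60.8%) and the imported rows use fractions (0.57). A minimal sketch of putting both on one scale is shown below. It assumes that any value without a percent sign is already a fraction of 1; the helper name `to_fraction` and the tuple layout are hypothetical, and the three rows are copied from the results above.

```python
def to_fraction(score: str) -> float:
    """Convert a Pass@1 string such as '60.8%' or '0.57' to a fraction in [0, 1].

    Assumption: values without a trailing '%' are already fractions.
    """
    text = score.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


# (subject, raw Pass@1 as printed, provenance)
rows = [
    ("Qwen3.7 Max", "60.8%", "Self-reported"),
    ("gpt-5-2-high", "0.57", "Imported"),
    ("GLM-5.1 Thinking", "57.5%", "Self-reported"),
]

# Keep provenance next to each score so the two slices stay distinguishable.
normalized = sorted(
    ((name, to_fraction(raw), provenance) for name, raw, provenance in rows),
    key=lambda row: row[1],
    reverse=True,
)

for name, score, provenance in normalized:
    print(f"{name:<18} {score:.3f}  {provenance}")
```

Sorting the merged rows is only a display convenience. The slices differ in provenance and sampling date, so a combined ordering should not be read as a like-for-like ranking.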