Natural Language to Mongosh

MongoDB text-to-query benchmark that evaluates how well models generate mongosh queries from natural language, reporting execution, output, normalization, latency, and token metrics.

Rows: 106
Primary metric: XMaNeR
Sampled: 2026-05-06

Metadata

Metrics

XMaNeR, NeXMaNeR, XNeR, CorrectOutputFuzzy, NonEmptyOutput, NormalizedExecutionTimeNonEmpty, ReasonableOutput, SuccessfulExecution, Duration (lower is better), LLM Duration (lower is better), Prompt Tokens (lower is better), Completion Tokens (lower is better), Total Tokens (lower is better)
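The list mixes two directions: accuracy-style scores, where higher is better, and latency or token figures marked "lower is better". Below is a minimal Python sketch of a per-metric comparison that respects that split. The helper names and the two example runs are hypothetical, and any metric not marked "lower is better" on this page is assumed to be higher-is-better.

```python
from typing import Dict

# Metrics annotated "(lower is better)" on this page. Every other metric is
# ASSUMED to be higher-is-better for this sketch.
LOWER_IS_BETTER = {
    "Duration",
    "LLM Duration",
    "Prompt Tokens",
    "Completion Tokens",
    "Total Tokens",
}


def better(metric: str, a: float, b: float) -> bool:
    """Return True if value a is strictly better than value b for this metric."""
    if metric in LOWER_IS_BETTER:
        return a < b
    return a > b


def wins(run_a: Dict[str, float], run_b: Dict[str, float]) -> Dict[str, bool]:
    """Per-metric result of run_a vs run_b, over the metrics both runs report."""
    shared = run_a.keys() & run_b.keys()
    return {m: better(m, run_a[m], run_b[m]) for m in sorted(shared)}


# Hypothetical runs with made-up token counts, for illustration only.
run_a = {"XMaNeR": 0.86, "Total Tokens": 1200.0}
run_b = {"XMaNeR": 0.84, "Total Tokens": 900.0}
print(wins(run_a, run_b))  # → {'Total Tokens': False, 'XMaNeR': True}
```

Keeping the direction in one set means a single comparison function works for every column; a run can win on XMaNeR while losing on cost, as the example shows.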

Latest Results

Rows are parsed from the public Hugging Face experiment-results CSV. Each experiment variant in the source is kept as its own row, and rows are ranked by XMaNeR, the primary score.
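As a rough illustration of the ranking step described above, here is a minimal Python sketch that reads CSV text, keeps every variant as a separate row, and orders rows by XMaNeR, highest first. The column names `model`, `variant` and `xmaner` are hypothetical placeholders, since the real file's schema is not given here; the three sample rows reuse scores from the table.

```python
import csv
import io
from typing import List, Tuple

# HYPOTHETICAL column names; the actual experiment-results CSV may differ.
SAMPLE_CSV = """model,variant,xmaner
gpt-4o,promptCompletion / default / annotated,0.86
claude-37-sonnet,agentic,0.90
gemini-2-flash,agentic,0.87
"""


def rank_by_xmaner(csv_text: str) -> List[Tuple[int, str, float]]:
    """Rank every variant row by XMaNeR, highest first, without merging variants."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # sorted() is stable, so rows with equal scores keep their CSV order.
    rows = sorted(rows, key=lambda r: float(r["xmaner"]), reverse=True)
    # enumerate() gives sequential ranks (1, 2, 3, ...) with no shared ranks for ties.
    return [
        (rank, f'{r["model"]} ({r["variant"]})', float(r["xmaner"]))
        for rank, r in enumerate(rows, start=1)
    ]


for rank, subject, score in rank_by_xmaner(SAMPLE_CSV):
    print(rank, subject, f"{score:.2f}")
```

Sequential ranks match how the table numbers rows that display the same rounded score (for example, ranks 4 and 5 both show 0.87).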

Rank | Subject | XMaNeR | Model Match | Provenance | Sampled
1 | claude-37-sonnet (agentic) | 0.90 | - | Imported | 2026-05-06
2 | claude-37-sonnet (promptCompletion / chainOfThought / annotated) | 0.89 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
3 | claude-37-sonnet (toolCall / default / annotated) | 0.88 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
4 | claude-37-sonnet (toolCall / chainOfThought / annotated) | 0.87 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
5 | claude-37-sonnet (toolCall / chainOfThought / annotated) | 0.87 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
6 | claude-37-sonnet (promptCompletion / chainOfThought / interpreted) | 0.87 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
7 | gemini-2-flash (agentic) | 0.87 | - | Imported | 2026-05-06
8 | claude-37-sonnet (promptCompletion / default / annotated) | 0.86 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
9 | claude-37-sonnet (toolCall / default / annotated) | 0.86 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
10 | gpt-4o (promptCompletion / chainOfThought / annotated) | 0.86 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
11 | claude-35-haiku (agentic) | 0.86 | - | Imported | 2026-05-06
12 | gpt-4o (agentic) | 0.86 | - | Imported | 2026-05-06
13 | gpt-4o (promptCompletion / default / annotated) | 0.86 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
14 | gpt-4o (promptCompletion / lazy / annotated) | 0.86 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
15 | claude-37-sonnet (promptCompletion / lazy / annotated) | 0.86 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
16 | claude-37-sonnet (promptCompletion / default / interpreted) | 0.86 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
17 | gemini-2-flash (promptCompletion / default / annotated) | 0.86 | - | Imported | 2026-05-06
18 | gemini-2-flash (promptCompletion / lazy / annotated) | 0.85 | - | Imported | 2026-05-06
19 | o3-mini (agentic) | 0.85 | - | Imported | 2026-05-06
20 | gemini-2-flash (promptCompletion / chainOfThought / annotated) | 0.85 | - | Imported | 2026-05-06
21 | o3-mini (promptCompletion / default / annotated) | 0.85 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
22 | claude-37-sonnet (promptCompletion / lazy / none) | 0.85 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
23 | o3-mini (toolCall / chainOfThought / annotated) | 0.85 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
24 | gemini-2.5-pro-preview-03-25 (promptCompletion / default / annotated) | 0.84 | Gemini 2.5 Pro Preview 05-06 (google-gemini-2.5-pro-preview-05-06) | Imported | 2026-05-06
25 | gpt-4o-mini (agentic) | 0.84 | - | Imported | 2026-05-06
26 | gemini-2-flash (toolCall / default / annotated) | 0.84 | - | Imported | 2026-05-06
27 | o3-mini (promptCompletion / chainOfThought / annotated) | 0.84 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
28 | claude-37-sonnet (promptCompletion / lazy / interpreted) | 0.84 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
29 | claude-35-haiku (toolCall / chainOfThought / annotated) | 0.84 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
30 | claude-35-haiku (toolCall / default / annotated) | 0.84 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
31 | o3-mini (toolCall / default / annotated) | 0.84 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
32 | o3-mini (promptCompletion / lazy / annotated) | 0.84 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
33 | o3-mini (toolCall / chainOfThought / annotated) | 0.84 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
34 | o3-mini (toolCall / default / annotated) | 0.84 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
35 | gpt-4o (toolCall / chainOfThought / annotated) | 0.83 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
36 | gemini-2-flash (toolCall / default / annotated) | 0.83 | - | Imported | 2026-05-06
37 | claude-35-haiku (promptCompletion / default / annotated) | 0.83 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
38 | gemini-2-flash (toolCall / chainOfThought / annotated) | 0.83 | - | Imported | 2026-05-06
39 | gpt-4o-mini (promptCompletion / lazy / annotated) | 0.83 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
40 | gemini-2-flash (toolCall / chainOfThought / annotated) | 0.83 | - | Imported | 2026-05-06
41 | gpt-4o (toolCall / default / annotated) | 0.83 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
42 | gpt-4o-mini (promptCompletion / chainOfThought / annotated) | 0.83 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
43 | nova-pro-v1:0 (promptCompletion / default / annotated) | 0.83 | - | Imported | 2026-05-06
44 | nova-pro-v1:0 (promptCompletion / lazy / annotated) | 0.82 | - | Imported | 2026-05-06
45 | llama-3.3-70b (promptCompletion / lazy / annotated) | 0.82 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
46 | llama-3.3-70b (promptCompletion / default / annotated) | 0.82 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
47 | gpt-4o (promptCompletion / lazy / none) | 0.82 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
48 | claude-35-haiku (promptCompletion / lazy / annotated) | 0.82 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
49 | claude-35-haiku (toolCall / chainOfThought / annotated) | 0.82 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
50 | claude-35-haiku (promptCompletion / lazy / none) | 0.82 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
51 | o3-mini (promptCompletion / lazy / none) | 0.82 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
52 | gpt-4o (toolCall / default / annotated) | 0.82 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
53 | claude-35-haiku (toolCall / default / annotated) | 0.81 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
54 | nova-pro-v1:0 (agentic) | 0.81 | - | Imported | 2026-05-06
55 | gemini-2-flash (promptCompletion / lazy / none) | 0.81 | - | Imported | 2026-05-06
56 | gpt-4o-mini (promptCompletion / chainOfThought / annotated) | 0.81 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
57 | gpt-4o-mini (promptCompletion / default / annotated) | 0.81 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
58 | claude-35-haiku (promptCompletion / lazy / interpreted) | 0.81 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
59 | nova-pro-v1:0 (promptCompletion / lazy / none) | 0.81 | - | Imported | 2026-05-06
60 | nova-pro-v1:0 (promptCompletion / chainOfThought / annotated) | 0.80 | - | Imported | 2026-05-06
61 | o3-mini (promptCompletion / lazy / interpreted) | 0.80 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
62 | gpt-4o-mini (promptCompletion / lazy / annotated) | 0.80 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
63 | gpt-4o (promptCompletion / default / interpreted) | 0.80 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
64 | gemini-2-flash (promptCompletion / lazy / interpreted) | 0.80 | - | Imported | 2026-05-06
65 | o3-mini (promptCompletion / chainOfThought / interpreted) | 0.80 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
66 | o3-mini (promptCompletion / default / interpreted) | 0.80 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
67 | llama-3.3-70b (promptCompletion / chainOfThought / annotated) | 0.80 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
68 | gemini-2-flash (promptCompletion / default / interpreted) | 0.79 | - | Imported | 2026-05-06
69 | mistral-large-2 (promptCompletion / lazy / annotated) | 0.79 | - | Imported | 2026-05-06
70 | claude-35-haiku (promptCompletion / chainOfThought / annotated) | 0.79 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
71 | gpt-4o (toolCall / chainOfThought / annotated) | 0.79 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
72 | gpt-4o-mini (promptCompletion / lazy / none) | 0.79 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
73 | gpt-4o (promptCompletion / chainOfThought / interpreted) | 0.79 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
74 | gpt-4o-mini (promptCompletion / default / annotated) | 0.79 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
75 | nova-pro-v1:0 (toolCall / default / annotated) | 0.79 | - | Imported | 2026-05-06
76 | llama-3.3-70b (promptCompletion / lazy / none) | 0.79 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
77 | claude-35-haiku (promptCompletion / default / interpreted) | 0.79 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
78 | gpt-4o-mini (toolCall / chainOfThought / annotated) | 0.79 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
79 | gpt-4o (promptCompletion / lazy / interpreted) | 0.78 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
80 | llama-3.3-70b (promptCompletion / default / interpreted) | 0.78 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
81 | gpt-4o-mini (promptCompletion / lazy / none) | 0.78 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
82 | nova-pro-v1:0 (toolCall / default / annotated) | 0.78 | - | Imported | 2026-05-06
83 | gpt-4o-mini (toolCall / default / annotated) | 0.78 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
84 | gemini-2-flash (promptCompletion / chainOfThought / interpreted) | 0.78 | - | Imported | 2026-05-06
85 | nova-pro-v1:0 (toolCall / chainOfThought / annotated) | 0.78 | - | Imported | 2026-05-06
86 | mistral-large-2 (promptCompletion / default / annotated) | 0.77 | - | Imported | 2026-05-06
87 | gpt-4o-mini (promptCompletion / chainOfThought / interpreted) | 0.77 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
88 | gpt-4o-mini (promptCompletion / chainOfThought / interpreted) | 0.77 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
89 | nova-pro-v1:0 (promptCompletion / default / interpreted) | 0.77 | - | Imported | 2026-05-06
90 | gpt-4o-mini (toolCall / default / annotated) | 0.77 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
91 | mistral-large-2 (promptCompletion / lazy / none) | 0.77 | - | Imported | 2026-05-06
92 | llama-3.3-70b (promptCompletion / chainOfThought / interpreted) | 0.76 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
93 | gpt-4o-mini (toolCall / chainOfThought / annotated) | 0.75 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
94 | gpt-4o-mini (promptCompletion / lazy / interpreted) | 0.75 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
95 | claude-35-haiku (promptCompletion / chainOfThought / interpreted) | 0.75 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06
96 | gpt-4o-mini (promptCompletion / default / interpreted) | 0.74 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
97 | gpt-4o-mini (promptCompletion / default / interpreted) | 0.74 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
98 | nova-pro-v1:0 (toolCall / chainOfThought / annotated) | 0.73 | - | Imported | 2026-05-06
99 | llama-3.3-70b (promptCompletion / lazy / interpreted) | 0.73 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06
100 | gpt-4o-mini (promptCompletion / lazy / interpreted) | 0.73 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
101 | nova-pro-v1:0 (promptCompletion / chainOfThought / interpreted) | 0.72 | - | Imported | 2026-05-06
102 | mistral-large-2 (promptCompletion / default / interpreted) | 0.72 | - | Imported | 2026-05-06
103 | mistral-large-2 (promptCompletion / lazy / interpreted) | 0.71 | - | Imported | 2026-05-06
104 | nova-pro-v1:0 (promptCompletion / lazy / interpreted) | 0.70 | - | Imported | 2026-05-06
105 | mistral-large-2 (promptCompletion / chainOfThought / interpreted) | 0.62 | - | Imported | 2026-05-06
106 | mistral-large-2 (promptCompletion / chainOfThought / annotated) | 0.61 | - | Imported | 2026-05-06