Natural Language to Mongosh
MongoDB text-to-query benchmark that evaluates generation of mongosh queries from natural-language prompts, reporting execution, output, normalization, latency, and token metrics.
Rows: 106
Primary metric: XMaNeR
Sampled: 2026-05-06
Metadata
Metrics
XMaNeR, NeXMaNeR, XNeR, CorrectOutputFuzzy, NonEmptyOutput, NormalizedExecutionTimeNonEmpty, ReasonableOutput, SuccessfulExecution, Duration (lower is better), LLM Duration (lower is better), Prompt Tokens (lower is better), Completion Tokens (lower is better), Total Tokens (lower is better)
| Rank | Subject | XMaNeR | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-37-sonnet (agentic) | 0.90 | — | Imported | 2026-05-06 |
| 2 | claude-37-sonnet (promptCompletion / chainOfThought / annotated) | 0.89 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 3 | claude-37-sonnet (toolCall / default / annotated) | 0.88 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 4 | claude-37-sonnet (toolCall / chainOfThought / annotated) | 0.87 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 5 | claude-37-sonnet (toolCall / chainOfThought / annotated) | 0.87 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 6 | claude-37-sonnet (promptCompletion / chainOfThought / interpreted) | 0.87 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 7 | gemini-2-flash (agentic) | 0.87 | — | Imported | 2026-05-06 |
| 8 | claude-37-sonnet (promptCompletion / default / annotated) | 0.86 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 9 | claude-37-sonnet (toolCall / default / annotated) | 0.86 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 10 | gpt-4o (promptCompletion / chainOfThought / annotated) | 0.86 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 11 | claude-35-haiku (agentic) | 0.86 | — | Imported | 2026-05-06 |
| 12 | gpt-4o (agentic) | 0.86 | — | Imported | 2026-05-06 |
| 13 | gpt-4o (promptCompletion / default / annotated) | 0.86 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 14 | gpt-4o (promptCompletion / lazy / annotated) | 0.86 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 15 | claude-37-sonnet (promptCompletion / lazy / annotated) | 0.86 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 16 | claude-37-sonnet (promptCompletion / default / interpreted) | 0.86 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 17 | gemini-2-flash (promptCompletion / default / annotated) | 0.86 | — | Imported | 2026-05-06 |
| 18 | gemini-2-flash (promptCompletion / lazy / annotated) | 0.85 | — | Imported | 2026-05-06 |
| 19 | o3-mini (agentic) | 0.85 | — | Imported | 2026-05-06 |
| 20 | gemini-2-flash (promptCompletion / chainOfThought / annotated) | 0.85 | — | Imported | 2026-05-06 |
| 21 | o3-mini (promptCompletion / default / annotated) | 0.85 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 22 | claude-37-sonnet (promptCompletion / lazy / none) | 0.85 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 23 | o3-mini (toolCall / chainOfThought / annotated) | 0.85 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 24 | gemini-2.5-pro-preview-03-25 (promptCompletion / default / annotated) | 0.84 | Gemini 2.5 Pro Preview 05-06 google-gemini-2.5-pro-preview-05-06 | Imported | 2026-05-06 |
| 25 | gpt-4o-mini (agentic) | 0.84 | — | Imported | 2026-05-06 |
| 26 | gemini-2-flash (toolCall / default / annotated) | 0.84 | — | Imported | 2026-05-06 |
| 27 | o3-mini (promptCompletion / chainOfThought / annotated) | 0.84 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 28 | claude-37-sonnet (promptCompletion / lazy / interpreted) | 0.84 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 29 | claude-35-haiku (toolCall / chainOfThought / annotated) | 0.84 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 30 | claude-35-haiku (toolCall / default / annotated) | 0.84 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 31 | o3-mini (toolCall / default / annotated) | 0.84 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 32 | o3-mini (promptCompletion / lazy / annotated) | 0.84 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 33 | o3-mini (toolCall / chainOfThought / annotated) | 0.84 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 34 | o3-mini (toolCall / default / annotated) | 0.84 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 35 | gpt-4o (toolCall / chainOfThought / annotated) | 0.83 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 36 | gemini-2-flash (toolCall / default / annotated) | 0.83 | — | Imported | 2026-05-06 |
| 37 | claude-35-haiku (promptCompletion / default / annotated) | 0.83 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 38 | gemini-2-flash (toolCall / chainOfThought / annotated) | 0.83 | — | Imported | 2026-05-06 |
| 39 | gpt-4o-mini (promptCompletion / lazy / annotated) | 0.83 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 40 | gemini-2-flash (toolCall / chainOfThought / annotated) | 0.83 | — | Imported | 2026-05-06 |
| 41 | gpt-4o (toolCall / default / annotated) | 0.83 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 42 | gpt-4o-mini (promptCompletion / chainOfThought / annotated) | 0.83 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 43 | nova-pro-v1:0 (promptCompletion / default / annotated) | 0.83 | — | Imported | 2026-05-06 |
| 44 | nova-pro-v1:0 (promptCompletion / lazy / annotated) | 0.82 | — | Imported | 2026-05-06 |
| 45 | llama-3.3-70b (promptCompletion / lazy / annotated) | 0.82 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 46 | llama-3.3-70b (promptCompletion / default / annotated) | 0.82 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 47 | gpt-4o (promptCompletion / lazy / none) | 0.82 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 48 | claude-35-haiku (promptCompletion / lazy / annotated) | 0.82 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 49 | claude-35-haiku (toolCall / chainOfThought / annotated) | 0.82 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 50 | claude-35-haiku (promptCompletion / lazy / none) | 0.82 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 51 | o3-mini (promptCompletion / lazy / none) | 0.82 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 52 | gpt-4o (toolCall / default / annotated) | 0.82 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 53 | claude-35-haiku (toolCall / default / annotated) | 0.81 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 54 | nova-pro-v1:0 (agentic) | 0.81 | — | Imported | 2026-05-06 |
| 55 | gemini-2-flash (promptCompletion / lazy / none) | 0.81 | — | Imported | 2026-05-06 |
| 56 | gpt-4o-mini (promptCompletion / chainOfThought / annotated) | 0.81 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 57 | gpt-4o-mini (promptCompletion / default / annotated) | 0.81 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 58 | claude-35-haiku (promptCompletion / lazy / interpreted) | 0.81 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 59 | nova-pro-v1:0 (promptCompletion / lazy / none) | 0.81 | — | Imported | 2026-05-06 |
| 60 | nova-pro-v1:0 (promptCompletion / chainOfThought / annotated) | 0.80 | — | Imported | 2026-05-06 |
| 61 | o3-mini (promptCompletion / lazy / interpreted) | 0.80 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 62 | gpt-4o-mini (promptCompletion / lazy / annotated) | 0.80 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 63 | gpt-4o (promptCompletion / default / interpreted) | 0.80 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 64 | gemini-2-flash (promptCompletion / lazy / interpreted) | 0.80 | — | Imported | 2026-05-06 |
| 65 | o3-mini (promptCompletion / chainOfThought / interpreted) | 0.80 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 66 | o3-mini (promptCompletion / default / interpreted) | 0.80 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 67 | llama-3.3-70b (promptCompletion / chainOfThought / annotated) | 0.80 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 68 | gemini-2-flash (promptCompletion / default / interpreted) | 0.79 | — | Imported | 2026-05-06 |
| 69 | mistral-large-2 (promptCompletion / lazy / annotated) | 0.79 | — | Imported | 2026-05-06 |
| 70 | claude-35-haiku (promptCompletion / chainOfThought / annotated) | 0.79 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 71 | gpt-4o (toolCall / chainOfThought / annotated) | 0.79 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 72 | gpt-4o-mini (promptCompletion / lazy / none) | 0.79 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 73 | gpt-4o (promptCompletion / chainOfThought / interpreted) | 0.79 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 74 | gpt-4o-mini (promptCompletion / default / annotated) | 0.79 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 75 | nova-pro-v1:0 (toolCall / default / annotated) | 0.79 | — | Imported | 2026-05-06 |
| 76 | llama-3.3-70b (promptCompletion / lazy / none) | 0.79 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 77 | claude-35-haiku (promptCompletion / default / interpreted) | 0.79 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 78 | gpt-4o-mini (toolCall / chainOfThought / annotated) | 0.79 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 79 | gpt-4o (promptCompletion / lazy / interpreted) | 0.78 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 80 | llama-3.3-70b (promptCompletion / default / interpreted) | 0.78 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 81 | gpt-4o-mini (promptCompletion / lazy / none) | 0.78 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 82 | nova-pro-v1:0 (toolCall / default / annotated) | 0.78 | — | Imported | 2026-05-06 |
| 83 | gpt-4o-mini (toolCall / default / annotated) | 0.78 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 84 | gemini-2-flash (promptCompletion / chainOfThought / interpreted) | 0.78 | — | Imported | 2026-05-06 |
| 85 | nova-pro-v1:0 (toolCall / chainOfThought / annotated) | 0.78 | — | Imported | 2026-05-06 |
| 86 | mistral-large-2 (promptCompletion / default / annotated) | 0.77 | — | Imported | 2026-05-06 |
| 87 | gpt-4o-mini (promptCompletion / chainOfThought / interpreted) | 0.77 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 88 | gpt-4o-mini (promptCompletion / chainOfThought / interpreted) | 0.77 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 89 | nova-pro-v1:0 (promptCompletion / default / interpreted) | 0.77 | — | Imported | 2026-05-06 |
| 90 | gpt-4o-mini (toolCall / default / annotated) | 0.77 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 91 | mistral-large-2 (promptCompletion / lazy / none) | 0.77 | — | Imported | 2026-05-06 |
| 92 | llama-3.3-70b (promptCompletion / chainOfThought / interpreted) | 0.76 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 93 | gpt-4o-mini (toolCall / chainOfThought / annotated) | 0.75 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 94 | gpt-4o-mini (promptCompletion / lazy / interpreted) | 0.75 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 95 | claude-35-haiku (promptCompletion / chainOfThought / interpreted) | 0.75 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-06 |
| 96 | gpt-4o-mini (promptCompletion / default / interpreted) | 0.74 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 97 | gpt-4o-mini (promptCompletion / default / interpreted) | 0.74 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 98 | nova-pro-v1:0 (toolCall / chainOfThought / annotated) | 0.73 | — | Imported | 2026-05-06 |
| 99 | llama-3.3-70b (promptCompletion / lazy / interpreted) | 0.73 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 100 | gpt-4o-mini (promptCompletion / lazy / interpreted) | 0.73 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 101 | nova-pro-v1:0 (promptCompletion / chainOfThought / interpreted) | 0.72 | — | Imported | 2026-05-06 |
| 102 | mistral-large-2 (promptCompletion / default / interpreted) | 0.72 | — | Imported | 2026-05-06 |
| 103 | mistral-large-2 (promptCompletion / lazy / interpreted) | 0.71 | — | Imported | 2026-05-06 |
| 104 | nova-pro-v1:0 (promptCompletion / lazy / interpreted) | 0.70 | — | Imported | 2026-05-06 |
| 105 | mistral-large-2 (promptCompletion / chainOfThought / interpreted) | 0.62 | — | Imported | 2026-05-06 |
| 106 | mistral-large-2 (promptCompletion / chainOfThought / annotated) | 0.61 | — | Imported | 2026-05-06 |