CRMArena
CRMArena evaluates LLM agents on realistic customer relationship management tasks in a simulated Salesforce CRM organization across service agent, analyst, and manager personas.
24 rows
Primary metric: Overall
Sampled: 2026-05-06
Metadata
Metrics
Overall, New Case Routing, Handle Time Understanding, Transfer Count Understanding, Name Entity Disambiguation, Policy Violation Identification, Knowledge Question Answering, Top Issue Identification, Monthly Trend Analysis, Best Region Identification
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o1 (Function Calling) | 64.30 | — | Imported | 2026-05-06 |
| 2 | o1 (ReAct) | 57.70 | — | Imported | 2026-05-06 |
| 3 | gpt-4o (Function Calling) | 54.40 | — | Imported | 2026-05-06 |
| 4 | llama3.1-405b (Function Calling) | 51.30 | — | Imported | 2026-05-06 |
| 5 | claude-3.5-sonnet (Function Calling) | 41.80 | — | Imported | 2026-05-06 |
| 6 | llama3.1-70b (Function Calling) | 41.10 | — | Imported | 2026-05-06 |
| 7 | gpt-4o (ReAct) | 38.20 | — | Imported | 2026-05-06 |
| 8 | claude-3.5-sonnet (Act) | 37.40 | — | Imported | 2026-05-06 |
| 9 | deepseek-r1 (ReAct) | 35.10 | — | Imported | 2026-05-06 |
| 10 | claude-3.5-sonnet (ReAct) | 34.30 | — | Imported | 2026-05-06 |
| 11 | llama3.1-405b (ReAct) | 33.80 | — | Imported | 2026-05-06 |
| 12 | gpt-4o (Act) | 29.40 | — | Imported | 2026-05-06 |
| 13 | gpt-4o-mini (ReAct) | 28.30 | — | Imported | 2026-05-06 |
| 14 | llama3.1-70b (ReAct) | 27.80 | — | Imported | 2026-05-06 |
| 15 | llama3.1-405b (Act) | 22.20 | — | Imported | 2026-05-06 |
| 16 | gpt-4o-mini (Function Calling) | 19.50 | — | Imported | 2026-05-06 |
| 17 | llama3.1-70b (Act) | 18.60 | — | Imported | 2026-05-06 |
| 18 | claude-3-sonnet (ReAct) | 17.30 | — | Imported | 2026-05-06 |
| 19 | gpt-4o-mini (Act) | 16.70 | — | Imported | 2026-05-06 |
| 20 | claude-3-sonnet (Act) | 16.60 | — | Imported | 2026-05-06 |
| 21 | claude-3-sonnet (Function Calling) | 15.10 | — | Imported | 2026-05-06 |
| 22 | deepseek-r1 (Function Calling) | 9.00 | — | Imported | 2026-05-06 |
| 23 | llama3.1-8b (ReAct) | 3.10 | — | Imported | 2026-05-06 |
| 24 | llama3.1-8b (Function Calling) | 0.00 | — | Imported | 2026-05-06 |