TextClass Benchmark
TextClass Benchmark evaluates LLMs and transformers for social-science text classification across multiple domains and languages, reporting domain-specific Elo leaderboards and a weighted Meta-Elo aggregate.
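This page does not say how the domain-specific Elo ratings or the Meta-Elo aggregate are computed. Below is a minimal sketch under stated assumptions, suggested by the metrics listed for this leaderboard (Weighted F1, Cycles):

- Within a domain, every pair of models plays one "match" per task cycle.
- The model with the higher weighted F1 wins, and equal scores count as a draw.
- Ratings start at 1500 and move by the standard Elo update with K = 32.
- Meta-Elo is a weighted mean of each model's per-domain ratings, using caller-supplied weights.

The function names `run_cycle` and `meta_elo`, the K value, the starting rating and the weighting are all illustrative assumptions. They are not the benchmark's published method.

```python
"""Hypothetical sketch: Elo ratings from pairwise weighted-F1 matches, plus a weighted Meta-Elo.

Assumed (not taken from the benchmark): one match per model pair per cycle,
higher weighted F1 wins, ties are draws, K = 32, and Meta-Elo is a weighted
mean of per-domain ratings with caller-supplied weights.
"""
from itertools import combinations


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_cycle(ratings: dict[str, float], f1: dict[str, float], k: float = 32.0) -> dict[str, float]:
    """Play every pair of models in `f1` once and return updated ratings.

    Every key in `f1` must already have an entry in `ratings`. The input dict
    is not mutated. Updates are applied sequentially in a fixed (sorted) pair
    order, so the result is deterministic.
    """
    new = {m: float(r) for m, r in ratings.items()}
    for a, b in combinations(sorted(f1), 2):
        # Actual score for A: 1 = win, 0 = loss, 0.5 = draw, decided by weighted F1.
        s_a = 1.0 if f1[a] > f1[b] else 0.0 if f1[a] < f1[b] else 0.5
        e_a = expected_score(new[a], new[b])
        new[a] += k * (s_a - e_a)
        # B's change mirrors A's, so total rating is conserved.
        new[b] += k * (e_a - s_a)
    return new


def meta_elo(domain_ratings: dict[str, dict[str, float]], weights: dict[str, float]) -> dict[str, float]:
    """Weighted mean of each model's per-domain Elo over the domains in which it was rated."""
    totals: dict[str, float] = {}
    norm: dict[str, float] = {}
    for domain, ratings in domain_ratings.items():
        w = weights[domain]
        for model, r in ratings.items():
            totals[model] = totals.get(model, 0.0) + w * r
            norm[model] = norm.get(model, 0.0) + w
    return {m: totals[m] / norm[m] for m in totals}
```

Take two models with equal starting ratings and K = 32. Then `run_cycle({"model_a": 1500.0, "model_b": 1500.0}, {"model_a": 0.82, "model_b": 0.75})` moves the winner to 1516.0 and the loser to 1484.0. Normalizing by each model's own weight total means a model is not penalized for being absent from some domains. That is also an assumption of this sketch.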
112 rows · Primary metric: Meta-Elo (`meta_elo`) · Sampled: 2026-05-06

Metrics: Meta-Elo, Weighted F1, Cycles
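Weighted F1 is conventionally computed in three steps:

1. Compute F1 separately for each class.
2. Weight each class's F1 by its support, that is, its number of true instances.
3. Average the weighted values.

This is what scikit-learn's `f1_score(..., average="weighted")` computes. The plain-Python sketch below implements that convention. That the benchmark uses exactly this definition is an assumption based on the metric's name.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 (each class weighted by its count in y_true)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes that appear only in y_pred have zero support and contribute nothing.
        score += f1 * n / total
    return score
```

Consider gold labels `["tox", "tox", "ok", "ok"]` and predictions `["tox", "ok", "ok", "ok"]`. Class `tox` gets F1 = 2/3 and class `ok` gets F1 = 4/5. Both have support 2, so the weighted F1 is 11/15 ≈ 0.733.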
| Rank | Subject | Meta-Elo | Matched model (ID) | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o (2024-05-13) | 1825.22 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 2 | GPT-4o (2024-11-20) | 1804.54 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 3 | GPT-4o (2024-08-06) | 1801.72 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 4 | Gemini 1.5 Pro | 1782.70 | — | Imported | 2026-05-06 |
| 5 | GPT-4 Turbo (2024-04-09) | 1781.47 | GPT-4 Turbo (`openai-gpt-4-turbo`) | Imported | 2026-05-06 |
| 6 | o1 (2024-12-17) | 1768.81 | o1 (`openai-o1`) | Imported | 2026-05-06 |
| 7 | GPT-4.5-preview (2025-02-27) | 1767.86 | GPT-4.5 (`openai-gpt-4.5-preview`) | Imported | 2026-05-06 |
| 8 | Grok 2 (1212) | 1758.36 | — | Imported | 2026-05-06 |
| 9 | Llama 3.1 (405B) | 1755.81 | — | Imported | 2026-05-06 |
| 10 | GPT-4 (0613) | 1747.59 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 11 | Llama 3.3 (70B-L) | 1746.41 | — | Imported | 2026-05-06 |
| 12 | Grok Beta | 1741.94 | — | Imported | 2026-05-06 |
| 13 | DeepSeek-V3 (671B) | 1732.54 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-06 |
| 14 | Llama 3.1 (70B-L) | 1722.88 | — | Imported | 2026-05-06 |
| 15 | Mistral Large (2411) | 1720.36 | Mistral Large (`mistralai-mistral-large`) | Imported | 2026-05-06 |
| 16 | DeepSeek-R1 (671B) | 1718.73 | R1 (`deepseek-r1`) | Imported | 2026-05-06 |
| 17 | Gemini 2.0 Flash | 1701.95 | Gemini 2.0 Flash (`google-gemini-2.0-flash`) | Imported | 2026-05-06 |
| 18 | Pixtral Large (2411) | 1697.33 | — | Imported | 2026-05-06 |
| 19 | Gemini 2.0 Flash-Lite (02-05) | 1687.68 | Gemini 2.0 Flash Lite (`google-gemini-2.0-flash-lite-001`) | Imported | 2026-05-06 |
| 20 | o3-mini (2025-01-31) | 1684.99 | o3-mini (`openai-o3-mini`) | Imported | 2026-05-06 |
| 21 | Gemini 2.0 Flash Exp. | 1682.46 | — | Imported | 2026-05-06 |
| 22 | OpenThinker (32B-L) | 1678.63 | — | Imported | 2026-05-06 |
| 23 | Athene-V2 (72B-L) | 1678.14 | — | Imported | 2026-05-06 |
| 24 | Qwen 2.5 (32B-L) | 1676.19 | — | Imported | 2026-05-06 |
| 25 | GPT-4o mini (2024-07-18) | 1674.64 | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-06 |
| 26 | Nemotron (70B-L) | 1670.56 | — | Imported | 2026-05-06 |
| 27 | Gemini 1.5 Flash | 1668.98 | — | Imported | 2026-05-06 |
| 28 | Gemma 3 (27B-L) | 1665.66 | — | Imported | 2026-05-06 |
| 29 | Qwen 2.5 (72B-L) | 1659.57 | — | Imported | 2026-05-06 |
| 30 | Gemma 3 (12B-L) | 1646.63 | — | Imported | 2026-05-06 |
| 31 | o1-mini (2024-09-12) | 1626.65 | — | Imported | 2026-05-06 |
| 32 | o3 (2025-04-16) | 1625.36 | o3 (`openai-o3`) | Imported | 2026-05-06 |
| 33 | o1-preview (2024-09-12) | 1622.24 | o1-preview (`openai-o1-preview`) | Imported | 2026-05-06 |
| 34 | Mistral Saba | 1620.99 | Mistral: Saba (`mistralai-mistral-saba`) | Imported | 2026-05-06 |
| 35 | GLM-4 (9B-L) | 1616.51 | — | Imported | 2026-05-06 |
| 36 | Phi-4 (14B-L) | 1615.70 | Phi 4 (`microsoft-phi-4`) | Imported | 2026-05-06 |
| 37 | Gemini 1.5 Flash (8B) | 1611.69 | — | Imported | 2026-05-06 |
| 38 | Gemma 2 (27B-L) | 1610.06 | — | Imported | 2026-05-06 |
| 39 | QwQ (32B-L) | 1598.03 | — | Imported | 2026-05-06 |
| 40 | Sailor2 (20B-L) | 1595.95 | — | Imported | 2026-05-06 |
| 41 | Hermes 3 (70B-L) | 1593.23 | — | Imported | 2026-05-06 |
| 42 | DeepSeek-R1 D-Qwen (14B-L) | 1588.18 | — | Imported | 2026-05-06 |
| 43 | Qwen 2.5 (14B-L) | 1570.72 | — | Imported | 2026-05-06 |
| 44 | Tülu3 (70B-L) | 1569.12 | — | Imported | 2026-05-06 |
| 45 | Open Mixtral 8x22B | 1566.73 | — | Imported | 2026-05-06 |
| 46 | Llama 3.1 (8B-L) | 1561.25 | — | Imported | 2026-05-06 |
| 47 | GPT-3.5 Turbo (0125) | 1560.51 | GPT-3.5 Turbo (`openai-gpt-3.5-turbo`) | Imported | 2026-05-06 |
| 48 | DeepSeek-R1 D-Llama (8B-L) | 1560.36 | — | Imported | 2026-05-06 |
| 49 | Gemma 2 (9B-L) | 1559.14 | — | Imported | 2026-05-06 |
| 50 | OpenThinker (7B-L) | 1552.76 | — | Imported | 2026-05-06 |
| 51 | Notus (7B-L) | 1549.73 | — | Imported | 2026-05-06 |
| 52 | GPT-4.1 mini (2025-04-14) | 1547.62 | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Imported | 2026-05-06 |
| 53 | Grok 3 Mini Beta | 1546.47 | Grok 3 Mini Beta (`x-ai-grok-3-mini-beta`) | Imported | 2026-05-06 |
| 54 | Grok 3 Beta | 1545.77 | Grok 3 Beta (`x-ai-grok-3-beta`) | Imported | 2026-05-06 |
| 55 | Grok 3 Fast Beta | 1543.92 | — | Imported | 2026-05-06 |
| 56 | Command R7B Arabic (7B-L) | 1540.88 | — | Imported | 2026-05-06 |
| 57 | Grok 3 Mini Fast Beta | 1540.40 | — | Imported | 2026-05-06 |
| 58 | o4-mini (2025-04-16) | 1538.33 | o4 Mini (`openai-o4-mini`) | Imported | 2026-05-06 |
| 59 | Exaone 3.5 (32B-L) | 1535.44 | — | Imported | 2026-05-06 |
| 60 | Mistral Small (22B-L) | 1533.44 | — | Imported | 2026-05-06 |
| 61 | GPT-4.1 nano (2025-04-14) | 1533.06 | GPT-4.1 Nano (`openai-gpt-4.1-nano`) | Imported | 2026-05-06 |
| 62 | Falcon3 (10B-L) | 1532.01 | — | Imported | 2026-05-06 |
| 63 | GPT-4.1 (2025-04-14) | 1520.39 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-06 |
| 64 | Gemini 2.5 Pro (03-25) | 1517.98 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-06 |
| 65 | Mistral (7B-L) | 1511.10 | — | Imported | 2026-05-06 |
| 66 | Gemini 2.0 Flash-Lite (001) | 1508.34 | Gemini 2.0 Flash Lite (`google-gemini-2.0-flash-lite-001`) | Imported | 2026-05-06 |
| 67 | OLMo 2 (13B-L) | 1501.88 | — | Imported | 2026-05-06 |
| 68 | OLMo 2 (7B-L) | 1501.59 | — | Imported | 2026-05-06 |
| 69 | Claude 3.7 Sonnet (20250219) | 1500.76 | Claude 3.7 Sonnet (`anthropic-claude-3.7-sonnet`) | Imported | 2026-05-06 |
| 70 | Llama 4 Scout (107B) | 1500.45 | Llama 4 Scout (`meta-llama-llama-4-scout`) | Imported | 2026-05-06 |
| 71 | Pixtral-12B (2409) | 1490.38 | — | Imported | 2026-05-06 |
| 72 | Nous Hermes 2 (11B-L) | 1488.67 | — | Imported | 2026-05-06 |
| 73 | Yi 1.5 (34B-L) | 1485.99 | — | Imported | 2026-05-06 |
| 74 | Mistral Small 3.1 | 1484.82 | — | Imported | 2026-05-06 |
| 75 | Qwen 2.5 (7B-L) | 1477.35 | — | Imported | 2026-05-06 |
| 76 | Phi-4-mini (3.8B-L) | 1477.03 | — | Imported | 2026-05-06 |
| 77 | Llama 4 Maverick (400B) | 1473.88 | Llama 4 Maverick (`meta-llama-4-maverick`) | Imported | 2026-05-06 |
| 78 | Yi Large | 1473.21 | — | Imported | 2026-05-06 |
| 79 | Granite 3.2 (8B-L) | 1446.65 | — | Imported | 2026-05-06 |
| 80 | Aya Expanse (32B-L) | 1445.04 | — | Imported | 2026-05-06 |
| 81 | Marco-o1-CoT (7B-L) | 1443.46 | — | Imported | 2026-05-06 |
| 82 | Aya (35B-L) | 1436.83 | — | Imported | 2026-05-06 |
| 83 | Granite 3.1 (8B-L) | 1429.95 | — | Imported | 2026-05-06 |
| 84 | Gemma 3 (4B-L) | 1428.70 | — | Imported | 2026-05-06 |
| 85 | Aya Expanse (8B-L) | 1425.23 | — | Imported | 2026-05-06 |
| 86 | Mistral NeMo (12B-L) | 1420.94 | Mistral: Mistral Nemo (`mistralai-mistral-nemo`) | Imported | 2026-05-06 |
| 87 | Orca 2 (7B-L) | 1415.85 | — | Imported | 2026-05-06 |
| 88 | Nemotron-Mini (4B-L) | 1414.69 | — | Imported | 2026-05-06 |
| 89 | Claude 3.5 Haiku (20241022) | 1413.90 | Claude 3.5 Haiku (`anthropic-claude-3.5-haiku`) | Imported | 2026-05-06 |
| 90 | Mistral OpenOrca (7B-L) | 1396.97 | — | Imported | 2026-05-06 |
| 91 | Tülu3 (8B-L) | 1396.65 | — | Imported | 2026-05-06 |
| 92 | Hermes 3 (8B-L) | 1386.51 | — | Imported | 2026-05-06 |
| 93 | Yi 1.5 (9B-L) | 1385.39 | — | Imported | 2026-05-06 |
| 94 | Claude 3.5 Sonnet (20241022) | 1384.79 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-06 |
| 95 | Dolphin 3.0 (8B-L) | 1381.14 | — | Imported | 2026-05-06 |
| 96 | Exaone 3.5 (8B-L) | 1371.68 | — | Imported | 2026-05-06 |
| 97 | Ministral-8B (2410) | 1346.45 | — | Imported | 2026-05-06 |
| 98 | Llama 3.2 (3B-L) | 1314.58 | — | Imported | 2026-05-06 |
| 99 | Codestral Mamba (7B) | 1312.34 | — | Imported | 2026-05-06 |
| 100 | Nous Hermes 2 Mixtral (47B-L) | 1281.37 | — | Imported | 2026-05-06 |
| 101 | Solar Pro (22B-L) | 1224.78 | — | Imported | 2026-05-06 |
| 102 | DeepSeek-R1 D-Qwen (7B-L) | 1212.76 | — | Imported | 2026-05-06 |
| 103 | Phi-3 Medium (14B-L) | 1209.37 | — | Imported | 2026-05-06 |
| 104 | Perspective 0.55 | 1180.28 | — | Imported | 2026-05-06 |
| 105 | Perspective 0.60 | 1094.77 | — | Imported | 2026-05-06 |
| 106 | Yi 1.5 (6B-L) | 1086.23 | — | Imported | 2026-05-06 |
| 107 | Granite 3 MoE (3B-L) | 1084.42 | — | Imported | 2026-05-06 |
| 108 | Perspective 0.70 | 1055.31 | — | Imported | 2026-05-06 |
| 109 | DeepSeek-R1 D-Qwen (1.5B-L) | 951.93 | — | Imported | 2026-05-06 |
| 110 | DeepScaleR (1.5B-L) | 892.60 | — | Imported | 2026-05-06 |
| 111 | Perspective 0.80 | 869.91 | — | Imported | 2026-05-06 |
| 112 | Granite 3.1 MoE (3B-L) | 758.13 | — | Imported | 2026-05-06 |
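Reading a Meta-Elo gap as a head-to-head expectation requires an assumption the leaderboard does not state: that Meta-Elo sits on the standard Elo logistic scale (base 10, 400 points). Meta-Elo is an aggregate over domains rather than a rating earned in direct play, so the following is indicative only. Under that assumption, the expected score of model A against model B is

$$
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}.
$$

For ranks 1 and 2, $R_A - R_B = 1825.22 - 1804.54 = 20.68$. That gives $E_A = 1/(1 + 10^{-0.0517}) \approx 0.53$, close to a coin flip. For rank 1 against rank 94 (Claude 3.5 Sonnet, 1384.79), the gap is 440.43 points and $E_A = 1/(1 + 10^{-1.1011}) \approx 0.93$.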