AfroBench
A comprehensive benchmark that evaluates language models on African languages across many tasks and datasets, spanning question answering, NLU, NLG, reasoning, and knowledge.
12 rows
Primary metric: average_score
Sampled: 2026-05-06
Metrics
Summary: average score; dataset coverage (lower is better)
Category scores: qa, nlu, nlg, reasoning, knowledge
Task scores: xqa, rc, ner, nli, intent, topic, senti, hate, pos, mt (en/fr → xx), adr, mt (xx → en/fr), summ, math, arc e, mmlu
| Rank | Subject | Average score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o (Aug) | 59.64 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 2 | Gemini 1.5 pro | 58.49 | — | Imported | 2026-05-06 |
| 3 | Gemma2 27b | 47.92 | Gemma 2 27B (`google-gemma-2-27b-it`) | Imported | 2026-05-06 |
| 4 | LLaMa3.1 70B | 43.52 | — | Imported | 2026-05-06 |
| 5 | Gemma2 9b | 43.10 | — | Imported | 2026-05-06 |
| 6 | Aya-101 13B | 40.34 | — | Imported | 2026-05-06 |
| 7 | LLaMAX3 8B | 30.14 | — | Imported | 2026-05-06 |
| 8 | LLaMa3.1 8B | 29.53 | — | Imported | 2026-05-06 |
| 9 | Gemma1.1 7b | 29.09 | — | Imported | 2026-05-06 |
| 10 | LLaMa3 8B | 28.83 | — | Imported | 2026-05-06 |
| 11 | LLaMa2 7b | 22.49 | — | Imported | 2026-05-06 |
| 12 | AfroLLaMa 8B | 19.79 | — | Imported | 2026-05-06 |
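Rows are ranked by the primary metric, average_score, in descending order. The minimal sketch below reproduces that ordering from the scores listed in the table. The dictionary `SCORES` and the helper `rank_by_average` are hypothetical names introduced for illustration. The page does not state a tie-breaking rule, and this sketch assumes none: Python's `sorted` is stable, so any tied subjects would keep their insertion order.

```python
# Average scores copied from the AfroBench leaderboard table (sampled 2026-05-06).
# SCORES and rank_by_average are hypothetical names used only for illustration.
SCORES = {
    "GPT-4o (Aug)": 59.64,
    "Gemini 1.5 pro": 58.49,
    "Gemma2 27b": 47.92,
    "LLaMa3.1 70B": 43.52,
    "Gemma2 9b": 43.10,
    "Aya-101 13B": 40.34,
    "LLaMAX3 8B": 30.14,
    "LLaMa3.1 8B": 29.53,
    "Gemma1.1 7b": 29.09,
    "LLaMa3 8B": 28.83,
    "LLaMa2 7b": 22.49,
    "AfroLLaMa 8B": 19.79,
}


def rank_by_average(scores):
    """Return (rank, subject, score) tuples, highest average_score first.

    Ties (none occur in this table) keep insertion order because sorted() is stable.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, score in rank_by_average(SCORES):
        print(f"{rank:>2}  {name:<16} {score:6.2f}")
```

Running the script prints the twelve subjects in the same order as the Rank column.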