AfroBench-Lite

A cost-efficient subset of AfroBench for evaluating models on African languages. It covers a representative set of tasks and datasets and extends coverage to recent models.

24 rows
Primary metric: average_score
Sampled: 2026-05-06

Metadata

Metrics

Average score, Dataset coverage (lower is better), Task nli, Task intent, Task mt en fr xx, Task mmlu, Task math, Task topic, Task rc

Latest Results

Rows are parsed from the public AfroBench leaderboard JSON. Each score is a macro-average over the dataset-level source scores available for that model; the per-dataset values are preserved in the metadata.
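As a rough illustration of the macro-average described above, the sketch below takes an unweighted mean over whichever dataset-level scores are present for a model. The function name `macro_average`, the dataset keys, the use of `None` for a missing score, and the rounding to two decimals are all assumptions made for this example; the leaderboard's actual JSON schema and aggregation code are not shown in the source.

```python
from statistics import mean
from typing import Mapping, Optional


def macro_average(dataset_scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Unweighted mean over the datasets that have a reported score.

    Missing scores (None) are skipped rather than counted as zero,
    which is one reading of "available dataset-level source scores".
    """
    available = [score for score in dataset_scores.values() if score is not None]
    if not available:
        return None  # nothing to average for this model
    return round(mean(available), 2)


# Hypothetical per-dataset scores for a single model (not real leaderboard data).
example_scores = {"dataset_a": 80.0, "dataset_b": 70.0, "dataset_c": None}
print(macro_average(example_scores))  # 75.0
```

Because missing datasets are skipped in this sketch, two models can receive averages computed over different dataset sets, so a coverage figure is worth reading alongside the average.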

| Rank | Subject | Average score | Model Match | Provenance | Sampled |
|------|---------|---------------|-------------|------------|---------|
| 1 | GPT-5 (Aug) | 77.74 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 2 | Gemini 3 Pro | 76.01 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 3 | Gemini-2.5 Pro | 74.53 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 4 | Claude 4.5 Sonnet | 70.60 | | Imported | 2026-05-06 |
| 5 | Claude 4.0 Sonnet | 68.69 | | Imported | 2026-05-06 |
| 6 | Gemini-2.5 Flash | 66.71 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 7 | Gemini-2.0 Flash | 66.43 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-06 |
| 8 | GPT-4o (Aug) | 65.80 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 9 | GPT-4.1 (April) | 65.67 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 10 | Gemini 1.5 pro | 62.79 | | Imported | 2026-05-06 |
| 11 | Claude 3.7 Sonnet | 60.26 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06 |
| 12 | Gemma 3 27b | 51.01 | Gemma 3 27B (google-gemma-3-27b-it) | Imported | 2026-05-06 |
| 13 | LLaMa 4 405B | 49.74 | | Imported | 2026-05-06 |
| 14 | Gemma2 27b | 44.87 | Gemma 2 27B (google-gemma-2-27b-it) | Imported | 2026-05-06 |
| 15 | Aya-101 13B | 43.39 | | Imported | 2026-05-06 |
| 16 | Gemma2 9b | 40.79 | | Imported | 2026-05-06 |
| 17 | LLaMa3.1 70B | 40 | | Imported | 2026-05-06 |
| 18 | LLaMAX3 8B | 28.81 | | Imported | 2026-05-06 |
| 19 | LLaMa3.1 8B | 25.83 | | Imported | 2026-05-06 |
| 20 | Gemma1.1 7b | 23.69 | | Imported | 2026-05-06 |
| 21 | LLaMa3 8B | 21.44 | | Imported | 2026-05-06 |
| 22 | Lugha-Llama 8B | 21 | | Imported | 2026-05-06 |
| 23 | LLaMa2 7b | 16.23 | | Imported | 2026-05-06 |
| 24 | AfroLLaMa 8B | 16.09 | | Imported | 2026-05-06 |