BIG-bench

BIG-bench: Evaluates broad language-model knowledge, reasoning, commonsense, instruction following, or exam-style accuracy.

45 rows
Primary metric: score
Sampled: 2026-05-27

Metadata

Metrics

Mean 0-shot BBL normalized aggregate score, BBL tasks with score

Latest Results

Each row aggregates only the 24 official BIG-bench Lite (BBL) tasks. The score is the mean 0-shot normalized_aggregate_score over those tasks, reported only for models with at least 20 task-level score files.
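The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual code: the function name `mean_bbl_score`, the input shape (a mapping from task name to 0-shot normalized_aggregate_score), and the placeholder task names are all assumptions. It also assumes the 20-file threshold is counted over BBL tasks only, which the text does not state explicitly.

```python
from statistics import mean
from typing import Mapping, Optional, Set

MIN_TASK_FILES = 20  # threshold quoted in the text


def mean_bbl_score(
    task_scores: Mapping[str, float],
    bbl_tasks: Set[str],
    min_tasks: int = MIN_TASK_FILES,
) -> Optional[float]:
    """Mean 0-shot score over BBL tasks, or None if too few tasks have a score.

    Hypothetical helper: `task_scores` maps task name -> 0-shot
    normalized_aggregate_score for a single model.
    """
    # Keep only scores for tasks that belong to BIG-bench Lite.
    kept = [score for task, score in task_scores.items() if task in bbl_tasks]
    # Assumption: the minimum is applied to the number of BBL tasks with a score.
    if len(kept) < min_tasks:
        return None
    return mean(kept)


# Toy data with placeholder task names (not the real BBL task list).
bbl_tasks = {f"task_{i:02d}" for i in range(24)}
full_scores = {task: 2.0 for task in bbl_tasks}
full_scores["non_bbl_task"] = 100.0  # ignored: not part of BBL
sparse_scores = {f"task_{i:02d}": 2.0 for i in range(19)}  # only 19 tasks

full_result = mean_bbl_score(full_scores, bbl_tasks)
sparse_result = mean_bbl_score(sparse_scores, bbl_tasks)
```

Under these assumptions, a model with scores for all 24 tasks gets a plain arithmetic mean, while one with only 19 scored tasks is left off the table (`None`).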

Rank | Subject | Mean 0-shot BBL normalized aggregate score | Model Match | Provenance | Sampled
1 | PaLM 535b | 8.110951766293224 | | Imported | 2026-05-27
2 | PaLM 64b | 8.110951766293224 | | Imported | 2026-05-27
3 | BIG-G T=0 128b | 6.1313369188035844 | | Imported | 2026-05-27
4 | BIG-G T=1 128b | 4.749815324485326 | | Imported | 2026-05-27
5 | GPT GPT-3 200B | 4.33420647370985 | | Imported | 2026-05-27
6 | PaLM 8b | 2.2958612109975043 | | Imported | 2026-05-27
7 | BIG-G sparse 8b | 1.98668561205419 | | Imported | 2026-05-27
8 | GPT GPT-3 13B | 1.7159057637127149 | | Imported | 2026-05-27
9 | BIG-G sparse 4b | 1.214916999070823 | | Imported | 2026-05-27
10 | BIG-G sparse 1b | 0.9256436847247252 | | Imported | 2026-05-27
11 | GPT GPT-3 Large | 0.8828082835515451 | | Imported | 2026-05-27
12 | BIG-G sparse 2b | 0.7736681100837126 | | Imported | 2026-05-27
13 | BIG-G T=0 422m | 0.6536233298099993 | | Imported | 2026-05-27
14 | BIG-G T=0 2b | 0.6515524639597468 | | Imported | 2026-05-27
15 | GPT GPT-3 Medium | 0.5665739313501229 | | Imported | 2026-05-27
16 | BIG-G T=1 422m | 0.22298572852315526 | | Imported | 2026-05-27
17 | BIG-G T=0 8b | 0.20896960425384803 | | Imported | 2026-05-27
18 | BIG-G T=1 2b | 0.11927073936659742 | | Imported | 2026-05-27
19 | GPT GPT-3 6B | -0.1152289129777575 | | Imported | 2026-05-27
20 | BIG-G T=0 27b | -0.5892522010368552 | | Imported | 2026-05-27
21 | BIG-G T=1 8b | -0.6611304959661063 | | Imported | 2026-05-27
22 | GPT GPT-3 3B | -0.9680061097547866 | | Imported | 2026-05-27
23 | BIG-G T=1 27b | -1.014172727247787 | | Imported | 2026-05-27
24 | BIG-G T=0 1b | -1.1087326803307993 | | Imported | 2026-05-27
25 | BIG-G sparse 244m | -1.3591392781278604 | | Imported | 2026-05-27
26 | BIG-G sparse 422m | -1.4120758399992488 | | Imported | 2026-05-27
27 | GPT GPT-3 XL | -1.4656781764406794 | | Imported | 2026-05-27
28 | BIG-G T=1 1b | -1.59626485053329 | | Imported | 2026-05-27
29 | BIG-G T=0 4b | -1.6861500677442491 | | Imported | 2026-05-27
30 | BIG-G T=1 4b | -2.1612443984786878 | | Imported | 2026-05-27
31 | BIG-G sparse 125m | -2.258837805630525 | | Imported | 2026-05-27
32 | BIG-G T=0 244m | -2.7880594662835847 | | Imported | 2026-05-27
33 | BIG-G T=1 244m | -3.211352945982389 | | Imported | 2026-05-27
34 | BIG-G T=0 125m | -3.3462197283451425 | | Imported | 2026-05-27
35 | BIG-G sparse 53m | -3.4431018447440978 | | Imported | 2026-05-27
36 | BIG-G T=0 53m | -3.493421065167617 | | Imported | 2026-05-27
37 | BIG-G T=1 125m | -3.8687270017239115 | | Imported | 2026-05-27
38 | BIG-G T=1 53m | -4.0344746445074575 | | Imported | 2026-05-27
39 | BIG-G sparse 16m | -4.117850721925652 | | Imported | 2026-05-27
40 | GPT GPT-3 Small | -4.3076924219420745 | | Imported | 2026-05-27
41 | BIG-G T=0 2m | -5.196798030379155 | | Imported | 2026-05-27
42 | BIG-G T=1 2m | -5.198660560708598 | | Imported | 2026-05-27
43 | BIG-G sparse 2m | -5.774974657838226 | | Imported | 2026-05-27
44 | BIG-G T=0 16m | -5.991345985095776 | | Imported | 2026-05-27
45 | BIG-G T=1 16m | -6.2718949106801345 | | Imported | 2026-05-27