BIG-bench
BIG-bench evaluates broad language-model knowledge, reasoning, commonsense, instruction following, and exam-style accuracy.
45 rows
Primary metric: score
Sampled: 2026-05-27
Metric: mean 0-shot normalized aggregate score on BIG-bench Lite (BBL), averaged over the BBL tasks that have a score.
| Rank | Subject | Mean 0-shot BBL normalized aggregate score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | PaLM 535b | 8.110951766293224 | — | Imported | 2026-05-27 |
| 2 | PaLM 64b | 8.110951766293224 | — | Imported | 2026-05-27 |
| 3 | BIG-G T=0 128b | 6.1313369188035844 | — | Imported | 2026-05-27 |
| 4 | BIG-G T=1 128b | 4.749815324485326 | — | Imported | 2026-05-27 |
| 5 | GPT GPT-3 200B | 4.33420647370985 | — | Imported | 2026-05-27 |
| 6 | PaLM 8b | 2.2958612109975043 | — | Imported | 2026-05-27 |
| 7 | BIG-G sparse 8b | 1.98668561205419 | — | Imported | 2026-05-27 |
| 8 | GPT GPT-3 13B | 1.7159057637127149 | — | Imported | 2026-05-27 |
| 9 | BIG-G sparse 4b | 1.214916999070823 | — | Imported | 2026-05-27 |
| 10 | BIG-G sparse 1b | 0.9256436847247252 | — | Imported | 2026-05-27 |
| 11 | GPT GPT-3 Large | 0.8828082835515451 | — | Imported | 2026-05-27 |
| 12 | BIG-G sparse 2b | 0.7736681100837126 | — | Imported | 2026-05-27 |
| 13 | BIG-G T=0 422m | 0.6536233298099993 | — | Imported | 2026-05-27 |
| 14 | BIG-G T=0 2b | 0.6515524639597468 | — | Imported | 2026-05-27 |
| 15 | GPT GPT-3 Medium | 0.5665739313501229 | — | Imported | 2026-05-27 |
| 16 | BIG-G T=1 422m | 0.22298572852315526 | — | Imported | 2026-05-27 |
| 17 | BIG-G T=0 8b | 0.20896960425384803 | — | Imported | 2026-05-27 |
| 18 | BIG-G T=1 2b | 0.11927073936659742 | — | Imported | 2026-05-27 |
| 19 | GPT GPT-3 6B | -0.1152289129777575 | — | Imported | 2026-05-27 |
| 20 | BIG-G T=0 27b | -0.5892522010368552 | — | Imported | 2026-05-27 |
| 21 | BIG-G T=1 8b | -0.6611304959661063 | — | Imported | 2026-05-27 |
| 22 | GPT GPT-3 3B | -0.9680061097547866 | — | Imported | 2026-05-27 |
| 23 | BIG-G T=1 27b | -1.014172727247787 | — | Imported | 2026-05-27 |
| 24 | BIG-G T=0 1b | -1.1087326803307993 | — | Imported | 2026-05-27 |
| 25 | BIG-G sparse 244m | -1.3591392781278604 | — | Imported | 2026-05-27 |
| 26 | BIG-G sparse 422m | -1.4120758399992488 | — | Imported | 2026-05-27 |
| 27 | GPT GPT-3 XL | -1.4656781764406794 | — | Imported | 2026-05-27 |
| 28 | BIG-G T=1 1b | -1.59626485053329 | — | Imported | 2026-05-27 |
| 29 | BIG-G T=0 4b | -1.6861500677442491 | — | Imported | 2026-05-27 |
| 30 | BIG-G T=1 4b | -2.1612443984786878 | — | Imported | 2026-05-27 |
| 31 | BIG-G sparse 125m | -2.258837805630525 | — | Imported | 2026-05-27 |
| 32 | BIG-G T=0 244m | -2.7880594662835847 | — | Imported | 2026-05-27 |
| 33 | BIG-G T=1 244m | -3.211352945982389 | — | Imported | 2026-05-27 |
| 34 | BIG-G T=0 125m | -3.3462197283451425 | — | Imported | 2026-05-27 |
| 35 | BIG-G sparse 53m | -3.4431018447440978 | — | Imported | 2026-05-27 |
| 36 | BIG-G T=0 53m | -3.493421065167617 | — | Imported | 2026-05-27 |
| 37 | BIG-G T=1 125m | -3.8687270017239115 | — | Imported | 2026-05-27 |
| 38 | BIG-G T=1 53m | -4.0344746445074575 | — | Imported | 2026-05-27 |
| 39 | BIG-G sparse 16m | -4.117850721925652 | — | Imported | 2026-05-27 |
| 40 | GPT GPT-3 Small | -4.3076924219420745 | — | Imported | 2026-05-27 |
| 41 | BIG-G T=0 2m | -5.196798030379155 | — | Imported | 2026-05-27 |
| 42 | BIG-G T=1 2m | -5.198660560708598 | — | Imported | 2026-05-27 |
| 43 | BIG-G sparse 2m | -5.774974657838226 | — | Imported | 2026-05-27 |
| 44 | BIG-G T=0 16m | -5.991345985095776 | — | Imported | 2026-05-27 |
| 45 | BIG-G T=1 16m | -6.2718949106801345 | — | Imported | 2026-05-27 |
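
The negative values in the table are consistent with a normalization in which 0 corresponds to a task's baseline score (for example, random guessing) and 100 to its maximum score, so a model that does worse than the baseline scores below zero. The following is a minimal sketch under that assumption, not the leaderboard's actual computation: the function names, the task dictionaries, and all numbers are hypothetical and are not taken from the data above.

```python
from statistics import mean


def normalized_score(raw, low, high):
    """Linearly rescale a raw task score so `low` maps to 0 and `high` to 100."""
    if high == low:
        raise ValueError("high and low must differ")
    return 100.0 * (raw - low) / (high - low)


def mean_normalized(tasks):
    """Average the normalized scores over tasks that have a raw score."""
    scored = [t for t in tasks if t["raw"] is not None]
    return mean(normalized_score(t["raw"], t["low"], t["high"]) for t in scored)


# Hypothetical tasks: `low` is the assumed baseline, `high` the assumed maximum.
tasks = [
    {"name": "task_a", "raw": 0.5, "low": 0.25, "high": 1.0},    # above baseline
    {"name": "task_b", "raw": 0.125, "low": 0.25, "high": 1.0},  # below baseline
    {"name": "task_c", "raw": None, "low": 0.0, "high": 1.0},    # no score, skipped
]

print(mean_normalized(tasks))
```

Here `task_b` contributes a negative normalized score because its raw score is below the assumed baseline, and `task_c` is excluded from the mean because it has no score.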