JEEBench
JEEBench measures mathematical reasoning, symbolic problem solving, proof construction, and competition-style problem solving.
Rows: 11
Primary metric: score
Sampled: 2026-05-27
Metadata
Metrics
Chemistry score, Mathematics score, Physics score, Integer score, Single-correct score, Multi-correct score, Numeric score, Total score
| Rank | System | Total score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4+CoT+SC@8 | 0.389 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 2 | GPT-4+CoT | 0.350 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 3 | GPT-4+CoT+Self Critique | 0.339 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 4 | GPT-4 | 0.309 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 5 | GPT-4+ (1-shot) CoT | 0.292 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-27 |
| 6 | GPT-3.5 | 0.177 | — | Imported | 2026-05-27 |
| 7 | PaLM2 | 0.153 | — | Imported | 2026-05-27 |
| 8 | GPT-3 | 0.122 | — | Imported | 2026-05-27 |
| 9 | Random | 0.105 | — | Imported | 2026-05-27 |
| 10 | Falcon7B-Instruct | 0.098 | — | Imported | 2026-05-27 |
| 11 | Alpaca-LoRA | 0.089 | — | Imported | 2026-05-27 |
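
As a rough way to read the table, the sketch below re-derives the ranking from the Total score column and computes each entry's margin over the Random baseline row. The names and scores are copied from the table above. The helper names (`rank_by_score`, `margin_over_baseline`) are hypothetical and not part of any JEEBench tooling.

```python
# Total scores copied from the leaderboard table above.
TOTAL_SCORES = {
    "GPT-4+CoT+SC@8": 0.389,
    "GPT-4+CoT": 0.350,
    "GPT-4+CoT+Self Critique": 0.339,
    "GPT-4": 0.309,
    "GPT-4+ (1-shot) CoT": 0.292,
    "GPT-3.5": 0.177,
    "PaLM2": 0.153,
    "GPT-3": 0.122,
    "Random": 0.105,
    "Falcon7B-Instruct": 0.098,
    "Alpaca-LoRA": 0.089,
}


def rank_by_score(scores):
    """Return (rank, name, score) tuples sorted by descending score."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


def margin_over_baseline(scores, baseline="Random"):
    """Return each entry's score minus the baseline score, rounded to 3 places."""
    base = scores[baseline]
    return {
        name: round(score - base, 3)
        for name, score in scores.items()
        if name != baseline
    }


ranking = rank_by_score(TOTAL_SCORES)
margins = margin_over_baseline(TOTAL_SCORES)

# Entries whose total score falls below the Random baseline row.
below_random = [name for name, margin in margins.items() if margin < 0]

for rank, name, score in ranking:
    print(f"{rank:>2}  {name:<26} {score:.3f}")
print("Below Random baseline:", below_random)
```

Run on the values above, this reproduces the table's rank order and flags Falcon7B-Instruct and Alpaca-LoRA as scoring below the Random row.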