JEEBench

JEEBench measures mathematical reasoning, symbolic problem solving, proof construction, and competition-style problem solving.

Rows: 11
Primary metric: score
Sampled: 2026-05-27

Metadata

Metrics

Chemistry score, Mathematics score, Physics score, Integer score, Single-correct score, Multi-correct score, Numeric score, Total score

Latest Results

Rows are transcribed from the public JEEBench paper Table 2. Primary score is the paper's overall Total score.

Rank | Subject | Total score | Model Match | Provenance | Sampled
1 | GPT-4+CoT+SC@8 | 0.389 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
2 | GPT-4+CoT | 0.350 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
3 | GPT-4+CoT+Self Critique | 0.339 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
4 | GPT-4 | 0.309 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
5 | GPT-4+ (1-shot) CoT | 0.292 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-27
6 | GPT-3.5 | 0.177 | - | Imported | 2026-05-27
7 | PaLM2 | 0.153 | - | Imported | 2026-05-27
8 | GPT-3 | 0.122 | - | Imported | 2026-05-27
9 | Random | 0.105 | - | Imported | 2026-05-27
10 | Falcon7B-Instruct | 0.098 | - | Imported | 2026-05-27
11 | Alpaca-LoRA | 0.089 | - | Imported | 2026-05-27
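
The ranking above follows from sorting rows by the primary metric (Total score) in descending order. The snippet below is a minimal sketch of that ordering, assuming the scores are copied by hand from the table; the names `rows`, `rank_by_total`, and `below_random` are illustrative and not part of any JEEBench tooling. It also lists the systems that score below the Random baseline.

```python
# (subject, total_score) pairs copied from the results table above.
rows = [
    ("GPT-4+CoT+SC@8", 0.389),
    ("GPT-4+CoT", 0.350),
    ("GPT-4+CoT+Self Critique", 0.339),
    ("GPT-4", 0.309),
    ("GPT-4+ (1-shot) CoT", 0.292),
    ("GPT-3.5", 0.177),
    ("PaLM2", 0.153),
    ("GPT-3", 0.122),
    ("Random", 0.105),
    ("Falcon7B-Instruct", 0.098),
    ("Alpaca-LoRA", 0.089),
]


def rank_by_total(rows):
    """Return (rank, subject, score) tuples ordered by descending score."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


ranked = rank_by_total(rows)

# Systems whose Total score falls below the Random baseline row.
random_score = dict(rows)["Random"]
below_random = [name for name, score in rows if score < random_score]

for rank, name, score in ranked:
    print(f"{rank:>2}  {name:<26} {score:.3f}")
print("Below Random baseline:", ", ".join(below_random))
```

Sorting on the single Total score column mirrors the page's stated choice of primary metric; ranking by a per-subject or per-question-type score would need those columns, which are not reproduced in this table.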