TaxBench

TaxBench evaluates AI models on real-world tax tasks from Rivet's active tax workflows, spanning tax knowledge and judgment, tax calculations, and agentic data-retrieval question answering.

Rows: 16
Primary metric: mean_pass5
Sampled: 2026-05-27

Metadata

Metrics

Mean pass^5 (computed), Mean pass@1 (computed), Tax Knowledge pass@1, Tax Knowledge pass^5, Tax Calculations pass@1, Tax Calculations pass^5, Data Retrieval pass@1, Data Retrieval pass^5
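This page does not define the two per-category metrics. The sketch below assumes the usual conventions: pass@1 is the fraction of a task's trials that succeed, and pass^k is the probability that k trials drawn from the recorded trials all succeed, estimated as C(c, k) / C(n, k). It also assumes five trials per task. The function names (`pass_at_1`, `pass_hat_k`, `category_scores`) are hypothetical illustrations, not part of TaxBench.

```python
from math import comb


def pass_at_1(successes: int, trials: int) -> float:
    """Assumed pass@1 for one task: the fraction of its trials that succeeded."""
    return successes / trials


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Assumed pass^k for one task: the probability that k trials drawn without
    replacement from the recorded trials all succeed, C(c, k) / C(n, k)."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of recorded trials")
    # math.comb returns 0 when k > successes, so a task with fewer than
    # k successes contributes 0.
    return comb(successes, k) / comb(trials, k)


def category_scores(per_task_successes: list[int], trials: int = 5) -> tuple[float, float]:
    """Average (pass@1, pass^trials) over the tasks of one category.

    With k equal to the number of trials (assumed 5), a task scores 1 on
    pass^5 only if every one of its trials succeeded.
    """
    n_tasks = len(per_task_successes)
    p1 = sum(pass_at_1(c, trials) for c in per_task_successes) / n_tasks
    pk = sum(pass_hat_k(c, trials, trials) for c in per_task_successes) / n_tasks
    return p1, pk


# Three hypothetical tasks with 5, 3 and 0 successes out of 5 trials.
p1, p5 = category_scores([5, 3, 0])
print(round(p1, 3), round(p5, 3))  # 0.533 0.333
```

Under these assumptions, pass^5 is never higher than pass@1 and drops sharply for models that solve a task only some of the time. This is why the pass^5 figures on this page can sit well below the corresponding pass@1 figures.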

Latest Results

Rows were manually transcribed from the public TaxBench leaderboard on 2026-05-27. The source's total-score columns rendered as 0, so mean_pass5 and mean_pass1 are unweighted means computed by BenchmarkList across the three displayed categories (Tax Knowledge, Tax Calculations, Data Retrieval). They are used only to order the rows.
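A minimal sketch of the computed mean and the resulting ordering follows. It assumes each row has one score per displayed category. The helper names (`CATEGORIES`, `unweighted_mean`, `order_subjects`) and the example values are hypothetical and are not TaxBench results.

```python
from statistics import fmean

# The three categories displayed on the leaderboard.
CATEGORIES = ("Tax Knowledge", "Tax Calculations", "Data Retrieval")


def unweighted_mean(scores: dict[str, float]) -> float:
    """Average the three category scores with equal weight per category."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise KeyError(f"missing category scores: {missing}")
    return fmean([scores[c] for c in CATEGORIES])


def order_subjects(rows: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (subject, computed mean) pairs sorted from highest to lowest."""
    means = {subject: unweighted_mean(scores) for subject, scores in rows.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-category pass^5 values in percent, not TaxBench results.
example = {
    "model-a": {"Tax Knowledge": 30.0, "Tax Calculations": 20.0, "Data Retrieval": 10.0},
    "model-b": {"Tax Knowledge": 15.0, "Tax Calculations": 15.0, "Data Retrieval": 15.0},
}
print(order_subjects(example))  # [('model-a', 20.0), ('model-b', 15.0)]
```

Each category counts equally here, whatever its task count. If the categories contain different numbers of tasks, a task-weighted average could order the models differently. This is one reason the computed means serve only for ordering and do not replace the source's total score.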

Rank | Subject | Mean pass^5 (computed) | Model match | Provenance | Sampled
1 | GPT 5.5 Pro | 29.27% | GPT-5.5 Pro (openai-gpt-5.5-pro) | Imported | 2026-05-27
2 | GPT 5.4 Pro | 27.03% | GPT-5.4 Pro (openai-gpt-5.4-pro) | Imported | 2026-05-27
3 | GPT 5.5 | 24.43% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-27
4 | Claude Opus 4.6 | 21.37% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-27
5 | Gemini 3.1 Pro | 20.10% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-27
6 | Grok 4.1 Fast Reasoning | 19.63% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-27
7 | GPT 5.2 Pro | 17.77% | GPT-5.2 Pro (openai-gpt-5.2-pro) | Imported | 2026-05-27
8 | Gemini 3.1 Flash | 15.93% | none | Imported | 2026-05-27
9 | Grok 4.2 Reasoning | 15.20% | none | Imported | 2026-05-27
10 | Claude Opus 4.7 | 14.37% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-27
11 | Grok 4.2 | 12.23% | none | Imported | 2026-05-27
12 | Claude Sonnet 4.6 | 11.20% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-27
13 | GPT 5.4 | 9.33% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-27
14 | Gemini 2.5 Pro | 9.00% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-27
15 | Claude Sonnet 4.5 | 8.03% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27
16 | GPT 5.2 | 4.60% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-27