WebMainBench

Human-annotated web main-content extraction benchmark evaluating extractors and model-backed pipelines on full-page ROUGE-N F1 plus fine-grained text, code, formula, and table metrics.

Rows: 14
Primary metric: rouge_n_f1_all
Sampled: 2026-05-06

Metadata

Metrics

ROUGE-N F1 All, ROUGE-N F1 Simple, ROUGE-N F1 Mid, ROUGE-N F1 Hard, Overall, Text Edit, Code Edit, Formula Edit, Table Edit, Table TEDS
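For readers unfamiliar with the primary metric, the following is a minimal sketch of how an n-gram overlap F1 in the ROUGE-N style can be computed between a reference main-content text and an extractor's output. It assumes whitespace tokenization and clipped n-gram counts; the function names `ngrams` and `rouge_n_f1` are illustrative, and the benchmark's own scorer may tokenize or normalize text differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N style F1 between two strings.

    Assumption: tokens are whitespace-separated; the real benchmark
    may use a different tokenizer.
    """
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

An extraction that reproduces the reference exactly scores 1.0, while one that shares no n-grams with it scores 0.0. Boilerplate left in the output lowers precision, and dropped main content lowers recall.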

Latest Results

Rows are parsed from static Markdown leaderboard tables in the public dataset README. Full-dataset ROUGE-N F1 and 545-sample fine-grained tracks are preserved as separate source rows.
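As a rough illustration of parsing a static Markdown leaderboard table into rows, the sketch below reads a pipe-delimited table into a list of dictionaries. The function name `parse_markdown_table`, the column names, and the sample entries are hypothetical; this is not the importer used for this page, and the README's real tables may have a different layout.

```python
def parse_markdown_table(text):
    """Parse pipe-delimited Markdown table lines into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row, e.g. |---|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical sample input, not taken from the dataset README.
sample = """
| Extractor | ROUGE-N F1 |
|---|---|
| ExampleA | 0.75 |
| ExampleB | 0.80 |
"""

rows = parse_markdown_table(sample)
# Rank by score, highest first.
ranked = sorted(rows, key=lambda r: float(r["ROUGE-N F1"]), reverse=True)
```

Keeping each source table as its own list of rows, rather than merging by extractor name, matches the note above that the two tracks are preserved separately.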

| Rank | Subject | ROUGE-N F1 All | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V3.2 | 0.91 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06 |
| 2 | GPT-5 | 0.90 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 3 | Gemini-2.5-Pro | 0.90 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 4 | Dripper_fallback | 0.89 | | Imported | 2026-05-06 |
| 5 | Dripper (0.6B) | 0.88 | | Imported | 2026-05-06 |
| 6 | mineru-html | 0.83 | | Imported | 2026-05-06 |
| 7 | magic-html | 0.71 | | Imported | 2026-05-06 |
| 8 | Readability | 0.65 | | Imported | 2026-05-06 |
| 9 | Trafilatura | 0.64 | | Imported | 2026-05-06 |
| 10 | Resiliparse | 0.63 | | Imported | 2026-05-06 |
| 11 | magic-html | 0.51 | | Imported | 2026-05-06 |
| 12 | trafilatura (md) | 0.39 | | Imported | 2026-05-06 |
| 13 | resiliparse | 0.30 | | Imported | 2026-05-06 |
| 14 | trafilatura (txt) | 0.27 | | Imported | 2026-05-06 |