OmniAI OCR Leaderboard

OCR and data extraction leaderboard comparing traditional OCR providers and multimodal LLM systems on 1,000 pages.

8 rows
Primary metric: json_accuracy
Sampled: 2026-05-06

Metadata

Metrics

JSON Accuracy, Text Similarity, Page Latency (lower is better), Cost per 1000 Pages (lower is better), Input Tokens, Output Tokens

Latest Results

Rows are parsed from the public Hugging Face dataset overview CSV. Source OCR system display names are preserved.

| Rank | Subject | JSON Accuracy | Model Match | Provenance | Sampled |
|------|---------|---------------|-------------|------------|---------|
| 1 | omniai | 0.92 | | Imported | 2026-05-06 |
| 2 | gemini-2.0-flash-001 | 0.86 | | Imported | 2026-05-06 |
| 3 | azure-document-intelligence | 0.85 | | Imported | 2026-05-06 |
| 4 | azure-gpt-4o | 0.75 | | Imported | 2026-05-06 |
| 5 | aws-textract | 0.74 | | Imported | 2026-05-06 |
| 6 | claude-3-5-sonnet-20241022 | 0.69 | | Imported | 2026-05-06 |
| 7 | google-document-ai | 0.68 | | Imported | 2026-05-06 |
| 8 | unstructured | 0.51 | | Imported | 2026-05-06 |
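
As a rough illustration of how rows parsed from a CSV could be ranked by the primary metric, here is a minimal Python sketch. The column names `subject` and `json_accuracy`, the `SAMPLE_CSV` excerpt, and the `rank_rows` helper are assumptions made for this example; the actual layout of the Hugging Face overview CSV is not shown on this page. The scores are copied from the table above.

```python
import csv
import io

# Hypothetical CSV excerpt: the column names are assumed, not taken from the
# real dataset file. The scores are copied from the leaderboard table.
SAMPLE_CSV = """subject,json_accuracy
aws-textract,0.74
omniai,0.92
unstructured,0.51
gemini-2.0-flash-001,0.86
"""


def rank_rows(csv_text, metric="json_accuracy"):
    """Parse CSV text and return (rank, subject, score) tuples, best score first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Higher is better for JSON accuracy, so sort in descending order.
    rows.sort(key=lambda row: float(row[metric]), reverse=True)
    return [
        (rank, row["subject"], float(row[metric]))
        for rank, row in enumerate(rows, start=1)
    ]


ranked = rank_rows(SAMPLE_CSV)
for rank, subject, score in ranked:
    print(rank, subject, score)
```

For the metrics marked "lower is better" (page latency and cost per 1000 pages), the same helper would need an ascending sort instead of `reverse=True`.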