HealthAdminBench

Healthcare administration agent benchmark for prior authorization, appeals, durable medical equipment, payer portals, fax, and EHR-adjacent workflows.

Rows: 7
Primary metric: task_success_rate
Sampled: 2026-05-27

Metadata

Metrics

Task Success Rate, Score, Max Score, Avg. Steps (lower is better), Avg. Time (lower is better)

Latest Results

Rows are aggregated from task-level public results for the website's default Task Description + Portal Guidance prompt and screenshot observation mode.

Rank  Subject          Task Success Rate  Model Match  Provenance  Sampled
1     openai-cua       84.94%             -            Imported    2026-05-27
2     anthropic-cua    81.77%             -            Imported    2026-05-27
3     gemini-3.1       73.39%             -            Imported    2026-05-27
4     qwen-3           58.38%             -            Imported    2026-05-27
5     kimi-k2-5        55.98%             -            Imported    2026-05-27
6     claude-opus-4-6  49.24%             -            Imported    2026-05-27
7     gpt-5.4          43.49%             -            Imported    2026-05-27
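
The note under Latest Results says each row is aggregated from task-level results for one prompt and observation mode. A minimal sketch of how such an aggregation might work follows. The record fields (`subject`, `prompt`, `observation`, `success`, `steps`, `seconds`), the configuration strings, and the sample data are all hypothetical, and it assumes that Task Success Rate is the fraction of tasks completed successfully. The benchmark's own schema and scoring may differ.

```python
from collections import defaultdict

# Hypothetical configuration labels; the real dataset may name these differently.
DEFAULT_PROMPT = "task_description+portal_guidance"
DEFAULT_OBSERVATION = "screenshot"

# Illustrative task-level records only; these are not benchmark results.
RECORDS = [
    {"subject": "agent-a", "prompt": DEFAULT_PROMPT, "observation": DEFAULT_OBSERVATION,
     "success": True, "steps": 10, "seconds": 30.0},
    {"subject": "agent-a", "prompt": DEFAULT_PROMPT, "observation": DEFAULT_OBSERVATION,
     "success": True, "steps": 20, "seconds": 60.0},
    {"subject": "agent-a", "prompt": DEFAULT_PROMPT, "observation": DEFAULT_OBSERVATION,
     "success": False, "steps": 30, "seconds": 90.0},
    # Different observation mode: excluded by the filter below.
    {"subject": "agent-a", "prompt": DEFAULT_PROMPT, "observation": "dom",
     "success": True, "steps": 5, "seconds": 10.0},
    {"subject": "agent-b", "prompt": DEFAULT_PROMPT, "observation": DEFAULT_OBSERVATION,
     "success": True, "steps": 8, "seconds": 20.0},
    {"subject": "agent-b", "prompt": DEFAULT_PROMPT, "observation": DEFAULT_OBSERVATION,
     "success": False, "steps": 12, "seconds": 40.0},
]


def aggregate(records, prompt, observation):
    """Group task-level records by subject and compute per-subject averages."""
    groups = defaultdict(list)
    for record in records:
        # Keep only the prompt / observation configuration of interest.
        if record["prompt"] == prompt and record["observation"] == observation:
            groups[record["subject"]].append(record)

    rows = []
    for subject, tasks in groups.items():
        n = len(tasks)
        rows.append({
            "subject": subject,
            "tasks": n,
            # Assumed definition: share of tasks marked successful.
            "task_success_rate": sum(t["success"] for t in tasks) / n,
            "avg_steps": sum(t["steps"] for t in tasks) / n,
            "avg_time": sum(t["seconds"] for t in tasks) / n,
        })

    # Rank by the primary metric, highest first.
    rows.sort(key=lambda row: row["task_success_rate"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


if __name__ == "__main__":
    for row in aggregate(RECORDS, DEFAULT_PROMPT, DEFAULT_OBSERVATION):
        print(f'{row["rank"]} {row["subject"]} {row["task_success_rate"]:.2%}')
```

Filtering before grouping keeps runs from other prompt or observation settings out of the averages, which matters when the same subject was evaluated under several configurations.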