MedVidBench
A medical and surgical video understanding benchmark for video large language models. It covers 6,245 test samples across eight tasks: temporal action localization, spatiotemporal grounding, captioning, next-action prediction, CVS assessment, video summary, region captioning, and surgical skill assessment.
Rows: 1
Primary metric: average_normalized_score
Sampled: 2026-05-06
Metadata
Metrics
Average Normalized Score, CVS Accuracy, Next Action Accuracy, Skill Assessment Accuracy, Spatiotemporal Grounding mIoU, Temporal Action Grounding mIoU@0.3, Temporal Action Grounding mIoU@0.5, Dense Video Captioning F1, Dense Video Captioning LLM Judge, Video Summary LLM Judge, Region Caption LLM Judge
| Rank | Subject | Average Normalized Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | uAI-NEXUS-MedVLM-1.0a-7B-RL | 44.75 | — | Imported | 2026-05-06 |