V-STaR
Spatio-temporal reasoning benchmark for Video-LLMs, reporting category-level modified arithmetic mean (mAM) and modified logarithmic geometric mean (mLGM) over video question chains.
14 rows
Primary metric: mean_mam
Sampled: 2026-05-06
Metadata
Metrics: Mean mAM, Mean mLGM
| Rank | Subject | Mean mAM | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini-2-Flash | 27.43 | — | Imported | 2026-05-06 |
| 2 | GPT-4o | 26.26 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 3 | Qwen2.5-VL | 22.81 | — | Imported | 2026-05-06 |
| 4 | Video-Llama3 | 22.62 | — | Imported | 2026-05-06 |
| 5 | Video-CCAM-v1.2 | 22.05 | — | Imported | 2026-05-06 |
| 6 | Llava-Video | 21.93 | — | Imported | 2026-05-06 |
| 7 | InternVL-2.5 | 17.57 | — | Imported | 2026-05-06 |
| 8 | Sa2VA | 17.43 | — | Imported | 2026-05-06 |
| 9 | VTimeLLM | 17.22 | — | Imported | 2026-05-06 |
| 10 | Qwen2-VL | 17.06 | — | Imported | 2026-05-06 |
| 11 | VideoChat2 | 16.41 | — | Imported | 2026-05-06 |
| 12 | Oryx-1.5 | 13.75 | — | Imported | 2026-05-06 |
| 13 | TRACE | 13.15 | — | Imported | 2026-05-06 |
| 14 | TimeChat | 13.01 | — | Imported | 2026-05-06 |
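
As a quick sanity check on the table above, the following minimal sketch re-enters the 14 rows by hand and confirms that the ranks follow descending Mean mAM. The names `ROWS`, `is_ranked_by_score`, and `gap_to_leader` are hypothetical helpers introduced here for illustration; they are not part of any V-STaR tooling.

```python
# Rows copied by hand from the table above: (rank, subject, mean mAM).
ROWS = [
    (1, "Gemini-2-Flash", 27.43),
    (2, "GPT-4o", 26.26),
    (3, "Qwen2.5-VL", 22.81),
    (4, "Video-Llama3", 22.62),
    (5, "Video-CCAM-v1.2", 22.05),
    (6, "Llava-Video", 21.93),
    (7, "InternVL-2.5", 17.57),
    (8, "Sa2VA", 17.43),
    (9, "VTimeLLM", 17.22),
    (10, "Qwen2-VL", 17.06),
    (11, "VideoChat2", 16.41),
    (12, "Oryx-1.5", 13.75),
    (13, "TRACE", 13.15),
    (14, "TimeChat", 13.01),
]


def is_ranked_by_score(rows):
    """Return True if ranks run 1..n and scores never increase down the table."""
    ranks_ok = [rank for rank, _, _ in rows] == list(range(1, len(rows) + 1))
    scores = [score for _, _, score in rows]
    scores_ok = all(a >= b for a, b in zip(scores, scores[1:]))
    return ranks_ok and scores_ok


def gap_to_leader(rows):
    """Map each subject to its Mean mAM deficit relative to the rank-1 entry."""
    top = rows[0][2]
    return {name: round(top - score, 2) for _, name, score in rows}


if __name__ == "__main__":
    print("rows:", len(ROWS))
    print("ranked by mean mAM:", is_ranked_by_score(ROWS))
    for name, gap in gap_to_leader(ROWS).items():
        print(f"{name}: {gap}")
```

The gap helper only restates differences between published scores; it says nothing about statistical significance, which this page does not report.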