CG-Bench
Clue-grounded long-video question-answering benchmark evaluating MCQ accuracy, clue-grounding credibility metrics, and open-ended answer accuracy.
22 rows
Primary metric: open_ended_accuracy
Sampled: 2026-05-28
Metadata
Metrics
MCQ Clue Accuracy, MCQ Long-Video Accuracy, Credibility mIoU, Credibility Recall@IoU, Credibility Accuracy@IoU, Credibility CRR, Open-Ended Accuracy
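
The credibility metrics listed above (mIoU, Recall@IoU) score how well a predicted clue interval overlaps the annotated one. The sketch below is a minimal illustration, assuming the common temporal-IoU definition over a single `(start, end)` interval in seconds per question. The benchmark's exact protocol (for example, how multiple clue intervals per question are handled) is not stated here, and the function names `temporal_iou`, `mean_iou`, and `recall_at_iou` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); an assumed representation


def temporal_iou(pred: Interval, gold: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: List[Tuple[Interval, Interval]]) -> float:
    """Average IoU over (predicted, annotated) interval pairs."""
    if not pairs:
        return 0.0
    return sum(temporal_iou(p, g) for p, g in pairs) / len(pairs)


def recall_at_iou(pairs: List[Tuple[Interval, Interval]], threshold: float) -> float:
    """Fraction of pairs whose IoU reaches the given threshold."""
    if not pairs:
        return 0.0
    hits = sum(1 for p, g in pairs if temporal_iou(p, g) >= threshold)
    return hits / len(pairs)
```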
| Rank | Subject | Open-Ended Accuracy | MCQ Long-Video Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | GPT-4o-08-06 | 39.2% | 44.9% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-28 |
| 2 | Claude3.5-Sonnet | 35.6% | 40.3% | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-28 |
| 3 | InternVL2.5 | 34.2% | 44.2% | — | Imported | 2026-05-28 |
| 4 | Qwen2-VL | 33.7% | 45.3% | — | Imported | 2026-05-28 |
| 5 | Gemini-1.5-Pro | 28.7% | 37.8% | — | Imported | 2026-05-28 |
| 6 | VITA | 28% | 33% | — | Imported | 2026-05-28 |
| 7 | MiniCPM-v2.6 | 26.3% | 29.9% | — | Imported | 2026-05-28 |
| 8 | Kangaroo | 25.9% | 31.2% | — | Imported | 2026-05-28 |
| 9 | LLaVA-OneVision | 25% | 30.9% | — | Imported | 2026-05-28 |
| 10 | GPT-4mini-08-06 | 24.9% | 32.6% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-28 |
| 11 | Video-CCAM | 24.8% | 29.1% | — | Imported | 2026-05-28 |
| 12 | Gemini-1.5-Flash | 24.6% | 33.5% | — | Imported | 2026-05-28 |
| 13 | LongVA | 24.2% | 28.7% | — | Imported | 2026-05-28 |
| 14 | ViLA | 23.8% | 28.1% | — | Imported | 2026-05-28 |
| 15 | InternVL-Chat-v1.5 | 22.9% | 28.5% | — | Imported | 2026-05-28 |
| 16 | Chat-UniVi-v1.5 | 21.8% | 26.7% | — | Imported | 2026-05-28 |
| 17 | ShareGPT4Video | 21.5% | 27.1% | — | Imported | 2026-05-28 |
| 18 | Qwen-VL-Chat | 20.1% | 20.7% | — | Imported | 2026-05-28 |
| 19 | ST-LLM | 20% | 24.7% | — | Imported | 2026-05-28 |
| 20 | Videochat2 | 18.4% | 19.1% | — | Imported | 2026-05-28 |
| 21 | VideoLLaMA | 16% | 18% | — | Imported | 2026-05-28 |
| 22 | Video-LLaVA | 12% | 16.8% | — | Imported | 2026-05-28 |