POPE
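POPE (Polling-based Object Probing Evaluation): Evaluates object hallucination in large vision-language models by asking balanced yes/no questions of the form "Is there a <object> in the image?", where the absent objects used for "no" questions are drawn under random, popular, or adversarial sampling.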
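The Random, Popular, and Adversarial labels in the table below name the three ways POPE chooses absent objects for its "no" questions: uniformly at random, from the objects most frequent across the dataset, or from the objects that most often co-occur with the image's ground-truth objects. The sketch below illustrates that idea under stated assumptions and is not the benchmark's reference code. The function `build_pope_questions`, its arguments, the alphabetical tie-breaking, and the toy annotations are all hypothetical, and object annotations are assumed to be given as one set of object names per image.

```python
import random
from collections import Counter
from itertools import combinations


def build_pope_questions(image_objects, dataset_annotations, k, setting, seed=0):
    """Hypothetical sketch: build k "yes" and k "no" questions for one image.

    image_objects: set of ground-truth object names present in this image.
    dataset_annotations: list of object-name sets, one per image in the dataset.
    setting: "random", "popular", or "adversarial" (how negatives are chosen).
    """
    rng = random.Random(seed)
    vocab = sorted(set().union(*dataset_annotations))
    absent = [obj for obj in vocab if obj not in image_objects]

    if setting == "random":
        # Any object not present in the image is an equally likely negative.
        negatives = rng.sample(absent, k)
    elif setting == "popular":
        # Prefer the objects that appear in the most images overall.
        freq = Counter(obj for objs in dataset_annotations for obj in objs)
        negatives = sorted(absent, key=lambda o: (-freq[o], o))[:k]
    elif setting == "adversarial":
        # Prefer objects that often co-occur with this image's real objects.
        co = Counter()
        for objs in dataset_annotations:
            for a, b in combinations(sorted(objs), 2):
                co[(a, b)] += 1
                co[(b, a)] += 1
        score = {o: sum(co[(g, o)] for g in image_objects) for o in absent}
        negatives = sorted(absent, key=lambda o: (-score[o], o))[:k]
    else:
        raise ValueError(f"unknown setting: {setting}")

    positives = rng.sample(sorted(image_objects), k)
    template = "Is there a {} in the image?"
    questions = [(template.format(o), "yes") for o in positives]
    questions += [(template.format(o), "no") for o in negatives]
    return questions


# Illustrative toy annotations (hypothetical, not POPE data).
toy_dataset = [
    {"dog", "frisbee", "person"},
    {"person", "car"},
    {"car", "road"},
    {"car", "road", "person"},
    {"dog", "ball"},
]
for mode in ("random", "popular", "adversarial"):
    qs = build_pope_questions({"dog", "frisbee"}, toy_dataset, 2, mode)
    print(mode, [q for q, answer in qs if answer == "no"])
```

On the toy data, the adversarial setting picks "person" and "ball" as negatives because both co-occur with a dog or frisbee elsewhere, which is exactly what makes that setting harder than random sampling.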
Metadata: 15 rows; primary metric F1; sampled 2026-05-27
Metrics: F1, Accuracy, Precision, Recall, Yes rate
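All five listed metrics follow from the yes/no confusion counts when "yes" is treated as the positive class; Yes rate is simply the fraction of questions the model answers "yes". A minimal sketch, assuming each model response has already been parsed into a plain "yes" or "no" string (the helper name `pope_metrics` is hypothetical, and values are returned as percentages to match the table):

```python
def pope_metrics(predictions, labels):
    """Hypothetical helper: POPE-style metrics with "yes" as the positive class.

    predictions, labels: equal-length sequences of "yes"/"no" strings.
    Returns a dict of percentages.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)
    fp = sum(p == "yes" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    tn = sum(p == "no" and y == "no" for p, y in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    yes_rate = (tp + fp) / len(pairs)  # share of all answers that were "yes"

    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
        "yes_rate": 100 * yes_rate,
    }


metrics = pope_metrics(["yes", "yes", "yes", "no"], ["yes", "yes", "no", "no"])
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 75.0, 'precision': 66.67, 'recall': 100.0, 'f1': 80.0, 'yes_rate': 75.0}
```

Reporting Yes rate alongside F1 matters because a model biased toward "yes" can keep recall high while precision drops, and the yes rate exposes that bias directly.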
| Rank | Model (sampling setting) | F1 (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | InstructBLIP (Random) | 89.29 | — | Imported | 2026-05-27 |
| 2 | InstructBLIP (Popular) | 83.45 | — | Imported | 2026-05-27 |
| 3 | MiniGPT-4 (Random) | 78.86 | — | Imported | 2026-05-27 |
| 4 | InstructBLIP (Adversarial) | 78.45 | — | Imported | 2026-05-27 |
| 5 | MiniGPT-4 (Popular) | 72.21 | — | Imported | 2026-05-27 |
| 6 | MiniGPT-4 (Adversarial) | 71.37 | — | Imported | 2026-05-27 |
| 7 | LLaVA (Random) | 68.65 | — | Imported | 2026-05-27 |
| 8 | mPLUG-Owl (Random) | 68.06 | — | Imported | 2026-05-27 |
| 9 | LLaVA (Popular) | 67.72 | — | Imported | 2026-05-27 |
| 10 | LLaVA (Adversarial) | 66.98 | — | Imported | 2026-05-27 |
| 11 | mPLUG-Owl (Adversarial) | 66.82 | — | Imported | 2026-05-27 |
| 12 | mPLUG-Owl (Popular) | 66.79 | — | Imported | 2026-05-27 |
| 13 | MultiModal-GPT (Random) | 66.68 | — | Imported | 2026-05-27 |
| 14 | MultiModal-GPT (Adversarial) | 66.67 | — | Imported | 2026-05-27 |
| 15 | MultiModal-GPT (Popular) | 66.67 | — | Imported | 2026-05-27 |
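The three MultiModal-GPT rows sit at about 66.67, which is exactly the F1 a model would score by answering "yes" to every question on a split with equal numbers of yes and no labels: precision is 0.5, recall is 1, so F1 = 2(0.5)(1)/1.5 = 2/3. The table does not show the Yes-rate metric, so this is only an arithmetic check under the assumption of a balanced split, not a statement about how that model actually answered. The function name and the counts passed to it are illustrative.

```python
def always_yes_f1(num_yes_labels, num_no_labels):
    """F1 (%) of a hypothetical model that answers "yes" to every question."""
    # Every true "yes" becomes a true positive, every true "no" becomes a
    # false positive, and nothing is predicted "no", so there are no false negatives.
    tp, fp, fn = num_yes_labels, num_no_labels, 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


print(round(always_yes_f1(1500, 1500), 2))  # 66.67
```

Because this degenerate strategy already reaches 66.67 on a balanced split, F1 values close to that number say little on their own, which is one reason the benchmark also tracks accuracy and yes rate.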