POPE

POPE (Polling-based Object Probing Evaluation): evaluates object hallucination in large vision-language models by asking yes/no questions of the form "Is there a <object> in the image?".
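The Random, Popular, and Adversarial suffixes in the results table refer to POPE's three ways of choosing negative (absent) objects for its yes/no probes. Negatives are sampled uniformly at random, taken from the objects that are most frequent across the dataset, or taken from the objects that most often co-occur with the ones actually in the image. The sketch below illustrates the idea. The function name `build_probes`, the toy dataset format (one list of object names per image), using the first `k` ground-truth objects as positives, and the exact co-occurrence score are all simplifying assumptions. This is not the benchmark's released code.

```python
import random
from collections import Counter


def build_probes(image_objects, dataset, setting, k=3, seed=0):
    """Build POPE-style yes/no probes for one image (simplified, hypothetical sketch).

    image_objects: ground-truth object names for the image being probed.
    dataset: one list of object names per image, used for frequency and
        co-occurrence statistics.
    setting: "random", "popular", or "adversarial".
    """
    present = set(image_objects)
    # Number of images each object appears in.
    freq = Counter(o for objs in dataset for o in set(objs))
    # Candidate negatives: objects not annotated in this image (sorted for determinism).
    candidates = sorted(o for o in freq if o not in present)

    if setting == "random":
        negatives = random.Random(seed).sample(candidates, k)
    elif setting == "popular":
        # Most frequent absent objects across the whole dataset.
        negatives = sorted(candidates, key=lambda o: -freq[o])[:k]
    elif setting == "adversarial":
        def cooccurrence(o):
            # Count (image, present object) pairs where o appears alongside that object.
            return sum(o in objs and p in objs for objs in dataset for p in present)

        negatives = sorted(candidates, key=lambda o: -cooccurrence(o))[:k]
    else:
        raise ValueError(f"unknown setting: {setting!r}")

    positives = list(image_objects)[:k]  # assumption: first k ground-truth objects
    probes = [(f"Is there a {o} in the image?", "yes") for o in positives]
    probes += [(f"Is there a {o} in the image?", "no") for o in negatives]
    return probes
```

For instance, in a toy dataset where `car` is the most common object overall but `person` co-occurs most often with `dog`, probing an image containing `dog` and `frisbee` with `k=1` yields `car` as the negative under the popular setting and `person` under the adversarial setting.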

Metadata

Rows: 15
Primary metric: F1
Sampled: 2026-05-27

Metrics

F1, Accuracy, Precision, Recall, Yes rate
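Because every POPE probe is a binary question, all five metrics can be derived from one confusion matrix that treats "yes" as the positive class. "Yes rate" is the share of all answers that are "yes". The sketch below assumes that answers have already been normalized to the strings "yes" and "no". Free-form model output would first need a parsing rule, which is not shown here. The function name `pope_metrics` is hypothetical. It returns fractions, so multiply by 100 to compare them with the percentage-style scores in the table.

```python
def pope_metrics(predictions, labels):
    """Return POPE-style metrics as fractions, treating "yes" as the positive class.

    Assumes both lists contain normalized "yes"/"no" strings.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty lists")
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)  # object present, model says yes
    fp = sum(p == "yes" and y == "no" for p, y in pairs)   # hallucinated object
    tn = sum(p == "no" and y == "no" for p, y in pairs)    # absent object correctly denied
    fn = sum(p == "no" and y == "yes" for p, y in pairs)   # present object missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_rate": (tp + fp) / len(labels),  # share of all answers that are "yes"
    }
```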

Latest Results

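Rows are parsed from the MSCOCO validation results table in the LaTeX source of the POPE paper on arXiv. Each subject is one model under one sampling setting (Random, Popular, or Adversarial), and ranks are assigned across all settings.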
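As a rough illustration of how rows like these can be extracted, the sketch below parses a simplified, hypothetical `tabular` with one Setting, Model, and F1 column per row. The real table has more columns, and its exact layout is not reproduced here. The parser carries `\multirow` setting labels down to the rows they span and ranks the resulting "Model (Setting)" subjects by F1. The function names `parse_rows` and `rank_by_f1` and the `model_col`/`f1_col` parameters are hypothetical.

```python
import re

# Captures the label of \multirow{N}{width}{label}.
MULTIROW = re.compile(r"\\multirow\{\d+\}\{[^}]*\}\{([^}]*)\}")


def parse_rows(latex, model_col=1, f1_col=2):
    """Extract ("Model (Setting)", F1) pairs from a simplified LaTeX tabular.

    Assumes column 0 holds the setting (possibly via \\multirow), and the
    given columns hold the model name and the F1 score.
    """
    rows, setting = [], None
    for raw in latex.splitlines():
        line = raw.replace("\\hline", "").strip()
        if "&" not in line:
            continue  # \begin{tabular}, a bare \hline, etc.
        line = re.sub(r"\\\\$", "", line).strip()  # drop the trailing row break
        cells = [c.strip() for c in line.split("&")]
        try:
            f1 = float(cells[f1_col])
        except ValueError:
            continue  # header row such as "Setting & Model & F1"
        match = MULTIROW.match(cells[0])
        if match:
            setting = match.group(1)
        elif cells[0]:
            setting = cells[0]
        # An empty first cell means the row continues the previous \multirow.
        rows.append((f"{cells[model_col]} ({setting})", f1))
    return rows


def rank_by_f1(rows):
    """Return (rank, subject, f1) tuples, highest F1 first; ties keep input order."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i, subject, f1) for i, (subject, f1) in enumerate(ordered, start=1)]
```

Python's `sorted` is stable even with `reverse=True`, so subjects with equal F1 keep their order from the source table.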

Rank  Subject                       F1     Model Match  Provenance  Sampled
1     InstructBLIP (Random)         89.29  -            Imported    2026-05-27
2     InstructBLIP (Popular)        83.45  -            Imported    2026-05-27
3     MiniGPT-4 (Random)            78.86  -            Imported    2026-05-27
4     InstructBLIP (Adversarial)    78.45  -            Imported    2026-05-27
5     MiniGPT-4 (Popular)           72.21  -            Imported    2026-05-27
6     MiniGPT-4 (Adversarial)       71.37  -            Imported    2026-05-27
7     LLaVA (Random)                68.65  -            Imported    2026-05-27
8     mPLUG-Owl (Random)            68.06  -            Imported    2026-05-27
9     LLaVA (Popular)               67.72  -            Imported    2026-05-27
10    LLaVA (Adversarial)           66.98  -            Imported    2026-05-27
11    mPLUG-Owl (Adversarial)       66.82  -            Imported    2026-05-27
12    mPLUG-Owl (Popular)           66.79  -            Imported    2026-05-27
13    MultiModal-GPT (Random)       66.68  -            Imported    2026-05-27
14    MultiModal-GPT (Adversarial)  66.67  -            Imported    2026-05-27
15    MultiModal-GPT (Popular)      66.67  -            Imported    2026-05-27
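Several of the lower scores sit at or near 66.67, most visibly in the MultiModal-GPT rows. A short calculation shows why that value stands out. On a set of yes/no probes split evenly between present and absent objects, a model that answers "yes" to everything has precision 0.5 and recall 1.0, so its F1 is 2/3. The sketch below assumes a balanced split. The probe counts are hypothetical, and the helper `f1_from_counts` is illustrative. The table does not report yes rates, so this only shows that these scores are consistent with near-constant "yes" answers. It does not establish that the models actually behaved that way.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from confusion-matrix counts; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical balanced probe set: 1,500 "yes" probes and 1,500 "no" probes.
n_yes, n_no = 1500, 1500
# A model that always answers "yes" gets every present object right (TP)
# and every absent object wrong (FP); it never produces a false negative.
always_yes_f1 = f1_from_counts(tp=n_yes, fp=n_no, fn=0)
print(f"{100 * always_yes_f1:.2f}")  # 66.67
```

Because precision and recall depend only on the ratio of "yes" to "no" probes, any balanced set size gives the same 66.67.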