LLaVA-Bench in the Wild

LLaVA-Bench in the Wild evaluates multimodal understanding across image, text, chart, diagram, and cross-modal reasoning tasks.

Rows: 8
Primary metric: llava_bench_wild
Sampled: 2026-05-27

Metadata

Metrics

MMMU, MathVista, VQAv2, GQA, VizWiz, SQA, TextVQA, POPE, MME, MM-Bench, MM-Bench-CN, SEED-IMG, LLaVA-Bench-Wild, MM-Vet, SEED

Latest Results

Rows are parsed from the public LLaVA model zoo tables containing LLaVA-Bench-Wild.

Rank  Subject                                 LLaVA-Bench-Wild  Model Match  Provenance  Sampled
1     LLaVA-1.6 / Hermes-Yi-34B / full_ft-1e  89.6                           Imported    2026-05-27
2     LLaVA-1.6 / Vicuna-13B / full_ft-1e     87.3                           Imported    2026-05-27
3     LLaVA-1.6 / Mistral-7B / full_ft-1e     83.2                           Imported    2026-05-27
4     LLaVA-1.6 / Vicuna-7B / full_ft-1e      81.6                           Imported    2026-05-27
5     LLaVA-1.5 / 13B / full_ft-1e            72.5                           Imported    2026-05-27
6     LLaVA-1.5 / 13B / lora-1e               69.5                           Imported    2026-05-27
7     LLaVA-1.5 / 7B / lora-1e                67.9                           Imported    2026-05-27
8     LLaVA-1.5 / 7B / full_ft-1e             65.4                           Imported    2026-05-27
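
The import step described above (rows parsed from model zoo tables, then ranked by LLaVA-Bench-Wild) might look roughly like the following. This is a minimal sketch, not the pipeline actually used for this page. It assumes the source table is pipe-delimited, and the column names `Version`, `LLM`, and `Schedule` as well as the helpers `parse_pipe_table` and `rank_rows` are hypothetical. The three sample rows reuse scores already listed on this page.

```python
def parse_pipe_table(text):
    """Parse a pipe-delimited table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.strip("|").split("|")]
        # Skip the separator row (for example |---|---|).
        if all(set(c) <= set("-: ") for c in cells):
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def rank_rows(rows, metric="LLaVA-Bench-Wild"):
    """Keep rows with a numeric score for `metric` and rank them, highest first."""
    scored = []
    for row in rows:
        try:
            score = float(row.get(metric, ""))
        except ValueError:
            continue  # no usable score in this row
        # Assumed column names; the real source table may label these differently.
        subject = " / ".join(row[k] for k in ("Version", "LLM", "Schedule"))
        scored.append({"subject": subject, "score": score})
    scored.sort(key=lambda r: r["score"], reverse=True)
    for rank, row in enumerate(scored, start=1):
        row["rank"] = rank
    return scored


# Hypothetical input layout; scores are the ones listed in the table above.
SAMPLE = """
| Version | LLM | Schedule | LLaVA-Bench-Wild |
|---|---|---|---|
| LLaVA-1.5 | 7B | full_ft-1e | 65.4 |
| LLaVA-1.6 | Vicuna-7B | full_ft-1e | 81.6 |
| LLaVA-1.5 | 13B | lora-1e | 69.5 |
"""

ranked = rank_rows(parse_pipe_table(SAMPLE))
for row in ranked:
    print(row["rank"], row["subject"], row["score"])
```

Rows without a numeric score are dropped rather than ranked last, which keeps the ranking limited to models that actually report the metric.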