MaCBench

Chemistry and materials multimodal benchmark evaluating VLMs across lab, molecular, crystallography, MOF, spectroscopy, patent-figure, table-QA, and XRD tasks.

Metadata

11 rows
Primary metric: overall_score
Sampled: 2026-05-06

Metrics

Overall Score, afm-image, chem-lab-basic, chem-lab-comparison, chem-lab-equipments, chirality, cif-atomic-species, cif-crystal-system, cif-density, cif-symmetry, cif-volume, electronic-structure, handdrawn-molecules, isomers, mof-adsorption-strength-comparison, mof-adsorption-strength-order, mof-capacity-comparison, mof-capacity-order, mof-capacity-value, mof-henry-constant-comparison, mof-henry-constant-order, mof-working-capacity-comparison, mof-working-capacity-order, mof-working-capacity-value, org-schema-wo-smiles, org-schema, organic-molecules, spectral-analysis, tables-qa, us-patent-figures, us-patent-plots, xrd-pattern-matching, xrd-pattern-shape, xrd-peak-position, xrd-relative-intensity
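The metric names above follow the task families listed in the benchmark description (lab, crystallography, MOF, XRD, and so on), so a rough per-family view can be built by prefix. The sketch below assumes a hypothetical prefix-to-family mapping that is inferred only from the names; it is not an official MaCBench grouping, and anything it cannot match lands in "other".

```python
# Hypothetical mapping inferred from metric-name prefixes and the families in the
# benchmark description; NOT an official MaCBench task grouping.
FAMILY_PREFIXES: list[tuple[str, str]] = [
    ("chem-lab-", "lab"),
    ("cif-", "crystallography"),
    ("mof-", "MOF"),
    ("xrd-", "XRD"),
    ("us-patent-", "patent-figure"),
    ("tables-qa", "table-QA"),
    ("spectral-", "spectroscopy"),
]


def group_metrics(names: list[str]) -> dict[str, list[str]]:
    """Bucket metric names by the first matching prefix; unmatched names go to 'other'."""
    groups: dict[str, list[str]] = {}
    for name in names:
        family = next(
            (fam for prefix, fam in FAMILY_PREFIXES if name.startswith(prefix)),
            "other",
        )
        groups.setdefault(family, []).append(name)
    return groups
```

Names such as chirality, isomers, or handdrawn-molecules have no shared prefix, so the sketch leaves them in "other" rather than guessing a molecular bucket.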

Latest Results

Rows are parsed from the public Hugging Face dataset-server rows API for the latest MaCBench-Results split. Source model names and IDs are preserved.
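A minimal sketch of that import step, assuming the dataset-server `/rows` endpoint's JSON shape (a `rows` list whose items carry a `row` dict). The dataset, config, and split identifiers are placeholders, and the `model` column name is a hypothetical assumption; only `overall_score` comes from this page. Check the response's `features` list for the real column names.

```python
import json
import urllib.parse
import urllib.request

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def fetch_rows(dataset: str, config: str, split: str,
               offset: int = 0, length: int = 100) -> dict:
    """Fetch one page of rows from the Hugging Face dataset-server rows API."""
    query = urllib.parse.urlencode({
        "dataset": dataset, "config": config, "split": split,
        "offset": offset, "length": length,
    })
    with urllib.request.urlopen(f"{ROWS_ENDPOINT}?{query}") as resp:
        return json.load(resp)


def extract_scores(payload: dict, name_field: str = "model",
                   score_field: str = "overall_score") -> list[tuple[str, float]]:
    """Return (model name, overall score) pairs, keeping source names unchanged.

    'model' is an assumed column name; rows without a score are skipped.
    """
    pairs: list[tuple[str, float]] = []
    for item in payload.get("rows", []):
        row = item["row"]
        if row.get(score_field) is None:
            continue
        pairs.append((str(row[name_field]), float(row[score_field])))
    return pairs
```

Keeping `name_field` as a parameter lets the same parser follow a renamed column without code changes.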

| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | llama-4-maverick-17b-128e-instruct | 0.70 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06 |
| 2 | Claude-3.5-Sonnet | 0.67 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06 |
| 3 | llama-4-scout-17b-16e-instruct | 0.63 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06 |
| 4 | Gemini-1.5-Pro | 0.57 | (none) | Imported | 2026-05-06 |
| 5 | mistralai/Pixtral-Large-Instruct | 0.57 | (none) | Imported | 2026-05-06 |
| 6 | GPT-4o | 0.54 | GPT-4o (2024-08-06) (openai-gpt-4o-2024-08-06) | Imported | 2026-05-06 |
| 7 | Mistral-Small-3.1-24B-Instruct | 0.53 | Mistral: Mistral Small 3.1 24B (mistralai-mistral-small-3.1-24b-instruct) | Imported | 2026-05-06 |
| 8 | grok-2-vision-1212 | 0.46 | (none) | Imported | 2026-05-06 |
| 9 | Llama 3.2 90B Vision | 0.36 | (none) | Imported | 2026-05-06 |
| 10 | llama-3.2-11b-vision-preview | 0.32 | (none) | Imported | 2026-05-06 |
| 11 | JanusPro-7B | 0.20 | (none) | Imported | 2026-05-06 |
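The Rank column orders subjects by overall score, highest first. Gemini-1.5-Pro and mistralai/Pixtral-Large-Instruct both display 0.57, and the page does not say how their order is decided (unrounded scores are one possibility). The sketch below is one hedged way to reproduce a ranking from (model, score) pairs: it relies on Python's stable sort, so tied models simply keep their input order, which is an assumption rather than the leaderboard's documented rule.

```python
def rank_by_score(pairs: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Order (model, score) pairs by score, highest first, and number them from 1.

    sorted() stays stable with reverse=True, so equal scores keep their input order.
    """
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]
```

If shared ranks were wanted instead (both 0.57 entries as rank 4), the numbering would need a dense or competition ranking step in place of `enumerate`.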