Health Memory Arena

Event-driven longitudinal health-agent benchmark over synthetic patient trajectories, evaluating lookup, trend, comparison, anomaly, and explanation capabilities.

17 rows
Primary metric: total_score
Sampled: 2026-05-06

Metadata

Metrics

Total Score, Lookup, Trend, Comparison, Anomaly, Explanation

Showing 2 latest source slices.

Latest Results

Rows ranked by highest total score. Source agent display names and submitted dates are preserved.

Rank  Subject                   Total Score  Model Match  Provenance  Sampled
1     Mirobody (smart)          61.40                     Imported    2026-05-06
2     Mirobody (expert)         58.20                     Imported    2026-05-06
3     LLM-only (gemini-3.1)     57.90                     Imported    2026-05-06
4     Mirobody (general)        54.80                     Imported    2026-05-06
5     HippoRAG (k=10)           52.20                     Imported    2026-05-06
6     LLM-only (gpt-5.4)        51.70                     Imported    2026-05-06
7     Dyggraph                  51.60                     Imported    2026-05-06
8     LLM-only (glm-5)          47.10                     Imported    2026-05-06
9     LLM-only (sonnet-4.6)     41.40                     Imported    2026-05-06
10    LLM-only (minimax-m2.5)   39.70                     Imported    2026-05-06

Rank  Subject                   Total Score  Model Match  Provenance  Sampled
1     Mirobody (smart-general)  62.10                     Imported    2026-05-06
2     Mirobody (smart-expert)   59.00                     Imported    2026-05-06
3     LLM-only (sonnet-4.6)     57.60                     Imported    2026-05-06
4     LLM-only (gpt-5.4)        53.10                     Imported    2026-05-06
5     LLM-only (glm-5.1)        53.00                     Imported    2026-05-06
6     LLM-only (minimax-m2.7)   48.50                     Imported    2026-05-06
7     Mirobody (general)        38.40                     Imported    2026-05-06
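
The ranking rule stated above (rows ordered by highest total score, with rank restarting for each source slice) can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the `Row` type and `rank_rows` function are hypothetical names, and the sample rows are a subset of the first group of results. How the leaderboard breaks ties is not stated; this sketch simply keeps input order for equal scores.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical record holding the two fields needed for ranking."""
    subject: str
    total_score: float


def rank_rows(rows):
    """Return (rank, subject, total_score) tuples, highest score first.

    sorted() is stable, so rows with equal scores keep their input order
    (an assumption; the source does not describe tie-breaking).
    """
    ordered = sorted(rows, key=lambda r: r.total_score, reverse=True)
    return [(i, r.subject, r.total_score) for i, r in enumerate(ordered, start=1)]


# A few rows from one source slice, deliberately out of order.
rows = [
    Row("Dyggraph", 51.60),
    Row("Mirobody (smart)", 61.40),
    Row("LLM-only (gemini-3.1)", 57.90),
    Row("Mirobody (expert)", 58.20),
]

ranked = rank_rows(rows)
for rank, subject, score in ranked:
    print(f"{rank:<5} {subject:<26} {score:.2f}")
```

Ranking each slice separately, rather than pooling all 17 rows, matches the way rank restarts at 1 in the second group.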