Hindsight LLM Memory Leaderboard

A leaderboard of LLMs on Hindsight agent-memory operations. It measures retain() and reflect() performance, plus overall quality, on memory extraction and recall workloads.

Rows: 25
Primary metric: quality_accuracy
Sampled: 2026-05-06

Metadata

Metrics

Quality Accuracy, Reflect Accuracy, Reflect Avg Latency (lower is better), Retain Success Rate, Retain Avg Latency (lower is better), Retain Throughput, Retain Tokens per Fact (lower is better), Retain Tests
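Three of the metrics above are marked "lower is better", so comparing two runs needs a per-metric direction. A minimal sketch of that idea follows. The snake_case keys are assumptions modeled on the `quality_accuracy` name shown on this page, and `is_improvement` is a hypothetical helper, not part of Hindsight.

```python
# Hypothetical metric keys; only "quality_accuracy" appears verbatim on the page.
LOWER_IS_BETTER = {
    "reflect_avg_latency",
    "retain_avg_latency",
    "retain_tokens_per_fact",
}


def is_improvement(metric: str, old: float, new: float) -> bool:
    """Return True if `new` is strictly better than `old` for this metric."""
    if metric in LOWER_IS_BETTER:
        return new < old
    return new > old
```

For example, `is_improvement("retain_avg_latency", 2.0, 1.5)` is true because the latency dropped, while the same pair of values would count as a regression for `quality_accuracy`.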

Latest Results

Rows are imported from per-model JSON files in the public Hindsight benchmark results repository. Provider and model IDs are preserved.
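The import step can be sketched roughly as below. The JSON field names (`provider`, `model`, `quality_accuracy`) are assumptions for illustration, since the actual schema of the per-model files is not shown here. `Row` and `parse_result` are hypothetical names. The sketch keeps the provider and model IDs unchanged and allows the score to be absent.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    provider: str
    model: str
    quality_accuracy: Optional[float]  # None when the file carries no score

    @property
    def subject(self) -> str:
        # Provider and model IDs are joined as-is, e.g. "openai/gpt-5-mini".
        return f"{self.provider}/{self.model}"


def parse_result(text: str) -> Row:
    """Parse one per-model JSON document (assumed schema, see lead-in)."""
    data = json.loads(text)
    score = data.get("quality_accuracy")
    return Row(
        provider=data["provider"],
        model=data["model"],
        quality_accuracy=None if score is None else float(score),
    )
```

A file without a `quality_accuracy` field yields a row whose score is `None`, which matches how the unscored rows in the table appear with an empty Quality Accuracy cell.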

| Rank | Subject | Quality Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openai/gpt-5-mini | 89.70 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06 |
| 2 | openai/gpt-4.1-nano | 87.20 | GPT-4.1 Nano (openai-gpt-4.1-nano) | Imported | 2026-05-06 |
| 3 | openai/gpt-5.4 | 86.80 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 4 | openai/gpt-4.1-mini | 86.40 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06 |
| 5 | openai/gpt-5.4-mini | 86.40 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-06 |
| 6 | ollama-cloud/gemma4-31b | 86 | | Imported | 2026-05-06 |
| 7 | gemini/gemini-2.5-flash | 85.50 | | Imported | 2026-05-06 |
| 8 | groq/llama-3.3-70b-versatile | 85.50 | | Imported | 2026-05-06 |
| 9 | gemini/gemini-2.5-flash-lite | 84.70 | | Imported | 2026-05-06 |
| 10 | groq/openai-gpt-oss-120b | 84.70 | | Imported | 2026-05-06 |
| 11 | groq/llama-3.1-8b-instant | 84.30 | | Imported | 2026-05-06 |
| 12 | groq/openai-gpt-oss-20b | 83.90 | | Imported | 2026-05-06 |
| 13 | openai/gpt-5-nano | 83.90 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06 |
| 14 | openai/gpt-5.4-nano | 83.90 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-06 |
| 15 | gemini/gemini-3-flash-preview | 83.50 | | Imported | 2026-05-06 |
| 16 | openai/gpt-5.2 | 83.50 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 17 | openai/gpt-4o-mini | 81 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06 |
| 18 | local-ollama/deepseek-r1-1.5b | | | Imported | 2026-05-06 |
| 19 | local-ollama/gemma3-1b | | | Imported | 2026-05-06 |
| 20 | local-ollama/gemma3-270m | | | Imported | 2026-05-06 |
| 21 | local-ollama/granite3.1-dense-2b | | | Imported | 2026-05-06 |
| 22 | local-ollama/llama3.2-latest | | | Imported | 2026-05-06 |
| 23 | local-ollama/qwen2.5-0.5b | | | Imported | 2026-05-06 |
| 24 | local-ollama/qwen2.5-3b | | | Imported | 2026-05-06 |
| 25 | local-ollama/smollm2-1.7b | | | Imported | 2026-05-06 |
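The row order above is consistent with sorting by Quality Accuracy in descending order, placing rows without a score last, and breaking ties by subject ID in ascending order. The leaderboard's actual ranking rule is not stated, so the sketch below is an assumption that merely reproduces the visible order. `rank` is a hypothetical helper.

```python
from typing import Optional

ScoredRow = tuple[str, Optional[float]]  # (subject, quality_accuracy)


def rank(rows: list[ScoredRow]) -> list[tuple[int, str, Optional[float]]]:
    """Assign 1-based ranks: highest score first, unscored rows last."""
    ordered = sorted(
        rows,
        key=lambda r: (
            r[1] is None,                           # unscored rows sort after scored ones
            -(r[1] if r[1] is not None else 0.0),   # higher score first
            r[0],                                   # assumed tie-break: subject ID ascending
        ),
    )
    return [(i, subject, score) for i, (subject, score) in enumerate(ordered, start=1)]
```

Note that tied scores still receive distinct ranks under this scheme, as with ranks 4 and 5 in the table, which share a score of 86.40.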