Hindsight LLM Memory Leaderboard
LLM leaderboard for Hindsight agent-memory operations, measuring retain(), reflect(), and quality performance over memory extraction and recall workloads.
25 rows
Primary metric: quality_accuracy
Sampled: 2026-05-06
Metadata
Metrics
Quality Accuracy, Reflect Accuracy, Reflect Avg Latency (lower is better), Retain Success Rate, Retain Avg Latency (lower is better), Retain Throughput, Retain Tokens per Fact (lower is better), Retain Tests
| Rank | Subject | Quality Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openai/gpt-5-mini | 89.70 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06 |
| 2 | openai/gpt-4.1-nano | 87.20 | GPT-4.1 Nano (openai-gpt-4.1-nano) | Imported | 2026-05-06 |
| 3 | openai/gpt-5.4 | 86.80 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 4 | openai/gpt-4.1-mini | 86.40 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06 |
| 5 | openai/gpt-5.4-mini | 86.40 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-06 |
| 6 | ollama-cloud/gemma4-31b | 86.00 | — | Imported | 2026-05-06 |
| 7 | gemini/gemini-2.5-flash | 85.50 | — | Imported | 2026-05-06 |
| 8 | groq/llama-3.3-70b-versatile | 85.50 | — | Imported | 2026-05-06 |
| 9 | gemini/gemini-2.5-flash-lite | 84.70 | — | Imported | 2026-05-06 |
| 10 | groq/openai-gpt-oss-120b | 84.70 | — | Imported | 2026-05-06 |
| 11 | groq/llama-3.1-8b-instant | 84.30 | — | Imported | 2026-05-06 |
| 12 | groq/openai-gpt-oss-20b | 83.90 | — | Imported | 2026-05-06 |
| 13 | openai/gpt-5-nano | 83.90 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06 |
| 14 | openai/gpt-5.4-nano | 83.90 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-06 |
| 15 | gemini/gemini-3-flash-preview | 83.50 | — | Imported | 2026-05-06 |
| 16 | openai/gpt-5.2 | 83.50 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 17 | openai/gpt-4o-mini | 81.00 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06 |
| 18 | local-ollama/deepseek-r1-1.5b | — | — | Imported | 2026-05-06 |
| 19 | local-ollama/gemma3-1b | — | — | Imported | 2026-05-06 |
| 20 | local-ollama/gemma3-270m | — | — | Imported | 2026-05-06 |
| 21 | local-ollama/granite3.1-dense-2b | — | — | Imported | 2026-05-06 |
| 22 | local-ollama/llama3.2-latest | — | — | Imported | 2026-05-06 |
| 23 | local-ollama/qwen2.5-0.5b | — | — | Imported | 2026-05-06 |
| 24 | local-ollama/qwen2.5-3b | — | — | Imported | 2026-05-06 |
| 25 | local-ollama/smollm2-1.7b | — | — | Imported | 2026-05-06 |
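
The row order above appears consistent with sorting by Quality Accuracy in descending order, placing rows without a score last, and breaking ties alphabetically by subject. The leaderboard does not document its ranking rule, so the following Python sketch treats that rule as an assumption; `rank_rows` and `ROWS` are hypothetical names used only for illustration, and the sample rows are copied from the table.

```python
from typing import Optional

# A few rows copied from the table above: (subject, quality accuracy or None).
ROWS: list[tuple[str, Optional[float]]] = [
    ("local-ollama/gemma3-1b", None),
    ("openai/gpt-5.4-mini", 86.40),
    ("openai/gpt-5-mini", 89.70),
    ("openai/gpt-4.1-mini", 86.40),
    ("local-ollama/deepseek-r1-1.5b", None),
    ("openai/gpt-4.1-nano", 87.20),
]


def rank_rows(rows):
    """Assumed rule: score descending, missing scores last, ties by subject."""
    ordered = sorted(
        rows,
        # False sorts before True, so scored rows come first;
        # the negated score gives descending order among scored rows.
        key=lambda r: (r[1] is None, -(r[1] or 0.0), r[0]),
    )
    return [(i + 1, subject, score) for i, (subject, score) in enumerate(ordered)]


if __name__ == "__main__":
    for rank, subject, score in rank_rows(ROWS):
        shown = "n/a" if score is None else f"{score:.2f}"
        print(rank, subject, shown)
```

Under this assumed rule, tied rows such as openai/gpt-4.1-mini and openai/gpt-5.4-mini still receive consecutive ranks rather than a shared rank, which matches how the table numbers them.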