GISA
General information-seeking assistant benchmark for structured item, set, list, and table answers from search-capable systems.
16 rows
Primary metric: Overall
Sampled: 2026-05-27
Metrics
Overall, Item Exact Match, Set Exact Match, Set F1, List Exact Match, List F1, List Order, Table Exact Match, Table Row F1, Table Item F1
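The metric list above names set-level scores without defining them. The sketch below shows one common way to compute a set exact match and a set F1. It assumes that Set Exact Match is an all-or-nothing comparison of the predicted and gold sets, that Set F1 is the usual element-level harmonic mean of precision and recall, and that items are compared after a simple normalization. The `normalize` rule and the function names are hypothetical. GISA's actual normalization and matching rules may differ.

```python
from collections.abc import Iterable


def normalize(item: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    return " ".join(item.lower().split())


def set_exact_match(pred: Iterable[str], gold: Iterable[str]) -> float:
    # 1.0 only if the normalized predicted set equals the normalized gold set.
    return float({normalize(p) for p in pred} == {normalize(g) for g in gold})


def set_f1(pred: Iterable[str], gold: Iterable[str]) -> float:
    # Element-level F1 over normalized, de-duplicated items.
    p = {normalize(x) for x in pred}
    g = {normalize(x) for x in gold}
    if not p and not g:
        return 1.0  # assumption: two empty sets count as a perfect match
    tp = len(p & g)  # items present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


pred = ["Paris", "Lyon", "Nice"]
gold = ["paris", "lyon", "Marseille"]
print(set_exact_match(pred, gold), set_f1(pred, gold))
```

In this usage example, two of the three predicted items match the gold set, so precision and recall are both 2/3 and the F1 is 2/3. The exact match is 0.0 because the sets differ. Partial credit like this is why a system can score noticeably higher on an F1 variant than on the corresponding exact-match metric.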
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 4.5 Sonnet (thinking) | 19.3 | — | Imported | 2026-05-27 |
| 2 | Qwen3-Max (thinking) | 17.96 | — | Imported | 2026-05-27 |
| 3 | Claude 4.5 Sonnet (non-thinking) | 16.36 | — | Imported | 2026-05-27 |
| 4 | GPT-5.2 (thinking) | 15.82 | — | Imported | 2026-05-27 |
| 5 | Kimi K2.5 (thinking) | 15.55 | — | Imported | 2026-05-27 |
| 6 | Gemini 3 Pro (high) | 15.28 | — | Imported | 2026-05-27 |
| 7 | Gemini 3 Pro (low) | 14.74 | — | Imported | 2026-05-27 |
| 8 | DeepSeek-V3.2 (thinking) | 14.47 | — | Imported | 2026-05-27 |
| 9 | GLM-4.7 (thinking) | 14.21 | — | Imported | 2026-05-27 |
| 10 | Seed-1.8 (thinking) | 13.4 | — | Imported | 2026-05-27 |
| 11 | DeepSeek-V3.2 (non-thinking) | 11.53 | — | Imported | 2026-05-27 |
| 12 | Qwen3-235B-A22B (thinking) | 9.65 | — | Imported | 2026-05-27 |
| 13 | Google Search AI Mode | 9.38 | — | Imported | 2026-05-27 |
| 14 | OpenAI o4 Mini Deep Research | 7.78 | o4 Mini Deep Research (openai-o4-mini-deep-research) | Imported | 2026-05-27 |
| 15 | Perplexity Sonar Pro Search | 7.51 | Sonar Pro Search (perplexity-sonar-pro-search) | Imported | 2026-05-27 |
| 16 | GPT-4o Search Preview | 5.63 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |