Global PIQA
Global PIQA is a multilingual commonsense reasoning benchmark that evaluates physical interaction knowledge across 100 languages and cultures. It tests how well AI systems understand the physical world in diverse cultural contexts, using multiple-choice questions about everyday situations that require physical commonsense.
17 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Normalized Score
Showing 2 latest source slices.
Slice sampled 2026-05-28 (scores reported as percentages)

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | 91.4% | Qwen3.7 Max (qwen-qwen3.7-max) | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.6 Max | 91.2% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Self-reported | 2026-05-28 |
| 3 | DeepSeek V4 Pro Max | 90.5% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Self-reported | 2026-05-28 |
| 4 | Qwen3.6 Plus | 89.8% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Self-reported | 2026-05-28 |
| 5 | GLM-5.1 Thinking | 89.5% | GLM 5.1 (z-ai-glm-5.1) | Self-reported | 2026-05-28 |
| 6 | Kimi K2.6 Thinking | 89.2% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Self-reported | 2026-05-28 |

Slice sampled 2026-05-06 (scores reported as fractions)

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | 0.93 | Gemini 3 (google-gemini-3) | Self-reported | 2026-05-06 |
| 2 | Gemini 3 Flash | 0.93 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Self-reported | 2026-05-06 |
| 3 | Qwen3.6 Plus | 0.90 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Self-reported | 2026-05-06 |
| 3 | Qwen3.5-397B-A17B | 0.90 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Self-reported | 2026-05-06 |
| 5 | Qwen3.5-122B-A10B | 0.88 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Self-reported | 2026-05-06 |
| 6 | Qwen3.5-27B | 0.88 | Qwen3.5-27B (qwen-qwen3.5-27b) | Self-reported | 2026-05-06 |
| 7 | Qwen3.5-35B-A3B | 0.87 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Self-reported | 2026-05-06 |
| 8 | Qwen3.5-9B | 0.83 | Qwen3.5-9B (qwen-qwen3.5-9b) | Self-reported | 2026-05-06 |
| 9 | Qwen3.5-4B | 0.79 | — | Self-reported | 2026-05-06 |
| 10 | Qwen3.5-2B | 0.69 | — | Self-reported | 2026-05-06 |
| 11 | Qwen3.5-0.8B | 0.59 | — | Self-reported | 2026-05-06 |
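
The rows sampled on 2026-05-28 report scores as percentages (for example 91.4%), while the rows sampled on 2026-05-06 report fractions (for example 0.93). The sketch below shows one way to put both formats on a common 0 to 1 scale before comparing them. The helper name `to_fraction` is hypothetical, and this is not necessarily how the page's "Normalized Score" metric is defined. Because every score is self-reported and the two slices use different precision, any cross-slice ordering should be read as approximate.

```python
def to_fraction(score: str) -> float:
    """Convert a score string such as '91.4%' or '0.93' to a fraction in [0, 1].

    Hypothetical helper: assumes a trailing '%' marks a percentage and that
    anything else is already a fraction.
    """
    text = score.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


# A few rows copied from the table: (subject, score as shown, sampled date).
rows = [
    ("Qwen3.7 Max", "91.4%", "2026-05-28"),
    ("Claude Opus 4.6 Max", "91.2%", "2026-05-28"),
    ("Gemini 3 Pro", "0.93", "2026-05-06"),
    ("Qwen3.6 Plus", "0.90", "2026-05-06"),
]

# Put every score on the same scale, then sort from highest to lowest.
normalized = [(name, to_fraction(score), sampled) for name, score, sampled in rows]
normalized.sort(key=lambda row: row[1], reverse=True)

for name, value, sampled in normalized:
    print(f"{name:22s} {value:.3f}  (sampled {sampled})")
```

Qwen3.6 Plus appears in both slices (89.8% and 0.90), which gives a rough sense of how much a score can shift between samples or reporting precisions.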