React Native Evals
Open evaluation framework from Callstack measuring AI model performance on real-world React Native development tasks across navigation, animation, async state, lists, and React Native APIs.
19 rows
Primary metric: overall_score_pct
Sampled: 2026-05-28
Metadata
Metrics
Overall Score, Requirements Passed, Requirements Total, Tokens Used (lower is better), Cost (lower is better), Navigation Score, Animation Score, Async State Score, Lists Score, React Native APIs Score
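The metric list marks Tokens Used and Cost as "lower is better", so ranking by a metric has to respect its direction. Below is a minimal sketch of that idea, assuming rows are available as plain dictionaries. The field names `tokens_used` and `cost`, the `rank` helper, and the three placeholder rows are all hypothetical and invented for illustration; only `overall_score_pct` appears on this page, and none of the numbers are leaderboard data.

```python
# Metrics the page marks as "lower is better"; every other metric is
# treated as higher-is-better. Field names here are assumptions.
LOWER_IS_BETTER = {"tokens_used", "cost"}


def rank(rows, metric):
    """Return rows sorted best-first for the given metric."""
    higher_is_better = metric not in LOWER_IS_BETTER
    return sorted(rows, key=lambda row: row[metric], reverse=higher_is_better)


# Placeholder rows, invented for illustration only (not leaderboard data).
rows = [
    {"subject": "model-a", "overall_score_pct": 80.0, "tokens_used": 1200, "cost": 0.30},
    {"subject": "model-b", "overall_score_pct": 90.0, "tokens_used": 2500, "cost": 0.75},
    {"subject": "model-c", "overall_score_pct": 70.0, "tokens_used": 900, "cost": 0.10},
]

by_score = [row["subject"] for row in rank(rows, "overall_score_pct")]
by_tokens = [row["subject"] for row in rank(rows, "tokens_used")]
print(by_score)
print(by_tokens)
```

With these placeholder rows, sorting by score puts the highest-scoring subject first, while sorting by tokens puts the most frugal subject first, so the same subject can sit at opposite ends of the two orderings.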
| Rank | Subject | Overall Score (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Composer 2 | 96.0806 | — | Imported | 2026-05-28 |
| 2 | Composer 2 Fast | 94.8718 | — | Imported | 2026-05-28 |
| 3 | Apex | 87.8022 | — | Imported | 2026-05-28 |
| 4 | GPT 5.4 | 85.348 | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-28 |
| 5 | GPT 5.5 | 84.652 | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-28 |
| 6 | Claude Opus 4.6 | 84.1026 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Imported | 2026-05-28 |
| 7 | Claude Opus 4.7 | 82.7839 | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 8 | Claude Sonnet 4.6 | 80.6227 | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
| 9 | Gemini 3.1 Pro Preview | 78.9011 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-28 |
| 10 | Kimi K2.5 | 77.1795 | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Imported | 2026-05-28 |
| 11 | Gemma 4 31B It | 75.2381 | Gemma 4 31B (`google-gemma-4-31b-it`) | Imported | 2026-05-28 |
| 12 | GLM 5 | 74.8352 | GLM 5 (`z-ai-glm-5`) | Imported | 2026-05-28 |
| 13 | Grok 4 | 72.6277 | Grok 4 (`x-ai-grok-4`) | Imported | 2026-05-28 |
| 14 | GPT OSS 120B | 71.6289 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-28 |
| 15 | DeepSeek V3.2 | 71.4827 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-28 |
| 16 | Minimax M2.7 | 71.3553 | MiniMax M2.7 (`minimax-minimax-m2.7`) | Imported | 2026-05-28 |
| 17 | GPT OSS 20B | 71.0222 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-28 |
| 18 | Qwen2.5 Coder 32B Instruct | 51.2244 | Qwen2.5 Coder 32B Instruct (`qwen-qwen-2.5-coder-32b-instruct`) | Imported | 2026-05-28 |
| 19 | DeepSeek R1 Distill Qwen 32B | 44.359 | R1 Distill Qwen 32B (`deepseek-deepseek-r1-distill-qwen-32b`) | Imported | 2026-05-28 |