DS-1000
DS-1000: Measures model capability on data science code generation, using 1,000 problems that span seven Python libraries (Matplotlib, NumPy, Pandas, PyTorch, SciPy, Scikit-learn, TensorFlow).
5 rows
Primary metric: overall_mean
Sampled: 2026-05-27
Metadata
Metrics
Overall mean, Matplotlib mean, Numpy mean, Pandas mean, Pytorch mean, Scipy mean, Sklearn mean, Tensorflow mean, Difficult-Rewrite mean, Origin mean, Semantic mean, Surface mean
| Rank | Subject | Overall mean | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4-turbo-2024-04-09 | 0.539 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27 |
| 2 | gpt-4-0613 | 0.51 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27 |
| 3 | gpt-3.5-turbo-0125 | 0.394 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
| 4 | Codex002 | 0.388 | — | Imported | 2026-05-27 |
| 5 | gpt-3.5-turbo-0613 | 0.386 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
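
The rank order in the table can be reproduced by sorting the rows on the primary metric. The following is a minimal sketch, assuming the scores are copied by hand from the table; the `rows` list and the `rank_by_overall_mean` helper are illustrative names and not part of any leaderboard API.

```python
# Rows copied from the table above as (subject, overall mean) pairs.
rows = [
    ("gpt-4-turbo-2024-04-09", 0.539),
    ("gpt-4-0613", 0.51),
    ("gpt-3.5-turbo-0125", 0.394),
    ("Codex002", 0.388),
    ("gpt-3.5-turbo-0613", 0.386),
]


def rank_by_overall_mean(rows):
    """Return (rank, subject, score) tuples sorted by score, highest first."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


ranked = rank_by_overall_mean(rows)
for rank, name, score in ranked:
    print(f"{rank}. {name}: {score:.3f}")
```

Sorting on a different column (for example a per-library mean) would only require swapping the second element of each pair, assuming those values are available.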