DS-1000

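DS-1000 measures model capability on Python data science code generation. Scores are reported overall, per library (Matplotlib, NumPy, Pandas, PyTorch, SciPy, Scikit-learn, TensorFlow), and per perturbation type (Origin, Surface, Semantic, Difficult-Rewrite).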

Rows: 5
Primary metric: overall_mean
Sampled: 2026-05-27

Metadata

Metrics

Overall mean, Matplotlib mean, Numpy mean, Pandas mean, Pytorch mean, Scipy mean, Sklearn mean, Tensorflow mean, Difficult-Rewrite mean, Origin mean, Semantic mean, Surface mean

Latest Results

Rows are parsed from the public DS-1000 result summary text files checked into the official repository.

Rank | Subject | Overall mean | Model Match | Provenance | Sampled
1 | gpt-4-turbo-2024-04-09 | 0.539 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27
2 | gpt-4-0613 | 0.51 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
3 | gpt-3.5-turbo-0125 | 0.394 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
4 | Codex002 | 0.388 | - | Imported | 2026-05-27
5 | gpt-3.5-turbo-0613 | 0.386 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
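
The Rank column is consistent with sorting rows by Overall mean in descending order. The following is a minimal sketch of that ordering, using only the five scores listed above. The names `Row` and `rank_rows` are hypothetical and are not taken from the DS-1000 repository or from this site's import code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical structure for illustration)."""
    subject: str
    overall_mean: float


# Scores copied from the results listed above, deliberately out of order.
ROWS = [
    Row("gpt-3.5-turbo-0613", 0.386),
    Row("gpt-4-0613", 0.51),
    Row("Codex002", 0.388),
    Row("gpt-4-turbo-2024-04-09", 0.539),
    Row("gpt-3.5-turbo-0125", 0.394),
]


def rank_rows(rows):
    """Return (rank, subject, overall_mean) tuples, highest score first."""
    ordered = sorted(rows, key=lambda r: r.overall_mean, reverse=True)
    return [(i, r.subject, r.overall_mean) for i, r in enumerate(ordered, start=1)]


ranked = rank_rows(ROWS)
for rank, subject, score in ranked:
    print(f"{rank} {subject} {score:.3f}")
```

Sorting on the primary metric alone is an assumption. No tie-breaking rule is stated, and none is needed for these five rows because all scores are distinct.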