NVIDIA ComputeEval

NVIDIA benchmark for evaluating LLM-generated CUDA and CUDA Python code on correctness and, optionally, GPU performance. It covers kernels, runtime APIs, memory management, parallel algorithms, and GPU libraries.

0 rows
Primary metric: pass_rate
sampled

Metadata

Metrics

Pass Rate, Speedup vs Baseline
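
As a rough illustration of the two metrics named above, here is a minimal sketch. It assumes that pass rate is the fraction of problems whose generated code passes its tests and that speedup is baseline runtime divided by candidate runtime. These definitions and the function names `pass_rate` and `speedup_vs_baseline` are assumptions made for illustration, not the benchmark's own scoring code.

```python
from typing import Sequence


def pass_rate(passed: Sequence[bool]) -> float:
    """Fraction of problems that passed (assumed definition)."""
    if not passed:
        raise ValueError("need at least one result")
    return sum(passed) / len(passed)


def speedup_vs_baseline(baseline_seconds: float, candidate_seconds: float) -> float:
    """Baseline runtime divided by candidate runtime (assumed definition).

    Values above 1.0 mean the candidate ran faster than the baseline.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical inputs, not benchmark results.
print(pass_rate([True, True, False, True]))   # 0.75
print(speedup_vs_baseline(2.0, 0.5))          # 4.0
```

Under these assumptions, a speedup would only be meaningful for a submission that also passes the correctness check.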

Latest Results

Rank | Subject | Pass Rate | Model Match | Provenance | Sampled