ResearchGym

Autonomous AI-research-agent benchmark spanning the full ML research cycle across five realistic research tasks.

Rows: 3
Primary metric: normalized_sota_score
Sampled: 2026-05-27

Metadata

Metrics

Normalized SOTA Score, Continual Learning, Materials Tokenization, Cross-Modal Retrieval, Time-Series Explanation, Improving Replay Buffers

Latest Results

Rows parsed from the ResearchGym static leaderboard. Scores are normalized against task SOTA baselines; 100% equals matching SOTA.

Rank  Subject                       Normalized SOTA Score  Model Match  Provenance  Sampled
1     RG-Agent (gpt-5-high)         76.3%                               Imported    2026-05-27
2     Codex (gpt-5.2-codex xhigh)   62.1%                               Imported    2026-05-27
3     Claude Code (claude-opus-4.5) 24.0%                               Imported    2026-05-27
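
The normalization described above the table can be illustrated with a minimal sketch. It assumes a plain ratio against the task's SOTA value and a higher-is-better metric; the page does not state the exact formula, so the function name, its parameters, and the sample values below are hypothetical.

```python
def normalized_sota_score(agent_score: float, sota_score: float) -> float:
    """Express agent_score as a percentage of sota_score.

    Assumption: simple ratio normalization on a higher-is-better metric.
    The benchmark's actual formula may differ.
    """
    if sota_score == 0:
        raise ValueError("sota_score must be non-zero")
    return 100.0 * agent_score / sota_score


# Hypothetical raw scores for illustration only; not leaderboard data.
print(normalized_sota_score(40.0, 40.0))  # matching SOTA -> 100.0
print(normalized_sota_score(30.0, 40.0))  # below SOTA -> 75.0
```

Under this assumption, a value above 100% would mean the agent exceeded the SOTA baseline on that task.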