ResearchGym
Autonomous AI-research-agent benchmark spanning the full ML research cycle across five realistic research tasks.
Rows: 3
Primary metric: normalized_sota_score
Sampled: 2026-05-27
Metrics
Normalized SOTA Score, Continual Learning, Materials Tokenization, Cross-Modal Retrieval, Time-Series Explanation, Improving Replay Buffers
| Rank | Subject | Normalized SOTA Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | RG-Agent (gpt-5-high) | 76.3% | — | Imported | 2026-05-27 |
| 2 | Codex (gpt-5.2-codex xhigh) | 62.1% | — | Imported | 2026-05-27 |
| 3 | Claude Code (claude-opus-4.5) | 24.0% | — | Imported | 2026-05-27 |