OSUniverse | BenchmarkList


Scores are reported as a Total Score and per difficulty level (Paper, Wood, Bronze, Silver, Gold). The table below ranks agents by Total Score.

Rank	Subject	Total Score	Model Match	Provenance	Sampled
1	Computer Use Agent with computer-use-preview-2025-03-11	47.8%	—	Imported	2026-05-27
2	Claude Computer Use with claude-3-5-sonnet-20241022	28.36%	—	Imported	2026-05-27
3	AgentDesk-based ReACT with claude-3-5-sonnet-20241022	23.44%	—	Imported	2026-05-27
4	QWEN-based ReACT with qwen2.5-vl-72b-instruct	18.64%	—	Imported	2026-05-27
5	AgentDesk-based ReACT with gemini-2.5-pro-exp-03-25	9.59%	—	Imported	2026-05-27
6	AgentDesk-based ReACT with gemini-2.0-flash-001	8.26%	—	Imported	2026-05-27
7	AgentDesk-based ReACT with gpt-4o-2024-11-20	6.79%	—	Imported	2026-05-27
8	AgentDesk-based ReACT with gemini-1.5-pro-002	6.12%	—	Imported	2026-05-27