OfficeQA Pro

OpenAI launch-post benchmark for professional office question-answering tasks.

4 rows
Primary metric: score
Sampled: 2026-04-23

Metrics

Score

Latest Results

These are provider-published benchmark scores parsed from the evaluation tables in OpenAI's launch post. Rows are marked self-reported and should be read as the source's claims unless independently reproduced. OpenAI notes that the GPT evaluations were run with reasoning effort set to xhigh in a research environment.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | GPT-5.5 | 54.1% | GPT-5.5 (openai-gpt-5.5) | Launch post | 2026-04-23
2 | GPT-5.4 | 53.2% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-23
3 | Claude Opus 4.7 | 43.6% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Launch post | 2026-04-23
4 | Gemini 3.1 Pro Preview | 18.1% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-23
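
To make the spread between the listed scores easier to read, the rows can be loaded into a small structure and compared against the top score. The following is a minimal sketch, not part of the leaderboard itself: the `Row` class, the `self_reported` field, and the `gaps_to_leader` helper are hypothetical names chosen for illustration, and the scores are copied from the four rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    subject: str
    score_pct: float
    # Assumption: every row here comes from the launch post, so all are
    # treated as self-reported claims rather than reproduced results.
    self_reported: bool = True


# Scores copied from the results listed above.
ROWS = [
    Row("GPT-5.5", 54.1),
    Row("GPT-5.4", 53.2),
    Row("Claude Opus 4.7", 43.6),
    Row("Gemini 3.1 Pro Preview", 18.1),
]


def gaps_to_leader(rows):
    """Return (subject, percentage-point gap to the top score), best first."""
    ranked = sorted(rows, key=lambda r: r.score_pct, reverse=True)
    leader = ranked[0].score_pct
    # Round to one decimal to hide floating-point noise in the subtraction.
    return [(r.subject, round(leader - r.score_pct, 1)) for r in ranked]


if __name__ == "__main__":
    for subject, gap in gaps_to_leader(ROWS):
        print(f"{subject}: {gap:.1f} points behind the leader")
```

The gaps are in percentage points, not relative percentages, and they inherit the self-reported caveat: they compare the source's claims, not independently reproduced results.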