Seneca-TRBench

Seneca-TRBench evaluates the Turkish-language proficiency of LLMs using multiple-choice questions (MCQ) on structural linguistics and short-answer questions (SAQ) judged by GPT-4o.

Rows: 5
Primary metric: combined_score
Sampled: 2026-05-06

Metadata

Metrics

Combined Score, MCQ Score, SAQ Score

Latest Results

Rows are parsed from the public dataset README Top Models table. The README reports 20 evaluated models but publishes only the top five rows in this table.
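The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the importer actually used for this page: the function name `parse_markdown_table`, the sample README excerpt, and its column layout are all assumptions, and the real README table may carry additional columns.

```python
from typing import Dict, List

# Hypothetical README excerpt. The column layout is an assumption;
# the scores are the ones listed on this page.
SAMPLE_README = """\
## Top Models

| Rank | Model | Combined Score |
|------|-------|----------------|
| 1 | gpt-5 | 93.50 |
| 2 | gpt-5-nano | 92.90 |
| 3 | gpt-5-mini | 92.40 |

Some trailing prose.
"""


def _cells(row: str) -> List[str]:
    """Split one pipe-delimited row into trimmed cell strings."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def parse_markdown_table(text: str, heading: str) -> List[Dict[str, str]]:
    """Return the data rows of the first pipe table found after `heading`."""
    lines = text.splitlines()
    # Find the heading line, ignoring leading '#' markers.
    start = next(
        (i for i, line in enumerate(lines)
         if line.strip().lstrip("#").strip() == heading),
        None,
    )
    if start is None:
        return []

    table: List[str] = []
    for line in lines[start + 1:]:
        stripped = line.strip()
        if stripped.startswith("|"):
            table.append(stripped)
        elif table:
            break  # first non-table line after the table ends it

    if len(table) < 2:
        return []
    header = _cells(table[0])
    # table[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, _cells(row))) for row in table[2:]]


rows = parse_markdown_table(SAMPLE_README, "Top Models")
```

Each entry in `rows` is a dictionary keyed by the header cells, with values kept as strings, so a caller would convert `Combined Score` to a float before ranking or comparing.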

Rank  Subject                     Combined Score  Model Match                                      Provenance  Sampled
1     gpt-5                       93.50           GPT-5 (openai-gpt-5)                             Imported    2026-05-06
2     gpt-5-nano                  92.90           GPT-5 Nano (openai-gpt-5-nano)                   Imported    2026-05-06
3     gpt-5-mini                  92.40           GPT-5 Mini (openai-gpt-5-mini)                   Imported    2026-05-06
4     claude-opus-4-1-20250805    90.06           Claude Opus 4.1 (anthropic-claude-opus-4.1)      Imported    2026-05-06
5     claude-sonnet-4-5-20250929  88.78           Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5)  Imported    2026-05-06