K-12EduBench

Metadata

Metrics: Lambda_js Avg, Accuracy Avg. The table below is ranked by Lambda_js Avg in descending order.

Rank	Model	Lambda_js Avg	Matched Model (ID)	Provenance	Sampled
1	Doubao-Pro-32K	81.67	—	Imported	2026-05-27
2	DeepSeek-V3	79.67	DeepSeek V3 (deepseek-deepseek-chat)	Imported	2026-05-27
3	Doubao-Lite-32K	74.22	—	Imported	2026-05-27
4	GeneralV3.5	72.83	—	Imported	2026-05-27
5	ERNIE-Bot	71.71	—	Imported	2026-05-27
6	EduChat-R1-32B	71.41	—	Imported	2026-05-27
7	Baichuan4-Air	71.28	—	Imported	2026-05-27
8	GLM-4-AirX	70.28	—	Imported	2026-05-27
9	Yi-Lightning	69.87	—	Imported	2026-05-27
10	DeepSeek-R1	69.13	R1 (deepseek-r1)	Imported	2026-05-27
11	Hunyuan-Standard	68.28	—	Imported	2026-05-27
12	Gemini-1.5-Pro	67.88	—	Imported	2026-05-27
13	Qwen-Turbo	65.31	Qwen-Turbo (qwen-qwen-turbo)	Imported	2026-05-27
14	Grok-2	64.10	—	Imported	2026-05-27
15	Gemini-2.0-Flash	64.05	Gemini 2.0 Flash (google-gemini-2.0-flash)	Imported	2026-05-27
16	Grok-3	63.91	Grok 3 (xaigrok-3)	Imported	2026-05-27
17	Claude-3.7-Sonnet	61.20	Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet)	Imported	2026-05-27
18	GPT-4-Turbo	55.94	GPT-4 Turbo (openai-gpt-4-turbo)	Imported	2026-05-27
19	O1-Mini	54.46	—	Imported	2026-05-27
20	LLaMA-3.1-70B	49.65	—	Imported	2026-05-27
21	Claude-3.5-Haiku	44.98	Claude 3.5 Haiku (anthropic-claude-3.5-haiku)	Imported	2026-05-27
22	LLaMA-3.1-8B	21.91	—	Imported	2026-05-27
23	EduChat-SFT-13B	19.64	—	Imported	2026-05-27
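For readers reusing these numbers, the following is a minimal sketch of how the leaderboard rows could be parsed and checked, assuming the Rank, Model, and Lambda_js Avg columns have been copied out as tab-separated text. It only reproduces a subset of rows from the table above, and the helper names (`parse_rows`, `ranks_consistent`) are hypothetical, not part of any K-12EduBench tooling. It confirms that the published ranks agree with a descending sort by Lambda_js Avg and computes the spread between the highest and lowest score in the subset.

```python
# Hypothetical sketch: parse (rank, model, Lambda_js Avg) rows copied from the
# leaderboard and check that published ranks match a descending sort by score.
# Only a subset of the table's rows is reproduced here.

ROWS = """\
1\tDoubao-Pro-32K\t81.67
2\tDeepSeek-V3\t79.67
3\tDoubao-Lite-32K\t74.22
17\tClaude-3.7-Sonnet\t61.20
22\tLLaMA-3.1-8B\t21.91
23\tEduChat-SFT-13B\t19.64
"""


def parse_rows(text: str) -> list[tuple[int, str, float]]:
    """Split each tab-separated line into (rank, model, score)."""
    parsed = []
    for line in text.strip().splitlines():
        rank, model, score = line.split("\t")
        parsed.append((int(rank), model, float(score)))
    return parsed


def ranks_consistent(rows: list[tuple[int, str, float]]) -> bool:
    """True if ordering by published rank equals ordering by descending score."""
    by_rank = [model for _, model, _ in sorted(rows, key=lambda r: r[0])]
    by_score = [model for _, model, _ in sorted(rows, key=lambda r: -r[2])]
    return by_rank == by_score


rows = parse_rows(ROWS)
scores = [score for _, _, score in rows]
spread = round(max(scores) - min(scores), 2)

print(ranks_consistent(rows))  # True
print(spread)  # 62.03
```

The same functions apply unchanged to the full 23-row table; a rank that disagreed with the score order would make `ranks_consistent` return `False`.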