Dynamic-SUPERB

Dynamic speech-language-model benchmark and leaderboard for speech instruction following across many audio tasks.

10 rows
Primary metric: higher_better_percent_average
Sampled: 2026-05-27

Metadata

Metrics

Higher-Better Percent Average, Higher-Better Task Count

Latest Results

Each row is an aggregate that BenchmarkList derives from the public Dynamic-SUPERB CSV: the average of a model's task metrics that are percentage-valued and higher-is-better. The source CSV also contains lower-is-better metrics; these are preserved only in the source metadata and are not folded into the aggregate score.
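
The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not BenchmarkList's actual pipeline: the function name, the record keys ("value", "unit", "higher_is_better"), and the sample values are all hypothetical, and the real CSV's column layout may differ. The rounding to two decimals is likewise an assumption based on how scores are displayed in the table.

```python
from statistics import mean


def higher_better_percent_average(task_metrics):
    """Average one model's percentage-valued, higher-is-better metrics.

    Each item is a dict with hypothetical keys:
      "value" (float), "unit" (str), "higher_is_better" (bool).
    Returns (average, task_count), or (None, 0) if no metric qualifies.
    """
    kept = [
        m["value"]
        for m in task_metrics
        if m["higher_is_better"] and m["unit"] == "percent"
    ]
    if not kept:
        return None, 0
    # Rounding to two decimals is an assumption, not a documented rule.
    return round(mean(kept), 2), len(kept)


# Made-up values for illustration only; not taken from the Dynamic-SUPERB CSV.
example = [
    {"task": "task_a", "value": 50.0, "unit": "percent", "higher_is_better": True},
    {"task": "task_b", "value": 30.0, "unit": "percent", "higher_is_better": True},
    # Lower-is-better metric (for example an error rate): excluded.
    {"task": "task_c", "value": 12.5, "unit": "percent", "higher_is_better": False},
    # Not percentage-valued: excluded.
    {"task": "task_d", "value": 0.8, "unit": "score", "higher_is_better": True},
]

avg, count = higher_better_percent_average(example)
print(avg, count)
```

Returning the count alongside the average mirrors the two listed metrics (Higher-Better Percent Average and Higher-Better Task Count). The count matters when comparing rows, since an average over few tasks is not directly comparable to one over many.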

Rank  Subject                  Higher-Better Percent Average  Model Match  Provenance  Sampled
1     DeSTA2.5-Audio           46.18                                       Imported    2026-05-27
2     Qwen2-Audio-7B-Instruct  39.88                                       Imported    2026-05-27
3     Qwen-Audio-Chat          36.59                                       Imported    2026-05-27
4     Whisper-LLaMA            31.62                                       Imported    2026-05-27
5     SALMONN-13B              29.99                                       Imported    2026-05-27
6     SALMONN-7B               29.46                                       Imported    2026-05-27
7     WavLLM                   29.10                                       Imported    2026-05-27
8     MU-LLaMA                 22.25                                       Imported    2026-05-27
9     LTU-AS                   18.84                                       Imported    2026-05-27
10    GAMA-IT                  17.70                                       Imported    2026-05-27