AudioBench
Audio-language benchmark covering speech, sound, music, ASR, QA, translation, and audio reasoning tasks across many task-level metrics.
13 rows
Primary metric: aggregate_non_wer_score
Sampled: 2026-05-27
Metadata
Metrics
Aggregate Non-WER Score, Average Llama 3 70B Judge, Average GPT-4o Judge, Average String Match, Average BLEU, Average METEOR, Average WER (lower is better)
| Rank | Subject | Aggregate Non-WER Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4o-audio | 57.1219 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-27 |
| 2 | gemini-1.5-flash | 47.951 | — | Imported | 2026-05-27 |
| 3 | MERaLiON-AudioLLM-Whisper-SEA-LION | 47.8887 | — | Imported | 2026-05-27 |
| 4 | phi_4_multimodal_instruct | 37.3843 | — | Imported | 2026-05-27 |
| 5 | seallms_audio_7b | 36.7351 | — | Imported | 2026-05-27 |
| 6 | Qwen2-Audio-7B-Instruct | 36.6041 | — | Imported | 2026-05-27 |
| 7 | cascade_whisper_large_v3_llama_3_8b_instruct | 36.3648 | — | Imported | 2026-05-27 |
| 8 | cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct | 32.4654 | — | Imported | 2026-05-27 |
| 9 | Qwen-Audio-Chat | 30.5138 | — | Imported | 2026-05-27 |
| 10 | Marco-LLM-ST | 30.0295 | — | Imported | 2026-05-27 |
| 11 | SALMONN_7B | 29.4734 | — | Imported | 2026-05-27 |
| 12 | WavLLM_fairseq | 20.4118 | — | Imported | 2026-05-27 |
| 13 | whisper_large_v3 | 13.8762 | — | Imported | 2026-05-27 |
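
The metric list marks Average WER as lower-is-better, while the table above is ordered by Aggregate Non-WER Score, where higher is better. A minimal sketch of direction-aware ranking follows. The `rank` helper and its `higher_is_better` flag are hypothetical names introduced for illustration, not part of AudioBench or the leaderboard's tooling. The scores are copied from the table, and no WER values are used because none appear here.

```python
# (subject, Aggregate Non-WER Score) pairs copied from the table above.
ROWS = [
    ("gpt-4o-audio", 57.1219),
    ("gemini-1.5-flash", 47.951),
    ("MERaLiON-AudioLLM-Whisper-SEA-LION", 47.8887),
    ("phi_4_multimodal_instruct", 37.3843),
    ("seallms_audio_7b", 36.7351),
    ("Qwen2-Audio-7B-Instruct", 36.6041),
    ("cascade_whisper_large_v3_llama_3_8b_instruct", 36.3648),
    ("cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct", 32.4654),
    ("Qwen-Audio-Chat", 30.5138),
    ("Marco-LLM-ST", 30.0295),
    ("SALMONN_7B", 29.4734),
    ("WavLLM_fairseq", 20.4118),
    ("whisper_large_v3", 13.8762),
]


def rank(rows, higher_is_better=True):
    """Hypothetical helper: return (rank, subject, score) tuples, best first.

    Pass higher_is_better=False for error-rate metrics such as WER.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=higher_is_better)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


ranked = rank(ROWS)
for position, name, score in ranked:
    print(f"{position:>2}  {name}  {score}")
```

Calling `rank(rows, higher_is_better=False)` on a column of WER values would put the lowest error rate first, which is the ordering that metric calls for.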