AIR-Bench
Audio instruction benchmark evaluating large audio-language models on generative comprehension across chat and foundation audio tasks.
Rows: 18
Primary metric: average
Sampled: 2026-05-27
Metadata
Metrics: Chat Average, Foundation Average, Speech, Sound, Music, Mixed Audio
| Rank | Subject | Average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen-Audio-Turbo (foundation) | 58.285 | — | Imported | 2026-05-27 |
| 2 | Qwen-Audio (foundation) | 54.595 | — | Imported | 2026-05-27 |
| 3 | Whisper+GPT-4 (foundation) | 53.5889 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27 |
| 4 | PandaGPT (foundation) | 39.72 | — | Imported | 2026-05-27 |
| 5 | SALMONN (foundation) | 36.53 | — | Imported | 2026-05-27 |
| 6 | BLSP (foundation) | 32.16 | — | Imported | 2026-05-27 |
| 7 | NExT-GPT (foundation) | 31.765 | — | Imported | 2026-05-27 |
| 8 | SpeechGPT (foundation) | 30.885 | — | Imported | 2026-05-27 |
| 9 | Qwen2-Audio (chat) | 6.93 | — | Imported | 2026-05-27 |
| 10 | Qwen-Audio-Turbo (chat) | 6.34 | — | Imported | 2026-05-27 |
| 11 | SALMONN (chat) | 6.11 | — | Imported | 2026-05-27 |
| 12 | Qwen-Audio (chat) | 6.08 | — | Imported | 2026-05-27 |
| 13 | Gemini-1.5-pro (chat) | 5.7 | — | Imported | 2026-05-27 |
| 14 | BLSP (chat) | 5.33 | — | Imported | 2026-05-27 |
| 15 | PandaGPT (chat) | 4.25 | — | Imported | 2026-05-27 |
| 16 | NExT-GPT (chat) | 4.13 | — | Imported | 2026-05-27 |
| 17 | SpeechGPT (chat) | 1.15 | — | Imported | 2026-05-27 |
| 18 | Macaw-LLM (chat) | 1.01 | — | Imported | 2026-05-27 |
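
The ranking above interleaves two tracks whose scores sit in very different ranges: the (foundation) rows fall between roughly 30 and 60, while the (chat) rows fall between 1 and 7. A minimal Python sketch, assuming the track always appears as a parenthesised suffix on the subject name, shows one way to rank models within each track instead of across both. The names `ROWS`, `split_subject`, and `rank_within_track` are hypothetical helpers introduced for illustration, and only a handful of rows are copied in.

```python
import re
from collections import defaultdict

# A few (subject, score) pairs copied from the table; not the full leaderboard.
ROWS = [
    ("Qwen-Audio-Turbo (foundation)", 58.285),
    ("Qwen-Audio (foundation)", 54.595),
    ("Qwen2-Audio (chat)", 6.93),
    ("Qwen-Audio-Turbo (chat)", 6.34),
    ("SALMONN (chat)", 6.11),
]


def split_subject(subject):
    """Split 'Model (track)' into (model, track).

    Assumption: the track is either 'foundation' or 'chat' and is always
    the final parenthesised part of the subject string.
    """
    match = re.fullmatch(r"(.+?)\s*\((foundation|chat)\)", subject)
    if match is None:
        raise ValueError(f"unrecognised subject format: {subject!r}")
    return match.group(1), match.group(2)


def rank_within_track(rows):
    """Group rows by track and rank each group by descending score."""
    by_track = defaultdict(list)
    for subject, score in rows:
        model, track = split_subject(subject)
        by_track[track].append((model, score))

    ranked = {}
    for track, entries in by_track.items():
        ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
        ranked[track] = [
            (rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)
        ]
    return ranked


if __name__ == "__main__":
    for track, entries in rank_within_track(ROWS).items():
        print(track)
        for rank, model, score in entries:
            print(f"  {rank}. {model}: {score}")
```

Ranking per track keeps each comparison on a single scale, so a chat score is never ordered against a foundation score.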