ALME Benchmark

Audio-text conflict benchmark measuring whether audio-language models follow the audio signal instead of conflicting text.

Rows: 1
Primary metric: tdr_all
Sampled: 2026-05-27

Metadata

Metrics

Text Dominance Ratio, all stimuli (lower is better)
Total stimuli
TDR by language (lower is better): EN, DE, FR, IT, PT, AR, JA, ZH
TDR by conflict type (lower is better): adjective_swap, negation_add, negation_remove, number_swap, time_swap

Latest Results

Rows are parsed from the reference-results Markdown tables in the public ALME README. TDR is the Text Dominance Ratio; lower is better because it measures how often a model follows the incorrect text rather than the correct audio.
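To make the metric concrete, here is a minimal sketch of how a text dominance ratio could be computed from per-stimulus outcomes. It assumes TDR is simply the fraction of conflicting stimuli on which the model's answer followed the text; the `Trial` record, its field names, and both helper functions are hypothetical and are not taken from the ALME code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One audio-text conflict stimulus (hypothetical record layout)."""
    language: str        # e.g. "EN", "DE"
    conflict_type: str   # e.g. "negation_add", "number_swap"
    followed_text: bool  # True if the model sided with the conflicting text


def text_dominance_ratio(trials):
    """Fraction of trials where the model followed the text (assumed definition)."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.followed_text for t in trials) / len(trials)


def tdr_by(trials, key):
    """Group trials by an attribute ('language' or 'conflict_type') and score each group."""
    groups = {}
    for t in trials:
        groups.setdefault(getattr(t, key), []).append(t)
    return {name: text_dominance_ratio(group) for name, group in groups.items()}
```

Under this assumed definition, the overall score corresponds to the "all stimuli" metric, and `tdr_by(trials, "language")` or `tdr_by(trials, "conflict_type")` would give the per-language and per-category breakdowns listed under Metrics.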

| Rank | Subject | Text Dominance Ratio, all stimuli | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Ultravox v0.6 | 48.8% | | Imported | 2026-05-27 |
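Since the rows are described as parsed from Markdown tables, the import step might look roughly like the sketch below. The sample table layout and its column names are assumptions made for illustration (the actual README tables may differ), and `parse_markdown_table` and `percent_to_fraction` are hypothetical helpers; only the single Ultravox v0.6 value comes from this page.

```python
def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in `text` into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row, e.g. |---|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def percent_to_fraction(value):
    """Convert a string such as '48.8%' to a float such as 0.488."""
    return float(value.rstrip("%")) / 100


# Hypothetical table layout; the column names are illustrative only.
SAMPLE = """
| Model | TDR (all) |
|---|---|
| Ultravox v0.6 | 48.8% |
"""

rows = parse_markdown_table(SAMPLE)
score = percent_to_fraction(rows[0]["TDR (all)"])
```

Keeping the percentage as a fraction makes it straightforward to sort rows in ascending order, which matches the "lower is better" ranking.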