Clotho-AQA

An audio question answering benchmark. It is listed under evaluations of temporal, video, speech, or audio understanding beyond static text and image inputs.

6 rows
accuracy (primary metric)
2026-05-27 (sampled)

Metadata

Metrics

Majority-votes accuracy, Unfiltered accuracy, Unanimous accuracy, Top-1 accuracy, Top-5 accuracy, Top-10 accuracy
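
The Top-1, Top-5, and Top-10 metrics listed above are conventionally computed by checking whether the correct answer appears among the k highest-scoring candidates. The following is a minimal sketch of that convention, not the paper's evaluation code: the function name `top_k_accuracy`, the score-matrix layout (one row of per-class scores per question), and the toy numbers are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> float:
    """Fraction of examples whose target class is among the k highest scores.

    Hypothetical helper: `scores[i][c]` is the model score for class `c`
    on example `i`, and `targets[i]` is the index of the correct class.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one example is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered by descending score; the sort is stable,
        # so ties are broken in favour of the lower class index.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (made up for illustration, three examples over three classes).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_targets = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_targets, k=1)
top2 = top_k_accuracy(toy_scores, toy_targets, k=2)
top3 = top_k_accuracy(toy_scores, toy_targets, k=3)
```

With k equal to the number of classes the metric is always 1.0, so Top-5 and Top-10 only carry information when the answer vocabulary is considerably larger than k.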

Latest Results

Rows are parsed from the baseline tables in the arXiv LaTeX source of the Clotho-AQA paper, covering the binary yes/no classifiers and the single-word multiclass classifiers.

Rank  Subject                                               Score  Model Match  Provenance  Sampled
1     Binary classifier (Question only)                     64.4%  -            Imported    2026-05-27
2     Binary classifier (Audio + question)                  63.2%  -            Imported    2026-05-27
3     Binary classifier (Audio only)                        58.2%  -            Imported    2026-05-27
4     Single-word multiclass classifier (Question only)     55.7%  -            Imported    2026-05-27
5     Single-word multiclass classifier (Audio + question)  54.2%  -            Imported    2026-05-27
6     Single-word multiclass classifier (Audio only)        3.2%   -            Imported    2026-05-27
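
The rows above are described as parsed from LaTeX tables. As a rough sketch of what such parsing can look like, the snippet below splits the body of a `tabular` environment into rows and cells. The function name `parse_latex_rows`, the sample table, and its placeholder values are hypothetical and are not taken from the paper's source; the sketch also assumes cells contain no escaped `\&` and no multi-column commands.

```python
import re


def parse_latex_rows(tabular_body: str) -> list[list[str]]:
    """Split the body of a LaTeX tabular into rows of stripped cell strings.

    Hypothetical helper: rows are assumed to end with a double backslash
    and cells to be separated by an unescaped ampersand.
    """
    rows = []
    for raw in tabular_body.split(r"\\"):
        # Drop horizontal-rule commands that sit between rows.
        line = re.sub(r"\\(hline|toprule|midrule|bottomrule)\b", "", raw).strip()
        if not line:
            continue
        rows.append([cell.strip() for cell in line.split("&")])
    return rows


# Placeholder table body (made up for illustration, not real results).
sample_body = r"""
\hline
Setting & Accuracy (\%) \\
\hline
Setting A & 12.5 \\
Setting B & 37.0 \\
\hline
"""

parsed = parse_latex_rows(sample_body)
header, data = parsed[0], parsed[1:]
# Convert the numeric column to floats, keeping the label as text.
results = {label: float(value) for label, value in data}
```

A real importer would also need to handle `\multicolumn`, `\multirow`, bold-face markup around best scores, and comments, all of which this sketch ignores.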