AudioTokenBench: 3,150 Audio Samples for Tokenizer Evaluation

Name: AudioTokenBench: 3,150 Audio Samples for Tokenizer Evaluation
Creator: bosonai
Published: 2025-07-28T18:34:15
Keywords: Sound Events, Audiophile, Benchmark, Audio, Audio Evaluation

Description

AudioTokenBench contains 3,150 audio samples at 24 kHz. It was created by bosonai and last updated on 2025-07-28. The dataset is designed for evaluating the HiggsTokenizer and has four subsets: Speech, Music, Sound Event, and Audiophile. The Speech, Music, and Sound Event subsets each contain 1,000 ten-second clips; the Audiophile subset contains 150 thirty-second high-fidelity clips.
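The subset figures above imply a total corpus size that can be checked with a little arithmetic. The sketch below only restates the counts and durations given in the description; the names `SUBSETS` and `corpus_summary` are illustrative, and the sample count is per channel because the channel layout is not documented.

```python
# Clip counts and per-clip durations (seconds) as stated in the description.
SUBSETS = {
    "Speech": (1000, 10),
    "Music": (1000, 10),
    "Sound Event": (1000, 10),
    "Audiophile": (150, 30),
}
SAMPLE_RATE_HZ = 24_000  # stated sample rate


def corpus_summary(subsets, sample_rate_hz):
    """Aggregate clip count, total duration, and samples per channel."""
    total_clips = sum(count for count, _ in subsets.values())
    total_seconds = sum(count * seconds for count, seconds in subsets.values())
    return {
        "clips": total_clips,
        "seconds": total_seconds,
        "hours": total_seconds / 3600,
        # Channel count is not documented, so this is per channel.
        "samples_per_channel": total_seconds * sample_rate_hz,
    }


summary = corpus_summary(SUBSETS, SAMPLE_RATE_HZ)
print(summary)
```

Under these figures the four subsets add up to the advertised 3,150 clips, or a little under ten hours of audio.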

Use Cases

Benchmark audio tokenizer reconstruction quality on 24 kHz audio.
Compare model performance across audio domains using the Speech, Music, Sound Event, and Audiophile subsets.
Test audio generation fidelity on the curated high-fidelity Audiophile clips.
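As a rough illustration of the benchmarking use cases above, the sketch below scores a reconstruction against its reference clip and averages the score per subset. The dataset card does not say which metrics are intended, so plain signal-to-noise ratio is used here purely as a stand-in; `snr_db` and `mean_snr_by_subset` are hypothetical helper names, and the input is a synthetic tone rather than real dataset audio.

```python
import math


def snr_db(reference, reconstruction):
    """Signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("sequences must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, reconstruction))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent error
    return 10.0 * math.log10(signal_power / noise_power)


def mean_snr_by_subset(items):
    """Average SNR per subset; items yields (subset, reference, reconstruction)."""
    totals = {}
    for subset, reference, reconstruction in items:
        running_sum, count = totals.get(subset, (0.0, 0))
        totals[subset] = (running_sum + snr_db(reference, reconstruction), count + 1)
    return {subset: s / n for subset, (s, n) in totals.items()}


# Synthetic stand-in: 0.1 s of a 440 Hz tone at 24 kHz, "reconstructed"
# with a 10% amplitude loss (this is not real tokenizer output).
tone = [math.sin(2 * math.pi * 440 * n / 24_000) for n in range(2_400)]
attenuated = [0.9 * x for x in tone]
scores = mean_snr_by_subset([("Speech", tone, attenuated)])
print(scores)
```

In practice a perceptual or spectral metric is usually more informative than waveform SNR for tokenizer evaluation; the per-subset aggregation pattern stays the same whichever metric is substituted.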

Strengths

Contains 3,150 total audio samples, providing a substantial evaluation corpus.
Includes four distinct subsets (Speech, Music, Sound Event, Audiophile) for domain-specific testing.
Audiophile subset features 150 thirty-second clips curated from high-fidelity test discs.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Per-subset clip counts are stated only in the description; no row-level breakdown is available to confirm them, which may limit suitability assessment.
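Since the fields and counts have to be checked after download, one simple sanity check is to read each clip's header and compare it with the stated 24 kHz rate and clip length. The sketch below assumes the clips are PCM WAV files, which the dataset card does not confirm; `describe_wav` and `matches_expected` are hypothetical helpers, and the demo writes its own silent file instead of touching the dataset.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / rate,
        }


def matches_expected(info, expected_rate=24_000, expected_duration_s=10.0,
                     tolerance_s=0.05):
    """Check a clip against the rate and duration stated in the description."""
    return (info["sample_rate_hz"] == expected_rate
            and abs(info["duration_s"] - expected_duration_s) <= tolerance_s)


# Demo on a synthetic file: 10 s of 16-bit mono silence at 24 kHz.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24_000)
        wf.writeframes(b"\x00\x00" * 24_000 * 10)
    info = describe_wav(demo_path)

print(info, matches_expected(info))
```

For the Audiophile subset, `expected_duration_s=30.0` would be the corresponding setting. The standard-library `wave` module only reads uncompressed PCM, so a different reader would be needed if the files turn out to be in another format.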

Provenance

Source: Samples sourced from DAPS, MUSDB, AudioSet, and high-fidelity test discs.
Collection Method: Clips were randomly sampled from source datasets or curated from test discs.
Time Range: Not specified
Freshness: Last updated 2025-07-28 22:03:10; freshness should be verified.
Geography: Not specified

License is unknown; restrictions should be verified before use.

Quality Score

Overall: 39 (grade D)
Description: 39
Source: 39
Reputation: 46
Access: 26

Community

Downloads: 69
Likes: 8
Views: 0

Dataset Info

Author: bosonai
Created: Jul 28, 2025
Updated: Jul 28, 2025
Last synced: Jun 8, 2026
