Name: TTS Human Preferences: 2,700 Audio Pairs with 40,500 Annotations
Creator: datapointai
Published: 2026-03-06T15:14:03
Keywords: Size Categories: 1K<n<10K, Text To Speech, Library: polars, RLHF, Modality: audio, Optimized Parquet, Language: en, Modality: text, Audio Quality, Library: mlcroissant, Task Categories: audio-classification, Library: datasets, Library: pandas, Preference Data, License: CC BY 4.0, Parquet, Human Preferences, Region: US, DPO

Description

This dataset contains 2,700 pairs of text-to-speech audio renderings, each with 15 human preference annotations. Produced by datapointai and last updated in March 2026, it provides comparative naturalness judgments for two renderings of the same text prompt. At 15 judgments per pair, the collection totals 40,500 individual human judgments (2,700 × 15), which supports higher-confidence audio quality evaluation than single-rater labels.

Use Cases

Training Direct Preference Optimization (DPO) models using the human preference annotations to align TTS output
Benchmarking TTS engine performance by comparing the two audio renderings for naturalness
Reinforcement Learning from Human Feedback (RLHF) for improving audio synthesis quality
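As a rough illustration of the DPO use case above, the sketch below turns the votes for one audio pair into a chosen/rejected record. It assumes each vote is recorded as the string "a" or "b"; that encoding, the `PreferencePair` type, the `to_dpo_pair` helper, and the `min_share` threshold are all hypothetical and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreferencePair:
    """One DPO-style training record (hypothetical layout)."""
    prompt: str
    chosen: str      # identifier or path of the preferred clip
    rejected: str    # identifier or path of the other clip
    share: float     # fraction of votes that went to the chosen clip


def to_dpo_pair(prompt: str, clip_a: str, clip_b: str,
                votes: List[str], min_share: float = 0.6) -> Optional[PreferencePair]:
    """Return a chosen/rejected record, or None when raters are too split.

    `votes` is assumed to hold "a" or "b" per annotator (an assumption,
    not the dataset's documented encoding).
    """
    n = len(votes)
    if n == 0:
        return None
    count_a = sum(1 for v in votes if v == "a")
    count_b = n - count_a
    if count_a / n >= min_share:
        return PreferencePair(prompt, clip_a, clip_b, count_a / n)
    if count_b / n >= min_share:
        return PreferencePair(prompt, clip_b, clip_a, count_b / n)
    return None  # near-tie: drop rather than train on a noisy label


# Usage: 11 of 15 raters prefer clip A, so A becomes the "chosen" side.
example = to_dpo_pair("Hello there.", "a.wav", "b.wav", ["a"] * 11 + ["b"] * 4)
```

Dropping near-ties is one possible design choice; keeping them with a lower weight is another, and the threshold should be tuned to the downstream task.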

Strengths

High annotation density with 15 human judgments per record to mitigate individual rater bias
Total of 40,500 annotations across 2,700 unique audio pairs
Permissive CC BY 4.0 licensing, which allows commercial and research use with attribution

Limitations

Small sample of 2,700 audio pairs compared to large-scale TTS corpora
Coverage is likely limited to US-based English speakers, judging by the region and language metadata tags
'Naturalness' is a subjective label, so preferences can vary between raters even with 15 judgments per pair

Provenance

Source: datapointai
Collection Method: Human annotation of model-generated audio renderings
Freshness: Last updated March 2026.
Geography: United States

The dataset is provided in Parquet format and is optimized for use with the Hugging Face Datasets library. Users should account for the 15-fold redundancy in annotations when calculating consensus scores.
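One way to account for the 15 judgments per pair is to collapse them into a vote share with an uncertainty interval, rather than treating each judgment as an independent example. The sketch below uses a Wilson score interval on the share of votes for clip A. It assumes binary votes encoded as "a" or "b"; that encoding and the `consensus` helper are illustrative assumptions, not part of the dataset's documented schema.

```python
import math
from typing import Dict, List, Union


def consensus(votes: List[str], z: float = 1.96) -> Dict[str, Union[float, str]]:
    """Summarize one pair's votes as a share for clip A plus a Wilson interval.

    `votes` is assumed to contain only "a" or "b" (hypothetical encoding).
    z = 1.96 corresponds to an approximate 95% interval.
    """
    n = len(votes)
    if n == 0:
        raise ValueError("at least one vote is required")
    p = sum(1 for v in votes if v == "a") / n

    # Wilson score interval for a binomial proportion
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    low, high = center - half, center + half

    # Declare a winner only when the interval excludes an even split
    if low > 0.5:
        winner = "a"
    elif high < 0.5:
        winner = "b"
    else:
        winner = "undecided"
    return {"share_a": p, "low": low, "high": high, "winner": winner}


# Usage: a 12-3 split clears the even-split line, a 9-6 split does not.
clear = consensus(["a"] * 12 + ["b"] * 3)
close = consensus(["a"] * 9 + ["b"] * 6)
```

With only 15 votes the interval is wide, so a sizeable fraction of pairs may come out as "undecided"; a plain majority vote is a simpler alternative when every pair needs a label.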
