Name: INTP: 250,000 Synthetic Speech Intelligibility Preference Pairs
Creator: amphion
Published: 2025-05-27T03:57:08
Keywords: Task Categories: text-to-speech, reinforcement learning; Languages: zh, en; Modalities: audio, text; Size Category: 100K<n<1M; Format: Parquet; Libraries: polars, dask, mlcroissant, datasets; License: CC BY-NC 4.0; arXiv: 2501.15442, 2409.00750, 2410.06885; Region: us

Description

Amphion's INTP dataset provides 250,000 synthetic speech preference pairs totaling over 2,000 hours of audio. The collection spans English and Chinese across diverse scenarios, including regular speech, repeated phrases, and code-switching, and is intended for speech intelligibility research.

Use Cases

Training reward models for Reinforcement Learning from Human Feedback (RLHF) in speech synthesis using preference pairs
Optimizing intelligibility in code-switching TTS systems using English-Chinese audio samples
Benchmarking cross-lingual synthesis quality across diverse synthetic domains
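For the reward-modeling use case, preference pairs are commonly consumed through the Bradley-Terry pairwise objective: a reward model scores the preferred ("chosen") and dispreferred ("rejected") sample of each pair, and the loss is the mean of -log(sigmoid(score_chosen - score_rejected)). The sketch below is a minimal, framework-free illustration that assumes you already have scalar scores from some hypothetical reward model; the function names are illustrative and are not part of the INTP release. It is not necessarily the objective used in the INTP publications, which should be consulted for their actual training method.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bradley_terry_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean negative log-likelihood that each chosen sample beats its paired rejected sample.

    `chosen[i]` and `rejected[i]` are scores a (hypothetical) reward model assigned
    to the two utterances of preference pair i.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("chosen and rejected must be non-empty and equal in length")
    total = 0.0
    for c, r in zip(chosen, rejected):
        total -= log_sigmoid(c - r)
    return total / len(chosen)


def preference_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the reward model ranks the chosen sample higher."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("chosen and rejected must be non-empty and equal in length")
    return sum(c > r for c, r in zip(chosen, rejected)) / len(chosen)


if __name__ == "__main__":
    chosen_scores = [2.0, 0.5]
    rejected_scores = [0.0, 1.0]
    print(bradley_terry_loss(chosen_scores, rejected_scores))
    print(preference_accuracy(chosen_scores, rejected_scores))
```

When the two scores are equal the loss is log(2), and it falls toward zero as the model assigns the chosen sample an increasingly larger margin; preference accuracy is a simple ranking metric for checking a trained reward model on held-out pairs.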

Strengths

250,000 preference pairs
Over 2,000 hours of audio content
Includes complex code-switching and cross-lingual scenarios

Limitations

Synthetic data origin may lack the nuanced variability of natural human speech
Covers only English and Chinese (including English-Chinese code-switching)
Non-commercial license restricts industrial application

Provenance

Source: Amphion (arXiv 2501.15442, 2409.00750, 2410.06885)
Collection Method: Synthetic generation using multiple, diverse TTS models
Freshness: Last updated July 2025; reflects recent research in synthetic speech preference modeling.

Released under the CC BY-NC 4.0 license, which prohibits commercial use. Users should refer to the associated arXiv publications for the methodology used to generate the preference pairs.
