Name: HiFiTTS-2: Large-Scale High Bandwidth Speech Metadata from LibriVox
Creator: nvidia
Published: 2025-05-12T23:54:45
Keywords: Text To Speech, library: polars, size category: 10M < n < 100M, Speech Synthesis, modality: text, modality: tabular, library: mlcroissant, library: datasets, library: pandas, license: CC BY 4.0, Tabular, Audiobooks, Audio, region: US, Large Scale, JSON, arXiv:2506.04152, Audio Processing

Description

HiFiTTS-2 is a large-scale speech dataset from NVIDIA. It contains metadata for approximately 36.7 thousand hours of audio derived from LibriVox audiobooks, spoken by 5 thousand speakers. The metadata includes an estimated bandwidth for the audio, which is provided at a 48 kHz sampling rate. The dataset was last updated on the platform in November 2025.

Use Cases

Training high-fidelity text-to-speech models on large-scale, high-bandwidth audio data.
Benchmarking speech synthesis quality using the estimated bandwidth metadata.
Studying speaker diversity and characteristics across 5 thousand speakers.
Preprocessing and filtering audio datasets by bandwidth and source metadata.
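
The bandwidth-based filtering use case can be sketched as follows. This is a minimal illustration, not the dataset's documented workflow: it assumes the metadata is stored as JSON Lines, and the field names `audio_filepath`, `duration`, and `bandwidth` as well as the sample values are hypothetical, since column-level documentation is not available here. Check the actual field names after download.

```python
import json

# Hypothetical sample records standing in for the downloaded metadata.
# Field names and values are assumptions, not the dataset's documented schema.
SAMPLE_JSONL = """\
{"audio_filepath": "a.flac", "duration": 4.2, "bandwidth": 13000}
{"audio_filepath": "b.flac", "duration": 7.9, "bandwidth": 22000}
{"audio_filepath": "c.flac", "duration": 3.1, "bandwidth": 9500}
"""


def filter_by_bandwidth(lines, min_bandwidth_hz, key="bandwidth"):
    """Yield parsed records whose estimated bandwidth is at least min_bandwidth_hz."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        value = record.get(key)
        # Records without a bandwidth estimate are dropped rather than guessed at.
        if value is not None and value >= min_bandwidth_hz:
            yield record


# Keep only utterances with an estimated bandwidth of 13 kHz or more.
kept = list(filter_by_bandwidth(SAMPLE_JSONL.splitlines(), 13000))

# Total duration of the retained subset, in hours (assumes durations in seconds).
total_hours = sum(r["duration"] for r in kept) / 3600
```

For the real metadata, the same generator can be fed an open file handle instead of `SAMPLE_JSONL.splitlines()`, so the records are filtered one line at a time without loading everything into memory.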

Strengths

Large scale with metadata for approximately 36.7 thousand hours of audio.
High speaker diversity with data from 5 thousand speakers.
High-quality audio source with a 48 kHz sampling rate.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not stated; the platform's size tag gives only a range of 10 to 100 million rows, which may limit suitability assessment.
The dataset contains metadata only; the actual audio files must be downloaded separately from LibriVox.

Provenance

Source: NVIDIA, derived from LibriVox audiobooks.
Collection Method: Derived from publicly available LibriVox audiobooks.
Time Range: Not specified
Freshness: Last updated 2025-11-18 23:42:07; freshness should be verified.
Geography: Not specified

License: The platform tags list CC BY 4.0, but no license statement accompanies this description; users must verify licensing terms before use.
