Name: Nonverbal TTS Filtered Tokens: Pre-Extracted Audio Codec Tokens for TTS Training
Creator: somu9
Published: 2026-05-18T06:09:05
Keywords: Text To Speech, Speech Synthesis, Audio Codec, Audio, Pre Tokenized, Audio Tokens

Description

This dataset provides pre-extracted audio codec tokens for TTS training: 6,082 samples totaling 15.6 hours of audio. It was created by somu9 and last updated on 2026-05-18. The tokens were produced with the MOSS-Audio-Tokenizer-Nano codec at a sample rate of 48,000 Hz and a frame rate of 12.5 Hz.
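The two rates stated above fix the relationship between audio time and token frames. The following is a minimal sketch of that arithmetic, not code from the dataset author; the helper names are hypothetical, and rounding to the nearest frame is an assumption, since the card does not say how the codec handles partial frames.

```python
# Rates as stated in the dataset description.
SAMPLE_RATE_HZ = 48_000
FRAME_RATE_HZ = 12.5

# Number of raw audio samples covered by one codec frame.
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ


def frames_to_seconds(n_frames: int) -> float:
    """Convert a number of codec frames to a duration in seconds."""
    return n_frames / FRAME_RATE_HZ


def seconds_to_frames(seconds: float) -> int:
    """Approximate frame count for a duration.

    Rounding to the nearest frame is an assumption; the real codec may
    round up or down at clip boundaries.
    """
    return round(seconds * FRAME_RATE_HZ)


print(samples_per_frame)       # 3840.0
print(frames_to_seconds(125))  # 10.0
print(seconds_to_frames(10.0)) # 125
```

Each frame therefore spans 3,840 audio samples, or 80 ms of audio.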

Use Cases

Training text-to-speech models based on pre-extracted audio codec tokens.
Fine-tuning speech synthesis systems based on the MOSS-Audio-Tokenizer-Nano codec format.
Researching audio tokenization and codec performance for TTS applications.
Developing or benchmarking TTS pipelines that require a frame rate of 12.5 Hz.

Strengths

Contains 6,082 pre-tokenized audio samples, so no codec inference is needed before training.
Offers 15.6 hours of audio in total, a stated volume that makes the scale of the data clear up front.
Specifies technical parameters including a 48,000 Hz sample rate and 12.5 Hz frame rate, allowing for precise pipeline integration.
Provides average sample duration (9.2 seconds) and frames per sample (115.3), aiding in batch size and memory planning.
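As a rough illustration of how these figures can feed batch planning, the sketch below recomputes the average duration and frame count from the stated totals and estimates the size of a padded batch. The function names and the `num_codebooks` parameter are hypothetical: the card does not state how many codebooks the codec emits per frame, so that value has to be supplied by the user.

```python
# Totals as stated in the dataset card.
TOTAL_HOURS = 15.6
NUM_SAMPLES = 6_082
FRAME_RATE_HZ = 12.5

# Averages derived from the totals; these land close to the reported
# 9.2 seconds and 115.3 frames per sample.
avg_seconds = TOTAL_HOURS * 3600 / NUM_SAMPLES
avg_frames = avg_seconds * FRAME_RATE_HZ


def padded_batch_tokens(frame_counts: list[int], num_codebooks: int) -> int:
    """Token slots in a batch padded to its longest sample.

    num_codebooks is an assumption supplied by the caller; the card
    does not document it.
    """
    return len(frame_counts) * max(frame_counts) * num_codebooks


def padding_fraction(frame_counts: list[int]) -> float:
    """Fraction of frame slots in the padded batch that are padding."""
    longest = max(frame_counts)
    return 1 - sum(frame_counts) / (len(frame_counts) * longest)


print(round(avg_seconds, 2), round(avg_frames, 1))
print(padded_batch_tokens([100, 120, 80], num_codebooks=8))
print(round(padding_fraction([100, 120, 80]), 3))
```

Grouping samples of similar length into the same batch lowers the padding fraction, which is the usual reason to look at per-sample frame counts before choosing a batch size.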

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The license is unknown, which may restrict commercial or research use.
The dataset's source and collection methodology are not detailed, making bias assessment difficult.
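Because the columns are undocumented, a first step after download is usually to look at what one record actually contains. The helper below is a minimal sketch for that purpose; `describe_record` and the field names in the example record (`text`, `codes`) are hypothetical and are not taken from the dataset.

```python
def describe_record(record: dict) -> dict:
    """Summarise each field as (leaf type name, nested list shape).

    The shape is read from the first element at each nesting level, so
    it is only indicative for ragged lists.
    """
    summary = {}
    for name, value in record.items():
        shape = []
        inner = value
        # Walk down nested lists or tuples, recording the length at each level.
        while isinstance(inner, (list, tuple)):
            shape.append(len(inner))
            if len(inner) == 0:
                inner = None
                break
            inner = inner[0]
        # Empty containers and None values have no inspectable leaf type.
        leaf = type(inner).__name__ if inner is not None else "unknown"
        summary[name] = (leaf, tuple(shape))
    return summary


# Hypothetical record; the real field names must be checked after download.
example = {"text": "hello", "codes": [[1, 2, 3], [4, 5, 6]]}
print(describe_record(example))
```

Applied to the first row of a loaded split, a summary like this shows which field holds the token array and how its dimensions are ordered, which can then be compared against the 12.5 Hz frame rate.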

Provenance

Source: Hugging Face Datasets platform, uploaded by somu9.
Collection Method: Pre-extracted from audio using the MOSS-Audio-Tokenizer-Nano codec.
Freshness: Last updated 2026-05-18 06:09:08; freshness should be verified.

The license is unknown, which is a critical restriction on use. The full data format description is truncated here; see the Hugging Face dataset page for the complete version.
