Name: CommonVoice22 Sidon Dacvae: Speech Audio Converted to VAE Latents
Creator: TTS-AGI
Published: 2026-03-20T21:57:39
Keywords: Machine Learning, TTS Research, Speech Processing, Audio, Audio Representation, Multimodal

Description

CommonVoice 22 speech data enhanced by Sidon and converted into DAC VAE latent representations. The dataset is provided by TTS-AGI and was last updated on March 22, 2026. Each sample includes original FLAC audio, a corresponding latent vector, and metadata.

Use Cases

Training speech synthesis models based on pre-computed DAC VAE latent features.
Fine-tuning voice conversion systems using the paired audio and latent representations.
Benchmarking audio representation learning algorithms on a standardized speech dataset.
Analyzing speech characteristics using the provided metadata and characters-per-second metrics.
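
This card does not say how the characters-per-second metric is computed. As a minimal sketch, one plausible definition is transcript length divided by clip duration. The function name, the whitespace stripping, and the choice to count every character (spaces and punctuation included) are assumptions made for illustration, not the dataset's documented method.

```python
def chars_per_second(text: str, duration_s: float) -> float:
    """Return transcript characters per second of audio.

    Assumption: all characters of the stripped transcript are counted,
    including spaces and punctuation. The dataset may define it differently.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(text.strip()) / duration_s


# An 11-character transcript spoken over 2 seconds.
rate = chars_per_second("hello world", 2.0)
```

Values far outside the typical range for a language could flag clips whose transcript and audio do not match, though any threshold would need to be chosen empirically.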

Strengths

Each sample includes three aligned files: original audio, latent representation, and metadata, ensuring data consistency.
Latent vectors are stored as numpy arrays with a defined shape of [T_latent, 128], providing a structured feature format.
Data is sharded into tar files of approximately 2GB each, which may facilitate distributed processing.
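
The three-files-per-sample layout can be checked with a short sketch like the one below, which groups the members of one shard by a shared basename. The naming scheme (files of one sample sharing everything before the last dot) and the `json` extension for metadata are assumptions; this card does not document either, so both should be confirmed against a downloaded shard. The helper names are hypothetical.

```python
import io
import tarfile
from collections import defaultdict


def group_samples(tar_bytes: bytes) -> dict[str, dict[str, bytes]]:
    """Group tar members by basename: sample key -> {extension: raw bytes}."""
    samples: dict[str, dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumption: one sample's files share the name before the last dot.
            key, sep, ext = member.name.rpartition(".")
            if not sep:
                continue  # no extension, cannot be assigned to a sample
            extracted = tar.extractfile(member)
            if extracted is not None:
                samples[key][ext] = extracted.read()
    return dict(samples)


def incomplete_samples(samples, required=("flac", "npy", "json")):
    """Return sorted keys of samples missing any required extension.

    Assumption: metadata is stored as .json; adjust `required` if not.
    """
    return sorted(
        key for key, files in samples.items()
        if not set(required).issubset(files)
    )
```

Reading a whole shard into memory is only reasonable for small test archives; for shards of roughly 2GB, opening the file path directly with `tarfile.open(path)` and processing members as they stream would be the more practical choice.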

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
The original CommonVoice data may reflect geographic or demographic biases inherent to its crowd-sourced collection.

Provenance

Source: sarulab-speech/commonvoice22_sidon on Hugging Face
Collection Method: Conversion of the CommonVoice 22 (Sidon-enhanced) dataset to DAC VAE latents.
Freshness: Last updated 2026-03-22 13:31:51; check the source for newer revisions.

License is unknown; users must verify licensing terms before use. Working with the data requires tools that can read .tar shards, .npy arrays, and .flac audio.
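
For the .npy part of that tooling, a minimal sketch with NumPy might decode one latent payload and confirm the [T_latent, 128] shape given under Strengths. The function name is hypothetical, and how the bytes are obtained (for example, from a tar member) is left to the caller. Decoding the .flac audio would need a separate audio library and is not shown.

```python
import io

import numpy as np

LATENT_DIM = 128  # second axis of the [T_latent, 128] shape stated in this card


def load_latent(npy_bytes: bytes) -> np.ndarray:
    """Decode a .npy payload and check that it has shape [T_latent, 128]."""
    # allow_pickle=False refuses pickled object arrays from untrusted files.
    latent = np.load(io.BytesIO(npy_bytes), allow_pickle=False)
    if latent.ndim != 2 or latent.shape[1] != LATENT_DIM:
        raise ValueError(f"unexpected latent shape {latent.shape}")
    return latent
```

Failing early on an unexpected shape makes it easier to notice if a shard follows a different layout than the one described here.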
