Description
Voice Annotation Data v2 is a curated dataset of 18,632 audio samples, comprising 9,391 positive and 9,241 negative examples across 58 voice dimensions. The dataset was created by TTS-AGI and was last updated on April 15, 2026. Each dimension includes up to 25 positive examples of audio that fits the category and up to 25 negative examples that the Gemini 2.0 Flash model confirmed do not fit.
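The figures in the description can be cross-checked with a little arithmetic. The sketch below uses only numbers stated above. It assumes the 25-example cap applies once per dimension and per label, which the description does not confirm.

```python
# Figures taken from the dataset description.
TOTAL_SAMPLES = 18_632
POSITIVES = 9_391
NEGATIVES = 9_241
DIMENSIONS = 58
CAP_PER_LABEL = 25  # "up to 25" examples per label and dimension


def summarize_counts():
    """Return basic consistency figures for the published counts."""
    label_sum = POSITIVES + NEGATIVES
    # Assumption: the cap applies once per dimension and per label.
    max_under_cap = DIMENSIONS * CAP_PER_LABEL * 2
    return {
        "label_sum_matches_total": label_sum == TOTAL_SAMPLES,
        "positive_share": round(POSITIVES / TOTAL_SAMPLES, 3),
        "max_samples_under_cap": max_under_cap,
        "mean_samples_per_dimension": round(TOTAL_SAMPLES / DIMENSIONS, 1),
    }


if __name__ == "__main__":
    for key, value in summarize_counts().items():
        print(f"{key}: {value}")
```

The positive and negative counts do sum to the stated total. Under the assumed reading, however, the cap would allow at most 2,900 samples, well below the 18,632 reported, so the cap presumably applies to some finer unit than a whole dimension. That unit is not stated and is worth confirming against the data after download.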
Use Cases
Training voice attribute classifiers based on the 58 annotated dimensions.
Evaluating the performance of TTS models against human- and AI-annotated positive/negative examples.
Developing data curation pipelines for audio datasets using the confirmed negative example methodology.
Studying voice characteristics and their perception across a multi-dimensional taxonomy.
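As an illustration of the evaluation use case, the following minimal sketch scores a classifier separately for each voice dimension. The record keys ("dimension", "label", "prediction") and the dimension names are hypothetical, since the dataset's actual fields are undocumented.

```python
from collections import defaultdict


def per_dimension_accuracy(records):
    """Compute classification accuracy for each voice dimension.

    Each record is a dict with hypothetical keys:
      "dimension"  (str)  name of the voice dimension
      "label"      (bool) True for a positive example
      "prediction" (bool) the classifier's output
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        dim = rec["dimension"]
        total[dim] += 1
        if rec["prediction"] == rec["label"]:
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


if __name__ == "__main__":
    # Made-up toy records, not taken from the dataset.
    toy = [
        {"dimension": "dim_a", "label": True, "prediction": True},
        {"dimension": "dim_a", "label": False, "prediction": True},
        {"dimension": "dim_b", "label": False, "prediction": False},
    ]
    print(per_dimension_accuracy(toy))
```

Reporting accuracy per dimension rather than one pooled number keeps weak dimensions visible, which matters when 58 categories share one dataset.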
Strengths
Contains 18,632 total audio samples, providing a substantial collection for model training.
Includes near-balanced positive and negative examples (9,391 and 9,241, roughly 50.4% and 49.6%) across 58 distinct voice dimensions.
Each dimension includes up to 25 confirmed negative examples, which may improve classification robustness.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
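Row count is not reported in the metadata; the 18,632-sample total given in the description should be verified against the downloaded files, which may limit suitability assessment beforehand.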
Description metadata is limited; actual data quality requires manual inspection after download.
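Because fields are undocumented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dicts by whatever means, and it assumes nothing about the real schema. The helper name summarize_columns is illustrative.

```python
def summarize_columns(rows, max_examples=3):
    """Report row count plus observed types, gaps, and sample values per field.

    `rows` is any iterable of dicts. Values are previewed via a truncated
    repr() so that large payloads (such as audio arrays) stay readable.
    """
    rows = list(rows)
    keys = sorted({key for row in rows for key in row})
    columns = {}
    for key in keys:
        types, examples, missing = set(), [], 0
        for row in rows:
            value = row.get(key)
            if value is None:  # absent key or explicit null
                missing += 1
                continue
            types.add(type(value).__name__)
            preview = repr(value)[:60]
            if len(examples) < max_examples and preview not in examples:
                examples.append(preview)
        columns[key] = {
            "types": sorted(types),
            "missing": missing,
            "examples": examples,
        }
    return {"row_count": len(rows), "columns": columns}


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    demo = [{"a": 1, "b": "x"}, {"a": 2}, {"a": None, "b": "x"}]
    print(summarize_columns(demo))
```

The returned row_count also gives a direct way to compare the downloaded files with the sample total quoted in the description.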
Provenance
Source
TTS-AGI via Hugging Face
Collection Method
Curated audio samples annotated with positive/negative labels, with negatives confirmed by the Gemini 2.0 Flash model.
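The confirmed-negative step might be sketched as follows. This is an assumption about the general shape of such a pipeline, not the authors' implementation. The judge callable is a hypothetical stand-in for the model check (the source names Gemini 2.0 Flash for that role), and no real API is called.

```python
def confirm_negatives(candidates, dimension, judge, limit=25):
    """Keep candidates that `judge` confirms do NOT fit `dimension`.

    `judge(sample, dimension)` is a hypothetical callable returning True
    when the sample fits the dimension. `limit` mirrors the "up to 25"
    cap mentioned in the description.
    """
    confirmed = []
    for sample in candidates:
        if len(confirmed) >= limit:
            break
        if not judge(sample, dimension):
            confirmed.append(sample)
    return confirmed


if __name__ == "__main__":
    # Toy judge and samples, invented for illustration.
    def toy_judge(sample, dimension):
        return dimension in sample["tags"]

    pool = [
        {"id": 1, "tags": ["dim_a"]},
        {"id": 2, "tags": []},
        {"id": 3, "tags": ["dim_b"]},
    ]
    print([s["id"] for s in confirm_negatives(pool, "dim_a", toy_judge)])
```

Requiring explicit confirmation, rather than treating every unlabeled sample as a negative, reduces the chance that an unnoticed positive ends up in the negative set.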
Time Range
Not specified
Freshness
Last updated 2026-04-15 12:52:50; freshness should be verified.
Geography
Not specified
License is unknown; terms of use must be verified before the dataset is used.