7second-commonvoice: Short Audio Clips for Speech Recognition
Available on 1 platform
Description
7second-commonvoice is a dataset hosted on Kaggle that is likely derived from the Mozilla Common Voice project. The platform tags 'Audio Data' and 'Speech Recognition' suggest that it contains speech audio, and the name suggests clips of roughly seven seconds, though neither is confirmed. The number of samples, the file formats, and the specific content are not stated in the available metadata.
Use Cases
Training a speech-to-text model on short audio utterances (inferred from domain, verify after download)
Benchmarking ASR system performance on a subset of Common Voice data (inferred from domain, verify after download)
Fine-tuning a pre-trained model for specific acoustic conditions (inferred from domain, verify after download)
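Each use case above depends on clip length, which is only inferred from the dataset name. A minimal sketch of a duration check follows. It assumes the clips are uncompressed WAV files, which is not confirmed (Common Voice itself distributes MP3, which would need a different decoder). The function names, the folder layout, and the 7-second threshold are illustrative assumptions, not part of the dataset's documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clips_over_limit(root, max_seconds=7.0):
    """List WAV files under `root` that are longer than `max_seconds`.

    The 7-second default is an assumption taken from the dataset name only.
    """
    too_long = []
    for path in sorted(Path(root).rglob("*.wav")):
        if wav_duration_seconds(path) > max_seconds:
            too_long.append(path)
    return too_long
```

An empty result from `clips_over_limit` on the extracted download would be consistent with the "7second" name. A non-empty result, or no `.wav` files at all, means the assumptions above do not hold and need revisiting.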
Strengths
Published on Kaggle, a major platform for data science and machine learning.
Appears to be associated with the Mozilla Common Voice project, a known open-source speech data initiative (inferred from the title; not confirmed).
Limitations
Metadata is minimal; actual content requires verification after download.
Row count, file formats, and column definitions are unknown, which limits suitability assessment.
License, author, and last update information are unavailable.
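Because the file formats and sample count are unknown, a first step after download could be a simple inventory of the extracted folder. The sketch below counts files by extension using only the standard library. The folder name `7second-commonvoice` is a hypothetical local path, and nothing here reflects a documented layout of the dataset.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under `root` by lowercase extension ('' if none)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical folder name for the extracted download; adjust as needed.
    for ext, n in inventory_by_extension("7second-commonvoice").most_common():
        print(f"{ext or '(none)'}: {n}")
```

The extension counts give a rough view of the audio format and of whether a transcript or metadata file (for example a CSV or TSV) ships with the clips. Both need checking before the dataset's suitability can be assessed.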
Provenance
Source
Mozilla Common Voice project (inferred from title and platform tags).
Collection Method
Likely crowdsourced via the Common Voice platform.
Time Range
Unknown; not stated in the available metadata.
Freshness
Last updated date is unknown; freshness unverified.
Geography
Unknown; not stated in the available metadata.
License
License is unknown; users must verify usage rights after download.