Testnew: A 1,025-Hour Speech Dataset with Speaker Breakdown

Name: Testnew: A 1,025-Hour Speech Dataset with Speaker Breakdown
Creator: setfunctionenvironment
Published: 2025-07-17T17:33:32
Keywords: Speaker Identification, Speech Processing, Audio

Description

The dataset contains 556,667 audio files totaling 1,024.71 hours of speech, with an average clip length of 6.63 seconds. It includes a per-speaker breakdown of clips; the top contributor, 'Despina', accounts for 60,150 clips and 11.5% of the total duration. It was uploaded to Hugging Face by 'setfunctionenvironment' and last updated on July 18, 2025.
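The headline figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the description; the per-speaker mean is an estimate, since the 11.5% duration share is rounded in the source.

```python
# Figures quoted in the dataset description.
TOTAL_FILES = 556_667
TOTAL_HOURS = 1_024.71
TOP_SPEAKER_CLIPS = 60_150
TOP_SPEAKER_DURATION_SHARE = 0.115  # rounded to one decimal place in the source

total_seconds = TOTAL_HOURS * 3600

# Average clip length across the whole dataset, in seconds.
mean_clip_seconds = total_seconds / TOTAL_FILES

# Share of clips (as opposed to duration) contributed by the top speaker.
top_speaker_clip_share = TOP_SPEAKER_CLIPS / TOTAL_FILES

# Implied mean clip length for the top speaker; approximate because
# the duration share is only given to one decimal place.
top_speaker_mean_seconds = (
    TOP_SPEAKER_DURATION_SHARE * total_seconds / TOP_SPEAKER_CLIPS
)

print(f"mean clip length: {mean_clip_seconds:.2f} s")
print(f"top speaker clip share: {top_speaker_clip_share:.1%}")
print(f"top speaker mean clip length: {top_speaker_mean_seconds:.2f} s")
```

The first value reproduces the stated 6.63 seconds. The top speaker's share of clips comes out slightly below the 11.5% share of duration, which suggests that speaker's clips run somewhat longer than the dataset average.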

Use Cases

Training automatic speech recognition (ASR) models on the large volume of short audio clips.
Developing speaker diarization or identification systems using the provided speaker breakdown and clip counts.
Benchmarking audio preprocessing pipelines against the varied clip durations, which range from 0.41 to 44.97 seconds.
Analyzing speaker distribution and potential biases in speech data using the provided top-speaker statistics.
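For the speaker-distribution use case, a breakdown like the one the dataset reports could be recomputed from per-clip records. This is a minimal sketch that assumes each clip can be reduced to a (speaker, duration in seconds) pair; the function name, the pair layout, and the toy data are all hypothetical, since the dataset's actual columns are undocumented.

```python
from collections import defaultdict


def speaker_breakdown(records):
    """Return per-speaker clip counts and duration shares, largest first.

    `records` is an iterable of (speaker, duration_seconds) pairs.
    This layout is an assumption, not the dataset's documented schema.
    """
    clips = defaultdict(int)
    seconds = defaultdict(float)
    for speaker, duration in records:
        clips[speaker] += 1
        seconds[speaker] += duration

    total = sum(seconds.values())
    rows = [
        {
            "speaker": speaker,
            "clips": clips[speaker],
            "duration_share": seconds[speaker] / total if total else 0.0,
        }
        for speaker in clips
    ]
    # Sort by duration share so the dominant speakers come first.
    rows.sort(key=lambda row: row["duration_share"], reverse=True)
    return rows


# Toy data for illustration only; not taken from the dataset.
sample = [
    ("speaker_a", 7.0),
    ("speaker_a", 5.0),
    ("speaker_b", 4.0),
    ("speaker_b", 4.0),
]
for row in speaker_breakdown(sample):
    print(row)
```

Comparing each speaker's clip share with their duration share is one quick way to spot speakers whose clips are unusually long or short.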

Strengths

Large scale with over 556,000 audio files.
Substantial total duration of 1,024.71 hours.
Provides detailed speaker-level statistics, including clip counts and duration percentages.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The source, collection method, and license are unknown, limiting reproducibility and use-case assessment.
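Because field semantics have to be inferred after download, a first pass might simply tabulate which fields appear and what kinds of values they hold. The sketch below assumes the downloaded rows can be read as Python dictionaries; the helper name and the sample field names (`audio_path`, `speaker`, `duration`) are hypothetical placeholders, not the dataset's real columns.

```python
def summarize_fields(rows):
    """Map each field name to the value types seen and one example value."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            # Keep the first value seen for each field as its example.
            entry = summary.setdefault(name, {"types": set(), "example": value})
            entry["types"].add(type(value).__name__)
    return summary


# Hypothetical rows; the real field names are not documented.
sample_rows = [
    {"audio_path": "clip_0001.wav", "speaker": "speaker_a", "duration": 6.6},
    {"audio_path": "clip_0002.wav", "speaker": "speaker_b", "duration": None},
]
for name, info in summarize_fields(sample_rows).items():
    print(name, sorted(info["types"]), repr(info["example"]))
```

A field that reports more than one type, such as a mix of `float` and `NoneType`, is a hint that some rows have missing values and deserves a closer look.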

Provenance

Source: Hugging Face
Freshness: Last updated 2025-07-18 00:27:04; freshness should be verified.
License: Unknown, which may restrict commercial or research use.

Related Topics

Audio, Speaker Identification, Speech Processing

Quality Score

Overall: C (44)
Description: 48
Source: 41
Reputation: 54
Access: 26

Community

Downloads: 152
Likes: 131
Views: 0

Dataset Info

Author: setfunctionenvironment
Created: Jul 17, 2025
Updated: Jul 18, 2025
Last synced: Apr 29, 2026
