A 119-hour corpus of English-language earnings calls from global companies, created by anton-l and uploaded to Hugging Face in October 2022. Its primary purpose is to benchmark automatic speech recognition (ASR) models on real-world accented speech.
Use Cases
- Benchmark ASR model performance on real-world accented speech.
- Train speech recognition models on domain-specific financial audio.
- Evaluate model robustness across the speaker accents referenced in the dataset description.
- Study linguistic patterns and vocabulary in corporate earnings communications.
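
ASR benchmarks of this kind are commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The dataset description does not prescribe a metric or a text normalization scheme, so the following is only a minimal sketch under those assumptions. The function name `word_error_rate` and the lowercase-and-split normalization are illustrative choices, not part of the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: transcripts are normalized by lowercasing and splitting
    on whitespace. A real benchmark may also strip punctuation or
    normalize numbers, which this sketch does not do.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example strings, not taken from the corpus.
    print(word_error_rate("revenue grew ten percent", "revenue grow ten"))
```

Because WER counts insertions as errors, it can exceed 1.0 when the hypothesis is much longer than the reference.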
Strengths
- Corpus size is explicitly stated as 119 hours.
- Focus on real-world accented speech provides a specific challenge for ASR.
- Source is clearly identified as earnings calls from global companies.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Last updated on 2022-10-17 at 18:35:04; check for newer revisions before use.
- Row count and file formats are unknown, which makes it harder to assess suitability before download.
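
Because field semantics are undocumented, a first step after download is usually to list which fields appear and what value types they hold. The sketch below assumes the records have already been loaded as a list of Python dictionaries. The helper `summarize_fields` and the field names in the example (`audio_path`, `duration`, `transcript`) are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of value type names seen.

    Assumption: `records` is an iterable of dicts, one per row. Fields
    missing from some rows simply contribute no type for those rows.
    """
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = [
        {"audio_path": "a.wav", "duration": 12.5},
        {"audio_path": "b.wav", "duration": None, "transcript": "hello"},
    ]
    for field, types in summarize_fields(sample).items():
        print(field, types)
```

A field that reports more than one type, or `NoneType`, is a hint that it needs cleaning or that its meaning varies between rows.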
Provenance
- Source: Hugging Face
- Collection Method: Collection of earnings calls from global companies.
- Geography: Global