EdAcc (The Edinburgh International Accents of English Corpus) is an automatic speech recognition (ASR) dataset of 40 hours of English dyadic conversations. It was created by edinburghcstr and includes speakers with a diverse set of first- and second-language English accents, along with a linguistic background profile for each speaker. The dataset was last updated on February 22, 2024.
Use Cases
- Benchmarking ASR model performance across diverse English accents using the dyadic conversations.
- Training accent-robust speech recognition systems on a wide range of first- and second-language English varieties.
- Analyzing how speaker background affects ASR accuracy using the included linguistic profiles.
- Studying conversational speech patterns across different English accents.
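For the benchmarking and speaker-background use cases, a common approach is to compute word error rate (WER) separately for each group of speakers. The sketch below assumes you already have ASR output aligned with reference transcripts. The field names `accent`, `reference`, and `hypothesis` are hypothetical, because EdAcc's column names are not documented here, and the group labels in the example are made up. WER is computed corpus-style: total word errors divided by total reference words per group.

```python
from collections import defaultdict


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer_by_group(records: list[dict], group_key: str = "accent") -> dict[str, float]:
    """Corpus-level WER per group: summed word errors / summed reference words.

    ASSUMPTION: each record has hypothetical 'reference', 'hypothesis', and
    group_key fields; map these to the real EdAcc columns after download.
    """
    errors: dict[str, int] = defaultdict(int)
    ref_words: dict[str, int] = defaultdict(int)
    for rec in records:
        # Minimal normalization: lowercase and whitespace tokenization only
        ref = rec["reference"].lower().split()
        hyp = rec["hypothesis"].lower().split()
        group = rec[group_key]
        errors[group] += word_edit_distance(ref, hyp)
        ref_words[group] += len(ref)
    # Skip groups with no reference words to avoid division by zero
    return {g: errors[g] / ref_words[g] for g in errors if ref_words[g] > 0}


# Illustrative records only (not EdAcc data); group labels are placeholders
records = [
    {"accent": "group_a", "reference": "the cat sat", "hypothesis": "the cat sat"},
    {"accent": "group_b", "reference": "the cat sat", "hypothesis": "a cat sat down"},
]
print(wer_by_group(records))
```

Here `group_b` has one substitution and one insertion against three reference words, giving a WER of 2/3. Passing a different `group_key`, such as a field from the speaker profiles, turns the same function into a per-background comparison. For real evaluations, text normalization (punctuation, numerals, disfluencies) usually affects WER noticeably and deserves more care than the simple lowercasing shown here.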
Strengths
- 40 hours of audio data provides a substantial corpus for analysis.
- Includes a diverse set of English accents from both first- and second-language speakers.
- Contains linguistic background profiles for each speaker, adding contextual metadata.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and file formats are unknown, which makes it harder to assess the dataset's suitability before downloading it.
- Data may reflect geographic or demographic bias inherent to the collection method.
Provenance
- Source
- edinburghcstr
- Collection Method
- Likely recorded dyadic conversations between speakers.
- Time Range
- Not specified
- Freshness
- Last updated 2024-02-22 14:24:42.
- Geography
- Not specified