Description
A derived version of the Technical Indian English (TIE) dataset. The original TIE dataset contains approximately 8,000 hours of speech from around 9,800 English-language technical lectures sourced from the NPTEL platform. Lectures average 50 minutes each and are delivered by instructors from various regions across India. The dataset was created by the author 'raianand' and was last updated on Hugging Face in November 2024.
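As a quick sanity check, the figures quoted in the description can be compared against each other. The sketch below uses only the approximate numbers stated above (9,800 lectures, 50 minutes on average, 8,000 hours in total); the variable names are illustrative and not part of the dataset.

```python
# Approximate figures quoted in the description.
lectures = 9_800
avg_minutes = 50
stated_hours = 8_000

# Total duration implied by lecture count and average length.
implied_hours = lectures * avg_minutes / 60

# Relative difference between the implied and the stated total.
relative_gap = abs(implied_hours - stated_hours) / stated_hours

print(f"Implied total: {implied_hours:,.0f} hours")
print(f"Gap vs. stated total: {relative_gap:.1%}")
```

The implied total is about 8,167 hours, roughly 2% above the stated 8,000, so the rounded figures are mutually consistent.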
Use Cases
Train automatic speech recognition (ASR) models based on technical lecture audio.
Develop accent and dialect models for Indian English based on speech from diverse regional instructors.
Create educational tools for technical subjects using segmented lecture audio clips.
Conduct linguistic analysis of technical terminology usage in spoken English.
Strengths
The original dataset is large, at approximately 8,000 hours of speech.
Source material consists of around 9,800 lectures, providing substantial volume.
Lectures cover a wide range of technical subjects, offering domain-specific speech data.
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The derivation method and the contents of the 'shorts' version are not documented.
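Because column-level documentation is absent, a first pass after download might simply list each field and the value types it holds. The following is a minimal sketch of that inspection step; the helper `summarize_fields` and the sample rows (including the column names `audio_path`, `text`, and `duration`) are hypothetical, since the real schema is not documented.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical rows for illustration only; the real column names are unknown.
sample = [
    {"audio_path": "clip_0001.wav", "text": "consider a linear system", "duration": 4.2},
    {"audio_path": "clip_0002.wav", "text": None, "duration": 3},
]

summary = summarize_fields(sample)
for name, types in summary.items():
    print(name, types)
```

Fields that report more than one type, or that include `NoneType`, are good candidates for the manual inspection mentioned above.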
Provenance
Source
NPTEL platform.
Collection Method
Sourced from technical lecture recordings.
Freshness
Last updated 2024-11-16 07:43:44; freshness should be verified.
Geography
India
License
Unknown; users must verify terms of use before downloading.