Name: Icelandic University Lectures With Audio And Text
Creator: tiro-is
Published: 2022-07-13T16:41:14
Keywords: Regionus

Description

Kennslurómur is a collection of audio recordings and corresponding text from lectures recorded in courses at Reykjavik University and the University of Iceland. The dataset is intended for training speech recognition models. Lecturers provided the recordings, a speech recognizer produced the initial transcripts, and students and a professional proofreader then corrected the text.

Use Cases

Train an Icelandic speech recognition model using the audio recordings and corresponding text transcripts.
Develop a forced alignment tool to map text transcripts to specific timestamps in the lecture audio.
Analyze lecture content and vocabulary for linguistic research on academic Icelandic.
Fine-tune a language model on the proofread text corpus for domain-specific natural language processing tasks.
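
For the speech recognition use case, the proofread transcripts can serve as references when scoring a recognizer's output. The following is a minimal sketch of word error rate (WER), assuming lowercase normalization and whitespace tokenization. The function name `word_error_rate` and the sample sentences are illustrative and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing plus whitespace splitting is enough normalization.
    Real evaluations usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example pair: one substituted word out of four reference words.
proofread = "góðan daginn og velkomin"
recognized = "góðan dag og velkomin"
print(word_error_rate(proofread, recognized))  # 0.25
```

The same function could be applied to the recognizer's raw output versus the proofread text to estimate how much correction the transcripts required, if both versions are available.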

Strengths

Data originates from two major Icelandic universities, providing a source of academic Icelandic.
Text transcripts underwent multiple rounds of correction by students and a professional proofreader.

Limitations

The dataset size, number of rows, and audio duration are unknown, limiting assessment of its scale for model training.
Content is restricted to academic lectures, which may not generalize to other domains or colloquial speech.
Speaker bias is possible because the recordings come from a limited number of lecturers.

Provenance

Source: Reykjavik University and University of Iceland.
Collection Method: Lectures were recorded, transcribed via speech recognition, and the text was corrected by students and a professional proofreader.
Freshness: Last updated on 2022-08-22.
Geography: Iceland.

The full description and specific details are available only on the linked dataset page. License information is unknown.
