Description

This dataset contains 266 hours of Japanese speech. The audio was transcribed with Scribe v1 for automatic speech recognition (ASR) and prefiltered with Facebook's audio aesthetics model. It is derived from the Japanese portion of the Emilia Yodas collection and is licensed under CC BY 4.0. It includes text transcriptions and aesthetic scores for the audio events.
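The description mentions the aesthetics model only as a prefilter. The sketch below shows one way a score-threshold prefilter could work, and how the retained audio would be totalled in hours. It is a minimal illustration under stated assumptions. The record fields (`clip_id`, `duration_s`, `aesthetic_score`, `text`), the use of a single scalar score per clip, and the threshold value are all hypothetical, because the actual column layout and filtering criteria are not documented. If the model emits several scores per clip, you would first need to choose one axis or combine them.

```python
from dataclasses import dataclass


# Hypothetical record layout: the dataset's real column names are not documented.
@dataclass
class Clip:
    clip_id: str
    duration_s: float       # assumed: clip length in seconds
    aesthetic_score: float  # assumed: one scalar score from the aesthetics model
    text: str               # assumed: the Scribe v1 transcription


def prefilter(clips, min_score):
    """Keep only clips whose (assumed) aesthetic score meets the threshold."""
    return [c for c in clips if c.aesthetic_score >= min_score]


def total_hours(clips):
    """Sum clip durations in seconds and convert the total to hours."""
    return sum(c.duration_s for c in clips) / 3600.0


# Toy values for illustration only; the threshold of 5.0 is arbitrary.
clips = [
    Clip("a", 7200.0, 6.5, "こんにちは"),
    Clip("b", 3600.0, 4.0, "ありがとう"),
    Clip("c", 1800.0, 7.1, "さようなら"),
]
kept = prefilter(clips, min_score=5.0)
print([c.clip_id for c in kept])  # ['a', 'c']
print(total_hours(kept))          # 2.5
```

Raising the threshold trades corpus size for cleaner audio, so checking `total_hours` after filtering shows how much training data a given cutoff leaves.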

Use Cases

Train or fine-tune automatic speech recognition (ASR) models using the Japanese speech data and its Scribe v1 transcriptions.
Develop audio quality classifiers by leveraging the Facebook audio aesthetics scores as a prefilter or training signal.
Analyze the relationship between audio aesthetic scores and transcription accuracy across different audio events.
Build Japanese language models or text-to-speech systems using the transcribed audio corpus.
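Analyzing how aesthetic scores relate to transcription accuracy requires an accuracy metric. Japanese is written without spaces between words, so character error rate (CER) is a common choice. The standard-library sketch below computes CER from Levenshtein edit distance and then a Pearson correlation between per-clip scores and CER. It assumes you have a separately verified reference transcript for each clip. The dataset description lists only Scribe v1 outputs, so the references, scores, and function names here are hypothetical.

```python
import math


def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical example: one reference/hypothesis pair.
print(cer("こんにちは", "こんばんは"))  # 0.4 (2 substitutions over 5 characters)
```

In practice, normalize both strings (for example, punctuation and full-width versus half-width characters) before scoring, since such differences otherwise inflate CER. A negative correlation between aesthetic score and CER would be consistent with higher-scoring audio being easier to transcribe, though correlation alone does not establish cause.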

Strengths

Contains 266 hours of Japanese audio data, providing a substantial volume for model training.
Includes both automatic speech recognition outputs from Scribe v1 and audio aesthetic scores from Facebook's audio aesthetics model.
Derived from the established Emilia Yodas dataset, which gives it a documented upstream source.

Limitations

The dataset is described as a 'v1' version, suggesting it is an initial release that may contain inconsistencies or require refinement.
Specific details on audio quality, speaker demographics, or transcription accuracy are not provided in the dataset description.
No sample data or column schema is provided, which makes it hard to assess the dataset before downloading it.

Provenance

Source: Japanese portion of the Emilia Yodas dataset from Hugging Face.
Collection Method: Audio processed via Scribe v1 (ElevenLabs STT/ASR) and classified using Facebook's audio aesthetics model.
Freshness: Last updated on 2025-04-14.
Geography: Japan (Language: Japanese).

The creator notes that the dataset is at a 'v1' stage and invites collaboration via a Discord link. Full transcription timestamps from Scribe v1 are available separately under a CC BY-NC 4.0 license.

Format: Parquet; Libraries: polars, Dask, mlcroissant, datasets; Modality: text; Size category: 100K < n < 1M; License: CC BY 4.0; Region: US

Japanese Audio Events with Speech Recognition and Aesthetic Scores
