VoxBox: Bilingual Speech Corpus with Annotations and Metadata

Name: VoxBox: Bilingual Speech Corpus with Annotations and Metadata
Creator: SparkAudio
Published: 2025-04-07T02:04:49
Keywords: Bilingual Speech, Speech Corpus, Audio, Natural Language Processing, Audio Transcription, Multimodal, Speech Metadata

Description

VoxBox is a curated collection of bilingual speech corpora annotated with clean transcriptions and metadata. The dataset was created by SparkAudio and was last updated on April 15, 2025. It includes audio files and JSONL metadata files organized by sub-corpus, such as aishell-3, casia, commonvoice_cn, and wenetspeech4tts.

Use Cases

Train automatic speech recognition models on the audio and its clean transcriptions.
Develop text-to-speech synthesis systems from the bilingual audio corpora.
Analyze speech patterns using the included metadata, such as age, gender, and emotion.
Build multilingual speech processing pipelines that take advantage of the dataset's bilingual coverage.
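
As a rough illustration of the metadata analysis use case, the sketch below tallies attribute values across JSONL records. The key names "age", "gender", and "emotion" are assumptions taken from the attribute names in this description; the real field names are not documented and should be checked after download. The sample records are hypothetical.

```python
import json
from collections import Counter


def tally_attributes(lines, fields=("age", "gender", "emotion")):
    """Count the values seen for each metadata field across JSONL lines.

    The default field names are assumptions, not a documented schema.
    """
    counts = {field: Counter() for field in fields}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for field in fields:
            # Records that lack a field are counted under None, not dropped.
            counts[field][record.get(field)] += 1
    return counts


# Hypothetical records, for illustration only.
sample = [
    '{"gender": "female", "age": "adult", "emotion": "neutral"}',
    '{"gender": "male", "age": "adult"}',
]
result = tally_attributes(sample)
```

Counting missing values under None keeps gaps in the annotations visible instead of silently shrinking the totals.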

Strengths

Curated collection of multiple established speech corpora, including aishell-3 and wenetspeech4tts.
Includes metadata attributes such as age, gender, and emotion for each speech sample.
Provides clean transcriptions for the audio data.

Limitations

Row count is unknown, which makes it hard to judge whether the dataset is large enough for a given task.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
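
Since row counts and field semantics have to be established after download, a first inspection pass might look like the sketch below. It assumes each non-blank line of a metadata file is a single JSON object, which is the usual JSONL convention but is not confirmed for this dataset. The function name `summarize_jsonl` is hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_jsonl(path):
    """Return (row_count, {field: sorted type names}) for one JSONL file.

    Assumes every non-blank line holds one JSON object.
    """
    row_count = 0
    field_types = defaultdict(set)
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            row_count += 1
            for key, value in record.items():
                # Record the Python type of each value to spot mixed-type fields.
                field_types[key].add(type(value).__name__)
    return row_count, {key: sorted(types) for key, types in field_types.items()}
```

Running this once per sub-corpus metadata file gives a row count and a field inventory to compare across sub-corpora; a field that reports more than one type is a hint that it needs manual review.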

Provenance

Source: SparkAudio
Collection Method: Curated collection from multiple bilingual speech corpora.
Freshness: Last updated 2025-04-15 07:43:25; freshness should be verified.

License is unknown and should be verified before use.

Quality Score

Overall: C40
Description: 39
Source: 44
Reputation: 46
Access: 26

Community

Downloads: 17.4K
Likes: 71
Views: 0

Dataset Info

Author: SparkAudio
Created: Apr 7, 2025
Updated: Apr 15, 2025
Last synced: Jun 8, 2026
