Name: VibraVox: French Speech Corpus via Body-Conduction Transducers
Creator: Cnam-LMSSC
Published: 2023-10-18T19:15:20
Keywords: Size category: 10K < n < 100K; Language creators: expert-generated, crowdsourced; Annotations creators: expert-generated; Task categories: text-to-speech, audio-to-audio, audio classification, automatic speech recognition; Task IDs: speaker identification; Modality: text; Multilinguality: monolingual; Libraries: polars, dask, mlcroissant, datasets; Format: Parquet; arXiv: 2407.11828; License: CC BY 4.0

Description

VibraVox contains between 10,000 and 100,000 French speech recordings captured with body-conduction transducers. Developed by Cnam-LMSSC and documented in arXiv:2407.11828, the dataset is a specialized audio-text corpus for speech processing research. It includes expert-generated and crowdsourced annotations that support several audio-centric machine learning tasks.

Use Cases

Automatic Speech Recognition (ASR) using body-conducted audio signals and text transcripts
Speaker Identification from speech recorded through body-conduction transducers
Audio-to-Audio mapping or enhancement of body-conducted speech toward conventional air-conducted speech
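
For the ASR use case, model transcripts are commonly scored against the reference text with word error rate (WER). The sketch below is a minimal, dependency-free illustration of that metric. It is not taken from the VibraVox paper, and the function name and the lowercase/whitespace normalization are assumptions chosen for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; real evaluations usually also handle punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("le chat dort", "le chien dort"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.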

Strengths

Scale of 10,000 to 100,000 records
Expert-generated annotations for high-quality labeling
CC BY 4.0 license, which permits research and commercial use with attribution

Limitations

Restricted to the French language
Acoustic properties are specific to body-conduction hardware and may not generalize to standard air-conduction microphone models

Provenance

Source: Cnam-LMSSC
Collection Method: Sensor-based recording using body-conduction transducers with expert and crowdsourced annotation
Freshness: Last updated November 2025; based on research published in 2024.
Geography: France

The dataset is provided in Parquet format. Users should refer to the arXiv paper 2407.11828 for technical specifications of the transducers used during collection.
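
Since the keywords list the `datasets` library, one way to read the Parquet files is through `datasets.load_dataset`. The following is a sketch under stated assumptions: the repository id `Cnam-LMSSC/vibravox`, the `train` split name, and the helper names are not given in this card and should be verified against the hosting page before use. If the dataset defines several configurations, a configuration name must be passed explicitly.

```python
# Assumed Hugging Face repository id; not stated in this card.
REPO_ID = "Cnam-LMSSC/vibravox"


def build_load_kwargs(config=None, split="train", streaming=True):
    """Collect keyword arguments for datasets.load_dataset.

    `config` is a hypothetical placeholder for a configuration name;
    `split="train"` is an assumption about the available splits.
    """
    return {
        "path": REPO_ID,
        "name": config,
        "split": split,
        # Streaming avoids downloading every Parquet shard up front.
        "streaming": streaming,
    }


def load_vibravox(config=None, split="train", streaming=True):
    """Open the corpus lazily; requires `pip install datasets` and network access."""
    from datasets import load_dataset  # imported here so the helper above stays dependency-free

    return load_dataset(**build_load_kwargs(config, split, streaming))


if __name__ == "__main__":
    # Inspect the first record's field names without assuming the schema.
    dataset = load_vibravox()
    first = next(iter(dataset))
    print(sorted(first.keys()))
```

Printing the field names first is a deliberate choice: the column layout is not described here, so the example avoids assuming any particular audio or text column.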
