Description
WavCaps is a dataset for audio-language multimodal research, with audio clips sourced from FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset. It was created by cvssp and last updated on Hugging Face in July 2023. ChatGPT was used to help generate the weakly-labelled captions for the audio clips.
Use Cases
Training audio captioning models on the weakly-labelled text descriptions.
Researching audio-language multimodal alignment using the paired clips and captions.
Benchmarking sound event detection systems using the subset sourced from AudioSet.
Developing weakly-supervised learning methods for audio understanding.
Strengths
Audio clips are sourced from multiple established repositories, including FreeSound, BBC Sound Effects, and SoundBible.
Incorporates a subset from the well-known AudioSet dataset for sound event detection.
Uses ChatGPT-assisted caption generation, which may improve the scale and diversity of the captions.
Limitations
Row count, column names, and file formats are not documented, which makes it harder to assess suitability for a given task.
Description metadata is limited, so actual data quality has to be checked by manual inspection after download.
Because the captions are weakly labelled, the text annotations may contain noise or inaccuracies.
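A minimal Python sketch can make the manual inspection less ad hoc. It assumes the captions have already been loaded as a list of dictionaries with `id` and `caption` keys. Those key names, the `audit_captions` function, the `min_words` threshold, and the sample records are all hypothetical, because the dataset's column names are not documented. The function counts empty, very short, and duplicated captions. These are cheap first signals of the noise expected in weakly-labelled text.

```python
from collections import Counter


def audit_captions(records, min_words=3):
    """Count potentially noisy captions in a list of {"id", "caption"} records.

    The "id"/"caption" keys are hypothetical; adapt them to the real schema.
    """
    # Normalise missing or None captions to empty strings.
    captions = [str(r.get("caption") or "").strip() for r in records]
    # Case-insensitive frequency of each non-empty caption text.
    counts = Counter(c.lower() for c in captions if c)
    return {
        "total": len(captions),
        "empty": sum(1 for c in captions if not c),
        "short": sum(1 for c in captions if c and len(c.split()) < min_words),
        # Records whose caption text (ignoring case) also appears in another record.
        "duplicated": sum(n for n in counts.values() if n > 1),
    }


# Hypothetical sample records, for illustration only.
sample = [
    {"id": "a1", "caption": "A dog barks while cars pass by."},
    {"id": "a2", "caption": "a dog barks while cars pass by."},
    {"id": "a3", "caption": ""},
    {"id": "a4", "caption": "Rain."},
]
print(audit_captions(sample))  # {'total': 4, 'empty': 1, 'short': 1, 'duplicated': 2}
```

Exact-match duplicates and word counts are deliberately crude. They flag candidates for a closer look, not definitive errors. Listening to a random sample of the flagged clips is still needed to judge caption accuracy.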
Provenance
Source
Audio clips are sourced from FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset.