Name: WavCaps: ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset
Creator: cvssp
Published: 2023-04-12T08:09:04
Keywords: Weakly Supervised, Sound Event Detection, Multimodal Learning, Audio, Multimodal

Description

WavCaps is a dataset for audio-language multimodal research, with audio clips sourced from FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset. The dataset was created by cvssp and last updated on Hugging Face in July 2023. It uses ChatGPT to assist in generating weakly-labelled captions for the audio content.

Use Cases

Training audio captioning models on the weakly-labelled text descriptions.
Researching multimodal alignment between audio and language using the paired clips and captions.
Benchmarking sound event detection systems on the subset sourced from AudioSet.
Developing weakly-supervised learning methods for audio understanding.

Strengths

Audio clips are sourced from multiple established repositories, including FreeSound, BBC Sound Effects, and SoundBible.
Incorporates the AudioSet Strongly-labelled Subset, drawn from the widely used AudioSet dataset for sound event detection.
Uses a modern AI-assisted method (ChatGPT) for generating captions, which may improve scale and diversity.

Limitations

Row count, column names, and file formats are unknown, which limits suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
The 'weakly-labelled' nature of the captions suggests potential noise or inaccuracies in the text annotations.
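
Because the file formats are not documented here, one low-cost first step is to list the repository's files and tally them by extension before downloading anything. The following is a minimal sketch, not an official loading procedure. `summarize_extensions` and `list_dataset_files` are hypothetical helper names introduced for illustration; `list_repo_files` is a function of the `huggingface_hub` library, which must be installed separately.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_extensions(file_paths):
    """Count files per lowercase extension; files without one go under '(none)'."""
    counts = Counter()
    for path in file_paths:
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(none)"] += 1
    return dict(counts)


def list_dataset_files(repo_id):
    """Return the file listing of a Hugging Face dataset repo (needs network access)."""
    # Imported lazily so summarize_extensions works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")
```

Calling `summarize_extensions(list_dataset_files("cvssp/WavCaps"))` would then give a rough picture of what the repository contains. The repository id `cvssp/WavCaps` is an assumption inferred from the creator and dataset name above and should be checked against the actual Hugging Face page.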

Provenance

Source: Audio clips sourced from FreeSound, BBC Sound Effects, SoundBible, and AudioSet Strongly-labelled Subset.
Collection Method: ChatGPT-assisted weakly-labelled caption generation.
Time Range: not specified
Freshness: Last updated 2023-07-06 13:28:10; freshness should be verified.
Geography: not specified

License is unknown; restrictions should be verified before use.
