DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Multimodal & LLM

Audiocaption: Audio-Text Pairs and Baselines for Audio Description

Name: Audiocaption: Audio-Text Pairs and Baselines for Audio Description
Creator: RicherMans
Published: 2018-10-18T02:26:58
License: MIT
Keywords: Baseline, Audiocaption


Available on 1 platform

Description

This audio-text dataset provides paired audio clips and descriptive captions for the first Audiocaption task, published by RicherMans (repository created in October 2018, last updated in July 2024). It serves as a benchmark for automated audio description (audio captioning) systems and includes baseline code for evaluating performance.

Use Cases

Training neural networks to map audio signals to natural language captions
Evaluating the accuracy of the provided baseline models on the Audiocaption task
Analyzing the relationship between acoustic features and descriptive text labels
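Training a caption model typically requires converting each caption into a sequence of integer token ids. The sketch below shows one common way to do that. It assumes captions are already tokenized into space-separated words, and the special tokens and the `build_vocab`/`encode` helpers are hypothetical, not taken from the RicherMans baseline.

```python
from collections import Counter

# Hypothetical special tokens; the actual baseline may use different ones.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]


def build_vocab(captions, min_freq=1):
    """Map each word seen at least `min_freq` times to an integer id.

    Assumes captions are already tokenized into space-separated words.
    """
    counts = Counter(word for caption in captions for word in caption.split())
    vocab = {token: idx for idx, token in enumerate(SPECIAL_TOKENS)}
    # Sort words so id assignment is deterministic across runs.
    for word in sorted(counts):
        if counts[word] >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Turn a caption into a target id sequence framed by <bos> and <eos>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(word, unk) for word in caption.split()]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    captions = ["a dog barks", "a car passes by", "a dog whines"]
    vocab = build_vocab(captions, min_freq=2)
    print(vocab)
    print(encode("a cat barks", vocab))
```

Words seen fewer than `min_freq` times fall back to `<unk>`, which keeps rare words from inflating the model's output vocabulary.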

Strengths

Includes baseline models for the Audiocaption task
Released under the MIT license

Limitations

The total record count and dataset size are not documented in the source metadata
Geographic and environmental diversity of the audio recordings is unknown

Provenance

Source: RicherMans GitHub repository
Freshness: Last updated July 2024.

Users should refer to the GitHub repository for the baseline implementation and data loading scripts.
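As an illustration of what a data loading script might do, the sketch below groups captions by audio clip, since captioning datasets often provide several reference captions per clip. The CSV layout (a header row with `filename` and `caption` columns) and the `load_pairs` helper are hypothetical; consult the repository for the actual file format.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def load_pairs(caption_file, audio_dir):
    """Group captions by clip and pair each clip with its audio path.

    Assumes a hypothetical CSV with a header row and the columns
    `filename` and `caption`; the real repository may use another layout.
    """
    captions_by_clip = defaultdict(list)
    for row in csv.DictReader(caption_file):
        captions_by_clip[row["filename"]].append(row["caption"].strip())
    audio_dir = Path(audio_dir)
    # Each entry: (path to the audio file, list of reference captions).
    return [(audio_dir / name, refs) for name, refs in sorted(captions_by_clip.items())]


if __name__ == "__main__":
    sample = io.StringIO(
        "filename,caption\n"
        "clip1.wav,a dog barks\n"
        "clip1.wav,a dog is barking loudly\n"
        "clip2.wav,a car passes by\n"
    )
    for path, refs in load_pairs(sample, "audio"):
        print(path, refs)
```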

Related Topics: Baseline, Audiocaption


Quality Score: D24 (components: Description, Source, Reputation)



Community

79 likes

0 views

Dataset Info

License: MIT
Author: RicherMans
Created: Oct 18, 2018
Updated: Jul 25, 2024
Language: Python (repository code)
Last synced: Apr 30, 2026
