AF-Chat is a fine-tuning dataset of approximately 75,000 multi-turn conversations involving audio clips, created by NVIDIA. The conversations are multi-audio, with an average of 4.6 clips and 6.2 turns per conversation, spanning speech, environmental sounds, and music. The dataset was last updated on July 21, 2025.
Use Cases
- Fine-tuning audio-text conversational models based on multi-turn dialogues.
- Training models for audio question answering based on multi-audio context.
- Developing reasoning capabilities for LLMs based on combined speech, sound, and music inputs.
Strengths
- Approximately 75,000 high-quality conversation examples.
- Multi-audio context with an average of 4.6 clips and 6.2 turns per conversation.
- Audio sourced from established datasets like YouTube8m and AudioSet.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
- Source
- NVIDIA, with audio sourced from YouTube8m and AudioSet.
- Collection Method
- Likely curated and synthesized from existing audio datasets for fine-tuning.
- Time Range
- null
- Freshness
- Last updated 2025-07-21 17:49:49.
- Geography
- null