Description
A collection of conversational turns with audio recordings and transcripts. The dataset includes columns for conversation identifiers, speaker agents, prompts sent to a Gemini Live model, spoken transcripts, and audio durations. It was created by ShiniChien and last updated on May 18, 2026.
Use Cases
Train or evaluate text-to-speech models based on the provided audio and transcript pairs.
Analyze conversational patterns and agent behavior based on turn-by-turn dialogue data.
Develop multimodal AI systems that integrate speech and text based on the synchronized audio and transcript fields.
Benchmark speech generation quality using the specified TTS voice and duration metadata.
Strengths
Includes synchronized audio files (WAV format) and text transcripts for each conversational turn.
Contains structured metadata such as conversation IDs, turn indices, agent names, and prompt instructions.
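As a minimal sketch of how this metadata might be used, the snippet below groups turns by conversation, orders them by turn index, and totals the audio duration per conversation. The field names (`conversation_id`, `turn_index`, `agent`, `transcript`, `duration_s`) and the sample rows are placeholders assumed for illustration. The dataset's actual column names are not documented here and should be checked before use.

```python
from collections import defaultdict

# Placeholder rows: field names and values are illustrative assumptions,
# not taken from the dataset itself.
rows = [
    {"conversation_id": "c1", "turn_index": 1, "agent": "B",
     "transcript": "Hi there.", "duration_s": 1.5},
    {"conversation_id": "c1", "turn_index": 0, "agent": "A",
     "transcript": "Hello.", "duration_s": 1.0},
    {"conversation_id": "c2", "turn_index": 0, "agent": "A",
     "transcript": "Good morning.", "duration_s": 2.0},
]


def group_turns(rows):
    """Group rows by conversation ID and sort each group by turn index."""
    conversations = defaultdict(list)
    for row in rows:
        conversations[row["conversation_id"]].append(row)
    return {
        cid: sorted(turns, key=lambda r: r["turn_index"])
        for cid, turns in conversations.items()
    }


def total_duration(turns):
    """Sum the audio duration (in seconds) over a list of turns."""
    return sum(t["duration_s"] for t in turns)


grouped = group_turns(rows)
durations = {cid: total_duration(turns) for cid, turns in grouped.items()}
```

The same grouping would apply to rows loaded from the real files once the true column names are known.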
Limitations
Dataset size, row count, and any file formats other than WAV audio are not stated, which makes it hard to assess suitability for a given task.
No column-level documentation is provided beyond the column list, so field semantics may have to be inferred from the data.
Freshness should be verified as the last update timestamp is in the future (2026-05-18).
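One simple way to flag the freshness concern is to compare the listed update date against a reference date. This is a sketch only: `is_future_dated` is a hypothetical helper name, and the reference date is passed in explicitly so the check is reproducible.

```python
from datetime import date


def is_future_dated(last_updated: date, today: date) -> bool:
    """Return True when the listed update date is later than the reference date."""
    return last_updated > today


# The update date as listed on this card.
last_updated = date(2026, 5, 18)

# In practice, pass date.today() as the reference date.
needs_review = is_future_dated(last_updated, date.today())
```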
Provenance
Source
Hugging Face
Collection Method
Likely generated from interactions with a Gemini Live model.