Name: Simulated Medical Exam Conversations with Speech Metadata, 25,706 Examples
Creator: WhissleAI
Published: 2024-09-26T10:17:20
Keywords: Simulated Conversations, Medical Speech, Healthcare, ASR Training, Audio, Synthetic, Multimodal, Speech Metadata

Description

This dataset contains 25,706 simulated patient-physician conversations in English, focused on respiratory exams, with audio provided as 16 kHz WAV files. WhissleAI created it for training automatic speech recognition (ASR) models, and it was last updated on June 1, 2026. Annotations cover speaker changes, emotions, intents, and roles.

Use Cases

Training ASR models to transcribe medical conversations, using the audio and its annotations.
Developing models that identify speaker roles (patient/physician) from the role annotations.
Analyzing emotion and intent patterns in clinical dialogue using the emotion and intent annotations.
Creating synthetic training data for healthcare NLP systems, modeled on the simulated interview structure.

Strengths

25,706 examples provide a substantial corpus for model training.
Includes multiple annotation layers such as speaker changes, emotions, intents, and roles.
Audio is provided in a standard 16 kHz WAV format.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The data is simulated and may not fully capture the nuances of real-world clinical interactions.
The dataset description is brief; data quality can only be assessed by inspecting the files after download.
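
As one concrete form of post-download inspection, the stated 16 kHz sample rate can be checked from each WAV header. The following is a minimal sketch using only the Python standard library. The helper names (wav_summary, make_silent_wav) are hypothetical, and the demo runs on a generated silent clip rather than on a file from this dataset, since the dataset's file layout is not documented here. It also assumes plain PCM WAV, which is what the standard wave module reads.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the dataset description


def wav_summary(source):
    """Return basic header facts for a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
            "matches_expected_rate": rate == EXPECTED_RATE_HZ,
        }


def make_silent_wav(rate_hz, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * seconds)
    buffer.seek(0)
    return buffer


# Stand-in for a downloaded file; replace with a real path once the layout is known.
demo = wav_summary(make_silent_wav(16_000))
print(demo)
```

Running wav_summary over every downloaded file and counting the entries where matches_expected_rate is false gives a quick first check before any training run.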

Provenance

Source: WhissleAI via Hugging Face.
Collection Method: Simulated medical interviews with a respiratory focus.
Time Range: Not specified
Freshness: Last updated 2026-06-01 14:33:41; freshness should be verified.
Geography: Not specified
