Name: Synthetic Medical Speech Dataset for Clinical ASR Fine-Tuning
Creator: intelmedica
Published: 2026-04-04T15:45:27
Keywords: Clinical Terminology, Speech Synthesis, Medical Speech, Healthcare, ASR Training, Text, Audio, Synthetic, Synthetic Audio

Description

This synthetic medical speech dataset contains 101,475 audio-text pairs totaling 184.1 hours of 16 kHz mono speech. IntelMedica generated it with the Kokoro-82M TTS system, using 19 voices across three English accent groups and focusing on clinical and nursing terminology. The dataset page was last updated in April 2026.
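As a quick sanity check on these figures, the average clip length follows from the pair count and total duration. The sketch below uses only the two numbers quoted in the description; the function name is illustrative and not part of any dataset tooling.

```python
# Figures quoted in the dataset description.
NUM_PAIRS = 101_475
TOTAL_HOURS = 184.1


def mean_clip_seconds(num_pairs: int, total_hours: float) -> float:
    """Return the average clip length in seconds."""
    if num_pairs <= 0:
        raise ValueError("num_pairs must be positive")
    return total_hours * 3600 / num_pairs


# Roughly 6.5 seconds per clip, i.e. short utterances rather than long dictations.
print(f"{mean_clip_seconds(NUM_PAIRS, TOTAL_HOURS):.2f} s per clip")
```

An average of about 6.5 seconds suggests short, term-focused utterances, which is worth keeping in mind when choosing batch sizes or padding lengths for fine-tuning.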

Use Cases

Fine-tune Whisper-based ASR models on clinical terminology using the synthetic audio-text pairs.
Evaluate ASR model performance on medical terms sourced from RxNorm API and FDA data referenced in the dataset.
Train multi-accent medical ASR systems leveraging the 19 distinct synthetic voices across three English accent groups.
Benchmark speech synthesis quality for medical domain terms using the Kokoro-82M TTS generated audio.
Augment real medical speech datasets with the 184.1 hours of synthetic speech to improve model robustness.
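For the evaluation use case above, word error rate (WER) is a common metric. The following is a minimal sketch of WER as word-level edit distance divided by reference length. The lowercasing and whitespace tokenization are simplifying assumptions, and the example transcripts are hypothetical, not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: lowercase + whitespace split is an adequate normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: a drug name misrecognized as two ordinary words.
reference = "administer 5 mg of lisinopril daily"
hypothesis = "administer 5 mg of listen april daily"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

In practice, a text normalizer for numbers, units, and punctuation is usually applied to both transcripts before scoring, since formatting differences otherwise count as errors.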

Strengths

101,475 audio-text pairs provide a substantial corpus for training.
184.1 hours of speech offers extensive audio data for model fine-tuning.
Focus on clinical and nursing terminology drawn from sources such as RxNorm and FDA data adds domain relevance.

Limitations

All data is synthetically generated, which may not capture nuances of real human medical speech.
Limited to three English accent groups, potentially lacking global dialect coverage.
The companion v1 release has more samples (125,500) and more hours (~257), which suggests this v2 release is a focused subset.

Provenance

Source: IntelMedica.
Collection Method: Synthetically generated using the Kokoro-82M text-to-speech system.
Freshness: Dataset page was updated in April 2026.
Geography: Three English accent groups; specific regions are not detailed.

License details are unknown and should be verified before use. It is a companion to a larger v1 dataset with different scope. Full description requires visiting the Hugging Face dataset page.
