Name: Customer Service Persian Diarization Dataset with 80 Hours of Synthetic Speech
Creator: atiyehghm
Published: 2026-02-18T08:06:11
Keywords: Customer Service, Multi Speaker, Persian Language, Audio, Synthetic Speech, Synthetic, Speech Diarization

Description

The customer_service_persian_diarization_dataset is a synthetic multi-speaker speech dataset designed for training and evaluating speaker diarization models in Persian (Farsi). It contains approximately 80 hours of audio, built using utterances from a customer service dataset and processed through a synthesis framework to simulate realistic conversational dynamics. The dataset was created by atiyehghm and was last updated on the platform in February 2026.

Use Cases

Training speaker diarization models on multi-speaker Persian audio.
Evaluating the performance of speech processing systems on synthetic conversational data.
Developing and benchmarking models for customer service call analysis in Persian.
Studying the characteristics of synthetic speech for simulating realistic dialogue dynamics.

Strengths

Approximately 80 hours of total audio duration provides a substantial resource for model training.
Synthetic generation framework likely allows for controlled simulation of conversational dynamics.

Limitations

Dataset is synthetic, which may not fully capture the acoustic and conversational nuances of real-world recordings.
Column-level documentation is absent; field semantics must be inferred after download.
The description metadata is limited; actual data quality requires manual inspection after download.
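
Because the fields are undocumented, a first inspection step after download might look like the sketch below. It summarizes the name and type of every field in one record so their meaning can be inferred. The helper name summarize_record and the example record are hypothetical; the real field names are not stated in this card.

```python
from collections.abc import Mapping, Sequence


def summarize_record(record):
    """Return {field_name: short type description} for one dataset record."""
    summary = {}
    for name, value in record.items():
        if isinstance(value, Mapping):
            # Nested fields (for example a decoded audio feature) list their keys.
            summary[name] = "dict(" + ", ".join(sorted(map(str, value))) + ")"
        elif isinstance(value, (str, bytes)):
            # Strings are sequences too, so handle them before the Sequence branch.
            summary[name] = type(value).__name__
        elif isinstance(value, Sequence):
            inner = type(value[0]).__name__ if len(value) else "empty"
            summary[name] = f"{type(value).__name__}[{inner}] len={len(value)}"
        else:
            summary[name] = type(value).__name__
    return summary


# Hypothetical record: these field names are assumptions, not the real schema.
example = {
    "audio": {"array": [0.0, 0.1], "sampling_rate": 16000},
    "speakers": ["spk_0", "spk_1"],
    "duration": 12.5,
}
for field, description in summarize_record(example).items():
    print(f"{field}: {description}")
```

With the Hugging Face datasets library, the same helper could be applied to the first record obtained from load_dataset with streaming=True, which avoids downloading all 80 hours before looking at the schema. The repository identifier and split name would need to be confirmed on the source page.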

Provenance

Source: Hugging Face user atiyehghm
Collection Method: Synthesized from utterances in a customer service dataset using a processing framework.
Freshness: Last updated 2026-02-18 08:28:21; check the source page for later revisions.

License is unknown; terms of use must be verified before application.
