Arabic Voice Agent End-of-Turn 5M: 5 Million Synthetic Dialogue Turns

Name: Arabic Voice Agent End-of-Turn 5M: 5 Million Synthetic Dialogue Turns
Creator: sarjai
Published: 2026-05-25T13:26:19
Keywords: Arabic Speech, Turn Taking, Dialogue Systems, Text, Voice Agents, Audio, Synthetic Data, Synthetic

Description

sarjai's Arabic Voice Agent End-of-Turn 5M dataset contains 5,000,000 synthetic Arabic voice-agent examples for training and evaluating end-of-turn (turn-taking) models. Each row is a two-turn text exchange: an agent_turn followed by a simulated user_stt_text surface form, labeled for whether the user has finished speaking. The dataset was last updated on Hugging Face in May 2026.
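
As a rough illustration of the row layout described above, the sketch below models one example as a Python dataclass. The field names agent_turn and user_stt_text come from the description; the label name end_of_turn, its boolean type, and the sample text are assumptions made for illustration, not the dataset's documented schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndOfTurnExample:
    agent_turn: str      # what the voice agent said (field name from the description)
    user_stt_text: str   # simulated speech-to-text rendering of the user's reply
    end_of_turn: bool    # hypothetical label name: True if the user has finished speaking


def from_row(row: dict, label_key: str = "end_of_turn") -> EndOfTurnExample:
    """Build an example from a raw row dict.

    label_key is an assumed column name; the label is assumed to be a
    boolean or 0/1 value, so bool() is enough to normalize it.
    """
    return EndOfTurnExample(
        agent_turn=str(row["agent_turn"]),
        user_stt_text=str(row["user_stt_text"]),
        end_of_turn=bool(row[label_key]),
    )


# Made-up rows for illustration only; they are not taken from the dataset.
complete = from_row(
    {"agent_turn": "ما اسمك؟", "user_stt_text": "اسمي أحمد", "end_of_turn": True}
)
partial = from_row(
    {"agent_turn": "ما اسمك؟", "user_stt_text": "اسمي", "end_of_turn": 0}
)
```

If the real label column turns out to be a string or a multi-class value, the bool() normalization above would need to be replaced with an explicit mapping.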

Use Cases

Train turn-taking prediction models on the labeled end-of-turn examples.
Evaluate dialogue systems against simulated user speech-to-text output.
Benchmark Arabic conversational AI models using the two-turn text structure.
Study synthetic dialogue generation patterns for voice agents.
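
The training and evaluation use cases above could be scored along these lines. This is a minimal sketch: the label column name end_of_turn is an assumption, the rows are invented, and min_words_baseline is a toy stand-in for a real model, not a method recommended by the dataset's creator.

```python
from typing import Callable, Iterable


def evaluate(
    rows: Iterable[dict],
    predict: Callable[[str, str], bool],
    label_key: str = "end_of_turn",  # assumed label column name
) -> dict:
    """Score a predictor; end-of-turn is treated as the positive class."""
    tp = fp = tn = fn = 0
    for row in rows:
        guess = predict(row["agent_turn"], row["user_stt_text"])
        truth = bool(row[label_key])
        if guess and truth:
            tp += 1
        elif guess and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def min_words_baseline(agent_turn: str, user_stt_text: str, min_words: int = 2) -> bool:
    """Toy baseline: guess the turn is over once the user has said min_words words."""
    return len(user_stt_text.split()) >= min_words


# Invented rows for illustration only.
toy_rows = [
    {"agent_turn": "a", "user_stt_text": "one two three", "end_of_turn": True},
    {"agent_turn": "a", "user_stt_text": "one", "end_of_turn": False},
    {"agent_turn": "a", "user_stt_text": "one two", "end_of_turn": False},
    {"agent_turn": "a", "user_stt_text": "yes", "end_of_turn": True},
]
metrics = evaluate(toy_rows, min_words_baseline)
```

Reporting precision and recall separately is useful here because the two error types differ in cost for a voice agent: a false positive interrupts the user, while a false negative adds response latency.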

Strengths

Contains 5,000,000 total examples, providing a large-scale resource.
Includes a dedicated validation split of 50,000 rows and a test split of 50,000 rows.
Focuses on a specific, high-value NLP task for Arabic conversational AI.

Limitations

Data is synthetic, which may not fully reflect real-world conversational patterns.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
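
Because column semantics have to be inferred after download, a first inspection pass might look like the sketch below. It works on any iterable of row dicts; the sample rows and the end_of_turn column name are hypothetical and only show the shape of the output.

```python
from collections import defaultdict
from typing import Iterable


def summarize_columns(rows: Iterable[dict]) -> dict:
    """Report, per column, the value types seen and how often it is missing or empty."""
    seen_rows = 0
    types = defaultdict(set)
    present = defaultdict(int)
    empty = defaultdict(int)
    for row in rows:
        seen_rows += 1
        for name, value in row.items():
            present[name] += 1
            types[name].add(type(value).__name__)
            if value is None or value == "":
                empty[name] += 1
    return {
        name: {
            "types": sorted(types[name]),
            "missing": seen_rows - present[name],  # rows where the key is absent
            "empty": empty[name],                  # rows where the value is None or ""
        }
        for name in types
    }


# Hypothetical sample; real rows would come from the downloaded files.
sample_rows = [
    {"agent_turn": "x", "user_stt_text": "", "end_of_turn": True},
    {"agent_turn": "y", "user_stt_text": "z"},
]
summary = summarize_columns(sample_rows)
```

Running this over a few thousand rows of each split before training is a cheap way to catch mixed types or empty fields that the listing does not document.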

Provenance

Source: sarjai on Hugging Face.
Collection Method: Synthetically generated.
Time Range: Not specified
Freshness: Last updated 2026-05-25 13:28:43; freshness should be verified.
Geography: Not specified

Quality Score

Overall: D (40)
Description: 42
Source: 44
Reputation: 35
Access: 26

Community

Likes: 1
Views: 0

Dataset Info

Author: sarjai
Created: May 25, 2026
Updated: May 25, 2026
Last synced: May 26, 2026
