Mantooq2: 7,168 Hours of Validated Arabic Audiobook Speech

Name: Mantooq2: 7,168 Hours of Validated Arabic Audiobook Speech
Creator: AlgoRythmetic
Published: 2026-04-19T06:31:09
Keywords: Arabic Speech, Audiobook, Audio, Language Data, Speech Recognition

By AlgoRythmetic, updated 3mo ago


Description

Mantooq2 is an Arabic speech dataset comprising 7,168 hours of validated audio in approximately 3,957,670 segments drawn from 1,229 books. It was created by AlgoRythmetic and last updated on April 21, 2026. Audio is provided as 16 kHz mono FLAC and is organized into 2,639 Parquet shards.
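
The totals quoted above imply some useful averages, for example roughly 6.5 seconds of audio per segment. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative and not part of the dataset, and the results are only as accurate as the quoted (approximate) figures.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 7_168
SEGMENTS = 3_957_670   # stated as approximate
BOOKS = 1_229
SHARDS = 2_639


def dataset_averages(hours=TOTAL_HOURS, segments=SEGMENTS,
                     books=BOOKS, shards=SHARDS):
    """Return back-of-the-envelope averages derived from the quoted totals."""
    total_seconds = hours * 3600
    return {
        "seconds_per_segment": total_seconds / segments,
        "hours_per_book": hours / books,
        "segments_per_shard": segments / shards,
        "hours_per_shard": hours / shards,
    }


if __name__ == "__main__":
    for name, value in dataset_averages().items():
        print(f"{name}: {value:.2f}")
```

These are means only; the page gives no information about how segment lengths or shard sizes are actually distributed.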

Use Cases

Train Arabic automatic speech recognition (ASR) models on the large volume of validated speech segments.
Develop Arabic text-to-speech (TTS) systems using the audiobook recordings.
Fine-tune language models on Modern Standard Arabic and dialectal speech data.
Conduct linguistic analysis of Arabic prosody and phonetics using the segmented audiobook data.

Strengths

Large scale with 7,168 hours of validated audio.
Rigorous validation process applied to every shard, checking schema, row groups, FLAC integrity, and metadata.
Content sourced from 1,229 books, providing diverse textual material.
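
The page does not describe how the publisher's validation is implemented. As an independent spot check after download, one could confirm that a segment's FLAC bytes declare the advertised 16 kHz mono format. The sketch below parses only the STREAMINFO block at the start of a FLAC stream using the Python standard library; the function names are hypothetical, it assumes you already have the raw FLAC bytes for a segment, and it does not decode audio frames or verify the embedded MD5, so it is a header check rather than a full integrity test.

```python
def read_flac_streaminfo(data: bytes) -> dict:
    """Parse the mandatory STREAMINFO block at the start of a FLAC stream."""
    if len(data) < 42:
        raise ValueError("too short to contain a FLAC STREAMINFO block")
    if data[:4] != b"fLaC":
        raise ValueError("missing 'fLaC' stream marker")
    block_type = data[4] & 0x7F               # low 7 bits: metadata block type
    length = int.from_bytes(data[5:8], "big")  # 24-bit block length
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not a valid STREAMINFO")
    info = data[8:42]
    # Bytes 10..17 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(info[10:18], "big")
    sample_rate = packed >> 44
    return {
        "sample_rate": sample_rate,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def is_16khz_mono(data: bytes) -> bool:
    """True if the FLAC header declares 16 kHz, single-channel audio."""
    info = read_flac_streaminfo(data)
    return info["sample_rate"] == 16_000 and info["channels"] == 1
```

Dividing total_samples by sample_rate gives the declared duration in seconds, which can be compared against any duration field the shards turn out to contain.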

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
A genre breakdown is not fully visible in the description, which limits assessment of content coverage.
Data may reflect source bias inherent to the specific collection of 1,229 books.

Provenance

Source: huggingface, author AlgoRythmetic
Collection Method: Likely extracted and processed from audiobook sources.
Time Range: not specified
Freshness: Last updated 2026-04-21 10:05:49
Geography: not specified

Related Topics

Audio, Arabic Speech, Audiobook, Language Data, Speech Recognition


Quality Score

Overall: C (41)
Description: 48
Source: 39
Reputation: 41
Access: 22

Community

Downloads: 64
Likes: 1
Views: 0

Dataset Info

Author: AlgoRythmetic
Created: Apr 19, 2026
Updated: Apr 21, 2026
Last synced: May 7, 2026
