Name: BAS4R: A Multi-Condition Bangla Speech Dataset for Anti-Spoofing Research
Creator: Al Arian Ahmad
Published: 2026-02-11T17:10:11
Keywords: Bangla Speech, Gender Analysis, ZIP, Engineering, Computer and Information Science, Speech Anti Spoofing, Replay Attack Detection, Benchmark, Audio, Large Scale, Natural Language Processing, Audio Processing, Synthetic

Description

BAS4R comprises 120,125 audio files totaling 143.88 hours of Bangla speech, including both authentic and spoofed recordings derived from 200 native speakers across ten Bangladeshi districts. Al Arian Ahmad contributed the dataset to Harvard Dataverse; it was last updated on 2026-05-22.
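The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted on this page. The per-speaker averages assume files are spread evenly across speakers, which the catalog does not state.

```python
# Figures quoted in the catalog entry.
TOTAL_FILES = 120_125
TOTAL_HOURS = 143.88
SPEAKERS = 200
CATEGORY_FILES = 28_830  # example category count quoted under Strengths

# Averages, assuming an even spread (not confirmed by the source).
mean_clip_s = TOTAL_HOURS * 3600 / TOTAL_FILES
files_per_speaker = TOTAL_FILES / SPEAKERS
minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS

print(f"mean clip length: {mean_clip_s:.2f} s")          # 4.31 s
print(f"files per speaker: {files_per_speaker:.1f}")      # 600.6
print(f"minutes per speaker: {minutes_per_speaker:.1f}")  # 43.2

# Five categories of 28,830 files each would exceed the reported total,
# so that count cannot apply to every one of the five categories.
print(5 * CATEGORY_FILES > TOTAL_FILES)                   # True
```

Clips therefore average a little over four seconds, which is typical of short read-speech utterances.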

Use Cases

Train anti-spoofing classifiers on systematically generated spoofed speech samples.
Evaluate the robustness of speaker verification systems using recordings made under realistic acoustic and channel-degraded conditions.
Develop gender-aware voice analysis models using speech from 110 male and 90 female participants.
Research accent-robust spoofing detection that accounts for regional pronunciation variability across ten districts.
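Classifiers trained on this data are best evaluated on speakers not seen during training; otherwise a model can learn voice identity rather than spoofing artifacts. A minimal sketch of a speaker-disjoint split that preserves the reported 110/90 gender balance follows. The speaker IDs are hypothetical placeholders, since the catalog does not document the ID scheme.

```python
import random

def speaker_disjoint_split(speakers, test_fraction=0.2, seed=0):
    """Split speaker IDs into disjoint train/test sets, stratified by gender.

    `speakers` maps a (hypothetical) speaker ID to a gender label.
    Every utterance of a speaker then follows that speaker's assignment,
    so no voice appears in both training and evaluation.
    """
    rng = random.Random(seed)
    train, test = set(), set()
    for gender in sorted(set(speakers.values())):
        ids = sorted(s for s, g in speakers.items() if g == gender)
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.update(ids[:n_test])
        train.update(ids[n_test:])
    return train, test

# Hypothetical IDs mirroring the reported 110 male / 90 female speakers.
speakers = {f"M{i:03d}": "M" for i in range(110)}
speakers.update({f"F{i:03d}": "F" for i in range(90)})
train, test = speaker_disjoint_split(speakers)
print(len(train), len(test))  # 160 40
```

Stratifying per gender keeps the 22 male and 18 female test speakers in the same proportion as the full pool.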

Strengths

Large scale with 120,125 audio files totaling approximately 143.88 hours of speech.
Structured organization into five major spoofing categories, each with a defined file count (e.g., 28,830 files in one category).
Diverse speaker pool of 200 native Bangla speakers from ten districts, capturing regional linguistic diversity.
Systematically generated spoofed samples covering multiple conditions, such as GSM codec compression, telephone transmission, and pitch shifting.
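The catalog does not document how the telephone-transmission condition was produced. As a hypothetical illustration only (not the authors' pipeline), the sketch below applies 8-bit μ-law companding, a continuous approximation of the curve used in ITU-T G.711 telephony with μ = 255, to float samples in [-1, 1]. A fuller channel simulation would also resample to 8 kHz and band-limit the signal.

```python
import math

MU = 255.0  # mu-law parameter used by G.711-style companding

def mu_law_roundtrip(samples, bits=8):
    """Compress, quantize to `bits` bits, and expand float samples in [-1, 1]."""
    levels = 2 ** bits - 1
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid input range
        # Continuous mu-law compression: sgn(x) * ln(1 + mu|x|) / ln(1 + mu).
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        # Uniform quantization of the compressed value to `levels` steps.
        q = round((y + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
        # Expansion back to the linear domain: sgn(q) * ((1 + mu)^|q| - 1) / mu.
        out.append(math.copysign((math.pow(1.0 + MU, abs(q)) - 1.0) / MU, q))
    return out

degraded = mu_law_roundtrip([-1.0, -0.25, 0.0, 0.5, 1.0])
```

Because the quantization happens in the compressed domain, quiet samples keep finer resolution than loud ones, which is the audible signature of narrowband telephone coding.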

Limitations

Column-level (metadata field) documentation is absent; label and file-naming semantics must be inferred after download.
Row count is not reported (only file and hour totals are given), which may limit suitability assessment for certain modeling tasks.
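Because file and label semantics are undocumented, a practical first step after download is to inventory the archive. A minimal sketch, assuming the data arrives as ZIP archives (ZIP appears among the keywords) and that top-level folders correspond to categories; both the folder layout and the audio extensions below are assumptions.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: the actual audio format is not documented in the catalog.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}

def inventory(zip_source):
    """Count audio files per top-level folder in a ZIP path or file-like object."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            # Skip directory entries and non-audio files (e.g. READMEs).
            if name.endswith("/") or path.suffix.lower() not in AUDIO_SUFFIXES:
                continue
            top = path.parts[0] if len(path.parts) > 1 else "(root)"
            counts[top] += 1
    return counts
```

For example, `inventory("BAS4R.zip")` (hypothetical file name) returns a `Counter`; comparing `sum(counts.values())` against the reported 120,125 files is a quick completeness check.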

Provenance

Source: Harvard Dataverse
Collection Method: Speech samples collected from 200 native Bangla speakers under controlled and realistic acoustic conditions; spoofed samples generated via physical replay setups, communication channels, effect-based modifications, and signal-processing transformations.
Time Range: Not reported.
Freshness: Last updated 2026-05-22 13:50:41; freshness should be verified.
Geography: Ten districts of Bangladesh: Barishal, Chapainawabganj, Chittagong, Habiganj, Kishoreganj, Kushtia, Naogaon, Narail, Pabna, and Sylhet.
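For the accent-robustness use case, the ten districts suggest a leave-one-district-out protocol: train on nine districts and evaluate on the held-out one. A minimal sketch, assuming per-speaker district labels can be recovered after download (the catalog does not document how districts are encoded); it only enumerates the folds.

```python
# District names as listed in the catalog entry.
DISTRICTS = [
    "Barishal", "Chapainawabganj", "Chittagong", "Habiganj", "Kishoreganj",
    "Kushtia", "Naogaon", "Narail", "Pabna", "Sylhet",
]

def leave_one_district_out(districts=DISTRICTS):
    """Yield (held_out, training_districts) pairs, one fold per district."""
    for held_out in districts:
        yield held_out, [d for d in districts if d != held_out]

folds = list(leave_one_district_out())
print(len(folds))  # 10
```

Averaging the detection error over the ten folds gives one estimate of how well a model generalizes to an unseen regional accent.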
