DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Gigaspeech2 Typhoon: 1,000 Thai Speech Samples for ASR Benchmarking | DataSalon

Speech & Audio

Name: Gigaspeech2 Typhoon: 1,000 Thai Speech Samples for ASR Benchmarking
Creator: typhoon-ai
Published: 2026-01-16T09:19:14
Keywords: Audio Metadata, Benchmarking, Audio, Natural Language Processing, Speech Recognition, Thai Language

Available on 1 platform

Description

A metadata-only reference dataset containing 1,000 test samples for Thai speech recognition benchmarking. The dataset, created by typhoon-ai, provides audio IDs and human transcriptions derived from the Gigaspeech2 corpus, with the last update recorded on 2026-05-18. Each audio_id links to the original Gigaspeech2 dataset for audio file retrieval.

Use Cases

Benchmark ASR model accuracy based on the provided human transcriptions
Evaluate model performance on Thai speech using the standardized test set
Link metadata to original audio files from the Gigaspeech2 corpus for model training or validation
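Benchmarking against the human transcriptions can be sketched as a character error rate (CER) computation. CER is a common choice for Thai because the script does not separate words with spaces. This is a minimal sketch, not the creator's evaluation method: the function names `edit_distance` and `corpus_cer` are illustrative, and it counts Unicode code points, so Thai vowel and tone marks are scored as separate characters.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: each substitution, insertion, or deletion costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    `pairs` yields (reference, hypothesis) strings; how they are loaded
    from the dataset is left open, since the field names are undocumented.
    """
    total_edits = 0
    total_chars = 0
    for reference, hypothesis in pairs:
        total_edits += edit_distance(reference, hypothesis)
        total_chars += len(reference)
    if total_chars == 0:
        raise ValueError("no reference characters to score")
    return total_edits / total_chars
```

Aggregating edits over the whole test set before dividing keeps short utterances from dominating the score, which averaging per-sample rates would allow.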

Strengths

Provides a standardized benchmark with 1,000 test samples specifically for Thai ASR
Includes human transcriptions for ground-truth evaluation
Each audio_id references the corresponding audio file in the original Gigaspeech2 corpus

Limitations

The listing does not report a verified row count beyond the stated 1,000 test samples, which may limit suitability assessment
Column-level documentation is absent; field semantics must be inferred after download
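Because field semantics must be inferred, a quick per-column summary after download is a reasonable first step. The sketch below assumes the metadata can be read as CSV, which the listing does not confirm. Only `audio_id` is named in the description; the `text` column and the sample rows are hypothetical placeholders, and `summarize_columns` is an illustrative helper.

```python
import csv
import io


def summarize_columns(rows):
    """Return, per column, the non-empty count, distinct count, and one example value."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            stats = summary.setdefault(
                name, {"non_empty": 0, "distinct": set(), "example": None}
            )
            if value not in (None, ""):
                stats["non_empty"] += 1
                stats["distinct"].add(value)
                if stats["example"] is None:
                    stats["example"] = value
    return {
        name: {
            "non_empty": s["non_empty"],
            "distinct": len(s["distinct"]),
            "example": s["example"],
        }
        for name, s in summary.items()
    }


# Placeholder rows standing in for the downloaded file; not real dataset content.
SAMPLE_CSV = "audio_id,text\nplaceholder-0001,hello\nplaceholder-0002,\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = summarize_columns(rows)
for column, stats in report.items():
    print(column, stats)
```

A distinct count equal to the row count suggests an identifier column, while empty values in a transcription column would be worth checking before scoring.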

Provenance

Source: Derived from the Gigaspeech2 corpus.
Collection Method: Metadata and transcriptions extracted from a larger speech corpus.
Time Range: Not specified
Freshness: Last updated 2026-05-18 04:22:03; freshness should be verified
Geography: Not specified

License is unknown; users must verify terms before use.

Quality Score

D38

Community

71 downloads

1 like

0 views

Dataset Info

Author: typhoon-ai
Created: Jan 16, 2026
Updated: May 18, 2026
Last synced: Jun 12, 2026
