Name: Zero-Shot Text-to-Speech Robustness Benchmark Across Acoustic Regimes
Creator: KRAFTON
Published: 2026-04-01T16:14:58
Keywords: Size Categories: 1K<n<10K, Text To Speech, Task Categories: text-to-speech, Library: polars, English, Language: en, License: cc-by-nc-nd-4.0, Speech Synthesis, Modality: text, Zero Shot Learning, CSV, Modality: tabular, Library: mlcroissant, Evaluation, Library: datasets, Benchmark, Library: pandas, Tabular, Audio, Zero Shot TTS, Region: us, Robustness, Robustness Evaluation

Description

A 2026 benchmark from KRAFTON provides 6,000 prompt–text pairs for evaluating zero-shot text-to-speech models. It covers four acoustic regimes: Clean, Noisy, Wild, and Emotional, using prompts from 12 different datasets. This framework aims to assess model robustness in realistic and challenging recording scenarios.

Use Cases

Benchmarking TTS model performance on prompts from the 'Wild' acoustic regime to test generalization to unconstrained recordings.
Evaluating emotional speech synthesis capabilities using the 'Emotional' regime's prompt–text pairs.
Assessing model robustness to background noise by comparing outputs on 'Clean' versus 'Noisy' regime prompts.
Analyzing zero-shot TTS performance across 12 distinct source datasets aggregated within the benchmark.
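
The comparisons above can be sketched as a small aggregation over per-utterance scores. This is a minimal illustration, not the benchmark's own tooling: the column names (regime, source, wer), the source labels, the choice of word error rate as the metric, and all numeric values are assumptions made up for the example, since the card does not document the schema or any results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical per-utterance results. Column names and values are
# illustrative only; the real benchmark schema is not documented here.
TOY_RESULTS = """regime,source,wer
Clean,src_a,0.02
Clean,src_b,0.04
Noisy,src_c,0.06
Noisy,src_d,0.10
Wild,src_e,0.12
Emotional,src_f,0.08
"""


def mean_by(rows, key, metric="wer"):
    """Average `metric` over rows grouped by the column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[metric]))
    return {name: mean(values) for name, values in groups.items()}


def regime_gap(per_regime, baseline="Clean", target="Noisy"):
    """Absolute degradation of `target` relative to `baseline`."""
    return per_regime[target] - per_regime[baseline]


rows = list(csv.DictReader(io.StringIO(TOY_RESULTS)))
per_regime = mean_by(rows, "regime")   # one score per acoustic regime
per_source = mean_by(rows, "source")   # one score per source dataset
gap = regime_gap(per_regime)           # Clean versus Noisy degradation

print(per_regime)
print(f"Clean -> Noisy gap: {gap:.3f}")
```

Reporting the gap between regimes, rather than a single pooled average, keeps a model that does well on Clean prompts from masking poor results on Noisy, Wild, or Emotional ones.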

Strengths

6,000 prompt–text pairs for evaluation
Covers 4 distinct acoustic regimes (Clean, Noisy, Wild, Emotional)
Aggregates prompts from 12 different source datasets
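
Before relying on these figures, a downloaded copy can be checked against them. The sketch below assumes the manifest has been read into a list of dictionaries with hypothetical regime and source columns; those column names, the check_manifest helper, and the toy rows are illustrative assumptions, not part of the published dataset.

```python
from collections import Counter

# Regime labels as stated on the card.
EXPECTED_REGIMES = {"Clean", "Noisy", "Wild", "Emotional"}


def check_manifest(rows, expected_rows=6000, expected_sources=12,
                   regime_col="regime", source_col="source"):
    """Return a list of mismatches against the card's stated figures.

    `regime_col` and `source_col` are hypothetical column names.
    """
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    regimes = Counter(row[regime_col] for row in rows)
    if set(regimes) != EXPECTED_REGIMES:
        problems.append(f"unexpected regime labels: {sorted(regimes)}")
    sources = {row[source_col] for row in rows}
    if len(sources) != expected_sources:
        problems.append(
            f"expected {expected_sources} sources, found {len(sources)}"
        )
    return problems


# Toy manifest: 12 rows, 4 regimes, 12 made-up source labels.
toy = [
    {"regime": regime, "source": f"src_{i}"}
    for i, regime in enumerate(sorted(EXPECTED_REGIMES) * 3)
]

print(check_manifest(toy, expected_rows=12))  # no mismatches for the toy data
print(check_manifest(toy))                    # flags the row count only
```

An empty list means the copy matches the stated totals; any entries point to a partial download or a schema that differs from the assumed one.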

Limitations

Per-regime and per-source row counts, column details, and sample data are unavailable for review
The technical report detailing methodology is listed as 'Coming soon'
The license appears only as a metadata tag (cc-by-nc-nd-4.0); the full license terms are not reproduced here

Provenance

Source: KRAFTON, aggregated from 12 datasets.
Collection Method: Curated as an evaluation benchmark from existing TTS datasets.
Freshness: Last updated on 2026-04-02.
Geography: Region tag indicates 'us', but full spatial coverage is unspecified.

The full description and technical report are hosted externally; license details should be verified on the source page before use.
