Description

Voicebench Ja contains 4 subsets. They were created by applying speech synthesis to samples from three Japanese text benchmarks: Elyza-tasks-100, M-IFEval, and JamC-QA. SB Intuitions built the dataset with their internal TTS model, using audio prompts from the JVS corpus. Its purpose is to quantify the performance gap when language models receive audio inputs instead of text inputs. It was last updated on March 30, 2026.

Use Cases

Benchmarking language model performance on audio inputs versus text inputs using the Elyza-tasks-100 subset.
Evaluating the instruction-following ability of speech-input language models on the M-IFEval subset.
Assessing question-answering capabilities of speech-enabled models using the JamC-QA subset.
Analyzing how speech synthesized with JVS corpus voice prompts affects model inference.
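An audio-versus-text comparison of this kind comes down to two steps. First, score the same items under both input modalities. Second, compare the mean scores. The sketch below is a minimal illustration and assumes per-item scores have already been computed for each subset. The record fields (`subset`, `text_score`, `audio_score`), the subset labels, and both helper functions are hypothetical and are not part of the dataset. Some subsets are small (the Elyza subset has only 36 items), so the sketch also computes a paired bootstrap interval for the gap.

```python
import random
from collections import defaultdict
from statistics import mean


def modality_gap(records):
    """Group per-item scores by subset and return
    {subset: (mean_text_score, mean_audio_score, text_minus_audio)}.

    Each record is assumed (hypothetically) to look like
    {"subset": str, "text_score": float, "audio_score": float}.
    """
    by_subset = defaultdict(lambda: ([], []))
    for r in records:
        text_scores, audio_scores = by_subset[r["subset"]]
        text_scores.append(r["text_score"])
        audio_scores.append(r["audio_score"])
    gaps = {}
    for subset, (text_scores, audio_scores) in by_subset.items():
        m_text, m_audio = mean(text_scores), mean(audio_scores)
        gaps[subset] = (m_text, m_audio, m_text - m_audio)
    return gaps


def bootstrap_gap_ci(text_scores, audio_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Paired percentile bootstrap CI for mean(text - audio).

    Items are resampled as pairs, so each item's text and audio scores stay together.
    """
    if len(text_scores) != len(audio_scores) or not text_scores:
        raise ValueError("need equal-length, non-empty score lists")
    diffs = [t - a for t, a in zip(text_scores, audio_scores)]
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(diffs)
    stats = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = stats[round(alpha / 2 * n_resamples)]
    hi = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Illustrative scores only; these are not results from Voicebench Ja.
    records = [
        {"subset": "elyza", "text_score": 4.0, "audio_score": 3.0},
        {"subset": "elyza", "text_score": 3.0, "audio_score": 3.0},
        {"subset": "jamc_qa", "text_score": 1.0, "audio_score": 0.0},
        {"subset": "jamc_qa", "text_score": 1.0, "audio_score": 1.0},
    ]
    print(modality_gap(records))
    print(bootstrap_gap_ci([4.0, 3.0, 5.0, 2.0], [3.0, 3.0, 4.0, 2.0]))
```

The interval resamples items as pairs rather than resampling the two modalities independently. This choice is deliberate: each audio item is synthesized from a text item, so the two scores for an item are not independent. With a subset as small as 36 items, expect the interval to be wide.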

Strengths

Derived from three established Japanese text benchmarks (Elyza-tasks-100, M-IFEval, JamC-QA).
Includes 4 distinct subsets for multi-faceted evaluation.
Uses a consistent TTS model and JVS corpus prompts for synthesis.

Limitations

Sample size for the Elyza subset is only 36 items.
Exact total row count and dataset size are not stated; the tags only place the dataset in the 1K<n<10K size category.
Potential bias from using a single, proprietary TTS model for all synthesis.

Provenance

Source: SB Intuitions, derived from Elyza-tasks-100, M-IFEval, and JamC-QA benchmarks.
Collection Method: Samples from text benchmarks were synthesized into audio using an internal TTS model with prompts from the JVS corpus.
Freshness: Last updated March 30, 2026.
Geography: Japan (Japanese language focus).

The dataset tags list the license as CC BY-SA 4.0. The full description is truncated, so confirm the license terms on the Hugging Face page.

Tags: Text, Audio, Parquet, Size categories: 1K<n<10K, Libraries: polars, dask, mlcroissant, datasets, Speech Synthesis, Modality: text, License: cc-by-sa-4.0, Benchmarking, arXiv: 1908.06248, 2603.12565, 2502.04688, Region: us, Japanese language

Japanese Speech Synthesis Benchmark for Language Model Evaluation
