Speech2Latex: 66,000 Audio Samples of Mathematical Expressions

Name: Speech2Latex: 66,000 Audio Samples of Mathematical Expressions
Creator: marsianin500
Published: 2024-11-01T15:31:43
Keywords: Mathematics, Latex, Speech To Text, Multilingual, Audio, Large Scale, Multimodal

Description

The Speech2LaTeX dataset contains 66,000 human-annotated audio samples of spoken mathematical equations and sentences in English and Russian. It is described as the first fully open-source large-scale dataset for converting spoken math to LaTeX, with material drawn from diverse scientific domains. The dataset was created by marsianin500 and last updated on November 16, 2025.

Use Cases

Training speech recognition models that transcribe spoken mathematics into LaTeX, using the annotated audio as supervision.
Developing multimodal AI systems that understand spoken scientific content.
Benchmarking models that convert spoken English and Russian math into LaTeX code.
Researching cross-lingual audio processing for technical domains, using the parallel English and Russian coverage.
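For the benchmarking use case, one common way to score a predicted LaTeX string against its reference is character error rate (CER). The sketch below is only an illustration, not a metric prescribed by the dataset: the function names are hypothetical, and the whitespace normalization is an assumption that treats runs of spaces as equivalent but still counts "x^2+1" and "x^2 + 1" as different.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalize_latex(s: str) -> str:
    """Collapse runs of whitespace to one space (an assumed, deliberately crude normalization)."""
    return " ".join(s.split())


def character_error_rate(pairs) -> float:
    """Corpus-level CER: total edit distance divided by total reference length."""
    total_dist = 0
    total_len = 0
    for ref, hyp in pairs:
        ref_n, hyp_n = normalize_latex(ref), normalize_latex(hyp)
        total_dist += edit_distance(ref_n, hyp_n)
        total_len += len(ref_n)
    return total_dist / total_len if total_len else 0.0
```

Calling character_error_rate with a list of (reference, prediction) string pairs returns a single corpus-level score. A stricter or more lenient LaTeX normalization can be swapped in depending on how much notational variation the benchmark should tolerate.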

Strengths

With 66,000 audio samples, the dataset offers a substantial corpus for training.
Human-annotated labels provide ground truth for LaTeX conversion.
Bilingual content in English and Russian supports multilingual model development.
Drawn from diverse scientific domains, offering varied mathematical contexts.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the catalog metadata, although the description cites 66,000 samples; the actual size should be confirmed after download.
Last updated 2025-11-16 21:58:46; freshness should be verified.
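Because the columns are undocumented, a quick look at a few rows is one way to infer field names and value types. The sketch below is a minimal illustration under stated assumptions: summarize_fields and peek_hub_dataset are hypothetical helper names, the repository ID must be supplied by the reader because this page does not state it, and the loader assumes the Hugging Face `datasets` package is installed and that the repository exposes a single default configuration.

```python
from collections import defaultdict
from itertools import islice


def summarize_fields(rows):
    """Map each field name to the sorted type names seen across the given rows."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


def peek_hub_dataset(repo_id, n=5):
    """Stream the first n rows of the first split of a Hugging Face dataset.

    Needs network access and the `datasets` package. repo_id is an assumption
    supplied by the caller; it is not given on the catalog page.
    """
    # Imported lazily so summarize_fields stays usable without `datasets`.
    from datasets import load_dataset

    # With streaming=True and no split argument, this returns a mapping
    # from split name to an iterable dataset, without a full download.
    splits = load_dataset(repo_id, streaming=True)
    first_split = next(iter(splits.values()))
    return list(islice(first_split, n))
```

Passing the result of peek_hub_dataset to summarize_fields gives a field-to-types mapping to start from. Decoding audio columns may require additional audio libraries, so the first call can fail until those are installed.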

Provenance

Source: huggingface
Collection Method: Human-annotated audio samples drawn from diverse scientific domains.
Freshness: Last updated 2025-11-16 21:58:46.

License is unknown; users should verify terms before use.

Quality Score

Overall: D (40)
Description: 39
Source: 41
Reputation: 48
Access: 26

Community

Downloads: 386
Likes: 6
Views: 0

Dataset Info

Author: marsianin500
Created: Nov 1, 2024
Updated: Nov 16, 2025
Last synced: May 9, 2026
