DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,018 datasets


ASR Language Model Training Corpus Version 2.0

Asr Book Lm V2.0 is a text corpus for training language models in automatic speech recognition systems. The dataset was created by author Jiejie and was last updated on March 14, 2022. Its size is categorized as 1K<n<10K, indicating it contains between 1,000 and 10,000 entries.

Text, Parquet, Size Categories: 1K<n<10K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Language Model, Speech Processing, Region: US, Automatic Speech Recognition, Text Corpus, +1 more


Noise Classification of Finistère Land Transport Infrastructure

BUREAU DE RECHERCHES GÉOLOGIQUES ET MINIÈRES provides a dataset of land transport axes classified by noise level in Finistère, France. The data applies the Prefectural Decree of Finistère Sound Classification No 2004-0101 and a Morbihan decree for Guilligomarc’h. It is intended for planning document study offices and was last updated on February 2, 2022.

Audio, Geospatial, 🇫🇷 France, Environmental Planning, Noise Classification, Transport Infrastructure, +1 more


LJSpeech: Single Speaker English Speech Dataset

13,100 short audio clips and corresponding transcriptions featuring a single speaker reading from 7 non-fiction books. The dataset totals approximately 24 hours of audio with individual clip durations ranging from 1 to 10 seconds.

Region: US, +1 more


Automatic Speech Recognition Book Corpus for Language Modeling

A text corpus for language modeling, sourced from books and curated for automatic speech recognition tasks. The dataset was created by author Jiejie and last updated in March 2022.

Text, Parquet, Library: polars, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Language Model, Region: US, Speech Recognition, Text Corpus, +1 more


LibriSpeech English Speech Corpus for ASR Testing

The LibriSpeech ASR Test dataset is drawn from LibriSpeech, a corpus of approximately 1000 hours of 16 kHz read English speech derived from LibriVox audiobooks. The corpus was prepared by Vassil Panayotov with assistance from Daniel Povey, and its data is carefully segmented and aligned.

Region: US, +1 more


Open STT: Open Speech-to-Text Dataset

20,000+ hours of Russian speech audio paired with text transcriptions across domains like YouTube, audiobooks, and radio. The collection includes over 2 million utterances categorized by source and acoustic conditions.

Russian, Speech To Text, STT, Speech Recognition, Automatic Speech Recognition, +1 more


Book Corpus for Speech Recognition Language Modeling

A text corpus for language modeling in automatic speech recognition systems, created by Jiejie and hosted on Hugging Face. The dataset was last updated in February 2022. Its size is categorized as 1K to 10K entries.

Text, Parquet, Size Categories: 1K<n<10K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Language Model, Speech Processing, Region: US, Automatic Speech Recognition, Text Corpus, +1 more


Quran Speech Recognition Dataset from Kaggle

A dataset for Quran speech recognition, sourced from Kaggle and created by author Nuwaisir. It contains text-modality data; the specific row and column counts are unknown.

CSV, Size Categories: 10K<n<100K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: US, +1 more


Preprocessed Reddit Text for Automatic Speech Recognition

Preprocessed text data sourced from Reddit, intended for training or evaluating Automatic Speech Recognition (ASR) systems. The dataset was created by DDSC and last updated on the Hugging Face platform in February 2022. Its size is indicated as between 1 million and 10 million entries.

Text, Parquet, Library: polars, Size Categories: 1M<n<10M, Preprocessed Text, Modality: text, Social Media Text, Library: mlcroissant, Library: datasets, Library: pandas, Region: US, Speech Recognition, Automatic Speech Recognition, +1 more


Financial Statements of Kharkiv Municipal Cultural Institution

Financial statements for the municipal cultural institution Kharkiv Specialized Music and Theatre Library named after K.S. Stanislavsky. The dataset was published on the Kharkiv Open Data Portal and automatically placed on the Unified State Open Data Portal of Ukraine. It was last updated on 2021-12-24.

Tabular, Audio, Ukraine, Cultural Institutions, Financial Statements, Finance, Open Data, Municipal Data, +1 more


Maltese Speech Corpus from Common Voice

CommonVoice Mt 8 Processed is a Maltese language audio dataset derived from Mozilla's Common Voice project. The dataset was processed and uploaded by RuudVelo in February 2022. It contains audio recordings paired with corresponding transcriptions for speech technology development.

Text, Audio, Multilingual, Machine Translation, Audio Dataset, Audio Text, Region: US, Multilingual Audio, Speech Recognition, +1 more


Dutch Speech Audio with Transcripts from Common Voice

Common Voice NL 8 Processed is a Dutch-language subset of Mozilla's crowdsourced speech corpus. The dataset, a processed copy of the eighth version of the Dutch Common Voice data, was uploaded to Hugging Face by user RuudVelo in February 2022. It contains audio clips paired with text transcriptions for speech technology development.

Text, Audio, Crowdsourced Data, Crowdsourced Audio, Dutch Language, Region: US, Audio Processing, Speech Recognition, +1 more


Vietnamese Text-to-Speech Processed Audio Dataset

A dataset for Vietnamese text-to-speech synthesis, processed and uploaded to HuggingFace by user geninhu in January 2022. It contains processed audio and corresponding text data, as indicated by platform tags. The specific size and number of samples are not detailed in the available metadata.

Text, Audio, Parquet, Size Categories: 10K<n<100K, Text To Speech, Library: polars, Speech Synthesis, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: US, Vietnamese Language, Processed Audio, +1 more


Question-Answer Pairs for Verbal Predicate-Argument Structure

A dataset of question-answer pairs designed to model verbal predicate-argument structure. The train split originates from the QASRL Bank (QASRL-v2/LS), constructed via crowdsourcing, while the dev and test splits are from QASRL-GS (Gold Standard).

Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: US, +1 more


Utility Consumption Data for Dnipro Children's Music School No. 10

Data from the city of Dnipro, Ukraine, on the consumption of communal resources by a municipal cultural institution, Dnipro Children's Music School No. 10. The dataset likely contains utility usage metrics, such as water or electricity consumption, for the school. It was published on the States site of Ukraine and last updated on December 3, 2021.

Tabular, Audio, Dnipro, Public Utilities, Ukraine, Cultural Institutions, Resource Consumption, +1 more


Gujarati OpenSLR: Gujarati Speech Recognition Corpus

Gujarati speech recordings and transcriptions categorized for Automatic Speech Recognition (ASR). This dataset provides audio-text pairs sourced from the OpenSLR repository to facilitate public access to Gujarati language resources.

Region: US, +1 more


Asr Files: Automatic Speech Recognition Files

A collection of audio files for automatic speech recognition (ASR). It is categorized as containing under 1,000 samples and is associated with the US region. The dataset was last updated in January 2022.

AUDIOFOLDER, Modality: audio, Size Categories: n<1K, Library: mlcroissant, Library: datasets, Region: US, +1 more


Sanskrit Speech Recognition Corpus

84 hours of Sanskrit audio data for training automatic speech recognition models, uploaded by user 'addy88' to Hugging Face in December 2021. The dataset is categorized as containing 10K to 100K samples and includes text transcriptions.

Text, Audio, Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Region: US, Low Resource Language, Sanskrit Speech, Automatic Speech Recognition, +1 more


SMAPVEX19-22: Soil Moisture and Temperature Measurements in Massachusetts

Ground-based soil moisture, soil temperature, and air temperature measurements from twenty-five temporary stations in the Petersham, Massachusetts area. The stations were installed across an area of approximately 23 km by 36 km in May 2019 and operated through 2022. The dataset is produced by NSIDC_CPRD and was last updated in October 2021.

Tabular, Time Series, Air Temperature, Remote Sensing Validation, Soil Temperature, Soil Moisture, Environmental Sensing, +1 more


Sanskrit Speech Recognition Evaluation Dataset

An evaluation dataset for Automatic Speech Recognition (ASR) systems in the Sanskrit language. The dataset was created by user 'addy88' and published on the Hugging Face platform in December 2021. Its specific size and structure are not detailed in the provided metadata.

Tabular, Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, Language Processing, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Region: US, Speech Recognition, Sanskrit, Audio Evaluation, +1 more
