DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,009 datasets

Classical Music Compositions In Midi Format

MIDI files represent classical compositions from renowned artists like Bach, Beethoven, Chopin, and Mozart. The collection is organized into directories by composer. It was created by user 'drengskapur' and last updated in July 2024.

Tags: Audio, CSV, Size: 1K<n<10K, Library: polars, Language: en, Classical Music, Classical, Modality: text, Library: mlcroissant, Music Analysis, Music Generation, Library: datasets, MIDI, Composer Analysis, Library: pandas, Region: us, Composers, License: mit, +1

Khm Asr Data Test: Khmer Automatic Speech Recognition Data

'Khm Asr Data Test' was published on Hugging Face by 'rinabuoy' on August 16, 2024. The title suggests it contains audio data for testing Khmer automatic speech recognition (ASR) systems. The provided metadata does not detail the dataset's content, size, or structure.

Tags: Audio, Khmer Language, Audio Processing, Speech Recognition, +1

viVoice: Vietnamese Multi-Speaker Speech Synthesis with 100K+ Records

viVoice provides between 100,000 and 1,000,000 Vietnamese audio-text pairs for multi-speaker speech synthesis, released by capleaf in 2024. The dataset is specifically formatted for text-to-speech tasks and is distributed via Parquet files.

Tags: Parquet, Task: text-to-speech, Library: polars, Library: dask, Modality: audio, License: cc-by-nc-sa-4.0, Modality: text, Size: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us, Language: vi, +1

VoxLingua107: 107-Language Speech Dataset for Language ID

VoxLingua107 is a speech dataset for training spoken language identification models. It contains 6628 hours of short speech segments sourced from YouTube videos, covering 107 languages. The dataset was created by SEACrowd and was last updated in June 2024.

Tags: Audio, Multilingual, YouTube Sourced, Language Detection, Audio Classification, Speech Identification, Multilingual Audio, +1

MusicScore: A Large-Scale Dataset of Music Score Images with Metadata

MusicScore is a large-scale dataset of music score images paired with textual metadata. It was collected and processed from the International Music Score Library Project (IMSLP) by authors Yuheng Lin, Zheqi Dai, and Qiuqiang Kong. The dataset was last updated on June 20, 2024.

Tags: Image, Text, Audio, Image Text Pairs, Music Scores, Music Generation, IMSLP, Computer Vision, Large Scale, +1

Unlabeled English Audiobook Speech for ASR Benchmarking

Libri-light is a dataset of 60,000 hours of unlabeled English speech audio from audiobooks. It serves as a benchmark for training automatic speech recognition systems with limited or no supervision.

Tags: Region: us, +1

ChartQA: Chart Images with Question-Answering Data

ChartQA is a multimodal dataset hosted by ahmed-masry on Hugging Face, last updated on June 22, 2024. It likely contains chart images paired with textual questions and answers for visual question answering tasks. The dataset requires manual download of a zip file and cannot be loaded directly via the standard datasets library function.

Tags: Multimodal, Chart Analysis, Visual Question Answering, +1

YODAS: 369,510 Hours of YouTube Speech and Captions

YODAS contains 369,510 hours of speech audio and text captions sourced from YouTube, released by the espnet team in 2024. The dataset pairs audio utterances with either user-uploaded (manual) or system-generated (automatic) captions.

Tags: arXiv: 2406.00899, Region: us, +1

KZSMO Musical School No3 Agreements from 2019 to Present

This dataset covers agreements concluded by KZSMO "Musical School No3" in the KMR from 2019 to the present. It is sourced from the States site of Ukraine and was last updated on June 12, 2024. The specific contents and scale of the agreements are not detailed.

Tags: Tabular, CSV, Government Agreements, Ukraine, Education, Public Contracts, +1

Additional Agreements to Music School Contracts in Ukraine, 2019-Present

This dataset covers additional agreements to contracts concluded from 2019 to the present for KZSMO 'Musical School No3' KCC. The data originates from the States site of Ukraine and was last updated on June 12, 2024. The metadata does not give the number of contracts, the number of rows, or the file size.

Tags: Tabular, CSV, Ukraine, Music Education, Public Contracts, +1

Olivenza Folk Music Collection: Love Songs and Agricultural Work Songs

Most songs collected are love songs, touching on themes of nostalgia and saudade as well as lively dances. The collection process involved interviewing people and learning about their lives through songs linked to agricultural work and annual cycles. The dataset was coordinated by Álvarez Pérez, Xosé Afonso and last updated in May 2024.

Tags: Text, Audio, Folklore, Oral History, Ethnomusicology, Cultural Heritage, +1

LibriSpeech Audio Samples for Speech Recognition Testing

LibriSpeech ASR Dummy is a small-scale dataset from Hugging Face's internal testing, containing audio-text pairs for English speech recognition. It was created by hf-internal-testing and last updated in June 2024. The dataset is tagged with the size category 'n<1K', indicating it contains fewer than 1,000 samples.

Tags: Text, Audio, Parquet, Machine Learning, Library: polars, Modality: audio, Size: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, English Language, Region: us, Audio Processing, Speech Recognition, +1

Jesús López Oral History: Biography and Language Use in San Martín de Trevellu

Álvarez Pérez, Xosé Afonso coordinated this oral history dataset from the e-cienciaDatos Harvested Dataverse, last updated on 2024-05-05. It contains a biographical narrative from an informant, Jesús López, detailing his life, education, and language use in the San Martín de Trevellu/Trevejo area. The description suggests the data covers topics such as schooling, bilingualism between Spanish and the local 'lagarteiro' language, and cultural practices like music and festivals.

Tags: Text, Audio, Oral History, Spanish Portuguese Border, Linguistics, Ethnography, Biography, +1

Olivenza Folk Music Recordings of José Tomás Sousa

José Tomás Sousa (Olivenza). Folklore musical de Olivenza (I) is a collection of folk music recordings from Olivenza, Spain. The dataset, coordinated by Álvarez Pérez, Xosé Afonso, was last updated on May 5, 2024. It focuses on the 'saias' genre and includes other types like occasional songs, gaios, vira, fados, and corridinhos.

Tags: Audio, Musical Genres, Folk Music, Cultural Heritage, Olivenza Spain, +1

Arabic Speech Corpus: South Levantine Dialect Recordings

Nawar Halabi at the University of Southampton developed this speech corpus as part of PhD work. Recordings were made in a professional studio in the south Levantine Arabic dialect with a Damascene accent. Synthesized speech output from this corpus has reportedly produced a high-quality, natural voice.

Tags: Audio, Audio Dataset, Arabic Speech, Speech Corpus, Levantine Arabic, Natural Language Processing, +1

Hindi Text-to-Speech Audio Samples

A collection of Hindi speech audio files for text-to-speech synthesis, created by the user skywalker290 and hosted on Hugging Face. The dataset was last updated in June 2024 and is categorized as containing between 10,000 and 100,000 samples based on platform tags.

Tags: Text, Audio, Parquet, Size: 10K<n<100K, Text To Speech, Library: polars, Hindi Language, Audio Dataset, Library: dask, Modality: audio, Speech Synthesis, Modality: text, Library: mlcroissant, Hindi Speech, Library: datasets, Region: us, Audio Generation, +1

OpenAI Voices: TTS Audio Samples from Sky and Juniper Models

A collection of text-to-speech audio samples collected from the OpenAI API and app. The dataset includes samples from the Sky and Juniper voices, stored as clean lossless audio files. It was uploaded by leafspark and last updated on May 22, 2024.

Tags: Audio, AUDIOFOLDER, Text To Speech, Task: text-to-speech, Language: en, OpenAI API, Audio Samples, Size: n<1K, Library: mlcroissant, Library: datasets, Region: us, Voice Synthesis, License: apache-2.0, +1

Audioloader

Two categories of audio data, speech and music, are provided in a format compatible with the PyTorch framework. This dataset serves as a specialized loader for acoustic analysis and machine learning tasks.

Tags: Audio, PyTorch, +1

Xhosa Speech Corpus from the NCHLT Project

NCHLT Speech Corpus Xhosa contains audio recordings of the Xhosa language, a major South African language. The dataset was created by Beijuka and uploaded to Hugging Face in June 2024. It is part of the National Centre for Human Language Technology (NCHLT) initiative.

Tags: Audio, Parquet, Size: 10K<n<100K, Library: polars, Library: dask, Modality: audio, Modality: text, African Language, Library: mlcroissant, Library: datasets, Xhosa Speech, Region: us, Natural Language Processing, Audio Corpus, +1

Gloria: Folk Music and Dance Traditions in Olivenza Villages

Descriptive text data on folk music and dance traditions from the Olivenza region, likely documenting cultural practices. The dataset was coordinated by Álvarez Pérez, Xosé Afonso and harvested into the e-cienciaDatos Dataverse platform. It was last updated on May 5, 2024.

Tags: Text, Oral Tradition, Iberian Peninsula, Ethnomusicology, Cultural Heritage, Folk Dance, +1
