Speech & Audio

Tabular, Audio, Dnipro, Public Utilities, Ukraine, Cultural Institutions, Resource Consumption

Utility Consumption Data for Dnipro Children's Music School No. 10

Data on the consumption of communal resources by Dnipro Children's Music School No. 10, a municipal cultural institution of the city of Dnipro, Ukraine. The dataset likely contains utility usage metrics, such as water or electricity consumption, for the school. It was published on the States site of Ukraine and last updated on December 3, 2021.

Gujarati OpenSLR: Gujarati Speech Recognition Corpus

Gujarati speech recordings and transcriptions categorized for Automatic Speech Recognition (ASR). This dataset provides audio-text pairs sourced from the OpenSLR repository to facilitate public access to Gujarati language resources.

AUDIOFOLDER, Modality: audio, Size Categories: n<1K, Library: mlcroissant, Library: datasets, Region: us

Asr Files: Automatic Speech Recognition Files

Audio files for automatic speech recognition (ASR). The dataset is categorized as containing under 1,000 samples and is associated with the US region. It was last updated in January 2022.

Text, Audio, Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Region: us, Low Resource Language, Sanskrit Speech, Automatic Speech Recognition

Sanskrit Speech Recognition Corpus

84 hours of Sanskrit audio data for training automatic speech recognition models, uploaded by user 'addy88' to Hugging Face in December 2021. The dataset is categorized as containing 10K to 100K samples and includes text transcriptions.

Tabular, Time Series, Air Temperature, Remote Sensing Validation, Soil Temperature, Soil Moisture, Environmental Sensing

SMAPVEX19-22: Soil Moisture and Temperature Measurements in Massachusetts

Ground-based soil moisture, soil temperature, and air temperature measurements from twenty-five temporary stations in the Petersham, Massachusetts area. The stations were installed across an area of approximately 23 km by 36 km in May 2019 and operated through 2022. The dataset is produced by NSIDC_CPRD and was last updated in October 2021.

Tabular, Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, Language Processing, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Region: us, Speech Recognition, Sanskrit, Audio Evaluation

Sanskrit Speech Recognition Evaluation Dataset

An evaluation dataset for Automatic Speech Recognition (ASR) systems in the Sanskrit language. The dataset was created by user 'addy88' and published on the Hugging Face platform in December 2021. Its specific size and structure are not detailed in the provided metadata.

Sharif Emotional Speech Dataset (ShEMO)

3,000 semi-natural Persian speech utterances totaling 3 hours and 25 minutes of audio extracted from online radio plays. The collection features 87 native speakers expressing five primary emotional states including anger, fear, happiness, and sadness.

arXiv: 1906.01155, Region: us

JSON, Size Categories: 10K<n<100K, Library: dask, Modality: text, Library: mlcroissant, Library: datasets, Region: us

NST Da 16kHz: Danish Speech Dataset (16kHz)

A Danish speech dataset from Sprakbanken featuring audio recordings sampled at 16 kHz. The collection provides acoustic data specifically for the Danish language to support speech recognition and linguistic research.

Text, Audio, Multimodal, JSON, Library: polars, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Audio Text, Region: us, Book Audio, Speech Recognition, Spoken Language

Book Audio Snippets for Speech Recognition

Audio snippets paired with text transcriptions, sourced from book audio recordings. The dataset was created by JesseParvess and uploaded to Hugging Face in December 2021. Platform tags indicate it contains text and audio modalities for speech recognition tasks.

LibriSpeech English Speech Corpus with 1000 Hours of Audio

The LibriSpeech corpus contains approximately 1000 hours of read English speech, sampled at 16 kHz. It was prepared by Vassil Panayotov with assistance from Daniel Povey, derived from audiobooks in the LibriVox project.

Tabular, Audio, Cultural Organization, Financial Statements, Finance, Public Finance, Municipal Institutions

Financial Statements of a Ukrainian Municipal Children's Music School

The dataset contains the financial statements of the City Municipal Institution of Culture 'Dnipro Children's Music School No. 10'. It was published on the States site of Ukraine and last updated on October 20, 2021. The data is provided in an Excel (.XLSX) file format.

Text, Semantic Role Labeling, Question Answering, Region: us, Natural Language Processing

Multi-News Dataset for Semantic Role Labeling via QA

A dataset for Semantic Role Labeling (SRL) constructed from the Multi-News summarization corpus. It was created by the author 'rubenwol' and uploaded to the Hugging Face platform in November 2021. The dataset applies the Question Answer driven Semantic Role Labeling (QA-SRL) framework to news articles.

Audio, Geospatial, 🇫🇷 France, Transport Infrastructure, Sound Classification, Railway Noise

Railway Sound Classification in Hérault, France, with Noise Categories

A 2021 update of a geospatial dataset mapping the sound classification of railway and tramway infrastructure in the Hérault department of France. The classification, established by prefectural decrees in 2014 and 2007, categorizes land transport infrastructure into five noise levels and defines affected areas on either side of the tracks. The data is provided by the Bureau de Recherches Géologiques et Minières (BRGM) as a Web Map Service (WMS).

Audio, Geospatial, Road Noise, Environmental Noise, Urban Planning, Transport Infrastructure

Road Noise Affected Areas in Maine-et-Loire, France

A French departmental map service identifies land sectors impacted by noise from major transport infrastructure, as mandated by national law. The dataset is based on a prefectural classification of roads with over 5,000 vehicles per day, intercity rail lines with over 50 trains daily, and public transport lines with over 100 buses. It was last updated by the Bureau de Recherches Géologiques et Minières on September 3, 2021.

Audio, Geospatial, 🇫🇷 France, Transportation Noise, Road Infrastructure, Environmental Regulation

Sound Classification of Road Infrastructures in Maine-et-Loire, France

The Bureau de Recherches Géologiques et Minières (BRGM) provides a dataset mapping the sound classification of land transport infrastructure in the Maine-et-Loire department, France. The classification is mandated by French law (Law No. 92-1444 and the Environmental Code) and identifies sectors affected by noise based on traffic characteristics. The dataset was last updated on 2021-09-03.

Tabular, Audio, Cultural Institution, Municipal Budget, Finance, Public Finance, Utility Consumption

Financial Report for a Ukrainian Children's Music School, January-August 2021

A financial report details the consumption of communal resources for the City Municipal Institution of Culture 'Dnipro Children's Music School No. 14'. The data covers the period from January to August 2021 and was published on the States site of Ukraine. The dataset was last updated on September 9, 2021.

Multi-Speaker Sinhala Audio with Manual Quality Checks

Multi-speaker, high-quality transcribed audio data for the Sinhala language, consisting of wave files and a TSV file. The data was manually quality checked. It was collected by Google in Sri Lanka and contributed by the Path to Nirvana organization.

Audio, Machine Learning, Audio Dataset, Region: us, English Speech, Audio Processing, Speech Recognition

Librispeech English Speech Audio Samples

Librispeech Local Dummy is an audio dataset for English speech recognition, hosted on Hugging Face by patrickvonplaten. The dataset was last updated on September 28, 2021. Specific details on size, row count, and recording methodology are not provided in the available metadata.

LibriSpeech English Audio Corpus of 1000 Hours

The LibriSpeech corpus contains approximately 1000 hours of read English speech audio, sampled at 16 kHz. It was prepared by Vassil Panayotov with assistance from Daniel Povey, derived from audiobooks in the LibriVox project.

Tabular, Audio, Utility Costs, Ukraine, Cultural Institution, Municipal Expenditure, Finance, Public Finance

Dnipro Children's Music School No. 14 Utility Expenditures, January-June 2021

Financial report of the City Municipal Institution of Culture 'Dnipro Children's Music School No. 14' on communal expenditures for the first half of 2021. The dataset was published on the States site of Ukraine open data platform on 2021-07-04. It likely contains detailed records of utility payments for a municipal cultural institution.