DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

NLP & Text Datasets | DataSalon

NLP & Text

Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora

44,786 datasets

NLP & Text

Mannheimia Haemolytica Phenotypic Data on Carbon Utilization and Coculture

Raw data from 2026 experiments investigating phenotypic differences between two genotypes of the ruminant respiratory pathogen Mannheimia haemolytica. The dataset includes optical density measurements for carbon source utilization, iron restriction response, and coculture interactions for 20 bacterial isolates. It was authored by Janet Hill and last updated on April 30, 2026.

Tabular, Excel, Bacterial Growth, Phenotypic Analysis, Coculture, Microbiology, Ruminant Pathogen, +1

0 views

NLP & Text

MURAD: Multi-domain Unified Reverse Arabic Dictionary with 95,000 Word-Definition Pairs

MURAD is an open Arabic lexical dataset containing 95,000 word-definition pairs. It was created by riotu-lab and is designed to support research in computational linguistics and Arabic natural language processing. The dataset spans multiple scientific, religious, and linguistic domains.

Text, Lexicography, Computational Linguistics, Arabic Lexicon, Natural Language Processing, +1

0 views

NLP & Text

Subsidized Health Regime Population in Cali's Central Health Network, by Age and Gender

A dataset characterizing users of the subsidized health regime assigned to the Central Health Network in Cali, Colombia. The data is disaggregated by life cycle stage and gender, with columns indicating healthcare provider, user sex, age group, and municipality. The dataset was last updated on 2026-05-18 and is hosted on the Colombian open data portal.

Tabular, CSV, XML, JSON, Cali Colombia, Healthcare, Population Demographics, Subsidized Health Regime, +1

0 views

NLP & Text

Ontario Vital Statistics Annual Report

Annual operational statistics compiled from registrations of births, marriages, deaths, still-births, adoptions, and name changes in Ontario. The Government of Ontario's Office of the Registrar General publishes these reports to provide data for research and public policy-making. Each report covers events from a single calendar year.

Tabular, Ontario, Public Policy, Vital Statistics, Demographics, +1

0 views

NLP & Text

Alberta Economic Multipliers by Industry and Commodity for 2022

Alberta Economic Multipliers By Industry and Commodity contains economic multipliers used to assess the impacts of changes in final demand or industry output. The Government of Alberta produced the data, which models 220 industries and 273 commodities based on the 2022 NAICS and NAPCS classifications. The dataset was last updated in April 2026.

Tabular, Input Output Analysis, Economic Multipliers, Industry Classification, Finance, Alberta Canada, +1

0 views

NLP & Text

SNDRSNIML2CCPCCR: Suomi NPP Cloud-Cleared Radiances for Atmospheric Profiling

Data from the Suomi NPP satellite's CrIS/ATMS instruments, processed with the CLIMCAPS algorithm to produce cloud-cleared radiances. The dataset provides infrared and microwave spectral data from 2211 CrIS channels and 22 ATMS channels, organized into 240 six-minute granules per day. It is used to infer atmospheric state variables, with a latency of 3 to 7 weeks due to its reliance on MERRA-2 reanalysis for initial conditions.

Time Series, Geospatial, Infrared Microwave, Earth Science Microwave Spectral Engineering Micro, Cloud Cleared Radiances, Earth Science Microwave Spectral Engineering Brigh, Satellite Sounding, Suomi Npp, Atmospheric State, Earth Science Infrared Wavelengths Spectral Engine, +1

0 views

NLP & Text

Molecular Docking Results of FadA with CDH11 Receptor

Molecular docking results from a study investigating the interaction between Fusobacterium nucleatum adhesin FadA and host receptor cadherin-11 (CDH11). The dataset likely contains computational binding scores or structural parameters. It was authored by Kun Liu and last updated on April 20, 2026.

Tabular, Excel, Pulmonary Disease, Molecular Docking, Healthcare, Fada Cdh11, Biochemistry, Protein Interaction, +1

0 views

NLP & Text

SPURS-2: CTD Salinity and Temperature Profiles from R/V Revelle Cruises

SPURS-2 collected 64 CTD casts from the R/V Revelle in the eastern tropical Pacific during 2016 and 2017 to study a rainfall-dominated region with high salinity variability. This NASA-funded project combines these in-situ vertical profiles with satellite data from Aquarius, SMAP, and SMOS to characterize near-surface salinity dynamics. The data provide continuous conductivity, temperature, and depth measurements calibrated with IAPSO standard seawater.

Time Series, Geospatial, Oceanography, Earth Science Ocean Temperature Oceans Temperature, Spurs 2, Tropical Pacific, Salinity Profiles, Ctd Data, Earth Science Salinity Density Oceans Conductivity, Earth Science Salinity Density Oceans Salinity, +1

0 views

NLP & Text

SPURS-2: Underway CTD Profiles from Eastern Tropical Pacific Cruises

SPURS-2 uCTD data provide vertical profiles of salinity and temperature from two research vessel cruises in the eastern tropical Pacific Ocean. The dataset contains 763 total casts from the R/V Revelle in August 2016 and October 2017, with observations binned in 6- or 8-meter depth intervals down to 500 meters. It supports the study of near-surface salinity dynamics in a rainfall-dominated region influenced by the North Equatorial Current.

Tabular, Time Series, Oceanography, Salinity, Earth Science Ocean Temperature Oceans Temperature, Pacific Ocean, Temperature Profiles, Earth Science Salinity Density Oceans Conductivity, Earth Science Salinity Density Oceans Salinity, Field Campaign, +1

0 views

NLP & Text

MIDDEN: Dutch Manufacturing Industry Energy and Decarbonisation Data

The MIDDEN database from the PBL and TNO project contains aggregated information on the current energy and material consumption of the manufacturing industry in the Netherlands, along with possibilities for decarbonising its processes. It is structured into four sections: General Plant Data (GPD), Plant Configuration Data (PCD), Technology Characteristics (TC), and Commodity Data (CD). The dataset is published by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties under a CC-BY-4.0 license.

Tabular, Energy Consumption, Industrial Processes, Netherlands, Decarbonisation, Manufacturing Industry, +1

0 views

NLP & Text

Google Trends Network of Fluoride-Related Searches from 2004 to 2025

Google Trends data from 2004 to 2025 maps connections between fluoride-related search topics. Olívia Jorge constructed this network using related queries weighted by Relative Search Volume, analyzing thematic structures with Gephi. The 419.1 KB XLSX file contains the repeated topics used to build the network, published on figshare in April 2026.

Tabular, Excel, Search Behavior, Misinformation, Fluoride, Network Analysis, Google Trends, +1

0 views

NLP & Text

SPURS-1 Seasoar: High-Resolution Ocean Salinity and Temperature Profiles

SPURS-1 deployed a Seasoar towed vehicle to collect 1144 vertical casts of temperature, conductivity, salinity, and pressure in the subtropical North Atlantic. The dataset provides a 1-meter gridded, highly processed view of ocean structure from a 900 x 800-mile study area centered at 25N, 38W. This in-situ data, collected during a 2013 spring cruise, complements satellite salinity measurements from Aquarius/SAC-D and SMOS.

Tabular, Time Series, Oceanography, Salinity, Earth Science Ocean Temperature Oceans Temperature, Sea Temperature, Spurs, Earth Science Salinity Density Oceans Conductivity, Conductivity, Earth Science Salinity Density Oceans Salinity, +1

0 views

NLP & Text

Municipal Welfare and Accommodation Policy Memorandum for Groningen

Groningen municipality's final policy memorandum on welfare and accommodation, building on a concept note adopted in October 2005. The document outlines a revised accommodation policy based on welfare objectives and incorporates feedback from a public consultation process. It is published by the Dutch Ministry of the Interior and Kingdom Relations under a CC-BY-4.0 license.

Text, Public Policy, Housing Accommodation, Computer Vision, Consultation Process, Municipal Welfare, +1

0 views

NLP & Text

BORG Reports: Public Space Quality Surveys for Groningen Municipality

Groningen municipality has managed public space using the BORG method since 2001. The data includes annual citizen inspections until 2017 and biannual digital surveys from a population panel from 2018 onward, with the latest survey from 2023. The dataset is published by the Dutch Ministry of the Interior and Kingdom Relations under a CC-BY-4.0 license.

Text, Urban Maintenance, Municipal Quality, Citizen Surveys, Public Space Management, +1

0 views

NLP & Text

Urban Design Vision for Ribbon Villages in Groningen, Netherlands

Four ribbon villages on the east side of Groningen, namely Noorderhoogebrug, Ruischerbrug, Middelbert, and Engelbert, are the subject of this spatial analysis. The dataset contains a PDF document with an urban design concept vision, including maps and explanatory texts, prepared by the Dutch Ministry of the Interior and Kingdom Relations. It serves as the basis for a new zoning plan to replace an outdated one.

Text, Geospatial, Zoning, Computer Vision, Groningen Netherlands, Urban Planning, Geospatial Analysis, +1

0 views

NLP & Text

Qualitative Themes on Caregiver Experiences for Children with Cerebral Palsy in Ethiopia

Ethiopian family caregivers of children with Cerebral Palsy were interviewed to explore their experiences and support needs. The dataset contains qualitative themes and sub-themes derived from 13 in-depth interviews, analyzed using reflexive thematic analysis. Author Melkitu Melak published the data on figshare in April 2026 under a CC-BY-4.0 license.

Tabular, Audio, Excel, Healthcare Support, Finance, Cerebral Palsy, Ethiopia, Caregiver Experience, Qualitative Research, +1

0 views

NLP & Text

Ethiopian Caregiver Experiences for Children with Cerebral Palsy

Thirteen interviews with family caregivers exploring caregiving experiences and support needs for children with Cerebral Palsy in Ethiopia. The data were collected via face-to-face, semi-structured interviews in Amharic, transcribed verbatim, and analyzed using reflexive thematic analysis in NVivo version 14. The dataset was authored by Melkitu Melak and last updated on 2026-04-13.

Tabular, Audio, Excel, Healthcare Research, Qualitative Study, Finance, Cerebral Palsy, Ethiopia, Caregiver Experience, +1

0 views

NLP & Text

IMP 8 UMD EECA: 10.92-Minute Ion and Electron Flux Rates

IMP 8 satellite data from the University of Maryland's Electrostatic Energy-Charge Analyzer (EECA) instrument provides count rates and pulse height data. The dataset enables computation of 10.92-minute resolution fluxes for singly and doubly ionized ions, ions with higher charge states, and 600-860 keV electrons. It was created at the National Space Science Data Center (NSSDC) from summary tapes provided by the University of Maryland.

Time Series, Space Physics, Energetic Particles, Electron Flux, Satellite Data, Ion Flux, Energeticparticles, +1

0 views

NLP & Text

NLI Soft Labels Final: Soft Labels for Natural Language Inference Distillation

Soft labels generated by the cross-encoder/nli-deberta-v3-small model on the combined SNLI and MultiNLI datasets. The dataset is intended for knowledge distillation into smaller, more efficient Natural Language Inference models. Each JSONL record contains a premise, hypothesis, hard label, and a probability distribution for entailment, neutral, and contradiction.

Text, Natural Language Inference, Nlp Training, Soft Labels, Knowledge Distillation, Synthetic, +1

0 views
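
To illustrate how a record from the NLI Soft Labels dataset might be used for distillation, here is a minimal Python sketch. The listing only says each JSONL record holds a premise, hypothesis, hard label, and a probability distribution over entailment, neutral, and contradiction; the field names used below (`premise`, `hypothesis`, `label`, `probs`) and the sample record are assumptions made for illustration, not the dataset's documented schema. The loss shown is a plain KL divergence between the teacher's soft label and a student's softmax output, one common choice for distillation.

```python
import json
import math

# Class order is an assumption; the listing names the three classes only.
LABELS = ("entailment", "neutral", "contradiction")


def parse_record(line):
    """Parse one JSONL line. All field names here are hypothetical."""
    rec = json.loads(line)
    soft = [float(rec["probs"][name]) for name in LABELS]
    return rec["premise"], rec["hypothesis"], rec["label"], soft


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits):
    """KL(teacher || student) for a single example."""
    student = softmax(student_logits)
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student)
        if t > 0  # terms with zero teacher mass contribute nothing
    )


# Made-up sample record, for illustration only.
sample = json.dumps({
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": "entailment",
    "probs": {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02},
})

premise, hypothesis, hard_label, soft_label = parse_record(sample)
# A student that outputs uniform logits is penalized relative to the teacher.
loss = distillation_loss(soft_label, [0.0, 0.0, 0.0])
print(hard_label, round(loss, 4))
```

In practice the student logits would come from the smaller model being trained, and this term is often mixed with an ordinary cross-entropy loss on the hard label.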

NLP & Text

Urban Development Plan 34 Vorderste Hohe: Land-Use Law for Ludwigsfelde, Germany

A binding land-use plan for the Vorderste Hohe residential development on Berliner Weg in the Siethen district of Ludwigsfelde, Germany. The plan transposes the municipal land-use concept into directly applicable law, specifying permitted and inadmissible land uses on the affected base areas. The dataset is provided by the Bundesamt für Kartographie und Geodäsie via the eu_open_data platform.

Geospatial, 🇩🇪 Germany, Municipal Law, Land Use, Urban Planning, +1

0 views
