DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

NLP & Text

Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora

49,416 datasets

Psychiatric Morbidity Among Women Prisoners in England and Wales

A survey of psychiatric morbidity among prisoners aged 16-64 in England and Wales presents information on the mental health of women prisoners. The data is produced by the Office for National Statistics as Official Statistics. The dataset was last updated on 2026-07-08.

Tabular, Mental Health, UK Statistics, Healthcare, Psychiatric Morbidity, Prisoners, +1

Mortality Statistics: Deaths in England and Wales by Age, Sex, and Marital Status

Discontinued series DH1 contains key statistics of deaths and death rates for England and Wales. The dataset, produced by the Office for National Statistics, includes breakdowns by age, sex, marital status, place of death, birthplace, and coroner involvement for a specific year.

Tabular, England Wales, Mortality Statistics, Demographics, Public Health, +1

England Dwelling Prices and Transactions, 2001-2008

266,871 property ownership transactions in England recorded by the Land Registry between 2001 and 2008. The dataset provides statistics on prices paid and is published by the Office for National Statistics. Data is available at multiple geographic levels, from national down to Middle Layer Super Output Areas.

Tabular, Time Series, Geospatial, 🇬🇧 United Kingdom, Administrative Data, Property Prices, Housing Transactions, Real Estate, +1

Psychiatric Morbidity Among Prisoners in the UK, 1997

UK data from 1997 provides information about the prevalence of psychiatric problems, including substance dependence, among male and female, remand and sentenced prisoners. The dataset is produced by the Office for National Statistics as Official Statistics not designated as National Statistics. It was last updated on the platform in July 2026.

Tabular, UK Statistics, Substance Dependence, Psychiatric Morbidity, Prisoners, +1

Substance Misuse Among Prisoners in England and Wales, ONS Survey

England and Wales survey data on substance abuse from the Office for National Statistics' study of psychiatric morbidity among prisoners. The report presents results from secondary analysis of this survey data. The dataset is designated as National Statistics and was last updated in July 2026.

Tabular, UK Survey, Psychiatric Morbidity, Substance Misuse, Prisoners, Public Health, +1

PA1007: Mineral Extraction in Great Britain, Excluding Deep Mined Coal

Information covering all mines and quarries, except deep mined coal, for mineral extraction in Great Britain. The dataset is produced by the Office for National Statistics and designated as National Statistics. It was last updated on 2026-07-08.

Tabular, Quarrying, Mineral Extraction, Great Britain, National Statistics, Mining, +1

Psychiatric Morbidity Among Young Offenders in England and Wales, 1997

England and Wales survey data presents information on the mental health of young offenders. The dataset is based on a 1997 survey of psychiatric morbidity among prisoners aged 16-64. It was produced by the Office for National Statistics and is designated as Official Statistics.

Tabular, Mental Health, UK Survey, Young Offenders, Healthcare, Psychiatric Morbidity, Prisoners, +1

Ownership of R&D Assets from UK Gross Expenditure Surveys

Survey data from the UK's Office for National Statistics tests the assumption that funders of research and development own the resulting assets. The data originates from a pioneering question added to Gross Expenditure on Research and Development surveys. The supporting material was last updated on 2026-07-08.

Tabular, UK Data, Research Development, Economic Statistics, Asset Ownership, +1

SAGA: Synthetic Earnings Data for 50,000 Individuals with Demographic Effects

A synthetic dataset of 50,000 individuals generated from a parametric AR(1) model with education- and sex-specific fixed effects and an empirically calibrated Swedish age-earnings profile. It was created by Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov of Stockholm University for the SAGA manuscript submitted to IEEE TPAMI (2026). The deposit includes synthetic train/cal/test splits and moment validation statistics.

Tabular, Time Series, Probabilistic Forecasting, Earnings Forecasting, Demographic Modeling, Synthetic Data, Synthetic, +1

Hawaiian Islands Reef Rugosity Maps from Airborne Imaging Spectroscopy

January 2019 and January 2020 airborne imaging spectroscopy data from the Global Airborne Observatory was used to create high-resolution seafloor rugosity maps for the Main Hawaiian Islands. The dataset includes two map products, fine and coarse rugosity, covering islands such as Maui, Kahoolawe, Lanai, Molokai, Oahu, Kauai, and Niihau, with Hawaii Island data split into quarters. These maps quantify reef habitat complexity, supporting ecological and conservation research.

Geospatial, Hawaiian Islands, Marine Habitat, Computer Vision, Finance, Coral Reef Rugosity, Seafloor Complexity, +1

12,037 Limpet Shell Images and Morphometrics from Alaska to Baja California

Sara S. Kahanamoku from UC Berkeley provides an image and 2D shape library of 12,037 patellogastropod limpet shells. The dataset includes individual shell images, outline coordinates, and morphometric measurements like major axis length and eccentricity. Specimens were collected from 353 localities along a northeastern Pacific latitudinal gradient from Alaska to Baja California, Mexico.

Image, Tabular, Geospatial, Mollusca, Computer Vision, Image Library, Morphometrics, Synthetic, Marine Biology, Latitudinal Gradient, +1

RRT2: Self-Healing Concrete Inter-Laboratory Test Data from 9 European Labs

Data from the second inter-laboratory testing program (RRT2) of the EU COST action SARCOS, focusing on self-healing concrete with MgO-based expansive minerals. Nine laboratories from seven European countries participated, testing fiber-reinforced concrete specimens using water permeability, capillary absorption, and crack width measurements. Specimens were monitored for self-healing after 1, 3, and 6 months of submersion in water.

Tabular, Civil Engineering, Self Healing Concrete, Experimental Data, Materials Science, Inter Laboratory Testing, +1

Medical Term Normalization Data for Social Media from EMNLP 2015

Data and supplementary information for the paper 'Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages' published at EMNLP 2015. The dataset likely contains a collection of phrases from tweets related to adverse drug reactions, used to map laymen's terms to medical concepts. The research was conducted by Nut Limsopatham at the University of Cambridge.

Text, Machine Translation, Healthcare, Natural Language Processing, Social Media NLP, Medical Text Normalization, Adverse Drug Reactions, Word Embeddings, Text Normalization, +1

GRIT: Global River Topology Network at 30m Resolution

GRIT is a vector-based global river network representing tributary and distributary components, including multi-thread rivers, canals, and delta distributaries. It is the first global hydrography (excluding Antarctica and Greenland) produced at 30 m raster resolution, created by merging Landsat-based river masks with elevation-generated streams. The dataset was authored by Michel Wortmann of the University of Oxford and is available as GeoPackage files.

Graph, Geospatial, 🌍 Global, Global River Network, River Topology, River Network, Hydrography, Topology, Synthetic, +1

EASY: Benchmark Results for Efficient Arbiter Synthesis from Multi-threaded Code

Jiany Cheng from Imperial College London created a dataset containing measured results for the EASY (Efficient Arbiter SYnthesis) tool. The data likely includes performance and area metrics for hardware synthesized from multi-threaded C/C++ code using the LegUp HLS tool. The results show up to 87% area savings and up to 39% improvement in execution time for a set of typical application benchmarks.

Tabular, Time Series, Hardware Design, Formal Verification, Finance, Multi Threaded Code, High Level Synthesis, FPGA, +1

SEVIRI FRP: Fire Radiative Power Gridded Over Africa

Fire Radiative Power (FRP) data from the Meteosat Second Generation satellite provides a 15-minute temporal resolution measure of radiant heat output from fires across the African continent. The gridded product spatially degrades the native 3 km resolution observations to a 1-degree grid while maintaining the high-frequency sampling. This dataset, produced by applying the Roberts and Wooster detection algorithm, supports the estimation of fuel consumption and smoke emissions from biomass burning.

Time Series, Geospatial, 🌍 Africa, Geospatial Grid, Satellite Remote Sensing, Computer Vision, Fire Radiative Power, Wildfire Monitoring, +1

German Voter Preferences for Female Candidates from Open Lists

A survey experiment dataset from a quota-representative sample of 2,640 eligible German voters examines preferences for female candidates under open-list electoral systems. The study, conducted by Lukas Rudolph of ETH Zurich, randomized the share of women on candidate lists and list type to analyze voter behavior. Results suggest voters generally level out gender-imbalanced lists, with specific exceptions among subgroups.

Tabular, Electoral Reform, Survey Experiment, German Politics, Gender Representation, Political Science, +1

3D Geodata with Level of Detail 2 for Linz, 2020

The 2020 Level of Detail 2 (LoD2) 3D geodata for Linz is a 3D city model automatically created from a Digital Surface Model (DOM) via aerial image analysis. The data includes simple building and roof designs, with coordinates in the Gauss-Krüger system M31-5Mio and heights referenced to the Adriatic Sea. The dataset was generated by Cooperation OGD Österreich and Wikimedia Österreich, based on 2019 data flow, with an update scheduled for September 2023/2024.

Geospatial, Benchmark, Building Models, Computer Vision, Urban Planning, 3D Geodata, Digital Surface Model, Synthetic, +1

Geoscape G-NAF: Australia's Authoritative Geocoded Address File

Geoscape G-NAF is Australia's authoritative geocoded address file, containing over 15 million physical address records. It is built and maintained by Geoscape Australia using authoritative government data and is updated quarterly. The dataset includes latitude and longitude coordinates but does not contain personal information.

Geospatial, 🇦🇺 Australia, ZIP, Geocoding, Government Data, Address Data, Finance, Large Scale, +1

Geoscape G-NAF: Australian Geocoded Address Database

Updated quarterly, the Geoscape Geocoded National Address File (G-NAF) contains over 15.9 million Australian physical address records with latitude and longitude coordinates. It is built and maintained by Geoscape Australia using validated government data and does not contain personal information. The May 2026 release includes 15,901,249 addresses, of which 94.69% are principal addresses.

Geospatial, 🇦🇺 Australia, ZIP, Geocoding, Government Data, Addresses, Finance, Large Scale, +1
