DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

NLP & Text Datasets | DataSalon

NLP & Text

Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora

49,472 datasets

Pedestrian Counting System: Minute-by-Minute Foot Traffic for the Past Hour

City of Melbourne Open Data provides minute-by-minute directional pedestrian counts for the last hour, updated every 15 minutes. The data originates from sensor devices located across the city and is current as of June 2026. Records are only created when pedestrians pass a sensor, so the dataset may not contain readings for every sensor every minute.
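Because rows exist only for minutes in which a sensor registered at least one pedestrian, an absent sensor-minute can reasonably be treated as a zero count, assuming the sensor was online. A minimal sketch of filling those gaps follows; the field names (`sensor`, `minute`, `count`) and the sample values are hypothetical and not taken from the dataset's schema.

```python
from datetime import datetime, timedelta


def densify(records, sensors, start, minutes=60):
    """Return {(sensor, minute): count}, with 0 for sensor-minutes that have no record."""
    observed = {}
    for r in records:
        key = (r["sensor"], r["minute"])
        # Sum rows sharing a sensor and minute (e.g. one row per direction).
        observed[key] = observed.get(key, 0) + r["count"]

    dense = {}
    for s in sensors:
        for i in range(minutes):
            t = start + timedelta(minutes=i)
            dense[(s, t)] = observed.get((s, t), 0)
    return dense


# Hypothetical sample: sensor "B" recorded nothing during the hour.
start = datetime(2026, 6, 1, 9, 0)
records = [
    {"sensor": "A", "minute": start, "count": 4},
    {"sensor": "A", "minute": start + timedelta(minutes=2), "count": 1},
]
dense = densify(records, sensors=["A", "B"], start=start)
```

A sensor that was offline would also appear as zeros under this approach, so it is worth checking sensor status separately before interpreting long runs of zeros.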

Tabular, Time Series, CSV, JSON, Sensors, Urban Mobility, Traffic Flow, Pedestrian Counts, Foot Traffic, Sensor Data, Pedestrian, Safemobility, +1

Quarterly Financial Reports of Global Affairs Canada, 2015-2016

Global Affairs Canada's Office of the Chief Financial Officer prepares annual financial statements as of March 31. The statements follow the GC 4500 Directive on Accounting Standards, based on Public Sector Accounting Board recommendations. They provide an accounting of the department's administration of public financial affairs and resources for users both within and outside the government.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector, Quarterly Reports, +1

Quarterly Financial Report for Global Affairs Canada (2016-2017)

Global Affairs Canada's Office of the Chief Financial Officer prepares annual financial statements as of March 31st. These statements follow the GC 4500 Departmental Financial Statements directive, based on Public Sector Accounting Board recommendations. The data provides an accounting of the department's administration of public financial affairs and resources for external readers.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector Accounting, +1

Quarterly Financial Report of Global Affairs Canada (2017-2018)

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector Accounting, +1

Quarterly Financial Report of Global Affairs Canada, 2018-2019

Financial statements of Global Affairs Canada prepared by its Office of the Chief Financial Officer. The statements are prepared annually as of March 31st, following the GC 4500 Departmental Financial Statements directive based on Public Sector Accounting Board recommendations. This dataset covers the 2018-2019 fiscal year and is geared towards external readers who lack access to specialized internal reports.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector Accounting, +1

Global Affairs Canada Quarterly Financial Reports (2019-2020)

Financial statements of Global Affairs Canada prepared by its Office of the Chief Financial Officer. The statements are prepared annually as of March 31st, following the GC 4500 Departmental Financial Statements directive based on Public Sector Accounting Board recommendations. The data covers the 2019-2020 fiscal period and is intended for external readers who lack access to specialized internal reports.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector, +1

Quarterly Financial Report of Global Affairs Canada, 2020-2021

Global Affairs Canada's Office of the Chief Financial Officer prepares these annual financial statements as of March 31st. The statements follow the GC 4500 Directive on Accounting Standards, developed using recommendations from the Public Sector Accounting Board. They provide an accounting of the Department's administration of public financial affairs and resources for both internal and external users.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector, +1

Quarterly Financial Statements for Global Affairs Canada, 2021-2022

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector, Quarterly Reports, +1

Global Affairs Canada Quarterly Financial Reports, 2022-2023

Quarterly financial statements for Global Affairs Canada prepared by its Office of the Chief Financial Officer. The statements are prepared annually as of March 31st, following the Directive on Accounting Standards GC 4500 and PSAB recommendations. This data provides an accounting of the department's administration of public financial affairs and resources for external and internal users.

Tabular, 🇨🇦 Canada, Government Finance, Financial Statements, Finance, Public Sector, +1

Modelled Topsoil Properties for Great Britain at 1km Resolution

Modelled estimates of key soil properties cover Great Britain at 1 km resolution (1 km² grid cells). The data includes soil pH, carbon concentration (g kg⁻¹), nitrogen concentration (% dry weight), and invertebrate density (individuals m⁻²). These estimates are derived from a Generalized Additive Model using the 2007 Countryside Survey data, incorporating climate, atmospheric deposition, habitat, soil, and spatial predictors.

Geospatial, Soil Science, Great Britain, Environmental Modelling, Geospatial Data, Topsoil Properties, +1

NERP TE Project 3.4: Monitoring of Southern Cassowary and Spectacled Flying-Fox, 2011-2014

Monitoring programs for two key vertebrate species were implemented between 2011 and 2014. The project collected data on the endangered southern cassowary via dung surveys and DNA fingerprinting, and on the vulnerable spectacled flying-fox via monthly camp surveys in the Wet Tropics Region. Data was aggregated by the Australian Ocean Data Network.

Tabular, Endangered Species, Population trends, Wildlife Monitoring, Dna Fingerprinting, Wet Tropics, +1

Mobile App Reviews with Appraisal-Theory Sentiment Labels

2.3 MB of mobile app review text manually labeled for sentiment based on appraisal theory parameters. The dataset was created by Ruping Zhang and last updated on June 1, 2026. It was used to train a model achieving a mean fold accuracy of 88.63% and ROC AUC of 90.91%.

Text, CSV, Machine Learning, Sentiment Analysis, Natural Language Processing, Appraisal Theory, Mobile App Reviews, +1

PLOS Redefining Publishing: Survey Data from 2025-2026 Open Science Report

Public Library of Science (PLOS) provides survey data supporting its 2026 report 'Redefining Publishing: Practical pathways to open science'. The dataset comprises anonymized responses from three surveys conducted in 2025, totaling 1.2 MB of CSV and PDF files. Funding was received from the Gordon and Betty Moore Foundation and the Robert Wood Johnson Foundation.

Tabular, CSV, Open Science, Survey Research, Research Methods, +1

Leeds HMO and Student Housing Register

Leeds City Council maintains a register of properties classified as Houses in Multiple Occupation (HMOs) or for shared occupation, sourced from council tax data. The dataset includes codes for student-only properties and halls of residence, serving as an evidence base for local planning decisions. It is not intended to be a definitive list of all HMOs in the city.

Tabular, Council Tax, Housing Register, Housing Policy, Hmo Register, HMO, Urban Planning, Student Housing, +1

Prenatal Hypoxia-Ischemia Rabbit Model for Cerebral Palsy Pain Research

Behavioral and neuroanatomical data from neonatal New Zealand White rabbits subjected to prenatal hypoxia-ischemia or sham surgery. The dataset includes sensory tests for mechanical, hot, and cold sensation, anxiety-like behavior assessments, and spinal cord primary afferent fiber analysis. It was created by Genry, Landon for the RF1NS135580 project and last updated in July 2026.

Tabular, Cerebral Palsy Model, Animal Behavior, Neuroscience, Hypoxia Ischemia, Pain Research, +1

Digital Aerial Survey of Marine Wildlife in New York Bight for Offshore Wind Planning

NYSERDA, APEM, and Normandeau Associates conducted quarterly ultra-high resolution aerial digital surveys of marine resources in a 43,745.20 km² offshore planning area in the New York Bight starting in 2016. Each survey collected approximately 300,000 images covering at least 7% of the area using a transect design, with a special grid survey in a wind energy area collecting around 100,000 images. The dataset includes surveys from Summer 2017 through Spring 2018.

Image, Geospatial, Aerial Survey, Marine Wildlife, Environmental monitoring, Benchmark, Offshore Wind, +1

Canadian Auditor General 2026 Reports Briefing Package for Parliamentary Hearing

A briefing package prepared for a May 4, 2026 hearing before the Standing Committee on Public Accounts. The document relates to the 2026 reports of the Auditor General of Canada and the Commissioner of the Environment and Sustainable Development. It was published by the Office of the Auditor General of Canada and last updated on July 15, 2026.

Text, Canadian Government, Public Accounts, Government Audits, Environmental Sustainability, +1

UK Bryophyte Accessions from Public Collection for Molecular Research

A collection of 68 wild bryophyte accessions established in axenic culture from 76 samples submitted by the UK public during the 2021 Cambridge Festival. The dataset includes locational data and sex analysis for Marchantia polymorpha lines, deposited by researchers at the Sainsbury Laboratory Cambridge University. It was created as an online outreach project during COVID-19 lockdowns.

Tabular, Geospatial, Plant Biology, Public Science, Bryophytes, Molecular Biology, +1

Eromanga Basin Hydrogeological Inventory for the Great Artesian Basin

This hydrogeological inventory for the Eromanga Basin, part of the Great Artesian Basin, covers over 1,250,000 square kilometres of central and eastern Australia. The dataset, provided by the Australian Ocean Data Network, groups descriptive attributes into themes such as location, geology, hydrogeology, and land use. It describes the basin's Mesozoic sedimentary rocks and their complex depositional history, which was influenced by the breakup of Gondwana.

Geospatial, 🇦🇺 Australia, Geology, Groundwater, Hydrogeology, +1

PV-mSME Model Parton-Level Simulations for LHC Parity Violation Studies

Parton-level simulations in the PV-mSME model with various values of lambdaPV and for the standard model (lambdaPV=0). The dataset includes one example LHE file generated by MadGraph for each sample used in the paper. Files are named `liv_3j_4j_${lambdaPV}_0.lhe.gz` and `liv_rot_${hour}_0.lhe.gz`, with a separate file for the Standard Model.
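The naming patterns above can be expanded programmatically, and a downloaded sample can be sanity-checked by counting its events. The sketch below assumes the files are standard gzip-compressed LHE text in which each event opens with an `<event>` tag on its own line; the `lambdaPV` and `hour` values passed in are hypothetical, since the listing does not say how they are formatted in the real file names.

```python
import gzip
from string import Template

# Patterns quoted in the dataset description.
PV_TEMPLATE = Template("liv_3j_4j_${lambdaPV}_0.lhe.gz")
ROT_TEMPLATE = Template("liv_rot_${hour}_0.lhe.gz")


def pv_filename(lambda_pv):
    """Fill the lambdaPV pattern; the value format is an assumption."""
    return PV_TEMPLATE.substitute(lambdaPV=lambda_pv)


def rot_filename(hour):
    """Fill the hour pattern; the value format is an assumption."""
    return ROT_TEMPLATE.substitute(hour=hour)


def count_events(path):
    """Count opening <event> tags in a gzip-compressed LHE file."""
    n = 0
    with gzip.open(path, "rt") as fh:
        for line in fh:
            stripped = line.strip()
            # Match "<event>" or "<event attr=...>", but not other tags.
            if stripped == "<event>" or stripped.startswith("<event "):
                n += 1
    return n
```

For example, `count_events(pv_filename("0.1"))` would report the number of events in that sample if a file with that name exists locally.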

Tabular, Parity Violation, Lhc, Particle Physics, Madgraph, Synthetic, High Energy Physics, +1
