Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,786 datasets
Raw data from 2026 experiments investigating phenotypic differences between two genotypes of the ruminant respiratory pathogen Mannheimia haemolytica. The dataset includes optical density measurements for carbon source utilization, iron restriction response, and coculture interactions for 20 bacterial isolates. It was authored by Janet Hill and last updated on April 30, 2026.
MURAD is an open Arabic lexical dataset containing 95,000 word-definition pairs. It was created by riotu-lab and is designed to support research in computational linguistics and Arabic natural language processing. The dataset spans multiple scientific, religious, and linguistic domains.
A dataset characterizing users of the subsidized health regime assigned to the Central Health Network in Cali, Colombia. The data is disaggregated by life cycle stage and gender, with columns indicating healthcare provider, user sex, age group, and municipality. The dataset was last updated on 2026-05-18 and is hosted on the Colombian open data portal.
Annual operational statistics compiled from registrations of births, marriages, deaths, still-births, adoptions, and name changes in Ontario. The Government of Ontario's Office of the Registrar General publishes these reports to provide data for research and public policy-making. Each report covers events from a single calendar year.
Alberta Economic Multipliers By Industry and Commodity contains economic multipliers used to assess the impacts of changes in final demand or industry output. The Government of Alberta produced the data, which models 220 industries and 273 commodities based on the 2022 NAICS and NAPCS classifications. The dataset was last updated in April 2026.
Data from the Suomi NPP satellite's CrIS/ATMS instruments, processed with the CLIMCAPS algorithm to produce cloud-cleared radiances. The dataset provides infrared and microwave spectral data from 2211 CrIS channels and 22 ATMS channels, organized into 240 six-minute granules per day. It is used to infer atmospheric state variables, with a latency of 3 to 7 weeks due to its reliance on MERRA-2 reanalysis for initial conditions.
Molecular docking results from a study investigating the interaction between Fusobacterium nucleatum adhesin FadA and host receptor cadherin-11 (CDH11). The dataset likely contains computational binding scores or structural parameters. It was authored by Kun Liu and last updated on April 20, 2026.
SPURS-2 collected 64 CTD casts from the R/V Revelle in the eastern tropical Pacific during 2016 and 2017 to study a rainfall-dominated region of high salinity variability. This NASA-funded project combines these in-situ vertical profiles with satellite data from Aquarius, SMAP, and SMOS to characterize near-surface salinity dynamics. The data provide continuous conductivity, temperature, and depth measurements calibrated with IAPSO standard seawater.
SPURS-2 uCTD data provide vertical profiles of salinity and temperature from two research vessel cruises in the eastern tropical Pacific Ocean. The dataset contains 763 total casts from the R/V Revelle in August 2016 and October 2017, with observations binned in 6- or 8-meter depth intervals down to 500 meters. It supports the study of near-surface salinity dynamics in a rainfall-dominated region influenced by the North Equatorial Current.
The MIDDEN database from the PBL and TNO project contains aggregated information on the current energy and material consumption of the manufacturing industry in the Netherlands, along with possibilities for decarbonising its processes. It is structured into four sections: General Plant Data (GPD), Plant Configuration Data (PCD), Technology Characteristics (TC), and Commodity Data (CD). The dataset is published by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties under a CC-BY-4.0 license.
Google Trends data from 2004 to 2025 maps connections between fluoride-related search topics. Olívia Jorge constructed this network using related queries weighted by Relative Search Volume, analyzing thematic structures with Gephi. The 419.1 KB XLSX file contains the repeated topics used to build the network, published on figshare in April 2026.
SPURS-1 deployed a Seasoar towed vehicle to collect 1144 vertical casts of temperature, conductivity, salinity, and pressure in the subtropical North Atlantic. The dataset provides a 1-meter gridded, highly processed view of ocean structure from a 900 x 800-mile study area centered at 25N, 38W. This in-situ data, collected during a 2013 spring cruise, complements satellite salinity measurements from Aquarius/SAC-D and SMOS.
Groningen municipality's final policy memorandum on welfare and accommodation, building on a concept note adopted in October 2005. The document outlines a revised accommodation policy based on welfare objectives and incorporates feedback from a public consultation process. It is published by the Dutch Ministry of the Interior and Kingdom Relations under a CC-BY-4.0 license.
Groningen municipality has managed public space using the BORG method since 2001. The data includes annual citizen inspections until 2017 and biannual digital surveys from a population panel from 2018 onward, with the latest survey from 2023. The dataset is published by the Dutch Ministry of the Interior and Kingdom Relations under a CC-BY-4.0 license.
Four ribbon villages on the east side of Groningen, namely Noorderhoogebrug, Ruischerbrug, Middelbert, and Engelbert, are the subject of this spatial analysis. The dataset contains a PDF document with an urban design concept vision, including maps and explanatory texts, prepared by the Dutch Ministry of the Interior and Kingdom Relations. It serves as the basis for a new zoning plan to replace an outdated one.
Ethiopian family caregivers of children with Cerebral Palsy were interviewed to explore their experiences and support needs. The dataset contains qualitative themes and sub-themes derived from 13 in-depth interviews, analyzed using reflexive thematic analysis. Author Melkitu Melak published the data on figshare in April 2026 under a CC-BY-4.0 license.
Thirteen family caregiver interviews exploring caregiving experiences and support needs for children with Cerebral Palsy in Ethiopia. The data were collected via face-to-face, semi-structured interviews in Amharic, transcribed verbatim, and analyzed using reflexive thematic analysis in NVivo version 14. The dataset was authored by Melkitu Melak and last updated on 2026-04-13.
IMP 8 satellite data from the University of Maryland's Electrostatic Energy-Charge Analyzer (EECA) instrument provides count rates and pulse height data. The dataset enables computation of 10.92-minute resolution fluxes for singly and doubly ionized ions, ions with higher charge states, and 600-860 keV electrons. It was created at the National Space Science Data Center (NSSDC) from summary tapes provided by the University of Maryland.
Soft labels generated by the cross-encoder/nli-deberta-v3-small model on the combined SNLI and MultiNLI datasets. The dataset is intended for knowledge distillation into smaller, more efficient Natural Language Inference models. Each JSONL record contains a premise, hypothesis, hard label, and a probability distribution for entailment, neutral, and contradiction.
A binding land-use plan for the Vorderste Hohe residential development on Berliner Weg in the Siethen district of Ludwigsfelde, Germany. The plan transposes the municipal land-use concept into directly applicable law, specifying permitted and inadmissible land uses on the affected base areas. The dataset is provided by the Bundesamt für Kartographie und Geodäsie via the eu_open_data platform.