Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,061 datasets
Government and Municipalities of Québec provides detailed taxation and pricing rates for the City of Montreal. The dataset shows rates by building types, including residential, non-residential, and wasteland, applied to annual adjusted property values. The data was last updated on 2026-04-17.
Searchable metadata for papers from top AI venues including NeurIPS, ICML, ICLR, CVPR, ICCV, WACV, ACL, EMNLP, and NAACL. The dataset is hosted by GenAI4ELab and was last updated on June 14, 2026. It includes a full index and per-venue browse views.
A 25.7 KB Excel file from figshare, last updated on 2026-05-23. The dataset relates to research on secondary brain injury following acute ischemic stroke, specifically focusing on the inflammatory response in ischemia-reperfusion injury. It was authored by Bingjie Jiang and is shared under a CC-BY-4.0 license.
Records from January 1, 2009, list building permits issued by the City of Edmonton's Urban Planning & Economy Department for construction and maintenance. The dataset includes permit details, location coordinates, and occupancy dates for residential and non-residential projects, published by data.edmonton.ca. Applicant information is withheld for privacy reasons.
1,281,633 rows of metadata and URLs for images from the Vogue Runway dataset. The dataset includes fields such as image dimensions, designer, season, year, category, file size, aesthetic score, and JSON-encoded tags. It was created by ROSCOSMOS and last updated on Hugging Face in June 2026.
20.7 KB of text files contain data used for the analysis in the paper "Accumulation of CO2 limits energy gain in freely diving grey seals." The dataset includes files for fish energy content, metabolic rates, triglyceride concentrations, and lactate levels from experimental trials. It was authored by Eva-Maria Bonnelycke and last updated in April 2026.
Veedurías Ciudadanas Personería Envigado tracks citizen-led oversight groups monitoring public administration and private entities handling public resources. The dataset includes columns for registration year, municipality, number of members, and the specific object of oversight. It is published on the Colombian open data portal, datos.gov.co, and was last updated in May 2026.
Geospatial data describing the geomorphology and sedimentology of the continental shelf adjacent to Mac Robertson Land in East Antarctica. The dataset, provided by the Australian Ocean Data Network, characterizes a 'scalped shelf' deeply eroded by glaciers and currents during the Quaternary period, exposing underlying basement rock. The record was last updated on 2026-06-16.
The Albany Canyon complex extends 700 km from Cape Leeuwin to east of Esperance, with canyons cutting down up to 2000 meters. Geoscience Australia Data compiled this information on canyon structure and geological history, last updated in May 2026. The data likely contains details on canyon dimensions, thalweg slopes, and the exposed Jurassic and younger rock sequences.
United Nations Human Settlements Programme data tracks the proportion of urban populations with access to services like improved water, sanitation, clean energy, internet, and durable housing. The dataset is provided in XLSX format and was last updated on May 29, 2026. It originates from the UN-Habitat Data and Analytics Section.
Ophiomicros bathursti, a new genus and species of ophiuroid (brittle star), is described from Cenomanian (Upper Cretaceous) strata on Bathurst Island, Northern Territory. The description highlights morphological distinctions, such as unusually large oral plates and small adoral plates, which differentiate it from allied genera like Ophiura and Amphiura. This dataset comprises the formal taxonomic publication detailing the fossil's discovery and classification.
9.5 KB of simulation analysis data supporting a novel data compression method for bridge monitoring. The dataset, authored by Ming Chen and shared on figshare, demonstrates a domain knowledge-based compression method that achieves a 75% compression ratio, rising above 92% with a synergistic processing method while retaining 95% data fidelity. The data was last updated on April 15, 2026.
5.5 KB of simulation results evaluating a novel domain-specific data compression algorithm for bridge structural health monitoring. The dataset, authored by Ming Chen and last updated in April 2026, contains error statistics for sparse data after a supplementation process. The described method achieved a 75% compression ratio, exceeding 92% with synergistic processing, while retaining 95% data fidelity.
Bench-easy-6-2026 is an Effortless-to-Easy tier question-answering benchmark designed by Seton Labs to evaluate basic reasoning and generalization in small AI systems. The dataset was last updated on June 22, 2026, according to the platform metadata. Its creator notes the benchmark is intended for research, experiments, or fun, with an acknowledgment that accuracy issues are being addressed.
Seasonal variations in major ions, nutrients, and chlorophyll a were examined at two sites in the upper Swan River estuary. The data likely captures intra-annual variations influenced by riverine discharge, with temperature ranging from 13 to 29°C and salinity from 3 to 30. The dataset is provided by Geoscience Australia Data and was last updated in May 2026.
A 5.5 KB dataset from figshare contains experimental data on a novel SIRT5 inhibitor designed using X-ray cocrystal structures. Yingyi Jiang published the data in May 2026, detailing the inhibitor's IC50 of 0.29 μM and its effects on renal function and inflammation markers in mouse models of septic acute kidney injury.
A systematic mapping study analyzing 54 publications from ACM, IEEE, and Scopus on Usability and User Experience evaluation of Generative AI tools in the post-ChatGPT period. The study examined 2,473 publications and identified substantial documentation gaps and terminological fragmentation. The dataset was created by Rafael Pereira and last updated in May 2026.
The Eval Cards Backend Dataset contains pre-computed evaluation data for 5,678 models across 798 benchmarks. Generated by the eval-cards backend pipeline, it powers the Eval Cards frontend and includes 1,321 metric-level evaluations. The dataset was last generated on May 5, 2026.
Simulation data from two-dimensional magnetohydrodynamic (MHD) and runaway electron fluid models for disruption events in the SPARC tokamak. The work provides a systematic comparison and benchmarking of different primary runaway electron sources, including activated tritium beta decay and Compton scattering. The dataset, from the Plasma Science and Fusion Center Dataverse, was authored by R. Datta, C. Clauser, N. Ferraro, C. Liu, R. Sweeney, and R. A. Tinguely.
A protocol for a cross-sectional observational study of 180 Mandarin-speaking children aged 4-6, developed by Cai Wang. The study aims to evaluate a culturally adapted framework for profiling conversational abilities in children with and without Developmental Language Disorder, using audio-video recordings and linguistic annotation. The protocol was last updated in May 2026.