Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,560 datasets
Cauca Department's 2019 public contract accountability data from the General Comptroller's Office. The dataset tracks the number and monetary value of contracts that were and were not reported on. It includes columns for total registered contracts, reported contracts, unreported contracts, and their corresponding values.
Sentinel-5P TROPOMI Level-1B radiance data for band 5 (NIR detector) provides calibrated spectral radiance and irradiance measurements from a nadir-viewing hyperspectral spectrometer. The instrument covers ultraviolet-visible, near-infrared (675 nm to 775 nm), and shortwave infrared wavelengths, with a spatial resolution of approximately 5.5 km at nadir in effect since August 6, 2019. This dataset is generated by the Koninklijk Nederlands Meteorologisch Instituut (KNMI) processor and is part of the European Space Agency's Copernicus Sentinel mission family.
European bush crickets are the subject of this dataset, which was used to compare morphology and sound production. It contains measurements for six species, analyzed on intra- and interspecific levels. The data is provided by Jan Wille in an XLSX file under a CC-BY-4.0 license.
Annual data from 2005 onward for the Antioquia department in Colombia, containing the number of cases and the rate per thousand inhabitants for general mortality. The dataset is updated annually with the latest year's figures and is provided by www.datos.gov.co. It includes municipality and region names and codes, as well as geographic location data.
Faculty counts for a Colombian university, disaggregated by gender and contract type for the 2019-2 academic period. The dataset includes columns for full-time (PLANTA), contract (CONTRATO), and adjunct (CATEDRATICO) positions. It is hosted on the Colombian open data portal, datos.gov.co, and was last updated in May 2026.
V-RAGBench is a benchmark dataset containing 2,100 open-ended query, evidence chunk, and answer triplets designed for evaluating retrieval and generation in long-video retrieval-augmented generation (VideoRAG). It was created by DISLab and last updated on June 15, 2026. The triplets are built from hour-scale egocentric videos, with queries designed to be answerable only from a specific localized evidence chunk.
Municipal-level data from Colombia tracks investment and management of national social programs across presidential periods. Columns suggest records include payments, target population, and beneficiary counts for programs aimed at vulnerable and victim populations. The data is hosted by the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
10,000 synthetic enterprise IT service desk incident reports engineered for AI training. The dataset was created by author bronc2 and was last updated on the platform in 2026. It is designed for technical applications in corporate IT environments.
A dataset for comparing restaurant AI phone systems based on official pricing and workflow information. It was created by Karmane and last updated on June 11, 2026. The data likely contains details on product features, use cases, and pricing signals for various systems.
Average wait times in seconds and call resolution percentages for a municipal contact centre, broken down by service type including generic services and Colchester Borough Homes. The data is aggregated quarterly and includes a noted increase in contact volume during the first two quarters of 2017 due to changes in waste collection. It is provided by the Government Digital Service under an open government license.
Data on mesoporous silica nanoparticles for Parkinson's disease treatment, published by Mónica Onrubia-Márquez through the e-cienciaDatos Harvested Dataverse. The dataset likely contains results from in vivo experiments using a 6-hydroxydopamine-induced mouse model, comparing the efficacy of nanoparticle-loaded L-dopa to free L-dopa. It was last updated on 2026-05-31.
Records of cattle, equine, swine, goat, and sheep theft under Colombian Penal Code Article 243 (Law 599 of 2000). The dataset includes columns for DEPARTAMENTO, MUNICIPIO, ZONA, CANTIDAD, and FECHA HECHO. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-19.
A 2019-2022 multibeam sonar survey of the Forster Pacific Palms Cape Hawke seabed in NSW, Australia, conducted by the NSW Department of Planning and Environment. The dataset provides bathymetry and backscatter as 32-bit floating-point GeoTIFF files at 5-meter resolution, processed using Hypack, R2Sonic GUI, POSView, POSPac, Qimera, and FMGT software. It was funded by the SeabedNSW program and HabMap Program to establish a baseline and map seabed type distribution.
A bathymetry survey acquired by Deakin University on October 14-15, 2015, onboard the Motor Vessel Yolla. The data was collected using a Kongsberg EM2040c sonar system and is managed by the Australian Ocean Data Network. It is not to be used for navigational purposes.
North Sea historical data on UK trawl and seine fishing effort, measured in hours, by month and ICES rectangle. The dataset covers six specific years: 1927, 1937, 1947, 1957, 1967, and 1977. It was provided by the Marine Environmental Data & Information Network and last updated in June 2026.
More than 41,000 works from the National Museum of Fine Arts of Quebec (MNBAQ) are available in this dataset. The collections include paintings, sculptures, drawings, photographs, prints, decorative art, videos, installations, and digital art, primarily produced in Quebec or by Quebec artists, with some works dating back to the 17th century. The dataset is provided by the Government and Municipalities of Québec under a CC-BY-4.0 license.
152.6 KB of quantitative data from an online survey using retrospective self-reporting. The dataset, authored by Talia Elgie, was last updated on June 1, 2026. It is shared under a CC-BY-4.0 license on figshare.
Colombian traffic accident records from January 2021 through May 2022, sourced from the national open data portal datos.gov.co. The dataset includes details on incident type, severity, vehicle class, and location. It was last updated on the platform in May 2026.
Greater London Authority data on reports of abuse against vulnerable adults in the London Borough of Redbridge. The dataset provides a detailed breakdown of volume, type, and action taken, produced on a bi-annual basis. The record was last updated on 2026-06-24.
Volume of domestic low carbon technology connections for generation and demand under 1 megawatt. The data is aggregated by Lower Super Output Area (LSOA) and was published by the Greater London Authority. It was last updated on 2026-06-24.