Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,721 datasets
A geological report details mineralization and isotopic dating for the RAM zinc-lead-silver property in southwest Yukon Territory. It provides average assay results from grab samples, including 53.8 g/t silver, 4.35% zinc, and 2.20% lead. The study includes K-Ar and Rb-Sr isotopic analyses with specific dates for regional rock formations.
This dataset details the budget expenditures of Colombia's Superintendencia de Transporte (Transport Superintendency) from January 1, 2016 onward. Published on www.datos.gov.co, it includes columns for budget commitments, obligations, payments, and available appropriations. The data was last updated on May 18, 2026.
A metadata catalog of information assets managed by the Municipal Ombudsman's Office (Personería Municipal) of Floridablanca, Colombia. The dataset includes columns for information title, description, categories, storage medium, and responsible department. It was last updated on 2026-05-18 and is available via the Colombian open data portal www.datos.gov.co.
Nomi Style is a style and personality fine-tuning dataset for the Nomi model series by JallyAI. It is used to teach structured Markdown formatting, targeted emoji usage, and a friendly-yet-precise assistant tone. The dataset is bilingual, containing mixed German and English content in the ShareGPT format.
A spatial dataset from Spatial Services NSW representing topographic landform features as points, lines, and polygons. The data includes classes for Distinctive Land Surface features and Fuzzy Extent features like dunes, plains, and cliffs. It was initially published on 05/02/2020 and supports multiple coordinate reference systems.
This registry of information assets falls under the primary process PROCESO: GESTIÓN FINANCIERA (Process: Financial Management). The dataset, published by the Colombian National Attorney General's Office via datos.gov.co, identifies the information the institution holds and where it can be consulted. It was last updated on 2026-05-18.
Prompt Variations and LLM Responses contains prompt variants and model outputs used to evaluate the Stability-Generalization Score (SGS). Created by naghamo and last updated on June 5, 2026, the dataset draws on six QA and instruction benchmarks, such as TruthfulQA and Natural Questions, and covers responses from eleven large language models.
A 1:10,000 scale raster map series provides a detailed cartographic base for Northern Ireland. These maps include water bodies, rivers, main roads, town names, and townlands, serving as a foundational layer for spatial analysis. The data is published under the Open Government Licence, facilitating reuse as a background or overlay in desktop and web mapping applications.
A series of raster maps at 1:10,000 scale showing base mapping for Northern Ireland. The maps are provided by the Government Digital Service under the Open Government Licence. The dataset includes water bodies, rivers, main roads, town names, and townlands.
Published on data.novascotia.ca, this dataset lists the funeral homes, crematoriums, and cemeteries licensed to provide merchandise or services to the public. It includes licensee names, license types, addresses, and geographic coordinates. It was last updated on 2026-05-20.
GenAI Channel Modeling Datasets are ray-traced, site-specific MIMO channel datasets used in the paper 'Site-Specific MIMO Channel Generation via Diffusion and Flow Matching'. The dataset was authored by PaulAlm and includes files for specific frequency and scenario combinations, such as a 3.5 GHz Line-of-Sight (LoS) scenario. It was last updated on May 29, 2026.
An Excel dataset supports a study on differential growth in juvenile Cherax quadricarinatus crayfish. It contains individual and progeny-level body weight data, digestive enzyme activities, energy reserves in tissues, and lists of differentially expressed proteins. The dataset was authored by Hernan Sacristan and last updated on May 16, 2026.
The dataset contains energy consumption and greenhouse gas emissions data for municipal buildings over 2,000 m² in Montreal. It is collected under Regulation 21-042 on the disclosure and rating of GHG emissions from large buildings, adopted by the City of Montreal in 2021. The data is provided by the Government and Municipalities of Québec and was last updated on April 17, 2026.
A dataset linking violations of Quebec's Regulation 23-016 on building occupancy and maintenance to specific buildings via entries in the land register. It is published by the Government and Municipalities of Québec under a CC-BY-4.0 license and was last updated on April 17, 2026. The data likely contains notices used to prevent owners from avoiding sanctions by selling buildings and notices of regularization issued when violations are corrected.
A curated reference index of public datasets for crack and pavement segmentation tasks. The repository points to canonical sources for images and masks, enabling researchers to download data under original licenses from the original authors. The index is maintained by user 'crackedcity' and was last updated on May 31, 2026.
2.9 GB of supporting materials for reproducing results from a research paper on hybrid language models for data visualization. The repository includes datasets, code, and model artifacts shared under a CC-BY-4.0 license. Author João Pedro Quadrado last updated the materials on May 17, 2026.
Gas Share Energy data from Our World in Data, repackaged by Electric Sheep Africa. The dataset contains 240 observations across 4 African countries, covering the period from 1965 to 2024. It tracks the proportion of energy derived from gas sources over time.
NASA SEDAC data from the 2015 release provides a one-kilometer resolution grid of threatened mammal and amphibian species density within 200 kilometers of the West African coast. The dataset is derived from IUCN Red List vector data, focusing on species classified as vulnerable, endangered, or critically endangered. It was created to support coastal vulnerability mapping in the region.
Math SFT Solutions No CoT V3 is a large-scale supervised fine-tuning dataset designed to adapt models for mathematical capability. Version 3 expands mathematical coverage and improves dataset quality through stronger filtering, cleaning, and supervision refinement. The dataset, created by kaushik-harsh-99 and last updated in June 2026, focuses on clean instruction-response pairs without chain-of-thought reasoning.
9.6 MB of prompts and correctness scores for various large language model responses, compiled by Mary Cummings. The dataset is hosted on figshare and was last updated on 2026-05-26. It is released under a CC-BY-4.0 license.