Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,611 datasets
A 14.6 MB bibliographic corpus extracted from Scopus and Web of Science supports the scientometric analysis of Smart Tourism and Smart Destinations research. Marlon Felipe Burbano Fernandez created this dataset to map the intellectual evolution and conceptual boundaries of the field. The collection includes raw exports and a merged, deduplicated analytical corpus.
VINAY-UMRETHE's dataset aggregates 70.2K examples of high-quality outputs from frontier large language models. It is designed for training and distilling LLMs to exhibit advanced Chain-of-Thought, Agentic, Mathematical, and Coding capabilities. The data is formatted into a strict OpenAI messages structure and was last updated on June 9, 2026.
Tuluá municipality in Colombia consolidates records for public health surveillance of gender-based and intrafamily violence. The dataset includes information on case occurrence, victim characteristics, type of violence, location, and institutional follow-up. It is hosted on the Colombian open data platform www.datos.gov.co and was last updated on 2026-05-18.
Reasoning Gym SFT Dataset contains Supervised Fine-Tuning (SFT) reasoning data procedurally generated using Reasoning Gym environments. The dataset is designed to train reasoning models to explain their step-by-step reasoning chain before outputting a final answer wrapped inside LaTeX \boxed{...}. Author MauroPello published the dataset on Hugging Face, with a last recorded update on 2026-06-06.
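As an illustration of the \boxed{...} answer convention described above, here is a minimal Python sketch of how a final answer could be pulled out of a model response. The function name `extract_boxed` is hypothetical and not part of the dataset or of Reasoning Gym; the sketch assumes the final answer is the last \boxed{...} occurrence in the text and that its braces are balanced.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper: assumes the final answer is the last boxed
    expression and that nested braces inside it are balanced.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present

    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never closed


if __name__ == "__main__":
    response = r"Half of one is a half, so the result is \boxed{\frac{1}{2}}."
    print(extract_boxed(response))
```

Brace counting is used rather than a simple regular expression so that answers containing nested braces, such as fractions, are returned whole.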
hirundo-io created a distilled corpus designed to train language models. The dataset aims to process edge-case, controversial, or complex analytical prompts without triggering over-aligned corporate refusal responses. It was last updated on 2026-06-09.
Supplementary data for a phylogenomic study of the phylum Nematoda, comparing molecular and morphological classifications. The 3.8 GB collection contains processed files and scripts from a multi-stage bioinformatics pipeline, including ortholog alignments and phylogenetic trees. The data was authored by Mohammed Ahmed and published on figshare in April 2026.
Daily and weekly sea ice motion vectors and browse images for the Arctic, derived from multiple sensor data, buoy measurements, and reanalysis forecasts. The dataset is produced by the National Aeronautics and Space Administration. The most recent platform update was recorded in March 2026.
Version 4 of this dataset expands its scale and improves supervision quality through stronger cleaning and deduplication. It is a large-scale mathematics supervised fine-tuning dataset designed for instruction tuning and mathematical capability adaptation. The dataset was created by kaushik-harsh-99 and last updated on June 11, 2026.
Experimental measurements of electrical output from hybrid nanogenerators, authored by Wenjie Qin and last updated on June 3, 2026. The dataset is 4.2 MB in size and is available in XLSX format under a CC0 1.0 public domain license.
A geospatial dataset classifying land in Victoria where special permission is required for geothermal energy operations under the Geothermal Energy Resources Act 2005. The layer is an amalgamation of features from multiple public land management sources, including regional parks, national parks, wildlife reserves, heritage rivers, and water authority land. It was last updated on 2026-04-09 by the Department of Energy, Environment and Climate Action.
SGI-Bench is a scientist-aligned benchmark for evaluating Scientific General Intelligence across the full inquiry cycle. It spans 10 disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset was created by InternScience and was last updated on 2026-06-02.
A 1955 survey measured seismic velocities from 8,000 ft/sec near the top to 12,200 ft/sec at total depth in the Associated Freney Oilfields Nerrima No. 1 Bore. The Bureau of Mineral Resources conducted the work on the Nerrima Dome in Western Australia's Fitzroy Basin, noting cable breaks in shallow sections. This dataset provides a historical record of formation velocities for a specific exploratory well.
Forest classes delineate global forest status and condition for the year 2020 at approximately 30-meter resolution. The dataset identifies approximately 3.26 billion hectares of forests, categorized as primary, young secondary (≤20 years), and old secondary (>20 years). It was created on the ESA-NASA Multi-mission Analysis and Algorithm Platform (MAAP) to support generating Tier 1 estimates for Aboveground Biomass Density under IPCC guidelines.
The 2004 volume contains four parts covering mineral industry overviews, government activities, geological fieldwork reports, and property descriptions. It is published by the Government of Yukon and includes reports authored by university students. The data is available in HTML and PDF formats under the OGL-CA-2.0 license.
A placer mining industry report for the Yukon covering 1998 to 2002, compiled by the Government of Yukon's Mining Inspection Division. The document likely contains sections on staking activity, gold production by creek, historical articles, and reclamation award details. It includes descriptions, locations, and photographs of mining operations visited by inspectors.
Summaries of placer mining operations active between 2007 and 2009 in Yukon. Information was derived from survey forms completed by miners and from field visits, compiled by the Government of Yukon. Summaries are arranged by drainage basin and include corresponding maps and photos.
Geological data for the northwestern Lansing map area (105 N) in Yukon describes upper Paleozoic stratigraphic units with potential for massive sulphide mineralization. The Government of Yukon published this information, which was last updated on April 17, 2026. The area lacks known mineral occurrences but contains units similar to those at known volcanic-hosted and sedimentary exhalative deposits.
Yukon Exploration and Geology 1997 is a multi-part report on mining and geological activity in Yukon, Canada. It includes an overview of industry activity, summaries of government assistance programs, and geological reports from government and industry geologists. The dataset was published by the Government of Yukon and last updated on 2026-04-17.
Mineral exploration spending in Yukon reached 22 million dollars in 2004, a significant increase from the previous year. This government report provides overviews of mining, development, exploration, and placer mining activities, including details on specific projects such as Wolverine and Carmacks Copper. It was produced by the Government of Yukon and covers the year 2004.
The Mineral Industry Report 1974 is a review of the Yukon mineral industry compiled by the Northern Natural Resources and Environment Branch, Department of Indian and Northern Affairs. Information was gathered from site visits, personal communications, technical reports, trade journals, newspapers, Geological Survey of Canada publications, and monthly reports from four Yukon mining districts. The report is available in HTML and PDF formats under the OGL-CA-2.0 license.