Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
40,799 datasets
Colombian data on the total number of individuals charged with crimes under the Oral Accusatory Penal System (Laws 906 of 2004 and 1098 of 2006) for incidents occurring since 2010. The dataset is published by the Fiscalía General de la Nación (National Attorney General's Office) and includes demographic breakdowns and case details. Data is current up to the last day of the previous month.
A 2016–2024 program by Geoscience Australia developed new predictive models for resource discovery at the national scale. The work redefines traditional prospectivity mapping by integrating geological processes with perspectives on economic viability and social license. The dataset likely contains maps and models from the AU$225 million Exploring for the Future program.
A presentation service provides a uniform topographic map graphic at a scale of 1:250,000 for the Federal Republic of Germany and neighboring foreign countries. The Bundesamt für Kartographie und Geodäsie (BKG) generates the base from high-resolution raster data in print resolution (200 pixels/cm, 508 dpi) in the UTM32 projection. Data sources include official geodata from several German federal states and OpenStreetMap data for other areas.
November 2025 collection of 60 short texts generated by four commercial LLMs (DeepSeek, Grok, Copilot, ChatGPT) in response to a minimal narrative prompt. The dataset was created by Juha Raipola, Maria Mäkelä, Samuli Björninen and Laura Piippo for a narratological study. Each model produced five responses per prompt, with all outputs retained without curation.
A land suitability map for coffee (Coffea arabica L.) in the Meta department of Colombia, produced at a 1:100,000 scale. The dataset results from a 2017 inter-administrative contract (No. 202) between UPRA and the Meta Governorate, with zoning completed in December 2018. It classifies areas into high, medium, low, and non-suitable categories based on biophysical, socioeconomic, and socio-ecosystem components, and also marks legally excluded areas.
Descriptive performance data for two large language models evaluated on a professional medical examination. The dataset contains results for GPT-4.0 and GPT-5.0 on the 2024 American Academy of Periodontology In-Service Examination, comprising 331 multiple-choice questions. Results are presented as the number and percentage of correct responses for each exam section under two testing conditions, along with completion rates and missing responses.
New York City Parks Department maintains a dataset of canine waste bag dispenser locations across the city. Each record includes the dispenser's location, mounting surface, manufacturer, installation date, and the entity responsible for restocking. The data is collected by borough analysts and managed by the Parks Innovation & Performance Management team, with inactive dispensers removed.
Registers of Scotland maintains this geospatial dataset of cadastral parcels to comply with the INSPIRE Directive. It contains polygon shapes showing the position and indicative extent of surface ownership for each registered property in Scotland, each linked to a Land Register title via a unique `inspire id`. The dataset is a subset of the Cadastral Map and serves as a foundational layer for land and property information.
Supplementary material for the ERTO-K study, a randomized controlled trial investigating the effects of a 6-month detraining period following a 6-month exercise program. The dataset includes outcomes for 33 postmenopausal Korean women with osteosarcopenia, assessed at the end of the intervention and after detraining. It is attributed to the account "figshare admin karger" and was last updated in May 2026.
From 2020-21 to 2023-24, a joint evaluation and audit examined the Global Arctic Leadership Initiative (GALI), a program by Global Affairs Canada designed to bolster Canadian leadership in Arctic international forums. The report details strengths, gaps, and opportunities in program processes and management frameworks. It was published by Global Affairs Canada and last updated on the platform in May 2026.
A community-based screening cohort of 6,755 adults from Luohe, China, collected between 2021 and 2022. The dataset was created by Zhiwei Huang and analyzes the association between a composite metabolic index (TyHGB) and World Health Organization-defined high cardiovascular disease risk. The prevalence of high-risk status in the cohort was 22%.
2021–2022 data from 6,755 adults in the ChinaHEART Luohe screening cohort. This dataset examines the association between the TyHGB metabolic index and World Health Organization-defined cardiovascular disease high-risk status. The research was authored by Zhiwei Huang and shared under a CC-BY-4.0 license.
2021–2022 data from 6,755 adults in the ChinaHEART Luohe screening cohort, used to evaluate the TyHGB metabolic index for cardiovascular disease risk stratification. The dataset was created by Zhiwei Huang and published on figshare under a CC-BY-4.0 license. It supports analysis of the association between the TyHGB index and WHO-defined CVD high-risk status.
December 2018 land suitability assessment for commercial soybean farming in Colombia's Meta department, produced by UPRA and the Governor's Office of Meta. The dataset categorizes land into five aptitude levels based on biophysical, socioeconomic, and socio-ecosystem components. It includes polygon geometry and area calculations for zones deemed high, medium, low, or unsuitable, as well as legally excluded areas.
A research document proposes a framework for managing renewable energy and reducing energy storage investment constraints. The paper describes a bi-level game method for determining energy storage incentives and a master-slave approach for optimization. The document was authored by Yaoqiang Sun and last updated on 2026-05-25.
A land suitability map for oil palm (Elaeis guineensis J.) cultivation in the Nariño department of Colombia, produced at a 1:100,000 scale. The dataset results from a 2017 inter-administrative contract between UPRA and the University of Nariño, incorporating biophysical, socioeconomic, and socio-ecosystem components. It classifies areas into high, medium, low, unsuitable, and legally excluded categories for commercial palm development.
A dataset by Farman Ali, last updated in 2026, investigating the consequences of clean energy generation, green practices, and ethical business on global sustainability. The 4.8 MB CSV file examines how green finance laws handle financial limitations for companies pursuing green innovation.
An Indigenous-led project by the Melaythenner Teeackana Warrana Aboriginal Corporation and the University of Tasmania aims to compile environmental, cultural, and social information for Tebrakunna Country. The project will develop a co-designed wellbeing framework and identify research priorities for Sea Country assessment, cultural burning, and climate impacts. Outputs include written reports to support the Tebrakunna Ranger Program and aspirations for an Indigenous Protected Area.
Supplementary materials and code support the manuscript 'Long-term reservoir surface water dynamics in the Yellow River Basin (1986–2024)'. The dataset includes annual reservoir water body distribution maps from 1986 to 2024 derived from 30 m Landsat imagery using the Modified Dynamic Surface Water Extent algorithm in Google Earth Engine. Author Yebin Zou published these materials on figshare under a CC-BY-4.0 license.
Individual athlete records for participants in the Intercollegiate Sports Games from 2021 to 2025. The dataset includes variables such as sport type, demographic information, socioeconomic stratum, and territorial representation. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated in May 2026.