Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,600 datasets
Data from the Netherlands national government details annual purchasing expenditure on goods and services from suppliers. The Ministry of the Interior and Kingdom Relations collects this information from all ministries and departments to provide government-wide procurement insights and to report to the House of Representatives. Since 2012, the data has been used to analyze the government's procurement activities and the share awarded to small and medium-sized enterprises.
Enrollment data for students with exceptional talents in official educational institutions in Neiva, Colombia, for the first semester of 2022. The dataset is published by the Colombian government's open data portal, datos.gov.co, and was last updated in May 2026. It contains counts of students broken down by institution, gender, and specific talent categories.
June 2021 data from the Department of Transport and Planning identifies undeveloped land for residential development on the fringe of metropolitan areas. It details subdivision and planning status, site area in hectares, and potential lot yield for key regional centres including Drouin, Warragul, Morwell, Churchill, Moe/Newborough, Traralgon, Mildura, Horsham, Gisborne, Kyneton, Winchelsea, Ballarat, Geelong, and Bendigo. The dataset maps land supply categories such as unzoned englobo land, zoned englobo land, recent subdivision, proposed lots, and lots with a title.
A dataset of patient-generated drug reviews annotated for Aspect Term Extraction and Polarity Detection. The data was created by Gunjan Ansari using an automated, expert-driven rule-based annotation scheme called ATEdrug and was last updated on April 9, 2026. The dataset focuses on three medical conditions: depression, arthritis, and birth control.
ATEdrug is a dataset of patient-generated drug reviews for three medical conditions: depression, arthritis, and birth control. The dataset was annotated for Aspect Term Extraction and Polarity Detection using an expert-driven rule-based approach with minimal human intervention. It was created by Gunjan Ansari and last updated on 2026-04-09.
Australian land data identifies undeveloped parcels for residential development on metropolitan fringes. Each record details subdivision and planning status, site area in hectares, and potential lot yield. The dataset is provided by the Department of Transport and Planning and was last updated in April 2026.
Hee Chan Noh published a dataset on figshare in May 2026 describing the regioselective Ir-catalyzed intramolecular B(4)–H amidation of ortho-carborane-tethered dioxazolones. The dataset likely contains experimental results for synthesizing carborane-fused boralactams, which mimic oxindoles, under mild conditions. It is a 13.1 MB ZIP file licensed under CC-BY-NC-4.0.
A dataset from figshare describes a chemical synthesis method for carborane-fused boralactams. The dataset, authored by Hee Chan Noh and last updated in May 2026, includes a 44.8 MB ZIP file. The described method accommodates a broad range of carborane substrates and delivers products in high yields with excellent regioselectivity.
Hee Chan Noh published experimental data on figshare in May 2026. The dataset supports a report on an iridium-catalyzed intramolecular B(4)–H amidation of ortho-carborane-tethered dioxazolones. It includes results for meta- and para-carborane derivatives, yielding products that mimic oxindole geometry.
The Bureau of Mineral Resources developed reconnaissance-style maps of surficial cover facies on the Great Barrier Reef. The maps apply a simple bathymetric classification differentiating supratidal, intertidal, and subtidal zones, with variable resolution across the reef. The work was compiled in 1982 and edited in 1983 by B.M. Radke and G.W. D'Addario, with cartography by G.A. Young and R.A. Swoboda.
Sanciones Ejecutoriadas Y No Ejecutoriadas Por Intermediación Laboral Indebida por Dirección Territorial (Final and Non-Final Sanctions for Improper Labor Intermediation, by Territorial Directorate) contains sanctions imposed on public or private companies for improperly hiring people to perform permanent core activities. The dataset covers both non-final sanctions (No ejecutoriadas), issued at first instance, and final sanctions (Ejecutoriadas), organized by territorial directorate. It was published on the www.datos.gov.co platform and was last updated on 2026-05-18.
Tomoki Iwakiri's data repository contains source data and Python scripts for the article "AMOC slowdown amplifies North Atlantic salinity variability to unprecedented levels". The 6.0 MB repository includes NetCDF (.nc) data files and Python (.py) scripts used to generate the publication's figures. It was last updated on 2026-05-18.
Map images depict hydrographic, morphologic, and edaphic features for the northern Amazon Basin in eastern Ecuador. Hydrographic data are available at two scales, derived from 1:50,000 and 1:250,000 topographic maps produced in 1990 and 1993, while morphological and edaphological data were digitized from a 1:500,000 map published in 1983. The dataset is distributed as three compressed ZIP files.
Raster datasets at 30-meter resolution map wilderness extent and naturalness across the Tibetan Plateau. The 2.2 GB collection includes files in formats such as SHP and NC and was generated using a hybrid Boolean–Weighted Linear Combination (WLC) framework. Created by Junzhi Ye and last updated on 2026-05-22, these are described as the first operational wilderness products for any region above 4,000 m.
Calibrated radiance data in 50 spectral bands were collected by the MODIS/ASTER Airborne Simulator (MASTER) instrument during a single flight over California on January 17, 1999. The dataset provides Level 1B georeferenced imagery at approximately 20-meter spatial resolution, covering wavelengths from 0.460 to 12.879 micrometers. This deployment was coordinated by the U.S. Department of Energy's Remote Sensing Laboratory for the primary purpose of instrument validation.
Liam Pereira's dissertation dataset contains synthetic photovoltaic and load data for one year at hourly frequency. The photovoltaic dataset includes irradiance, solar positional angles, and cloud cover, while the load dataset includes baseline averages and stochastic noise. A separate search space dataset explores combinations of PV and battery sizes with fitness evaluations.
This dataset contains criminal court decisions from 2025 to 2026 issued by Russian federal courts of general jurisdiction, collected from the GAS 'Justice' portal (sudrf.ru). It was authored by kamjke and last updated on June 12, 2026. The data includes case metadata and cleaned decision texts.
Residential energy billing records for Colombia's Non-Interconnected Zones (ZNI), as mandated by Resolution SSPD No. 20172000188755 of October 2, 2017. The dataset includes commercial information such as consumption, subsidies, and billing values, sourced from the Colombian open data portal. It was last updated on 2026-05-18.
A 2026 pilot study by Bethany Laursen tracks outcomes from five hybrid research teams participating in design-thinking-based chartering workshops. It includes quantitative survey scores for team purpose and structure, along with qualitative interview data on factors influencing outcomes. The data supports a mixed-methods, longitudinal analysis of team science facilitation.
Cloud particle images and measurements from the NASA African Monsoon Multidisciplinary Analyses campaign. The dataset contains high-resolution black and white images and derived properties from two airborne probes: the Two-Dimensional Stereo probe and the Cloud Particle Imager. Data was collected during the 2006 field mission based in the Cape Verde Islands to study African Easterly Waves and Mesoscale Convective Systems.