Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,944 datasets
Characterization data for beneficiaries of humanitarian aid due to victim status in Colombia's armed conflict. The dataset includes family nuclei, demographic details, victimizing events, and economic aid disbursements. It is sourced from datos.gov.co and covers records from January 2017 to March 2026.
A research paper describing the AMULED framework for addressing moral uncertainty in reinforcement learning. The framework integrates multiple ethical theories using large language models to provide moral feedback, evaluated across two environments with 50-run replicates. The PDF document was authored by Rohit K. Dubey and last updated on 2026-05-04.
Chinese multi-channel conversational speech data with expert human annotations, developed by ASLP@NPU and QualiaLabs. It is part of the SmoothConv–DuplexConv corpus family, constructed from the same underlying conversational sources as the companion DuplexConv dataset.
Australia's National Base Map provides seamless topographic colour mapping for the entire country and its external territories, including Norfolk, Lord Howe, Macquarie, Cocos (Keeling), Christmas, Heard, and McDonald Islands, as well as the Australian Antarctic Territory. The service integrates data from Geoscience Australia, the Australian Antarctic Division, OpenStreetMap, and other sources, portraying cultural, hydrographic, marine, transport, vegetation, and relief themes. Topographic information was checked in 2008 using satellite imagery and supplemented in 2009.
A 2026 study by Ildiko Krisztina Preiner explores the effects of repeated digital nudges on healthier food choices in an online meal ordering context. The dataset likely contains results from a three-day experiment with 154 participants randomly assigned to Feedback, Assortment, Combined, or Control conditions. Participants made hypothetical daily lunch orders, each including a main dish, side dish, and drink, with health scores tracked over time.
A medical case report describing a rare presentation of polyarteritis nodosa (PAN) with isolated sixth nerve palsy and orbital apex syndrome. The document details the patient's symptoms, diagnostic imaging, treatment with corticosteroids, and 6-month follow-up outcome. The report was authored by João Mendes and published under a CC-BY-4.0 license on figshare in May 2026.
A retrospective cohort study of 184,291 Chinese adults with normal baseline fasting plasma glucose, conducted by the Rich Healthcare Group. The data includes a median follow-up of 3.0 years, during which 11.28% of participants developed incident impaired fasting glucose. The study, authored by Jintao Jiang, analyzes the relationship between estimated pulse wave velocity and diabetes risk.
A database of Notices of Intent for storm water discharge permits associated with construction activity, sourced from data.delaware.gov. Records include permit details, project locations, and statuses, with the most recent update logged on May 29, 2026. The dataset likely contains planned construction activities that require coverage under the National Pollutant Discharge Elimination System (NPDES).
A high-resolution bathymetric grid of the Cape Darnley region in East Antarctica, compiled from multiple data sources over the last three decades. The dataset was created by Smith J. et al. and published in Antarctic Science in 2021. It integrates single-beam bathymetry, multibeam swath bathymetry, and digitized chart depths to visualize seafloor morphology.
A 20.1 MB theoretical dataset from figshare, authored by Bikram Dholey and last updated in May 2026. It contains analytical and neural network model data for shear horizontal wave propagation in a layered fluid and piezoelectric composite system. The data supports a parametric study on phase velocity effects from piezoelectric coupling, strain gradients, and fiber volume fraction.
Colombia's National Geological Museum 'Jose Royo y Gomez' catalog records all collection objects, including minerals and fossils. The dataset includes information on categories, photographs, conservation status, descriptions, types, textures, groups, specific locations, and places of origin. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
This dataset contains pre- and post-training measures of attention and verbal language from five people with chronic aphasia who completed a 9-week meditation intervention. It assesses whether intensive meditation training can enhance attention processes in this population and potentially generalize to verbal-language skills.
1,665 annotated Myanmar language sentences provide a balanced foundation for sentiment classification tasks. The dataset is designed to capture linguistic nuances, including sarcasm and compound sentence structures. It was created by kalixlouiis and last updated on Hugging Face in June 2026.
Pyrolysis and bulk kinetic studies investigate the hydrocarbon generation potential of marine organic-rich rocks from the Middle Ordovician Goldwyer Formation in the Canning Basin, Western Australia. The dataset, published in the International Journal of Coal Geology in 2020, includes results from Rock Eval pyrolysis and pyrolysis gas chromatography (Py-GC). It provides basin-specific kinetic inputs for burial history modeling on the Broome Platform.
A dataset of 9,999 integers from 2 to 10,000, tracking their Collatz conjecture trajectories. It was created by Taro Fujita and last updated in May 2026. The data includes counts of steps where digit sums in prime bases like 2 and 3 align or nearly align.
Geoscience Australia's study examines how eight different spatial reference systems affect the predictive accuracy of interpolation methods for seabed sediment data. The research compares geographic coordinate systems WGS84 and GDA94 with six map projections using Inverse Distance Squared and Ordinary Kriging methods. Results indicate negligible differences in accuracy for predicting sediment data in the Australian Exclusive Economic Zone.
A 2.0 MB document by Weizheng Xu, last updated May 23, 2026, investigates the dynamic plastic bending of simply supported and clamped beams. The analysis, within small-deflection theory, uses plastic bound theorems and modal approximations for two impulsive velocity distributions: exponential and Gaussian decay. It provides closed-form solutions for deformation and response time, comparing them with finite element results.
Registro de uso y demanda del agua Corantioquia is a registry of water resource users reported to the Corantioquia environmental authority in Colombia. The dataset covers minor water uses that do not require a formal concession, such as small domestic, community, or agricultural activities with reduced volumes. It is published on the datos.gov.co platform and was last updated on 2026-05-18.
Geoscience Australia Data released a workflow and datasets for developing a local, corrected wind field for Tropical Cyclone Debbie. The data combines modeling with observational corrections and accounts for local wind effects like topography and land cover to estimate maximum 0.2-second wind gusts at 10 meters above ground. This release aims to showcase a method for producing local TC wind fields to support post-disaster surveys and vulnerability modeling.
A spectral library of eight representative plastics captured during thermal transformation using in situ Raman spectroscopy. The identification framework raised the median spectral matching degree from 47% to 91% and reduced unassigned spectra by 76%. The dataset, created by Ting Su and last updated in April 2026, provides a framework for identifying aged microplastics from waste-to-energy incineration.