Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,312 datasets
A text dataset for biomedical information extraction, developed for the ACL 2026 Findings paper 'Applicability Condition Extraction for Therapeutic Drug-Disease Relations'. The dataset is authored by B1tta and was last updated on June 18, 2026. It focuses on identifying context-specific conditions under which a drug is therapeutically effective for a disease.
Manually defined parameters serve as the ground-truth reference for generating synthetic cell-like clusters. The 5.5 KB XLS file contains a priori values controlling cluster shape, spread, orientation, and event number. Authored by Bradley Mason and last updated in May 2026, this dataset supports replication and accuracy assessment for the Rosetta-Routine modelling pipeline.
A 5.5 KB Excel file maps traditional descriptive statistical measures to conversion methods used by the Rosetta-Routine modeling algorithm. The mapping is intended to acquire information from unknown data and define corresponding cluster generator argument variables. Author Bradley Mason last updated the file on May 29, 2026, and it is shared under a CC-BY-4.0 license.
A 5.1 MB Excel file containing datasets used for figure generation and quantitative analyses in a manuscript. The data includes real and synthetic event-level measurements intended for population modelling. It was authored by Bradley Mason and last updated on 2026-05-29.
A 5.5 KB Excel file uploaded to figshare by Wyatt H. Bridgman on May 29, 2026. It contains data on the predictive skill of Probabilistic Predictive Trajectories (PPTs) generated using different infection-rate estimation procedures. The PPTs are scored using the Continuous Ranked Probability Score (CRPS) and have units of case counts.
Around 6,000 regulated waste management facilities in the UK report annual data on waste quantities and types received and sent on from site. This data, collected since 2006 by the Environment Agency, is used for compliance monitoring and has historically supported planning by the EC, DEFRA, and local authorities. It is published in multiple formats including an MS Access interrogator, Excel extracts, and regional summary tables.
The Waste Data Interrogator 2017 dataset contains annual waste quantity and type data reported by regulated waste management facilities in the UK. It includes data from around 6,000 sites, collected by the Environment Agency for compliance monitoring and planning. The data is provided in multiple formats including an MS Access interrogator and Excel extracts.
4.3 GB of cleaned adsorption structures from CatHub data, used for training the DBCata model. The dataset includes model checkpoints, fine-tuning scripts, and results for out-of-distribution testing. It was authored by Songze Huo and last updated on May 25, 2026.
ENERGY STAR Certified Residential Refrigerators meet specific program requirements effective from September 15, 2014 or August 5, 2021. The dataset, sourced from data.energystar.gov, includes model specifications and efficiency metrics such as Annual Energy Use and Percent Less Energy Use than US Federal Standard. It was last updated on April 3, 2026.
The corpus averages 664 tokens per sample, according to the provided example. It was used to train the JetonCount model and contains token-level statistics derived from the FineWeb-Edu dataset. It was created by the author 'fromziro' and last updated on June 22, 2026.
A dataset from a study on synthesized catechol-conjugated benzothiazole derivatives designed to inhibit biofilm formation in Pseudomonas aeruginosa. The data likely contains results for a series of compounds, including the hit compound 4p with an IC50 of 0.27 μM for biofilm inhibition. The dataset was authored by Ying-Bo Zhou and last updated on 2026-05-04.
Kun Li's replication dataset supports a study on the relationship between Talent Policy Attention (TPA) and the Talent Policy Sentiment Index (TPSI). It includes a city-year panel for 232 Chinese cities from 2014 to 2023, with indices for the digital economy and city-level controls. The materials also contain source sentiment aggregation data and Stata code to reproduce the TPSI using an entropy weight method.
Ahorros ciudadanos por racionalización de trámites is a dataset from www.datos.gov.co that calculates the direct monetary savings for citizens and users from the simplification and rationalization of administrative procedures offered by public entities. The dataset includes columns for total savings, entity name, procedure name, and breakdowns by savings type such as fee elimination and time reduction. It was last updated on 2026-05-18.
A longitudinal case study of a 24-year-old elite female road cyclist, tracking bone mineral density and physiological markers over three time points from October 2024 to May 2025. The dataset includes a detailed case report on the effects of a periodized resistance and impact loading program, and subsequent unilateral rehabilitation following a grade-2 MCL knee sprain. The report was authored by Stefan Pettersson and published on figshare in April 2026.
A single-case study documents the skeletal response of a 24-year-old elite female road cyclist to a targeted loading program before and after a knee injury. The 16.3 KB document, authored by Stefan Pettersson and last updated in April 2026, details longitudinal DXA scans, blood tests, and training logs over an 8-month period. It provides a detailed account of site-specific bone mineral density changes associated with unilateral rehabilitation training.
A single-case report details the skeletal response of a 24-year-old elite female road cyclist to a targeted bone-loading program and subsequent unilateral rehabilitation following a knee injury. The document includes data from three DXA scans, blood assessments, training logs, and diet records collected between October 2024 and May 2025. It was authored by Stefan Pettersson and published on figshare in April 2026 under a CC-BY-4.0 license.
A supplementary research file analyzing gross primary productivity (GPP) in Chinese forests dominated by Fagaceae species. The data likely contains results from variance decomposition, structural equation modeling, and regression analyses identifying climate and forest characteristics as primary drivers. The file, authored by Shaowei Yang and last updated in April 2026, is shared under a CC-BY-4.0 license.
A UK multicentre randomized controlled trial tested the neurofunctional effects of Trigeminal Nerve Stimulation (TNS) on 62 children and adolescents with ADHD. The dataset includes fMRI data from three ADHD-sensitive tasks assessing response inhibition, working memory, and sustained attention. Natali Bozhilova published the data on figshare in May 2026 under a CC-BY-4.0 license.
A seismic velocity survey was carried out in Associated Freney Oilfields Nerrima No. 1 Bore by the Bureau of Mineral Resources on 10 August 1955. The well is situated on the Nerrima Dome in the Fitzroy Basin, Western Australia. Average measured velocities ranged from 8,000 ft/sec near the top to 12,200 ft/sec for the total depth of the bore.
Field photographs document the appearance and characteristics of samples collected from slag banks at four locations in Scotland and northwest England. The collection includes contextual and sample-specific images organized by site. John MacDonald and Robin Hilderman of the University of Glasgow collected the data between 2021 and 2023.