Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,611 datasets
MCTD-KG is a 1.4 MB knowledge graph in JSON format, created by Peize Li and last updated on May 25, 2026. It is designed for complex material question answering by integrating multiple heterogeneous data sources. The dataset is shared under a CC-BY-4.0 license on figshare.
19 distinct robotic manipulation tasks are represented, each with 8 different robot embodiment configurations. The dataset includes 50 demonstrations for each unique combination of task and embodiment, totaling 7,600 demonstrations. Generated by jsw19 and uploaded to Hugging Face, this version excludes the 'put_object_cabinet' task due to stability issues in the final generation run.
A specialized parallel dataset engineered for Supervised Fine-Tuning of Large Language Models in zero-resource Turkic languages. The corpus, created by ansarzeinulla and last updated in June 2026, is designed to mitigate catastrophic forgetting during model adaptation to endangered languages. It contains high-fidelity Nogai-Russian translations of biblical text.
Research data supporting the manuscript "Controlled Zeno-Induced Localization of Free Fermions in a Quasiperiodic Chain". The 1.5 GB ZIP archive contains Python and C++ programs for data generation and results folders with analyzed data for figures in the paper. The dataset was authored by Pinaki Singha and last updated on 2026-05-06.
City of Moreton Bay provides a customized vector tile basemap layer optimized to display special areas of interest. The layer is uniquely symbolized and includes landscaping features like grass, trees, and rock, as well as sports amenities such as tennis courts and field lines. It is built using the same data sources as the World Topographic Map and other Esri basemaps, last updated in May 2026.
Prydz Bay and the Mac.Robertson Shelf in Antarctica were sampled during Voyage 7 of the 1992/93 Antarctic Division shipping season. The dataset likely contains sediment cores, grab samples, and 3.5kHz echo sounder data collected to study Quaternary environmental change and sedimentation processes. It was published by Geoscience Australia Data.
Twelve sound horses were studied during walking under three shoeing conditions: unshod, flat shoe, and rocker shoe. Data were collected by Robert Whitton using 3D motion capture and force plates, with measurements taken immediately after trimming and after 6 weeks of hoof growth. The dataset includes calculated net torques for distal forelimb joints.
Replication data for a study on tax incentives and firm compliance in China. The package includes code to construct analysis datasets and generate tables, figures, and a map. It was authored by Jingjing Fu for China Economic Review and last updated on June 27, 2026.
An exploratory protocol outlines the criteria for a trial evaluating CuePD, a smartphone app for gait assessment and personalized auditory cueing in people with Parkinson's disease. The 9.5 KB Excel file details participant selection rules for a study registered at ClinicalTrials.gov (NCT06941779). Author Conor Wall published the protocol on figshare under a CC-BY-4.0 license, with a last update timestamp of 2026-04-15.
Police Incidents data is published by the City of Dallas for research purposes, with the authoritative source being the Crime Analytics Dashboard. The dataset represents Dallas Police Public Data - RMS Incidents from June 1, 2014, to the current date, containing crime reports as supplied by reporting parties. Sensitive information, such as sexually oriented offenses and cases involving juveniles, is filtered out.
England's Sites of Special Scientific Interest (SSSI) Impact Risk Zones are a GIS tool developed by Natural England. The zones define areas around terrestrial SSSIs and underpinned SACs, SPAs, or Ramsar sites to reflect feature sensitivities and indicate potentially adverse development types. Local planning authorities (LPAs) can use the tool for rapid initial assessments and to determine when to consult Natural England.
Updated spatial boundaries for ancient woodland sites across England, excluding the Isles of Scilly. Natural England's inventory identifies sites using historic maps, aerial photography, and ground surveys, recording area in hectares and woodland type. The 2024 revision addresses gaps in prior data and for the first time includes small woodlands between 0.25 and 2 hectares.
Replication Data for 'Using AI to understand AI' supports a study submitted to the Journal of Risk Research. The data was developed to model expert perceptions of AI risk in health using a mental-models approach and to validate a workflow leveraging ChatGPT. Author Jonas Krieger provided this data through DataverseNL to ensure replicability of the described procedure.
Financial statements from the Colombian government's open data portal, last updated on 2026-05-18. The data includes classified and summarized information on an entity's financial position and results. Columns such as VALOR, TIPO, and VIGENCIA likely contain details on financial values, account types, and validity periods.
Colombia's Aeronautics Industry Corporation (CIAC) inventory of public information classified as confidential or reserved. The dataset includes 15 columns detailing the information's title, classification reason, legal basis, responsible parties, and dates. It is published on the datos.gov.co platform and was last updated on May 18, 2026.
Data Sheet 1 from figshare contains experimental results on arbuscular mycorrhizal fungi (AMF) effects on grapevines under pH stress. The 24.7 MB DOCX file, authored by Dehui Sun, was last updated in April 2026. It details measurements of mineral elements, plant physiology, and antioxidant enzyme activities.
Data from a 2026 study evaluates an antimicrobial peptide-loaded nanofibrous dressing for treating multidrug-resistant Pseudomonas aeruginosa infections in BALB/c mice. The dataset includes results from in vivo testing over a 4-day treatment period, showing significant bacterial count reduction. Samar Essam Metwally authored this research, which is shared under a CC-BY-4.0 license.
Experimental data details the in vivo efficacy of an antimicrobial peptide-loaded nanofibrous dressing against multidrug-resistant Pseudomonas aeruginosa in BALB/c mice. The dataset contains results from a once-daily treatment over 4 days, including bacterial count reductions and scaffold characterization metrics. Samar Essam Metwally authored this dataset, last updated in April 2026.
EM86, a computer-designed antimicrobial peptide, killed multidrug-resistant Pseudomonas aeruginosa SM016 within 60 minutes and showed low toxicity to human skin fibroblasts (IC50 > 300 μg/mL). The study functionalized a sodium alginate/polyvinyl alcohol nanofibrous dressing with 35 ± 18 μg of EM86, which significantly reduced bacterial counts in open-wound infected BALB/c mice after 4 days of treatment. This dataset, authored by Samar Essam Metwally and uploaded in April 2026, documents the in vitro and in vivo results.
Results from a study evaluating a novel antimicrobial peptide, EM86, loaded onto a gamma-irradiated sodium alginate/polyvinyl alcohol nanofibrous dressing for treating multidrug-resistant Pseudomonas aeruginosa wound infections in BALB/c mice. The dataset, created by Samar Essam Metwally, contains detailed experimental findings from in vitro and in vivo testing, including MIC/MBC values, time-kill kinetics, and wound bacterial counts. It was last updated on April 7, 2026, and is stored in a 26.2 KB DOCX file.