Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,607 datasets
219 clonotypes and 169 αβ TCR pairs were isolated from six donors to characterize the T-cell response to a conserved SARS-CoV-2 spike protein epitope. The dataset, authored by Yoshiki Aritsu and last updated on 2026-04-27, includes analysis of peptide modifications and their impact on T-cell recognition and vaccine efficacy. It is shared under a CC-BY-4.0 license on figshare as a 1.5 MB PDF document.
Bloom's taxonomy assessments of Algebra I online credit recovery items from a study vendor in the 2022-23 academic year, alongside coded state regulations from fall 2023. These data were analyzed for a JPAM Policy Insights publication titled 'Failing to Learn from Failure in Online Credit Recovery Assessments'. The dataset was authored by Carolyn Heinrich and is hosted on Harvard Dataverse.
Colombian data on lottery prizes held by the public, disaggregated by lottery. The dataset includes the prize value, description, month, prize type, and year for each entry. It is provided by the Colombian open data portal, www.datos.gov.co, and was last updated on 2026-05-18.
Nemotron-Personas-El-Salvador is an open-source dataset licensed under CC BY 4.0, composed of synthetically generated personas. The dataset is anchored in real-world distributions and focuses on Salvadoran Spanish. It was created by NVIDIA and last updated on June 3, 2026.
A Digital Elevation Model (DEM) for Antarctica provides surface topography data extending to 81.5 degrees south latitude at a 5-kilometer resolution. Approximately twenty million data points derived from ERS-1 radar altimetry during its geodetic phase from March 1994 to May 1995 were used to generate this dataset. It offers a foundational, large-scale view of Antarctic ice sheet elevation and surface roughness.
Land cover classification raster files focus on water and wetland vegetation classes across three Arctic-Boreal Vulnerability Experiment (ABoVE) campaign regions in Alaska and Canada. The dataset was derived from NASA UAVSAR L-band synthetic aperture radar acquisitions between 2017 and 2019, with classifications trained and validated using field visits, UAV imagery, and satellite data. It includes both preliminary 13-class and final simplified 5-class versions, along with training data and lake characteristics.
Supplementary PDF files for a study assessing the test-retest reliability of running economy and other physiological parameters during a 90-minute run. The study involved 14 well-trained male marathon runners with a maximal oxygen uptake of 63.1 ± 5.8 mL·kg⁻¹·min⁻¹. The files, authored by Michele Zanini and last updated in April 2026, accompany the published research article.
GPT-4o and Gemini 2.5 Pro were evaluated for extracting PI-RADS v2.1 scores from free-text prostate MRI reports, comparing their performance with three radiologists of varying experience. Inter-rater agreement between human experts was highest (Gwet's AC1=0.68), while agreement between LLMs was lower (AC1=0.52). The dataset likely contains the processed reports and the assigned scores from both LLMs and human readers.
A comparative study published in 2026 evaluates GPT-4o and Gemini 2.5 Pro for extracting PI-RADS v2.1 scores from free-text prostate MRI reports. The dataset likely contains the processed reports and assigned scores used to compare LLM performance against three human radiologists of varying experience. Author Jing Wen released the study document under a CC-BY-4.0 license.
A 16.7 KB DOCX file containing the full Boolean search strategy used to query three academic databases for literature on parental adverse childhood experiences. The strategy was exported on 23 April 2026 and authored by J.P.C. Staaks. It includes detailed search terms and operators for PsycInfo, Medline, and Web of Science Core Collection.
Enrollment in official educational institutions (I.E. oficiales) by gender, as of the April 2022 cutoff. The dataset contains enrollment figures for official educational institutions in Colombia, broken down by gender and educational level. It was published on the Colombian open data portal www.datos.gov.co and last updated on 2026-05-18.
Twenty certified divers performed standard dives to a minimum depth of 66 feet as part of a real-world study. The dataset, authored by Subhojit Jash and last updated in April 2026, likely contains heart rate measurements across six dive phases: rest, pre-dive, descent, bottom, ascent, and post-dive.
A metadata catalog from www.datos.gov.co, last updated on 2026-05-18. The dataset documents the production, routing, and responsible parties for information generated by the Colombian government. It includes columns for language, support medium, consultation location, format, generation date, responsible personnel, and update frequency.
An inventory of public information generated, obtained, acquired, or controlled by INVIPASTO that has been classified as confidential or reserved. The dataset includes columns for legal basis, legitimate objective, area, and document series. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
An inventory of public information generated or controlled by obligated entities in Colombia that has been classified as confidential or reserved. The dataset includes 13 columns detailing the legal justification, responsible parties, and classification terms for each record. It is published by www.datos.gov.co and was last updated on May 18, 2026.
An 8.7 KB EAF file authored by Marie-Annick Moreau and last updated on June 3, 2026. Its description details a scene of cooperative work, with an individual named Bumbo constructing the inner folds of a trap chamber while others assist from the outside. The file is shared under a CC-BY-NC-SA 4.0 license.
An ethnographic description documenting the construction of a trap chamber. The text describes Bumbo working inside the chamber to make its inner folds, assisted by other men positioning and tying stakes on the outside. The dataset is a 24.1 KB PDF file authored by Marie-Annick Moreau and last updated on June 3, 2026.
DeepJEB v1.0 contains 2,138 synthetic 3D jet engine bracket designs. The dataset was generated by KAIST-SmartDesignLab using a DeepSDF-based generative model and an automated simulation pipeline, pairing geometry with finite-element analysis results. The Hugging Face distribution mirrors the official release.
SO-Bench is an audio-only spatial question answering benchmark built from first-order ambisonics (FOA) spatial audio. The dataset contains FOA waveform files paired with natural-language question-answer pairs covering sound event detection, localization, spatial relations, motion, and multi-step spatial reasoning. It was created by dieKarotte and last updated on June 13, 2026.
A 35% maximum improvement in static stiffness and a 20.4% maximum reduction in dynamic stiffness were observed for a proposed transverse leaf spring. The dataset likely contains numerical simulation and experimental validation results from a study by Zhi Li, published on figshare in April 2026. Findings provide a theoretical basis for designing leaf springs with strongly nonlinear stiffness characteristics.