Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
40,759 datasets
A dataset containing evaluation results from the Scaffold-Lab framework, which systematically assesses protein backbone generation methods. The data was created by Zhuoqi Zheng and last updated on May 22, 2026. It likely contains performance metrics for seven representative methods across categories like designability, novelty, diversity, efficiency, and structural properties.
Derived parameter values used as inputs for cluster generation software to create a synthetic cluster replicant dataset. The dataset is a 5.5 KB XLS file authored by Bradley Mason and last updated on 2026-05-29. Comparison of these parameters with those in a referenced table enables quantitative assessment of the replication process's accuracy and fidelity.
A Geoscience Australia dataset presenting results from a geological reconnaissance of the northwest Australian continental shelf. The survey was conducted over two 3-month cruises in late 1967 and 1968, covering a 1200 km region from Barrow Island to beyond Scott Reef. It describes shelf and upper slope sediments, maps their distribution, and investigates the late Cainozoic geological history of the margin.
Geospatial seabed morphology and geomorphology information for the Beagle Marine Park in south-eastern Australia. The data product was created using a nationally consistent two-part classification system applied to bathymetry digital elevation models. It is intended for use by marine park managers, regulators, the general public, and other stakeholders.
A 1:100,000-scale map that zones Colombian land for the production of four pasture grasses: pará (Brachiaria mutica), braquipará (Brachiaria plantaginea), alemán (Echinochloa polystachya), and tanner (Brachiaria arrecta). The dataset integrates physical and socio-ecosystem criteria and was used as an input for national and export beef and dairy cattle grazing suitability studies. The product was generated in September 2019.
Thirty women in the UK with prior experience using washable menstrual pads participated in a single-arm pilot study. The dataset, created by Rebecca Cannings-John and last updated in May 2026, captures feedback on product acceptability and side effects across two menstrual cycles. This initial testing was designed to inform a larger field study of the SunPad product in Nepal.
Research instruments and session data from a mixed-methods study on the role of self-awareness work in improving classroom attention. The study involved 26 students from a public secondary school's first-year intermediate vocational training program. Data includes questionnaires, Likert scales, participant and non-participant observation, and activity evaluation rubrics from four sessions.
Data from 2010 onward, updated through the last day of the previous month, detailing victims of crime under the Colombian Oral Accusatory Penal System (Laws 906 of 2004 and 1098 of 2006). The dataset is published by the Fiscalía General de la Nación (Colombia's Attorney General's Office) via the national open data portal. It contains counts of victims categorized by crime type, demographics, location, and case stage.
A 258.2 KB PDF dataset from figshare, authored by Ranjit Kumar Sahoo and last updated on June 1, 2026. It presents experimental findings from controlled crosses of the parthenium beetle, Zygogramma bicolorata. The data explores the effects of short-term inbreeding and Wolbachia infection on sex allocation, pupal mortality, and offspring production.
Marie Lemerle's dataset supports research on the foraging behaviour, hunting strategies, and temporal activity patterns of brown hyenas (Parahyaena brunnea) at a coastal seal colony in southern Namibia. The data were collected over a three-year period through direct observations and camera traps. It is a 3.3 MB XLSX file shared under a CC-BY-4.0 license.
UPRA and the University of Nariño produced a land suitability map for commercial Arabica coffee in the Nariño department, Colombia, under contract 222 of 2017. The map, published in December 2018, classifies land into five aptitude categories based on biophysical, socioeconomic, and socio-ecosystem components at a 1:100,000 scale. It includes exclusion zones where commercial coffee cultivation is legally prohibited.
Motion data from seventeen wireless inertial measurement units capturing full-body kinematics during onshore breaststroke, freestyle, and butterfly strokes. The dataset was created by Qiwei Zhang and published on figshare in May 2026. It includes validation metrics comparing IMU data to optical motion capture.
Samuel & Audrey YouTube Transcripts EN Corpus Legacy Deposit is an archived earlier version of a dataset containing English-language transcripts from a travel and food YouTube channel. The 85.0 MB package includes transcript records in CSV and JSONL formats, along with documentation and checksum files. Samuel Jeffery authored this legacy version, which is retained for historical tracking but is not the current recommended dataset.
A dataset from a randomized controlled trial investigating the effects of resveratrol on endothelial progenitor cells and apoptosis biomarkers in postmenopausal women with chronic coronary heart disease. The study involved 20 participants allocated to receive either 1,000 mg of resveratrol or a placebo for 90 days. It was authored by Gustavo Henrique Ferreira Gonçalinho and last updated in May 2026.
A land suitability assessment map for the traditional cultivation of cassava (Manihot esculenta Crantz var. llanera) in the Meta department of Colombia. The dataset results from a 2017 inter-administrative contract (No. 202) between UPRA and the Meta Governor's Office, applying a methodology that includes biophysical, socioeconomic, and socio-ecosystem components. It was published in December 2018 at a general scale of 1:100,000.
Samuel Jeffery's Partnerships and Media References Legacy Archive is a historical version of a dataset package for the Samuel & Audrey Media Network. The 188.6 KB ZIP file contains CSV and JSONL records, documentation, and schema files for partnership, media-reference, and publishing-history records. This legacy version was archived on Figshare in May 2026 and is retained for continuity, with the current dataset maintained on multiple platforms.
English-language transcripts from travel videos on the Nomadic Samuel YouTube channel. This 17.9 MB legacy deposit includes transcript records, CSV and JSONL exports, and documentation files. The dataset was authored by Samuel Jeffery and is licensed under CC-BY-4.0, with a last update recorded on 2026-05-30.
A legacy archive of long-form English travel articles published on NomadicSamuel.com by Samuel Jeffery and the Samuel & Audrey Media Network. The dataset package includes article records in CSV and JSONL formats, documentation, and archive materials from an earlier version of the corpus. Users are directed to more current versions on platforms like Hugging Face and GitHub for the latest cleaned data.
A legacy archive of Argentina-focused travel articles published on CheArgentinaTravel.com by the Samuel & Audrey Media Network. The dataset package includes article records, CSV and JSONL exports, documentation, and archive materials from an earlier version of the corpus. The current cleaned version is maintained on Hugging Face, GitHub, Zenodo, Figshare, Kaggle, and DagsHub.
A legacy archive of structured citation and reference records connected to the Samuel & Audrey Media Network. The dataset includes records for public media mentions, academic citations, tourism-sector references, finance-media references, public profiles, interviews, and third-party source records. It was created by Samuel Jeffery and is preserved as a historical version for continuity, with a last update timestamp of 2026-05-30 22:36:28.