Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
39,925 datasets
Module 117 from the Davis Logic V2 project provides a real-time, first-order complementary telemetry filter for sensor fusion in embedded systems. The algorithm, authored by Jamie Davis and licensed under CC BY 4.0, merges high-frequency inertial data with low-frequency reference signals to produce a stabilized tracking vector. It is designed for zero-heap, zero-copy execution on low-power microcontrollers, and the file itself is 63 bytes.
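For a rough sense of what a first-order complementary filter computes, here is a hypothetical single-axis sketch. It blends an integrated rate signal with a reference measurement as x_k = alpha * (x_{k-1} + rate * dt) + (1 - alpha) * reference. This is not Module 117's code. The class name `ComplementaryFilter`, the blend weight `alpha` (default 0.98) and the time step `dt` are illustrative assumptions, and a zero-heap microcontroller build would be written in a compiled language.

```python
from dataclasses import dataclass


@dataclass
class ComplementaryFilter:
    """Hypothetical single-axis first-order complementary filter.

    alpha weights the integrated high-frequency (e.g. gyro rate) path;
    (1 - alpha) weights the low-frequency reference (e.g. an
    accelerometer-derived angle). Names and defaults are assumptions,
    not the Davis Logic V2 API.
    """
    alpha: float = 0.98
    estimate: float = 0.0

    def update(self, rate: float, reference: float, dt: float) -> float:
        # Integrate the high-frequency rate signal over one time step.
        predicted = self.estimate + rate * dt
        # Blend: the integrated path dominates short-term changes, while
        # the reference slowly pulls the estimate back to correct drift.
        self.estimate = self.alpha * predicted + (1.0 - self.alpha) * reference
        return self.estimate


if __name__ == "__main__":
    f = ComplementaryFilter(alpha=0.5)
    print(f.update(rate=0.0, reference=10.0, dt=0.01))  # 5.0
    print(f.update(rate=0.0, reference=10.0, dt=0.01))  # 7.5
```

Running one instance per axis would yield a tracking vector. An `alpha` close to 1 trusts the fast inertial path over short horizons, while the small (1 - alpha) share of the reference removes long-term integration drift.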
A cross-sectional study of 291 family caregivers in China, conducted via a structured questionnaire in psychiatric settings across multiple cities. The data explore willingness to support online psychotherapy, preferred delivery formats, and perceived benefits and barriers. The dataset was authored by Qianqian Li and last updated in June 2026.
A survey of 291 family caregivers of individuals with mental illness in China, assessing their willingness to support online psychotherapy. The dataset includes caregiver demographics, technology access, patient characteristics, and attitudes toward online therapy, including willingness scores, treatment preferences, and perceived benefits and barriers. The data was collected via a structured questionnaire in psychiatric settings across multiple Chinese cities and is licensed under CC-BY-4.0.
424 adults with type 2 diabetes were assessed in a cross-sectional study at the Second Hospital of Jilin University. The dataset includes dual-energy X-ray absorptiometry (DXA) measurements of bone mineral density and bioelectrical impedance analysis (BIA) body composition data. It was authored by Dihe Cheng and last updated on June 3, 2026.
This dataset covers subsurface geological data on mid-Tertiary permeability barriers affecting groundwater flow in the Murray Basin, southeastern Australia. It likely contains stratigraphic and sedimentological analyses from borelogs, including porosity measurements from a fully cored section in the Piangil West-2 borehole. It was published by the Australian Ocean Data Network and last updated on 2026-06-04.
Hongmin Wang published performance metrics for a joint spatiotemporal-geometry framework in passive radar on 2026-06-02. The dataset contains per-class precision, recall, and F1-score results from 500 Monte Carlo trials per SNR point across five evaluation seeds. The proposed method achieved a mean classification accuracy of 93.7% and a mean localization error of 1.15 km at -6 dB SNR.
191 first-year engineering students at Tribhuvan University's Pulchowk Campus participated in a cross-sectional survey between April and June 2022. The dataset likely contains structured questionnaire responses assessing knowledge, attitudes, and practices regarding blood donation. The study was authored by Bhola Teli and published on figshare.
Adel Alshamrani published experimental results on June 2, 2026, for a machine learning approach to detecting Advanced Persistent Threats (APTs). The results, which include a 96.3% detection accuracy and a 42% reduction in false positives, are derived from a methodology that simulates APT behaviors using the CERT Insider Threat Dataset. The dataset is a 5.5 KB Excel file containing ablation study results from a 5-fold cross-validation.
Adel Alshamrani's dataset, last updated June 2026, presents results from a machine learning approach for detecting Advanced Persistent Threats (APTs). The methodology uses the CERT Insider Threat Dataset to simulate APT behaviors via multi-modal data analysis and language models. Experimental results achieved 96.3% detection accuracy and a 42% reduction in false positives compared to state-of-the-art methods.
29 peer-reviewed studies from 2020 to 2026 are synthesized in this systematic review. It evaluates the clinical efficacy and socioeconomic implications of integrating artificial intelligence into oncology multidisciplinary team decision-making. The review was authored by Shuang Liu and last updated in May 2026.
This systematic review synthesizes evidence from 29 peer-reviewed studies (2020 to 2026) on AI integration into oncology multidisciplinary team decision-making. The review, authored by Shuang Liu and shared under a CC-BY-4.0 license, evaluates clinical efficacy and socioeconomic implications. It finds that AI systems achieve 62-76% concordance with human tumor boards across multiple cancer types.
GPT-4 annotated the clinical severity of over 17,500 phenotypic abnormalities in the Human Phenotype Ontology across nine clinical characteristics. The annotations were benchmarked against ground-truth labels with a mean true positive recall rate of 97%. This dataset, created by Kitty B. Murphy and last updated in May 2026, provides quantitative severity metrics for prioritizing therapeutic targets in rare diseases.
GPT-4 annotated the severity of over 17,500 phenotypic abnormalities catalogued in the Human Phenotype Ontology. The annotations are based on nine clinical characteristics and their frequency, benchmarked against ground-truth labels with a mean recall of 97%. Kitty B. Murphy published the dataset on figshare in May 2026.
This dataset contains water quality chemistry data from 17 sites on the Athabasca, Peace, and Slave rivers in Canada's oil sands region. It includes measurements of major ions, nutrients, metals, and organics, with more than 1,300 samples collected from 2012 to 2015. It was produced by Environment and Climate Change Canada, and an interpretive report was released in 2018.
Stratigraphic framework maps for the Saskatchewan Phanerozoic Fluids and Petroleum Systems project were produced using a 2 km equi-spaced modified grid and kriging algorithms. The Government of Saskatchewan compiled and validated data from multiple regional projects and wells in adjacent jurisdictions. This map series includes structure, isopach, and zero edge files for geological analysis.
A series of stratigraphic framework maps produced for the Saskatchewan Phanerozoic Fluids and Petroleum Systems (SPFPS) project. The maps were generated using a 2 km equi-spaced modified grid and a kriging algorithm from Golden Software's Surfer 9. Data from multiple government-led projects between 2003 and 2009 were validated, edited, and supplemented with well data from adjacent jurisdictions to minimize edge effects.
Stratigraphic framework maps for the Saskatchewan Phanerozoic Fluids and Petroleum Systems project were produced using a kriging algorithm on a 2 km grid. The underlying data was compiled from multiple regional projects by the Saskatchewan Ministry and validated for consistency. Data from wells in adjacent jurisdictions was also incorporated to minimize edge effects during contouring.
36,555 Reddit posts and comments published between March 1, 2020, and March 31, 2025, were analyzed for public discourse on artificial intelligence in healthcare. The dataset was created by Zaiyu Tang and includes results from BERTopic modeling and sentiment analysis. It was last updated on June 2, 2026.
Chicago's Red Light Camera Program records daily violation volumes at each camera location from July 1, 2014, onward, excluding the most recent 14 days. The dataset includes all potential violations captured by the system and reviewed by contractors, regardless of whether a citation was ultimately issued. The column names suggest that each record also carries geospatial and administrative details for the camera location.
Spring 2025 survey of 371 staff members (~20% of the workforce) at the Italian National Institute for Astrophysics (INAF) on generative AI usage. The data were collected and analyzed by Alessandro Cabras, revealing weekly usage patterns, ethical concerns, and user segmentation. Results were published in a PDF document in June 2026.