Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,093 datasets
A corpus of news articles published between 2010 and 2024 analyzing media discourse on HIV/AIDS in China. The dataset, created by Yuhang Li and last updated in 2026, employs topic modeling and collocation analysis to identify thematic communities, terminology for people living with HIV, and conceptual metaphors. It reveals a discursive shift towards political narratives and the persistence of stigmatizing language.
A corpus of Chinese news articles published between 2010 and 2024 analyzed for HIV/AIDS discourse. The dataset, created by Yuhang Li and shared under CC-BY-4.0, contains extracted thematic networks, 19 categories of terminology for people living with HIV, and 12 categories of HIV/AIDS metaphors.
Yuhang Li's dataset contains terminology and metaphor usage for people living with HIV (PLHIV) extracted from a large-scale corpus of Chinese news articles published between 2010 and 2024. The analysis identifies 19 categories of PLHIV terminology, 12 categories of HIV/AIDS metaphors, and 48 distinct topics across five thematic communities. The dataset is stored in an XLS file and was last updated on 2026-05-13.
5.5 KB of ablation study results from a federated learning framework for multimodal data fusion. The dataset likely contains experimental metrics comparing a novel tensor-based method against existing approaches on benchmarks like TREC2017 and CMU-MOSI. It was authored by Li Wan and published on figshare under a CC-BY-4.0 license in May 2026.
Experimental results comparing multimodal fusion methods on the CMU-MOSI sentiment benchmark. The dataset likely contains performance metrics from a federated learning framework that uses tensor decomposition for privacy-aware training. It was authored by Li Wan and uploaded to figshare on 2026-05-06.
Li Wan published a federated learning framework for multimodal data fusion on figshare in May 2026. The dataset likely contains algorithm performance metrics, specifically Mean Average Precision (MAP) values, from experiments on the TREC2017 Precision Medicine Track and CMU-MOSI sentiment benchmarks. The file is 5.5 KB in size.
A dataset supporting research into numerical methods for nonlocal physical models. It contains results from an energy-based Local-to-Nonlocal coupling used as a constraint for an interface identification problem. The dataset was authored by Matthias Schuster and last updated on May 14, 2026.
City of Hobart's 2015 Interim Planning Scheme defines spatial overlays for hazards like landslides, coastal inundation, and climate change. These geospatial layers provide a general indication of regulated areas for land development and infrastructure projects. The data is intended for preliminary planning, with site-specific investigation recommended for final decisions.
50,944 records of generated responses to math and word-problem prompts. The dataset was prepared by User01110 from a local JSON file and published on Hugging Face in Parquet format for viewer compatibility. It was last updated on June 13, 2026.
A paper and associated materials presenting improved Bayesian filtering techniques and a novel smoother for regime-switching state-space models. The work assesses performance using a New Keynesian DSGE model and three filters, with simulation results showing speed and accuracy improvements. The author is Nigar Hashimzade, and the materials were last updated in April 2026.
Survey data from 2,065 adults with overweight or obesity in Germany, published by the figshare account "figshare admin karger" on 2026-05-05. The data likely contain responses on awareness, use, interest, and barriers regarding primary care consultations, behavioural programmes, and pharmacotherapy for weight management. The dataset is a 4.1 MB PDF file licensed under CC-BY-4.0.
A 2026 theoretical study by Chenhui Wang provides high-accuracy interaction energies for linear alkane dimers (C_n H_{2n+2}, n=1 to 18). The dataset includes BSSE-corrected results ranging from -2.2 kJ/mol (n=1) to -62.6 kJ/mol (n=18), with relative errors below 5% against benchmark calculations. It also contains a thermodynamic analysis indicating that dimerization is spontaneous for n ≥ 8 at 100 K.
Twenty-five spectral channels from the High Altitude MMIC Sounding Radiometer (HAMSR) captured atmospheric data during the NASA EPOCH project in August 2017. This dataset provides measurements to infer three-dimensional profiles of temperature, water vapor, and cloud liquid water, even in cloudy conditions. It was collected from the NASA Global Hawk aircraft as part of a training and research mission focused on tropical cyclogenesis in the Eastern Pacific.
Navigation and housekeeping data from NASA's Global Hawk aircraft during the Hurricane and Severe Storm Sentinel campaign. The dataset contains real-time 1 Hz UDP packets broadcast in IWG1 format, capturing flight and atmospheric measurements to study tropical storm formation and the Saharan Air Layer. It is produced by the National Aeronautics and Space Administration, with metadata last updated in March 2026.
DNABERT embeddings calculated from plasmids and chromosomes. Maho Tokuda created this 2.1 MB dataset for a RandomForest model predicting plasmid destinations. The dataset was last updated in June 2026.
A longitudinal study dataset from 25 Mandarin-speaking children who received cochlear implants before 30 months of age. The data includes parent lexical diversity (NDW) and grammatical complexity (MLU) measures at 1 and 2 years post-implant, correlated with children's standardized language test scores at 3 years post-implant. The dataset was published by Luo et al. in 2026 and is hosted on figshare.
A study by Serena L. DiLiberti reports the defluorination of trifluoromethyl arenes bearing an ortho N-heterocycle upon reaction with potassium tert-butoxide in THF at ambient temperature. The dataset likely contains experimental results from 14 demonstrated examples of this reaction, with yields up to 85%. The findings, shared on figshare in May 2026, are intended to inform synthetic route design for targets containing these functional groups.
Thirteen ground stations across Europe, Africa, and Brazil collected global lightning activity data from August 1 to October 1, 2006. This dataset was generated for the NASA African Monsoon Multidisciplinary Analyses campaign to study African Easterly Waves and Mesoscale Convective Systems. The network provides high temporal resolution of 1 millisecond and spatial accuracy ranging from 10-20 km within the network to over 50 km outside its periphery.
Mouse and human spatial transcriptomics data generated using SPTT and SPTEdu-seq techniques. The dataset includes multiple mouse embryo, kidney, and brain samples, as well as human ccRCC frozen sections, with digital expression data in .mtx format and metadata. The dataset is 1.6 GB in size, authored by Shuang Zhang, and was last updated on May 14, 2026.
Roberta Martino's dataset from figshare, last updated May 2026, provides morphometric and microwear data on European hippopotamus fossils. The 177.4 KB XLSX file includes data from a review of Pleistocene specimens from Central and Western Europe. It focuses on mandibular and cranial features to assess phenotypic diversity and dietary shifts in Hippopotamus antiquus populations.