Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,344 datasets
Data from a retrospective cohort study, conducted by Li Guo, of 284 adults with idiopathic sudden sensorineural hearing loss (SSNHL) seen from 2012 to 2019. The data compare initial systemic versus intratympanic corticosteroid therapy, analyzing outcomes such as pure-tone average (PTA) gain and recovery rates within a causal inference framework.
CoDA-Bench is a benchmark created by RUC-DataLab to evaluate AI agents on code- and data-intensive tasks in realistic environments. Unlike benchmarks that supply the relevant (oracle) data directly, it requires agents to discover the relevant data among hundreds of semantically similar files. The dataset was last updated on June 16, 2026.
PREFIRE's dual CubeSats carry a 63-channel spectrometer measuring far-infrared radiation from 5 to 53 µm, a spectral range where most polar emissions occur but which has not been measured on a large scale. This dataset provides monthly, 1°x1° gridded climatologies of surface spectral emissivity and its standard deviation, sorted by surface type and derived from the PREFIRE-SAT1 instrument. Its primary purpose is to identify emissivity behaviors by surface type and to assimilate these measurements into climate models to improve predictions of future polar and global climates.
Polar regions are the focus of this dataset, which provides surface emissivity values retrieved from the PREFIRE CubeSat mission's Thermal Infrared Spectrometer. Data are processed into Cloud-Optimized GeoTIFF images with a nominal spatial resolution of 2.23 km, representing mean values for clear-sky conditions from a specific spectral channel centered at approximately 18.2 µm. This information aims to fill knowledge gaps in the far-infrared portion of the Earth's energy budget, particularly for polar emissions.
PREFIRE_SAT2_2B-ATM_COG provides retrieved column water vapor (CWV) values derived from the PREFIRE Thermal Infrared Spectrometer aboard PREFIRE-SAT2. The data are rendered as Cloud-Optimized GeoTIFF files on a global grid with approximately 2.23 km raster elements, specifically for clear-sky conditions. This dataset aims to fill knowledge gaps in the polar radiant energy budget by measuring far-infrared radiation not previously characterized on a large scale.
Science data collection from the PREFIRE-SAT1 CubeSat began on July 24, 2024. This dataset provides georeferenced, Cloud-Optimized GeoTIFF (COG) renderings of retrieved column water vapor (CWV) values for clear-sky conditions, derived from far-infrared spectral radiance measurements. The data aim to fill knowledge gaps in the polar radiant energy budget by characterizing far-infrared emissions, with results intended for assimilation into global circulation models.
63 spectral channels from 5 to 53 µm measure far-infrared radiation from the polar regions, a previously under-sampled part of the energy spectrum. This dataset provides monthly, 1°x1° gridded climatologies of surface emissivity, sorted by surface type and including standard deviations, derived from the PREFIRE-SAT2 CubeSat. Its primary purpose is to identify emissivity behaviors and assimilate data into climate models to improve predictions.
32 ferrets were inoculated with a lethal dose of Ebola virus and treated with a monoclonal antibody cocktail. The dataset likely contains outcomes for 14 survivors, 8 acute fatalities, and 10 atypical fatalities from recrudescent disease occurring between 12 and 18 days post-infection. The dataset was authored by Wenguang Cao and last updated on 2026-05-04.
Immunoregulatory proteins from SARS-CoV-2 interfere with host antiviral defenses and play a critical role in COVID-19 pathogenesis. This dataset contains candidate proteins identified through immunoprecipitation-mass spectrometry (IP-MS) as interacting with the viral nonstructural protein 1 (Nsp1). The work provides mechanistic insights into how Nsp1 suppresses the NF-κB signaling pathway to evade host immunity.
A 791.5 KB XLSX file containing a gene-pathway weight matrix derived from the PROGENy method. The dataset was created by Han-cheng Wei and last updated on 2026-05-04, with the goal of quantifying how SARS-CoV-2 proteins perturb host immune pathways. It supports a framework for identifying viral immunoregulators and provides insights into immune evasion mechanisms.
A dataset containing NIH DAVID pathway analysis results from a study on Staphylococcus aureus metabolic adaptation. The data, authored by Reginald A. Woods and shared under a CC-BY-4.0 license, explores the link between the lipoic acid transfer enzyme LipL and the phosphotransacetylase (Pta) pathway in bacterial infection. It was last updated on May 4, 2026.
A Hindi text-to-speech dataset containing 332,247 audio segments totaling 599.91 hours. The audio was collected from 378 YouTube videos, with transcriptions taken from their auto-generated closed captions. The dataset was created by somu9 and last updated on 2026-05-29.
The Great Cumbung Swamp is the terminus of the low-gradient Lachlan River in eastern Australia. The dataset, provided by the Australian Ocean Data Network, describes three distinct depositional environments within the swamp, including a channel up to 40 meters wide and overflow channels up to 20 meters wide. It was last updated on 2026-04-16.
A single Excel file with 7 worksheets provides data for a systematic analysis of compliance system automation. The data include meta-information, raw requirement extracts, and analyses categorized by automation stage, regulatory field, application domain, and ML usage. The dataset was uploaded by Angermeir on figshare and last updated on 2026-05-18.
23 project areas listed in an anticipatory notice given by Opticomm Pty Ltd on 15 October 2024. The dataset is published by the Australian Communications and Media Authority and was last updated on 14 April 2026. It includes project names, addresses, IDs, and declared statuses for sites across Queensland, New South Wales, Victoria, and Western Australia.
Heavy mineral deposits occur along beaches from Ballina to Tweed Heads in northern New South Wales, Australia. The Australian Ocean Data Network provides this report, last updated in 2026, which details the occurrence, formation, composition, and origin of the minerals. It also notes commercial accumulations extending south to Coffs Harbour and north to Southport in Queensland.
Zhenghui Wang's anonymized dataset supports the PLOS ONE article "Research on Network Public Opinion Dissemination Mechanism of Emergency Events Based on System Dynamics." It contains raw expert scoring results used to calculate variable coefficients for the system dynamics model. The dataset is 22.8 KB in size and was last updated on May 11, 2026.
Newcastle City Council provides a list of funerals it arranged under a statutory duty for deceased individuals with no assets or next of kin. The data, covering cases up to December 2016, details basic funerals where the council does not pay for notices, flowers, or transport. This record offers insight into municipal responsibilities for destitute deaths and the associated funeral practices.
Surficial Geology of Alberta: Ungeneralized Digital Mosaic is a GIS compilation of existing and new surficial map data for the province. The Alberta Geological Survey created it by tiling multiple source maps into a single provincial layer with a standard attribute table. It includes previously unpublished data from a 1:1,000,000-scale mapping project.
DuplexConv is a 2,000-hour Chinese multi-channel conversational speech dataset with annotations assisted by large language models. It was released by ASLP@NPU and QualiaLabs as part of the SmoothConv & DuplexConv project. The dataset is intended for large-scale training, while its companion dataset, SmoothConv, is intended for fine-grained evaluation.