Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,560 datasets
9.5 KB of assessment indicators for morphological and functional polycentricity, authored by Juan Zhu and shared under a CC-BY-4.0 license. The dataset was last updated on June 1, 2026, and is available in XLS format.
A list of differentially expressed proteins (DEPs) from the model organism Ciona whose human orthologues are implicated in neurodegenerative diseases. The dataset was authored by Daniele Capitanio and last updated on June 1, 2026. It is a small dataset of 5.5 KB, stored in XLS format.
A scientific description of the Great Cumbung Swamp, the terminus of Australia's low-gradient Lachlan River. The dataset, sourced from the Australian Ocean Data Network, details the swamp's three depositional environments, channel morphology, and sediment characteristics. It was last updated on 2026-04-10.
Revenue data generated for management bodies from the Hanifaru Marine Protected Area (MPAR). The dataset was authored by Hannah M. Moloney and last updated on June 1, 2026. It is a small file of 5.5 KB available in XLS format under a CC-BY-4.0 license.
200 held-out multilingual soundscapes provide exact, automatically gradable answer keys for evaluating universal audio annotation. This benchmark is designed for the LAION Universal Audio Annotation Pipeline (UAAP), which describes everything audible in a clip as a structured JSON list. According to the full description, each clip is built by gluing together pieces.
ML2SO2_NRT is the EOS Aura Microwave Limb Sounder Near-Real-Time product for sulfur dioxide (SO2) profiles, produced by NASA. The data are available within 3 hours of observation, cover near-global latitudes (-82 to +82 degrees), and provide vertical profiles from 215 to 10 hPa. The most recent 7 days of data are available online.
150,000 unfiltered samples for training models to generate concise titles from a user's first message in a conversation. Created by SupraLabs, this dataset is derived from the training pipeline for their experimental Supra Title model family. The dataset was last updated on June 14, 2026.
Near-global (-82 to +82 degrees latitude) carbon monoxide (CO) profiles measured by the Aura satellite's Microwave Limb Sounder (MLS). The National Aeronautics and Space Administration provides this near-real-time data, typically available within 3 hours of observation, covering the most recent 7 days. Profiles are derived from the 240 GHz region, spaced 1.5 degrees along the orbit track, and cover vertical pressures from 215 to 0.1 hPa.
Near-real-time nitric acid (HNO3) profiles derived from the Microwave Limb Sounder (MLS) instrument aboard the EOS Aura satellite, typically available within 3 hours of observation. The data provide near-global spatial coverage from -82 to +82 degrees latitude and vertical coverage from 100 to 1.47 hPa, with profiles spaced ~165 km along the orbit track. NASA generates this product using a simplified algorithm to meet latency requirements; it is scientifically useful when screened according to the provided documentation.
Near-real-time temperature profiles from the Aura satellite's Microwave Limb Sounder, available within 3 hours of observation. NASA provides these data with near-global coverage from -82 to +82 degrees latitude and vertical coverage from 215 to 0.001 hPa. The algorithm uses a simplified fast forward model to meet latency requirements; the results are scientifically useful in selected atmospheric regions when screened appropriately.
Near-real-time nitrous oxide (N2O) profiles derived from the Microwave Limb Sounder instrument aboard the EOS Aura satellite, typically available within 3 hours of observation. The National Aeronautics and Space Administration provides near-global coverage from -82 to +82 degrees latitude, with data from the most recent 7 days online. Profiles are spaced 1.5 degrees along the orbit track and cover the vertical range from 100 to 1 hPa.
ML2H2O_NRT is the EOS Aura Microwave Limb Sounder (MLS) Near-Real-Time product for water vapor (H2O) profiles derived from the 190 GHz region. Data are typically available within 3 hours of observation, with near-global spatial coverage from -82 to +82 degrees latitude and vertical coverage from 147 to 1 hPa. The National Aeronautics and Space Administration (NASA) produces this data, with the most recent 7 days available online as of the last update in March 2026.
613,399 No-Limit Hold'em hands in Open Hand History format (spec 1.4.7) were generated via self-play using rs-poker's arena. The dataset, created by otter-crew, serves as the training set for the range-reader model, which predicts a villain's hole cards from the action. It was last updated on June 15, 2026.
1,200 complex instruction-response pairs were autonomously generated by a local LLM via Ollama. This bilingual dataset in English and Indonesian is designed for training Agentic AI systems. The pipeline, created by Kimsang766, was last updated on June 15, 2026.
This dataset covers 115 countries in the decade before the adoption of SDG 16.9 and 119 countries in the decade after. V.N. Tran created these data to measure the latent robustness of Civil Registration and Vital Statistics systems. The dataset includes estimated scores based on birth and death registration coverage, completeness, national ID card coverage, age heaping, and statistical capacity.
An investigation details the partial conversion of rock phosphate into more soluble calcium phosphate phases using acetic acid and monocalcium phosphate solutions. The study, authored by Youness Sedki Alaoui and last updated in April 2026, analyzes the resulting mineral phases using FTIR, XRD, XRF, SEM, and EDS. Results indicate the formation of phases like calcium-deficient hydroxyapatite and dicalcium phosphate, with these more soluble minerals constituting up to 27% of the product.
Meteorological data collected at the MarineGEO Carrie Bow Cay Observatory in Belize. The 24.5 MB dataset contains measurements from WXT536, PTB110, and LI-190R instruments beginning in September 2023, authored by Valerie Paul and last updated in May 2026. Data files are available in CSV and TXT formats.
Meteorological data collected at the Smithsonian Marine Station in Fort Pierce, Florida, from 2023 to 2025. The 23.0 MB dataset includes measurements from WXT536, PTB110, and LI-190R instrumentation at coordinates 27.460121 N, 80.31134 W. It was authored by Dean Janiak and last updated in May 2026.
Thomas A. Russell's study compares the ReFeel® bioresorbable nerve cuff to commercial collagen-based devices in rat sciatic nerve models. The dataset includes functional, histological, immunohistochemical, and morphometric assessments at multiple time points up to 26 weeks. It was last updated on 2026-04-23 and is shared under a CC-BY-4.0 license.
Thomas A. Russell published a study on 2026-04-23 comparing the ReFeel® bioresorbable nerve cuff to commercial collagen-based devices in rat sciatic nerve models. The document describes functional, histological, and morphometric assessments conducted at multiple time points up to 26 weeks. It quantifies outcomes such as scaffold degradation, axon regeneration, Schwann cell activity, and fibrosis.