Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,062 datasets
The San Francisco Flood Aftermath (SFFA) dataset is a challenging query set of social media imagery from 2018 to 2024, featuring adverse conditions like nighttime and heavy rain. The Global Flood Aftermath (GFA) dataset contains 1,283 samples for training and validation, annotated with multi-axis severity labels. Author Zihua Zhu published this 1.7 GB collection on figshare under a CC-BY-4.0 license, last updated in April 2026.
World Bank data on energy production, use, dependency, and efficiency for Hong Kong SAR, China, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset addresses trends in energy use and sustainability in the context of economic growth and industrialization. It was last updated on 2026-04-28.
A research document details a study on the transgenerational effects of the artificial sweetener erythritol on ovarian health in Wistar rats. The study, authored by Amina Fallata and published on figshare in 2026, administered erythritol to pregnant rats and analyzed ovarian morphology and molecular markers in their F1 and F2 offspring. Results indicate disrupted folliculogenesis, elevated oxidative stress, and suppressed autophagy and PI3K signaling across generations.
European Union data on registered vehicles, including PPPs (likely Passenger and Private Vehicles), as of April 1, 2026. The dataset provides breakdowns by brand, type, fuel, age, region, ecological category, and technical specifications for the first quarter of 2026. It was compiled by the Data Department of a State e-Government Agency.
Data Department - State e-Government Agency provides a dataset on registered vehicles as of January 1, 2026, with breakdowns by brand, type, fuel, age, and region. The description suggests it includes 22 distinct tables or views covering fleet composition, new registrations, terminations, and technical specifications for 2025. It also contains data on vehicle characteristics like mass, engine power, and ecological categories.
3.7-5.4 meter resolution maps of total suspended solids concentration in water surfaces across the Atchafalaya and Terrebonne Basins in Louisiana. The dataset was developed by NASA using AVIRIS-NG airborne spectrometer data from 2015 and 2016, validated with coincident field measurements. It provides spatially explicit estimates of concentration in milligrams per liter.
Daily averaged tropospheric carbon monoxide concentrations over southern Africa for August 1 to September 30, 2000. The dataset is a Level-3 subset from NASA's MOPITT instrument on the Terra satellite, processed onto a 1x1-degree grid. It reports concentrations at two pressure levels, 700 hPa and 350 hPa, derived from daytime satellite swaths.
415,090 line-kilometres of radiometric data were acquired in 2024 by the WA Government to produce this grid. The dataset shows the total equivalent air-absorbed dose rate for the Narryer survey area, derived from potassium, uranium, and thorium measurements. Processed and quality-checked by Geoscience Australia geophysicists, the grid has a cell size of approximately 20 meters.
World Bank data on energy production, use, dependency, and efficiency for France. The data is compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset was last updated on 2026-04-28.
DeNovoSWE is a large-scale dataset for whole-repository generation introduced by AweAI-Team. It contains 4,818 high-quality instances derived from 11,000 filtered trajectories. The dataset was last updated on June 14, 2026.
Geoscience Australia Data provides a processed radiometric grid showing the equivalent air-absorbed dose rate for the Narryer survey area. The grid has a cell size of approximately 20 meters and is derived from 415,090 line-kilometres of airborne gamma-ray spectrometric data acquired in 2024 by the WA Government. The data is processed with NASVD and standard reductions to isolate the terrestrial radiation component from potassium, uranium, and thorium decay.
1,310 Web of Science publications form a bibliometric corpus delineating China/Chinese Studies research in Australia. Ziqing Huang compiled this dataset in 2026, starting from 96 manually verified ARC Discovery projects and expanding via reference analysis. The collection includes raw records, processed files, and VosViewer network analysis outputs.
A geospatial grid of equivalent air-absorbed dose rate derived from gamma-ray spectrometric data for the Narryer survey area. The grid has a cell size of approximately 20 meters and was produced from 415,090 line-kilometres of data acquired in 2024 by the WA Government. Geoscience Australia processed and quality-checked the data, applying NASVD and standard reductions to isolate the terrestrial radiation signal.
Yuhui Miao published a dataset on 2026-05-05 containing structural data related to the discovery of novel covalent inhibitors for TEAD proteins. The dataset, 401.9 KB in size, includes CIF files likely detailing the biochemical and structural studies of the inhibitor LC-TEAD01. This work integrated structure-based design with parallel synthesis and in situ screening.
World Bank data on energy production, use, dependency, and efficiency for Germany, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset is licensed under CC-BY-4.0 and was last updated on 2026-04-28. It addresses trends in energy use and sustainability within the context of economic growth and industrialization.
15 land cover classes, including building construction, vegetation, and water bodies, are defined for North Rhine-Westphalia. The dataset is derived from a combined analysis of Sentinel-2 satellite data and digital orthophotos, supplemented by ALKIS data and a normalized digital surface model. It is provided by the Bundesamt für Kartographie und Geodäsie.
Ryan V. Quiroz published a dataset on figshare in 2026 describing a structure-activity relationship (SAR) campaign for novel camptothecin-derived linker-payloads. The dataset, 2.6 KB in size, likely contains design parameters and performance metrics for antibody-drug conjugates (ADCs). The most promising candidates reportedly achieved low aggregation (<5%), stability, and significant tumor regression in vivo.
This dataset provides calibrated surface-flagged sigma-0 (radar backscatter) and attenuation corrections in 12.5 km Wind Vector Cells, covering latitudes from approximately 61 degrees North to 61 degrees South. It is the Version 2.0 Level 2A science data record from the ISS-mounted RapidScat instrument, a Ku-band scatterometer derived from QuikSCAT hardware. Data are stored in single-orbit HDF-4 files with a list structure to accommodate the instrument's variable circular scan pattern.
ISS-RapidScat Level 2A Version 2.0 provides calibrated surface-flagged radar backscatter (sigma-0) and attenuation corrections in 25 km Wind Vector Cells. The data are derived from a Ku-band scatterometer mounted on the International Space Station, offering a complete historical reprocessing for consistent calibration across the instrument's operational period. Coverage is limited to latitudes between approximately 61 degrees North and 61 degrees South due to the ISS's non-sun-synchronous, low-inclination orbit.
World Bank Group data on China's energy and mining sector, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset covers topics such as energy production, use, dependency, and efficiency. It was last updated on 2026-04-27.