News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,363 datasets
Uncivil Reddit is a text dataset from the figshare platform, published under a CC-BY-4.0 license. The dataset is 242.2 MB in size and is available in CSV and R file formats. It was last updated on May 4, 2026, by an author listed as Anonymous Anon.
A geospatial dataset mapping cultural heritage assets within the flood risk zone (TRI) of Bordeaux, Nouvelle-Aquitaine, France. It was produced by the Bureau de Recherches Géologiques et Minières (BRGM) under the European Flood Directive. The data likely contains surface-based features representing assets potentially affected by overflow flooding.
City of Greater Geelong provides a geospatial dataset documenting mosquito spray treatments at major breeding sites. The data tracks interventions aligned with the breeding cycles of three primary mosquito species in the region. It was last updated in April 2026 by the City of Greater Geelong.
Mosquito spray data details the treatment of major breeding sites in line with the breeding cycles of three main mosquito species in the Geelong area. The dataset is created and maintained by the City of Greater Geelong, with a last recorded update in April 2026. It is provided in multiple geospatial formats including GeoJSON, SHP, WFS, and WMS.
Ballarat's public artwork inventory includes monuments, statues, fountains, plaques, and other cultural features. The City of Ballarat maintains this geospatial dataset, last updated in April 2026. Attributes cover condition, feature type, location, and maintaining authority.
Point data identifies artwork locations within Ballarat, including monuments, statues, fountains, plaques, and memorials. The dataset is maintained by the City of Ballarat and was last updated in April 2026. Features include attributes for condition, maintaining authority, suburb, and ward.
forreview43's companion dataset supports an anonymous NeurIPS submission on AI-versus-human rubric evaluation. The dataset pairs with a separate code repository to reproduce every headline number from the paper without re-running API-backed stages. The dataset was last updated on May 8, 2026.
Paired-samples t-test results on past simple tense forms of habitual expression usage. The 5.5 KB Excel file was authored by Aman Matebie Dagnaw and last updated on April 28, 2026. It is licensed for reuse under CC-BY-4.0.
21.5 KB of data on pressure indicators and physicochemical variables for lagoons in West Africa, shared by Metogbe Belfrid Djihouessi on figshare. The dataset is available in XLS format and was last updated on April 28, 2026.
1.2 MB of gene expression data in an XLSX file, listing differentially expressed genes across all cell types in human and chicken limb buds. The dataset was authored by Ruohan Zhao and last updated on April 28, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
Relative expression (RELATEXP) data measured by quantitative PCR (qPCR) for all volunteer samples. The dataset was authored by Anna Bachmann and last updated on April 28, 2026. It is a 39.8 KB XLSX file shared under a CC-BY-4.0 license on figshare.
Data and code for a study on carryover effects, provided for private peer review. The dataset is authored by Elise Keister and is scheduled for public release upon publication. The file is 44.7 KB in size and was last updated on April 28, 2026.
MMO1097 displays the anthropogenic source values used in the modelling of underwater noise. The dataset originates from the Government Digital Service and is available via the eu_open_data platform. It is provided under the UK Open Government Licence.
Data from the Wabamun Lake area in Alberta, Canada, was assembled for a common test problem to assess models simulating CO2 injection into the subsurface. The Alberta Geological Survey provided this information, which includes stratigraphy, rock properties, formation pressure, and well completion details collected by the petroleum industry. The dataset was last updated in March 2026.
Experimental data evaluating the performance of Coriolis flowmeters for dense-phase CO2 transport in pipelines. The data was generated by the British Geological Survey using a mass flow-rig with gravimetric calibration, testing pure CO2 and various impurity mixtures. Recorded variables include mass, volume flow rate, pressure, temperature, velocity, and density for each test fluid.
Wei Wang's systematic review analyzes 13 studies on the cost-effectiveness of degarelix and LHRH agonists for prostate cancer treatment. The review, covering literature up to December 2025, compares drugs such as leuprorelin, goserelin, and triptorelin. It includes methodological details such as model frameworks, parameters, and uncertainty analyses from the selected studies.
EMCompress is the first benchmark for evaluating Endomorphic Multimodal Compression (EMC). The dataset was released by LordUky in May 2026 and includes reproduction code. The associated paper was accepted to ACL 2026 Findings.
A conversational music-recommendation corpus mined from Reddit, with each recommended item resolved to a Deezer track or album. The dataset includes raw Reddit text and LLM-paraphrased augmentations, along with corresponding audio embeddings. It was created by McAuley-Lab and last updated on May 12, 2026.
Supplemental tables for a 2026 manuscript on cardiovascular responses in long-term breast cancer survivors. The dataset includes two tables with echocardiographic and hemodynamic parameters, published under a CC-BY-4.0 license by author JOÃO IZAIAS. The file is 222.6 KB in size and was last updated on April 13, 2026.
A snapshot dataset from 2026 contains performance metrics for 100 real-world websites, split evenly between WordPress and WooCommerce platforms. It includes mobile and desktop PageSpeed scores, Core Web Vitals (LCP, CLS, INP), page size, request counts, and JavaScript and image sizes. The sites are categorized by traffic volume, with 30 low-, 40 mid-, and 30 high-traffic sites.