News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,939 datasets
A replication study finds participants playing a public goods game in Luganda contributed 28.9% more than those playing in Lugisu, matching the original effect within 0.1 percentage points. The research, led by Paul Clist, explores mechanisms by eliciting injunctive norms and empirical expectations for every possible action. The dataset was last updated in March 2026.
Air pressure measurements collected by the Australian National Moorings Network's National Reference Systems sub-facility. The data is part of the Integrated Marine Observing System project, focusing on marine and meteorological observations in Australia. The dataset includes time series data, as indicated by the tags, but specific row and column counts are unknown.
2026 spatial data from the City of Perth GIS Services defines eight combined precinct boundaries. The dataset covers Central City Precincts 1 through 8, including Northbridge, Cultural Centre, and Foreshore areas. It supports urban planning and development analysis under the City of Perth Planning Schemes.
800 bilingual hotel reviews from the Trip.com and Ctrip platforms. The description indicates the sample includes sub-ratings and owner replies. The dataset was uploaded to Kaggle, but the author, organization, and specific collection date are unknown.
An R script for phylogenetic comparative analysis, authored by Martina Francesconi and published on figshare under a CC-BY-4.0 license. The dataset is 24.3 KB in size and was last updated on 2026-04-17. It appears to analyze factors influencing the suppression of play behavior among adult non-human primates.
Appeals filed to review decisions made by the San Francisco Rent Board. The data likely contains records of appeals decided by the Rent Board Commission, including grounds such as substantive, procedural, and hardship appeals. The dataset is provided by the City of San Francisco and was last updated in March 2026.
This dataset contains Reddit sentences scored for similarity to spoken dialogue and written forum communication. It was created for an EMNLP 2025 paper, though the authors note it was not used in the final results. Early experiments showed no significant gains versus smaller C4 and Subtitle training sets.
llm-jp provides a Japan-specific question-answer evaluation dataset created by native Japanese annotators. The dataset collects questions where the correct answer differs between Japan and other countries, such as rules for which side of the road to drive on. It was last updated on 2026-04-17.
Neutral Particle Imager (NPI) data from the Mars Express ASPERA-3 instrument, provided in units of count rate (counts/second). The data covers the period from the spacecraft's launch on June 2, 2003, through to the end of its mission. The dataset is provided by the National Aeronautics and Space Administration.
Australia-focused geoscience research articles compiled by Geoscience Australia. The newsletter includes titles on topics like mineral mapping in the Pilbara Craton, Proterozoic thrusting in the Kimberley, and landscape evolution near Broken Hill. It was last updated on 2026-03-25.
SPICE kernel files for the complete Venus Express mission provide geometric and ancillary information needed to interpret science instrument data. The data includes spacecraft and planetary ephemerides, instrument mounting alignments, spacecraft orientation, event sequences, and time conversion data. This dataset is produced by the National Aeronautics and Space Administration and was last updated in March 2026.
Surface underway chemical and physical data collected during the R/V Meteor research cruise along WOCE Section AR12/AR24 in the North Atlantic Ocean from May 15 to June 8, 1997. The dataset includes measurements of mole fraction of CO2 in equilibrator headspace and dry outside air, barometric pressure, water temperature, sea surface temperature, salinity, and fugacity of CO2 in seawater. These data were collected by Arne Körtzinger of the Institute of Marine Research, Germany, using a carbon dioxide gas analyzer and shower head chamber equilibrator.
A cleaned catalog of movies and TV shows available on the Netflix streaming platform, intended for exploratory data analysis and natural language processing (text mining). The dataset's author, organization, specific size, and last update date are unknown.
A modified version of the MovieLens dataset, created for testing the robustness of recommender systems. The raw description indicates it has been injected with shilling attacks, which are artificial profiles designed to manipulate recommendation outputs. The dataset's author, organization, and specific version details are unknown.
The replication package for the paper 'Fiduciary Duty of Loyalty and Corporate Culture' includes Stata code and pseudo-data files. The pseudo-data preserve the original variable names, data types, and merge keys but contain anonymized, randomly generated values. Author Ming Ju deposited this package in the Review of Corporate Finance Studies Dataverse on May 6, 2026.
38 IEEE papers on movie certification automation, curated with metadata and feature annotations. The dataset appears to be a collection of research paper metadata, likely gathered from IEEE publications. The author, organization, and specific collection date are unknown.
CTD profiles provide high-resolution measurements of temperature, salinity, density, and other parameters from the Bering and Chukchi Seas. Data was collected from the R/V Alpha Helix during the Aleutian Birds HX-172 cruise funded by the NSF Division of Polar Programs, spanning July 9 to August 7, 1993. The dataset was processed by Dr. Chirk Chu of the University of Alaska's Institute of Marine Science and submitted to the National Oceanographic Data Center.
Puget Sound oceanographic data from January to February 1973, collected via CTD and STD casts from the NOAA Ship McARTHUR. The dataset provides high-resolution vertical profiles of temperature, salinity, density, and potentially dissolved oxygen or transmissivity. Data were processed to the NODC standard F022 format, which may include cruise metadata, station positions, and environmental conditions.
Zonal mean hydrogen chloride (HCl) concentrations derived from three satellite instruments: HALOE (1991-2005), ACE-FTS (2004-onward), and Aura MLS (2004-onward). The data includes statistical information like standard deviation and minimum/maximum values, mapped on a vertical pressure grid from 147 to 0.5 hPa. This source dataset is produced by NASA and is distributed in netCDF4 format.
This dataset provides 1-month Level 3 zonal averages of atmospheric water vapor from four satellite instruments, covering 1991 onward. The data, produced by NASA, includes statistical information like standard deviation and min/max values on a vertical pressure grid ranging from 147 to 0.01 hPa. It serves as the source for a separate merged water vapor product.