News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,956 datasets
OnlineNewsPopularity is a tabular regression dataset from the UCI repository summarizing 61 features for articles published by Mashable over a two-year period. The goal is to predict the number of social media shares, with features including word counts, link counts, and content channel indicators. The dataset was created by researchers from INESC TEC and Universidade do Minho for a 2015 conference on artificial intelligence.
National-Hockey-League-Interviews is a text dataset of interview transcripts scraped from ASAPSports. The data likely contains responses from players, coaches, and other officials during Stanley Cup Finals, with text cleaned to include only interviewee speech. The dataset was released under a CC0-1.0 license.
Aryaman Chaudhury's dataset contains environmental and health measurements for sports fields in New York City. It includes data on heat vulnerability, pollution levels, surface temperatures, tree canopy coverage, traffic density, and asthma cases. The dataset is available as an Excel file under a CC-BY-4.0 license.
Profiles collected roughly twice daily over months to years by Ice-Tethered Profilers (ITPs), instruments that crawl along a wire from ice-bound buoys. The dataset contains repeated vertical profiles of ocean temperature, salinity, and pressure from about 7 to 750 meters depth at a vertical resolution of 1 meter, with oxygen and velocity data for some instruments. ITPs have been deployed across the Arctic Ocean, including near the North Pole.
Top Rated movies is a dataset published on Kaggle. Its specific content, size, and origin are not detailed in the provided metadata. The dataset likely contains information about films that have received high audience or critic scores.
Analytics data for the 'Lets Heal Already' podcast. The dataset is hosted on Kaggle, but specific details about its contents, size, and creation are not provided in the available metadata. The data likely contains metrics related to podcast performance and audience engagement.
Orbital images from the Nimbus-1 satellite's High-Resolution Infrared Radiometer (HRIR) instrument, showing nighttime brightness temperature values. The data was collected by NASA from August 28, 1964, through September 22, 1964, and contains scanned negatives of 70mm film strips saved as JPEG 2000 files. Each image is gridded with geographic coordinates and covers a swath from the north to the south pole.
The OISO-27 cruise collected continuous sea surface measurements from January 5 to February 7, 2017. The dataset includes partial pressure of CO2 (pCO2), temperature, salinity, and likely associated parameters like dissolved inorganic carbon, nutrients, and chlorophyll-a. These measurements are part of the long-term OISO program, initiated in 1998, and contribute to international carbon synthesis efforts like SOCAT and GLODAP.
Kenya Compressed Power Data contains raw mains voltage, current, and frequency measurements collected by vaccine refrigerators in health facilities. The data was collected from 2017 to 2025. The metadata lists Tom Kreyche as the author and Kenya as the organization.
Kaggle hosts a dataset titled 'resume-reviewer-sft-dataset' intended for text generation tasks. The platform tags indicate it is likely used for natural language processing, specifically for resume review via supervised fine-tuning. The dataset's author, organization, size, and specific content details are not provided in the available metadata.
11,448 records of Conductivity, Temperature, Depth (CTD), transmissivity, and fluorescence were collected from 15 casts during a 1993 research cruise. The data was gathered from the ship GYRE as part of the Texas Institutions Gulf Ecosystem Research (TIGER) project. Mr. P.V. Pittman of Texas A&M University submitted the data to NOAA's National Centers for Environmental Information.
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. It includes customer reviews with ratings and text, along with item metadata such as descriptions, categories, price, brand, and images. The dataset was uploaded to Hugging Face by sparsh3011 and features reviews up to September 2023.
Supplementary data from a high-pressure melting study of FeH₀.₀₈ includes melting temperatures for experimental groups, parameters for solidus line fitting, and hydrogen content calculated from unit cell volume collisions. The dataset is structured in an Excel file and is openly available under a CC-BY-4.0 license. It provides quantitative results for modeling the behavior of iron-hydrogen alloys under extreme conditions.
Hourly oceanographic and meteorological data collected from a buoy at 43.0853°N, 70.8639°W in the Gulf of Maine near Portsmouth, New Hampshire. The dataset includes variables such as air temperature, wind speed, salinity, dissolved oxygen, and chlorophyll, with observations compared to a 13-year historical record. Data collection is managed by SCIOPS, with the buoy seasonally recovered in December and redeployed each spring.
Estimating Water Storage Capacity of Existing and Potentially Restorable Wetland Depressions in a Subbasin of the Red River of the North is a NASA Earthdata study from CEOS_EXTRA. It develops models to estimate and spatially depict wetland storage volumes and interception areas in the upper Mustinka subbasin. The study simulates water storage increases from restoring farmed and drained wetlands under various land use and climatic scenarios.
Lu Sun compiled a novel dataset on political protest movies from 2000 to 2018 to study their influence on anti-government demonstrations in autocratic countries. The research analyzes how widely watched imported protest movies impact demonstrations through imitation, coordination, and value transmission. The dataset was published via the International Studies Quarterly Dataverse.
Vinewsqa is a text dataset published on Kaggle. Its title suggests it likely contains news articles paired with questions and answers. The specific content, size, and origin are unknown from the provided metadata.
Flood Data Notes Polygons describe the expected reliability of flood data, indicating specific areas where the City of Moreton Bay Council assesses limited data reliability and the reasons for it. The dataset, authored by moretonbaygis and last updated in March 2026, stores spatial definitions of model areas that have known lower reliability or are subject to review.
Spatial dataset from the City of Moreton Bay's Data Hub, last updated on 2026-03-20. It describes the expected reliability of flood data, indicating specific areas where the data has limited reliability and the reasons for this limitation. The dataset stores spatial definitions of model areas that have known lower data reliability or are subject to review.
Site locations and descriptions for micrometeorological data collection in Ash Meadows and Oasis Valley, Nevada. The U.S. Geological Survey created this dataset in cooperation with the U.S. Department of Energy. Data collection spans from December 1993 through 2001.