News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,037 datasets
Replication data for a forthcoming article in the Review of Economics and Statistics, authored by Bas Sanders. The dataset likely contains variables used to analyze measurement error and counterfactual scenarios in quantitative trade and spatial economic models. It was last updated on March 18, 2026.
DopplerWild-Preview is a dataset hosted on Kaggle. The title suggests it likely contains audio recordings related to wildlife, possibly using Doppler-based sensing. The dataset's specific content, size, and origin are not detailed in the available metadata.
Restaurant_Reviews is a dataset hosted on Kaggle. The dataset likely contains textual feedback from customers, potentially with associated ratings or labels. Its specific size, origin, and update history are not detailed in the available metadata.
Over 42,600 public figure accounts from India and the United States are represented in this collection of tweets. The dataset includes politicians, celebrities, news media, and influencers, compiled by author Anmol Panda. It provides text data for analyzing political and social discourse across two major democracies.
A collection of over 600 romantic and platonic couples sourced from fictional media. The dataset includes characters from film, television, anime, and literature. Its specific author, license, and update history are not provided in the metadata.
A dataset titled 'depression_dataset' published on the Kaggle platform. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its title suggests it contains information related to depression, likely for analysis or modeling purposes.
Approximately 114,000 user reviews collected from over 150 movies on IMDb. Each movie's reviews are stored in a separate JSON file identified by its IMDb ID. The dataset was created by chaziee and last updated on 2026-01-30.
VQS-4k Random Sample is a dataset posted on Kaggle. The title suggests it is a random sample of data related to NeurIPS conference reviews. The dataset's specific content, size, and structure require verification after download.
PureDocBench v2 Reviewer Sample is a dataset published on Kaggle. The title suggests it is a sample from a benchmark designed for evaluating document quality, likely containing text data for assessment tasks. Metadata is minimal; actual content requires verification after download.
A dataset named 'fakenewsnetPython' is hosted on Kaggle. Its title suggests it likely contains news articles or related metadata for the study of misinformation. The dataset's specific content, size, and origin require verification after download.
Libraries review data covers three fiscal years from 2014/15 to 2016/17 for each library in Calderdale. It includes metrics on visits, building costs, and running costs, compiled by the Calderdale Metropolitan Borough Council.
A collection of entertainment industry data from The Movie Database (TMDB). The dataset includes over 22,000 movies, 16,000 TV shows, 58,000 people, and 25,000 reviews. The original author, organization, and license are unknown.
Data collected by the NOAA ship New Horizon in October 1988 likely contains measurements of water pressure and other oceanographic properties. The dataset's columns suggest it is a time-series of in-situ observations. Metadata is minimal; actual content requires verification after download.
Dummy Movies Dataset For Practice is a collection of movie metadata intended for data cleaning and exploration practice. The dataset is hosted on Kaggle, but its author, organization, and specific creation details are unknown. The number of rows, file formats, and license information are also unspecified.
NOAA_NCEI provides pressure and water data collected from the vessel SHU GUANG 06 over a three-day period in July 1980. Columns suggest this dataset likely contains oceanographic measurements, potentially including depth or salinity readings. Its presence on NASA EarthData indicates it is part of a broader environmental data archive.
Metadata for the movies in MovieLens1M, including genres, cast, and overviews. The dataset is hosted on Kaggle, but details on the number of rows, columns, and specific file formats are not provided. The original author, organization, and last update date are unknown.
Kaggle hosts a deduplicated corpus of public pull requests related to video compression. The raw description indicates the data has been scored, suggesting it may contain metrics or labels for analysis. The dataset's origin, size, and specific content require verification after download.
A dataset titled 'review-chekpoints--2026-04-30--13239-13239' was published on Kaggle. The title suggests it may contain evaluation data or checkpoints related to model reviews. No further metadata, such as column descriptions, sample data, or author information, is available.
A dataset for news optimization containing user behavior, location signals, and engagement metrics. The dataset was sourced from Kaggle, but the author, organization, and specific collection details are unknown. The last update date and temporal coverage are also unspecified.
Twitter Engagement Dataset is a collection of multi-topic Twitter posts intended for engagement and trend analysis. The dataset was sourced from Kaggle, but its author, size, and last update date are unknown.