News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,035 datasets
Xiph.Org Test Media is a collection of uncompressed video files hosted on AWS Open Data. The dataset is provided by Xiph.org, a non-profit organization supporting open multimedia standards. It is intended for research in video compression and video processing algorithms.
Replication Data for the study 'Conspiracy Thinking and Belief in Partisan Conspiracy Theories: A Moderating Effect of Partisan Congruence?' was authored by Omer Yair and submitted for review. The dataset is hosted by Harvard Dataverse and was last updated on March 17, 2026. It is tagged within the Social Sciences domain.
This dataset is a collection of news articles for the Russian-language competition 'Prediction of News Topics [AI 25/26]'. It is hosted on Kaggle and appears to be designed for a supervised text classification task. The author, organization, and specific data volume are unknown.
New York City film permits detail authorized exclusive use of public property like sidewalks, streets, and parks. The dataset is provided by the City of New York's Mayor's Office of Media and Entertainment (MOME) and was last updated in March 2026. Row and column counts are not specified.
MovieLens_32M is a dataset hosted on Kaggle, likely containing user ratings for movies. The title suggests it contains 32 million data points, which is a substantial scale for training models. Its specific contents, such as user and movie identifiers, require verification after download.
This dataset is a collection of top-rated movies, likely sourced from the TDBM platform. The raw description indicates the dataset spans films from 1896 to 2026. It is hosted on Kaggle, but other metadata such as author, license, and specific column details are not provided.
Deep IoT Cultural Heritage Restoration Data is a dataset from Kaggle concerning the digital conservation of cultural artifacts. The raw description indicates it contains records related to Chinese artifact digital conservation, but specific details on size, format, and structure are unavailable. The dataset's author, organization, and last update date are unknown.
Kaggle hosts a dataset titled 'Netflix movies'. The dataset likely contains information about movies available on the Netflix streaming platform. Metadata is minimal; actual content requires verification after download.
This dataset features raw teleoperation data collected using the Agilex Cobot Magic robot, formatted for the Lerobot2.1 framework. It is part of the Great March 100 (GM-100) Project and is authored by rhos-ai.
This dataset documents wage and human capital differences between agriculture and other sectors across 13 countries, including Canada, the U.S., India, and Indonesia. It contains data on average wages, worker education levels, and Mincer returns to education. The data is used to derive implied barriers to labor reallocation out of agriculture.
This dataset supports a cross-sectional study investigating the determinants of financial development across countries. The analysis uses an instrumental variables approach to examine the roles of cultural values, institutional quality, and trade openness.
This dataset analyzes over $1.1 billion in loans provided to 50 U.S. railroads by the Reconstruction Finance Corporation and Public Works Administration between 1932 and 1939. It examines the bailouts' effects on employment, wages, firm debt, bond default, and spillover benefits to nearby manufacturing firms.
This dataset covers over $1.1 billion in loans provided by the Reconstruction Finance Corporation and Public Works Administration to 50 U.S. railroads between 1932 and 1939. It was created by Gertjan Verdickt to analyze the effects of these bailouts on employment, wages, firm debt, and bond default.
Meta Platforms, Inc. and academic researchers collected data on participant treatment assignment and engagement with civic content on Instagram during the 2020 U.S. election. The dataset focuses on a political ads holdout experiment, measuring exposure to social issues, elections, and political ads for control group participants.
U.S. 2020 Facebook and Instagram Election Study data contains information on participants in a political advertising holdout experiment. It includes treatment assignments, engagement with civic content, and for control group participants, exposure to social issues, elections, and political ads. The study was conducted by Meta Platforms, Inc. in partnership with academic researchers.
This dataset comprises establishment-level data from the U.S. Census of Manufactures, used to study how multi-plant firms allocated resources in response to local economic shocks during the Great Depression. It was authored by Nicolas Ziebarth and last updated in February 2026.
This collection of establishment-level data from the U.S. Census of Manufactures is used to study how multi-plant firms allocated resources during the Great Depression. It was created by Nicolas Ziebarth to analyze the geographic propagation of local economic shocks through firm networks. Row count, column count, and file size are not specified.
This dataset features digitized data on the expansion of the electric telegraph network in America from 1840 to 1852, used to study its impact on national elections and news coverage. It was created by Tianyi Wang for research analyzing how telegraph access influenced voter turnout and newspaper content.
This dataset provides replication data for a study on the impact of the electric telegraph on national elections in America from 1840 to 1852. It contains newly digitized data on the telegraph network's growth, used in a difference-in-differences analysis to measure effects on voter turnout and newspaper content.