News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,003 datasets
Menahem Blondheim's book analyzes the development of telegraphic news wire services in the United States from 1844 to 1897. The work reconstructs the history of the New York Associated Press and Western Union, using a wide-ranging body of primary sources, many previously untapped. It examines the effect of technology on news concepts and the emergence of private sector monopolies.
War Stories: The Causes and Consequences of Public Views of War is a text-based entry from the paperswithcode platform. The description suggests it is a book or academic work by Matthew Baum, organized into chapters that analyze news, elite rhetoric, media bias, and public opinion on war and foreign policy. The data format, size, and row count are not provided.
Paul Fussell's book is a seminal work in First World War studies. It analyzes the conflict's impact on cultural history and memory through close literary detail and emotional argument. The entry is sourced from the paperswithcode platform.
Edward W. Said's 'Culture and Imperialism' analyzes the roots and cultural impact of 19th-century European imperialism. The work examines literature from Jane Austen to Salman Rushdie, as well as media coverage of events such as the Gulf War, to trace how a Western view of the East became embedded in them. It is a foundational text in postcolonial studies and follows the author's earlier work 'Orientalism'.
Movie_recommender is a dataset hosted on Kaggle. Its title suggests it contains data for building or evaluating movie recommendation systems. The dataset's content, size, and authorship are not given in the available metadata.
This dataset is a similarity matrix for movie recommendations, likely derived from user ratings or interactions. It is hosted on Kaggle, a platform for data science competitions and projects. Its creation date and author are unknown.
Pinnacle AH CLV Dataset - Sports Alpha Stream is a dataset hosted on Kaggle. The title suggests sports-betting data, likely Asian handicap (AH) odds from the bookmaker Pinnacle together with closing line value (CLV) measurements. The dataset's content, size, and origin are not given in the available metadata.
BBCNewsNepali is a collection of news content from the BBC's Nepali language service. The dataset is hosted on Kaggle, but its specific size, date range, and structure are not detailed in the available metadata. The original publisher is likely the BBC, though the specific author and compilation method are unknown.
Amazon's PASS dataset contains automatically generated summaries of product reviews. The summaries were produced by the Perturb-and-Select Summarizer (PASS) method for 32 products drawn from the FewSum dataset. The dataset is hosted on AWS Open Data.
review-chekpoints--2026-05-26--13265-13265 is a dataset hosted on Kaggle. The title suggests it contains evaluation metrics or logs for machine learning model checkpoints. Its content, size, and origin are not documented and should be verified after download.
This dataset comes from a factorial field experiment by Anjali Thomas (2026) on municipal water access in Mumbai's informal settlements. It records outcomes from bureaucratic facilitation drives and from bottom-up political coordination campaigns targeting elected officials.
This dataset lists sports facilities in New York City parks, as managed in the Citywide Event Management System, which is used to book permits. It covers both locations that can be permitted and locations designated for sports that cannot be permitted. The data uses the NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet projection, with lengths in feet and areas in square feet.
Records from the City of New York track complaints, inspections, and reinspections of news racks in public spaces. The dataset reports the results of these inspections and was last updated on March 8, 2026. Data is provided in multiple formats, including XML, RDF, JSON, and CSV.
Montgomery County, Maryland, maintains a registry of businesses licensed to repair radios, televisions, and small appliances. The dataset includes business names and registration numbers and is updated monthly. The data is provided by Montgomery County of Maryland and was last updated on March 8, —.
This dataset contains June 1975 to August 1976 measurements from the Tropospheric Wind Earth Radio Location Experiment (TWERLE) constant-level balloons flown over the Southern Hemisphere. The measurements were taken at pressures near 150 mb and include temperature and wind. The dataset was last updated on 1976-08-11.
Two collections of economy-related X (Twitter) posts, curated with targeted keywords, support research into macroeconomic narratives. The 2007-2020 collection provides tweet IDs, while the 2021-2023 collection also includes corresponding LLM-generated analyses.
A collection of Reddit posts focusing on discussions about risks related to software updates. The dataset is hosted on Kaggle, but details on its size, author, and creation date are unavailable. The content likely consists of user-generated text from the Reddit platform.
Trending movies over the years is a dataset from Kaggle. The title suggests it contains information about movie popularity across different time periods. Specific details on data volume, source, and collection date are not provided in the available metadata.
TMDb Top Rated Movies Dataset is a collection of movie information and ratings from The Movie Database. It is hosted on Kaggle, but the number of records, update date, and data collection methodology are not provided in the available metadata. The dataset likely contains details on films with high user ratings on TMDb.
The Environmental Information Data Centre provides dispersion model files estimating Aspergillus fumigatus concentrations near outdoor composting facilities in England. The data covers the ten years from 2005 to 2014 and was generated with the ADMS 5 model for areas within 4 km of the composting sites.