News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,003 datasets
A dataset titled 'news.csv 3' published on Kaggle. The title suggests it contains news-related information, likely in a tabular format. No further details on size, origin, or specific content are available from the provided metadata.
A dataset from a study exploring the NFL's use of Instagram to manage fan relationships. It examines three teams and their application of six relationship cultivation strategies derived from relationship management theory and marketing literature. The dataset was authored by Emily Jones and last updated in March 2026.
A synthetic dataset of incompressible 2D fluid flows, published on Kaggle. The dataset likely contains simulated data relevant to computational fluid dynamics. Specific details on size, columns, and creation method are unavailable from the provided metadata.
Amazon Reviews likely contains customer feedback text posted on the Amazon marketplace. The dataset is hosted on Kaggle, but its specific size, creation date, and authorship are unknown. Columns and content details require verification after download.
A list of URLs for websites documented as containing fake news by fact-checking organizations. The dataset was compiled by researcher Joakim Jansson and was last updated in April 2026. The specific size and number of sites are not provided.
James Axtell's historical analysis examines the three-sided struggle for empire in colonial North America over a period of nearly 150 years. The work contrasts English and French colonial strategies regarding native allies and religious conversion, framing the conflict as a contest of cultures.
4,785 records from the 2006 Latino National Survey, processed by Jessala Grijalva to analyze how acculturation orientations predict political behavior. The data include four bidimensional acculturation categories (Culture Affirming, Assimilationist, Bicultural, and Demicultural) derived through Gaussian Mixture Model clustering. The dataset contains the processed data and the R/Quarto code required to replicate the analysis of Latino ideology, party identification, and immigration attitudes.
Vicki-Ann Ware at the Australian Institute of Health and Welfare authored a literature review summarizing evidence on the benefits of sports and recreation programs for Aboriginal and Torres Strait Islander communities. The review synthesizes findings from critical program descriptions and systematic reviews, noting improvements in areas like school retention, health, and social cohesion. It also identifies gaps in the literature regarding causal links, barriers to participation, and program suitability for different demographics.
Kaggle dataset titled 'review-chekpoints--2026-05-25--13264-13264'. The title suggests it contains review-related data, possibly involving checkpoints. The dataset is hosted on Kaggle, but no further metadata is available.
A structured collection of Turkish-language technology news articles sourced from the ShiftDelete.Net website. The dataset is intended for natural language processing, machine learning, and artificial intelligence research. The author, organization, and specific scale of the collection are unknown.
ahmed_asya_podcast is a dataset published on Kaggle. The title suggests it contains audio or transcript data from a podcast series. The dataset's specific contents, size, and origin are not detailed in the available metadata.
Seven classical tolerance experiments conducted in Norway, Sweden, and Germany between 2020 and 2022 by Lise Bjånesøy. The data captures non-Muslim responses to Muslims exercising freedom of assembly to preach conservative religious values. It includes survey responses and background characteristics alongside Stata replication code.
The City of Los Angeles Department of Building and Safety provides data on permits issued for construction, remodeling, and repair. Permits are categorized into building, electrical, and mechanical types, with issuance processes ranging from same-day Express Permits to those requiring plan review. The dataset includes records from 2020 to the present.
BakeAI's preview dataset contains 50 challenging university-level mathematics reasoning problems. Each problem includes a detailed reference solution, a structured grading rubric, and an anonymized model evaluation result.
A multimodal dataset likely containing news images paired with sentiment labels. The description suggests it is designed for exploring whether models can interpret narratives from visual content alone. The dataset originates from Kaggle, but its size, author, and specific creation details are unknown.
A text dataset likely containing content from Twitter, BBC articles, and the 20 Newsgroups corpus for topic classification tasks. It was published on Kaggle, but the author, organization, and specific collection details are unknown. The original creation date and last update are not provided.
Ten diverse examples demonstrating 'generative collapse' and 'Cultural Hallucination' in frontier base models evaluating Yoruba proverbs. The dataset was created by author 'saaga' and last updated on February 23, 2026. It captures model 'blind spots' in non-Western abstract reasoning.
A collection of four annotated datasets submitted to ICWSM'25, created by Marc Riven Herrera and hosted on Harvard Dataverse. It analyzes political and non-political content from Philippine Facebook pages, providing insights into user engagement, sentiment, civility, and misinformation. The datasets support applications in social media analysis, political communication, and machine learning.
A Bollywood movies dataset published on Kaggle. The dataset likely contains information about films from the Hindi-language film industry. Metadata is minimal; actual content requires verification after download.
A dataset titled 'movie_recommender.csv' sourced from Kaggle. The title suggests it contains data for building or testing movie recommendation systems. No further metadata is available to confirm its specific contents, size, or origin.