News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,980 datasets
A historical analysis by Jeremy Kuzmarov examines U.S. police training programs as a tool of foreign policy and nation-building. The work covers interventions from the early 20th century, including the Philippines and Haiti, through the Cold War and the post-9/11 wars in Iraq and Afghanistan. It argues these programs were used to suppress radical movements and impose social control, often resulting in blowback against U.S. interests.
An article by Hazel Rose Markus of Stanford Medicine reviews psychological and cultural research on the meaning of choice. The work contrasts Western, particularly American, perspectives with non-Western and working-class Western views. It examines the relationship between choice, freedom, autonomy, and well-being.
151 quality indicators for blogs and podcasts were identified and refined through a rigorous research process. The resulting Quality Checklists are designed to assist with quality appraisal of medical blogs and podcasts. The dataset was created by Isabelle N Colmers at the University of Alberta and is available under an Open Access (diamond) license.
Emily G. Hervey's study investigates correlations between childhood transition patterns and college adjustment success for Missionary Kids, a subgroup of Third Culture Kids. The research tests hypotheses about the impact of negative transition experiences, interaction with Western peers, and support systems. The study was published on the paperswithcode platform, and the dataset likely contains its survey or assessment data.
Anthony Summers' biography draws on more than 800 interviews to detail the life of FBI Director J. Edgar Hoover. The book covers his nearly fifty-year tenure and his role in major twentieth-century American events. It is a closed-license publication sourced from the paperswithcode platform.
Three networks of political communication between Twitter users were sampled from the public Twitter streaming API. The data was collected over a six-week period prior to the 2010 U.S. Congressional midterm elections. It was created by researchers including Michael Conover of Indiana University Bloomington for a 2011 ICWSM conference paper.
NOAA_NCEI Accession 9700169 contains temperature, salinity, and other oceanographic data collected from CTD casts aboard the R/V New Horizon. The data includes measurements of depth, temperature, salinity, sigma-theta, light attenuation, and dynamic height. Collection occurred in the Southern California Bight from May 10 to May 16, 1986.
Oceanographic data from CTD casts collected during the Texas-Louisiana Shelf Circulation and Transport Processes Study (LATEX PART A). Texas A&M University gathered the data, which includes parameters like temperature, pressure, and oxygen concentration. Measurements were taken over an eight-day period from May 1 to May 8, 1992.
Structured news headlines intended for automated text analytics tasks. The dataset's author, organization, and size are unspecified. Its last update date is unknown.
A cleaned dataset of 9,668 movies and TV shows available on the Amazon Prime Video streaming platform. The data was sourced from Kaggle, but the original author, organization, and specific collection method are not provided. The last update date and temporal coverage of the catalog are unknown.
AG News is a dataset for topic classification tasks, likely containing news articles. It is published on Kaggle. The dataset's size, creation date, and author are unknown.
54.4 KB of morphometry statistics from a study on 3D 'emboli' culture models of epithelial breast cancer cells. The data, published on figshare by Kuppusamy Balamurugan, likely contains quantitative measurements related to cell morphology and mitochondrial metabolism. It was last updated on 2026-03-19.
A curated compilation of real and fake news articles for NLP and classification tasks. The dataset's author, organization, and specific size are unknown. Its last update date is also unknown.
Replication package for a JFQA-published paper on household finance origins. The data, authored by Guillaume Vuillemey, was last updated on April 20, 2026. Its specific contents likely support the paper's analysis of home ownership and agricultural cultural heritage.
Monthly financial data from fantasy sports wagering, detailing entry fees, winnings, and state revenue calculations. The dataset includes columns for total entry fees, total winnings, net revenue, in-state fees, location percentage, contest revenue, and state payments. It is maintained by the State of Connecticut, with the latest update in March 2026.
A mixture of 2,909,551 Chinese news articles from the SogouCA and SogouCS corpora, categorized into 5 classes. The dataset was created by Xiang Zhang, Junbo Zhao, and Yann LeCun, with Chinese characters converted to Pinyin. Classification labels are derived from the news article's URL domain.
Replication data and code for all tables and figures in a published academic paper. The dataset, authored by Guillaume Vuillemey, is hosted on Harvard Dataverse and was last updated in April 2026. It likely contains tabular data supporting analysis of home ownership as a cultural heritage linked to agricultural backgrounds.
50,000 movie reviews from IMDb, each labeled as positive or negative for sentiment analysis. The dataset was sourced from the Kaggle platform, but the author, organization, and specific collection time range are not provided. Its primary purpose is for training and evaluating binary sentiment classification models.
A collection of Persian news articles intended for classification tasks. The dataset is hosted on Kaggle, but its size, creation date, and authorship are unspecified. Columns and sample data are unavailable, limiting detailed assessment prior to download.
FP AI Review is a dataset published on Kaggle. Its title suggests it contains reviews or evaluations related to artificial intelligence products, services, or models. The dataset's specific content, size, and origin are not detailed in the available metadata.