News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,012 datasets
A dataset of news and entertainment content published on HuggingFace by author Sachin21112004. The dataset was last updated on April 5, 2026. The specific volume, source, and temporal coverage of the content are not detailed in the available metadata.
450 conversations designed to test whether persona framing gates the expression of alignment faking (AF) in the Gemma 3 27B-it language model. The dataset was created by author vincentoh and last updated on March 6, 2026. It includes 15 roles, 10 AF elicitation prompts, and 3 experimental conditions, with responses judged by Claude Opus.
A dataset titled 'review-chekpoints--2026-05-20--13259-13259' was published on Kaggle. The title suggests it may contain review data, potentially for analysis or model training. The specific content, scale, and origin are unconfirmed due to minimal metadata.
Geoscience Australia Research Newsletter 28 presents new evidence on ocean-floor volcanism in the Lachlan Fold Belt, focusing on the Wyalong area in New South Wales. The dataset consists of a scientific journal paper published by Geoscience Australia, available in PDF and HTML formats.
This dataset comprises over 6.6 million records including 5.5 million tweets, 750,000 news articles, and 419,000 parliamentary questions from the UK and Denmark. Collected by Daniel Sandvej Eriksen for the American Political Science Review, the data spans 2015 to 2022 to track how political parties initiate and elevate agendas. It provides a multi-channel view of political discourse across social media, mainstream news, and official government proceedings.
newstt is a dataset hosted on Kaggle. The dataset's title suggests it contains news-related text data. No further descriptive metadata, column information, or sample data is available for verification.
A collection of news content written in the Prachalit script, which is used for the Nepal Bhasa (Newar) language. The dataset is hosted on Kaggle, but its specific source, size, and collection date are not detailed in the provided metadata. The content likely contains articles from Newa news outlets, though the exact scope and volume require verification after download.
Kaggle hosts a dataset titled 'Trending Movies'. The dataset likely contains information on films that are currently popular or gaining attention. Specific details on its contents, size, and origin are not provided in the available metadata.
A Kaggle dataset titled 'Fake_news_bangla_dataset_KHR' likely contains text data in the Bengali (Bangla) language related to news articles. The dataset's content and structure are inferred from its title, as no detailed metadata is provided. Its author, size, and specific creation details are unknown.
A dataset related to Tamil cinema box office collections. It is published on Kaggle and is intended for machine learning applications. The specific source, collection method, and temporal coverage are not detailed in the provided metadata.
Sourced from IMDb, this dataset contains 188 rows of information about the American mockumentary sitcom 'The Office'. The series depicts the everyday lives of office employees at the fictional Dunder Mifflin Paper Company in Scranton, Pennsylvania. The dataset is released under a CC0 1.0 license.
Operational data from the Social Security Administration's Unified Measurement System (SUMS) concerning Continuing Disability Reviews (CDRs). The dataset stores information related to the process of reviewing individuals' eligibility for disability benefits. It was last updated on March 10, 2026.
Approximately 4.5 million measurements of surface water partial pressure of CO2 collected over the global oceans between 1968 and 2008. The data, assembled by the Lamont-Doherty Earth Observatory (LDEO), includes open ocean and coastal measurements from equilibrator-CO2 analyzer systems and has undergone quality control. It is available as a numeric data package from the Carbon Dioxide Information Analysis Center (CDIAC).
A Social Security Administration dataset that stores information for reporting on the number of electronic records processed through the Electronic Records Express website and at each Front End Capture System. The dataset was last updated on March 10, 2026. It likely contains operational metrics for tracking digital intake volumes across different capture points.
The Social Security Administration's Disability Quality Review (DQR) dataset stores information about the review process associated with disability cases. The dataset was last updated on March 10, 2026. It is published on the Data.gov platform under an unspecified license.
Adespatial provides tools for the multiscale spatial analysis of multivariate data. The methods are based on a spatial weighting matrix and its eigenvector decomposition, known as Moran's Eigenvectors Maps (MEM). The approach is described in the review by Stéphane Dray et al. (2012).
playstore-reviews-sentiment is a dataset from Kaggle. It likely contains user reviews for mobile applications from the Google Play Store, annotated with sentiment labels. The dataset's specific size, author, and collection period are not provided in the available metadata.
A dataset listing movies that gained popularity over time. It is hosted on Kaggle, but the specific time range, data source, and collection method are not provided in the metadata. The dataset's author, organization, and last update date are also unknown.
A dataset concerning fake news, published on Kaggle. The specific content, size, and collection methodology are unknown. The dataset's author, organization, and last update date are not provided.
Authoritative entertainment tax growth values for Sioux Falls, South Dakota. The dataset is published by the City of Sioux Falls and was last updated on March 22, 2026. It is available in multiple formats including CSV and GeoJSON.