News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,013 datasets
TikTok data published on Kaggle, a popular platform for data science and machine learning projects. The dataset's specific content, size, and collection method are not detailed in the available metadata. Users must download the data to verify its structure and potential for analysis.
3,706 forecasting questions regarding AI industry developments, model releases, and agentic deployments were compiled by LightningRodLabs. The collection spans January 2025 to January 2026 and features binary outcomes verified through web search using a 'Future-as-Label' methodology.
A dataset from Kaggle containing clinical and gene expression information for breast cancer patients. The title suggests it likely contains z-score normalized gene expression values alongside clinical variables. The specific source, time range, and collection method are not detailed in the provided metadata.
A collection of e-commerce product reviews, likely containing customer feedback text and associated metadata. The dataset is hosted on Kaggle, but its specific origin, size, and creation date are not detailed in the available metadata. Columns and sample data are unknown, limiting immediate assessment of its content and structure.
This dataset combines 11 monthly surveys with 15,000 total participants to investigate news discernment patterns in the U.S. It measures how informed voters are about political news, finding that 47% of subjects confidently choose a true story over a fake one, while 3% choose the fake. The analysis links discernment to socioeconomic differences and partisan congruence.
Data from the British Geological Survey's GeoIndex Offshore cultural data theme, updated 2026-03-19. The dataset provides marine geology and digital map information for the UK and other global areas via a free web service interface.
A collection of draft masterplan documents for the Copthall Sports Hub and Mill Hill Open Spaces in London. The documents include full and summary versions referenced in Environment Committee reports from March and September 2018. The consultation report contains embedded PDF responses from the public.
A dataset titled 'Movies' published on the Kaggle platform. The dataset likely contains information related to films, but specific details such as columns, size, and origin are unknown. Users must inspect the actual content after download to verify its scope and utility.
A large-scale corpus of weight loss barriers constructed from Reddit narratives. The data is annotated using the COM-B behavioral framework, which categorizes barriers related to capability, opportunity, and motivation. The dataset was sourced from Kaggle, but the author, organization, and specific size details are unknown.
A dataset listing popular movies from around the world, sourced from Kaggle. The specific number of records, features, and time period covered are not detailed in the available metadata. Users should verify the actual content and scope after download.
Hatred-on-Twitter-During-MeToo-Movement is a dataset of tweets from the MeToo movement, labeled as hatred or non-hatred content. The data includes tweet text, timestamps, and user engagement metrics such as retweet and favorite counts. It was sourced from OpenML under a CC-BY-NC-SA-4.0 license.
Movie V7 is a dataset uploaded to Hugging Face by author vanduc11. The dataset's specific content and scale are unknown, but the title suggests it contains information related to films. It was last updated on April 1, 2026.
A collection of scientific visualization datasets converted to a chunked, multi-scale OME-Zarr format and hosted on AWS S3. The project is provided by NumFOCUS under a CC-BY-4.0 license through the AWS Open Data Program. It aims to serve as a web-based resource for the scientific visualization community.
A dataset of job postings, likely containing both real and fraudulent listings. It was published on Kaggle, but the specific collection date, author, and data volume are unknown. The dataset's primary purpose appears to be training models to identify deceptive employment advertisements.
Multisensory mental health monitoring data, likely collected from wearable devices during art therapy sessions. Published on Kaggle, the dataset's specific size, collection date, and author are unknown. It appears to combine physiological signals with therapeutic activity data.
A collection of images uploaded to Twitter by user 'Tianxinkitten' on April 23, 2025. The dataset, authored by 'daaxila', was last updated on the Hugging Face platform in April 2026. The exact number of images and their content are not specified in the available metadata.
Daily sea-level pressure data for the Northern Hemisphere covering 1880 to 1979, provided on a 10-degree by 5-degree latitude/longitude grid. The dataset was produced by the organization SCIOPS and is hosted on the nasa_earthdata platform.
Measurements of temperature, salinity, conductivity, pressure, and transmissivity collected in September 1996 using a CTD instrument from the R/V Alpha Helix in the Chukchi Sea. The data are provided by NOAA NCEI under accession number 0061042. The dataset represents a snapshot of Arctic oceanographic conditions during a late summer cruise.
spacespress_ADL_2026 is a dataset published on Kaggle. Its title suggests a focus on Activities of Daily Living, which are tasks related to personal care and routine. The dataset's specific content, size, and collection details are not provided in the available metadata.
Imdb 2019 Movie Reviews is a text dataset published on the Hugging Face platform by author gosaeng101. The dataset likely contains user reviews for movies from the IMDb database, focusing on the year 2019. The last recorded update to the dataset listing was on 2026-04-06 17:21:05.