News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,995 datasets
DL_ohlcv_newssources is a dataset from Kaggle. Its title suggests it combines financial market data, likely OHLCV (Open, High, Low, Close, Volume) metrics, with news sources. The dataset's actual content, scale, and origin require verification after download.
Raw_fwd_return_DL_ohlcv_newssource combines financial market returns with price and news data. The dataset is hosted on Kaggle, but its specific origin, size, and creation date are not detailed in the available metadata. Columns likely contain forward-looking returns, open-high-low-close-volume (OHLCV) data, and indicators of news sources.
Historical TikTok Gaming Dataset is published on Kaggle. The title suggests it contains data on gaming content on the TikTok platform over a past period. The dataset's specific content, size, and collection details are unknown from the provided metadata.
A dataset containing real-time information related to the TikTok platform. It was published on Kaggle, but the specific collection date and author are unknown. The dataset's content, scale, and specific variables require verification after download.
TikTok Search Gaming Dataset is a collection of data related to search behavior on the TikTok platform, specifically within the gaming domain. The dataset is hosted on Kaggle, but its specific size, authorship, and update history are unknown. Columns likely contain information about search queries, user interactions, or content related to gaming topics.
Backlink data likely related to news publishing, digital public relations, and search engine optimization. The dataset is hosted on Kaggle, but its specific contents, size, and origin are not detailed in the provided metadata. The author, organization, and last update date are unknown.
Prediksi Engagement TikTok & Instagram is a dataset hosted on Kaggle. The dataset likely contains metrics for predicting user engagement on the TikTok and Instagram social media platforms. Its specific contents, size, and origin are not detailed in the available metadata.
13 episodes from the first season of the animated series The Simpsons provide the source text. The dataset is formatted in a native CPT JSON structure and was uploaded by SicariusSicariiStuff. The record was last updated on March 11, 2026.
CTD (conductivity, temperature, depth) data from 62 casts conducted during the Aurora Australis KROCK cruise from January to March 1993. The data, collected by the Australian Antarctic Data Centre (AU_AADC), supplements a krill and geology research program in the Prydz Bay region. Measurements include Pressure, Temperature, Salinity, and Sigma-T.
A dataset derived from BBC News RSS feeds, likely containing text features for topic modeling. The description mentions Non-negative Matrix Factorization (NMF) was applied to the data. The author, organization, and specific scale are unknown.
Real news articles scraped from various sources are paired with dramatized versions labeled as fake news. The dataset's author, size, and specific sources are not detailed in the provided metadata. Its creation method suggests it is intended for binary classification tasks in media analysis.
News Dataset is a text corpus hosted on Kaggle. The dataset's specific content, size, and collection methodology are not detailed in the available metadata. Its source, author, and temporal coverage are unknown.
Closing-line odds and no-vig probabilities for sportsbook teasers are available for preview at an external site. The dataset appears to contain betting market data, likely from August 2026. Its specific structure and volume are not detailed in the provided description.
The Transportation Safety Board of Canada provides an official investigation report for a 2021 helicopter accident. The report details a collision between a sling load and the tail rotor of an Airbus AS350 B2 operated by Héli-Express Inc. near Les Escoumins, Quebec, on May 11, 2021.
102,028 images are grouped into 11,142 subsets, each containing an original image and manipulated derivatives. The dataset was created by Silvan Heller of the University of Basel for research on media derivation and tampering detection. It was sourced from a large community of image manipulation enthusiasts.
Filtered and averaged lidar data from a buoy-mounted Leosphere Windcube 866 instrument, standardized into NetCDF format. The dataset is provided by Raghavendra Krishnamurthy of the Pacific Northwest National Laboratory. It includes parameters from various instruments on the buoy, with details on measurement frequency available in an attached data dictionary.
Despoina Chatzakou from Aristotle University of Thessaloniki presents a dataset for detecting bullying and aggressive behavior on Twitter. The corpus contains 1.6 million tweets posted over a 3-month period. The research proposes a methodology extracting text, user, and network-based attributes to distinguish bullies and aggressors from regular users.
Review-checkpoints--2026-06-01--13271-13271 is a dataset hosted on Kaggle. The title suggests it likely contains evaluation metrics or saved states from a machine learning model training process. No further metadata, such as author, size, or columns, is provided.
A movie dataset published on the Kaggle platform. Its specific content, size, and origin are unknown from the provided metadata.
Data collected in 2004 by the AnSlope program in the Ross Sea aboard the research vessel Nathaniel B. Palmer. The dataset contains oceanographic measurements, including temperature, salinity, dissolved oxygen, and pressure, gathered to study dense water transfer and poleward flow. It is managed by NOAA NCEI and appears on multiple government data platforms.