News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,964 datasets
A curated collection of trending movies sourced from The Movie Database (TMDB). The description indicates the data includes popularity scores for movies, suggesting a focus on measuring public interest over time. The dataset's author, organization, and specific temporal coverage are unknown.
Tiktok APP Review 2026 is a dataset published on Kaggle. The title suggests it contains user reviews for the TikTok application from the year 2026. The dataset's specific content, size, and origin require verification after download.
A collection of 21 field observations of residential roofing-integrated photovoltaic installations in California, with 2 observations for re-roofing projects and 19 for new construction projects. It documents detailed time and motion data for installation activities collected between July 2021 and June 2022 by the Department of Energy.
Source data totaling 290,100,190 bytes, comprising unprocessed toeprinting films, raw luciferase measurements for reporter assays, polysome profiling data, and raw biofilm analysis data. The dataset supports research into how extended Shine-Dalgarno motifs govern translation initiation in the bacterium Staphylococcus aureus. All files are raw experimental outputs from multiple biochemical and genetic assays.
Cosmos WTS Compress Prompt is a dataset hosted on Kaggle. Its title suggests it contains text prompts related to compression tasks. The dataset's author, organization, and specific content details are unknown.
Synthetic data designed for building and testing recommendation systems and graph machine learning models. The dataset is hosted on Kaggle, but the author, organization, and specific data volume are unknown. Its last update date and licensing information are also not provided.
A 31-page report summarizing the U.S. transportation system for the Bureau of Transportation Statistics. The publication, 'Transportation in the United States: A Review,' provides a snapshot highlighting physical characteristics and trends in passenger travel and freight movement. It examines the economic performance, safety record, and environmental impact of the system, which served 260 million people and 6 million businesses at the time.
50,250 synthetic records emulate a catalog of movies and TV series similar to Netflix. The dataset is hosted on Kaggle, but its author, license, and update history are not specified. Its synthetic nature suggests it was generated for modeling or analysis rather than sourced from a real service.
MMDocIRT2ITRetrieval is an evaluation dataset from the Massive Text Embedding Benchmark (MTEB). It contains 313 long documents averaging 65.1 pages, categorized into ten domains including research reports, academic papers, and government documents. The dataset features a multimodal distribution, with text comprising 60.4% of the content.
A dataset related to the Facebook platform, sourced from Kaggle. The specific content, size, and creation details are not provided in the available metadata. Users must download the dataset to inspect its actual structure and contents.
ViSocialNews is a dataset for classifying Vietnamese social media news. The dataset likely contains text posts from social media platforms, annotated for news classification tasks. Its author, organization, and specific size are unknown.
Amazon Fake Review Labled is a dataset hosted on Kaggle. The title suggests it contains Amazon product reviews with labels indicating authenticity. The dataset's author, organization, and specific details are unknown.
Code_reviews is a dataset hosted on the Kaggle platform. The dataset's title suggests it contains records related to the software code review process. No further descriptive metadata, sample data, or column definitions are available for verification.
Code review data likely contains records of software code changes and associated review comments. The dataset is hosted on Kaggle, but its specific size, origin, and creation date are unknown. Columns and sample data are unavailable for verification.
Temperature and salinity profiles from CTD casts in the Atlantic Ocean, collected aboard the R/V Oceanus between September 1984 and November 1985. This dataset was created by the University of Rhode Island's Graduate School of Oceanography for the Mediterranean Eddy (MEDDY) experiment. It represents a focused oceanographic campaign to study a specific mesoscale feature.
Four ships collected water depth and temperature profiles across the North and South Atlantic Ocean between August 14 and December 4, 1986. The dataset originates from the World Ocean Circulation Experiment (WOCE) and was submitted by Dr. Reiner Onken of the University of Kiel, Germany. Data is available in the NODC C125 Bathythermograph-XBT-Selected Depths file format.
Over two years of pressure, temperature, and current velocity data collected from the research vessels ENDEAVOR and OCEANUS. The dataset was submitted by Thomas Shay of the University of North Carolina at Chapel Hill as part of the SYNoptic Ocean Prediction project. Measurements were taken via speed meter casts in the Gulf of Mexico.
Atlantic Ocean water temperature and pressure profiles collected from 1988 to 1995 via the BSH Ship-of-Opportunity Programme. The dataset contributes to the World Ocean Circulation Experiment, with principal investigation led by Dr. Alexander Sy of the Bundesamt für Seeschiffahrt und Hydrographie. It represents a multi-year collection of expendable bathythermograph (XBT) data.
CTD vertical cast measurements from the R/V New Horizon cruise CaBS7, collected in the Northeast Pacific Ocean off the California coast. The dataset captures seawater pressure, temperature, and salinity from October 16 to 23, 1987. Dr. Barbara Hickey of the University of Washington led the collection for the Southern California Bight Basin Study.
A movie dataset intended for building recommendation systems and performing data analysis. It originates from the Kaggle platform, but details on its creator, size, and specific contents are unspecified. The last update date is unknown.