News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,012 datasets
The National Oceanic and Atmospheric Administration (NOAA) collected surface underway chemical, meteorological, and physical data from the R/V F.G. Walton Smith in 2023. The dataset includes the air-sea difference in partial pressure of carbon dioxide (pCO2), pCO2 in the atmosphere and in water, barometric pressure, sea surface salinity, and temperature. These measurements are part of the Global Coastal Carbon Data Project, which focuses on understanding the carbon cycle on continental margins.
A collection of MRI-derived mechanical and structural measurements from human cadaveric intervertebral discs, produced by Deva Chan and hosted on Harvard Dataverse. It quantifies T1 and T2 relaxation times alongside in-plane strains and estimated shear modulus under physiological compression and bending loads.
Extracted film metadata, ratings, cast, genres, and user reviews from Letterboxd. The dataset is hosted on Kaggle, but its author, size, and temporal coverage are unknown. The description suggests it contains scraped data from the Letterboxd platform.
Scraped news articles from Google News based on keywords, brands, or topics. The scraper returns canonical URLs and covers over 50 countries. The author, organization, and specific temporal coverage are unknown.
Hacker News stories, comments, and polls scraped via the Algolia API. The dataset likely contains user-generated content from the technology and startup discussion forum. The author, organization, and last update date are unknown.
Steam game metadata, pricing, genres, Metacritic scores, and user reviews scraped from the platform. The dataset likely contains structured information on games available via the Steam digital distribution service. The author, organization, and specific data volume are unknown.
Roblox Indonesian Reviews Dataset is a text collection intended for sentiment analysis experiments using machine learning and deep learning. The dataset is hosted on Kaggle, but its author, size, license, and last update date are unknown.
A dataset titled 'output_epresso' published on Kaggle. The title suggests the data likely contains information related to espresso or coffee. Specifics regarding its contents, size, and origin are unavailable from the provided metadata.
A preprocessed derivative of the ISOT Fake and Real News Dataset, designed for binary text classification tasks. The original dataset contains collections of both fake and real news articles. This version has been processed for machine learning applications, though specific preprocessing steps are not detailed.
Turkish Technology News Dataset (HWP.com.tr) is a collection of news articles focused on technology topics in the Turkish language. The dataset is intended for natural language processing, machine learning, and text analysis projects. The source is the Turkish news website HWP.com.tr, but the author, license, and specific data volume are unknown.
50,000 movie entries sourced from the Internet Movie Database (IMDB). The dataset is hosted on Kaggle, a platform for data science competitions and projects. The specific collection date, author, and update frequency are not provided in the available metadata.
A dataset of 50,000 movies sourced from IMDB. The dataset is hosted on Kaggle, but the author, specific columns, and update history are unknown. The content likely includes movie titles and associated metadata.
News Category Dataset (20 Classes) is a text dataset hosted on Kaggle. The title suggests it contains news articles labeled into 20 distinct categories. The dataset's author, size, and specific source are unknown.
Vietnamese_news_10m is a dataset hosted on Kaggle. Its title suggests it likely contains a corpus of news articles written in the Vietnamese language. The dataset's scale, source, and creation details are not provided in the available metadata.
Synthetic binary blobs used for measuring sequential write throughput and latency. The dataset is strictly for infrastructure verification purposes. New data for version 2026 04 was uploaded by author micmicmicmicmicchan on 2026-03-21.
News articles referenced by the York County Chapter's social media posts. The dataset was created by Jennifer D. McGrew of the York County Chapter and was last updated on April 1, 2026.
News videos referenced by the Brevard County Chapter's social media posts. The dataset was authored by Jennifer D McGrew and last updated in April 2026.
News articles referenced by the Brevard Chapter's social media posts, compiled by Jennifer D McGrew of the Brevard County Chapter. The dataset was last updated in April 2026.
A media archive of news videos referenced by the Placer County Chapter's social media posts. The dataset was created by Jennifer D McGrew of the Placer County Chapter and was last updated in April 2026. The exact volume of videos and their publication dates are not specified.
Placer - News Articles contains news articles referenced by social media posts from the Placer County Chapter. The dataset was created by Jennifer D McGrew of the Placer County Chapter and was last updated on April 1, 2026.