News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,008 datasets
A mapping of official and commonly spoken languages for countries worldwide, containing fewer than 1,000 records. Created by brandontravel and updated in March 2026, it serves as a reference for developers building travel and translation tools.
FanChuan is a multilingual, graph-structured benchmark containing between 10,000 and 100,000 records for parody detection and analysis on social media, published by Ziyi510 in 2025. The data covers six distinct subsets including Reddit-Trump, Tiktok-Trump, and CampusLife in both English and Chinese.
A list of the 250 highest-rated titles on the IMDB platform. The dataset is hosted on Kaggle, but its specific columns, source, and update history are not detailed in the provided metadata. Further details about the data's collection method and time period are unknown.
Top 250 IMDB Movies is a dataset published on Kaggle. It likely contains a ranked list of films based on user ratings from the Internet Movie Database. The specific columns, data volume, and creation details are not provided in the metadata.
20,000+ cleaned player reviews curated for sentiment analysis and text mining. The dataset likely contains textual feedback from users of the Steam gaming platform. Its author, organization, and last update date are unknown.
SIB-200 is a multilingual topic classification dataset covering 205 languages and dialects. It is based on the human-translated Flores-200 corpus, with topic annotations originally provided in English for categories like science/technology, travel, and politics. The dataset was created by the mteb organization and last updated in February 2026.
Movie ratings and tags from the MovieLens platform, likely containing 25 million records. The dataset was published on Hugging Face by alitourani and was last updated on April 10, 2026. The specific columns, file formats, and license are currently unknown.
A business-to-business contact list focused on the sports and hockey industries in the United States. The dataset title indicates it contains 20 leads. It is published on Kaggle, but the author, source organization, and collection methodology are unknown.
Monthly reports on the receipt of information requests submitted under Ukraine's 'On Access to Public Information' law. The dataset is provided by the States site of Ukraine and was last updated on March 3, 2026. It is available in multiple formats including Word, Excel, and CSV.
TikTok Search Cards data, published on the Hugging Face platform by author qiyang666. The dataset was last updated on April 10, 2026. Its specific content, scale, and structure require verification after download.
High-frequency wall pressure traces sampled at 1 MHz for analyzing detonation wave stability. The data is intended for use in Remaining Useful Life prognosis models using CNN-BiLSTM architectures. The author, organization, and specific data volume are unknown.
Netflix Movies and TV Shows is a dataset from Kaggle. The dataset likely contains information about titles available on the Netflix streaming platform. The specific content, size, and origin details are not provided in the available metadata.
A multilingual text dataset for fake news detection, containing content in both Bengali and English. It is hosted on Kaggle, but the author, organization, and creation details are unspecified. The dataset's size, specific contents, and collection methodology are not described in the available metadata.
Kaggle hosts the CK-Dataset-Face-Expressions. The dataset likely contains images of human faces depicting various emotional expressions. Its specific scale, creation details, and update history are not provided in the available metadata.
A geospatial dataset detailing the development plan for the 'Riding and shooting sports area Scheuen' in the city of Celle. It is provided via a Web Feature Service (WFS) in the INSPIRE PLU data format version 4.0.1. The dataset is maintained by the Bundesamt für Kartographie und Geodäsie and was last updated on March 3, 2026.
Temperature, salinity, sigma_t, and pressure measurements from Min Fang Bay in the Southern Ocean. The data were collected using a CTD instrument from an unknown platform and contributed by NOAA's National Centers for Environmental Information (NCEI). Measurements were taken between 1984 and 1985.
Temperature, salinity, conductivity, pressure, and transmissivity data gathered from 1997 to 1999 by Conductivity-Temperature-Depth (CTD) instruments on Canadian Coast Guard Ships. The data were collected as part of the North Water Polynya project and are archived by NOAA's National Centers for Environmental Information (NCEI).
Temperature profile, pressure, nutrients, and biological data were collected from the R/V TYRO in the Indian Ocean. The data spans from May 1992 to February 1993. NOAA_NCEI is the authoritative organization for this dataset.
Replication data for research on how daylight influences news emotion, used in Studies 2 through 4. The data was created by author Jiaxin Li for the project 'How daylight shapes news emotion'. It was last updated on April 3, 2026.
A dataset titled 'review-chekpoints--2026-05-21--13260-13260' was published on Kaggle. The title suggests it likely contains textual review data, possibly with checkpoint or versioning information. Metadata is minimal; the actual content, scale, and structure require verification after download.