News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,013 datasets
Survey data from Taiwan on the relationship between depression and resilience among high school students. Contributed by Li, Ming-hui and last updated in March 2026, the dataset comes from the Harvard Dataverse repository and is classified under the Social Sciences domain.
Eswatini survey data from Afrobarometer Round 9, supporting research on monarchical legitimacy. It includes replication materials for an academic article by Adeniyi Awoyemi. The dataset was last updated in March 2026.
Professional esports match data for the 2020 season leading up to the League of Legends World Championship. The dataset covers regular-season and playoff matches for 22 qualified teams, including G2 Esports and Top Esports, from September 13, 2020. It was originally created and updated by Tim Sevenhuysen of OraclesElixir.com.
CodeReview-Bench is a software engineering benchmark curated by ronantakizawa for evaluating models on code editing and review tasks. It contains between 100,000 and 1,000,000 records derived from GitHub interactions, updated as of March 2026. The dataset is structured to support sequence-to-sequence tasks where natural language feedback is converted into code modifications.
TMDB Movies Dataset is a collection of movie-related data published on Kaggle. The dataset likely contains information about films, such as titles, genres, cast, crew, and ratings. Its size, columns, and time range are not stated in the provided metadata.
A collection of horror film reviews sourced from IMDb. The dataset likely contains user-written text reviews and associated metadata. It was published on Kaggle, but the author, size, and time range are unknown.
Hijabi and Non-Hijabi Facial Expression Dataset is a collection of facial images published on Kaggle. The dataset likely contains images of individuals with and without hijabs, annotated for expression analysis. Its specific size, collection method, and author are unknown.
Old Vietnamese News Dataset, Cleaned Version is a text corpus published on Kaggle. The title suggests it contains historical news articles in Vietnamese that have undergone a cleaning process. Metadata is minimal; actual content, size, and collection methods require verification after download.
Twitter data processed for use with the Naive Bayes machine learning algorithm. The dataset is hosted on Kaggle, but its source, size, and creation details are unknown. It likely contains tweet text formatted for classification tasks.
Kaggle hosts a collection of datasets focused on movie recommendations. The specific content, scale, and origin of the data are not detailed in the provided metadata. Users must download the datasets to inspect the actual records, features, and data quality.
Movies Dataset Arpit is a dataset published on Kaggle. Its title suggests it contains information related to films. The dataset's specific content, size, and origin are not detailed in the provided metadata.
HinFakeNews is a dataset focused on fake news detection in the Hindi language. The dataset is hosted on Kaggle, but specific details about its size, creation date, and authorship are not provided in the available metadata. Its content likely contains text samples labeled as real or fake news for model training.
WeART provides 280,000+ artworks labeled with 152 styles and 1,556 artists, published by ZexiJia in 2026. It functions as a multimodal benchmark for artistic style analysis, addressing gaps in cultural coverage and annotation completeness found in previous art collections.
movies.csv is a dataset hosted on Kaggle. Its specific content, size, and provenance are not detailed in the available metadata. The dataset likely contains information related to films, such as titles, genres, or ratings.
Kaggle hosts a dataset listing movies and television shows. The dataset likely contains titles and associated metadata for media released or cataloged in the year 2020. Its specific contents, such as cast, genre, or ratings, require verification after download.
Review-checkpoints--2026-05-17--13256-13256 is a dataset published on Kaggle. Its title suggests it contains information related to checkpoints, likely for evaluating or reviewing machine learning models. The actual content, scale, and structure require verification after download.
This dataset contains information from The Movie Database (TMDB) on films scheduled for release in 2026, fetched via the TMDB API. The number of records and the data fields are not detailed. The original author, organization, and last update date are unknown.
Data_review_clustering is a dataset hosted on Kaggle. Its title suggests it contains textual reviews intended for clustering analysis. The dataset's specific content, size, and origin are not detailed in the available metadata.
Arabic Fake News Data AFND is a dataset hosted on Kaggle. Its title suggests it contains Arabic-language news articles labeled for veracity. The specific number of articles, collection method, and authorship details are not provided in the available metadata.
TMDB Top Movies Rating is a dataset published on Kaggle. The title suggests it contains ratings for popular movies sourced from The Movie Database (TMDB). The dataset's specific content, size, and authorship are unknown.