DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

11,013 datasets

Taiwanese High School Student Depression and Resilience Survey

Survey data from Taiwan on the relationship between depression and resilience among high school students. The dataset was contributed by Ming-hui Li and last updated in March 2026. It is hosted on the Harvard Dataverse repository under the Social Sciences subject area.

Tags: Tabular, Mental Health, Resilience, Social Sciences, High School Students, Depression (+1 more)

Public Opinion and Monarchical Legitimacy in Eswatini

Survey data from Afrobarometer Round 9 in Eswatini, supporting research on monarchical legitimacy. It includes replication materials for an academic article by Adeniyi Awoyemi and was last updated in March 2026.

Tags: Tabular, Social Sciences, Political Legitimacy, Eswatini, Public Opinion, Afrobarometer (+1 more)

Professional League of Legends Matches from 2020 Pre-Worlds Season

Professional match data from the 2020 League of Legends season leading up to the World Championship. The dataset covers regular-season and playoff matches for 22 qualified teams, including G2 Esports and Top Esports, from September 13, 2020. It was originally created and updated by Tim Sevenhuysen of OraclesElixir.com.

Tags: Tabular, Competitive Gaming, Esports, Match Results, League of Legends (+1 more)

CodeReview-Bench: 100K+ GitHub Code Editing and Review Pairs

CodeReview-Bench is a software engineering benchmark curated by ronantakizawa for evaluating models on code editing and review tasks. It contains between 100,000 and 1,000,000 records derived from GitHub interactions, updated as of March 2026. The dataset is structured to support sequence-to-sequence tasks where natural language feedback is converted into code modifications.

Tags: Parquet, Software Engineering, Benchmark, Code Review, Code Generation; Language: code, en; Task category: text generation; Modality: text; Size: 100K<n<1M; Libraries: polars, mlcroissant, datasets, pandas; License: MIT; Region: us (+1 more)

TMDB Movies Dataset

TMDB Movies Dataset is a collection of movie-related data published on Kaggle. It likely contains film information such as titles, genres, cast, crew, and ratings. Its size, columns, and time range are not given in the available metadata.

Tags: Tabular, Movies, TMDB, Entertainment (+1 more)

Horror Movie Reviews from IMDB

A collection of horror film reviews sourced from IMDb. The dataset likely contains user-written review text and associated metadata. It is published on Kaggle, but the author, size, and time range are not stated.

Tags: Text, Film Reviews, Sentiment Analysis, Horror Movies, IMDb (+1 more)

Hijabi and Non-Hijabi Facial Expression Dataset

Hijabi and Non-Hijabi Facial Expression Dataset is a collection of facial images published on Kaggle. The dataset likely contains images of individuals with and without hijabs, annotated for expression analysis. Its specific size, collection method, and author are unknown.

Tags: Image, Demographic Analysis, Human Attributes, Computer Vision, Facial Expressions (+1 more)

Old Vietnamese News Dataset, Cleaned Version

Old Vietnamese News Dataset, Cleaned Version is a text corpus published on Kaggle. The title suggests it contains historical news articles in Vietnamese that have undergone a cleaning process. Metadata is minimal; actual content, size, and collection methods require verification after download.

Tags: Text, News Articles, Historical News, Text Data, Vietnamese Language (+1 more)

Twitter Data Prepared for Naive Bayes Classification

Twitter data processed for use with a Naive Bayes classifier. The dataset is hosted on Kaggle, but its source, size, and creation details are unknown. It likely contains tweet text formatted for classification tasks.

Tags: Text, Naive Bayes, Social Media, Text Classification, Twitter Data (+1 more)

Top Movie Recommendation Datasets

Kaggle hosts a collection of datasets focused on movie recommendations. The specific content, scale, and origin of the data are not detailed in the provided metadata. Users must download the datasets to inspect the actual records, features, and data quality.

Tags: Tabular, Movie Recommendation, Collaborative Filtering, Entertainment (+1 more)

Movies Dataset Arpit

Movies Dataset Arpit is a dataset published on Kaggle. Its title suggests it contains information related to films. The dataset's specific content, size, and origin are not detailed in the provided metadata.

Tags: Tabular, Movies, Film, Entertainment (+1 more)

HinFakeNews: Hindi Fake News Dataset

HinFakeNews is a dataset focused on fake news detection in Hindi. It is hosted on Kaggle, but details about its size, creation date, and authorship are not provided in the available metadata. It likely contains text samples labeled as real or fake news for model training.

Tags: Text, Hindi Language, Fake News, Text Classification, Natural Language Processing (+1 more)

WeART: 280,000 Artworks Across 152 Styles and 1,556 Artists

WeART provides 280,000+ artworks labeled with 152 styles and 1,556 artists, published by ZexiJia in 2026. It functions as a multimodal benchmark for artistic style analysis, addressing gaps in cultural coverage and annotation completeness found in previous art collections.

Tags: Multimodal, Benchmark, Computer Vision, Style Retrieval, Cultural Analysis, Art, Artistic Style, ICASSP 2026; Format: WebDataset; Modalities: text, image; Language: en; Task category: image classification; Size: 100K<n<1M; Libraries: webdataset, mlcroissant, datasets; License: other; Region: us; arXiv: 2601.17697 (+1 more)

movies.csv

movies.csv is a dataset hosted on Kaggle. Its specific content, size, and provenance are not detailed in the available metadata. The dataset likely contains information related to films, such as titles, genres, or ratings.

Tags: Tabular, Film Data, Movies, Entertainment (+1 more)

Movies and TV Shows Catalog from 2020

Kaggle hosts a dataset listing movies and television shows. The dataset likely contains titles and associated metadata for media released or cataloged in the year 2020. Its specific contents, such as cast, genre, or ratings, require verification after download.

Tags: Tabular, Movies, Media Catalog, TV Shows, Entertainment (+1 more)

Review Checkpoints: Model Evaluation Data

Review-checkpoints--2026-05-17--13256-13256 is a dataset published on Kaggle. Its title suggests it contains information related to checkpoints, likely for evaluating or reviewing machine learning models. The actual content, scale, and structure require verification after download.

Tags: Tabular, Machine Learning, Model Evaluation, Review Checkpoints (+1 more)

Upcoming Movies for 2026 from The Movie Database

This dataset lists films scheduled for release in 2026, fetched via the API of The Movie Database (TMDB). The number of records and the data fields are not specified, and the original author, organization, and last update date are unknown.

Tags: Tabular, Movies, TMDB, Upcoming Releases, Entertainment (+1 more)

Data Review Clustering for Text Analysis

Data_review_clustering is a dataset hosted on Kaggle. Its title suggests it contains textual reviews intended for clustering analysis. The dataset's specific content, size, and origin are not detailed in the available metadata.

Tags: Text, Review Analysis, Text Clustering, Customer Feedback, Unsupervised Learning (+1 more)

Arabic Fake News Data AFND: News Articles with Veracity Labels

Arabic Fake News Data AFND is a dataset hosted on Kaggle. Its title suggests it contains Arabic-language news articles labeled for veracity. The specific number of articles, collection method, and authorship details are not provided in the available metadata.

Tags: Text, Arabic Language, Fake News, Text Classification (+1 more)

TMDB Top Movies Rating

TMDB Top Movies Rating is a dataset published on Kaggle. The title suggests it contains ratings for popular movies sourced from The Movie Database (TMDB). The dataset's specific content, size, and authorship are unknown.

Tags: Tabular, Film Data, Movie Ratings, TMDB, Entertainment (+1 more)
