DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

11,008 datasets

Languages By Country: Global Mapping of Official and Common Tongues

A mapping of official and commonly spoken languages for countries worldwide, containing fewer than 1,000 records. Created by brandontravel and updated in March 2026, it serves as a reference for developers building travel and translation tools.

JSON, Library: polars, Language: en, License: cc0-1.0, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Countries, Languages, Travel, Region: us, Geography, Task Categories: text Classification, Task Categories: translation, +1

FanChuan: Multilingual Graph-Structured Parody Detection Benchmark

FanChuan is a multilingual, graph-structured benchmark containing between 10,000 and 100,000 records for parody detection and analysis on social media, published by Ziyi510 in 2025. The data covers six distinct subsets including Reddit-Trump, Tiktok-Trump, and CampusLife in both English and Chinese.

CSV, Size Categories: 10K<n<100K, Library: polars, Modality: text, Modality: tabular, Library: mlcroissant, arXiv: 2502.16503, Library: datasets, Library: pandas, Region: us, License: mit, +1

Top 250 IMDB Rated Movies

A list of the 250 highest-rated titles on the IMDB platform. The dataset is hosted on Kaggle, but its columns, source, and update history are not detailed in the available metadata. Its collection method and time period are also unknown.

Tabular, Ratings, Movies, Imdb, Top Lists, +1

Top 250 IMDB Movies

Top 250 IMDB Movies is a dataset published on Kaggle. It likely contains a ranked list of films based on user ratings from the Internet Movie Database. The specific columns, data volume, and creation details are not provided in the metadata.

Tabular, Ratings, Movies, Imdb, Entertainment, +1

Steam Player Reviews for Sentiment Analysis, 20,000+ Reviews

20,000+ cleaned player reviews curated for sentiment analysis and text mining. The dataset likely contains textual feedback from users of the Steam gaming platform. Its author, organization, and last update date are unknown.

Text, Sentiment Analysis, Text Mining, Game Reviews, Steam, Natural Language Processing, +1

SIB-200: Multilingual Topic Classification Dataset for 205 Languages

SIB-200 is a multilingual topic classification dataset covering 205 languages and dialects. It is based on the human-translated Flores-200 corpus, with topic annotations originally provided in English for categories like science/technology, travel, and politics. The dataset was created by the mteb organization and last updated in February 2026.

Text, Language Creators: expert Generated, Machine Translation, Language: amh, Language: ajp, Language: aka, Language: asm, Language: acm, Language: afr, Multilinguality: translated, Language: arz, Benchmark, Language: ary, Language: als, Healthcare, Language: arb, Language: acq, Annotations Creators: expert Annotated, Topic Classification, Language: ast, Large Scale, Language: apc, Language: ace, Task Categories: text Classification, Multilingual Text, Language: ars, +1

MovieLens 25M: Movie Ratings and Tags

Movie ratings and tags from the MovieLens platform, likely containing 25 million records. The dataset was published on Hugging Face by alitourani and was last updated on 2026-04-10. The specific columns, file formats, and license are currently unknown.

Tabular, Recommender System, Movie Ratings, Collaborative Filtering, +1

USA Sports and Hockey Business Contacts Database

A business-to-business contact list focused on the sports and hockey industries in the United States. The dataset title indicates it contains 20 leads. It is published on Kaggle, but the author, source organization, and collection methodology are unknown.

Tabular, Sports Business, B2b Leads, Usa Business, +1

Monthly Public Information Requests in Ukraine Under Access Law

Monthly reports on the receipt of information requests submitted under Ukraine's 'On Access to Public Information' law. The dataset is provided by the States site of Ukraine and was last updated on March 3, 2026. It is available in multiple formats including Word, Excel, and CSV.

Tabular, CSV, Government Transparency, Freedom Of Information, Ukraine, Public Administration, +1

TikTok Search Cards Data from Hugging Face

TikTok Search Cards data, published on the Hugging Face platform by author qiyang666. The dataset was last updated on April 10, 2026. Its specific content, scale, and structure require verification after download.

Tabular, Social Media, User Behavior, Tiktok, Search Data, +1

PHI-Twin: 1MHz Wall Pressure Traces for Detonation Wave Stability

High-frequency wall pressure traces sampled at 1 MHz for analyzing detonation wave stability. The data is intended for use in Remaining Useful Life prognosis models using CNN-BiLSTM architectures. The author, organization, and specific data volume are unknown.

Time Series, Pressure Data, Computer Vision, Detonation Waves, Cnn Bilstm, Prognostic Health Management, +1

Netflix Movies and TV Shows Catalog

Netflix Movies and TV Shows is a dataset from Kaggle. The dataset likely contains information about titles available on the Netflix streaming platform. The specific content, size, and origin details are not provided in the available metadata.

Tabular, Movies, Streaming, Tv Shows, Entertainment, +1

Fake News Detection Dataset in Bengali and English

A multilingual text dataset for fake news detection, containing content in both Bengali and English. It is hosted on Kaggle, but the author, organization, and creation details are unspecified. The dataset's size, specific contents, and collection methodology are not described in the available metadata.

Text, Bengali Language, Fake News Detection, English Language, Multilingual Text, +1

CK Dataset: Facial Expressions

The CK-Dataset-Face-Expressions dataset is hosted on Kaggle. It likely contains images of human faces depicting various emotional expressions. Its specific scale, creation details, and update history are not provided in the available metadata.

Image, Computer Vision, Face Expressions, Facial Recognition, +1

Celle Development Plan No. 4 Scheuen: Riding and Shooting Sports Area

A geospatial dataset detailing the development plan for the 'Riding and shooting sports area Scheuen' in the city of Celle. It is provided via a Web Feature Service (WFS) in the INSPIRE PLU data format version 4.0.1. The dataset is maintained by the Bundesamt für Kartographie und Geodäsie and was last updated on March 3, 2026.

Geospatial, Inspire, Celle, Land Use, Urban Planning, +1

Southern Ocean Temperature and Salinity Measurements from Min Fang Bay, 1984-1985

Temperature, salinity, sigma_t, and pressure measurements from Min Fang Bay in the Southern Ocean. The data were collected using a CTD instrument from an unknown platform and contributed by NOAA's National Centers for Environmental Information (NCEI). Measurements were taken between 1984 and 1985.

Tabular, Time Series, Ctd Measurements, Oceanography, Salinity, Southern Ocean, Temperature, +1

Arctic Ocean CTD Measurements from the North Water Polynya Project, 1997-1999

Temperature, salinity, conductivity, pressure, and transmissivity data gathered from 1997 to 1999 by Conductivity-Temperature-Depth (CTD) instruments on Canadian Coast Guard ships. The data were collected as part of the North Water Polynya project and are archived by NOAA's National Centers for Environmental Information (NCEI).

Tabular, Time Series, Ctd Measurements, Oceanography, Physical Oceanography, Arctic Ocean, +1

Indian Ocean Temperature and Nutrient Profiles from 1992 to 1993

Temperature profile, pressure, nutrient, and biological data collected from the R/V TYRO in the Indian Ocean. The data span May 1992 to February 1993. NOAA_NCEI is the authoritative organization for this dataset.

Tabular, Time Series, Oceanography, Indian Ocean, Temperature Profile, Nutrients, Ctd Casts, +1

News Emotion Analysis Across Daylight Conditions

Replication data for research on how daylight influences news emotion, used in Studies 2 through 4. The data was created by author Jiaxin Li for the project 'How daylight shapes news emotion'. It was last updated on April 3, 2026.

Tabular, Media Psychology, Social Sciences, Social Science Data, Daylight Effects, News Emotion, +1

Review Checkpoints Data from May 2026

A dataset titled 'review-chekpoints--2026-05-21--13260-13260' was published on Kaggle. The title suggests it likely contains textual review data, possibly with checkpoint or versioning information. Metadata is minimal; the actual content, scale, and structure require verification after download.

Text, Text Analysis, Reviews, Checkpoints, +1
