DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

11,020 datasets

Reddit NSFW Writing Prompts from ShareGPT

A collection of Reddit NSFW writing prompts, likely sourced from ShareGPT conversations. The dataset was uploaded to Hugging Face by the author 'lipilipic' and was last updated on 2026-04-04 16:24:02. Its content, scale, and structure require verification after download.

Text, Reddit, Text Generation, Writing Prompts, Nsfw Content, +1 more

Movies Dataset from Kaggle

Movies is a dataset hosted on the Kaggle platform. The dataset's specific content, size, and provenance are not detailed in the available metadata. Users must download the data to verify its scope, features, and potential applications.

Tabular, Movies, Media, Entertainment, +1 more

Sample News Dataset

A dataset of news content published on Kaggle. The title suggests it likely contains textual news articles or headlines. The author, organization, and specific temporal coverage are unknown.

Text, News, Media Content, Text Data, +1 more

Asian Facial Expressions Dataset for Emotion Recognition

A facial expression dataset focused on Asian demographics, described as containing high-fidelity imagery. The dataset is hosted on Kaggle, but details about its size, creation date, and authorship are not provided.

Image, Emotion Recognition, Computer Vision, Asian Demographics, Facial Expressions, +1 more

Cultural Capital Indicators for Renewable Energy Case Study Counties

Public data on cultural capital for selected counties designated as the most renewable in eight U.S. economic regions. It assesses community resources such as libraries, religious proclivities, ethnic heritage, language use, festivals, museums, symbolism, and education. The dataset was authored by Michael Petersen and is hosted on Harvard Dataverse.

Arts And Humanities, Social Sciences, Libraries Renewable Energy Museums Festivals Educa, Earth and Environmental Sciences, +1 more

Crystal Math Preview: 1,000-10,000 Olympiad and Competition Math Problems

Crystal Math Preview is a collection of 1,000 to 10,000 mathematical reasoning problems released by ycchen in February 2026 to accompany a research preprint. The dataset focuses on olympiad and competition-level mathematics, featuring specialized configurations derived from high-reasoning budget rollouts. It serves as an early-access version of a larger planned release for training and evaluating mathematical reasoning models.

Parquet, Size Categories: 1K<n<10K, Task Categories: text-generation, License: other, Library: polars, Language: en, Olympiad, Modality: text, Modality: document, Modality: tabular, Library: mlcroissant, Library: datasets, Competition Math, Library: pandas, Rlvr, Region: us, Reasoning, Math, +1 more

Finetuned Steam Reviews for Text Analysis

Finetuned-steam-reviews is a text dataset sourced from Kaggle. The dataset likely contains user reviews from the Steam gaming platform, potentially processed or annotated for machine learning tasks. Its specific size, author, and update history are not provided in the available metadata.

Text, Sentiment Analysis, Game Reviews, Text Data, +1 more

Predicted Land Cover Change Scenarios for 127 Welsh Upland Catchments

The Environmental Information Data Centre provides predicted outcomes for land use change scenarios across 127 sub-catchments in upland Wales. The data give the projected maximum and minimum change for 10 land-cover types, based on factors such as agricultural land quality and ownership. This work was part of the NERC-funded DURESS project and uses underlying mapping data from 1998-2007.

Catchments, Diversity in Upland River Ecosystem Service Sustai, Biodiversity Ecosystem Service Sustainability B, Uplands, Land Cover, Scenarios, +1 more

wav2vec2: Base Speech Recognition Model

wav2vec2 is a machine learning model for speech recognition. The dataset likely contains audio data and corresponding model weights or training artifacts. It is published on Kaggle under the identifier 'facebook/wav2vec2-base'.

Audio, Machine Learning, Audio Processing, Speech Recognition, +1 more

Top Rated Movies from The Movie Database

Top Rated Movies data was collected using the TMDB API. The dataset likely contains information on films with high user ratings. The specific number of rows, columns, and last update date are unknown.

Tabular, Movies, Tmdb, Entertainment, +1 more

Reviews Dataset from Kaggle

reviews_dataset is a text dataset hosted on Kaggle. The dataset likely contains user-generated review content. Its specific size, origin, and detailed contents are not described in the available metadata.

Text, Reviews, Sentiment Analysis, Text Data, +1 more

Medical News Articles Collection

medical_news_vi is a dataset of medical news articles published on Kaggle. Its size, source, and time period are not detailed in the available metadata. It likely contains text from medical news sources.

Text, Healthcare Text, News Articles, Medical News, +1 more

Sentimentanalysdata-facebook/nlbb: Social Media Sentiment Data

Sentimentanalysdata-facebook/nlbb is a dataset published on Kaggle. The title suggests it contains data from Facebook for sentiment analysis. The dataset's specific content, size, and creation details require verification after download.

Text, Social Media, Sentiment Analysis, Natural Language Processing, +1 more

Steam Game Reviews for Fine-Tuning Language Models

A dataset of Steam game reviews intended for fine-tuning models. The data was published on Kaggle. The specific volume, time range, and collection methodology are unknown from the provided metadata.

Text, Text Analysis, Sentiment Analysis, Game Reviews, +1 more

In-Vehicle Display Icons Literature Review: 200+ Articles and 100+ Websites

A literature review by Cher Carney for Battelle's guideline development project, analyzing over 200 articles, several books, and more than 100 websites on in-vehicle information system (IVIS) symbols. The report synthesizes findings on icon design, standards, and evaluation methods, concluding with five key points about the state of IVIS icon development. It includes 7 appendices, 88 figures, and 7 tables across 247 pages.

Text, Humancomputer Interaction, Computer Science, Information Display, World Wide Web, Literature Review, Benchmark, Icon Design, Computer Graphics Images, Automotive Ui, Art, In Vehicle Systems, Visual Arts, Human Computer Interaction, +1 more

COVID-19 Fake News Headlines with Binary Fact-Check Labels

A collection of COVID-19-related headlines and claims shared across the internet, each labeled for veracity. The dataset was published by Sumit Banik in response to research demand for a combined fake news resource. It contains a binary outcome column where 0 indicates a fake headline and 1 indicates a true one.

Text, Tabular, Medicine, Internet Privacy, 2019 20 Coronavirus Outbreak, Computer Science, Coronavirus Disease 2019 Covid 19, Misinformation, Biology, Fake News, Virology, Text Classification, Covid 19, Geography, Severe Acute Respiratory Syndrome Coronavirus 2 Sa, Outbreak, Infectious Disease Medical Specialty, +1 more

Social Media Text Classification Review and Analysis

A review paper discussing text classification techniques for social media data. The paper, attributed to IOSR Journals, examines data from platforms such as Facebook, Twitter, LinkedIn, and YouTube, which include user sentiments and opinions. It compares different machine learning classifiers for extracting meaningful information from informal, unstructured text.

Text, Machine Learning, Internet Privacy, Microblogging, Computer Science, Data Science, Psychology, Social Media, Feeling, Sentiment Analysis, World Wide Web, Text Classification, Sentence, Artificial Intelligence, Natural Language Processing, Information Retrieval, +1 more

MUSDB18-HQ: 150 Uncompressed Music Tracks for Source Separation

MUSDB18-HQ is an uncompressed audio dataset containing 150 full-track songs across different styles, created by Zafar Rafii et al. in 2019. It provides stereo mixtures and isolated sources (vocals, bass, drums, other) for 100 training and 50 test songs, encoded as 44.1 kHz WAV files. The dataset serves as a reference for designing and evaluating source separation algorithms and was used in the SiSEC 2018 campaign.

Audio, Machine Learning, Uncompressed Video, Programming Language, Computer Science, Geology, Benchmark, Source Separation, Signal Processing, +1 more

Review Checkpoints: Model Evaluation Data from Kaggle

Kaggle hosts a dataset titled 'review-chekpoints--2026-05-07--13246-13246'. The title suggests it likely contains evaluation data or metrics for machine learning model checkpoints. No further metadata on size, source, or specific content is available.

Tabular, Machine Learning, Model Evaluation, Review Checkpoints, +1 more

Several Medical News Articles

A collection of medical news articles. The dataset is hosted on Kaggle, but its specific source, size, and creation date are unknown. Columns and sample data are not provided in the metadata.

Text, Healthcare Text, News Articles, Medical News, +1 more
