DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

10,967 datasets

AI Damage Claim Reviewer Dataset

A dataset likely containing text data related to the review of damage claims, potentially for insurance or property assessment. It was published on Kaggle, but its specific origin, size, and creation date are unknown. The dataset's content and structure must be verified after download.

Text, Damage Claims, Review, Artificial Intelligence, Insurance, Natural Language Processing, +1

Top Rated Movie Dataset from Kaggle

Top Rated Movie Dataset is a collection of movie information and ratings published on Kaggle. The dataset's specific size, columns, and creation date are unknown. Its content likely includes titles and user or critic ratings.

Tabular, Ratings, Movies, Entertainment, +1

StrataSynth Cross-Cultural Negotiation Benchmark: B2B Deals in GB and US

A benchmark dataset for cross-cultural negotiation analysis. It contains records of the same B2B deal and the same negotiators, with the country context changed between Great Britain and the United States. The dataset appears to be designed for controlled comparison of negotiation behaviors across these two cultural settings.

Text, B2B Deals, Benchmark Data, Behavioral Analysis, Benchmark, Cross Cultural Negotiation, +1

Halftide Rocks Air Pressure Time Series 2000-2009

The Australian Ocean Data Network provides air pressure measurements from the Halftide Rocks AWS weather station. The dataset covers just over nine years, from 26 July 2000 to 19 December 2009, and was collected by deployed weather sensors.

Time Series, 🇦🇺 Australia, Air Pressure, Weather Station, Meteorology, +1

SlopReview: AI vs Human Writing Distinction Dataset

A curated dataset for training models to distinguish between AI-generated 'slop' and quality human writing. It was created by feeding 200 prompts from ChaoticNeutrals/Reddit-SFW-Writing_Prompts_ShareGPT into various LLMs and comparing responses. The dataset was authored by DrRiceIO7 and last updated on March 24, 2026.

Text, optimized-parquet, Parquet, Size Categories: 10K<n<100K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Text Classification, AI Detection, Region: US, Writing Quality, Slop Detection, Synthetic, +1

NASDAQ Stock Data With News And Fundamentals

A 10-year dataset builder for NASDAQ market data, created by HaiwenWang. It includes daily and hourly OHLCV data, with optional news and ticker-level fundamental data attachments. The dataset page was last updated in April 2026.

Tabular, Time Series, Stock Data, Fundamental Data, News Data, OHLCV, Financial Markets, +1

Mashable News Articles with Text and Auxiliary Features for Channel Prediction

Mashable.com news articles are used to predict their publishing channel based on title text and auxiliary numerical features. The dataset originates from the UCI Machine Learning Repository's Online News Popularity collection and was referenced in a 2021 arXiv preprint benchmarking multimodal AutoML. Authors include Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J. Smola.

Text, Tabular, Multimodal, Machine Learning, Content Analysis, News Classification, Text Analysis, Multimodal Features, Text Classification, News Media, Communication, +1

Mashable News Article Popularity Prediction for AutoML Benchmarking

A challenging tabular dataset for predicting the log-scaled popularity of Mashable.com news articles based on title text and auxiliary numerical features. The dataset, sourced from a 2021 arXiv paper, is intended as a difficult benchmark for AutoML systems. Authors include Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J. Smola.

Tabular, Machine Learning, News Articles, Popularity Prediction, Social Media, Benchmark, Tabular Data, AutoML Benchmark, Online Media, News Popularity, Text Features, +1

Mashable News Articles with Text and Auxiliary Features for Channel Prediction

The news_channel dataset is intended for predicting which Mashable.com news category an article belongs to, based on its title text and auxiliary numerical features. The original data was collected for the Online News Popularity dataset hosted by the UCI Machine Learning Repository. This version was referenced in the paper 'Benchmarking multimodal automl for tabular data with text fields' by Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J. Smola.

Text, Tabular, Multimodal, Content Analysis, News Classification, Text Analysis, Multimodal Features, Text Classification, News Media, Communication, +1

Google Review Policy Violation Audit Benchmarks

Google review policy violation audit data by BHMarketer.ai. The dataset likely contains records of reviews flagged for violating Google's policies. The specific scope, size, and collection period are not detailed.

Tabular, Policy Violation, Google Reviews, Content Moderation, Audit Data, +1

IMDB Movie Reviews for Sentiment Analysis

IMDB reviews likely contain user-generated text for movies. The dataset is hosted on Kaggle, a platform for data science competitions and projects. Specific details such as the number of reviews, time range, and collection method are not provided in the available metadata.

Text, Sentiment Analysis, Movie Reviews, Natural Language Processing, +1

Water Corporation Sewer Gravity Pipes Network

Sewer pipes operated by the Water Corporation that have no pumps or pressure systems connected. The dataset includes features such as gravity flow, wastewater type, and asset ownership. It was last updated by the Water Corporation in March 2026.

Gravity, Wastewater, Water Corporation, Sewer, +1

Reddit 10K Hye: Armenian-Language Reddit Posts for Embedding Training

Metric-AI's Reddit Armenian Dataset is a subset of Reddit content containing titles and bodies translated into Armenian. The dataset was created using the Gemma-2-27B-it model and is intended for training Armenian text embedding models. It was last updated on March 25, 2026.

Text, Reddit, Natural Language Processing, Armenian Language, Synthetic Data, Synthetic, Text Translation, +1

Global Company Ratings and Employee Reviews Across Sectors

Global Company Ratings & Employee Reviews contains employee ratings, sentiment tags, and workplace culture metrics. The dataset appears to be sourced from Kaggle, though the original author and organization are unknown. The last update date and specific data volume are not provided.

Tabular, Employee Sentiment, Workplace Culture, Company Ratings, Natural Language Processing, +1

Sportsfields in Moreton Bay Region

A collection of sportsfields from the City of Moreton Bay's Data Hub, last updated on 2026-03-23. The dataset, created by moretonbaygis, is available in multiple formats including XLSX, CSV, and GeoJSON.

Autogen, +1

Facebook Scraper Data: Social Media Content

Facebook-scraper_data likely contains information extracted from Facebook's public pages or groups. The dataset is hosted on Kaggle, but its specific contents, size, and creation details are unknown. Columns, sample data, and authorship information are not provided in the metadata.

Tabular, Web Scraping, Social Media, Facebook, +1

Fake Reviews Dataset

fake-reviews-dataset is a text dataset hosted on Kaggle. The dataset likely contains examples of fake reviews, which could be used for training models to detect deceptive or inauthentic text. Its specific size, origin, and creation date are unknown.

Text, Sentiment Analysis, Natural Language Processing, Fake Reviews, +1

Trending Movies from The Movie Database

Trending movies data sourced from The Movie Database (TMDb) and published on Kaggle. The dataset's specific size, columns, and update frequency are not detailed in the provided metadata. Users should verify the actual content and structure after download.

GPC Compressive Strength Dataset

A dataset concerning compressive strength, likely related to materials such as concrete or composites. It is hosted on Kaggle, but its author, creation date, and specific scope are not detailed in the provided metadata. The actual data content, including the number of records and specific features, requires verification after download.

Tabular, Engineering Properties, Compressive Strength, Concrete, Materials Science, +1

Cleaned Reddit Pushshift Posts and Submissions for Analytics

A cleaned and structured version of the raw Reddit Pushshift dump, transformed into columnar Parquet files. The dataset includes both Reddit submissions and comments, prepared by the author 'anhchanghoangsg'. It was last updated on March 23, 2026.

Text, Tabular, Reddit, Social Media, Online Discussion, Region: US, Large Scale, Text Corpus, +1
