Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

10,999 datasets

Pennsylvania Frontier Negotiations and Cultural Encounters, 1680s-1750s

James H. Merrell's account details the lives and work of cultural go-betweens on the Pennsylvania frontier. The text covers the period from the Quaker colony's founding in the 1680s into the 1750s, examining efforts to maintain peace between European colonists and Native Americans. It reflects on wilderness meanings and the eventual failure of diplomacy leading to war after 1750.

Text, Frontier, History, Ecology, Bridge Graph Theory, Wilderness, Environmental Ethics, Archaeology, Law, Ethnology, Diplomacy, Sociology, Fur Trade, Economic History, Political Science, CULTURAL STUDIES, Politics

Latino Acculturation Orientations: 4,785 Records from the 2006 LNS

Jessala Grijalva developed this replication package in 2026, applying Gaussian Mixture Model clustering to 4,785 records from the 2006 Latino National Survey. The data identifies four distinct acculturation orientations—Culture Affirming, Assimilationist, Demicultural, and Bicultural—using a bootstrap-validated inferential framework. It includes the full R/Quarto analysis pipeline and processed data artifacts for two political science manuscripts.

Latino National Survey, Social Sciences, Latino political behavior, Assimilation, Cluster Analysis, Bootstrap Validation, Acculturation

Big Cypress National Preserve Water Quality GIS Layers at 1:100,000 Scale

Small-scale GIS data layers compiled by the National Park Service for a Baseline Water Quality Data Inventory and Analysis Report. The layers depict locations of water quality monitoring stations, industrial discharges, drinking intakes, water gages, and water impoundments within Big Cypress National Preserve. Data was last updated on March 4, 2026.

Geospatial, Hydrology, Water, Big Cypress National Preserve, Benchmark, Hydrography, Water Quality, National Park, Ochopee, Florida

Chaco Culture National Historical Park Water Quality GIS Layers

Small-scale GIS data layers compiled by the National Park Service for a Baseline Water Quality Data Inventory and Analysis Report. The layers were used to map locations of water quality monitoring stations, industrial discharges, drinking intakes, gages, and impoundments based on EPA databases. Data includes features like roads, hydrography, and political boundaries, generally at a 1:100,000 scale.

Geospatial, Hydrology, NM, Water, Nageezi, Benchmark, Hydrography, Water Quality, National Park, Chaco Culture National Historical Park

Infini News Corpus: Multilingual News Articles from 2021-2025

INFINI-NEWS Corpus is a large-scale multilingual collection of news articles extracted from Common Crawl News archives. The dataset, created by author 'ruggsea', contains articles from 2021 to 2025, with partial statistics showing 242 GB of data for 2021 and 356 GB for 2022. It was last updated on the platform in February 2026.

Text, Multilingual, Computational Journalism, Media Studies, News Corpus, Large Scale, Natural Language Processing, Multilingual News

Seattle Soccer Fields Maintained by Parks and Recreation

Seattle Parks and Recreation maintains a dataset of soccer fields, published as a hosted feature layer from the DPR.AthleticsFields feature class. The data is filtered using a definition query (WHERE SOCCER > 0) and is refreshed weekly. The number of fields, rows, and columns is not stated in the source metadata.

Fields, Seattle GIS Open Data, Parks, Soccer, SPR, Common Data Layers, Sports

Review Checkpoints: Model Evaluation Data for Machine Learning

Kaggle dataset titled 'review-chekpoints--2026-05-29--13268-13268'. The dataset's content likely relates to checkpoints or evaluations for machine learning models, as suggested by its platform tags. Metadata is minimal; the actual data content and structure require verification after download.

Tabular, Machine Learning, Model Evaluation, Review Checkpoints

Seattle Football Fields Inventory by Parks Department

Seattle Parks and Recreation maintains this dataset of football fields, filtered from a broader athletics fields feature class. It is updated weekly, though the specific number of field records is not provided. The data includes geographic features and is available in multiple formats including CSV, GeoJSON, and KML.

Fields, Seattle GIS Open Data, Football, Parks, SPR, Common Data Layers, Sports

Seattle Parks Baseball And Softball Field Locations

Seattle Parks and Recreation maintains a list of baseball and softball fields. The data is filtered from a larger athletics feature class using the query 'WHERE BASEBALL > 0' and is updated weekly.

Fields, Softball, Seattle GIS Open Data, Parks, SPR, Baseball, Common Data Layers, Sports

Wine Reviews with Tasting Descriptions and Origin Features

Wine reviews from sommeliers, likely containing text descriptions for tasting notes and structured features like price and country of origin. The dataset was originally collected from WineEnthusiast and compiled by authors Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J. Smola for a benchmarking paper on multimodal AutoML.

Tabular, Consumer Reviews, Wine Reviews, Reviews, Food And Beverage, Quality Scoring, Tasting Notes, Food Beverage, Wine

Wine Reviews with Tasting Descriptions, Price, and Country of Origin

WineEnthusiast reviews collected for a machine learning benchmark. The dataset likely contains tasting descriptions from sommeliers and features like price and country-of-origin. Authors Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J. Smola published the dataset in a 2021 arXiv paper on multimodal AutoML.

Tabular, Multimodal, Consumer Reviews, Wine Reviews, Reviews, Food And Beverage, Quality Scoring, Tasting Notes, Food Beverage, Wine, Tabular Text

FOMC Press Releases: Federal Reserve Monetary Policy Statements

FOMC press releases published on Kaggle. The dataset likely contains official statements and announcements from the Federal Open Market Committee. The specific number of documents, time range, and original source are not detailed in the provided metadata.

Text, Monetary Policy, Federal Reserve, Text Data

California Independent Medical Review Determinations Since 2001

Independent Medical Review (IMR) decisions from the California Department of Managed Health Care, covering all determinations administered since January 1, 2001. The dataset documents reviews of health plan denials for services deemed not medically necessary, experimental, or non-urgent.

Tabular, English, ZIP, CSV, Health Insurance, Healthcare Policy, Patient Rights, Healthcare, United States, Medical Review

California Health Plan Premium Rate Filings Since 2011

This California Department of Managed Health Care dataset contains all proposed health plan premium rate filings submitted since January 1, 2011. It supports public transparency and accountability in health insurance rate setting. Row and column counts are not stated in the source metadata.

Tabular, English, ZIP, CSV, Health Insurance, Public Policy, Healthcare, United States, Premium Rates

Marine CO2 Seep Experiment Technology Review

June 2013 review details four submarine geolocation technologies for a 2012 CO2 release experiment offshore Oban, Scotland. The QICS1 experiment involved 200 instrument deployments, collection of 1,300 samples, and placement of 24 seabed indicator cages. The report compares audio (acoustic) and visual (photography, video) techniques for locating CO2 bubble streams and equipment.

Audio, Video, Underwater Acoustics, United Kingdom, Marine Geology, Carbon Capture

CO2/Brine Relative Permeability and Residual Trapping Data

British Geological Survey research analyzes residual saturation trapping of CO2 in sandstone reservoirs. Experimental results indicate 13–92% of injected CO2 can be residually trapped, providing evidence for storage security assessments. The data supports modeling of leakage event probabilities and financial mechanisms for carbon capture and storage projects.

Text, Tabular, Sandstone, Brine, Finance, Permeability, Carbon Capture, Geological Storage

Movies Dataset from Kaggle

Kaggle hosts a dataset titled 'movies'. The dataset's content likely pertains to films, but specific details such as the number of records, included features, and its origin are not provided in the available metadata. The platform tags suggest it is structured as tabular data.

Tabular, Movies, Entertainment

ThaiSafetyBench: 1,889 Malicious Prompts for Thai LLM Safety Evaluation

ThaiSafetyBench contains 1,889 malicious Thai-language prompts developed by typhoon-ai in 2026 to evaluate the safety of large language models. The collection combines translated global safety benchmarks with original prompts specifically designed to test culturally specific attack vectors unique to the Thai context.

Parquet, size_categories:1K<n<10K, library:polars, task_categories:question-answering, language:th, modality:text, arxiv:2603.04992, library:mlcroissant, library:datasets, library:pandas, region:us, license:apache-2.0

VietNews-Summarizer: Vietnamese News Articles for Summarization

VietNews-Summarizer is a dataset published on Kaggle. The title suggests it likely contains Vietnamese-language news articles paired with summaries. The dataset's creator, size, and specific contents are not detailed in the available metadata.

Text, Vietnamese News, Natural Language Processing, Text Summarization

Corporate Cyber Threat OSINT from Twitter and LinkedIn

Corporate Cyber Threat OSINT: Twitter & LinkedIn is a dataset likely containing open-source intelligence data gathered from social media platforms. The dataset is hosted on Kaggle, but its specific content, size, and creation details are not provided. Its columns, sample data, and update history are unknown.

Text, Cybersecurity, Social Media, OSINT, Corporate Threat Intelligence
