DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

10,984 datasets

Community Solar Projects in the United States

This dataset lists community solar projects identified from various sources as of Spring 2018. It includes project attributes such as State, Service Territory, and System Capacity. The database is maintained by the Department of Energy's National Renewable Energy Laboratory (NREL).

State, Service Territory, Ccsa, System Capacity, Community Solar, Coalition For Community Solar Access, Solar Arrays, Utilities, +1 more

Conditional Alcohol Label Approvals from Missouri ATC

Missouri's Alcohol and Tobacco Control (ATC) dataset of conditionally approved product labels submitted for review. The data covers labels submitted in the five business days before the current date that are in a submitted, in-review, or conditionally approved status. It is published by data.mo.gov and was last updated on 2026-02-24.

Tabular, CSV, XML, JSON, Atc, Alcohol, Product Labels, Malt, Alcohol Regulation, Conditional, Business Licensing, Labels, Approval, Government Approval, Liquor, Products, Wine, +1 more

New Zealand Aerial and Satellite Imagery with Historical Coverage

A collection of New Zealand's publicly owned aerial and satellite imagery, ranging from 5cm resolution in urban areas to lower-resolution full national coverage. The dataset includes historical imagery scanned from film, orthorectified, and georeferenced, provided as Cloud Optimised GeoTIFFs with STAC metadata. It is published by Toitū Te Whenua Land Information New Zealand under a CC-BY-4.0 license.

Image, Geospatial, Landscape Change, Satellite Imagery, Computer Vision, Earth Observation, Aerial Imagery, Cog, Stac, +1 more

World Prison Facility Locations with Source Provenance and Review Flags

Global prison facility locations with source provenance and review flags. The dataset is hosted on Kaggle, but the author, organization, and specific creation details are unknown. The last update date and data volume are also unspecified.

Tabular, Geospatial, 🌍 Global, Prison Facilities, Criminal Justice, +1 more

CCNews: 600 Million Multilingual News Articles from 2016 to 2024

600 million news articles from the Common Crawl archive, processed from 2016 to June 2024. The data has been cleaned, deduplicated, and includes language detection for articles in over 100 languages. This dataset was created by kareenamehta and is hosted on Hugging Face.

Text, Multilingual, Task Categories: text generation, Language: cy, Language: ar, Task Categories: question answering, Language: bs, Language: br, Language: ca, News Articles, Language: da, Language: el, Language: bn, Text Generation, Web Crawl, Language: bg, Language: cs, Language: be, Language: multilingual, Language: az, Language: as, Large Scale, Natural Language Processing, Language: am, Language: af, Task Categories: text classification, Language: de, Text Corpus, +1 more

Northern Hemisphere Daily Atmospheric Analyses from 1963 to 1972

NMC operationally produced daily gridded analyses for the Northern Hemisphere from August 1963 to December 1972. The dataset includes parameters like upper-level winds, surface temperature, sea-level pressure, tropopause pressure and temperature, and 500mb relative humidity. Data is structured on a 47x51 polar-stereographic grid centered on the North Pole.

Time Series, Geospatial, Gridded Observations, Climate Research, Atmospheric Science, Weather Analysis, Polar Data, +1 more

Bangla News Dataset

Banglanewsmm-dataset is a text corpus hosted on Kaggle. The dataset's title suggests it contains news content in the Bangla language. Specific details regarding its size, collection method, and authorship are unavailable from the provided metadata.

Text, News, Bangla, Media, +1 more

Netflix Movies and TV Shows Content Catalog

Netflix content data includes movies and TV shows with associated ratings and genres. The dataset likely contains information on popularity and content types for analysis. Its origin and specific size are not detailed in the provided description.

Tabular, Content Analysis, Movies, Tv Shows, Entertainment, +1 more

Fake News Detection Dataset for Text Classification

Kaggle hosts a dataset titled 'Fake-News-Detection'. The dataset likely contains text articles or statements labeled for veracity. Its specific size, origin, and creation date are unknown from the provided metadata.

Text, Media Analysis, Fake News, Text Classification, Natural Language Processing, +1 more

IMDB Movie Data

IMDB_Movie.csv is a dataset of movie information, likely sourced from the Internet Movie Database. The dataset's specific contents, such as columns for titles, ratings, or cast, are inferred from its name. It was published on Kaggle, but details on its creation, size, and update history are not provided.

Tabular, Film Ratings, Movies, Imdb, Entertainment, +1 more

PHEME: Twitter Rumours and Non-Rumours from Five Breaking News Events

The PHEME dataset is a collection of Twitter rumours and non-rumours posted during five breaking news events. It includes 1,972 rumours and 3,830 non-rumours across events such as the Charlie Hebdo attack, the Ferguson unrest, and the Germanwings crash. It was created by Arkaitz Zubiaga for the paper 'Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media'.

Text, Machine Learning, History, Twitter, Computer Science, Classifier Uml, Breaking News, Social Media, World Wide Web, Rumour Detection, Text Classification, Artificial Intelligence, Precision And Recall, Flagging, Conditional Random Field, +1 more

News Article on Hilton and Governor Newsom in California

California is the likely geographic focus of this dataset. The title suggests it contains text data related to news coverage of political interactions, specifically involving the Hilton entity and Governor Gavin Newsom. The dataset is hosted on Kaggle, but its specific content, size, and origin are not detailed in the provided metadata.

Text, News, California, Media, Politics, +1 more

Clinical Outcomes and Radiation Exposure in Endoscopic Spinal Surgery

A 2026 systematic review and meta-analysis by Jianbin Guan compares unilateral biportal endoscopy (UBE) and percutaneous transforaminal endoscopic discectomy (PTED) for treating far lateral lumbar disc herniation. The dataset contains aggregated results from multiple clinical studies, focusing on efficacy, safety, and radiation exposure metrics. It was published in the Jianbin Guan Dataverse.

Tabular, Systematic Review, Meta Analysis, Clinical Outcomes, Healthcare, Medical Research, Spinal Surgery, +1 more

BenchPreS: A Benchmark for Personalized Preference Selectivity in LLMs

BenchPreS is a benchmark for evaluating persistent-memory large language models. It pairs 10 user profiles with 39 recipient-task contexts across five formal communication domains. The dataset was created by sangyon and last updated on March 20, 2026.

Text, OPTIMIZED-PARQUET, Parquet, Llm Benchmark, Task Categories: text generation, Library: polars, Language: en, Text Generation, Size Categories: n<1K, Modality: text, Library: mlcroissant, Arxiv: 2603.16557, Library: datasets, Benchmark, Library: pandas, License: cc-by-nc-4.0, Communication, Context Awareness, Region: us, +1 more

Afghanistan News Articles, Latest Collection

Afghanistan news articles collected from unspecified sources. The dataset is hosted on Kaggle, but the author, organization, and specific collection method are unknown. Its size, format, and exact publication date are also unspecified.

Text, Afghanistan, Current Events, News, +1 more

Indian Company Reviews Dataset from Kaggle

A dataset of reviews for companies based in India. It is hosted on the Kaggle platform. The specific source, collection method, and volume of data are not detailed in the available metadata.

Tabular, 🇮🇳 India, Company Reviews, Business Sentiment, +1 more

Persian News Articles for Text Classification

Persian news articles likely organized for classification tasks. The dataset is hosted on Kaggle, but its specific size, creation date, and authorship are not detailed in the provided metadata. Columns and sample data are unknown, making a full assessment impossible without downloading the files.

Text, News Articles, Persian Language, Text Classification, Natural Language Processing, +1 more

Reddit Dataset 1: Social Media Posts and Comments

A dataset sourced from the Reddit platform, published on Kaggle. The specific content, scale, and collection methodology are not detailed in the available metadata. Further verification after download is required to confirm the dataset's exact composition and potential applications.

Text, Reddit, Social Media, Text Data, +1 more

Movies Dataset from Kaggle

A dataset related to movies, published on the Kaggle platform. The specific contents, scale, and origin are not detailed in the available metadata. Further details such as the number of records, specific features, and creation date require verification after accessing the data.

Tabular, Film Data, Movies, Entertainment, +1 more

Archaeological Linen Textile Analysis Data

Data from a textile analysis of archaeological linen textiles. The dataset is authored by Payton Becker and was last updated in March 2026. It is a small dataset of 17.8 KB with an unknown number of rows and columns.

Egypt, Textile Archive, Colonialism History, +1 more
