DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

11,012 datasets

Reddit Subreddit Posts Scraped from Any Community

Posts scraped from Reddit subreddits. According to the description, each post includes title, author, score, comments, flair, and text content. The dataset's author, size, and last update date are unknown.

Text, Web Scraping, Reddit, Social Media, Text Data

Reddit Historical Archive of Posts and Comments Spanning 10+ Years

Reddit posts and comments archived over a period of more than 10 years. The data is sourced via the PullPush service and includes full text content. The dataset is hosted on Kaggle, but specific details on volume, authorship, and licensing are not provided.

Text, Reddit, Social Media, Historical Archive, Text Corpus

Hacker News Who Is Hiring Job Listings Scraped from Monthly Threads

Hacker News Who Is Hiring Scraper contains structured job listings scraped from the monthly 'Who is Hiring?' threads on Hacker News. The dataset likely includes job titles, companies, and salary information posted by the community. It was scraped from the Hacker News platform, but the specific author, time range, and exact data volume are unknown.

Tabular, Tech Industry, Hacker News, Job Listings

Urdu Depression Severity Dataset: 4,000 Trilingual Twitter Posts

4,000 Twitter posts in Urdu, English, and Roman Urdu are labeled for depression severity. The dataset features 4-class severity labels verified by both large language models and human annotators. It was created in 2024-2025 and shared on Kaggle.

Text, Mental Health, Social Media, Multilingual NLP, Depression Detection

ViNewsFact: Vietnamese News Dataset for Multimodal Fact-Checking

ViNewsFact is a Vietnamese multimodal evidence dataset designed for retrieval and fact-checking tasks. It appears to contain news articles, likely with associated multimodal evidence. The author, organization, and specific scale are unknown.

Multimodal, News Articles, Multimodal Data, Fact Checking, Vietnamese Language, Information Retrieval

Hiligaynon News Articles from Hugging Face

Hiligaynon News Articles is a text dataset published on the Hugging Face platform by the user welyjesch. The dataset was last updated on 2026-04-09. Its content likely consists of news articles written in the Hiligaynon language, a major language of the Philippines.

Text, News Articles, Philippine Languages, Hiligaynon, Text Corpus

Poltava City Council Department of Culture, Youth and Family Telephone Directory

A telephone directory for the Department of Culture, Youth and Family of the Poltava City Council, published on a Ukrainian state website. The dataset was last updated in March 2026 and is available in spreadsheet and document formats.

Tabular, Ukraine, Contact Data, Government Directory, Public Administration, Culture

Review Checkpoints: Model Evaluation Data

Review checkpoints likely contain evaluation metrics or performance data for machine learning models. The dataset is hosted on Kaggle, a platform for data science and machine learning projects. The specific content, size, and origin of the data are unknown from the provided metadata.

Tabular, Machine Learning, Model Evaluation, Review Checkpoints

Turkish Podcast Merge Dataset

A collection of Turkish-language podcast content aggregated by the author 'yt-data-1'. The dataset was last updated on Hugging Face on April 3, 2026. The specific source, size, and content details are not provided in the available metadata.

Text, Audio, Audio Content, Podcasts, Turkish Language

ESWA Guardrail Evidence: Anonymous Review-Time Audit Logs

Anonymous review-time evidence packages for auditing LLM guardrails. The dataset appears to contain logs or records generated during the evaluation of large language models. Its provenance and scale are unspecified.

Text, LLM Safety, AI Governance, Audit Evidence

Coral Skeleton Amino Acid Composition Under Varying pCO2 and Temperature

Amino acid composition data for 39 coral skeleton samples from four massive Porites spp. genotypes. The samples were cultured in an aquarium under controlled seawater pCO2 levels of 180, 260, 400, and 750 µatm and temperatures of 25 and 28°C. Data were collected between August 2020 and December 2022 by researchers including Celeste Kellock and Nicola Allison, with interpretation by a team from the British Geological Survey.

Tabular, Ocean Acidification, Amino Acid, Large Scale, Biochemistry, Coral Skeleton, Marine Biology

Clinical Reasoning Remediation: 20 Studies on Resident Deficits (2000-2024)

This scoping review synthesizes 20 studies published between 2000 and 2024 regarding remediation strategies for clinical reasoning deficits in medical residents. Created by Jovian Philip Swatan, the data maps identification methods, interventions, and institutional barriers extracted from seven major medical databases including PubMed and MEDLINE.

Medicine, Health and Life Sciences

Brevard County Moms for Liberty Book Reviews

Reviews compiled by the Brevard County Chapter of Moms for Liberty, a political advocacy group. The dataset was authored by Jennifer D McGrew and last updated in March 2026.

Text, Book Reviews, Social Sciences, Political Advocacy

WikiArt Impressionism: Paintings from the WikiArt Database

An image dataset likely containing paintings from the Impressionist art movement, sourced from the WikiArt online encyclopedia. The dataset is hosted on Kaggle, but its specific scale, composition, and creation details are not provided in the available metadata. Further verification is required to confirm the exact number of images, artist coverage, and image attributes.

Image, WikiArt, Impressionism, Art History

Cyberpunk 2077 Steam Reviews Stratified by Patch Era

49,600 English-language Steam reviews of the video game Cyberpunk 2077. The reviews are stratified by patch era, likely reflecting player sentiment across different game updates. The dataset's author, organization, and license are unknown.

Text, Patch Analysis, English Language, Video Game Reviews

Dutch News Articles Published by NOS Since January 2010

Dutch news articles published by NOS, one of the biggest online news organizations in the Netherlands. The data was obtained by scraping the NOS website and includes articles from January 1, 2010 onward. Titles and content have been cleaned and normalized.

Text, News Articles, Dutch Language, Media, Text Corpus

Cathepsin D and G Expression in Human Fracture Hematoma and Neutrophil Phenotypes

Expression data for cathepsin D (CTSD) and cathepsin G (CTSG) from 58 human fracture hematoma samples collected 0-19 days post-trauma, and from neutrophils from five human donors polarized into N0, N1, and N2 phenotypes. It was created by Lu, Fangzhou to investigate the association of these cathepsins with fracture healing phases and specific neutrophil phenotypes. The data show that CTSD expression increased over healing time while CTSG remained constant, and that expression differed between N1 and N2 neutrophil phenotypes.

Medicine, Health and Life Sciences

Top-Rated Movies from Kaggle

Top-Rated_movies is a dataset published on the Kaggle platform. The dataset likely contains information about films with high user or critic ratings. Metadata such as column definitions, size, and license are currently unknown.

Tabular, Ratings, Recommendation, Movies, Entertainment

Replication Code for Asset Pricing Research Paper

Replication code for the paper 'Asset Prices When Investors Underestimate Discount Rate Dynamics' published in the Review of Asset Pricing Studies. The package includes scripts and documentation but excludes proprietary data from CRSP, Compustat, and I/B/E/S due to licensing restrictions. Users must obtain the required data separately to execute the code.

Social Sciences

Mobile Phone Specifications and User Reviews from 2019-2020

Hand-curated data on mobile phones, including specifications, user reviews, and revenue information for 2019 to 2020. The dataset was sourced from Kaggle, but the original author and organization are unknown. The total number of rows and specific file formats are not provided.

Tabular, Specifications, Mobile Phones, User Reviews, Revenue Data
