DataSalon

Discover quality datasets for AI training, aggregated from 40+ platforms and curated by AI.

Media & Communication Datasets | DataSalon

Media & Communication

News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation

10,980 datasets

Video Saliency Challenge: Audio-Visual Mouse Fixation Data for 2000 FullHD Videos

A novel audio-visual saliency dataset of 2,000 Full HD videos with audio tracks and mouse-fixation data from more than 5,000 observers. The collection covers diverse content, such as movies, sports, and live videos, with a mean video duration of 18 seconds. It was created by ANDRYHA for the CVPR-NTIRE Video Saliency Prediction Challenge 2026.

Tags: Audio · Video · Multimodal · Size Categories: 1K<n<10K · Mouse Tracking · Audio Visual · License: CC BY 4.0 · Computer Vision · Region: US · Large Scale · Human Attention · Video Saliency · +1 more

UK Northern Hemisphere Sea-Level Pressure and 500mb Height Grids, 1944-1946

Northern Hemisphere sea-level pressure and 500 mb geopotential height data from the United Kingdom, processed by the DSS. The dataset contains daily and monthly gridded data covering the period from December 1944 to December 1946.

Tags: Time Series · Geospatial · Gridded Data · Atmospheric Science · Historical Climate · Weather Data · Northern Hemisphere · +1 more

TCM-90: Tropical Cyclone Motion Analyses at 50 km Resolution

The Tropical Cyclone Motion (TCM-90) Research Initiative analyses were produced using a four-dimensional data assimilation system at the National Meteorological Center. The horizontal resolution is 50 km, with analyses from 1000 mb to 100 mb at 50 mb intervals. Special surface analyses include surface pressure, latent and sensible heat fluxes, and sea-surface temperatures.

Tags: Time Series · Geospatial · Tropical Cyclone · Weather Analysis · Meteorology · Ocean Atmosphere · Forecasting · +1 more

City of Bloomington Parks and Recreation Facilities Map Data

The City of Bloomington, Indiana, provides a geospatial dataset of parks and recreation facilities that the city owns or maintains. The data layer includes features such as neighborhood parks, community parks, nature preserves, recreational sports parks, and cemeteries. It was last updated on March 8, 2026.

Tags: Geospatial · CSV · XML · JSON · City Planning · Parks And Recreation · Public Facilities · Parks · Maps · +1 more

Unitywater Sewer Infrastructure Data for Moreton Bay

Unitywater sewer infrastructure data from the City of Moreton Bay's Data Hub. The dataset includes information on sewer pressure mains and was last updated in March 2026.

Tags: MBRC · Unitywater · Infrastructure · Sewer · +1 more

Changelog News Podcast Transcripts for 2025

Transcripts of the 2025 episodes of the Changelog News podcast, generated from a linked GitHub repository. The dataset was authored by willtheorangeguy and last updated on the platform in April 2026.

Tags: Text · Media Content · Natural Language · Podcast Transcripts · Technology News · Synthetic · +1 more

Changelog News Podcast Transcripts from 2024 Episodes

Complete transcripts from the 2024 episodes of the Changelog News podcast. The dataset was generated from a GitHub repository and uploaded to Hugging Face by the user willtheorangeguy. It was last updated on the platform in April 2026.

Tags: Text · Podcast Transcripts · Text Data · Technology News · Synthetic · +1 more

Changelog News Podcast Transcripts for 2023 Episodes

Complete transcripts from the 2023 episodes of the Changelog News podcast. The dataset was generated from a GitHub repository and uploaded to Hugging Face by the user willtheorangeguy. It was last updated on the platform in April 2026.

Tags: Text · Media Content · Podcast Transcripts · Text Data · Technology News · Synthetic · +1 more

TMDB Movies 2026: Film Metadata and Details

TMDB Movies 2026 is a dataset from Kaggle. It likely contains metadata and details about movies sourced from The Movie Database (TMDB). The specific content, volume, and creation details require verification after download.

Tags: Tabular · Movies · TMDB · Entertainment · +1 more

Depression Video Quality And Engagement Metrics From TikTok

This dataset contains data extracted from a study evaluating depression-related videos on TikTok. It includes video characteristics, publisher types, engagement metrics, and quality scores from mDISCERN, JAMA, and GQS evaluations. Two independent reviewers assessed the videos, and the data supports analysis of video quality, reliability, and educational value.

Tags: Douyin · Social Media · Generalized Anxiety Disorder · TikTok · +1 more

Alisa: Compressed and Original Media Files

Alisa is a dataset hosted on Kaggle. The title suggests it contains both compressed and original versions of media files, likely for comparison or analysis. Metadata is minimal; actual content requires verification after download.

Tags: Multimodal · Original Data · Media Processing · Data Compression · +1 more

Rental Product Recommendation System Data

Kaggle hosts a dataset for a rental product recommendation system. The dataset likely contains user-item interaction data for building recommendation models. Specific details on size, columns, and origin are unavailable from the provided metadata.

Tags: Tabular · E-Commerce · Recommendation System · Rental Products · +1 more

SUCHO Ukrainian Cultural Heritage Web Archives

Web archives of Open Access collections from more than 3,000 websites of Ukrainian cultural institutions, such as museums, libraries, and archives. The archives were produced by the volunteer group Saving Ukrainian Cultural Heritage Online (SUCHO), which includes more than 1,300 international professionals. The data was saved during the 2022 invasion of Ukraine to preserve digitized cultural heritage before the servers hosting it could be destroyed or taken offline.

Tags: Multimodal · Digital Preservation · Cultural Preservation · Ukraine · Internet · Web Archive · Cultural Heritage · +1 more

Lk News Docs: A Collection of News Documents

A text dataset titled 'Lk News Docs' was published by the author 'nuuuwan' on the Hugging Face platform. The dataset was last updated on April 24, 2026. The specific content, size, and structure are not detailed in the available metadata.

Tags: Text · News · Media · Text Corpus · +1 more

Amazon Reviews Multi: Multilingual Customer Feedback Corpus

Multilingual Amazon customer reviews hosted as raw JSONL.GZ files for direct loading. The dataset is a mirror of the original 'amazon_reviews_multi' corpus, uploaded by user goosmanlei. It was last updated on the platform on 2026-03-18.

Tags: Text · Multilingual · E-Commerce · Customer Feedback · Natural Language Processing · Amazon Reviews · +1 more

Source-Based Fake News Classification with Author and Publication Metadata

A preprocessed dataset for classifying fake news based on source information, derived from the 'Getting Real about Fake News' corpus. The data includes features such as author names, publication dates, and source citations to assess news trustworthiness. It was published in a 2020 paper and is shared under a CC0-1.0 license.

Tags: Text · Tabular · Media Bias · Fake News Detection · Text Classification · Social Media Analysis · +1 more

100 News Articles Collected via NewsAPI

A dataset of real-time news articles collected with NewsAPI and Python. It contains 100 articles, but the specific sources, time range, and collection methodology are not documented. It was posted on Kaggle; the author, organization, and license are unknown.

Tags: Text · News Articles · Media Content · Real Time · Text Data · +1 more

Top Movies Ranking List

A list of top movies, sourced from Kaggle. The dataset's specific size, features, and creation details are not provided in the metadata. Its content and structure require verification after download.

Tags: Tabular · Ranking · Movies · Entertainment · +1 more

Facebook SimSearchNet++: 100 Million Vector Embeddings with HNSW Index

Facebook SimSearchNet++ is a collection of 100 million vector embeddings, likely for similarity search tasks. The dataset includes a pre-built Hierarchical Navigable Small World (HNSW) index for efficient nearest neighbor retrieval. It was published on Kaggle by Facebook.

Tags: Multimodal · Machine Learning · Similarity Search · Computer Vision · HNSW Index · Embeddings · +1 more

IMDb Movies Dataset

A dataset of movies sourced from IMDb, a major online database for films and television. The dataset is hosted on Kaggle, a popular platform for data science projects. Specific details such as the number of records, included features, and time period covered are not provided in the available metadata.

Tags: Tabular · Movies · IMDb · Entertainment · +1 more
