News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,980 datasets
2000 FullHD videos with audio tracks and mouse fixation data from over 5000 observers form a novel audio-visual saliency dataset. The collection includes diverse content such as movies, sports, and live videos, with a mean duration of 18 seconds. This dataset was created by ANDRYHA for the CVPR-NTIRE Video Saliency Prediction Challenge 2026.
Northern Hemisphere sea-level pressure and 500 mb geopotential height data from the United Kingdom, processed by the DSS. The dataset contains daily and monthly gridded data, covering the period from December 1944 to December 1946.
The Tropical Cyclone Motion (TCM-90) Research Initiative analyses were produced using a four-dimensional data assimilation system at the National Meteorological Center. The horizontal resolution is 50 km, with analyses from 1000 mb to 100 mb at 50 mb intervals. Special surface analyses include surface pressure, latent and sensible heat fluxes, and sea-surface temperatures.
The City of Bloomington, Indiana, provides a geospatial dataset of parks and recreation facilities owned or maintained by the city. The data layer includes features such as neighborhood parks, community parks, nature preserves, recreational sports parks, and cemeteries. It was last updated on March 8, 2026.
Unitywater Sewer Infrastructure data from the City of Moreton Bay's Data Hub. The dataset includes sewer pressure main information and was last updated in March 2026.
2025 transcripts from the Changelog News podcast, generated from a linked GitHub repository. The dataset was authored by willtheorangeguy and last updated on the platform in April 2026.
Complete transcripts from the 2024 episodes of the Changelog News podcast. The dataset was generated from a GitHub repository and uploaded to Hugging Face by the user willtheorangeguy. It was last updated on the platform in April 2026.
Complete transcripts from the 2023 episodes of the Changelog News podcast. The dataset was generated from a GitHub repository and uploaded to Hugging Face by the user willtheorangeguy. The dataset was last updated on the platform in April 2026.
TMDB Movies 2026 is a dataset from Kaggle. It likely contains metadata and details about movies sourced from The Movie Database (TMDB). The specific content, volume, and creation details require verification after download.
This dataset contains data extracted from a study evaluating depression-related videos on TikTok. It includes video characteristics, publisher types, engagement metrics, and quality scores from mDISCERN, JAMA, and GQS evaluations. Two independent reviewers assessed the videos, and the data supports analysis of video quality, reliability, and educational value.
Alisa is a dataset hosted on Kaggle. The title suggests it contains both compressed and original versions of media files, likely for comparison or analysis. Metadata is minimal; actual content requires verification after download.
Kaggle hosts a dataset for a rental product recommendation system. The dataset likely contains user-item interaction data for building recommendation models. Specific details on size, columns, and origin are unavailable from the provided metadata.
Web archives of Open Access collections from more than 3,000 websites of Ukrainian cultural institutions, such as museums, libraries, and archives. The archives were produced by the volunteer group Saving Ukrainian Cultural Heritage Online (SUCHO), which includes more than 1,300 international professionals. The data was saved during the 2022 invasion of Ukraine to preserve digitized cultural heritage before servers could be destroyed or taken offline.
A text dataset titled 'Lk News Docs' was published by the author 'nuuuwan' on the Hugging Face platform. The dataset was last updated on April 24, 2026. The specific content, size, and structure are not detailed in the available metadata.
Multilingual Amazon customer reviews hosted as raw JSONL.GZ files for direct loading. The dataset is a mirror of the original 'amazon_reviews_multi' corpus, uploaded by user goosmanlei. It was last updated on the platform on 2026-03-18.
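Since the entry above describes raw JSONL.GZ files, here is a minimal sketch of how such a file could be read with the Python standard library. It assumes one JSON object per line in a gzip-compressed, UTF-8 text file. The function name iter_jsonl_gz and the file name in the usage note are hypothetical, and the actual file names and record fields of this mirror are not given in the listing.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator, Union


def iter_jsonl_gz(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSONL file.

    Hypothetical helper: the name and the one-object-per-line layout are
    assumptions, not something the dataset listing confirms.
    """
    # "rt" opens the compressed stream in text mode so lines arrive as str.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

A call such as `for record in iter_jsonl_gz("reviews.jsonl.gz"): ...` (placeholder file name) streams records one at a time, so the whole file never has to fit in memory.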
A preprocessed dataset for classifying fake news based on source information, derived from the 'Getting Real about Fake News' corpus. The data includes features such as author names, publication dates, and source citations to assess news trustworthiness. It was published in a 2020 paper and is shared under a CC0-1.0 license.
Real-time news articles dataset collected using NewsAPI and Python. The dataset contains 100 articles, though the specific sources, time range, and collection methodology are not detailed. It was posted on Kaggle, but the author, organization, and license are unknown.
A list of top movies, sourced from Kaggle. The dataset's specific size, features, and creation details are not provided in the metadata. Its content and structure require verification after download.
Facebook SimSearchNet++ is a collection of 100 million vector embeddings, likely for similarity search tasks. The dataset includes a pre-built Hierarchical Navigable Small World (HNSW) index for efficient nearest neighbor retrieval. It was published on Kaggle by Facebook.
A dataset of movies sourced from IMDb, a major online database for films and television. The dataset is hosted on Kaggle, a popular platform for data science projects. Specific details such as the number of records, included features, and time period covered are not provided in the available metadata.