News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,957 datasets
AfrIFact is a dataset for cultural information retrieval, evidence extraction, and fact-checking in African languages. It was created by Masakhane and last updated on April 2, 2026. The dataset is designed to assess the veracity of online claims, particularly those concerning healthcare and culture in low-resource linguistic contexts.
The PaleoMagnetic Archival Group (PMAG) Database contains peer-reviewed paleomagnetic, rock, and geomagnetic data from the nasa_earthdata platform. It includes measurements from magnetometers and model calculations. The database is a prototype and is still under construction.
TMDB Movie Metadata includes ratings, popularity scores, and release dates for films from 1957 to 2026. The dataset is sourced from The Movie Database (TMDB) and is hosted on Kaggle. It is intended for analysis of film industry trends over a 70-year period.
TMDB movie metadata includes ratings, popularity scores, and release dates. The dataset covers films released between 2000 and 2026, sourced from The Movie Database platform. Its author, organization, and specific size are unknown.
Video compression data generated using the VVC (Versatile Video Coding) standard's VTM 24.0 reference software. The data was created using All Intra mode at a Quantization Parameter (QP) of 32. The dataset is intended for research in video compression algorithms and codec development.
A dataset for detecting hate speech against LGBTQIA+ people in Brazilian Portuguese. It contains comments collected from three social media platforms related to the 'Entre Amigues' podcast. The dataset was created by Veronyka and was last updated on March 23, 2026.
Amazon reviews data from the Stanford Network Analysis Project (SNAP) includes 34,686,770 reviews from 6,643,669 users on 2,441,053 products spanning 18 years up to March 2013. The provided subset contains 1,800,000 training and 200,000 testing samples labeled with polarity. Authors Xiang Zhang, Junbo Zhao, and Yann LeCun published related research in 2015.
THUCNews is a text classification dataset published on Kaggle. Its title suggests that it contains Chinese news articles categorized for machine learning tasks. The available metadata does not identify the dataset's author or organization, or give further details.
BBC News content collected via web scraping and published on Kaggle. The dataset likely contains news articles and headlines, though the volume, time period, and exact content cannot be confirmed from the provided metadata.
A list of 250 top-rated movies from the Internet Movie Database (IMDb). The dataset is published on Kaggle, though its specific creation date and update frequency are unknown. It likely contains information such as titles, ratings, and votes for each film.
Movie posters and magazine covers composited into realistic phone-like scenes. The dataset appears designed for computer vision tasks involving document classification within a synthetic environment. Its author, organization, and specific scale are unknown.
A WFS service provides the urban development plan 'Nachverdichte Stumpenhof - Nördlicher Teil' for the city of Plochingen, Germany. The data is transformed according to the INSPIRE directive and is based on an XPlanung dataset in version 5.4. The dataset is maintained by the Bundesamt für Kartographie und Geodäsie and was last updated on March 30, 2026.
Kaggle hosts a dataset titled 'movies'. The dataset's specific content, size, and origin are not detailed in the provided metadata. Metadata is minimal; actual content requires verification after download.
MOVIES is a dataset hosted on the Kaggle platform. Its specific content, size, and origin are not detailed in the provided metadata. The dataset likely contains information related to films, which could include titles, genres, ratings, or cast details.
Trueque is a human-reviewed benchmark dataset for evaluating large language models on Latin American knowledge and cultural appropriateness. It is an initial beta release (version 0.1) created by latam-gpt. The dataset was last updated on April 1, 2026.
Binomial regression model results for IPV-induced type-two polio seroconversion across different ages and dosing schedules. The model was fitted to a review of 19 seroconversion studies conducted between 1985 and 2022. The dataset was authored by Elizabeth J. Gray and published on figshare.
A multivariable linear regression analysis investigating factors associated with intracompartmental pressure. The dataset, created by Heng Zhang and shared under a CC-BY-4.0 license, was last updated on March 25, 2026. It is stored in an XLS file with a size of 5.5 KB.
A dataset profiling reviewer behavior on online retail platforms, created by Luisa Stracqualursi. It was last updated on March 25, 2026. The data is stored in an XLS file and is 5.5 KB in size.
A framework for profiling reviewer behavior on online retail platforms, comparing approaches to balance scalability and interpretability. The dataset was authored by Luisa Stracqualursi and last updated on March 25, 2026. It is a 5.5 KB XLS file shared under a CC-BY-4.0 license.
Mary Hynes published a dataset on figshare detailing participant counts in a clinical trial for depression. The data includes numbers for TAU (n=46) and CFT (n=45) conditions across three time points: baseline, post-treatment, and three-month follow-up. The dataset is 5.5 KB in size and was last updated in March 2026.