News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,012 datasets
A collection of posts scraped from Reddit subreddits. The description mentions it includes title, author, score, comments, flair, and text content. The dataset's author, size, and last update date are unknown.
Reddit posts and comments archived over a period of more than 10 years. The data is sourced via the PullPush service and includes full text content. The dataset is hosted on Kaggle, but specific details on volume, authorship, and licensing are not provided.
Hacker News Who Is Hiring Scraper contains structured job listings scraped from the monthly 'Who is Hiring?' threads on Hacker News. The dataset likely includes job titles, companies, and salary information posted by the community. It was scraped from the Hacker News platform, but the specific author, time range, and exact data volume are unknown.
4,000 Twitter posts in Urdu, English, and Roman Urdu are labeled for depression severity. The dataset features 4-class severity labels verified by both large language models and human annotators. It was created in 2024-2025 and shared on Kaggle.
ViNewsFact is a Vietnamese multimodal evidence dataset designed for retrieval and fact-checking tasks. The dataset appears to contain news articles with associated multimodal evidence. The author, organization, and specific scale are unknown.
Hiligaynon News Articles is a text dataset published on the Hugging Face platform by the user welyjesch. The dataset was last updated on 2026-04-09. Its content likely consists of news articles written in the Hiligaynon language, a major language of the Philippines.
A telephone directory for the Department of Culture, Youth and Family of the Poltava City Council, published on the States site of Ukraine. The dataset was last updated in March 2026 and is available in spreadsheet and document formats.
Review checkpoints likely contain evaluation metrics or performance data for machine learning models. The dataset is hosted on Kaggle, a platform for data science and machine learning projects. The specific content, size, and origin of the data are unknown from the provided metadata.
A collection of Turkish-language podcast content aggregated by the author 'yt-data-1'. The dataset was last updated on Hugging Face on April 3, 2026. The specific source, size, and content details are not provided in the available metadata.
Anonymous review-time evidence packages for auditing LLM guardrails. The dataset appears to contain logs or records generated during the evaluation of large language models. Its provenance and scale are unspecified.
Amino acid composition data for 39 coral skeleton samples from four massive Porites spp. genotypes. The samples were cultured in an aquarium under controlled seawater pCO2 levels of 180, 260, 400, and 750 µatm and temperatures of 25 and 28°C. Data were collected between August 2020 and December 2022 by researchers including Celeste Kellock and Nicola Allison, with interpretation by a team from the British Geological Survey.
This scoping review synthesizes 20 studies published between 2000 and 2024 regarding remediation strategies for clinical reasoning deficits in medical residents. Created by Jovian Philip Swatan, the data maps identification methods, interventions, and institutional barriers extracted from seven major medical databases including PubMed and MEDLINE.
Reviews compiled by the Brevard County Chapter of Moms for Liberty, a political advocacy group. The dataset was authored by Jennifer D McGrew and last updated in March 2026.
An image dataset likely containing paintings from the Impressionist art movement, sourced from the WikiArt online encyclopedia. The dataset is hosted on Kaggle, but its scale, composition, and creation details, including the number of images, artist coverage, and image attributes, are not provided in the available metadata.
49,600 English-language Steam reviews of the video game Cyberpunk 2077. The reviews are stratified by patch era, likely reflecting player sentiment across different game updates. The dataset's author, organization, and license are unknown.
Dutch news articles published by NOS, one of the biggest online news organizations in the Netherlands. The data was obtained by scraping the NOS website and includes articles from January 1, 2010 onward. Titles and content have been cleaned and normalized.
Expression data for cathepsin D (CTSD) and cathepsin G (CTSG) from 58 human fracture hematoma samples collected 0-19 days post-trauma, and from neutrophils polarized into N0, N1, and N2 phenotypes from five human donors. The dataset was created by Lu, Fangzhou to investigate the association of these cathepsins with fracture healing phases and specific neutrophil phenotypes. The data show that CTSD expression increased over healing time while CTSG remained constant, and that expression differed between the N1 and N2 neutrophil phenotypes.
Top-Rated_movies is a dataset published on the Kaggle platform. The dataset likely contains information about films with high user or critic ratings. Metadata such as column definitions, size, and license are currently unknown.
Replication code for the paper 'Asset Prices When Investors Underestimate Discount Rate Dynamics' published in the Review of Asset Pricing Studies. The package includes scripts and documentation but excludes proprietary data from CRSP, Compustat, and I/B/E/S due to licensing restrictions. Users must obtain the required data separately to execute the code.
Hand-curated data on mobile phones includes specifications, user reviews, and revenue information for the period 2019 to 2020. The dataset was sourced from Kaggle, but the original author and organization are unknown. The total number of rows and specific file formats are not provided.