News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,972 datasets
movielens-20m-dataset is a collection of movie ratings and tags published on Kaggle. The dataset likely contains user-movie interactions, which are foundational for building recommender systems. Its specific scale and collection methodology are not detailed in the provided metadata.
A dataset hosted on Kaggle concerning depression detection. The title suggests it contains text posts from the Reddit platform, likely intended for training or evaluating models related to mental health. The author, organization, and specific collection details are unknown.
RedditV1 contains social media posts likely related to mental health discussions. The dataset is published on Kaggle, but its specific size, author, and update date are unknown. Its content appears to be text data sourced from the Reddit platform.
A dataset likely containing text for classifying news articles as fake or real. It was published on Kaggle. The specific source, size, and creation date are unknown.
A dataset of news content sourced from the Telegram messaging platform. The dataset is published on Kaggle, but its size, time range, and collection methodology are unspecified. Content likely contains text from news channels or groups on Telegram.
R code for a systematic review and meta-analysis registered under PROSPERO ID CRD420251145491. The document contains scripts for generating forest plots and funnel plots to analyze studies on bone mineral density and osteoporosis. It is a methodological resource for replicating the statistical analysis of the registered review.
Air pressure measurements collected by weather sensors deployed at the Halftide Rocks AWS site. The data was gathered by the Australian Ocean Data Network and covers a time range from 26 July 2000 to 19 December 2009.
A collection of reviews published on Kaggle. The dataset likely contains user-generated text feedback. Its specific source, size, and time range are unknown from the provided metadata.
Social media text data sourced from Facebook search posts. The dataset is intended for natural language processing and sentiment analysis tasks. Its author, organization, and specific scale are unknown.
MEWRK is a management information database from the Social Security Administration. It stores data related to the development, adjudication, and effectuation of Title II Work Continuing Disability Reviews. The dataset was last updated on April 3, 2026.
Telephone monitoring results from service centers collected by the Office of Quality Review (OQR) field staff. The dataset is published by the Social Security Administration on the Data.gov platform. It was last updated on April 3, 2026.
AgroCoT is a Chain-of-Thought benchmark for evaluating reasoning abilities in Vision-Language Models (VLMs) for agriculture. It contains 4,759 curated samples designed to test logical reasoning and problem-solving, particularly in zero-shot scenarios. The dataset was created by author wenyb and is hosted on HuggingFace.
This dataset summarizes current community solar policies and related stipulations by state in the United States. It is updated multiple times per year by the Department of Energy.
This dataset lists community solar projects, including provisions for low-income and low- and moderate-income households. It is updated multiple times per year and includes both completed and pending projects. The data has been reviewed but may contain errors or missing information.
This dataset summarizes state-level community solar policies and low-income stipulations in the United States as of April 2024. It is maintained by the Department of Energy and NREL, who invite user input for updates and corrections. The dataset is deprecated, with current data available at a linked repository.
This dataset lists community solar projects identified through various sources as of June 2022. It has been reviewed but may contain errors or missing information. The dataset is deprecated, with current data available from the National Renewable Energy Laboratory (NREL).
This dataset lists community solar projects identified through various sources as of December 2021. It is maintained by the Department of Energy's National Renewable Energy Laboratory (NREL). The data is deprecated, with current versions available via a separate link.
This dataset lists community solar projects identified through various sources as of December 2020. This revision includes projects that have a total installed capacity under state-level programs in 2020 but lack project-level details. The data is maintained by the Department of Energy and the National Renewable Energy Laboratory (NREL).
This dataset lists community solar projects identified through various sources as of June 2020. The data has been reviewed but may contain errors or missing information, and it is no longer current.
This dataset lists community solar projects identified through various sources as of May 2020. It has been reviewed but may contain errors or missing information. The dataset is deprecated, with current data available from a linked source.