News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,956 datasets
Trending movies data published on the Kaggle platform. The dataset's specific content, such as movie titles, ratings, or release dates, is not detailed in the available metadata. Its size, structure, and collection methodology are unknown.
CoderForge-Preview is a test-verified coding agent dataset containing between 100,000 and 1,000,000 records, released by togethercomputer in February 2026. It provides trajectories for training software engineering agents and has demonstrated a performance increase from 23.0% to 59.4% pass@1 on the SWE-Bench Verified benchmark when used for fine-tuning.
Woods Hole Oceanographic Institution collected electric field, temperature, and conductivity data from the ATLANTIS II research vessel from November 14 to 21, 1971. Electric field was recorded every half-second, while temperature and conductivity were recorded every second using a freely-falling and rising rotating vehicle in the deep ocean. The data were processed using QDEM and PROFEQ computer programs to derive east and north components of water velocity and create an equally spaced pressure series.
SUPERChem is a multimodal reasoning benchmark dataset for chemistry. The dataset was created by ZehuaZhao and was last updated on March 31; the year is not recoverable from the available metadata.
A 1977 Australian National Antarctic Research Expedition visited the Bunger Hills on March 2nd to collect biological and geological samples. The dataset is a scanned report summarizing the findings and reviewing earlier scientific work in the region by other nations. It was created by R.J. Barker and sourced from the Australian Antarctic Data Centre.
Kaggle hosts a dataset of 1,000 films scored across more than 25 criteria. The criteria are described as factors that viewers actually care about. The author, organization, and specific column definitions are not provided.
User reviews in Indonesian collected from the Google Play Store. The dataset is intended for sentiment analysis research and focuses on reviews for the ChatGPT and Gemini applications. The dataset was sourced from Kaggle, but the author, organization, and specific collection date are unknown.
Canadian data from Statistics Canada details industry expenses as a percentage of total operating costs for real estate agents and brokers (NAICS 53121). The dataset provides annual data covering two years.
54,676 gene expression measurements from 151 breast cancer tissue samples, curated into 6 cancer subtype classes. The data originates from the CuMiDa repository, which handpicked and preprocessed this dataset from the GEO database for machine learning benchmarking. CuMiDa's curation process involved steps like sample quality control, background correction, and normalization to create a reliable source.
A gene expression dataset for brain cancer containing 130 samples and measurements for 54,676 genes, curated into 5 classes. It originates from the CuMiDa repository, which provides handpicked and biologically preprocessed microarray datasets from the Gene Expression Omnibus (GEO) for machine learning. The dataset is referenced in computational biology publications from 2019.
Town of Cary, North Carolina provides a dataset of site and subdivision plans. The data includes projects under review, recently approved, or actively being constructed and is updated as needed. The dataset is associated with an interactive development map on the town's website.
3,154 autonomous cultural intelligence agents are provided, each representing a deployable specialist. The collection was created by author 'joannaslh' and was last updated in April 2026. Each agent contains structured identity files, vocabulary, and cultural reference lists.
3,154 autonomous cultural intelligence agents are available, each designed as a deployable specialist. The collection was created by joannaslh and was last updated in April 2026. Each agent contains a defined identity, vocabulary, cultural references, source monitoring, territory, and LIGO commerce routing information.
Nepal Tourism Reviews is a dataset hosted on Kaggle. The dataset likely contains user-generated reviews related to tourism in Nepal. The specific content, size, and collection details are not provided in the available metadata.
IMDB_Movies_list is a dataset published on the Kaggle platform. The title suggests it contains a list of movies, likely sourced from the Internet Movie Database. The specific contents, scale, and creation details are not provided in the available metadata.
Urdu language data likely related to educational or cultural reasoning tasks. The dataset is published on Kaggle, but its specific contents, size, and creation details are not provided in the metadata. Users must download the dataset to verify its exact nature and scope.
Mental Health Prediction Dataset is hosted on Kaggle. It is designed to predict anxiety, depression, and burnout from lifestyle factors. The dataset's author, organization, size, and last update date are unknown.
Drainage pump station locations from the Water Corporation, last updated on 2026-03-24. The dataset includes pressure adjustment points for water management infrastructure. Specific row counts and column details are unavailable.
Water Corporation pressure adjustment point data, published by Asset Registration. The dataset includes geographic features for pump stations and related infrastructure. It was last updated in March 2026.