News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,040 datasets
Business News Dataset from Google News API is a real-time business news collection prepared for data analysis and machine learning. The dataset's author, organization, specific size, and last update date are unknown.
A collection of Vietnamese news articles labeled for classification tasks. The dataset is hosted on Kaggle and appears to contain text data organized into four distinct topics. The specific number of articles, source, and creation date are unknown.
CryoEM maps, atomic models, and validation reports for Lenacapavir, authored by Robert Dick and released in March 2026. The dataset contains the structural biology data and documentation prepared specifically for manuscript reviewers to verify the drug's molecular architecture.
reviewWA is a text dataset published on Kaggle. The title suggests it likely contains user reviews, but the specific content, volume, and origin are unconfirmed. Metadata is minimal; users must download the data to verify its scope and quality.
Popular Movie over the years is a dataset hosted on Kaggle. The dataset likely contains information about film popularity across different time periods. The specific content, size, and origin are not detailed in the provided metadata.
Movies analyzing dataset is a collection of data related to films, published on the Kaggle platform. The dataset's specific contents, such as columns, size, and origin, are not detailed in the available metadata. Further verification after download is required to confirm its scope and structure.
Unconfined compressive strength data for rocks from the TilTil and ElTeniente mines in Chile. The dataset includes basic index tests for porosity and density, plus elastic wave velocity measurements for selected samples. Laboratory data was collected under a NERC grant focused on geological safety and fracture damage in mining.
Steam Reviews English is a text dataset of user reviews for video games on the Steam platform. The dataset was authored by SebastianHops and was last updated on Hugging Face in March 2026. The specific volume, time range, and review features are not detailed in the available metadata.
A collection of Bangla news articles aggregated from diverse data streams. The corpus is intended for natural language processing tasks involving the Bengali language. Specific details on volume, creation date, and authorship are not provided in the input metadata.
A set of 800 culturally grounded instructions in Tamil, likely intended as context-specific prompts for language models. The description indicates that the dataset was created by Adaption. Its release date and update frequency are unknown.
UCI Drug Review data, originally from the UC Irvine Machine Learning Repository, is hosted on Kaggle. The dataset likely contains user-generated reviews and ratings for pharmaceutical drugs. Its exact size, features, and collection period are unspecified in the provided metadata.
Movie_genre_revenue_metadata is a dataset from Kaggle. Its title suggests it contains information linking film genres to financial performance. The specific contents, scale, and origin require verification after download.
Kaggle hosts a dataset titled 'review-chekpoints--2026-04-29--13238-13238'. The title suggests it relates to checkpoints, likely for reviewing or evaluating machine learning models. No further metadata is available to confirm its specific contents, size, or origin.
GeoSure Compressible Deposits data from the British Geological Survey assesses the potential for ground compression under load. The dataset provides complete national coverage for Great Britain, identifying superficial deposits like peat or alluvium that may compress and cause subsidence.
Scheduled Ancient Monuments are structures or archaeological sites designated as nationally significant, requiring permission for any alterations. The dataset is managed by Newcastle City Council and was last updated in March 2026. Specific details on the number of monuments or data attributes are not provided.
NCEI Accession 0000145 contains measurements of pressure, temperature, salinity, PAR, chlorophyll, and sigma-t from a research cruise in the Southern Ocean. The data were collected by NOAA's National Centers for Environmental Information and cover a period from October 1997 to November 1999. This dataset likely provides vertical profile data from Conductivity-Temperature-Depth (CTD) instruments, a standard tool for physical oceanography.
Kaggle hosts a 40,000-row dataset focused on depression analysis. It includes SHAP explanations, lexicon DES, and ECID fairness metrics for BERT models. The dataset appears to be balanced for training and evaluation.
Curated movie metadata dataset for building recommendation systems. The dataset is hosted on Kaggle, but its specific size, authorship, and update date are unknown.
Participant totals for the Austin Arts, Culture, Music, and Entertainment (ACME) Department's Lifelong Learning program, broken down by site and fiscal year. The dataset is maintained by the City of Austin and was last updated in March 2026. Specific row counts and column details are unavailable.
Texas Railroad Commission UIC well data contains monthly injection volume and pressure reports submitted on RRC Form H-10. The dataset is structured with one record per UIC permit number per month. It is maintained by the City of Austin and was last updated in March 2026.