News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,003 datasets
A dataset of reasoning traces generated using the Gemini 3 Pro Preview model with high reasoning depth. It contains 1000 examples, including some sourced from an existing 250-example dataset, and was created by author REXX-NEW for model distillation purposes. The dataset was last updated on February 25, 2026.
Temperature profile and pressure data were collected using CTD casts from NOAA Ships RONALD H. BROWN and KA'IMIMOANA. The data cover the North and South Pacific Ocean from June to December 1999 and were submitted by Kristene McTaggart of the Pacific Marine Environmental Laboratory with support from the GOALS and PACS projects.
Kazakh Instruction V2 is a dataset of self-instruct data pairs for the Kazakh language. It was created by translating the Stanford Alpaca instruction dataset via Google's API, with manual corrections and additions of Kazakh names, places, history, and culture. The dataset, authored by AmanMussa, was last updated on February 23, 2026.
The-Tweets-of-Wisdom is a collection of tweets, retweets, and retweets with comments from 40 Twitter accounts that frequently post self-help and motivational content. The dataset was scraped using the Tweepy API, with the creator's script available for review. The data was released under a CC0 1.0 license.
IMDb-Indonesian-Movies contains information on 1,262 Indonesian films. The data was gathered from IMDb.com using an IMDb-Scraper tool and cleaned into a CSV file. It includes 11 columns such as title, year, genre, rating, and cast information.
The IMDB Dataset of 50k+ Movies contains over 50,000 Indian movie titles. It likely includes details such as genres and ratings. The dataset was sourced from Kaggle, but its author, license, and last update date are unknown.
500 top-ranked films are evaluated using combined ratings from IMDb, Rotten Tomatoes, Metacritic, Letterboxd, and Google. This dataset likely contains a multi-platform score for each movie, enabling comparative analysis of critical and audience reception. The original author and specific collection date are unknown.
An experimental dataset titled 'exp71_film' published on Kaggle. The dataset's specific content, size, and creation details are not provided in the metadata. Its title suggests it contains data related to film, potentially for analysis or machine learning tasks.
Kaggle hosts the KaoKore dataset, which appears to focus on Japanese art. The dataset likely contains images of faces from artworks, annotated for expression analysis. Specific details on the collection size, creator, and update date are not provided in the available metadata.
A curated dataset of popular Indian movies containing ratings, genres, and popularity metrics. The dataset is hosted on Kaggle, but its author, organization, and last update date are unknown. The specific number of movies, rows, and file formats are also unspecified.
A high-resolution vector shoreline for Newport News and Norfolk, Virginia, compiled from imagery by the National Oceanic and Atmospheric Administration (NOAA). The data is structured using the C-COAST attribution scheme to facilitate translation into international hydrographic standards. This resource is a member of a larger NOAA shoreline data collection.
160 college-educated white male professionals from Indianapolis, New York, Paris, and Clermont-Ferrand participated in 2-hour semi-directed interviews between 1986 and 1988. The study, conducted by Michèle Lamont and published by Harvard University Press, compares cultural definitions of a 'worthy person' across French and American upper-middle classes. Data includes 452 digitized audio files and interview transcripts assessing social perceptions, values, and traits.
NOAA's Shoreline Data Rescue Project provides a high-resolution historical shoreline for the Newport News Waterfront, Virginia. The data were automated from NOAA National Ocean Service maps based on imagery interpretation and field surveys. The attribution follows the C-COAST scheme, influenced by the S-57 standard for hydrographic data.
High-resolution vector shoreline data for Norfolk, Hampton Roads, and Newport News, Virginia, compiled from imagery by the National Oceanic and Atmospheric Administration. The data is structured using the Coastal Cartographic Object Attribute Source Table (C-COAST) attribution scheme and is suitable as a GIS data layer. This resource is a member of a larger NOAA shoreline data collection.
Management information from the Social Security Administration's integrity review process. The dataset was last updated on March 10, 2026. The data is published on the Data.gov platform under a specified license.
A high-resolution vector shoreline dataset for the Ports of Newport News and Norfolk/Hampton Roads, Virginia, compiled from imagery. The data is structured using the NGS Coastal Cartographic Object Attribute Source Table (C-COAST) attribution scheme and is provided by the National Oceanic and Atmospheric Administration. The metadata describes both line and point shapefiles suitable for GIS applications.
A high-resolution vector shoreline dataset for the Ports of Norfolk and Newport News, Virginia, compiled from imagery. The data, provided by the National Oceanic and Atmospheric Administration, includes line and point shapefiles attributed using the NGS C-COAST scheme. The metadata was last updated on 2026-03-13.
A high-resolution vector shoreline dataset for the ports of Norfolk and Newport News, Virginia, compiled from imagery. The data is provided as line and point shapefiles with attribution based on the NGS-developed C-COAST scheme. It is published by the National Oceanic and Atmospheric Administration and was last updated in March 2026.
Ports of Newport News and Norfolk/Hampton Roads, Virginia, are covered by this high-resolution vector shoreline dataset. The data were compiled from imagery by the National Oceanic and Atmospheric Administration and are structured using the NGS C-COAST attribution scheme. The metadata describes both line and point shapefiles, which may be suitable for use in a geographic information system.
A dataset associated with a systematic literature review on the Balanced Scorecard in knowledge institutions. The data was published by author ECHEVERRI URREGO, DANIEL on the Harvard Dataverse platform and last updated on 2026-04-16. The raw description mentions VOSviewer and Bibliometrix, suggesting it likely contains bibliometric analysis data.