News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,946 datasets
Ionospheric observations have been carried out in Japan since the 1930s using vertical sounding ionosondes. The ionosondes produce ionograms, which are recorded digitally and graphically on 35mm film, and data is collected from stations via an ISDN network. The ionograms are processed automatically into numerical values and summary plots, with manual scaling performed by specialists at the Kokubunji station.
3270 stations across the Indian Ocean, Arabian Sea, Bay of Bengal, Laccadive, and Andaman & Nicobar Seas provide surface meteorological data from 1976 onward. Data includes sea surface temperatures, dry and wet bulb temperatures, wind direction and speed, pressure, cloud type and amount, and sea state. Observations were collected during cruises of R.V. Gaveshani and ORV Sagar Kanya, supplemented by wind data from the India Meteorological Department for eight coastal stations from 1980 to 1984.
This resource provides open-source design files for 25-cm² electrolysis cell hardware developed for research and development testing. It was created by the National Renewable Energy Laboratory (NREL) and funded by the U.S. Department of Energy's H2NEW consortium to enable low-temperature electrolysis testing at elevated pressures. The package includes drawings, materials lists, and procedures for fabrication and assembly.
TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, curated to bridge the gap between academic video datasets and actual user-generated content. It was created by Nikioooo and last updated on Hugging Face in April 2026. The dataset is designed to give researchers authentic patterns and characteristics of modern short-form video content.
Replication data for a forthcoming article in the Review of Economics and Statistics. The dataset likely contains information related to international trade, export complexity, and service outsourcing. It was authored by Giuseppe Berlingieri and is hosted on the Review of Economics and Statistics Dataverse.
Short-video marketing and engagement dataset for cultural tourism analysis. The dataset was sourced from Kaggle, but the author, organization, and specific collection details are unknown. The last update date and dataset size are unspecified.
512 million synthetic persona records across 77+ countries and 39 languages, unified from 18 open-source datasets. The database is partitioned, compressed, and stored in a Parquet warehouse, optimized for querying. It was created by Kasher13 and last updated on March 29, 2026.
Moreton Bay Council's polygon dataset identifies areas where 5% Annual Exceedance Probability flood data has limited reliability. The data, authored by moretonbaygis and last updated in March 2026, provides spatial definitions and reasons for this lower reliability.
This polygon dataset from moretonbaygis identifies areas with limited flood data reliability for a 1% Annual Exceedance Probability event. The City of Moreton Bay's Data Hub maintains this spatial record, last updated in March 2026. It documents specific locations and reasons for lower model confidence.
Moreton Bay GIS provides a polygon dataset describing areas of known lower reliability or under review for 0.1% Annual Exceedance Probability flood models. The dataset stores spatial definitions and reasons for limited data reliability. It is maintained by the City of Moreton Bay's Data Hub.
A polygon dataset from the City of Moreton Bay describes areas where flood data for a 0.1% Annual Exceedance Probability storm tide event has limited reliability. The dataset, updated in March 2026, provides spatial definitions and reasons for this lower reliability.
A polygon dataset from the City of Moreton Bay describes areas with limited reliability in 1% Annual Exceedance Probability storm tide flood modeling. The data, last updated in March 2026, indicates specific locations and reasons for lower data confidence or review status.
City of Moreton Bay's Data Hub provides a polygon dataset describing areas with limited reliability in 5% Annual Exceedance Probability storm tide flood modeling. The dataset indicates specific locations and reasons for lower data reliability or areas subject to review. It was last updated in March 2026 by moretonbaygis.
City of Moreton Bay's Data Hub provides a spatial dataset describing the expected reliability of flood data. It defines polygons indicating areas where the council assesses the flood model data has limited reliability or is subject to review. The dataset was last updated in March 2026.
Summary data from a study on premovement suppression of corticospinal excitability. The dataset includes a CSV file with summary data, a PDF codebook detailing variables, and an associated R script for analysis, supporting a referenced manuscript. It was authored by Anthony Carlsen and hosted on Harvard Dataverse.
Harvard Dataverse hosts raw diagnostic measurements supporting a study on titanium under extreme conditions. The data includes X-ray diffraction images, Velocity Interferometer System for Any Reflector (VISAR) records from all four quads for every experimental shot, and a CeO2 reference for calibration. Author Saransh Soderlind deposited this replication dataset, last updated on April 28, 2026.
Air pressure data collected by sensors deployed at the John Brewer weather station site. The dataset covers a 10-month period from 31 July 1987 to 30 May 1988. It was collected and managed by the Australian Ocean Data Network.
Vanderbilt TV News Abstracts is a collection of textual summaries from television news broadcasts. The dataset was authored by Gaurav Sood and is hosted on the Dataverse platform, with a last recorded update in May 2026. Additional details about the data are referenced on a GitHub repository.
A collection of Dutch tweets and features gathered in April 2022 using the Twitter API. A small portion of the tweets are annotated by volunteers for the main task of identifying rumours. The dataset was contributed by Nicky van der Linden and is licensed under CC0-1.0.
OpenForesight is a dataset of forecasting questions generated from news articles using retrieval-augmented prompts. It is designed to evaluate language models' ability to make predictions about future events using relevant context. The dataset was created by nikhilchandak and was last updated on March 31, 2026.