Offline RL trajectories, game data, robot demonstrations, RLHF, multi-agent interaction
10,046 datasets
Merged 15-minute resolution data from UK solar generation, Agile electricity tariffs, and smart meters. The dataset is hosted on Kaggle and is intended for training reinforcement learning agents. The author, organization, and specific temporal coverage are not provided.
Labeled IT support email tickets categorized by priority level and functional queue. The dataset contains raw customer inquiries paired with the corresponding agent responses to support the development of automated helpdesk systems.
Weekly transaction data from a retail sales context, aggregated by time. The dataset is hosted on the UCI Machine Learning Repository and is tagged for retail and time-series analysis. Specific details on the number of rows, columns, and temporal range are not provided in the metadata.
A free sample intended for Security Operations Center (SOC) analytics and incident monitoring. It contains tabular data tagged for financial transaction fraud detection.
Long-Data-Col-rp_pile_pretrain is a subset of the togethercomputer/Long-Data-Collections dataset, specifically the pretrain split files rp_sub.jsonl.zst and pile_sub.jsonl.zst. The dataset was created by BEE-spoke-data and last updated on December 29, 2025. To focus on long text, rows where the text contains fewer than 250 characters were dropped.
Sacred Realm is a dataset on paperswithcode, a platform for machine learning resources. The title suggests it contains historical and archaeological information related to the development of synagogues. Its specific contents, such as text, images, or structured data, require verification after download.
Supportive Measures for Balancing Work and Child Care is a dataset published on the japan_data platform. The data is provided by the National Personnel Authority (人事院) of Japan and authored by its International Affairs Division (事務総局国際課). It was last updated on January 28, 2026.
A multi-table dataset of 5.5 million transactions for fraud detection and risk modeling. The dataset is hosted on Kaggle and is tagged for use in automated machine learning and data cleaning tasks. The original author, organization, and specific collection details are unknown.
A collection of 2,000 e-commerce transactions spanning 20 countries. It is designed to facilitate the analysis of global retail trends, focusing on sales performance and profitability across geographic regions.
Coded transcripts from focus groups with homeless-experienced adults who have opioid use disorder. The data explores experiences with and attitudes toward peer recovery support. The author is Danielle R. Fine.
Uncoded transcripts from focus groups with homeless-experienced adults who have opioid use disorder. The discussions explore experiences with and attitudes toward peer recovery support.
A dataset from a program aiming to improve long-term health outcomes for relatives of intensive care unit survivors through a nurse-led e-health intervention. The data was authored by Margo van Mol and last updated on January 26, 2026. Specific details on the number of rows, columns, and data structure are not provided.
Replication materials for the paper 'A Political History Forecast of Bloc Support in the 2025 German Federal Election' by Quinlan, Schnaudt, and Lewis-Beck. The data was published in 2025 and deposited by Dr. Stephen Quinlan at Harvard Dataverse. It was last updated on January 20, 2026.
The Quran Semantic Annotation Corpus is a multi-label, semantically tagged corpus covering all verses of the Quran. It is designed for linguistic and natural language processing tasks involving religious text.
Kaggle hosts a dataset listing top AI companies involved in military support activities between the United States and Israel. The dataset was collected from authentic sources, though the specific sources and collection methodology are not detailed. The number of companies, data fields, and update frequency are unknown.
30,000 raw retail transactions documenting the purchasing behavior of 100 customers across 5 regions. This Kaggle-hosted dataset provides the raw inputs necessary for calculating Recency, Frequency, and Monetary (RFM) metrics. The data is specifically curated for retail analytics and customer segmentation workflows.
Upcoming events in Chicago, managed by the Office of Emergency Management and Communications. The dataset focuses on events with needs related to traffic management and public safety. It is designed to support city operations and may change to meet that primary purpose.
Median sales prices for condominiums, three-family, two-family, and single-family homes, derived from arm's-length transaction records at the Middlesex South Registry of Deeds. Cambridge Community Development Department staff analyze these records to filter for unrelated-party sales, which are considered the best indicator of true market value. The dataset was last updated in November 2025.
Data from 46 Brazilian water systems, used in a two-stage approach to estimate inflow and infiltration (I&I). The dataset supports sensitivity analysis for water balance and pollutant mass flux models. It was created by Gabrielle Migliato Marega to accompany a research paper on enhancing insights in data-scarce contexts.
Data Management and Sharing Plan (DMSP) for the Creating Access to Resources and Economic Support (CARES) research project. It describes the scientific data to be generated and used, outlining a strategy for data management and sharing. The plan was authored by Larissa Jennings Mayo-Wilson.