Source code corpora, bug reports, vulnerability databases, network intrusion detection, malware samples
1,635 datasets
Programming language token sequences curated for the CodeXGLUE benchmark to support next-token prediction. The dataset provides structured code sequences designed to evaluate model performance using token-level accuracy metrics.
Lutsk City Council decisions permit certain trade, restaurant, and service sector businesses to operate after 22:00. This list, published on the State site of Ukraine, was last updated on 2022-10-26. The data is available in CSV and Excel formats.
Approximately 1,500 commits with code diffs, sourced from CommitPackFT, provide a benchmark for evaluating model retrieval tasks. The dataset, created by cassanof, covers 13 languages including Python, JavaScript, Go, and Rust. It was last updated on Hugging Face in April 2024.
Triton code snippets extracted from GitHub repositories that are governed by permissive licenses such as MIT, Apache, and BSD. Each record maps a specific code snippet to its functional categorization, repository metadata, and direct source URL.
Line-level code completion datasets for multiple programming languages within the CodeXGLUE benchmark. They provide unfinished code lines and their preceding context to evaluate model performance using exact match and edit similarity metrics.
A benchmark for commit message generation featuring code changes and English natural language descriptions across six programming languages: Java, Python, Go, JavaScript, PHP, and Ruby. It is constructed from GitHub repositories with permissive licenses to ensure reproducibility and legal compliance.
Recommendations of the Luhansk Regional Territorial Branch of the Antimonopoly Committee of Ukraine for 2018. The dataset originates from the State site of Ukraine and was last updated on December 11, 2018. The data is provided in Word file format.
The Stepanets Village Council in Ukraine's Cherkasy region published this set of budget program passports for the 2019 fiscal year. Each resource in the set is a separate passport, likely detailing planned expenditures and objectives for specific municipal programs. The data was last updated on the State site of Ukraine in May 2020.
Mukachevo, Ukraine, provides a record of official decisions made by its City Council's Executive Committee during the 2018 calendar year. The dataset is sourced from the State site of Ukraine and was last updated on the platform in September 2019. The specific content, column structure, and number of records are not detailed in the available metadata.
A list of all operating commissions, working groups, and committees under the executive committee of the Burshtyn City Council. The dataset provides basic information about these administrative bodies and was last updated on March 27, 2025. It originates from the State site of Ukraine and is available via the eu_open_data platform.
Berdyansk City Council in Ukraine provides reports on the implementation of passports for budgetary programs related to capital construction, reconstruction, and technical supervision. The data is sourced from the State site of Ukraine and was last updated on February 5, 2025. The specific content and scale of the reports are not detailed in the available metadata.
Records detail the satisfaction of requests for public information and the consideration of appeals submitted by citizens to the executive committee of the Burshtyn City Council. The dataset originates from the State site of Ukraine and was last updated on January 22, 2025. The specific volume of records and their internal structure are not detailed in the provided metadata.
EditPackFT is a dataset for training large language models on instructional code editing tasks. It was created by the author 'nuprl' and last updated on February 29, 2024. The dataset is derived from CommitPackFT and provides formatted training windows for code transformation.
Open-source hardware designs and debugging information categorized for hardware security research. These examples target hardware-specific security vulnerabilities within the LLM4SecHW framework to support the training of Large Language Models on hardware description languages.
European Directive 2007/60/EC on flood risk assessment and management requires the production of flood risk management plans. This dataset from the Bureau de Recherches Géologiques et Minières, last updated in 2019, describes homogeneous areas of economic activity to map flood exposure for the Saint-Dié and Baccarat regions. It is used to produce maps of exposed issues at an appropriate scale for flood risk management plans.
European Directive 2007/60/EC requires flood risk management plans to reduce impacts on health, environment, and economic activity. This dataset contains issues related to Water Framework Directive protected areas, produced for reporting under the Flood Directive by the Bureau de Recherches Géologiques et Minières. The data was last updated on March 29, 2019.
Homogeneous areas describing a type of economic activity on an IRR, produced for reporting purposes for the European Flood Directive. The data set is used to produce maps of exposed issues at an appropriate scale, contributing to flood risk management plans. It was produced by the Bureau de Recherches Géologiques et Minières and last updated on 2019-03-29.
A 2019 dataset from the Bureau de Recherches Géologiques et Minières (BRGM) used to produce maps of issues exposed to flooding at an appropriate scale. It supports flood risk management plans required by European Directive 2007/60/EC and French national law, aiming to reduce negative consequences on economic activity and other areas.
European Directive 2007/60/EC mandates flood risk management plans to reduce impacts on health, environment, heritage, and economic activity. This dataset from the Bureau de Recherches Géologiques et Minières provides homogeneous zones describing economic activity types to map flood exposure issues. It was last updated on April 1, 2019.
A table of quantitative issues reported for each analytical grid and flood scenario, produced for reporting under the European Flood Directive. The dataset was created by the Bureau de Recherches Géologiques et Minières and last updated on March 29, 2019. It is used to produce maps of exposed issues at an appropriate scale, contributing to flood risk management plans.