Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,721 datasets
Generated artifacts for the VANTAGE research project on speculative decoding for code editing. The dataset stores repository-relative paths and artifacts used by the paper and summarization scripts. It was created by faizancodes and last updated on June 2, 2026.
A registry of information assets from the Comptroller General of the Department of Córdoba, Colombia, for the 2020 fiscal year. The dataset is published by www.datos.gov.co and was last updated on 2026-05-18. It includes 12 columns describing the assets, such as their format, location, and classification.
Indicadores de Calidad (Enero - Diciembre 2019) contains all quality indicators for the year 2019 from the E.S.E Hospital Nuestra Señora de la Candelaria. The dataset is hosted on the Socrata platform via www.datos.gov.co and was last updated on 2026-05-18. Columns suggest monthly, semiannual, and annual performance data against set targets.
Freiburg im Breisgau's official development plan 1-045 for the Augustinerplatz area, provided as a Web Map Service (WMS). The dataset is published by the Bundesamt für Kartographie und Geodäsie. The last update date is unknown.
Historic Environment Records from Cornwall and Scilly document archaeological and historic building interventions, termed 'Events'. These records, often linked to planning conditions or academic research, are used to update the regional Historic Buildings, Sites and Monuments Record (HBSMR) and are contributed to the national OASIS project and Archaeological Data Service. The data is provided by the Government Digital Service under an Open Government Licence.
York Council conducted a public consultation between September and October 2015 to gather resident priorities for its Council Plan. Responses were collected via drop-in sessions at West Offices, an online survey, and questionnaires sent to partners and businesses. The published responses have had personal identifiers redacted to comply with data protection requirements.
Six new zircon U-Pb geochronological data points obtained via Sensitive High-Resolution Ion Micro Probe (SHRIMP) from plutonic igneous rocks in Tasmania. The data were collected between July 2012 and June 2013 by the collaborative Geochronology Project between Mineral Resources Tasmania and Geoscience Australia. Five samples are from the Eastern Tasmanian Terrane and one from the Western Tasmanian Terrane.
Version 3.2 of the 1km-resolution regional-scale biogeochemistry and sediments model for the Great Barrier Reef, forced by a 1km hydrodynamic model. The dataset was retired by its authors in February 2026 due to an error causing unrealistic Chlorophyll-a levels. The model ran in near-real-time mode, updating daily, until January 2024 when sensor damage halted river-flow data input.
Manually collected from the book Fath Al-Kabir Al-Muta‘al fi I‘rab Al-Mu‘allaqat Al-‘Ashr Al-Tiwal, this dataset provides detailed linguistic and semantic annotations for the complete Ten Mu‘allaqat poems. Each entry represents a single verse and includes fields for poet name, verse text, vocabulary explanation, meaning, and grammatical analysis. The dataset was created by SarahALo and last updated on Hugging Face in May 2026 to support Arabic Natural Language Processing and educational applications.
Data from March 2010 details permitted waste management sites in England and Wales. It combines standard permitting system fields with additional information from permits, re-categorizing sites into more helpful categories. The dataset includes permit references, operator names, site locations, permitted throughput, and activity descriptions.
Environment Agency waste permitting data at the end of March 2010. The dataset brings together standard permitting fields with additional information from permits and re-categorizes sites into more helpful categories. It includes details such as Permit Reference, Operator Name, Site Location, Maximum permitted throughput, and activity descriptions.
SAE-LWIR is the first publicly available dataset generated with MODTRAN for atmospheric compensation in standoff long-wave infrared hyperspectral imaging. The dataset supports the paper 'Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging' presented at IGARSS 2026. It was created by researchers from the Universidad Industrial de Santander in Bucaramanga, Colombia.
811 Chinese tertiary EFL learners provided two waves of data for validating the L2 Emotion Regulation Strategies Questionnaire (L2ERSQ). Huiyuan Gu created this domain-specific instrument, which demonstrates a confirmed 7-factor structure and longitudinal measurement invariance. The dataset, last updated in 2026, supports research on emotion regulation in second language acquisition.
A 5.5 KB Excel file containing measures and items used in a multi-study research project on retail design. The dataset, authored by Mathias C. Streicher and last updated in April 2026, examines how in-aisle fixtures and shopping aids like carts influence purchasing behavior through spatial crowding and perceived control.
A list of public entities in Antioquia, Colombia, that are subject to audit by the General Comptroller's Office of Antioquia. The dataset includes entity names, locations, and identification codes. It is published by datos.gov.co and was last updated on 2026-05-18.
A 2026 study by the Australian Ocean Data Network details the fluvial deposition and geomorphology of the Lachlan River terminus. It describes three distinct depositional environments within the swamp: the sinuous Lachlan channel, the extensive Phragmites Marsh, and surrounding overflow areas. The analysis focuses on the river's low-gradient termination and sediment characteristics.
Eastern Australia's Great Cumbung Swamp, the terminus of the low-gradient Lachlan River, is documented in this scientific description. The Australian Ocean Data Network provides details on three distinct depositional environments: the Lachlan channel, Phragmites Marsh, and overflow areas. The record was last updated in April 2026.
Language Decoded Data is a multilingual code dataset for the Language Decoded project, part of Cohere's research. The dataset includes configurations for Phase 3 with sizes of 103k, 20k, and 5k rows for Conditions 1 and 2, and Phase 2 configurations remain available for reproducibility. It was last updated by user 'legesher' on Hugging Face on 2026-05-31.
5.4 MB of source data from a study on reversible phase transformations in manganese(II) chlorides. The data, authored by Aibo Li and shared under a CC-BY-4.0 license, supports findings on thermal quenching for high-precision information encryption and thermal energy storage applications. It was last updated on May 26, 2026.
Evaluation reports from Global Affairs Canada, which periodically evaluates its programs and projects to review their performance. The information gathered helps improve the design and implementation of upcoming international development initiatives. Each evaluation results in a report. The dataset was last updated on 2026-05-28.