Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,727 datasets
A metadata catalog describing information publications from the Municipal Ombudsman's Office (Personería Municipal) of Floridablanca, Colombia. The dataset includes columns for responsible party, generation date, language, title, update frequency, category, storage medium, and format. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026.
Datos.gov.co provides data on part-time university faculty across multiple periods. The dataset includes columns for Total Horas, CATEGORIA, FORMACION, Periodo, PROGRAMA, and GENERO. It was last updated on 2026-05-18.
VIIRS/NPP Day/Night Band 6-Min L1B Swath SDR 750m NRT provides calibrated satellite imagery from the Suomi NPP satellite's Day/Night Band (DNB). The DNB is a panchromatic channel sensitive to visible and near-infrared radiation from 500 nm to 900 nm, enabling observation from daylight down to low-light conditions at night. Data products are generated from 6-minute swaths and feature on-orbit radiometric calibration with stray light corrections.
Keppel Bay's seabed morphology reveals the former path of the Fitzroy River across the continental shelf to a position now under approximately 60 m of water. The dataset likely contains information on sediment distribution, sub-bottom profiles, and palaeochannels, sourced from the Australian Ocean Data Network. It was last updated on 2026-04-16.
SGI-Bench is a scientist-aligned benchmark for evaluating Scientific General Intelligence (SGI) in large language models. It spans 10 disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset was created by InternScience and was last updated on Hugging Face in June 2026.
Multi-Doc-2025 is a financial question-answering benchmark built from SEC Form 10-K annual reports of S&P 500 companies. The dataset is designed to evaluate retrieval-augmented generation and financial QA systems under cross-company, cross-year, and hybrid-modal reasoning settings. It was created by Anonymous-Team-HC-RAG and last updated on 2026-05-27.
The chemicals industry, a sector considered difficult to abate, accounts for 5% of global greenhouse gas emissions. This dataset models decarbonization pathways for olefins, aromatics, methanol, ammonia, and chlor-alkali production across more than 2,600 facilities in North America, Europe, the Middle East, and China. The scenario-analysis study by Tubagus Aryandi Gunawan, last updated in April 2026, explores least-cost timelines under varying investment environments.
Interference areas for soil energy systems in Breda, likely indicating zones where installation or operation is restricted. The dataset is provided by the Dutch Ministry of the Interior and Kingdom Relations and is available in multiple geospatial formats. Its update frequency is irregular.
The Surat Basin in Australia contains six sedimentary cycles, each hundreds of metres thick, spanning the Jurassic and Cretaceous periods. The cycles, thought to result from global sea-level changes, are described in documents from the Australian Ocean Data Network. The dataset was last updated on 2026-04-16.
Audit findings from the Comptroller General of the Department of Córdoba, Colombia, for the 2019 and 2020 fiscal years. The dataset includes results categorized by fiscal, disciplinary, administrative, and penal findings, along with associated financial detriment values. It was published by the Colombian open data portal www.datos.gov.co and last updated on May 18, 2026.
Administrative staff records with updates for the 2019-1 period, sourced from datos.gov.co. The dataset includes columns for gender, cost center, academic formation, period, and job title. It was last updated on 2026-05-18.
A research paper and associated data presenting the Convolutional Neural Network-Gated Transformer Network (CT-GateNet) for music genre classification. The method was tested on three public datasets, achieving accuracies of 98.72% on GTZAN, 89.42% on FMA-SMALL, and 69.07% on FMA-Medium. The dataset, authored by Yunyan Ma and last updated in April 2026, includes a data augmentation strategy based on a denoising diffusion probabilistic model.
4.1 MB of source data and code supporting a manuscript on EU import tariffs for Chinese electric vehicles. The data includes Total Cost of Ownership calculations for all available EV models in 20 EU member states based on a 2024H1 baseline. Hao Dou authored this work, which uses a NEOCC model calibrated on 2019-2025 TCO data to project EV adoption through 2035.
Lineaments interpreted from a 100-meter resolution Digital Terrain Model using geophysical remote sensing techniques. The dataset was compiled by GHD for the Secure Allocation Future Entitlements (SAFE) Project report on geological structures and groundwater flow. It was last updated on April 9, 2026.
An inventory of information generated, obtained, acquired, or transformed by the obligated subject, with its corresponding publication scheme. The dataset includes columns such as 'La Información está Publicada?', 'Despacho', 'Tipo', 'Frecuencia de Actualización', 'Formato', 'Descripción', and 'Link de publicación'. It is hosted on the Colombian open data platform www.datos.gov.co and was last updated on 2026-05-18.
Heavy container trailers generate complex aerodynamic interactions affecting tyre wear particle dispersion. The dataset contains results from a computational fluid dynamics simulation and scaled-down experimental validation of a novel collection strategy. Weisong Wang published the study in April 2026.
Fractional cover estimates of photosynthetic vegetation, nonphotosynthetic vegetation, and exposed soil derived from Landsat imagery for two ranches in the Brazilian Amazon. The dataset contains six GeoTIFF files generated using spectral mixture analysis on 30-meter resolution images. It covers the period from 1996 to 2002 and is associated with the ORNL_CLOUD organization.
Haoyu Yang's dataset on figshare, last updated April 13, 2026, contains materials related to novel randomization designs for estimating causal effects under network interference. The 5.8 MB repository includes CSV files, R code, PDF documentation, and a ZIP archive. The work introduces and evaluates two methods, NetRR and NetMM, for experimental design in networked settings.
LBA-ECO ND-01 provides a georegistered Landsat time series for five areas in Rondonia, Brazil, from June 1975 to June 2000. The dataset includes 53 total scenes from Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus sensors, coregistered to Brazilian PRODES reference imagery. It was produced by ORNL_CLOUD and includes GeoTIFF files for individual spectral bands alongside calibration data.
An inventory of public information generated or controlled by the E.S.E. Hospital Regional de Chiquinquirá, described across 6 columns. The dataset is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18. It likely contains metadata about information categories, formats, and preservation methods.