Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,938 datasets
World Bank data on energy production, use, dependency, and efficiency for Somalia. The data is compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. It was last updated on HDX in May 2026.
Simulation parameters for evaluating the EDCC-RPL routing protocol in IoT networks. The dataset, created by Muhammad Asif Habib and last updated in April 2026, is a 5.5 KB Excel file. It likely contains the input parameters for simulations that demonstrated up to 32% lower energy consumption and 18% higher packet delivery ratio.
A 5.5 KB dataset comparing routing protocol objective functions for low-power IoT networks. The data was created by Muhammad Asif Habib and last updated on April 21, 2026. It supports a study proposing the EDCC-RPL algorithm, which integrates Expected Transmission Count, delay, and child count metrics.
World Bank data on energy production, use, dependency, and efficiency for Somalia. The data is compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset was last updated on HDX in May 2026.
55,035 Polish-language posts from X/Twitter containing the keywords 'Ukrainians' and 'in Poland' collected during the first year of Russia's full-scale invasion of Ukraine in 2022. The dataset was created by Tomasz Piróg and triangulates frame analysis with stance detection, network analysis, and engagement measurement. It was last updated on 2026-05-05.
Quarterly data reported by Finagro on mandatory investments in TDA Classes A and B, as required by Resolution 3 of 2000 from the Central Bank of Colombia's board. The dataset contains total required investment amounts for different types of credit establishments, with values reported in thousands of Colombian pesos. The data is published by the Colombian government's open data portal, datos.gov.co, with a last update timestamp of 2026-05-18.
CORPOBOYACA HISTÓRICO EXPEDIENTES CONCESIÓN DE AGUAS SUPERFICIALES is a dataset of administrative records for surface water concession permits, managed by the Grupo de Gestión Integral del Recurso Hídrico of Corpoboyacá. The data includes columns for application status, applicant identification, municipality, and geospatial coordinates (X, Y). It was last updated on 2026-05-18 16:41:31 and is available via the Socrata platform on datos.gov.co.
A synthetic dataset simulating candidate profiles for technology jobs in a Brazilian context, combining technical and demographic features. It includes nine partitions across three sizes (1k, 5k, 10k instances) and three bias conditions (debiased, biased, extreme bias). The dataset was created by Carvalho and last updated on May 11, 2026.
Experimental data from a study on a novel crystalloid protein in the malaria parasite Plasmodium yoelii. The dataset, authored by Mayumi Tachibana and shared under CC-BY-4.0, was last updated on 2026-05-18. It likely contains quantitative and qualitative results from targeted gene disruption experiments assessing impacts on parasite organelle structure and infectivity.
Mayumi Tachibana's research data, last updated May 2026, investigates a novel crystalloid protein in Plasmodium yoelii. The dataset likely contains experimental results showing that disruption of the pycryph2 gene leads to irregular crystalloid microstructure and significantly reduces sporozoite invasion of mosquito salivary glands and mouse liver. These findings indicate the protein's role in forming well-organized crystalloids and maturing functional sporozoites.
A 71.6 KB Excel file from figshare contains experimental data on a novel crystalloid protein in the malaria parasite Plasmodium yoelii. The dataset, authored by Mayumi Tachibana and last updated in May 2026, likely contains results from targeted gene disruption of the pycryph2 gene. It describes the protein's role in crystalloid organelle structure and its impact on sporozoite infectivity in mosquitoes and mice.
Experimental data from a study on a novel crystalloid protein, PyCryPH2, in the malaria parasite Plasmodium yoelii. The dataset, authored by Mayumi Tachibana and last updated in May 2026, likely contains measurements related to the disruption of the pycryph2 gene and its effects on crystalloid microstructure and sporozoite infectivity in mice and mosquitoes. The findings indicate PyCryPH2's role in forming well-organized crystalloids and contributing to functional sporozoite maturation.
Jonathan Niyorukundo's research data on breeding colored sweet corn hybrids. The dataset likely contains phenotypic and biochemical measurements from nine colored sweet corn inbreds and 20 hybrids, selected for color, sweetness, and texture. The study demonstrates the accumulation of carotenoid and flavonoid pigments by the prime eating stage.
Annual occupancy and trend estimates from 1970 to 2015 for 5,293 UK invertebrate, bryophyte, and lichen species. Data were generated from observations collated by UK recording societies using a Bayesian occupancy model, producing posterior samples, summary statistics, and annual growth rates. Estimates are provided at the country level for England, Scotland, Wales, and Northern Ireland, and for the UK and Great Britain as a whole.
1,526 online reviews for seven commercially available peel-and-stick vinyl floor tile products from Amazon. The dataset was processed through domain-specific cleaning, lexicon construction, and a Naïve Bayes classifier by Ziqing Lin and last updated in May 2026. Reviews span different brands, price tiers, thicknesses, and visual patterns.
GeoTIFF files of seabed bathymetry and backscatter data for the Solitary Islands Marine Park, acquired by the New South Wales government between 31 August 2022 and 31 July 2023. The data was collected using an R2Sonic 2022 multibeam sonar system onboard the RV Bombora and processed to a 5-meter resolution. This dataset provides a baseline for mapping seabed types and is part of the SeabedNSW program.
The North Wollongong (Bellambi Point to Stanwell Park), NSW Bathymetry Acquisition (20170028S) dataset contains 5m resolution bathymetry and backscatter data for the seabed. The NSW Department of Planning and Environment acquired the data using a multibeam sonar onboard the RV Bombora between August 13, 2017, and March 4, 2022. This baseline dataset was created to map the spatial distribution of seabed types as part of the SeabedNSW program.
101 high-resolution satellite images from four Arctic Ocean sites were classified to map melt ponds on sea ice during three summers. The dataset includes tables of pond coverage and size statistics for 500-meter grid cells, derived from surface type maps with a 1-meter resolution. Data are stored in Excel, ASCII, and GeoTIFF formats, providing both processed statistics and the original image-derived products.
Monthly historical tariffs applied by EPM for water and sewage services in the residential sector from January 2017 to May 2026. The tariff-setting process follows guidelines from the Water and Basic Sanitation Regulation Commission (CRA). The dataset includes basic tariff components for services in Medellín and other municipalities served by EPM in the Aburrá Valley.
A multilingual pretraining corpus of 45,031,396 documents across 41 European languages. The data is built from HPLT Monolingual v3 web crawl sources and spans Germanic, Romance, Slavic, Celtic, Baltic, Finno-Ugric, Greek, and other language families. Every document has an HPLT WDS quality score of 10 or higher.