Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,597 datasets
A catalog of 3,561 blazars and candidate blazars, providing coordinates and multi-frequency data. The 5th edition, maintained by NASA HEASARC, includes 1,151 BZB, 1,909 BZQ, 274 BZG, and 227 BZU sources, with updates based on CDS archives. All sources have a confirmed radio band detection and, with noted exceptions, published spectroscopic information.
Isotope-based measurements from published studies quantify microbial metabolic responses to nitrogen addition across global terrestrial ecosystems. The dataset, compiled by Lei Zhang and shared on figshare in 2026, includes metrics for microbial carbon growth, respiration, nitrogen mineralization, and use efficiencies. It evaluates how ecosystem type, nitrogen addition rate, experimental duration, and soil properties regulate these context-dependent responses.
World Bank data on energy production, use, dependency, and efficiency for Armenia. The data is compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset was last updated on 2026-04-27.
Han Zhao's dataset on figshare contains measurements from 141 poplar saplings across different ontogenetic stages. It quantifies aboveground biomass, biomass allocation, hydraulic resistance partitioning, leaf gas exchange, and water potential. The data supports empirical scaling rules for large-scale vegetation models.
Leonharper's Naime Corpus V1 is a multilingual text dataset for language model pre-training, containing approximately 28.1 billion tokens across over 38 million documents. The data is tokenized using the Qwen3-8B tokenizer and formatted into sequences of length 4096. It was last updated on Hugging Face in May 2026.
Benthic recycling accounted for 63% and 72% of the annualized nitrogen and phosphorus input, respectively, to Port Phillip Bay. Measurements of oxygen, ammonium, nitrate, phosphate, silicate, and other solutes were taken using benthic chambers at various sites during the summers of 1994 and 1995. The data, from Geoscience Australia, distinguishes four bay regions and quantifies nutrient regeneration rates and denitrification efficiency.
Mount Meager, British Columbia, Canada, is the source of ash used in rheology experiments. The dataset contains shear rate sweep measurements for monodisperse ash grain sizes of 500 µm, 250 µm, 125 µm, and 63 µm, tested across a range of volumetric gas flow rates. Data were generated using an Anton Paar MCR302 rotational rheometer with a powder flow cell, funded by NERC Grant NE/W003767/1.
105 articles from The Guardian and The New York Times, compiled by Stela Lechpammer for a chapter in the Bloomsbury volume '30 Years of Pokémon'. The corpus focuses on two key periods, 2016–2017 and 2022–2025, with five additional articles from the early 2000s for historical context. Each record includes article title, publication date, media outlet, URL, and a unique ID for replicability.
NASA's ISS-RapidScat project provides Ku-band scatterometer data from the International Space Station, covering latitudes from approximately 61 degrees North to 61 degrees South. The dataset contains Level 1B geo-located Sigma-0 measurements and antenna pulse geometries, derived from a complete historical re-processing for consistent calibration. Data are provided in single-orbit HDF-4 files and are intended for expert use.
All reasons for consultation recorded in 2019 at E.S.E Hospital Nuestra Señora de la Candelaria in Guarne, Antioquia, Colombia. Data is classified by hospital service, age group, and gender, based on ICD-10 codes. The dataset is provided by www.datos.gov.co and was last updated in May 2026.
World Bank data on energy production, use, dependency, and efficiency for Argentina. The data is compiled by the World Bank from sources including the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset was last updated on 2026-04-27.
Western Australia's PBS Tandem Drive Network 2B.3 data provides geospatial information on road networks for heavy vehicles. The dataset is maintained by Main Roads Western Australia and updated weekly, so route and restriction details stay current.
Main Roads Western Australia provides a weekly updated network dataset for oversize divisible vehicle access. The data includes route conditions and restrictions for heavy vehicles across Western Australia. Users must verify current accuracy via the official Main Roads WA website.
Public investment data for projects related to peace building and victim reparation in Colombia. The dataset covers 16 subregions and 170 prioritized municipalities from 2015 to 2019. It is published by the Colombian government via the datos.gov.co platform.
United Arab Emirates data on energy production, use, dependency, and efficiency compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset is provided in CSV format and was last updated on 2026-04-27. It originates from the World Bank Group's data portal and is shared under a CC-BY-4.0 license.
World Bank Group data on energy production, use, dependency, and efficiency for Andorra, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset was last updated on 2026-04-27. It is provided under a CC-BY-4.0 license.
PBS Tandem Drive Network data provides road access information for heavy vehicles in Western Australia. The dataset is maintained by Main Roads Western Australia and receives weekly updates to ensure current accuracy.
Albania energy and mining data compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset covers topics such as energy production, use, dependency, and efficiency. It was last updated on 2026-04-27.
World Bank data on Angola's energy and mining sectors, compiled from sources like the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset covers topics such as energy production, use, dependency, and efficiency. It was last updated on 2026-04-27.
27.5 million bridge records for heavy vehicle route planning in Western Australia. The dataset is maintained by Main Roads Western Australia and updated weekly to reflect changes in the Restricted Access Vehicle (RAV) network.