Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,493 datasets
A 15-kilometer resolution geospatial dataset maps land cover across the Former Soviet Union. It contains sixty distinct land cover classes, with a specific focus on forest types accounting for 38 of those classes. The dataset was produced by the National Aeronautics and Space Administration and covers the period from 1984 to 1993.
Forest cover data provides a 1:2 million scale map for the Krasnoyarsk Region in Russia, distinguishing thirty-two land cover classes. The dataset was digitized from maps published in the Atlas of Forests of the USSR in 1973. It is hosted by the National Aeronautics and Space Administration and is available across multiple platforms.
LongVQUBench is a benchmark for evaluating long-term video quality understanding in large vision-language models. It features 1,200 videos and 1,500 QA pairs across three hierarchical evaluation levels. The dataset was created by Aarna004 and was last updated on 2026-06-23.
A 2004 feasibility study commissioned by SenterNovem and conducted by TNO-NITG investigates geothermal heat extraction from the Slochteren Sandstone formation for the city expansion Groningen Meerstad. The report includes an inventory of subsurface aquifers, mapping of future heat demand, a proposed doublet location considering the Groningen gas field, and a global cost estimate. The dataset is a PDF report published under a CC-BY-4.0 license by the Dutch Ministry of the Interior and Kingdom Relations.
Analytic dataset used in a study investigating associations between serum vitamins A, C, D, and E and liver fat accumulation in lean and non-lean adults. The data originates from the 2017–2018 National Health and Nutrition Examination Survey (NHANES) and was analyzed by Caijuan Hong. The dataset is 182.0 KB in size and was last updated in April 2026.
Global Affairs Canada periodically conducts evaluations of its priorities, programs, and projects. This collection contains the evaluation reports for the Canada Fund for Local Initiatives (CFLI) from the 2015-16 to 2020-21 fiscal years. The reports serve as a management tool for reviewing program performance and informing future design and implementation.
NASA's AVIRIS-NG instrument collected this Level 1A dataset of unrectified surface radiance as part of the Surface Biology and Geology High-Frequency Time Series (SHIFT) campaign. The data covers nearly 1,656 square kilometers in Santa Barbara County, California, and nearby coastal waters, captured at approximately weekly intervals. It includes radiance images, geometric lookup tables, and observation parameters across the 380-2510 nm spectral range at 5-nm intervals.
LEDGER: Long-Context KPI Question Answering & Page Retrieval is part of the LEDGER benchmark for evaluating long-context document understanding. It supports page-level retrieval tasks where a natural-language question about a financial KPI must be answered from an annual report, with TREC-style graded relevance judgments provided. The dataset was created by artefactory and was last updated on June 7, 2026.
Average quality distributions for six key chemical and energy measures are provided for six coal seams in the 2011 Victorian coal model area. The dataset includes metrics like Total Iron, Acid Extractible Sodium, Moisture, Net Wet Specific Energy, Total Ash, and Sulphur percentages. It was published by the Department of Energy, Environment and Climate Action and last updated in April 2026.
A collection of official classifiers from the State Customs Service of Ukraine, used in the process of registering customs declarations. The dataset includes classifiers for customs regimes, types of declarations, modes of transport, units of measurement, and penalties, among others. It is sourced from the States site of Ukraine and was last updated on 2026-05-05.
REGISTRO DE PUBLICACIONES CONTRALORÍA GENERAL DEL MUNICIPIO DE MANIZALES is a log of content publications made by the Municipal Comptroller's Office of Manizales on its official website. The dataset includes records from 2020 onwards and is published by the Colombian open data portal www.datos.gov.co. It was last updated on May 18, 2026.
A study by Maša Skelin Klemen, last updated April 30, 2026, investigated the effects of chestnut-derived ellagitannins (FT50) on metabolic dysfunction in mice. The dataset likely contains body weight, adiposity, glucose tolerance, insulin levels, and pancreatic beta cell activation metrics from C57BL/6J male mice fed a western diet with or without FT50 supplementation for 12 weeks. The 145.6 KB PDF file is licensed under CC-BY-4.0.
LAXMAYDAY's Anime Inpainting Data contains 9,734 reproducibly sampled anime images for reference-based image completion tasks. Each sample includes a deterministic partial crop of a source image, an English instruction prompt, and the complete target image. The dataset was generated with seed 251100 and was last updated on June 15, 2026.
72 historical monochrome maps depict Alberta's landscape from aerial photographs taken between 1949 and 1952. The maps display the Alberta Township System, hydrographic features, highways, roads, pipelines, transmission lines, and municipalities. They are provided by the Government of Alberta in PDF, TIF, and geo-referenced PNG formats, but coverage for the province is incomplete.
Aerial photographs taken from 1949 to 1952 form the basis for this historical map series covering Alberta. The Government of Alberta provides these maps, which display township systems, hydrography, roads, pipelines, and municipalities. Maps are available in PDF, TIF, and some geo-referenced PNG formats, but coverage is incomplete and the maps are not GIS-ready.
The General Status of Alberta Wild Species Report provides the basis for this official statistic, which calculates the percentage of assessed wild species considered 'at risk'. This performance measure supports commitments under the Accord for the Protection of Species at Risk, an agreement by provincial, territorial, and federal wildlife ministers. The data is produced by the Government of Alberta and was last updated in April 2026.
Planimetric Series maps, initiated in 1949, display the Alberta Township System, hydrographic features, highways, roads, pipelines, transmission lines, and municipalities. The maps are derived from aerial photographs taken between 1949 and 1952 and are provided by the Government of Alberta. Coverage for the province is incomplete, and it is not known whether further coverage will be added.
21,500 question instances derived from 2,150 facts extracted from English Wikipedia summaries. The dataset was created by Google and last updated on Hugging Face in June 2026. Each fact is structured as a proposition between a subject and an object entity.
74 monochrome maps from the Planimetric Series, initiated in 1949 and derived from aerial photographs taken between 1949 and 1952. The maps display the Alberta Township System, hydrographic features, provincial highways, roads, pipelines, transmission lines, and municipalities. They are provided by the Government of Alberta in PDF and TIF formats, with some geo-referenced PNG files also available.
Alberta's historical Planimetric Series maps, initiated in 1949 and derived from aerial photographs taken between 1949 and 1952. The maps display township systems, hydrographic features, highways, roads, pipelines, transmission lines, and municipalities. They are provided by the Government of Alberta in PDF, TIF, and geo-referenced PNG formats.