Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,460 datasets
Estudiantes Admitidos en Programas Académicos - Institución Universitaria Colegio Mayor de Antioquia contains information on students admitted to academic programs at the Colegio Mayor de Antioquia university institution. The dataset includes details on the admission year, semester, and general characteristics of the admitted population, such as gender and academic program. It is published by www.datos.gov.co and was last updated on 2026-05-18.
Activity data used to create the national greenhouse gas inventory for Colombia. The dataset spans the timeline from 1990 to 2021 and is updated biennially according to statistical operations. It is hosted by the Colombian open data platform www.datos.gov.co.
Enrollment records from the Institución Universitaria Colegio Mayor de Antioquia give student counts by academic program, semester, and municipality. The columns cover year, semester, program name, sex, and municipality, which allows enrollment trends to be tracked over time. It was last updated on the Socrata platform in May 2026.
Enrollment records for students at the Institución Universitaria Colegio Mayor de Antioquia, provided by www.datos.gov.co. The data includes details on academic program, semester, year, and student characteristics like biological sex and residence. The dataset was last updated on 2026-05-18.
OSCAR_robot is the robot half of the training corpus for OSCAR, an Omni-Embodiment Action-Conditioned World Model for Robotics. It contains curated, filtered, and deduplicated multi-embodiment robot teleoperation episodes re-rendered into a unified conditioning format featuring a kinematic-skeleton overlay. The dataset was authored by zywu2115 and last updated on Hugging Face in June 2026.
Paragraphs from UN Security Council resolutions adopted since 2003 that contain language related to Weapons and Ammunition Management (WAM). Each row is a single paragraph classified by resolution type and thematic category, including Arms Embargoes and Disarmament. The dataset is provided by the United Nations Peace and Security Data Hub and was last updated in April 2026.
250 prompts designed to evaluate language models' ability to infer intent and generate complete, fully-featured implementations from minimal instructions. The dataset was created by syntropy-ai and was last updated on 2026-06-22.
Registro de Activos de Información is an inventory of public information generated, obtained, acquired, transformed, or controlled by the Bucaramanga Chamber of Commerce in fulfillment of its public registry function and administration of public resources. The dataset is available in multiple formats including CSV, JSON, XML, and RDF. It was last updated on May 18, 2026.
Tasa de cobertura educativa en el Departamento de Risaralda relates the number of enrolled students of the theoretical age for an educational level to the total projected population of that same age. The dataset includes columns for Ano (Year), Municipio (Municipality), Codigo_Municipio (Municipality Code), Tasa (Rate), and Variable. It is hosted by www.datos.gov.co and was last updated on 2026-05-18.
R code for generating forest plots, funnel plots, and risk of bias assessments for a systematic review. The code is provided in a 17.8 KB DOCX file by author Himel Mondal and was last updated in June 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
A 106.0 KB PDF document published on figshare by Sushil Acharya on 2026-04-20. It describes and likely contains normalized well-log datasets, including gamma ray, bulk density, neutron porosity, and compressional sonic slowness logs, used to test a semi-automated depth alignment workflow.
Chandra X-Ray Observatory observations include Performance Verification, calibration, and all subsequent cycles of Guaranteed Time and General Observer targets. The HEASARC updates this database twice weekly by querying the Chandra X-Ray Center. NASA HEASARC provides this service, and data products for archived observations are available from the Chandra Data Archive.
Wet normering topinkomens (WNT) data from the Dutch Ministry of the Interior and Kingdom Relations details remuneration exceeding standardized caps for top officials in public and semi-public sectors. The dataset aggregates 11 data sources covering designated and externally hired top officials, supervisory officers, and non-top officials for the years 2013, 2014, and 2015. It is published under a CC0-1.0 license.
The Index of Classified and Reserved Information is an inventory of public information generated, obtained, acquired, or controlled by the obligated entity that has been classified as confidential or reserved. It includes columns such as Tiempo que Cobija la Clasificación, Fecha de Calificación, Fundamento Jurídico de la Excepción, and Descripción. The dataset is hosted on the Colombian open data platform www.datos.gov.co and was last updated on 2026-05-18.
1.9 MB of experimental data on aromatic cation salts designed for photoinduced protein ligation. The dataset, shared by Pranab C. Saha on figshare, includes structure–reactivity relationships for probes enabling labeling with green light. It contains results from mass spectrometry-based proteomic analysis showing distinct protein enrichment from mitochondria and the endoplasmic reticulum.
IndicVoices-R-Malayalam is a subset of the IndicVoices-R multilingual text-to-speech corpus containing only Malayalam language data. The dataset comprises 31,106 audio samples totaling 79.74 hours of speech. It was created by the author 'trysem' and is hosted on Hugging Face, with a metadata timestamp of June 2026.
A program initiated in 2006 with $58.9 million in funding over five years to acquire pre-competitive geoscience data for onshore energy prospects in Australia. The dataset, from Geoscience Australia, includes seismic, gravity, geochemistry, heat flow, radiometric, magnetotelluric, and airborne electromagnetic data. It aims to support exploration for geothermal, petroleum, uranium, and thorium resources.
Registro de Activos de Información catalogs information assets generated or managed by the Institute for the Development of Antioquia (IDEA). The dataset includes metadata on categories, formats, languages, and availability. It was last updated on 2026-05-18 18:23:18 and is published on the Colombian open data portal.
Terminal-Lego-15k is a large-scale collection of Docker-verified, Terminal-Bench-style agentic tasks built from real StackOverflow technical issues. The dataset was created by SWE-Lego through a pipeline that filters questions, converts them via cascaded LLM generation, and retains tasks only after Docker round-trip verification. It was last updated on June 4, 2026.
Fifteen gravity cores were collected from Prydz Bay and the Mac.Robertson Shelf during Voyage 7 of the 1992/93 Antarctic shipping season. The Australian Ocean Data Network hosts this cruise preview report detailing a marine geoscience program aimed at studying Quaternary environmental change. Sampling targeted sites to elucidate sedimentation processes and past glacier behavior on the Antarctic shelf.