Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,460 datasets
Subsidies for rural and urban housing improvements assigned to applicants of the Cambia Mi Casa (CMC) and Mi Casa Ya (MCY) programs. The data is generated from the databases of the Subdirección de Subsidios de Vivienda Familiar (SSFV) and the Subdirección de Apoyo Técnico (SPAT) of the DIVIS within the Ministry of Housing, City and Territory. The last issuance date recorded in the metadata is 2026-05-05.
Seabed backscatter data covering 330 km^2 offshore from Cape Naturaliste to Cape Leeuwin was collected by Geoscience Australia in March 2020 and January-February 2021. The survey was a collaborative project with multiple universities and funded by the National Environmental Science Program Marine Biodiversity Hub. The dataset is a 4 m resolution 32-bit geotiff file processed from Kongsberg EM2040C multibeam sonar data.
Bathymetry data collected by Geoscience Australia in the South-west Corner Marine Park in March 2020 and January-February 2021. The dataset covers 330 km^2 offshore from Cape Naturaliste to Cape Leeuwin and was produced from Kongsberg EM2040C multibeam sonar data processed with CARIS software. The survey was a collaborative project with several universities, funded by the National Environmental Science Program, to build baseline information for benthic habitats.
RedlineBench measures contract negotiation as a sequence of judgment calls. It captures multi-turn redlining workflows through simulations grounded in realistic SaaS transactions and attorney-generated explanations. The dataset was created by crosbylegal and was last updated on 2026-06-17.
A figshare dataset authored by Talan that analyzes how geoeconomic fragmentation of critical mineral supply chains disrupts energy access and sustainable development across income levels. The dataset is available as a 1.8 MB XLSX file and was last updated on June 4, 2026. It is shared under a CC-BY-4.0 license.
Australia's Identified Mineral Resources is an annual national assessment providing a long-term view of mineral resources available for mining. The 2012 revised edition includes evaluations of long-term trends, world rankings, exploration results, and mining industry developments. It was published by the Australian Ocean Data Network and last updated in June 2026.
Student records from the School Feeding Program (PAE) in the municipality of Sogamoso, Boyacá, Colombia. The dataset includes beneficiary counts, school and campus details, academic grade, and socioeconomic indicators. It was published on the Colombian open data portal and last updated on 2026-05-18.
The International Seabed Geomorphology Mapping Working Group (ISGM-WG) developed a two-part seabed geomorphology classification scheme. This product includes four linked vocabularies of terms to support standardized global seabed mapping, published by Geoscience Australia. The vocabularies were last updated on 2026-05-05.
A 1984 policy delineates 8 land use zones in Alberta's Eastern Slopes Region, classified under Protection, Resource Management, or Development. The digital representation was created from paper maps and regional digital files, with a positional accuracy of +/- 500 metres. The Government of Alberta maintains this dataset under the OGL-CA-2.0 license.
U.S. National Security Strategies from 2002 to 2022, analyzed for non-traditional security threats and resilience discourse. The dataset includes six policy texts, prevalence data for generating a figure, and thematic codes for resilience discourse and foreign policy traditions. Author Peter Ferguson contributed this replication data to the Foreign Policy Analysis Dataverse, last updated in June 2026.
An inventory of public information generated, received, and controlled by the ITRC Agency in fulfillment of its functions. The dataset contains metadata on published and available records, including categories, formats, and descriptions. It was last updated on 2026-05-18 and is hosted on the Colombian open data portal.
Abhay Kumar's study on figshare, last updated in April 2026, investigates nitrate removal from hazardous waste landfill leachate using an electrocoagulation process. The research includes experimental data on removal efficiency, energy consumption, and the influence of co-existing ions. The findings detail optimized operating conditions and kinetic modeling results.
2,246 vertical profiles of temperature and salinity were collected in the Beaufort Sea from August to October 2022. The data, sampled within ~200 km of the sea ice edge, are gridded onto a uniform 0.1 dbar pressure grid from the surface to 200 dbar. This dataset supports research on how meltwater-driven salinity anomalies affect ocean stratification and sea ice growth.
A literature review synthesizing current knowledge on hikikomori syndrome, a condition characterized by prolonged social withdrawal. The work examines proposed definitions, diagnostic criteria, epidemiology, risk factors, psychiatric comorbidities, and intervention approaches. It was authored by Jean-François Carmel and last updated on June 20, 2026.
Public information generated, obtained, acquired, or controlled by the Empresa de Servicios Públicos de Tocancipá S.A. ESP, which has been classified as confidential or reserved according to Law 1712 of 2014 (Transparency Law). The dataset includes columns for responsible personnel, classification dates, document series, and update frequency. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
IndicContextEval is a benchmark dataset for evaluating context utilisation in audio large language models. It contains 16,884 utterances totaling 55.93 hours of speech from 555 speakers across 8 Indic languages. The dataset, created by AI4Bharat, covers 23 professional domains and includes multiple speech styles and prompt levels.
Air quality measurements from the SISAIRE monitoring system in Colombia. The data spans May 2003 to May 2021 and includes particulate matter readings. The dataset is hosted on the Colombian open data portal, www.datos.gov.co.
Datos.gov.co provides data on solid waste tonnage collected across 58 micro-routes, disaggregated by commune, month, and year. The dataset covers waste disposed at a sanitary landfill from January 2016 to October 2024. Columns include commune code, year, month, route order, weight in tons, micro-route identifier, and commune name.
A corpus-based study analyzing the frequency, choices, and tonal variations of epistemic and effective modals used by Chinese tertiary-level EFL learners in a national-level standardized speaking test. The dataset likely contains quantitative results from an opinion-giving task, compiled by Wei Wang. The data was last updated on April 24, 2026.
The Newcastle HF ocean radar system provides real-time sea water velocity data for the Central Coast of New South Wales, Australia. It consists of two SeaSonde stations at Sea Rocks and Red Head, operating at 5.2625 MHz with a maximum range of 200 km. The data is managed by the Australian Ocean Data Network and was last updated in May 2026.