Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,603 datasets
Geoscience Australia Data provides a geological map covering an area of approximately 800 square miles in the Western Highlands of New Guinea. The data originates from a reconnaissance geological survey conducted in 1959, describing rock formations from the Jurassic to the Pleistocene periods. The survey notes a large basic pluton as a source of gold, platinum, and minor copper mineralisation.
Geoscience Australia Data provides reduced Total Magnetic Intensity (TMI) point-located data from the Narryer survey. The dataset consists of 415,090 line-kilometres of data acquired in 2024 by the WA Government. It has been processed with corrections for manoeuvre noise, diurnal variations, and the geomagnetic reference field.
A synthetic benchmark dataset of 14,500 observations supports the reproducibility of an AI-enhanced life cycle assessment and total cost of ownership workflow for sustainable supply chains. It includes variables such as route distances, payload factors, energy consumption, emissions, and cost modifiers. The dataset was created by Tomasz Neumann and last updated on figshare in April 2026.
A public information registry from the Municipal Institute of Education for Work and Human Development of Yumbo (IMETY) in Colombia. The dataset inventories information assets generated or controlled by the institute, including document series and descriptions. It was last updated on 2026-05-18 and is available via the datos.gov.co platform.
Visual Resource Management (VRM) polygons from the Bureau of Land Management aim to protect scenic values on public lands by minimizing visual impacts from surface-disturbing activities. The dataset is provided by the U.S. Department of the Interior and was last updated in April 2026. It is available in multiple geospatial formats including GeoJSON, KML, and CSV.
5028 Block 2 (Southeast) raw-edited radiometric point-located data were acquired in 2024 by the WA Government. The dataset consists of 415,090 line-kilometres of gamma-ray spectrometric data collected at 100m line spacing and 50m terrain clearance. It includes raw 256-channel gamma-ray spectra, raw window counts, and GNSS heights.
Samples from the Rudiger Point-Cape Ruge area, New Britain, reveal two distinct age groups, challenging previous assumptions of a conformable sequence. A volcanolithic sandstone sample, NG34B, is identified as the youngest marine sediment recognized in New Britain, dating to the late Pliocene-middle Pleistocene. The dataset, from Geoscience Australia, correlates planktonic Zone N.18 with a normally magnetized interval.
HK-LegiCoST is a three-way parallel corpus containing over 600 hours of Cantonese audio, aligned with standard traditional Chinese transcripts and English translations at the sentence level. It was created by researchers including Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, and Sanjeev Khudanpur, with a paper published on arXiv in 2023. The dataset is hosted on Hugging Face by the user Borrison.
SOFA Score data likely contains clinical measurements for assessing organ failure in intensive care settings. The dataset is a 5.5 KB Excel file authored by Adolfo Ruiz-Sanmartín and last updated on April 24, 2026. It is hosted on figshare under a CC-BY-4.0 license.
One of 27 constituent maps from the 'Australia's Maritime Jurisdiction Map Series' (GeoCat 71789). This map depicts Australia's extended continental shelf approved by the Commission on the Limits of the Continental Shelf in April 2008, along with treaties and various maritime zones around Heard Island and McDonald Islands. The background uses bathymetric data from Smith and Sandwell (1997) and land imagery from NASA's Blue Marble.
Legacy product from the Australian Ocean Data Network, last updated on 2026-06-16. The dataset consists of notes from a geomorphological study of Christmas Island in the Indian Ocean. It is available in HTML and PDF formats, but no abstract or sample data is provided.
A statistical analysis dataset comparing the net reclassification improvement (NRI) of a new minimal residual disease (MRD) threshold (0.05%) against a traditional threshold (0.1%) at the end of treatment course 2. The dataset, authored by Xiong-yu Liao and shared on figshare, contains results from a multivariable model comparison with significance assessed via bootstrap resampling (1,000 iterations). It was last updated on May 8, 2026.
Net reclassification improvement analysis compares a new minimal residual disease threshold of 0.05% to a traditional 0.1% threshold. The 5.5 KB Excel file contains results from a multivariable model comparison, with statistical significance assessed via 1,000 bootstrap iterations. Authored by Xiong-yu Liao and last updated in May 2026, it is shared under a CC-BY-4.0 license on figshare.
Large language models achieved 93.5% mean agreement with an expert consensus adjudication during deductive coding. This dataset contains results from a blinded mixed-methods comparison of two LLMs (ChatGPT-5 and Claude 4 Sonnet), an LLM-based coding application, and human analysts on a focus-group transcript. It was authored by Callum Hill and published on figshare in April 2026.
Large language models achieved 93.5% mean agreement with an expert consensus adjudication during deductive coding of a focus-group transcript. Published by Callum Hill in 2026, this dataset contains detailed performance metrics comparing LLMs and human analysts on qualitative coding tasks. The data supports a methodological framework for metric-based comparisons in thematic analysis.
A 2026 collation of previously disparate information on Australia's near-pristine estuaries, summarized on a state-by-state and national basis. The information was compiled by Geoscience Australia from scientific articles, reports, government data, and expert advice. The dataset emphasizes current knowledge and management practices.
415,090 line-kilometres of raw-edited Total Magnetic Intensity (TMI) data acquired in 2024 by the WA Government over the Narryer survey area. The dataset includes raw and compensated TMI measurements, diurnal variations, fluxgate magnetometer readings, and raw altimeter and GNSS heights. This line dataset was collected at 100m line spacing and 50m terrain clearance for geological mapping and mineral exploration.
Geoscience Australia Data presents results from a 1971 marine geophysical survey and stratigraphic drilling campaign on the Queensland Plateau. The data includes acoustic stratigraphy from seismic reflection profiling tied to a drill site, used to deduce the sedimentary and structural history of the plateau. The analysis compares the plateau's evolution to generalized rift-margin models.
An inventory of public information generated, obtained, acquired, transformed, and controlled by the Barranquilla University Institution (IUB) that has been selected as classified and reserved. The dataset includes columns such as 'Nombre del responsable de la información', 'Fecha de la calificación', and 'Plazo de clasificación o reserva'. It is hosted on the Colombian open data platform www.datos.gov.co and was last updated on 2026-05-18.
Western Australia radiometric survey data acquired in 2024 by the WA Government. The dataset consists of 415,090 line-kilometres of raw-edited point-located gamma-ray spectrometric measurements. It includes 256-channel gamma-ray spectra, raw window counts, and GNSS heights, supporting geological and environmental studies.