Loading...
Loading...
Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,603 datasets
Western Australia radiometric survey data acquired in 2024 by the WA Government. The dataset consists of 415,090 line-kilometres of raw-edited point-located gamma-ray spectrometric measurements. It includes 256-channel gamma-ray spectra, raw window counts, and GNSS heights, supporting geological and environmental studies.
2024 Total Magnetic Intensity (TMI) data from the Narryer survey, consisting of 415,090 line-kilometres. The data was acquired by the WA Government and processed by Geoscience Australia Data to highlight subsurface geology. It has undergone multiple corrections including diurnal, geomagnetic reference field, and micro-levelling.
Geoscience Australia Data presents a critique of magnetostratigraphic coverage for the Phanerozoic eon. The work assesses constraints from the Cambrian to the Recent, noting varying levels of detail across geological periods. It was last updated on 2026-05-14.
Agenzia per l'Italia Digitale provides a regional technical cartography map series at a 1:5,000 scale. The map is based on traditional aerophotogrammetry methods and consists of 2,620 individual map sheets, or 'elements'. It was last updated in 1998 and provides a general representation of morphology, waters, vegetation, and man-made works.
A technical cartography of a regional territory created using traditional aerophotogrammetry methods. The map consists of 2620 individual 'elements' or sheets, produced by the Agenzia per l'Italia Digitale, with data last updated in 1979. It provides a detailed representation of morphology, waters, vegetation, and man-made works for topographic reference.
Mac-Robertson Land and Kemp Land in Antarctica are the focus of this geological dataset. It describes high-grade metamorphic rocks, likely containing information on their distribution and characteristics. The dataset is a legacy product published by the Australian Ocean Data Network, with a last recorded update in June 2026.
Til Corpus is a multilingual text dataset hosted on HuggingFace by TilQazyna. It contains a large volume of text data, with a tiered structure including raw, clean, and premium subsets. The dataset was last updated on 2026-06-20.
Derivative scene assets for a curated subset of clips from the Action100M dataset. The preview version is built by the Rice RobotPI Lab and is not for general release. Its schema and contents may change.
The Pan-Canadian Geoscience Strategy is a collaborative framework document developed by federal, provincial, and territorial geological surveys. It outlines five national priorities, including advancing framework geoscience and mineral potential modelling, and was created by the National Geological Surveys Committee. The document is published by the Government of Yukon under the OGL-CA-2.0 license.
Nogai Unified Corpus v1 is the largest publicly available, curated textual dataset for the critically endangered Nogai language. It was engineered by ansarzeinulla to solve data scarcity for this historically 'zero-resource' Turkic language, enabling machine learning tasks. The dataset was last updated on June 6, 2026.
Yao Shen's dataset, last updated April 2026, explores parameter choices for an improved size-consistent second-order Brillouin–Wigner perturbation theory (BW-s2). It contains test results on bond-dissociation curves and errors against reference data across 14 chemical energy difference datasets to recommend optimal nonempirical parameter choices.
A GIS analysis using 6 biophysical variables classified the ocean floor into 11 categories, comprising 53,713 polygons termed 'seascapes'. The dataset, hosted by the Australian Ocean Data Network, was last updated on 2026-04-16. It was created to objectively identify candidate sites for high seas marine protected areas.
Geoscience Australia collected seabed backscatter data covering 330 km^2 offshore from Cape Naturaliste to Cape Leeuwin. The data was acquired in March 2020 and January-February 2021 using a Kongsberg EM2040C multibeam sonar system and processed into a 4 m resolution geotiff. This project was a collaboration with several Australian universities and funded by the National Environmental Science Program to build baseline habitat information.
2019 audit results from the Comptroller General of Cauca, Colombia. The dataset likely contains findings from fiscal oversight processes, including administrative, fiscal, disciplinary, and penal observations. It is published on the datos.gov.co platform via Socrata and was last updated on 2026-05-18.
An inventory of public information assets generated, obtained, acquired, transformed, or controlled by the Colombian Aeronautics Industry Corporation (CIAC). The dataset includes columns for language, storage medium, publication status, and content description. It was last updated on 2026-05-18 and is available via the datos.gov.co platform.
Data from a 2026 study by Cheng Cheng on figshare, assessing composite hydrogels for periodontal bone defects. The dataset likely contains results from in vitro and in vivo experiments measuring physicochemical properties, antibacterial activity against Porphyromonas gingivalis, and osteogenic differentiation of rat bone marrow mesenchymal stem cells. The file is a 13.4 KB DOCX document.
61 veterans receiving care for chronic non-malignant pain and 38 control veterans completed computerized tasks probing attentional capture, delay discounting, and symptomatology. The dataset, authored by James M. Bjork and last updated in April 2026, likely contains behavioral and psychological assessment results. It was collected from participants within the Veterans Health Administration system.
170+ hours of streaming egocentric visual observations and 7K+ timestamped questions form this benchmark. It is designed to evaluate user-centric continual spatial intelligence in dynamic environments from a first-person point of view. The dataset was created by cocowy1 and was last updated on the Hugging Face platform in June 2026.
High-resolution multibeam bathymetry maps seabed geomorphology across approximately 25,500 km² of the northern Lord Howe Rise plateau in the Tasman Sea. The map, produced by Geoscience Australia Data, identifies geomorphic units including ridges, valleys, plateaus, volcanic peaks, and an extensive network of polygonal furrows. The data was last updated on 2026-04-30.
Five image folders named 2, 3, 4, 5, and 6 each correspond to a different Font Type used for rendering text. The dataset contains regional text masks that correspond to the generated images, intended for training and evaluating text rendering and layout control models. It was created by ArtmeScienceLab and last updated on 2026-05-28.