Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,339 datasets
An unpublished database analyzing front pages and opinion columns from major-circulation newspapers for gender stereotypes targeting presidential candidates Sheinbaum and Gálvez. The 189.1 KB XLSX file served as the basis for academic publications and conference presentations. It was last updated on 2026-05-15.
A meta-epidemiological study protocol analyzes publication delays in systematic reviews. The dataset likely contains records of interventional, RCT-based meta-analyses published in top-tier general medical journals and the Cochrane Database of Systematic Reviews between 2023 and 2025. Jia Song authored this protocol, which was uploaded to figshare in April 2026.
A cleaned Wikipedia corpus combines Serbian and Croatian Wikipedia articles. Croatian text has been transliterated to Cyrillic script, and wiki markup, infoboxes, and stub articles have been removed. The corpus was compiled by RafaelUI and is available on Hugging Face.
A geospatial analysis compares the levelized cost of heat and carbon removal for three decarbonized thermal energy sources across the United States. The study uses detailed process models for sedimentary basin geothermal, concentrated solar, and heat pump technologies, with sorbent-based direct air capture as a case study. The dataset was authored by Caleb H. Geissler and last updated on April 28, 2026.
Experimental data for a series of novel podophyllotoxin derivatives designed to target LAT1 transporters for esophageal cancer treatment. The dataset includes results for the lead compound B11, showing 64.6% tumor growth inhibition in mice and a more than 4-fold improvement in tolerability compared to etoposide. The data was authored by Manwei Jia and last updated on 2026-04-14.
Wistar rats (Rattus norvegicus) were the source for isolated cardiac mitochondria used to study the direct effects of dapagliflozin. The dataset, authored by Itanna Isis Araújo de Souza and last updated in May 2026, contains findings on oxygen consumption, ATP production, ROS generation, and membrane potential. It is shared under a CC-BY-4.0 license as a 1.9 MB DOCX file.
The Lord Howe Island shelf in NSW was surveyed by Geoscience Australia in 2008. The survey mapped seabed bathymetry and characterized benthic environments using sediment sampling, rock coring, underwater video, and current measurements. The lh_back_8m grid is a processed backscatter product covering 1034 sq km, derived from EM300 data.
Australian marine physical environmental data includes metadata for 37 variables collated by the Marine Biodiversity Hub. Bathymetry, geomorphology, seabed sediment, and seabed exposure data were produced by Geoscience Australia, while bottom-water and surface-water parameters were produced by CSIRO. All data were transformed to a common datum (WGS84) and gridded at a 0.01-degree cell size.
Geomorphological features of the Great Artesian Basin, including offshore extents beneath the Gulf of Carpentaria. The dataset classifies features into five categories based on depositional environment: Marine, Fluvial, Aeolian, Playa-lacustrine, and Erosional terrain. It was produced by Geoscience Australia and is available via the Australian Ocean Data Network.
The Australian Ocean Data Network hosts a collection of abstracts for academic papers on sulphide ore formation in sedimentary rocks. The abstracts cover topics including models of ore formation, metal sources, lead isotopic systematics, and diagenetic mineralization, with specific references to deposits like Mount Isa and Coxco. The dataset was last updated on 2026-05-05.
VisReason is a large-scale dataset designed to advance visual Chain-of-Thought reasoning in multimodal large language models. It supervises a human-like, global-to-local reasoning process where models first form a holistic hypothesis about a scene before iteratively zooming into salient regions. The dataset was created by Y-Research-Group and was last updated on June 21, 2026.
A 5.5 KB Excel file containing a list of symbols related to smart charging algorithms for electric vehicles. The dataset was created by Felix Wieberneit and last updated on April 22, 2026. It supports research demonstrating a potential 37% annual reduction in carbon intensity from controlled EV charging.
Sarah Hornfeck's dataset, last updated April 22, 2026, lists the sgRNAs used to generate stable KIs. The 5.5 KB Excel file contains data from a study highlighting the importance of analyzing proteins at endogenous levels, showing that colocalization of Rab11 and LAMP1 varied drastically between endogenous and ectopic expression conditions.
An initial value algorithm examines the time-dependent evolution of electromagnetic fields from oblique scattering of bounded pulses from an infinite planar dielectric interface. The study uses a qubit lattice algorithm (QLA) that is almost fully unitary, which leads to excellent conservation of electromagnetic energy. The dataset was created by Min Soe, George Vahala, Linda Vahala, Efstratios Koukoutsis, Abhay K. Ram, and Kyriakos Hizanidis and was last updated on June 23, 2026.
A free preview subset of a larger proprietary dataset developed by Egomnia S.p.A. The data consists of a raw Italian text corpus derived from content sourced from the italia.progettotalia.it website. The full dataset is not included in this repository and can be purchased separately.
Harris Greenstone Domain GIS data delineates a late Archean-Proterozoic tectonostratigraphic terrane within South Australia's Gawler Craton. The dataset characterizes the Archean Harris Greenstone Belt, including komatiite, basalt, and banded iron formation, metamorphosed during the ~2440 Ma Sleafordian Orogeny. Its interpretation is based on aeromagnetic and gravity surveys, supplemented by diamond drillcore, to map structures beneath thin Quaternary and Eocene cover.
Geoscience Australia conducted a marine survey on the Lord Howe Island shelf in 2008 to map seabed bathymetry and characterize benthic environments. This dataset provides feeding guild counts per sample, aggregated from species-level data collected during that survey. The data and samples were acquired using the National Facility Research Vessel Southern Surveyor.
A Persian (Farsi) question–answer corpus for social engineering and cybersecurity, derived from curated knowledge articles extracted from authoritative reference books. The dataset was created by smd20 and last updated on June 13, 2026. It is designed for supervised fine-tuning, retrieval-augmented generation evaluation, and domain-specific language model benchmarking.
Results from geological mapping, aerial imagery collection, and field observations at Antarctic Specially Protected Area (ASPA) No. 143 Marine Plain are presented. The dataset likely contains polygons outlining recommended helicopter landing areas and surficial geology maps derived from aerial photos, satellite imagery, and a digital elevation model. The work was presented at the SCAR Open Science Conference 2024 and is hosted by the Australian Ocean Data Network.
A 2000/2001 regional seafloor mapping study by Geoscience Australia's South and Southwest Regional Project. The report delineates four major geomorphological features and defines five acoustic echo facies for the Great Australian Bight area, digitized into a GIS. It was produced to support future Regional Marine Planning.