Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,787 datasets
A study by Simon Langener, uploaded to figshare in 2026, evaluating the use of Embodied Conversational Agents (ECAs) in Immersive Virtual Reality to simulate peer pressure to drink alcohol. The dataset, 120.1 MB in size, contains results from a repeated measures experiment with twenty patients with Mild to Borderline Intellectual Disability and Alcohol Use Disorder. It assesses the ECA's persuasiveness and effects on perception, emotional state, and coping behavior using actor-recorded dialogues in dominant-friendly and dominant-hostile styles.
Geoscience Australia conducted a regional mapping program addressing stratigraphic and structural exploration risk in the Triassic succession of the Roebuck Basin. The data pack comprises seismic horizon grids and isochron grids generated from the TR10.0_SB, TR17.0_SB, and J10.0_SB horizons, alongside fault maps. Seismic horizons were mapped using 2D and 3D surveys, including AGSO s110, AGSO s120, and PGS New Dawn, as well as 3D surveys such as Admiral and Beagle.
20 generated text records related to entheogenic spirituality and psychedelic use. The dataset was created by the author 'chaosste' using Unsloth Recipe Studio and was last updated on the Hugging Face platform in May 2026.
NASA's AERDB_M3_ABI_G17 dataset provides monthly aggregated aerosol measurements from the GOES-17 satellite's Advanced Baseline Imager. Its 48 Science Data Set layers include statistical summaries like the mean and standard deviation of daily aerosol optical depth, calculated from at least three valid daily observations per grid cell. This Level-3 product offers a 1 x 1-degree resolution global grid, with an initial release covering May 2019 through April 2020.
Reporte Estadísticas Tramite Registro Nacional Automotor provides administrative data on vehicle registry transactions in Colombia. The dataset includes columns such as Número Placa (license plate) and DESCRIPCION TRAMITE (transaction description). It was last updated on the datos.gov.co platform in May 2026, with data starting from January 2025.
NASA's AERDB_D3_ABI_G17 dataset provides daily Level-3 aerosol optical depth measurements from the GOES-17 satellite's Advanced Baseline Imager. The data is aggregated to a 1x1 degree global grid, with each cell representing the arithmetic mean of at least three quality-filtered retrievals per day. This product contains 48 Science Data Set layers in netCDF4 format and is part of a 12-product suite from the ESROGSS-funded project.
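The per-cell rule described for AERDB_D3_ABI_G17 can be illustrated with a minimal sketch. It assumes the quality-filtered retrievals for one grid cell and one day are already collected in a plain list; the function name `cell_mean` and the use of `None` for an unfilled cell are hypothetical and not taken from the product documentation.

```python
from statistics import fmean
from typing import Optional, Sequence

# Minimum number of quality-filtered retrievals per cell, as stated in the description.
MIN_RETRIEVALS = 3


def cell_mean(retrievals: Sequence[float],
              min_count: int = MIN_RETRIEVALS) -> Optional[float]:
    """Return the arithmetic mean of one cell's retrievals.

    Returns None (a hypothetical stand-in for a fill value) when the cell
    has fewer than `min_count` retrievals.
    """
    if len(retrievals) < min_count:
        return None
    return fmean(retrievals)


# Illustrative values only, not real aerosol optical depth data.
print(cell_mean([0.1, 0.2]))        # too few retrievals, cell left empty
print(cell_mean([0.1, 0.2, 0.3]))   # mean of three retrievals
```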
Hanjiang Dong constructed a dataset of energy flows across air routes using 2,150,481 monthly flight records from 2014 to 2024, sourced from Cirium. The dataset covers scheduled and performed flights in China and includes similarity-based topological features for candidate edges. Multi-class labels for edge status changes (Addition, Removal, Retention) are assigned by comparing an edge's status at consecutive time points.
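One way to picture the labelling rule in the air-route dataset is the sketch below. It assumes each time point is represented as a set of undirected route edges; the function name `label_edge_changes` and the placeholder airport codes are hypothetical, and the dataset's actual procedure for candidate edges may differ.

```python
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]  # an undirected route between two airports


def label_edge_changes(edges_t: Set[Edge],
                       edges_next: Set[Edge]) -> Dict[Edge, str]:
    """Label every edge present at either of two consecutive time points."""
    labels: Dict[Edge, str] = {}
    for edge in edges_t | edges_next:
        if edge in edges_t and edge in edges_next:
            labels[edge] = "Retention"   # present at both time points
        elif edge in edges_next:
            labels[edge] = "Addition"    # absent at t, present at t+1
        else:
            labels[edge] = "Removal"     # present at t, absent at t+1
    return labels


# Placeholder airports "A", "B", "C"; not real records from the dataset.
month_1 = {frozenset({"A", "B"}), frozenset({"B", "C"})}
month_2 = {frozenset({"A", "B"}), frozenset({"A", "C"})}
for edge, label in label_edge_changes(month_1, month_2).items():
    print(sorted(edge), label)
```

An edge absent at both time points receives no label in this sketch, since the three classes named in the description do not cover that case.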
This bathymetric dataset covers the Ashmore Reef and Cartier Island Marine Parks in Western Australia. The data was derived from WorldView-3 multispectral satellite imagery using a physics-based inversion method by EOMAP Australia Pty Ltd and EOMAP GmbH & Co.KG. It was acquired in 2022-2023 as part of the Australian Government's Marine Parks Grant - Round 3.
A 2024 marine survey collected bathymetry, seabed feature, and shallow geology data along a proposed subsea cable route within Australia's South-west Corner Marine Park. Geoscience Australia Data published this dataset, which was acquired by EGS using the RV Bold Explorer vessel. The dataset is not to be used for navigational purposes.
Geoscience Australia Data provides a hydrogeological inventory for the Money Shoal Basin, a large passive margin basin in northern Australia. The dataset groups descriptive attributes into themes including location, geology, hydrogeology, and land use. The sedimentary succession spans the Mesozoic to the Cenozoic and reaches a maximum thickness of 4,500 meters.
Anthropomorphic AI is an open-source research toolkit for authoring and interacting with embodied Intelligent Virtual Agents (IVAs) in extended reality (XR). The toolkit, authored by Ke Li and last updated on 2026-04-20, provides rich multimodal capabilities including speech, gaze, gestures, facial expressions, and vision. It was evaluated through four use case demonstrations and two pilot evaluations in immersive VR.
This dataset contains descriptive hydrogeological and geological information for the Laura Basin in Australia, covering areas bounded by spatial groundwater features. The dataset groups attributes into themes including location, geology, hydrogeology, groundwater management, and land use. It was published by Geoscience Australia Data and was last updated in April 2026.
A sedimentological analysis of Keppel Bay, a macrotidal embayment linking the Fitzroy River to the Great Barrier Reef shelf. The dataset, from Geoscience Australia Data, includes seabed samples, shear-stress modelling, and three-dimensional acoustic imaging results. It characterizes sediment transport pathways, tidal sand ridges, and subaqueous dunes in this mixed wave- and tide-dominated system.
INFORME GENERAL CURSOS DEL PUNTO VIVE DIGITAL DEL MUNICIPIO DE CHIA contains registration data for digital skills courses offered by the Punto Vive Digital (PVD) center in Chía, Colombia, for the first semester of 2023. The dataset is hosted on the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026. It includes columns for participant demographics and course details.
Geoscience Australia Data published a study on April 30, 2026, confirming the presence of dolomite and magnesite in living crustose coralline algae for the first time. The research uses chemical micro-analysis to identify three mineral phases (magnesium calcite, dolomite, and magnesite) within the algal skeleton. A mass balance approach quantifies the potential for dolomitization and links it to dolomite found in a raised Pleistocene reef.
A 62.2 MB dataset from figshare, last updated 2026-05-18, contains numerical simulation results for a multi-buoy floating wave energy converter. The author, cong zhang, analyzed motion response, mooring tension, power output, and capture width ratio under varying ocean parameters. The dataset is shared under a CC-BY-4.0 license.
A database of 45 noble-gas-containing molecules provides structures and bond energies calculated using high-level quantum chemistry methods. The structures were calculated at the CCSD(T)/aug-cc-pVTZ level and the bond energies were obtained at the CCSD(T)/CBS level. Many wavefunction-based and density functional theory methods have been benchmarked against these 45 accurate bond energies.
British Geological Survey data from a 2.95-meter-long lake-sediment core (YC2) from Yaal Chac, Mexico. The dataset contains an interannual to sub-centennial resolution record of carbonate oxygen and carbon isotopes, bulk sediment geochemistry, and sedimentology. The data were published in Metcalfe et al. (2022), Quaternary Science Reviews, and the core was dated using radiocarbon and short-lived radioisotopes.
Mineralogical data for carbonate-bearing fluorapatite from the Bukusu, Catalao II, Sokli, Kovdor, and Glenover carbonatite complexes. The dataset is associated with a 2021 research report published in Mineralium Deposita. It originates from a NERC-funded project and is hosted by the British Geological Survey.
ChemCoTBench-V2 is a public, active benchmark of 5,620 samples for evaluating chemical reasoning in large language models. The dataset, created by fresnellll, evaluates both final-answer correctness and process-level reasoning, pairing model-facing inputs with verified formal reasoning traces. It was last updated on June 3, 2026.