Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,454 datasets
Australian bathymetry data collected by Geoscience Australia and other agencies. The dataset combines measurements from satellite altimetry, singlebeam echosounders, multibeam echosounders, and airborne laser systems (LADS). It was last updated on 2026-05-05.
A report generated from a periodic evaluation of Global Affairs Canada's priorities, programs, and projects. The evaluation serves as a management tool for reviewing program performance, with gathered information intended to improve the design and implementation of upcoming initiatives. The report is published by Global Affairs Canada and was last updated on 2026-05-28.
An inventory of public information generated, obtained, acquired, or controlled by the Municipality of Fusagasugá, Colombia, that has been classified as confidential or reserved under Law 1712 of 2014. The dataset is structured using a template from MINTIC and was last updated on May 18, 2026. It is published by www.datos.gov.co.
A supplementary file from a study evaluating a multi-party conversational system for social robots. The system, implemented on a Furhat robot, combines multimodal perception with a large language model and was tested with 30 participants across two interaction scenarios. The PDF document reports results including addressee accuracy and face recognition reliability from experiments conducted by author Giulio Antonio Abbo.
OPUS Neapolitan Translations provides nearly 1 million parallel translation examples across Italian, English, and Neapolitan. The dataset was created by author Gdacciaro, starting from an OPUS English-Italian parallel corpus and generating Neapolitan translations using a translation model. It was last updated on June 14, 2026.
4.5 MB of data files, R scripts, and HTML files from a study on numeral acquisition in Dutch kindergartners with and without suspected Developmental Language Disorder (DLD). The collection includes CSV files for tasks like Rote Counting, Tell Me, and Give Me, with scores, accuracy, and response categorizations. The dataset was authored by H.M. de Vries and last updated on April 9, 2026.
VSTAT is a video-based benchmark for evaluating the visual state tracking capability of Multimodal Large Language Models (MLLMs). It contains 834 video clips paired with 1,500 questions whose answers cannot be inferred from any single keyframe or short segment. The dataset was created by nyu-visionx and was last updated in June 2026.
A geospatial dataset provides a simplified representation of the Braunschweig urban area and its surroundings. The data is provided by the City of Braunschweig under the Data License Germany - Attribution - Version 2.0. The dataset is aggregated by the Bundesamt für Kartographie und Geodäsie.
A dataset from figshare authored by Laura M. Vowels, last updated on 2026-04-27. It contains results from Study 2, which examined participants' perceptions of large language model (LLM)-generated responses for psychosocial risk assessment. The 9.5 KB Excel file likely contains ratings on accuracy, empathy, and clinical usefulness across risk domains like suicide, intimate partner violence, and substance misuse.
Global Affairs Canada periodically conducts evaluations of its priorities, programs, and projects. These evaluation reports serve as a practical management tool for reviewing program performance and improving future program design and implementation. The reports are published by Global Affairs Canada and were last updated in May 2026.
Global Affairs Canada periodically conducts evaluations of its priorities, programs, and projects. A report is generated for each evaluation, serving as a practical management tool for reviewing performance. The information gathered helps improve the design and implementation of upcoming programs and initiatives.
A hybrid econometric analysis combines event-driven time-series and comparative cross-sectional data covering 19 economies and a World aggregate. The study proposes and calibrates five novel econometric models to assess the war's impact. The dataset was authored by Gabriel Osei Forkuo and last updated on April 27, 2026.
The UNP Index of Classified and Reserved Information is an inventory of public information generated, obtained, acquired, or controlled by the entity that has been classified as confidential or reserved. The dataset is published on the Colombian open data portal datos.gov.co and was last updated on 2026-05-18. It includes columns detailing the legal basis for classification, responsible offices, classification dates, and document descriptions.
ESRI grids provide sea salinity values interpolated to a 0.1-degree spaced grid across the Australian region. The data originates from the CARS2000 climatology, which synthesizes mean and seasonal fields from oceanographic archives like the World Ocean Atlas 98 and CSIRO Marine. CARS2000 maps resolve a mean value and annual sinusoid at each point, covering depths from 0 to 2000 meters.
25,500 km² of seabed on the northern Lord Howe Rise plateau in the Tasman Sea is mapped using high resolution multibeam bathymetry. The Australian Ocean Data Network provides this dataset, which interprets geomorphic units like ridges, valleys, volcanic peaks, and polygonal furrows. The data was last updated on 2026-05-05.
Source data reported in the figures of a 2024 Circulation paper titled 'Sustained but Decoyed Activation of the JAK1-STAT Pathway by Aberrant Protein Aggregation Exacerbates Proteotoxicity.' The dataset was authored by Xuejun Wang and published on figshare in May 2026. It is a single XLSX file sized at 813.7 KB.
FCA: Financial promotions quarterly data 2024 Q2 provides a summary of the UK Financial Conduct Authority's enforcement actions from 1 April 2024 to 30 June 2024. The data includes key messages, actions against authorized and unauthorized firms, and examples of work to ensure financial promotions are clear, fair, and not misleading. It is published by the Government Digital Service under the OGL-UK-3.0 license.
548 detailed plans, including an index map, document all types of occupancy for buildings and urban spaces in the City of Montreal in 1949. The dataset is provided by the Government and Municipalities of Québec and is available in CSV and XLS formats. It was last updated on April 17, 2026.
70 plans, including an index map, document all types of occupancy of buildings and urban spaces in the City of Montreal in 1949. The dataset is provided by the Archives de la Ville de Montréal and requires a specific credit mention for use. It is part of a collection of historical urban data from Montreal.
The Coral Sea Marine Park offshore northeastern Australia features deep and mesophotic seabed environments. Bathymetry data and seafloor imagery were collected by the Schmidt Ocean Institute's RV Falkor during surveys FK200830 and FK200902 in August and October 2020. The research was led by Geoscience Australia and James Cook University with multiple collaborative partners.