Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,354 datasets
A continuous vector polygon layer of building footprints in Quebec, drawn as seen from a bird's-eye view. The dataset integrates geometries from various partnerships, produced using artificial intelligence and automated extraction algorithms, and includes both raw (NC-0) and manually validated (NC-1) data.
A standardized grid identifies Quebec sectors with available flood zone information from old-generation maps and ratings. The dataset integrates three sources: territories flooded in 2017 and 2019, the 2017-2019 Special Intervention Zone, and flood zone mapping from MRCs. It is provided for informational purposes only and has no legal value.
Supra Titles 115K is a curated dataset of 115,000 chat titles designed for training models to generate concise, descriptive titles from a user's first message. It was created by SupraLabs and last updated on June 14, 2026. The dataset is derived from the training pipeline for the experimental Supra Title model family.
Greater London Authority conducted a survey of over 2,000 young Londoners in August 2020 regarding the planned suspension of the under-18 free travel card. The survey includes responses from nearly 400 care-experienced young people, highlighting impacts on education and access to services. The dataset was last updated on 2026-06-24.
Over 3,000 young Londoners aged 16-24 were surveyed in June 2020 in partnership with the Museum of London. The dataset focuses on key issues, concerns, and challenges affecting young people, and what is needed for them to thrive across the capital. The data is being reviewed and will be made available again soon.
Qualitative research and polling with Londoners were undertaken by the Greater London Authority's Opinion Research team in 2019. The research explored in depth Londoners' perspectives on the public realm to help inform the Mayor's Public London Charter. The report covers experiences and uses of public spaces, views on privately owned public spaces, and views on the future of the public realm.
London-based qualitative and quantitative research conducted by the Greater London Authority's Opinion Research team in February 2022. The research explores public awareness, willingness, and information needs regarding home retrofitting. Focus groups were segmented by income and housing tenure.
Survey data from 1,089 London residents aged 18+ collected by YouGov for the Greater London Authority between 5 and 11 October 2022. The figures are weighted to be representative of all London adults. The dataset captures public opinion on solar energy adoption.
34.5 KB of data from two sediment cores (3100-1 and DH5-0) in the East China Sea's hypoxic zones, used to reconstruct multi-decadal to centennial-scale oxygen variability. The dataset, authored by Jiawei Kan and shared on figshare under CC-BY-4.0, estimates anoxygenic photosynthetic carbon fixation (AnP_C-F) from bacteriopheophytin a profiles. AnP_C-F increased by 1.79 times in core DH5-0 and 1.59 times in core 3100-1 from the period before 2000 to the period after 2000.
OPDC's Grants Data is published by the Greater London Authority. The dataset relates to the work of the Old Oak and Park Royal Development Corporation (OPDC), a Mayoral Development Corporation focused on regeneration across the boroughs of Ealing, Brent, and Hammersmith & Fulham. It was last updated on 2026-06-24 21:03:22.246589.
Dense-Set is a curated benchmark for evaluating text-to-image retrieval systems on challenging, visually dense scenes. It was created by AbdulmalekDS and published in 2026 alongside research presented at the ICML 2026 Workshop on Efficient Multimodal Question Answering. The benchmark comprises subsets extracted from the COCO and Flickr30K datasets.
Geoscience Australia Data published a legacy dataset on the morphology of part of the central New South Wales continental shelf. The data relates to offshore heavy-mineral prospects. No abstract or detailed description is available for this legacy product.
Vero-1.6M is an expanded reinforcement learning dataset containing 1.6 million samples curated from 59 source datasets for training vision-language models. It was created by zlab-princeton and last updated on June 11, 2026. The dataset is described as a fully open recipe for multi-task visual reasoning.
Very Long Baseline Interferometry (VLBI) Intensive Earth Orientation Parameter (EOPI) solutions are derived from 1-hour experiments measuring radio signals from distant quasars. The data series provides precise measurements of Earth rotation, length of day, polar motion, and station velocities. Products are contributed by analysis centers of the International VLBI Service for Geodesy and Astrometry (IVS) and support research in solid Earth science, tides, and fundamental astronomy.
SynthLabs Chat Final Cleaned V2 is a cleaned instruction-following chat dataset designed for supervised fine-tuning of reasoning-capable language models. Each example is a two-turn conversation with explicit chain-of-thought reasoning separated from the final answer. The dataset was authored by mkurman and last updated on June 20, 2026.
RULER is a benchmark designed to evaluate effective context length and long-context behavior beyond simple retrieval. The dataset contains pre-generated JSONL files organized by target context lengths of 4096, 8192, 16384, 32768, and 49152 tokens. It was authored by sxiong and last updated on June 19, 2026.
Prydz Bay in Antarctica contains a trough mouth fan built by the Lambert Glacier-Amery Ice Shelf system. The stratigraphy, derived from ODP Site 1167, indicates that the bulk of the fan was deposited more than 780,000 years ago, with few major debris flow deposits since. This data, sourced from Geoscience Australia, suggests that extreme ice advances to the shelf edge ceased during the mid-Pleistocene.
A hydrogeological inventory for the Money Shoal Basin, a large passive margin basin in northern Australia. The dataset contains descriptive attribute information grouped into themes including location, geology, hydrogeology, and groundwater management. It was published by Geoscience Australia and last updated on 2026-05-14.
Geoscience Australia conducted a regional mapping program targeting stratigraphic and structural exploration risk in the Triassic succession of the Roebuck Basin. The data pack includes seismic horizon grids and isochron grids generated from three key seismic horizons: TR10.0_SB, TR17.0_SB, and J10.0_SB. Fault maps compiled at the TR10.0_SB and J10.0_SB horizons are also included.
CN-3 is a potent RET inhibitor with IC50 values below 5 nM against all tested clinically relevant RET mutants, including solvent-front, gatekeeper, hinge, and catalytic loop variants. The dataset, shared by Zi-Xuan Wang on figshare in May 2026, includes results from cellular proliferation assays and kinase profiling. It shows CN-3 selectively suppressed RET-driven cell lines like TT and LC-2/ad and demonstrated dose-dependent antitumor efficacy in xenograft models.