Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,673 datasets
ToolMind is a large-scale, high-quality tool-agentic dataset comprising 160,000 synthetic data instances, generated using over 20,000 tools, plus 200,000 augmented open-source data instances. Its data synthesis pipeline constructs a function graph based on parameter correlations and uses a multi-agent framework to simulate realistic user–assistant–tool interactions. The dataset was created by mlx-community and was last updated on June 6, 2026.
Geospatial boundaries identify offshore U.S. areas where oil, gas, renewable energy, or mineral leasing is restricted by Presidential orders or by acts such as the National Marine Sanctuaries Act. The Department of the Interior's Bureau of Ocean Energy Management maintains this data, which was last updated in April 2026. It includes areas designated as Marine National Monuments and other withdrawals.
A 3D dataset of the Knightly Mill Road bridge in Augusta County, Virginia, originally constructed around 1915. Data was collected on 30 March 2026 using four FARO Focus 3D terrestrial LIDAR scanners and processed with FARO Scene and Autodesk ReCap software. The dataset was created by a team from the University of Virginia.
BELDA provides precomputed mention and entity descriptions for biomedical entity linking. The dataset includes descriptions for the XL-BEL corpus across 10 languages and the NCBI corpus in English. It was authored by wenliang liang and last updated on June 1, 2026.
Research data from figshare, authored by Haley N. Bridge, last updated on 2026-04-28. The dataset likely contains results from experiments on the enzymatic bromination of native peptides using flavin-dependent halogenases, followed by Suzuki–Miyaura cross-coupling for structural diversification. It includes analysis of diverse peptide sequences, including antimicrobial, cell-penetrating, and G protein-coupled receptor agonist scaffolds.
A methodology establishes metal zoning patterns using whole rock geochemistry from run-of-mine samples across 3 main deposits and 6 veins in the Keno Hill-Galena Hill mining camp. The model's essential character is defined by analyses of Ag, Pb, Zn, Ca, and the Zn/Ag ratio, with additional detail from Hg, Co, and Ni. A companion K-Ar age study indicates mineralization occurred approximately 87±2 million years ago.
Government of Yukon documentation details the geological prerequisites, concentration mechanisms, and evaluation techniques for placer gold deposits. The resource describes formation processes like erosion and accumulation in meander curves, alongside exploration methods using aerial photography and bulk sampling.
The British Geological Survey maintains one of the world's largest databases on mineral production and trade, covering more than 70 economically important mineral commodities. Annual production statistics by mass are recorded for individual countries, grouped by continent, with import and export data available up to 2002. The data is compiled from primary official sources and is used by government, industry, and researchers for policy, economic analysis, and commercial strategy.
A 1.1 GB dataset from figshare supports research into brain implants that measure neural magnetic fields. Computational modeling illustrates how dense neuron networks are easier to distinguish via magnetic spike templates. The dataset, authored by Ziad Ali and last updated in April 2026, is shared under a CC-BY-4.0 license.
Benthic chamber measurements of oxygen, ammonium, nitrate, nitrite, phosphate, silicate, TCO2, and alkalinity define solute exchange rates between sediment and water in Port Phillip Bay. Data from the summers of 1994 and 1995 across various sites show benthic recycling accounted for 63% and 72% of the annualized N and P input to the entire bay, respectively. The dataset, sourced from the Australian Ocean Data Network, also includes radon-222 and CsCl spike injection measurements to study bio-irrigation.
A bathymetry survey covering an area east of the Approaches to Newcastle, NSW, acquired from 4 December 2020 to 15 January 2021. The survey was conducted for the Australian Hydrographic Office by Guardian Geomatics using a Kongsberg EM 2040-07 multibeam echosounder. Data was processed with Caris Hips & Sips software and exported as a 30-meter resolution, 32-bit floating point GeoTIFF grid.
A cleaned mathematical supervised fine-tuning dataset designed for instruction tuning and mathematical capability adaptation. The dataset introduces a simplified instruction–response format and removes intermediate reasoning contamination. It was created by kaushik-harsh-99 and was last updated on 2026-06-07.
A 2021 bathymetry survey of Lacepede Channel, Western Australia, acquired between 19 May and 22 September. The data was collected for the Australian Hydrographic Office by Fugro using a Kongsberg EM2040 Mk II multibeam sonar and processed with Caris Hips & Sips software. The final product is a 30-meter resolution, 32-bit floating point GeoTIFF grid.
Australia's Identified Mineral Resources 2010 provides estimates of the country's mineral resources as of December 2009, based on data from Geoscience Australia. The report compares these long-term resource estimates with short- to medium-term industry ore reserves and includes an analysis of mineral exploration expenditure for 2008–09. Data on mine production is sourced from the Australian Bureau of Agricultural and Resource Economics and Sciences, with world rankings calculated from United States Geological Survey publications.
Supplementary materials and structured data from a systematic literature review on Renewable Energy Certificates (RECs), International RECs (I-RECs), and Guarantees of Origin (GOs). The review followed PRISMA 2020 guidelines and addressed five core research questions on investment, storage, regulation, demand, and business models. The dataset was created by Flavio Geraldo Nogueira and last updated in June 2026.
A metadata catalogue containing key information for datasets available on the Government of Canada's Open Data portal. It includes multiple flattened resources, such as dataset metadata, resource metadata, and resource view metadata. The data was last updated in March 2026 by the Treasury Board of Canada Secretariat.
A computational study investigating the structural impacts of two genetic variants (Trp240Arg and Arg226Cys) in the IL2RG protein. The dataset, authored by Aswini S and last updated in April 2026, contains results from homology modeling, molecular dynamics simulations, and protein-protein docking analyses. It includes binding free energy calculations comparing wild-type and variant complexes with IL-2 and IL-21 cytokines.
A bitext corpus assembled from upstream releases of the mtdata and OPUS projects. The dataset includes source and target sentences with ISO 639-3 language codes and origin sub-corpus identifiers. It was created by natgillin and last updated on June 2, 2026.
Yulong Su published a 2.3 GB dataset on figshare in 2026 containing seismic waveform and event data. The collection includes SAC waveform files and Excel tables detailing 94 events for PKPPcP analysis and 57 events for PKPPcP–PKKPab phase-pair analysis, along with the corresponding event–station pair information. The data is used to study the de-degeneracy effects of these seismic phases and their implications for 3-D heterogeneity in Earth's mantle.
Estimates, to the nearest thousand, of the number of employed people in London who have more than one job. The data is derived from the UK Office for National Statistics' Annual Population Survey, with records starting in 2004. It is published by the Greater London Authority.