Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,903 datasets
1489 breath-specific measurements of esophageal catheter PmusPTP, following the removal of 97 outliers (6.1%) using Cook's method. The dataset compares measured values to those calculated with a modified Otis model, with data shown in cmH2O/sec. Author Guillermo Gutierrez published this 78.9 KB Excel file on figshare in April 2026.
Statistics Canada provides data on energy expenses for the mining sector at the national level. The data is categorized by energy source and North American Industry Classification System (NAICS) codes. The dataset was last updated on 2026-06-03.
Corpoboyacá, a Colombian environmental authority, provides records of administrative procedures for granting rights to use water from deep wells. The dataset includes applicant details, application status, geographic coordinates, and administrative act information. It was last updated on 2026-05-18 via the Socrata platform.
180 synthetic measurements of Wi-Fi signal strength (dBm) as a function of distance and obstacles. The dataset was generated by Chirag Rehalan and published on Harvard Dataverse in June 2026. It covers distances from 1 to 30 meters and obstacles from 0 to 5 walls, incorporating realistic noise variations.
The Harris Greenstone Domain is a late Archean-Proterozoic terrane in the Gawler Craton of Australia. This dataset is a GeoPDF map, produced by the Australian Ocean Data Network and last updated in 2026, that layers geological interpretations of aeromagnetic, gravity, and drillcore data. The map details lithological zones, magnetic features, and prospective mineral systems like Ni-Cu-PGE sulphide and lode-Au.
The MAPLand Act Easement Areas layer depicts federal interests in non-federal land, known as easements, that may provide public recreational access. It is managed by the U.S. Forest Service and includes easements granted to or reserved by agencies like BLM, NPS, FWS, USBR, USACE, and FS. The data represents some level of public access, but users should consult restriction information and agency details.
1.1 MB of data and code from experiments on an LLM-driven agent for the Storm Water Management Model (SWMM). The agent integrates geospatial awareness to automate parameter adjustment for urban flood simulation, tested in the Lili Town, Suzhou, China study area. The dataset, authored by Yani Zhong and last updated in April 2026, includes JSON and ZIP files.
Replication Data for "Citizen-Elite Toxicity and Political Equality Online" contains an analysis of Twitter conversations between citizens and candidates during the 2021 German national election. The dataset, authored by Jana Belschner for Perspectives on Politics, includes a full sample of 875,028 tweets to examine how candidate identity, role, and behavior correlate with the frequency, form, and consequences of toxic replies.
Trafford Council records of Public Health Funerals conducted under the Public Health (Control of Disease) Act 1984. The dataset covers the period from the financial year 2015/16 through the second quarter of 2023/24 and is published with a Freedom of Information disclaimer regarding information limitations. Data is provided by actual date as well as financial and calendar year.
A 1.7 GB dataset of images processed by the 3Snet-CLID computational super-resolution method. The method integrates a hybrid supervised/self-supervised deep learning network for denoising with Richardson–Lucy deconvolution. It was created by Fudong Xue and last updated on May 5, 2026.
Eight seismic profiles totalling about 2000 km, plus bathymetric data, were collected in February 1992 to assess seabed morphology and offshore mineral resources around Christmas Island. The Australian Ocean Data Network compiled a new 1:1,000,000 scale bathymetric map using this data and digital depths from the USA National Geophysical Data Bank. This map provides more detail on seamount distribution and the Java Trench structure than older 1970s-era maps.
April 2026 snapshot of Australian facilities that process mined materials into usable products. The dataset was compiled by Geoscience Australia and its predecessor organisations. It provides location, plant type, processing types, processed commodities, and processing output for each plant, where known.
A review document authored by Tan-Huy Chu, uploaded to figshare in May 2026, provides an overview of therapeutic cancer vaccines (TCVs). The text covers fundamental concepts, epitope spreading, antigen selection, delivery platforms, and the current clinical landscape. It specifically examines the transition from monotherapy to combination regimens and proposes a translational concept for targeting minimal residual disease.
Weekly, daily, and sub-daily products provide precise satellite orbits and clock data generated by International GNSS Service (IGS) Analysis Centers. These combined orbit and clock solutions are used to determine precise station coordinates, gravity field parameters, and Earth orientation parameters. The data is hosted by NASA's CDDIS and represents the official IGS final products.
NASA CDDIS hosts precise satellite orbit and clock products from the International GNSS Service (IGS). These products are generated by IGS Analysis Centers on sub-daily (ultra-rapid), daily (rapid), and weekly (final) schedules, providing satellite position, velocity, and clock data. The data supports applications requiring high-precision geodetic measurements, including station coordinate determination and Earth orientation parameter estimation.
A PDF document synthesizing evidence on sham transcranial direct current stimulation (tDCS) controls from January 1, 2010, to August 31, 2024, with a targeted update through September 2025. Authored by Milos Ljubisavljevic and shared on figshare under a CC-BY-4.0 license, it reviews blinding integrity and test–retest reliability. The document proposes five specification-grade recommendations to improve sham quality in mechanistic and clinical research.
A database compiling information on electricity generation and energy use for all remote communities in Canada. Natural Resources Canada aggregates this data from multiple sources including Statistics Canada, provincial governments, utilities, and public reports. The dataset was last updated on 2026-04-27.
A research dataset accompanies a paper proposing Act-Env Matching, a method for positioning life-sized avatars in augmented reality telepresence. The method uses a multimodal large language model to determine avatar placement based on a remote user's activity label and a video of the local environment. The dataset, authored by Hideki Deguchi and last updated in May 2026, likely contains experimental results and demonstration materials supporting the proposed system.
Simon De Jaegher's dataset contains gut microbiota sequencing results from 152 individuals, including 37 Alzheimer's disease patients, 65 Parkinson's disease patients, and 50 age-matched healthy controls. The data was generated using full-length 16S rRNA gene sequencing and analyzed with a unified bioinformatic and statistical framework. It was last updated on May 12, 2026.
38 raw thermogravimetric measurements of portland cement pastes hydrated with carbonic anhydrase enzyme at four CO₂ partial pressures and three ages. The data, produced by Umme Zakira and deposited in the Texas Data Repository, include derived summary tables of equivalent-CaCO₃ and Ca(OH)₂ contents computed by a documented tangent-line method.