Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
39,941 datasets
Argo is a global partnership deploying over 3,000 autonomous profiling floats across the world's ice-free oceans. These floats measure temperature and salinity in the upper 2,000 meters of the water column approximately every 10 days and transmit the data via satellite. Argo Australia contributes to this effort by maintaining an array of floats that provides real-time observations of the oceans surrounding Australia.
A dataset covering 3,591 subnational administrative divisions and 34 geospatial variables accompanies a paper on statistical guardrails for web-based cartogram generation. It includes 17 intensive and 17 extensive variables for testing extensivity violations in area cartograms. The data and R analysis pipeline were published by Adi Singhania on figshare under a CC-BY-4.0 license.
Forty-five radiocarbon results from 11 sites show that sea level fell smoothly from +1 meter 6,000 years ago to its present position. The dataset examines fringing reefs and storm ridges between 14° and 20° S to reconstruct Holocene environmental change. It was contributed by the Australian Ocean Data Network and last updated in 2026.
From March to July 2023, a cross-sectional survey of 4,303 participants in Ordos, Inner Mongolia, collected data on pollen-induced allergic rhinitis. The dataset, authored by Ting-ting Ma and published under CC-BY-4.0, includes prevalence rates across ecological zones and results from skin prick tests for 16 allergens.
Five distinct acoustic facies were defined for the Great Australian Bight seabed based on 3.5 kHz echo-sounding records and high-resolution seismic profiles. This regional seafloor mapping study was conducted by Geoscience Australia during 2000/2001 to support marine planning. The results, including delineated geomorphological features such as the continental shelf and slope, have been digitized into a GIS.
This dataset classifies the Great Artesian Basin and its offshore extents beneath the Gulf of Carpentaria by geomorphological feature. Features are grouped into categories based on depositional environment or process, including marine, fluvial, aeolian, playa-lacustrine, and erosional terrain. It was published by Geoscience Australia and is accessible via the Australian Ocean Data Network.
Vitrinite reflectance values ranging from 0.55% to 0.93% Rmax and Rock-Eval Tmax values ranging from 421 °C to 447 °C were collected from Permian sandstone reservoirs in the Northern Denison Trough, Bowen Basin, Australia. The data were generated by Geoscience Australia as part of a study evaluating CO2 storage potential and thermal maturity. The dataset was last updated on 2026-05-14.
Four large language models were evaluated on their ability to generate personalized exercise prescriptions using the FITT-VP framework. Claude 3.7 achieved the highest total score of 50.23 out of 60, while DeepSeek R1 scored the lowest at 40.30. The dataset, authored by Huan Feng and last updated in May 2026, contains the study results and analysis in a 374.6 KB document.
This index, calculated by the Unidad de Planificación Rural Agropecuaria (UPRA), estimates and identifies areas with potential informal land tenure at the property level across Colombia. It is based on properties meeting criteria such as lacking a property registration number or having a recorded false ownership entry. The index is not calculated on a fixed schedule; it serves as a preliminary analysis tool reflecting the country's situation at a specific point in time.
A dataset from figshare, authored by Qibang Sui and last updated on June 3, 2026, describes the discovery of Werner Syndrome RecQ Helicase (WRN) inhibitors. It details the rational design of two compound classes, spirocyclic compounds and benzo-fused heterocyclic analogs, and the in vitro and in vivo evaluation of a lead candidate named Q15. The dataset includes results on potency, selectivity, oral pharmacokinetics, and antitumor efficacy in models of microsatellite instability-high tumors.
Supplementary materials (72,155,736 bytes, about 72 MB) accompany a probabilistic framework for extracting sentiment from text. The method uses a joint semi-parametric model of text and ordinal responses, employing word screening and normalized ranks to achieve consistent sentiment ranking without estimating the full model. Its application to Dow Jones Newswires demonstrates its effectiveness in extracting return-predictive signals.
This dataset contains logs from the State Library of Queensland's Unstacked interface, a real-time visualization of user interactions with the online catalogue. It includes bibliographic information about viewed items and activity details for each view, such as date/time and access location. It is published by the State Library of Queensland under a CC-BY-4.0 license.
Namibia is the geographic focus of this dataset from an open-guise study on social meaning and linguistic choice. It contains evaluations by Namibian German speakers of a speaker using Standard German and Namibian German variants. The dataset was authored by Antje Sauermann and last updated on 2026-05-29.
A research paper presents findings from an open-guise study examining social meaning evoked by linguistic choices among the German-speaking minority in Namibia. The study evaluated Namibian German speakers' perceptions of a speaker using Standard German, lexical borrowings, and grammatical changes characteristic of Namibian German. The dataset likely contains the evaluation ratings and associated social meaning data from this study.
An on-demand service provides four elevation datasets for New South Wales, Australia. Datasets include a 2-metre contour model for the full state and 5-metre digital elevation, slope, and aspect models for Western NSW. The data is produced by DCS Spatial Services using LiDAR and photogrammetric sources.
The ATom cloud and coarse aerosol dataset contains particle size distributions and cloud-type classifications measured by the University of Vienna's CAPS instrument aboard the NASA DC-8 aircraft. It covers four campaigns conducted between 2016 and 2018 and provides 1 Hz classifications of cloud-free, aerosol-cloud transition, liquid, mixed-phase, and cirrus cloud regimes. It is produced by the National Aeronautics and Space Administration (NASA).
This dataset contains 1,210 records covering 22 distinct fault categories in a two-level voltage source inverter driving an induction motor load. The data were acquired on an experimental bench and include features derived from three-phase current signals via empirical mode decomposition. Raja Singh R published the dataset on figshare under a CC-BY-4.0 license.
Four intensive field-observation periods from 1986 to 1992 provide coordinated satellite, airborne, and surface data on cirrus and marine stratocumulus clouds. This product was designed to improve cloud and radiation parameterizations used in general circulation models (GCMs). The MAPS analysis system combines profiler, ACARS, surface, and radiosonde data with a 3-hour forecast model to generate parameters like Pressure, Montgomery Streamfunction, and Wind Speed.
Twenty-four distinct terrestrial biosphere models contributed estimates for six core ecosystem variables across 47 eddy covariance flux tower sites in North America. The data serve as a benchmark for model intercomparison within the North American Carbon Program, with the original submissions standardized into NetCDF format. A related processed dataset provides gap-filled observations and aggregated time steps for direct evaluation.
NASA's CDDIS provides daily files of satellite and receiver clock products at 10-second resolution, derived from the International GNSS Service (IGS) Real-Time Service. These products are combination solutions from multiple analysis centers worldwide, covering GPS, GLONASS, Galileo, BeiDou, and other systems since 2011. The dataset is generated from real-time data streams and formatted according to the RTCM SSR standard.