News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
10,908 datasets
A subset of the pretraining data for the Being-H0.5 model, which focuses on scaling human-centric robot learning for cross-embodiment generalization. The dataset was created by the author group BeingBeyond and was last updated on the platform in April 2026. The full description and details are available on the original dataset page.
6,000 manually annotated social media comments collected from YouTube, Facebook, and Instagram. The dataset, Dz-Emotion, is the first large-scale resource for emotion detection in the Algerian Arabic dialect (Darija), labeled according to Ekman's six basic emotions. It was created by Houdna-khilouf and last updated on HuggingFace in April 2026.
Mars Express Radio Science data collected during the extended mission phase from 2015-01-01 to 2016-12-31. It is an occultation measurement covering a specific observation window on 2015-08-30. The dataset originates from the National Aeronautics and Space Administration (NASA).
Three vessels collected temperature-depth profiles in the Bering Sea and North Pacific Ocean from January 30 to March 3, 1982. The National Oceanographic Data Center processed the data into the standard C116/C118 format, which records temperature at non-uniform inflection points to define the profile curve. This dataset represents a specific historical snapshot of ocean conditions.
From February 2 to March 10, 1973, oceanographic data were collected on R/V J.M. Gilliss and R/V C. Iselin cruises I7304 and S7303 in the Gulf of Mexico. The dataset contains high-resolution CTD/STD profiles processed to the NODC F022 standard, likely reporting temperature, salinity, density, and possibly dissolved oxygen at depth intervals as fine as 1 meter. Cruise information, position, date, time, and environmental conditions are reported for each station.
NCEI Accession 0161868 contains surface underway measurements of carbon dioxide and related oceanographic variables. Data were collected aboard the SOOP M/V Equinox in the Caribbean Sea, Gulf of Mexico, and North Atlantic Ocean from January 2, 2017, to January 6, 2018. The dataset includes mole fractions of CO2 in air and seawater, sea surface temperature, salinity, and calculated fugacity values.
Geoscience Australia Data reports the bathymetric expression of the Fitzroy River palaeochannel on the continental shelf of the southern Great Barrier Reef. The dataset, last updated on 2026-03-25, provides data on a major sediment transport pathway that differs from the previously discovered Burdekin palaeochannel by not being buried. It offers insights into the response of major rivers to sea level change in a mixed siliciclastic-carbonate sedimentary province.
Differentially expressed genes in Plasmodium falciparum 3D7 parasites at the ring and schizont stages. The dataset is a 27.7 KB Excel file authored by Jing Wu and last updated in April 2026.
Differential gene expression in RALP1-knockdown parasites at the ring and schizont stages, from a study describing RALP1 as essential for schizont maturation and erythrocyte invasion. The dataset, authored by Jing Wu, is a 45.3 KB Excel file containing Table S3 from the related study. It was last updated on figshare in April 2026.
A 14.4 KB dataset quantifying c-Fos expression in the trigeminal nucleus of rat brainstems to validate nociceptive activation in migraine-like states. The data was authored by Pelin Kocdor and last updated on April 9, 2026. It is shared under a CC-BY-4.0 license on figshare.
49.8 KB of tabular data in XLSX format from a study on the cytochrome b5-like protein VdPBP1 in Verticillium dahliae. The dataset supports findings on how VdPBP1 mediates electron transfer in ergosterol biosynthesis to confer resistance to the fungicide terbinafine. It was authored by Huan Li and last updated in March 2026.
29 cruise data sets collected from the SOOP M/V Nuka Arctica lines in the North Atlantic Ocean from 2008-01-08 to 2009-01-07. The data include measurements of mole fraction of CO2 in the equilibrator headspace, barometric pressure, sea surface temperature, and fugacity of CO2 in sea water. These data were collected by researchers from the University of Bergen, Bjerknes Centre for Climate Research, and the University of Gothenburg.
A dataset of spatially resolved X-ray diffraction patterns from a high-pressure cerium hydride sample, produced using 4th generation synchrotron facilities. The data, uploaded by Lucas H. Francisco in March 2026, is 99.2 MB in size and is intended to challenge traditional analysis methods for identifying elusive crystal phases in colossal datasets.
9.4 MB of spatially resolved X-ray diffraction data from a high-pressure cerium hydride sample, produced using 4th generation synchrotron facilities. Lucas H. Francisco published this dataset on figshare in March 2026 to demonstrate a physics-informed clustering method for identifying elusive structural phases. The analysis framework is designed for colossal datasets where traditional methods and direct human inspection become unfeasible.
Bin Chen's experimental data supporting findings on photochemical nanomotors reversing anxiety- and depressive-related behaviors in rodents. The 12.6 MB dataset is available in XLSX format and was last updated on April 9, 2026. Source data are provided with the associated paper.
ReexpressAI created OpenVerification1, the first large-scale, open-source dataset for research on LLM output verification and uncertainty quantification. The dataset, last updated on 2026-04-25, is designed for binary classification of whether a model's response correctly addresses a given prompt or question.
AI4Privacy's PII-Masking-2M European release provides a preview of 50 sample entries from a dataset for masking Personal Health Information. The full dataset contains 200,000 entries, focusing on European coverage. This preview was last updated in April 2026.
Database from a manuscript on psychological uses of AI in adolescence. The dataset likely contains survey responses for scale development and cross-cultural invariance testing. It was authored by GALINDO DOMINGUEZ and hosted on Harvard Dataverse, with a last update recorded on 2026-05-21.
50 sample entries provide a preview of the PII-Masking-2M corpus, focusing on European work and HR information where personally identifiable information has been redacted. The full dataset contains 200,000 records, created by AI4Privacy and released in April 2026. This preview demonstrates the data structure and label distribution without exposing the original sensitive text.