Brain imaging (fMRI, EEG), neural recordings, connectome, cognitive experiments, psychology
1,787 datasets
12,336 participants from the Atherosclerosis Risk in Communities (ARIC) study were assessed for systemic inflammation using biomarkers like fibrinogen and C-reactive protein (CRP) during midlife. Their cognitive function in memory, executive function, and language was tracked over three visits spanning 20 years. The data supports analyses of associations between specific inflammatory markers and long-term cognitive trajectories.
Electrophysiological recordings capture the spontaneous and induced activity of dorsal horn neurons and primary afferents in mouse spinal cord preparations. Data were collected using multi-electrode arrays and suction electrodes, with signals stored via Spike2 software and spikes isolated with Kilosort2. The project, led by Ivan Rivera-Arconada, includes complementary histological data to study spinal cord circuit functionality.
Magnetoencephalographic (MEG) recordings from subjects performing a visual discrimination task that dissociates attention direction from saccade direction. The data was collected to test how posterior alpha power tracks the locus and distribution of covert attention. The author is Akiko Ikkai, and the data was last updated in June 2020.
Full transcripts of user research interviews where the Claude AI assistant interviews people about their use of AI in work. The dataset contains multi-turn chat logs with explicit roles, focusing on human-AI interaction in professional contexts. It was created by Guilherme34 and last updated in December 2025.
A 3D electron microscopy dataset acquired from adult rat hippocampal area CA1, stratum radiatum. The data supports the paper 'Presynaptic vesicles supply membrane for axonal bouton enlargement during LTP' and includes serial section EM images, 3D Blender objects, and quantitative spreadsheets. The dataset was authored by Lyndsey Kirk and last updated on October 15, 2025.
A dataset of 197,180 question and answer pairs covering topics from a Bachelor level psychology curriculum. BoltMonkey created it using personal notes and several LLMs, with manual assessment for veracity and completeness. The dataset was last updated on June 27, 2024.
MegaMath is an open math pretraining dataset containing over 300 billion tokens, curated from diverse, math-focused sources. It was created by the LLM360 Team as part of TxT360, with data re-extracted from Common Crawl using math-oriented optimizations and filtering.
MegaScience contains 1.25 million instances for scientific reasoning post-training, released by GAIR-NLP in July 2025. The collection aggregates multiple public datasets using data selection methods optimized through ablation studies to improve model performance in STEM domains.
Depth contours for the South Florida, Cuba, and Bahamas region, digitized from a 1992 nautical chart. The dataset includes contour lines at 3, 10, 20, 50, 100, and 1,000 fathoms, plus the Bahamas shoreline. Data was created by the Florida Marine Research Institute and provided by NOAA's National Centers for Environmental Information.
CTD cast data from the R/V Meg L. Skansi captures oceanographic conditions in the Gulf of Mexico during September 2010. The Subsurface Monitoring Unit collected profiles of water density, salinity, temperature, and dissolved oxygen following the Deepwater Horizon oil spill. These quality-assured measurements document the marine environment's physical and chemical state in the immediate aftermath of a major environmental disaster.
MIKASA-Robo is a benchmark suite containing ready-made datasets and pre-trained oracle agent checkpoints for memory-intensive robotic manipulation tasks. The datasets and checkpoints were created by the author 'avanturist' and were last updated on May 11, 2025. Detailed instructions for dataset collection and descriptions are available on the project's GitHub repository.
Extracellular neuronal response activity recorded from the right hemisphere of eight mature cats (Felis catus). Data was collected using multichannel electrodes to measure responses to pure tones, noise bursts, frequency-modulated sweeps, and conspecific vocalizations within primary auditory cortex (A1) columns.
Serial multiplex immunogold labeling (siGOLD) data used to identify peptidergic neurons within a full-body serial-section transmission electron microscopy (ssTEM) connectome of a larval Platynereis annelid. The method employed 11 neuropeptide antibodies to overlay chemical neuromodulatory maps onto synaptic connectivity data. The dataset was created by Réza Shahidi and published in 2020.
Simultaneous recordings from populations of retinal ganglion cells responding to stimuli with varying correlation structures, including natural movies and white noise checkerboards. The dataset was created by Kristina D. Simmons and published in 2020 to investigate how stimulus structure affects neural redundancy and pairwise correlations.
Electrophysiological recordings of hippocampal CA1 single-neuron firing and theta activity from rat pups across three developmental stages (P17-19, P21-23, and P24-26). Collected by Jangjin Kim and published in 2020, the data tracks neural responses during six sessions of associative learning using tone and periorbital stimulation.
A dataset of 220,000 text entries for training models to mask personally identifiable information (PII). It includes 27 distinct PII classes and targets 749 discussion subjects across education, health, and psychology domains. The dataset is authored by ai4privacy and was last updated in April 2024.
EEG data from 6 subjects recorded while viewing 2,000 images across 40 object classes sourced from ImageNet. The dataset was created by author luigi-s for the paper 'Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models' and was last updated on Hugging Face in October 2024. It is designed for use in ControlNet scenarios for linking brain activity to visual stimuli.
An Arabic language dataset designed to improve reasoning capabilities in AI models. It is derived from the 'cognitivecomputations/dolphin-r1' base dataset and was created by author Jr23xd23. The dataset was last updated on February 25, 2025.
A training dataset used in the paper 'VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models'. The dataset was created by Linslab and the associated paper was published on arXiv in June 2025. It is hosted on Hugging Face and was last updated on 2025-06-24.
The dataset is a curated collection of English mathematics text documents, created by OctoThinker and last updated in July 2025. It is derived from the MegaMath-Web corpus and annotated using the Llama-3.1-70B-instruct model. The dataset is intended for natural language processing and large language model training, with content filtered based on a quality scoring threshold.