EMBED: Racially Diverse Mammography Dataset with 60,000 Annotated Lesions
Description
3.4 million screening and diagnostic mammography images from 110,000 patients, collected between 2013 and 2020, with equal representation of Black and White women. The full dataset includes 2D, synthetic 2D (C-view), and 3D digital breast tomosynthesis (DBT) images, with 60,000 lesions linked to structured descriptors and pathologic outcomes. This release represents 20% of the total 2D and C-view data; DBT, ultrasound (US), and MRI exams are to be added later.
Use Cases
Training lesion detection and classification models on the 60,000 annotated lesions.
Developing algorithms that predict pathologic outcome severity, using the six severity classes linked to lesions as targets.
Benchmarking model performance across racial groups, enabled by the equal representation of Black and White women.
Researching multi-modal imaging analysis that combines 2D, synthetic 2D (C-view), and 3D (DBT) images, once DBT exams are added to the release.
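The listing does not say how cross-group benchmarking should be carried out. As a minimal sketch, assuming you already have binary model predictions and a race label for each exam, per-group sensitivity and specificity plus the largest between-group gap can be computed with the standard library alone. The record layout `(group, y_true, y_pred)`, the group names, and the function names `per_group_metrics` and `max_gap` are hypothetical illustrations, not EMBED's documented schema or any official evaluation protocol.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Sensitivity and specificity per group from (group, y_true, y_pred) tuples.

    Labels are binary (1 = positive, 0 = negative). The record layout and
    group names are hypothetical; EMBED's actual fields may differ.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1
        else:
            c["tn" if y_pred == 0 else "fp"] += 1

    results = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        results[group] = {
            "n": pos + neg,
            # None when a group has no positives (or no negatives) to score.
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return results


def max_gap(results, metric):
    """Largest difference in `metric` between any two groups (None if unavailable)."""
    values = [r[metric] for r in results.values() if r[metric] is not None]
    return max(values) - min(values) if values else None


if __name__ == "__main__":
    # Toy, made-up predictions purely to show the call pattern.
    demo = [
        ("Black", 1, 1), ("Black", 1, 0), ("Black", 0, 0),
        ("White", 1, 1), ("White", 0, 1), ("White", 0, 0),
    ]
    res = per_group_metrics(demo)
    print(res)
    print(max_gap(res, "sensitivity"))  # 0.5
```

Reporting a gap alongside per-group values keeps a single pooled metric from hiding a disparity; with real data you would also want confidence intervals, since small groups give noisy estimates.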
Strengths
Large scale with 3.4 million images from 110,000 patients.
Includes 60,000 annotated lesions with linked pathologic outcomes.
Designed for racial diversity with equal representation of Black and White women.
Covers multiple imaging types across the full dataset: 2D, synthetic 2D (C-view), and 3D (DBT).
Limitations
This release is only 20% of the total 2D and C-view dataset; DBT, US, and MRI exams are not yet included.
Column-level documentation is absent; field semantics must be inferred after download.
The last update date is unknown, so freshness is unverified.
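Because the listing provides no column-level documentation, a quick profile of each field after download can help infer what it means. A minimal sketch, assuming the release ships tabular metadata as CSV: for every column it counts non-empty values and distinct values, checks whether all values parse as numbers, and keeps a few examples. The function name `profile_columns` and the demo columns (`patient_id`, `view`, `severity`) are hypothetical and are not EMBED's actual schema.

```python
import csv
import io


def profile_columns(text_stream, max_examples=3):
    """Summarize each CSV column: non-empty count, distinct count,
    whether every non-empty value parses as a number, and sample values."""
    reader = csv.DictReader(text_stream)
    stats = {name: {"non_empty": 0, "distinct": set(), "numeric": True}
             for name in reader.fieldnames or []}
    for row in reader:
        for name, s in stats.items():
            # Short rows yield None for missing fields; treat as empty.
            value = (row.get(name) or "").strip()
            if not value:
                continue
            s["non_empty"] += 1
            s["distinct"].add(value)
            if s["numeric"]:
                try:
                    float(value)
                except ValueError:
                    s["numeric"] = False

    return {
        name: {
            "non_empty": s["non_empty"],
            "n_distinct": len(s["distinct"]),
            # An all-empty column is not reported as numeric.
            "numeric": s["numeric"] and s["non_empty"] > 0,
            "examples": sorted(s["distinct"])[:max_examples],
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    # Hypothetical sample; a real file could be opened with open(path, newline="").
    sample = "patient_id,view,severity\n1,CC,0\n2,MLO,3\n3,CC,\n"
    for column, summary in profile_columns(io.StringIO(sample)).items():
        print(column, summary)
```

A numeric column with only a handful of distinct values may be a coded category rather than a measurement, so such fields are worth confirming against the provider's documentation before use.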
Provenance
Source
AWS Open Data
Collection Method
Clinical mammography images from screening and diagnostic exams, de-identified with assistance from Glendor, Inc. and MD.ai.
Time Range
2013-2020
Freshness
Data were collected from 2013 to 2020; the last update date of the dataset release is unknown.
Geography
Not specified
License
Available under a custom license for research use only.