Amazon Product Reviews with Sentiment Polarity, 35 Million Records
by Xiang Zhang, Junbo Zhao, Yann LeCun
arff
Description
Amazon review data from the Stanford Network Analysis Project (SNAP) comprises 34,686,770 reviews from 6,643,669 users on 2,441,053 products, spanning 18 years up to March 2013. The provided subset is labeled with polarity and contains 1,800,000 training and 200,000 testing samples for each of the two polarity classes. Xiang Zhang, Junbo Zhao, and Yann LeCun published the related research in 2015.
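As a quick sanity check on the scale figures quoted above, the following minimal sketch derives the average number of reviews per user and per product. It uses only the three counts stated in the description; the variable names are illustrative, and averages of this kind say nothing about how reviews are actually distributed across users or products.

```python
# Counts quoted in the description of the full SNAP corpus.
REVIEWS = 34_686_770
USERS = 6_643_669
PRODUCTS = 2_441_053

# Simple means; the real distribution is not described in the listing.
reviews_per_user = REVIEWS / USERS
reviews_per_product = REVIEWS / PRODUCTS

print(f"reviews per user:    {reviews_per_user:.2f}")
print(f"reviews per product: {reviews_per_product:.2f}")
```

This works out to roughly 5.2 reviews per user and 14.2 reviews per product.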
Use Cases
Train sentiment classification models on review text labeled with polarity (positive/negative).
Analyze language patterns and topics in customer feedback using the review title and body text.
Benchmark text-based recommendation algorithms that combine the user, product, and review data described in the source paper.
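The first use case can be sketched with a small bag-of-words naive Bayes classifier. This is an illustrative baseline only, not the method from the source paper. The toy reviews below are invented for the example, and the (label, title, body) tuple layout is an assumption about how the polarity, title, and body fields might be arranged after loading.

```python
import math
import re
from collections import Counter

# Invented toy reviews as (label, title, body); NOT taken from the dataset.
TOY_REVIEWS = [
    ("positive", "Great product", "Works great and arrived quickly"),
    ("positive", "Love it", "Excellent quality, I love this"),
    ("negative", "Terrible", "Broke after one day, terrible quality"),
    ("negative", "Waste of money", "Poor build and it stopped working"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """Count documents and words per label for multinomial naive Bayes."""
    doc_counts = Counter()
    word_counts = {}
    for label, title, body in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(
            tokenize(title + " " + body)
        )
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return doc_counts, word_counts, vocab


def predict(model, title, body):
    """Return the label with the highest log-probability score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for token in tokenize(title + " " + body):
            if token in vocab:  # skip words never seen in training
                score += math.log((counts[token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TOY_REVIEWS)
print(predict(model, "Great quality", "I love it"))
print(predict(model, "Terrible", "It broke and stopped working"))
```

On the real subset, the same train and predict functions would be fed the training and testing splits respectively, although a dataset of this size would likely call for a more efficient implementation.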
Strengths
Large scale with over 34 million reviews, 6.6 million users, and 2.4 million products.
Includes an 18-year time range, providing longitudinal perspective.
Contains explicit sentiment labels (polarity) for every sample in the provided subset.
Limitations
Column-level documentation is absent; the semantics of any field beyond the three mentioned here (polarity, review title, review body) must be inferred after download.
Last update date is unknown; data is current only up to March 2013.
Description metadata is limited; actual data quality and completeness require manual inspection.
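Since data quality and completeness have to be checked by hand, a first pass might look like the sketch below. It counts blank fields and tallies the label distribution over rows represented as dictionaries. The field names polarity, title, and text are hypothetical stand-ins, because the listing does not document the actual column names, and the sample rows are invented.

```python
from collections import Counter

# Hypothetical field names; the real column names are undocumented.
FIELDS = ("polarity", "title", "text")

# Invented rows for illustration only.
SAMPLE_ROWS = [
    {"polarity": "positive", "title": "Great", "text": "Works well"},
    {"polarity": "negative", "title": "", "text": "Stopped working"},
    {"polarity": "positive", "title": "Fine", "text": None},
    {"polarity": None, "title": "Unlabeled", "text": "No label here"},
]


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def summarize(rows, fields=FIELDS):
    """Count rows, blank values per field, and non-blank labels."""
    missing = Counter()
    labels = Counter()
    n_rows = 0
    for row in rows:
        n_rows += 1
        for field in fields:
            if is_blank(row.get(field)):
                missing[field] += 1
        label = row.get("polarity")
        if not is_blank(label):
            labels[label] += 1
    return {"rows": n_rows, "missing": dict(missing), "labels": dict(labels)}


report = summarize(SAMPLE_ROWS)
print(report)
```

A strongly unbalanced label count or a high share of blank fields in such a report would be a signal to inspect the download more closely before training on it.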
Provenance
Source
Stanford Network Analysis Project (SNAP); the listing also names Amazon and Kaggle as sources.
Collection Method
Collected from the Amazon platform; a subset was then created for sentiment analysis benchmarking.
Time Range
18 years up to March 2013.
Freshness
Data spans up to March 2013; last update date for the dataset listing is unknown.
Geography
Global (Amazon platform).
License
Attributed to Courant Institute of Mathematical Sciences and Kaggle; users should verify the terms before use.