A synthetic dataset of 20,000 rows of car data generated for price prediction tasks. It was created with the Faker library, is intended for exploratory data analysis, data cleaning, and regression modeling, and is hosted on Kaggle.
Use Cases
- Train regression models for car price prediction based on synthetic car features.
- Practice exploratory data analysis (EDA) on a structured, tabular dataset.
- Demonstrate data cleaning and preprocessing workflows on synthetic data.
- Benchmark machine learning algorithms for regression tasks.
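As a rough illustration of the regression use case, the sketch below fits a one-feature least-squares line and compares its held-out error against a predict-the-mean baseline. The dataset's columns are not documented, so the example runs on stand-in values generated in place; the names `ages` and `prices` and the linear relationship between them are assumptions for illustration, not properties of the dataset.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between the line's predictions and the targets."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Stand-in data (hypothetical): price falls linearly with age, plus noise.
# Replace these lists with numeric columns read from the downloaded file.
rng = random.Random(0)
ages = [rng.uniform(0, 15) for _ in range(200)]
prices = [30000 - 1500 * age + rng.gauss(0, 500) for age in ages]

# Hold out the last 20% of rows for evaluation.
split = int(0.8 * len(ages))
slope, intercept = fit_line(ages[:split], prices[:split])
mae = mean_absolute_error(ages[split:], prices[split:], slope, intercept)

# Baseline: always predict the mean training price (a line with slope 0).
mean_price = sum(prices[:split]) / split
baseline_mae = mean_absolute_error(ages[split:], prices[split:], 0.0, mean_price)

print(f"slope={slope:.1f} intercept={intercept:.1f}")
print(f"model MAE={mae:.1f} baseline MAE={baseline_mae:.1f}")
```

Comparing against the mean baseline is a quick check on whether a feature carries any signal at all, which is worth confirming for Faker-generated data before benchmarking heavier models.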
Strengths
- 20,000 rows provide a substantial volume for model training and validation.
- Explicitly designed for regression modeling, EDA, and data cleaning tasks.
Limitations
- Data is synthetic, generated by Faker, and may not reflect real-world patterns or distributions.
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; data quality must be checked by manual inspection after download.
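Because the columns are undocumented, a first pass after download might profile each field before any modeling. The sketch below is one minimal way to do that with the standard library: it reports an inferred type, a missing-value count, and a distinct-value count per column. The sample CSV and its column names (`brand`, `year`, `mileage`, `price`) are hypothetical stand-ins, not the dataset's actual schema.

```python
import csv
import io


def profile_columns(rows):
    """Summarize each column: inferred type, missing count, distinct count."""
    profile = {}
    for row in rows:
        for name, raw in row.items():
            if name is None:
                # DictReader stores surplus fields under the key None; skip them.
                continue
            stats = profile.setdefault(
                name, {"missing": 0, "numeric": 0, "text": 0, "distinct": set()}
            )
            value = (raw or "").strip()
            if not value:
                stats["missing"] += 1
                continue
            stats["distinct"].add(value)
            try:
                float(value)
                stats["numeric"] += 1
            except ValueError:
                stats["text"] += 1

    summary = {}
    for name, stats in profile.items():
        # A column counts as numeric only if every non-empty value parses as a number.
        if stats["text"]:
            kind = "text"
        elif stats["numeric"]:
            kind = "numeric"
        else:
            kind = "empty"
        summary[name] = {
            "type": kind,
            "missing": stats["missing"],
            "distinct": len(stats["distinct"]),
        }
    return summary


# Hypothetical sample standing in for the downloaded file.
SAMPLE_CSV = """brand,year,mileage,price
Toyota,2015,60000,12000
Ford,2018,,15500
Toyota,2012,110000,
"""

summary = profile_columns(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, info in summary.items():
    print(name, info)
```

To run it on the real data, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample, where `path` is wherever the downloaded CSV was saved.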
Provenance
- Source: Kaggle
- Collection Method: Synthetically generated using the Faker library.
- Time Range: Not specified
- Freshness: Last update date is unknown; freshness unverified.
- Geography: Not specified