This dataset contains 10,000 rows of synthetic cafe sales data intended for data cleaning practice. Because it was built for that purpose, it likely contains intentionally introduced errors and inconsistencies that simulate messy real-world data. Its author, organization, and license are unknown.
Use Cases
- Practice identifying and correcting inconsistencies in sales records.
- Develop data cleaning pipelines for cafe sales data.
- Benchmark data cleaning algorithms on a controlled, synthetic dataset.
- Build data wrangling skills on a dataset whose errors are likely intentional.
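As a starting point for a cleaning pipeline, the sketch below normalizes each cell and coerces chosen columns to numbers. It is a minimal sketch under stated assumptions: the file is assumed to be CSV, and the placeholder tokens in `PLACEHOLDERS` as well as the helper names (`normalize_cell`, `to_float`, `clean_rows`) are hypothetical. The dataset's real column names and error markers are not documented and must be confirmed by inspecting the downloaded file.

```python
import csv
import io
from typing import Optional

# Hypothetical placeholder tokens; the dataset's real error markers are
# undocumented and should be confirmed by inspecting the file.
PLACEHOLDERS = {"", "ERROR", "UNKNOWN", "NAN", "NULL", "N/A"}


def normalize_cell(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and map placeholder tokens to None (missing)."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.upper() in PLACEHOLDERS:
        return None
    return cleaned


def to_float(value: Optional[str]) -> Optional[float]:
    """Parse a numeric field, returning None when it cannot be parsed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def clean_rows(csv_text: str, numeric_columns: set[str]) -> list[dict]:
    """Normalize every cell and coerce the named columns to floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for column, value in raw.items():
            if column is None:
                # Extra unnamed fields on a malformed line; skip them.
                continue
            cell = normalize_cell(value)
            row[column] = to_float(cell) if column in numeric_columns else cell
        rows.append(row)
    return rows
```

For example, with hypothetical columns `Item`, `Quantity`, and `Price`, calling `clean_rows(text, {"Quantity", "Price"})` turns `" 2 "` into `2.0` and turns `"ERROR"` or an unparseable value such as `"abc"` into `None`. Mapping bad values to `None` rather than dropping rows keeps the decision about imputation or removal as a separate, explicit step.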
Strengths
- With 10,000 rows, the dataset offers a substantial volume of records for training exercises.
- Synthetic generation allows specific data quality issues to be introduced in a controlled way for targeted practice.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
- The dataset is synthetic and may not reflect the complexity of real-world cafe sales data.
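Because field semantics and data quality must be inferred after download, a quick per-column profile can help. The sketch below is one possible approach, not a documented tool for this dataset: it assumes a CSV file, reuses a hypothetical set of placeholder tokens, and reports, for each column, the row count, missing values, numeric-parseable values, distinct values, and the most common values. The function names are illustrative.

```python
import csv
import io
from collections import Counter

# Hypothetical tokens treated as missing; confirm against the actual file.
PLACEHOLDERS = {"", "ERROR", "UNKNOWN", "NAN", "NULL", "N/A"}


def is_number(text: str) -> bool:
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(csv_text: str) -> dict[str, dict]:
    """Summarize each column so its meaning and quality can be judged."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    stats = {
        c: {"rows": 0, "missing": 0, "numeric": 0, "values": Counter()}
        for c in columns
    }
    for raw in reader:
        for column in columns:
            # Short rows yield None for absent fields; treat as empty.
            value = (raw.get(column) or "").strip()
            entry = stats[column]
            entry["rows"] += 1
            if value.upper() in PLACEHOLDERS:
                entry["missing"] += 1
                continue
            if is_number(value):
                entry["numeric"] += 1
            entry["values"][value] += 1
    return {
        column: {
            "rows": s["rows"],
            "missing": s["missing"],
            "numeric": s["numeric"],
            "distinct": len(s["values"]),
            "top": s["values"].most_common(3),
        }
        for column, s in stats.items()
    }
```

A column whose non-missing values are almost all numeric is probably a quantity, price, or total, while a column with few distinct values is likely categorical. Columns that mix numeric and non-numeric entries are good candidates for the intentional errors described above.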
Provenance
- Collection method: synthetic generation.