Hayes-Roth: A Synthetic Classification Benchmark Dataset
by Barbara and Frederick Hayes-Roth
Format: ARFF
Description
A merged version of the separate training and test sets of the Hayes-Roth database, originally created by Barbara and Frederick Hayes-Roth in March 1989. The instances are synthetic, with five nominal attributes (among them hobby, age, educational level, and marital status) and a class label assigned by a fixed rule. The dataset is intended as a classification benchmark in which only three of the five attributes are diagnostic of the class.
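The listing tags this dataset as ARFF (Attribute-Relation File Format): a header of `@relation` and `@attribute` lines, then an `@data` section of comma-separated rows. The following is a minimal, standard-library sketch for reading a simple dense ARFF file with nominal attributes. It assumes one row per line, no quoted values containing commas, and no sparse rows; none of this has been checked against the actual file, and `parse_nominal_arff` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def parse_nominal_arff(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse a simple dense ARFF document into attribute names and row dicts.

    Assumptions (not verified against the real file): one data row per
    line, no quoted values containing commas, no sparse {index value} rows.
    All values are kept as strings, which suits nominal attributes.
    """
    attributes: List[str] = []
    rows: List[Dict[str, str]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and % comments.
        if not line or line.startswith("%"):
            continue
        lowered = line.lower()
        if not in_data:
            if lowered.startswith("@attribute"):
                # "@attribute 'hobby' {1,2,3}" -> second token, quotes removed.
                attributes.append(line.split(None, 2)[1].strip("'\""))
            elif lowered.startswith("@data"):
                in_data = True
            # @relation and any other header lines are ignored.
            continue
        values = [v.strip().strip("'\"") for v in line.split(",")]
        if len(values) != len(attributes):
            raise ValueError(
                f"expected {len(attributes)} values, got {len(values)}: {line!r}"
            )
        rows.append(dict(zip(attributes, values)))
    return attributes, rows
```

Reading a downloaded copy would look like `parse_nominal_arff(open(path).read())`, where `path` is hypothetical. For files that use quoting or other ARFF features, a dedicated reader such as `scipy.io.arff.loadarff` is the safer choice.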
Use Cases
Testing classification algorithm performance on a synthetic, rule-based dataset.
Demonstrating feature selection concepts, as only three of the five attributes are described as diagnostic.
Teaching classification principles using a dataset with a clearly defined decision rule.
Benchmarking model interpretability on data with known, simple class prototypes.
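To illustrate the feature-selection use case, here is a sketch that ranks attributes by information gain, i.e. the mutual information in bits between each attribute and the class. This is a generic filter method chosen for illustration, not a procedure taken from the dataset's documentation; the row-dict layout and the target key `class` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows: List[dict], attribute: str, target: str = "class") -> float:
    """H(class) - H(class | attribute) for nominal data."""
    n = len(rows)
    # Group the class labels by the attribute's value.
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - conditional


def rank_attributes(rows: List[dict], target: str = "class") -> List[Tuple[str, float]]:
    """Return (attribute, information gain) pairs, most informative first."""
    attrs = [a for a in rows[0] if a != target]
    scores = {a: information_gain(rows, a, target) for a in attrs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A univariate score like this can miss attributes that are informative only in combination, so on a rule-based dataset it may be worth also scoring attribute pairs.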
Strengths
Class membership is defined by an explicit rule, which aids interpretability.
Dataset origin and creators are explicitly documented.
The dataset is a merged version of standard train/test splits, providing a complete set for analysis.
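Because the training and test sets are merged, evaluation needs a fresh split if the file does not mark which rows came from the original test set. A minimal stratified holdout sketch follows, keeping each class's proportion roughly equal in both parts; the function name, the default 20% test fraction, and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    rows: List[dict], target: str = "class", test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Split rows into (train, test), sampling each class separately."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_class: Dict[str, List[dict]] = defaultdict(list)
    for r in rows:
        by_class[r[target]].append(r)
    train: List[dict] = []
    test: List[dict] = []
    for label in sorted(by_class):  # fixed order keeps results deterministic
        group = by_class[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

With very small classes, `round` can assign zero rows of a class to the test set, so repeated splits or cross-validation may give more stable estimates.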
Limitations
Row count and column types are not documented in this listing (the only format given is ARFF), which limits suitability assessment.
The data are synthetic, generated for a specific 1989 experiment, which may limit real-world relevance.
Column-level documentation beyond the listed attribute names is absent; field semantics must be inferred.
Provenance
Source
Barbara and Frederick Hayes-Roth, donated by David W. Aha.
Collection Method
Synthetically generated for a classification experiment.
Time Range
Data created in March 1989.
Freshness
Last update date is unknown; freshness unverified.