Name: Hayes-Roth: A Synthetic Classification Benchmark Dataset
Creator: Barbara and Frederick Hayes-Roth
License: us-pd
Keywords: Machine Learning, Benchmark, Tabular Data, Tabular, Educational Demo, Classification, Synthetic Data, Classification Benchmark, Synthetic

Description

A merged version of the separate train and test sets for the Hayes-Roth database, originally created by Barbara and Frederick Hayes-Roth in March 1989. The dataset contains synthetic instances with five nominal attributes, four of which are named in this description (hobby, age, education level, and marital status), plus a class label determined by a specific rule. It is designed as a classification benchmark in which only three of the five attributes are diagnostic.

Use Cases

Testing classification algorithm performance on a synthetic, rule-based dataset.
Demonstrating feature selection concepts, as only three of the five attributes are described as diagnostic.
Teaching classification principles using a dataset with a clearly defined decision rule.
Benchmarking model interpretability on data with known, simple class prototypes.
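
The feature selection use case above can be sketched with a per-attribute information gain score, which estimates how much knowing one nominal attribute reduces uncertainty about the class. This is a minimal illustration, not a method prescribed by the dataset's authors. The helper names (entropy, information_gain), the column names (signal, noise, class), and the toy rows are all assumptions made up for the example; they are not taken from the Hayes-Roth data, whose file format and column names are not documented here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of nominal labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target="class"):
    """Reduction in class entropy after splitting rows on one nominal attribute."""
    base = entropy([row[target] for row in rows])
    # Group the class labels by the value of the chosen attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the groups.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy rows invented for illustration only; NOT taken from the Hayes-Roth data.
# "signal" fully determines the class, "noise" is unrelated to it.
toy_rows = [
    {"signal": 1, "noise": 1, "class": 1},
    {"signal": 1, "noise": 2, "class": 1},
    {"signal": 2, "noise": 1, "class": 2},
    {"signal": 2, "noise": 2, "class": 2},
]

for name in ("signal", "noise"):
    print(name, information_gain(toy_rows, name))
```

On real data, the rows would come from whatever file the dataset ships as, and a near-zero score would flag a candidate non-diagnostic attribute. A per-attribute score can understate attributes that matter only in combination with others, so it is a first check and not a substitute for evaluating a classifier with and without each attribute.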

Strengths

Clear, rule-based class definitions are provided in the description, aiding interpretability.
Dataset origin and creators are explicitly documented.
The dataset is a merged version of standard train/test splits, providing a complete set for analysis.

Limitations

Row count, column details, and file formats are unknown, limiting suitability assessment.
Data is synthetic and generated according to a specific 1989 experiment, which may limit real-world relevance.
Column-level documentation beyond the listed attribute names is absent; field semantics must be inferred.

Provenance

Source: Barbara and Frederick Hayes-Roth, donated by David W. Aha.
Collection Method: Synthetically generated for a classification experiment.
Time Range: Data created in March 1989.
Freshness: Last update date is unknown; freshness unverified.
Geography: Synthetic data; no geographic coverage.

License is listed as us-pd (U.S. Public Domain).
