Name: Electricity: Subsampled Classification Dataset for Machine Learning Benchmarking
Creator: Eddie Bergman
License: us-pd
Keywords: Machine Learning Benchmark, Tabular Classification, Energy Consumption, Tabular, Classification, Electricity, Synthetic

Description

A 2000-row, 100-column, 10-class subsample of the original electricity dataset, generated with a fixed random seed of 4. Eddie Bergman created it with a Python script that uniformly samples rows, columns, and classes. The us-pd license places it in the public domain in the United States.
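
The sampling described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the author's actual script: the function name subsample, its parameters, and the toy data are all assumptions made for the example, and each cap is applied only when the input exceeds it.

```python
import random


def subsample(rows, labels, n_rows=2000, n_cols=100, n_classes=10, seed=4):
    """Uniformly subsample classes, then columns, then rows (hypothetical helper).

    `rows` is a list of equal-length feature lists and `labels` holds one
    class label per row. A cap is only applied when the data exceeds it.
    """
    rng = random.Random(seed)

    # 1. Classes: keep at most n_classes distinct labels.
    classes = sorted(set(labels))
    if len(classes) > n_classes:
        classes = sorted(rng.sample(classes, n_classes))
    kept = set(classes)
    idx = [i for i, y in enumerate(labels) if y in kept]

    # 2. Columns: keep at most n_cols feature positions, in original order.
    n_features = len(rows[0]) if rows else 0
    cols = list(range(n_features))
    if n_features > n_cols:
        cols = sorted(rng.sample(cols, n_cols))

    # 3. Rows: keep at most n_rows of the remaining rows.
    if len(idx) > n_rows:
        idx = sorted(rng.sample(idx, n_rows))

    x = [[rows[i][c] for c in cols] for i in idx]
    y = [labels[i] for i in idx]
    return x, y


# Toy stand-in data (not the real electricity dataset): 5000 rows, 8 features.
data_rng = random.Random(0)
rows = [[data_rng.random() for _ in range(8)] for _ in range(5000)]
labels = [data_rng.choice([0, 1]) for _ in range(5000)]

x, y = subsample(rows, labels)
print(len(x), len(x[0]), sorted(set(y)))
```

Because the toy input has fewer than 100 columns and fewer than 10 classes, only the row cap takes effect here; a wider or more varied input would also trigger the column and class steps.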

Use Cases

Benchmarking classification model performance on a controlled subset of electricity data.
Comparing feature selection methods on a dataset with 100 sampled columns.
Testing model scalability and training time on a dataset of 2000 rows.
Evaluating stratified sampling techniques for maintaining class distribution.
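
For the stratified-sampling use case, one simple check is to compare class shares before and after sampling. The sketch below is an illustration under assumptions: stratified_sample and class_shares are hypothetical helper names, the labels are toy values rather than the real target column, and per-class counts are rounded, so the total can differ from the requested size by a few rows on less tidy inputs.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n, seed=4):
    """Return sorted row indices, drawing from each class in proportion to its size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    total = len(labels)
    picked = []
    for y, members in sorted(by_class.items()):
        # Rounded proportional allocation; may over- or undershoot n slightly.
        k = round(n * len(members) / total)
        picked.extend(rng.sample(members, min(k, len(members))))
    return sorted(picked)


def class_shares(labels):
    """Map each class label to its fraction of the rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}


# Toy labels with a 60/40 split (assumed for illustration only).
full_labels = ["down"] * 6000 + ["up"] * 4000
sample_idx = stratified_sample(full_labels, 2000)
sample_labels = [full_labels[i] for i in sample_idx]

print(class_shares(full_labels))
print(class_shares(sample_labels))
```

If the two printed dictionaries agree closely, the sample has preserved the class distribution; a plain uniform row sample would only match it approximately.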

Strengths

Controlled subsample size of 2000 rows and 100 columns, enabling consistent benchmarking.
Stratified sampling was used to preserve the original dataset's class distribution.
The fixed random seed of 4 makes the generation process reproducible.
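
The role of the seed can be illustrated with a short sketch. It assumes a hypothetical helper sample_rows and an arbitrary pool of 10,000 row indices; neither comes from the original script. The point is only that the same seed yields the same selection, while a different seed does not.

```python
import random


def sample_rows(n_total, n_keep, seed):
    """Pick n_keep distinct row indices out of n_total using a seeded generator."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))


first = sample_rows(10_000, 2000, seed=4)
second = sample_rows(10_000, 2000, seed=4)
other = sample_rows(10_000, 2000, seed=5)

print(first == second)  # same seed, same rows
print(first == other)   # different seed, different rows
```

Reproducibility in practice also depends on using the same library versions and the same input data, which this sketch does not cover.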

Limitations

Column names and feature definitions from the original dataset are not documented here, which limits interpretability.
File size is not reported for this subsample, and the 2000-row figure comes from the description rather than a verified count.
The description lacks details on the temporal coverage and geographic scope of the underlying data.

Provenance

Source: Subsampled from the 'electricity' dataset (ID 44120) on OpenML.
Collection Method: Generated via a Python script performing random uniform sampling of rows, columns, and classes.
Time Range: Not specified.
Freshness: The last-updated date is unknown, so freshness is unverified.
Geography: Not specified.

