Electricity: Subsampled Classification Dataset for Machine Learning Benchmarking
by Eddie Bergman
arff
Description
A subsample of the original electricity dataset with 2000 rows, 100 columns, and 10 classes, generated with random seed 4. Eddie Bergman created it with a Python script that uniformly samples rows, columns, and classes. Its license is us-pd, meaning it is in the public domain in the United States.
Use Cases
Benchmarking classification model performance on a controlled subset of electricity data.
Comparing feature selection methods on a dataset with 100 sampled columns.
Testing model scalability and training time on a dataset of 2000 rows.
Evaluating stratified sampling techniques for maintaining class distribution.
Strengths
Controlled subsample size of 2000 rows and 100 columns, enabling consistent benchmarking.
Stratified sampling was used to preserve the original dataset's class distribution.
The generation process is reproducible because the script uses a specified random seed of 4.
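To illustrate how a seeded, stratified draw can preserve class proportions, here is a minimal sketch using only the Python standard library. It is not the script used to build this dataset. The function names `stratified_indices` and `class_shares`, the synthetic labels, and the 58/42 class split are all assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_indices(labels, n_samples, seed=4):
    """Pick about n_samples row indices, keeping each class's share."""
    rng = random.Random(seed)  # local generator, so the draw is repeatable
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    total = len(labels)
    chosen = []
    for y in sorted(by_class):
        members = by_class[y]
        # Proportional allocation; rounding can shift the total by a row or two.
        k = min(len(members), round(n_samples * len(members) / total))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def class_shares(labels):
    """Return the fraction of rows belonging to each class."""
    counts = Counter(labels)
    return {y: c / len(labels) for y, c in counts.items()}


# Hypothetical labels: 10,000 rows split 58% / 42% between two classes.
labels = ["DOWN"] * 5800 + ["UP"] * 4200
idx = stratified_indices(labels, 2000)
sample = [labels[i] for i in idx]
print(class_shares(labels))
print(class_shares(sample))
```

Calling `stratified_indices` again with the same seed returns the same indices, which is what makes a seeded subsample repeatable.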
Limitations
The original dataset's column names and features are not documented here, which limits interpretability.
The file size of this specific subsample is not reported.
The description lacks details on the temporal coverage and geographic scope of the underlying data.
Provenance
Source
Subsampled from the 'electricity' dataset (ID 44120) on OpenML.
Collection Method
Generated via a Python script performing random uniform sampling of rows, columns, and classes.
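The sampling described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the original script. The function name `subsample`, its parameters, and the synthetic input data are hypothetical. The values 2000, 100, and 10 are treated here as upper limits, and rows are drawn uniformly without stratification.

```python
import random


def subsample(rows, labels, max_rows=2000, max_cols=100, max_classes=10, seed=4):
    """Uniformly subsample classes, then columns, then rows.

    rows is a non-empty list of equal-length feature lists;
    labels holds one class label per row.
    """
    rng = random.Random(seed)  # fixed seed makes the result repeatable

    # 1. Classes: keep at most max_classes, chosen uniformly at random.
    classes = sorted(set(labels))
    if len(classes) > max_classes:
        classes = rng.sample(classes, max_classes)
    kept = set(classes)
    idx = [i for i, y in enumerate(labels) if y in kept]

    # 2. Columns: keep at most max_cols, chosen uniformly at random.
    cols = list(range(len(rows[0])))
    if len(cols) > max_cols:
        cols = sorted(rng.sample(cols, max_cols))

    # 3. Rows: keep at most max_rows, chosen uniformly at random.
    if len(idx) > max_rows:
        idx = sorted(rng.sample(idx, max_rows))

    X = [[rows[i][j] for j in cols] for i in idx]
    y = [labels[i] for i in idx]
    return X, y


# Synthetic stand-in data (not the electricity dataset): 5000 rows, 8 columns.
data_rng = random.Random(0)
rows = [[data_rng.random() for _ in range(8)] for _ in range(5000)]
labels = [data_rng.choice(["UP", "DOWN"]) for _ in range(5000)]

X, y = subsample(rows, labels)
print(len(X), len(X[0]), sorted(set(y)))
```

Because each limit is applied only when the input exceeds it, an input with fewer columns or classes than the limits passes through with those dimensions unchanged.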
Time Range
Not specified.
Freshness
The last-updated date is unknown, so freshness is unverified.
Geography
Not specified.
License
us-pd (public domain in the United States).