Name: Replication Package for Synthesizing Static Rules for Code Smell Detection
Creator: Zijie Huang
Published: 2026-04-16T08:56:57
License: CC-BY-4.0
Keywords: Machine Learning, ZIP, Machine Learning Benchmark, Software Engineering, Benchmark, Text, Tabular, Replication, Replication Package, Code Smell Detection, Software Metrics

Description

A 2026 replication package by Zijie Huang for research on code smell detection. It includes the MLCQ benchmark with 14,739 annotations from 522 repositories, 1,840 developer evaluations, and 40 qualitative interview transcripts. The package contains datasets, source code for a Java subsystem, model implementations, and analysis scripts.

Use Cases

Benchmarking code smell detection models on the MLCQ dataset (14,739 annotations)
Evaluating developer perceptions of detected smells using 1,840 human evaluations
Training or testing machine learning models such as CodeBERT-Fusion or DCTO for software metric analysis
Conducting qualitative analysis of code quality using 40 developer interview transcripts

Strengths

Includes a large benchmark dataset (MLCQ) with 14,739 annotations across 522 software repositories
Contains human evaluation data from 1,840 developer assessments for validation
Provides replication scripts and pre-computed results for reproducibility
Includes source code for a real-world Java subsystem (~5.3 KLOC) for a case study

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row counts for individual datasets are not stated, which makes it harder to assess suitability before download
Data may reflect bias inherent to the specific repositories and developers sampled
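Because column names and row counts are not documented, a quick profiling pass after download can recover both. The sketch below assumes the tabular data ship as UTF-8 CSV files with a header row inside the ZIP; the function name `profile_csv_members` and the archive layout are hypothetical and not part of the package.

```python
import csv
import io
import zipfile


def profile_csv_members(zip_source):
    """Return {member_name: (header, data_row_count)} for every .csv file in a ZIP.

    Assumption (hypothetical layout): tabular files are UTF-8 CSVs whose first
    row is a header. zip_source may be a path or a binary file-like object.
    """
    profiles = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip source code, transcripts, and anything else that is not CSV.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Wrap the binary member stream so csv.reader receives text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                # Count the remaining rows without loading them into memory.
                row_count = sum(1 for _ in reader)
            profiles[name] = (header, row_count)
    return profiles
```

For example, `profile_csv_members("replication_package.zip")` (hypothetical filename) returns each CSV's header and data-row count, which is enough to start inferring field semantics.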

Provenance

Source: figshare
Collection Method: Research replication package containing benchmark data, human study results, and source code.
Freshness: Last updated 2026-04-16 08:56:57

Requires Python 3.10+ and specific dependencies; the CodeBERT-Fusion model requires PyTorch with CUDA support and approximately 6 GB of VRAM. License: CC-BY-4.0.
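Before attempting the CodeBERT-Fusion experiments, a short pre-flight check can confirm the stated requirements. This is a minimal sketch: the function `check_environment` is hypothetical, it inspects only the first CUDA device, and it treats 6 GB as a hard floor even though the description gives that figure only approximately.

```python
import sys


def check_environment(min_python=(3, 10), min_vram_gb=6.0):
    """Report whether this machine meets the requirements stated for the package.

    Stated: Python 3.10+; for CodeBERT-Fusion, PyTorch with CUDA and ~6 GB VRAM.
    Assumptions: only CUDA device 0 is checked, and VRAM is measured in GiB.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    try:
        import torch  # only needed for the CodeBERT-Fusion model
    except ImportError:
        return report
    report["torch_installed"] = True
    if torch.cuda.is_available():
        report["cuda_available"] = True
        props = torch.cuda.get_device_properties(0)
        report["vram_gb"] = props.total_memory / 1024**3
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report
```

Calling `print(check_environment())` shows which requirement, if any, is missing; the metric and qualitative analyses that do not use CodeBERT-Fusion only need `python_ok`.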
