
Experience: Quality benchmarking of datasets used in software effort estimation


Experience-Quality Benchmarking of Empirical Software Effort Estimation Datasets.pdf - Submitted Version


Data are a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous
process and project management activities, including the estimation of development effort and the prediction
of the likely location and severity of defects in code. Serious questions have been raised, however, over the
quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have
been noted as being especially prevalent. Other quality issues, although also potentially important, have
received less attention. In this study, we assess the quality of 13 datasets that have been used extensively
in research on software effort estimation. The quality issues considered in this article draw on a taxonomy
that we published previously based on a systematic mapping of data quality issues in ESE. Our contributions
are as follows: (1) an evaluation of the “fitness for purpose” of these commonly used datasets and (2) an
assessment of the utility of the taxonomy in terms of dataset benchmarking. We also propose a template
that could be used both to improve the ESE data collection/submission process and to evaluate other such
datasets, contributing to enhanced awareness of data quality issues in the ESE community and, in time, the
availability and use of higher-quality datasets.

Item Type: Journal article
Uncontrolled Keywords: Data quality, benchmarking, empirical software engineering, software effort estimation, noise, missing data
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology
Depositing User: Michael Bosu
Date Deposited: 17 Sep 2019 03:11
Last Modified: 21 Jul 2023 08:22
