International Journal of Hybrid Intelligent Systems
Volume 2, No. 4 (2005), pp. –
Missing Values Imputation for DNA Microarray Data using Ranked Covariance Vectors
Muhammad Shoaib B. Sehgal, Iqbal Gondal and Laurence S. Dooley
Abstract. Microarray data are used in a range of application areas in
biology, from diagnosis through to drug discovery; however, such data often
contain multiple missing gene expression values that degrade the performance
of statistical and machine learning algorithms. This paper presents a new
k-Ranked Covariance-based Missing Value Imputation (KRCOV) algorithm, which
demonstrates superior imputation performance compared with the popular
k-Nearest Neighbour (KNN) technique in estimating missing values in the BRCA1,
BRCA2 and Sporadic genetic mutation samples present in ovarian and breast
cancer. By exploiting the strong correlation between samples, KRCOV
consistently outperforms KNN and zero-imputation techniques in terms of
estimation error, significance testing and classification accuracy when
approximating randomly occurring missing values in the range 1% to 5%. The
Generalized Regression Neural Network (GRNN) classifier is used because it has
consistently provided improved classification performance on ovarian and
breast cancer microarray data. The theoretical foundations of KRCOV are
presented, and a self-correcting error property is investigated that
guarantees the new algorithm generates a lower error than KNN when estimating
randomly introduced missing values, for the same order of computational
complexity.
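The abstract does not give the KRCOV formulas, so the following Python sketch only illustrates the general contrast it draws under stated assumptions. The KNN baseline uses a common variant (Euclidean distance over shared observed columns, unweighted mean of the k nearest rows). The covariance-ranked function is hypothetical: it ranks candidate rows by absolute covariance with the incomplete row, keeps the top k, and averages simple least-squares predictions of the missing entry. The function names, the use of None for missing values, ranking by absolute covariance and the regression-based estimate are all assumptions for illustration, not the authors' exact method.

```python
import math
from typing import List, Optional

# Rows are the vectors being compared (genes or samples; transpose the matrix
# to switch orientation). None marks a missing expression value.
Matrix = List[List[Optional[float]]]


def _shared_columns(a: List[Optional[float]], b: List[Optional[float]], skip: int) -> List[int]:
    """Columns observed in both rows, excluding the column being imputed."""
    return [c for c in range(len(a))
            if c != skip and a[c] is not None and b[c] is not None]


def knn_impute_value(data: Matrix, i: int, j: int, k: int) -> float:
    """Baseline KNN estimate of data[i][j]: unweighted mean of column j over the
    k rows nearest to row i (Euclidean distance on shared observed columns)."""
    scored = []
    for r, row in enumerate(data):
        if r == i or row[j] is None:
            continue  # a neighbour must have column j observed
        cols = _shared_columns(data[i], row, j)
        if not cols:
            continue
        dist = math.sqrt(sum((data[i][c] - row[c]) ** 2 for c in cols))
        scored.append((dist, row[j]))
    scored.sort(key=lambda t: t[0])  # smallest distance first
    nearest = scored[:k]
    if not nearest:
        raise ValueError("no usable neighbours for this entry")
    return sum(v for _, v in nearest) / len(nearest)


def ranked_cov_impute_value(data: Matrix, i: int, j: int, k: int) -> float:
    """Hypothetical covariance-ranked estimate of data[i][j] (not the paper's
    exact KRCOV formulas): rank rows by |covariance| with row i, keep the top k,
    and average their least-squares predictions of the missing entry."""
    target = data[i]
    scored = []
    for r, row in enumerate(data):
        if r == i or row[j] is None:
            continue
        cols = _shared_columns(target, row, j)
        if len(cols) < 2:
            continue  # covariance needs at least two shared observations
        n = len(cols)
        mx = sum(row[c] for c in cols) / n
        my = sum(target[c] for c in cols) / n
        sxy = sum((row[c] - mx) * (target[c] - my) for c in cols)
        sxx = sum((row[c] - mx) ** 2 for c in cols)
        if sxx == 0:
            continue  # a constant row cannot predict anything
        cov = sxy / (n - 1)  # sample covariance, used only for ranking
        # Simple linear regression of the target row on this row, evaluated at column j.
        estimate = my + (sxy / sxx) * (row[j] - mx)
        scored.append((abs(cov), estimate))
    scored.sort(key=lambda t: t[0], reverse=True)  # most covariant first
    top = scored[:k]
    if not top:
        raise ValueError("no usable rows for this entry")
    return sum(e for _, e in top) / len(top)


data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 5.0, 7.0, 9.0, None],   # row 1 = 2 * row 0 + 1, last value held out
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
print(ranked_cov_impute_value(data, 1, 4, k=1))  # 11.0
print(knn_impute_value(data, 1, 4, k=1))         # 5.0
```

In this constructed toy matrix, row 1 is an exact linear function of row 0, so the covariance-based estimate recovers the held-out value (11.0) while the distance-based KNN estimate simply copies row 0's raw value (5.0). The example is illustrative only and is not evidence for the results reported in the paper.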
Keywords: Microarray Data Processing, Missing Value Imputation, k-Ranked
Covariance Imputation, Neural Networks and Class Prediction.
Copyright © 2005 Advanced Knowledge International, Australia