Monday, 3 December 2018

MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER DATASET


Soumen Kumar Pati¹ and Asit Kumar Das²

¹Department of Computer Science/Information Technology, St. Thomas' College of Engineering and Technology, 4, D.H. Road, Kolkata-23, soumen_pati@rediffmail.com
²Department of Computer Science and Technology, Bengal Engineering and Science University, Shibpur, Howrah-03, asitdas72@rediffmail.com

ABSTRACT

Microarray technology makes it possible to measure the expression levels of thousands of genes, or more, simultaneously. One of the challenges in classifying cancer from high-dimensional gene expression data is to select a minimal number of relevant genes that maximizes classification accuracy. Because the expression profiles of specific cancers have distinct characteristics, developing flexible and robust gene identification methods is essential. Many gene selection methods, together with corresponding classifiers, have been proposed. In the proposed method, a single gene with high class-discrimination capability is selected, and classification rules for cancer are generated from gene expression profiles. The method first computes an importance factor for each gene of the experimental cancer dataset by counting the number of its linguistic terms (defined over different discrete quantities) that have high class-discrimination capability according to their degree of dependency on the classes. The genes with the highest importance factors are then selected to form the initial reduct. Next, the traditional k-means clustering algorithm is applied to each gene of the initial reduct, and the misclassification error of each individual gene is computed. The final reduct is formed by selecting the most important genes, i.e. those with the lowest misclassification errors. Finally, a classifier is constructed from decision rules induced by the selected important (single) genes on the training dataset, and it is used to classify the cancerous and non-cancerous samples of the experimental test dataset. The proposed method is tested on four publicly available cancer gene expression datasets. In most cases, accurate classification is obtained using just a single important gene, and the genes identified are highly correlated with the pathogenesis of cancer.
To demonstrate the robustness of the proposed method, its outcomes (correctly classified instances) are also compared with those of several well-known existing classifiers.
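The abstract describes the pipeline only at a high level, so the Python sketch below is a minimal, hypothetical illustration of its steps and not the authors' implementation. It assumes: equal-width discretisation into three linguistic terms (low, medium, high); a term counts as discriminative when at least 80% of its training samples share one class, a simple stand-in for the paper's dependency degree; the initial reduct is the `top_k` genes by importance factor; and each reduct gene is clustered with plain 2-means, with the gene of lowest misclassification error turned into a single IF-THEN threshold rule. All function names, the `purity` and `top_k` parameters, and the toy data are illustrative assumptions.

```python
from collections import Counter


def linguistic_terms(values, n_terms=3):
    """Map each expression value to a term index 0..n_terms-1 (0 = 'low').

    Assumption: equal-width bins stand in for the paper's linguistic terms.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_terms or 1.0  # constant gene: everything in term 0
    return [min(int((v - lo) / width), n_terms - 1) for v in values]


def importance_factor(values, labels, n_terms=3, purity=0.8):
    """Count terms whose samples are dominated (>= purity) by one class.

    Assumption: this purity test approximates the 'dependency degree'.
    """
    terms = linguistic_terms(values, n_terms)
    count = 0
    for t in set(terms):
        classes = [c for term, c in zip(terms, labels) if term == t]
        if Counter(classes).most_common(1)[0][1] / len(classes) >= purity:
            count += 1
    return count


def kmeans_1d(values, iters=100):
    """Lloyd's k-means with k = 2 on one gene; returns [c0, c1] with c0 <= c1."""
    c = [min(values), max(values)]  # deterministic start at the two extremes
    for _ in range(iters):
        assign = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        new_c = []
        for k in (0, 1):
            members = [v for v, a in zip(values, assign) if a == k]
            new_c.append(sum(members) / len(members) if members else c[k])
        if new_c == c:
            break
        c = new_c
    return c


def misclassification_error(values, labels):
    """Cluster one gene, label each cluster by its majority class, count mistakes.

    Returns (error_rate, threshold, low_class, high_class).
    """
    c0, c1 = kmeans_1d(values)
    threshold = (c0 + c1) / 2  # nearest-centroid boundary in one dimension
    sides = {0: [], 1: []}
    for v, cls in zip(values, labels):
        sides[0 if v <= threshold else 1].append(cls)
    majority, errors = {}, 0
    for side, classes in sides.items():
        if classes:
            label, hits = Counter(classes).most_common(1)[0]
            majority[side] = label
            errors += len(classes) - hits
    low = majority.get(0, majority.get(1))
    high = majority.get(1, low)
    return errors / len(labels), threshold, low, high


def select_gene_and_rule(X, y, top_k=5, n_terms=3, purity=0.8):
    """X: samples x genes (list of lists); y: one class label per sample."""
    genes = list(zip(*X))  # one tuple of expression values per gene
    scores = [importance_factor(g, y, n_terms, purity) for g in genes]
    # Initial reduct: top_k genes by importance factor (ties keep column order).
    initial = sorted(range(len(genes)), key=lambda j: -scores[j])[:top_k]
    # Final reduct: the single gene whose 2-means clustering misclassifies least.
    best = min(initial, key=lambda j: misclassification_error(genes[j], y)[0])
    _, threshold, low, high = misclassification_error(genes[best], y)
    return best, threshold, low, high


def classify(sample, rule):
    """Apply the induced rule: IF gene <= threshold THEN low ELSE high."""
    gene, threshold, low, high = rule
    return low if sample[gene] <= threshold else high


if __name__ == "__main__":
    # Toy data (invented for illustration): 6 training samples x 3 genes.
    X_train = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.9], [0.9, 6.0, 0.1],
               [3.1, 5.5, 0.5], [3.0, 4.5, 0.7], [3.3, 5.0, 0.2]]
    y_train = ["normal"] * 3 + ["cancer"] * 3
    rule = select_gene_and_rule(X_train, y_train, top_k=2)
    print(rule[0], classify([1.1, 5.2, 0.4], rule), classify([2.9, 4.8, 0.6], rule))
    # prints: 0 normal cancer
```

Using the midpoint of the two centroids as the rule threshold is a design choice of this sketch: in one dimension it reproduces exactly the nearest-centroid assignment that k-means itself uses, so the induced rule labels the training samples the same way the clustering did.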

KEYWORDS 

Microarray cancer data, K-means algorithm, Gene selection, Classification Rule, Cancer sample identification, Gene reducts. 












International Journal on Soft Computing (IJSC) ISSN: 2229 - 6735 [Online]; 2229 - 7103 [Print] https://airccse.org/journal/ijsc/ijsc.html