MINING OF IMPORTANT INFORMATIVE GENES AND
CLASSIFIER CONSTRUCTION FOR CANCER DATASET
Soumen Kumar Pati¹ and Asit Kumar Das²
¹Department of Computer Science/Information Technology, St. Thomas' College of
Engineering and Technology, 4, D.H. Road, Kolkata-23
soumen_pati@rediffmail.com
²Department of Computer Science and Technology, Bengal Engineering and Science
University, Shibpur, Howrah-03
asitdas72@rediffmail.com
ABSTRACT
Microarray technology is a useful technique for measuring the expression levels of thousands or more genes
simultaneously. One of the challenges in classifying cancer from high-dimensional gene expression data
is selecting a minimal number of relevant genes that maximize classification accuracy. Because of the
distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and
robust gene identification methods is essential. Many gene selection methods, together with their
corresponding classifiers, have been proposed. In the proposed method, a single gene with high
class-discrimination capability is selected, and classification rules for cancer are generated from gene
expression profiles. The method first computes an importance factor for each gene of the experimental cancer
dataset by counting the number of linguistic terms (defined as different discrete quantities) with high
class-discrimination capability, according to their degree of dependency on the classes. Genes with a high
importance factor are then selected to form the initial reduct. Next, the traditional k-means
clustering algorithm is applied to each selected gene of the initial reduct, and the misclassification
error of each individual gene is computed. The final reduct is formed by selecting the most important genes,
that is, those with the lowest misclassification errors. A classifier is then constructed from decision rules
induced by the selected important (single) genes of the training dataset to classify cancerous and
non-cancerous samples of the experimental test dataset. The proposed method is tested on four publicly
available cancer gene expression datasets. In most cases, accurate classification outcomes are obtained
using just single important genes, and the genes identified are highly correlated with the pathogenesis of
cancer. To demonstrate the robustness of the proposed method, its outcomes (correctly classified instances)
are also compared with those of some existing well-known classifiers.
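The k-means step of the pipeline above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the importance-factor formula, the number of clusters, or the centroid initialisation, so this sketch assumes two clusters (cancerous vs. non-cancerous), min/max initialisation, and hypothetical gene names and expression values. It covers only the ranking of candidate genes by misclassification error.

```python
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], iters: int = 100) -> List[int]:
    """Two-cluster k-means on a single gene's expression values.

    Assumption: centroids start at the minimum and maximum value.
    Returns a cluster index (0 or 1) for every sample.
    """
    c0, c1 = min(values), max(values)
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return assign


def misclassification_error(assign: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose cluster disagrees with the class label.

    Cluster ids are arbitrary, so the better of the two possible
    cluster-to-class mappings is used.
    """
    mismatches = sum(a != y for a, y in zip(assign, labels))
    n = len(labels)
    return min(mismatches, n - mismatches) / n


def rank_genes(
    expr: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Sort genes by ascending single-gene misclassification error."""
    scored = [
        (gene, misclassification_error(kmeans_1d(vals), labels))
        for gene, vals in expr.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy data: 0 = non-cancerous, 1 = cancerous.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # separates the classes well
    "gene_b": [2.0, 2.6, 1.9, 2.1, 2.5, 1.8],  # overlaps across classes
}
ranked = rank_genes(expr, labels)
print(ranked)
```

In this sketch the genes at the top of `ranked` would be the candidates for the final reduct; how many to keep, and how decision rules are induced from them, is left open because the abstract does not state it.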
KEYWORDS
Microarray cancer data, K-means algorithm, Gene selection, Classification Rule, Cancer sample
identification, Gene reducts.
ORIGINAL SOURCE URL : http://airccse.org/journal/ijsc/papers/3312ijsc06.pdf