EVOLVING EFFICIENT CLUSTERING AND
CLASSIFICATION PATTERNS IN LYMPHOGRAPHY
DATA THROUGH DATA MINING TECHNIQUES
Shomona Gracia Jacob1 and R. Geetha Ramani2
1Department of Computer Science and Engineering, Rajalakshmi Engineering College
(Affiliated to Anna University, Chennai)
graciarun@gmail.com
2Department of Information Science and Technology, College of Engineering, Guindy,
Anna University, Chennai.
rgeetha@yahoo.com
ABSTRACT
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns from
large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an
established, proven structure from a voluminous collection of facts. A dominant area of modern-day
research in the field of medical investigations includes disease prediction and malady categorization. In
this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering
techniques and compare the performance of classification algorithms on the clinical data. Feature
selection is a supervised method that attempts to select a subset of the predictor features based on the
information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with
the class label taking four distinct values. This paper reports the accuracy of eight clustering algorithms
in detecting clusters of patient records and predictor attributes, and the performance of sixteen
classification algorithms in multi-class categorization of the Lymphography dataset.
Our work shows that the Random Tree algorithm and
Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and
also with the feature subset selected by the Fisher Filtering feature selection algorithm. We also find
that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm
offers increased clustering accuracy in less computation time.
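The idea of ranking predictor features by information gain, mentioned above, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the Fisher Filtering algorithm named in the abstract. The function names (`entropy`, `information_gain`, `rank_features`) and the toy records are hypothetical and are not taken from the Lymphography dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (gain, feature_index) pairs sorted from most to least informative."""
    n_features = len(rows[0])
    gains = [
        (information_gain([row[j] for row in rows], labels), j)
        for j in range(n_features)
    ]
    return sorted(gains, reverse=True)


# Hypothetical toy records (assumption): two categorical features per record.
# Feature 0 determines the class exactly; feature 1 carries no information.
toy_rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
toy_labels = ["a", "a", "b", "b"]

ranking = rank_features(toy_rows, toy_labels)
print(ranking)
```

A feature subset could then be formed by keeping the top-ranked indices, with the cut-off treated as a tuning choice rather than anything prescribed by the paper.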
KEYWORDS
Data mining, Clustering, Feature Selection, Classification, Lymphography Data
ORIGINAL SOURCE URL : http://airccse.org/journal/ijsc/papers/3312ijsc09.pdf