AN INTEGRATIVE SYSTEM FOR PREDICTION
OF NAC PROTEINS IN RICE USING
DIFFERENT FEATURE EXTRACTION
METHODS
Hemalatha N. 1,*, Rajesh M. K. 2 and
Narayanan N. K. 3
1 AIMIT, St. Aloysius College, Mangalore, India
2 Division of Crop Improvement, Central Plantation Crops Research Institute, Kasaragod 671124, India
3 School of Information Science and Technology, Kannur University, Kannur, India
ABSTRACT
The NAC gene family encodes a large family of plant-specific transcription factors with diverse roles in
various developmental processes and stress responses in plants. Genome-wide prediction tools
for NAC proteins would significantly aid gene annotation in rice. In the present study, NACSVM,
a tool for computational genome-scale prediction of NAC proteins in rice, was developed by integrating
compositional and evolutionary information of NAC proteins. Initially, support vector machine (SVM)-based
modules were developed using diverse protein features, namely traditional amino acid, dipeptide (i+1),
tripeptide (i+2) and four-parts compositions and the position-specific scoring matrix (PSSM); these
achieved overall accuracies of 79%, 93%, 93%, 79% and 100%, respectively. Later, two hybrid modules were
developed based on amino acid, dipeptide and tripeptide composition, which achieved overall accuracies
of 83% and 79%. NACSVM was also evaluated using the Position-Specific Iterated Basic Local Alignment
Search Tool (PSI-BLAST), which resulted in a lower accuracy of 50%. To benchmark NACSVM, the
tool was evaluated using independent data test and cross-validation methods. The statistical
analyses carried out indicate that the proposed algorithm is a useful tool for annotating NAC proteins in
the rice genome.
KEYWORDS
SVM, NAC, RBF, PSSM, ROC, AUC
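
The compositional features named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definition of k-mer composition (the fraction of each contiguous k-residue window over the 20 standard amino acids); the exact definitions behind "dipeptide (i+1)" and "tripeptide (i+2)" are not given in this section, so they are not reproduced here. The function names `kmer_composition` and `hybrid_features` are hypothetical, and joining the amino acid and dipeptide vectors is only one assumed way a hybrid feature vector might be formed.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=1):
    """Return the fraction of each possible k-mer in `sequence`.

    k=1 gives a 20-dimensional amino acid composition,
    k=2 gives a 400-dimensional dipeptide composition.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with non-standard residues
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def hybrid_features(sequence):
    """Assumed hybrid: amino acid and dipeptide compositions joined (420 values)."""
    return kmer_composition(sequence, 1) + kmer_composition(sequence, 2)
```

Vectors of this kind, one per protein, are the sort of fixed-length input an SVM classifier requires, whatever the length of the original sequence.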