
Title

PCA-HPR: A principal component analysis model for human promoter recognition

 

Authors

Xiaomeng Li (1, 2, 3, *), Jia Zeng (1) and Hong Yan (1, 2)

 

Affiliation

(1) Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong; (2) School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia; (3) School of Astronautics, Harbin Institute of Technology, Harbin, China

 

Email

sgusico@hotmail.com; * Corresponding author

 

Article Type

Prediction Model

 

Date

received January 29, 2008; revised May 06, 2008; accepted May 09, 2008; published June 19, 2008

 

Abstract

We describe a promoter recognition method named PCA-HPR that locates eukaryotic promoter regions and predicts transcription start sites (TSSs). We compute codon (3-mer) and pentamer (5-mer) frequencies and build codon and pentamer frequency feature matrices from which informative and discriminative features are extracted for classification. Principal component analysis (PCA) is applied to the feature matrices, and a subset of the principal components (PCs) is selected for classification. The system uses three neural network classifiers to distinguish promoters from exons, promoters from introns, and promoters from 3' untranslated regions (3'UTRs). We compared PCA-HPR with three well-known promoter prediction systems: DragonGSF, Eponine and FirstEF. Validation on three test sets shows that PCA-HPR achieves the best performance of the four systems.
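As an illustration of the feature-extraction step only, the following is a minimal sketch of how 3-mer and 5-mer frequency vectors might be computed and stacked into a feature matrix. The function names (`kmer_frequencies`, `feature_matrix`), the use of overlapping windows, the normalisation to relative frequencies, and the skipping of windows that contain ambiguous bases are all assumptions made for this example; the abstract does not specify these details, and the PCA and neural network stages are not shown.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: windows overlap (step of 1) and windows containing
    characters outside A/C/G/T are skipped.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    # Guard against sequences shorter than k or with no valid window.
    return [counts[w] / total if total else 0.0 for w in kmers]


def feature_matrix(seqs, k):
    """Stack one k-mer frequency vector per sequence (rows = sequences)."""
    return [kmer_frequencies(s, k) for s in seqs]


if __name__ == "__main__":
    sequences = ["ACGTACGTGGCC", "TTTTAAAACCCC"]  # toy sequences, not real data
    codon_features = feature_matrix(sequences, 3)      # 64 columns
    pentamer_features = feature_matrix(sequences, 5)   # 1024 columns
    print(len(codon_features[0]), len(pentamer_features[0]))
```

With k = 3 each sequence yields a 64-dimensional vector, and with k = 5 a 1024-dimensional one; matrices of this kind are what a PCA step would then reduce to a smaller set of components.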

 

Keywords

promoter recognition; sequence feature; CpG islands; transcription start sites; principal component analysis

 

Citation

Li et al., Bioinformation 2(9): 373-378 (2008)

 

Edited by

P. Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.