Title |
|
PCA-HPR: A principal component analysis model for human promoter recognition
|
Authors |
Xiaomeng Li1, 2, 3, *, Jia Zeng1 and Hong Yan1, 2
| |
Affiliation |
1Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong; 2School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia; 3School of Astronautics, Harbin Institute of Technology, Harbin, China
| |
|
sgusico@hotmail.com; * Corresponding author
| |
Article Type |
Prediction Model
| |
Date |
received January 29, 2008; revised May 06, 2008; accepted May 09, 2008; published June 19, 2008
| |
Abstract |
We describe a promoter recognition method, PCA-HPR, that locates eukaryotic promoter regions and predicts transcription start sites (TSSs). We compute codon (3-mer) and pentamer (5-mer) frequencies and build codon and pentamer frequency feature matrices from which informative, discriminative features are extracted for classification. Principal component analysis (PCA) is applied to the feature matrices, and a subset of principal components (PCs) is selected for classification. The system uses three neural network classifiers to distinguish promoters from exons, promoters from introns, and promoters from 3' untranslated regions (3'UTRs). We compared PCA-HPR with three well-known promoter prediction systems: DragonGSF, Eponine and FirstEF. Validation on three test sets shows that PCA-HPR achieves the best performance of the four systems.
| |
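The feature extraction step summarized in the abstract can be illustrated with a minimal sketch. It is not the PCA-HPR implementation: the overlapping sliding window, the normalization to relative frequencies, the SVD-based PCA, the toy sequences, and the helper names `kmer_frequencies` and `pca_project` are all assumptions made for illustration, and the neural network classifiers are omitted.

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq, k):
    """Relative frequency of each of the 4**k DNA k-mers in seq.

    Assumption: overlapping windows with step 1; windows containing
    symbols other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # scores on the top PCs


# Hypothetical toy sequences, not data from the paper.
seqs = [
    "ACGTACGTGGCCGCGC",
    "TTTTAAAATTTAAATA",
    "GCGCGCGGCCGCGCGC",
    "ATATATTTAAATATAT",
]

X3 = np.array([kmer_frequencies(s, 3) for s in seqs])  # 64 codon features
X5 = np.array([kmer_frequencies(s, 5) for s in seqs])  # 1024 pentamer features
Z3 = pca_project(X3, 2)                                # reduced codon features
```

In a full pipeline, reduced matrices such as `Z3` would be the inputs to the classifiers; how many PCs to keep is a tuning choice that this sketch does not address.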
Keywords |
promoter recognition; sequence feature; CpG islands; transcription start sites; principal component analysis
| |
Citation |
Li et al., Bioinformation 2(9): 373-378 (2008)
| |
Edited by |
P. Kangueane
| |
ISSN |
0973-2063
| |
Publisher |
| |
License |
This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License. |