Title

An efficient method for statistical significance calculation of transcription factor binding sites

 

Authors

Ziliang Qian1, 2, $, Lingyi Lu1, 2, $, Liu Qi3, *, Yixue Li1, 3, 4, *

 

Affiliation

1Bioinformatics Center, Key Laboratory of Molecular System Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China; 2 Graduate School of the Chinese Academy of Sciences, 19 Yuquan Road, Beijing 100039, China; 3School of Life Science and Biotechnology, Shanghai Jiao Tong University; 4Shanghai Center for Bioinformatics Technology, 100 Qinzhou Road, 200235 Shanghai, China

 

Email

liuqi@sibs.ac.cn; yxli@sibs.ac.cn; * Corresponding author

 

Article Type

Prediction Model

 

Date

received December 13, 2007; accepted December 31, 2007; published online December 30, 2007

 

Abstract

Various statistical models have been developed to describe the DNA binding preferences of transcription factors; these models assign scores by which putative transcription factor binding sites (TFBS) can be identified. The statistical significance of these scores, usually expressed as a p-value, plays a critical role in identification. We developed an efficient algorithm that computes this statistical significance precisely, reducing the time complexity from exponential to linear, and extended it to a wide range of models, from the commonly used position weight matrix models to more complex Bayesian network models. Further, we calculated p-values for all transcription factor DNA binding sites recorded in the JASPAR database and, based on these, investigated previously unexamined properties of the p-values as a whole, such as the p-value distribution under different models and the variation of p-values under different scoring schemes. We hope that our algorithm and the results of our computational experiments will offer an improved solution for assessing the statistical significance of transcription factor binding sites. The software implementing our method can be downloaded from http://pcal.biosino.org/pCal.html.
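To illustrate the kind of speed-up the abstract refers to, the following is a minimal sketch of a standard dynamic-programming p-value calculation for a position weight matrix. It is not taken from the paper and may differ from the authors' algorithm. It assumes integer (discretized) scores and independent background base frequencies, and all names (`pvalue_dp`, `pvalue_bruteforce`, `EXAMPLE_MATRIX`, `UNIFORM`) are hypothetical. The brute-force version enumerates all 4^L sequences, while the dynamic-programming version processes one matrix column at a time and tracks only the distribution of partial scores.

```python
from itertools import product


def pvalue_dp(matrix, background, threshold):
    """P(score >= threshold) for a random site, one column at a time.

    matrix: list of dicts mapping base -> integer score (assumed discretized).
    background: dict mapping base -> background probability.
    """
    dist = {0: 1.0}  # probability of each partial score so far
    for column in matrix:
        nxt = {}
        for score, prob in dist.items():
            for base, s in column.items():
                key = score + s
                nxt[key] = nxt.get(key, 0.0) + prob * background[base]
        dist = nxt
    return sum(p for s, p in dist.items() if s >= threshold)


def pvalue_bruteforce(matrix, background, threshold):
    """Same quantity by enumerating every sequence (exponential in length)."""
    total = 0.0
    for seq in product("ACGT", repeat=len(matrix)):
        score = sum(col[b] for col, b in zip(matrix, seq))
        if score >= threshold:
            prob = 1.0
            for b in seq:
                prob *= background[b]
            total += prob
    return total


# Hypothetical 3-position matrix with integer scores, for illustration only.
EXAMPLE_MATRIX = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    print(pvalue_dp(EXAMPLE_MATRIX, UNIFORM, 4))
    print(pvalue_bruteforce(EXAMPLE_MATRIX, UNIFORM, 4))
```

In this sketch the work per column is bounded by the number of distinct partial scores, so for a fixed score range the cost grows linearly with the motif length rather than exponentially. Extending the idea to models with dependencies between positions, such as Bayesian networks, would require tracking additional state and is not shown here.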

 

Keywords

transcription factor DNA binding sites; Bayesian network

 

Citation

Qian, et al., Bioinformation 2(5): 169-174 (2007)

 

Edited by

A. M. Khan, T. W. Tan & S. Ranganathan

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.