An efficient method for statistical significance calculation of transcription factor binding sites



Ziliang Qian1,2,$, Lingyi Lu1,2,$, Liu Qi3,*, Yixue Li1,3,4,*



1Bioinformatics Center, Key Laboratory of Molecular System Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China; 2Graduate School of the Chinese Academy of Sciences, 19 Yuquan Road, Beijing 100039, China; 3School of Life Science and Biotechnology, Shanghai Jiao Tong University; 4Shanghai Center for Bioinformatics Technology, 100 Qinzhou Road, 200235 Shanghai, China


Email;; * Corresponding author


Article Type

Prediction Model



received December 13, 2007; accepted December 31, 2007; published online December 30, 2007



Various statistical models have been developed to describe the DNA binding preferences of transcription factors; with these models, putative transcription factor binding sites (TFBS) are identified according to the scores they are assigned. The statistical significance of these scores, usually expressed as a p-value, plays a critical role in identification. We developed an efficient algorithm for the precise calculation of this statistical significance, reducing the time complexity from exponential to linear, and extended the algorithm to a wide range of models, from the commonly used position weight matrix models to more complex Bayesian network models. Further, we calculated p-values for all transcription factor DNA binding sites recorded in the JASPAR database and, based on these, investigated previously unexamined properties of the p-values as a whole, such as the p-value distribution under different models and the variation of p-values under different scoring schemes. We hope that our algorithm and the results of our computational experiments offer an improved solution for assessing the statistical significance of transcription factor binding sites. The software to implement our method can be downloaded from
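To illustrate the kind of saving described in the abstract, the following is a minimal sketch of the standard dynamic-programming approach to p-values for a position weight matrix. It is not necessarily the authors' algorithm, which the abstract does not specify. The sketch assumes integer scores and an independent, identically distributed background; the names `score_distribution`, `p_value`, `p_value_bruteforce`, `TOY_PWM` and `UNIFORM` are hypothetical, and the matrix values are made up for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical 3-column PWM with integer scores (illustrative values only).
TOY_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -2, "C": 3, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]
# Assumed background model: each base equally likely, positions independent.
UNIFORM = {base: 0.25 for base in "ACGT"}


def score_distribution(pwm, background):
    """Return {score: probability} for a random site of length len(pwm).

    One pass per column: the distribution of partial scores is extended by
    the four possible bases, so the work grows linearly with motif length
    (for a bounded range of integer scores).
    """
    dist = {0: 1.0}
    for column in pwm:
        extended = defaultdict(float)
        for partial, prob in dist.items():
            for base, score in column.items():
                extended[partial + score] += prob * background[base]
        dist = dict(extended)
    return dist


def p_value(pwm, background, threshold):
    """Probability that a random site scores at least `threshold`."""
    dist = score_distribution(pwm, background)
    return sum(prob for score, prob in dist.items() if score >= threshold)


def p_value_bruteforce(pwm, background, threshold):
    """Same quantity by enumerating all 4**len(pwm) sites (exponential)."""
    total = 0.0
    for site in product(sorted(background), repeat=len(pwm)):
        score = sum(column[base] for column, base in zip(pwm, site))
        if score >= threshold:
            prob = 1.0
            for base in site:
                prob *= background[base]
            total += prob
    return total
```

In this toy setting, `p_value(TOY_PWM, UNIFORM, 7)` and `p_value_bruteforce(TOY_PWM, UNIFORM, 7)` agree, but only the brute-force version has to visit every possible site. Extending the idea to Bayesian network models would require tracking the dependencies between positions, which this sketch does not attempt.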



transcription factor DNA binding sites; Bayesian network



Qian, et al., Bioinformation 2(5): 169-174 (2007)


Edited by

A. M. Khan, T. W. Tan & S. Ranganathan






Biomedical Informatics



This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.