Statistical tools for discovering pseudo-periodicities in biological sequences
1 Laboratoire Statistique et Génome, URA 8071 du CNRS, La Génopole, Université d'Evry, France; firstname.lastname@example.org.
2 Institut National de la Recherche Agronomique, BIA, 78352 Jouy-en-Josas, France; email@example.com.
3 Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany; firstname.lastname@example.org.
Received: 31 July 2001
Revised: 12 October 2001
Many protein sequences present non-trivial periodicities, such as cysteine signatures and leucine heptads. These known periodicities probably represent only a small fraction of the periodic structures present in sequences, and it is useful to have general tools to detect such sequences and their periods in large sequence databases. We compare three statistics adapted from those used in time-series analysis: a generalisation of the simple autocovariance based on a similarity score, and two statistics intended to increase the power of the method. The theoretical behaviour of these statistics is derived, and the corresponding tests are then described. We also present an application of these tests to a protein known to have a periodic sequence.
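To make the idea of a score-based autocovariance concrete, here is a minimal sketch under stated assumptions. It is not the authors' exact statistic. The product x_i x_{i+k} of the classical autocovariance is replaced by a similarity score s(x_i, x_{i+k}), averaged over all pairs of residues k positions apart. The result is then centred by the expected score of two residues drawn independently from the sequence's own composition; this centring is an assumption made here for illustration. The names `identity_score`, `score_autocovariance` and `best_period` are hypothetical, and the identity score stands in for a real substitution matrix.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

Score = Callable[[str, str], float]


def identity_score(a: str, b: str) -> float:
    """Hypothetical similarity score: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def score_autocovariance(seq: str, lag: int, score: Score = identity_score) -> float:
    """Score-based analogue of the autocovariance at a given lag (sketch).

    Observed: mean score over all pairs (seq[i], seq[i + lag]).
    Expected: mean score of two residues drawn independently from the
    empirical residue frequencies of `seq` (an assumed centring).
    """
    n = len(seq)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(seq)")
    observed = sum(score(seq[i], seq[i + lag]) for i in range(n - lag)) / (n - lag)
    freqs = {a: c / n for a, c in Counter(seq).items()}
    expected = sum(
        fa * fb * score(a, b) for a, fa in freqs.items() for b, fb in freqs.items()
    )
    return observed - expected


def best_period(
    seq: str, max_lag: int, score: Score = identity_score
) -> Tuple[int, Dict[int, float]]:
    """Return the smallest lag maximising the statistic, plus all values."""
    values = {k: score_autocovariance(seq, k, score) for k in range(1, max_lag + 1)}
    # max() keeps the first maximal key, so multiples of the period lose ties.
    return max(values, key=values.get), values


if __name__ == "__main__":
    period, values = best_period("ABCDEFG" * 10, max_lag=20)
    print(period, round(values[period], 3))
```

For a sequence with an exact period of 7, the statistic peaks at lag 7 and again at its multiples. Taking the smallest maximising lag therefore recovers the period. The sketch only ranks lags. Deciding whether a peak is significant is the job of the tests described in the paper, and this code does not implement them.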
Mathematics Subject Classification: 62G10 / 62P10
Key words: Biological sequences / proteins / periodicity / autocovariance function.
© EDP Sciences, SMAI, 2001