Volume 5, 2001
Page(s): 171 - 181
Published online: 15 August 2002
Statistical tools for discovering pseudo-periodicities in biological sequences
1 Laboratoire Statistique et Génome, URA 8071 du CNRS, La Génopole, Université d'Evry, France; firstname.lastname@example.org.
2 Institut National de la Recherche Agronomique, BIA, 78352 Jouy-en-Josas, France; email@example.com.
3 Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany; firstname.lastname@example.org.
Revised: 31 July 2001
Revised: 12 October 2001
Many protein sequences present nontrivial periodicities, such as cysteine signatures and leucine heptads. These known periodicities probably represent only a small percentage of the periodic structures present in sequences, and it is useful to have general tools to detect such sequences and their periods in large sequence databases. We compare three statistics adapted from those used in time series analysis: a generalisation of the simple autocovariance based on a similarity score, and two statistics intended to increase the power of the method. The theoretical behaviour of these statistics is derived, and the corresponding tests are then described. We also present an application of these tests to a protein known to have sequence periodicity.
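As a rough illustration of the first idea mentioned in the abstract, an autocovariance built on a similarity score, the sketch below computes a lag-k statistic for a symbolic sequence. It is not the statistic defined in the paper: the identity score, the centring by the composition-based expected score, and the names `identity_score` and `similarity_autocovariance` are all assumptions made for this example.

```python
from collections import Counter


def identity_score(a, b):
    """Hypothetical similarity score: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def similarity_autocovariance(seq, lag, score=identity_score):
    """Mean similarity between residues `lag` positions apart, minus the
    mean similarity expected from the sequence composition alone.

    This centring is an assumption of the sketch, not the paper's definition.
    """
    n = len(seq)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(seq)")
    # Observed mean score over all pairs (i, i + lag).
    observed = sum(score(seq[i], seq[i + lag]) for i in range(n - lag)) / (n - lag)
    # Expected score if the two positions were drawn independently
    # from the empirical residue frequencies.
    freqs = Counter(seq)
    expected = sum(
        (freqs[a] / n) * (freqs[b] / n) * score(a, b)
        for a in freqs
        for b in freqs
    )
    return observed - expected


# Toy sequence with an exact period of 7 (not real protein data).
toy = "LAGVSTE" * 6
best_lag = max(range(1, 11), key=lambda k: similarity_autocovariance(toy, k))
```

On this toy input the statistic is largest at the lag equal to the period. Real sequences are only pseudo-periodic, which is why a formal test, such as those the paper describes, is needed rather than a simple maximum.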
Mathematics Subject Classification: 62G10 / 62P10
Key words: Biological sequences / proteins / periodicity / autocovariance function.
© EDP Sciences, SMAI, 2001