Research Article
Published Online: 7 May 2012

An Automaton Approach for Waiting Times in DNA Evolution

Publication: Journal of Computational Biology
Volume 19, Issue Number 5

Abstract

In a recent article, Behrens and Vingron (J. Comput. Biol. 17(12), 2010) compute waiting times for k-mers to appear during DNA evolution under the assumption that the k-mers considered do not occur in the initial DNA sequence, an issue that arises when studying the evolution of regulatory DNA sequences with regard to the emergence of transcription factor (TF) binding sites. The mathematical analysis underlying their computation assumes that occurrences of the words of interest do not overlap. Here we relax this assumption by using an automaton approach. In an alphabet of size 4 such as the DNA alphabet, most words have no or low autocorrelation; therefore, globally, our results confirm those of Behrens and Vingron. The outcome is quite different for highly autocorrelated k-mers: in this case, autocorrelation pushes down the probability that these k-mers occur at generation 1 and, consequently, increases the waiting time for them to appear by up to 40%. An analysis of existing TF binding sites reveals a significant proportion of k-mers exhibiting autocorrelation. Thus, our automaton-based computations greatly improve the accuracy of predicting waiting times for the emergence of TF binding sites during DNA evolution. We perform the computation in the Bernoulli (M0) model; computations in the M1 model, a Markov model of order 1, are more costly in time and memory but should produce similar results. Whereas Behrens and Vingron specifically considered promoters of length 1000, we extend the results to promoters of any length. We show that the probability that a k-mer occurs at generation 1 while being absent at generation 0 grows linearly with the length of the promoter, which induces a hyperbolic behaviour of the waiting time of any k-mer with respect to promoter length. The C code is available at www.lipn.univ-paris13.fr/~nicodeme/.
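The effect of autocorrelation on the generation-1 probability can be illustrated with a small sketch. This Python code is not the authors' C implementation; it is a minimal illustration under assumptions chosen here: generation 0 is drawn from a uniform Bernoulli (M0) model over ACGT, and each site independently mutates with probability `mu` to one of the three other nucleotides, chosen uniformly (a Jukes–Cantor-style step assumed for illustration only). The sketch builds the classical string-matching automaton of a k-mer, runs two copies of it in parallel on the pair of letters (generation 0, generation 1) at each site, discards paths in which the k-mer occurs at generation 0, and returns the probability that it occurs at generation 1 given that it was absent at generation 0. The names `autocorrelation`, `build_dfa` and `prob_appears_at_gen1` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def autocorrelation(word: str) -> list[int]:
    """c[i] = 1 iff `word` overlaps itself when shifted by i positions."""
    k = len(word)
    return [1 if word[i:] == word[: k - i] else 0 for i in range(k)]


def build_dfa(word: str) -> list[list[int]]:
    """String-matching automaton: delta[q][a] is the length of the longest
    prefix of `word` that is a suffix of word[:q] + ALPHABET[a].
    Reaching state len(word) means an occurrence has just been read."""
    k = len(word)
    delta = [[0] * len(ALPHABET) for _ in range(k)]
    for q in range(k):
        for a, c in enumerate(ALPHABET):
            s = word[:q] + c
            m = len(s)
            while m > 0 and s[-m:] != word[:m]:
                m -= 1
            delta[q][a] = m
    return delta


def prob_appears_at_gen1(word: str, n: int, mu: float) -> float:
    """P(word occurs in generation 1 | word absent in generation 0)
    for a sequence of length n.

    Assumptions (for illustration only): generation 0 is i.i.d. uniform
    over ACGT; each site mutates with probability mu to one of the three
    other nucleotides, chosen uniformly."""
    k = len(word)
    delta = build_dfa(word)
    found = k  # generation-1 state meaning "word already seen"; equals delta's match state
    # dist[q0][q1]: probability mass of the pair of automaton states
    dist = [[0.0] * (k + 1) for _ in range(k)]
    dist[0][0] = 1.0
    # joint letter probabilities P(x0 = a, x1 = b)
    pair = [[0.25 * ((1 - mu) if a == b else mu / 3) for b in range(4)]
            for a in range(4)]
    for _ in range(n):
        new = [[0.0] * (k + 1) for _ in range(k)]
        for q0, q1 in product(range(k), range(k + 1)):
            p = dist[q0][q1]
            if p == 0.0:
                continue
            for a in range(4):
                r0 = delta[q0][a]
                if r0 == k:  # occurrence in generation 0: path excluded
                    continue
                if q1 == found:  # already seen; summing over b gives 1/4
                    new[r0][found] += p * 0.25
                    continue
                for b in range(4):
                    # delta returns k == found on a full match
                    new[r0][delta[q1][b]] += p * pair[a][b]
        dist = new
    absent0 = sum(sum(row) for row in dist)    # P(absent at generation 0)
    joint = sum(row[found] for row in dist)    # P(absent at 0, present at 1)
    return joint / absent0


if __name__ == "__main__":
    for w in ("AAAAAA", "ACGTTG"):
        print(w, autocorrelation(w), prob_appears_at_gen1(w, 1000, 1e-3))
```

Under these assumptions, the fully autocorrelated `AAAAAA` should get a lower probability than the non-overlapping `ACGTTG`, and for small `mu` doubling `n` should roughly double the probability, consistent with the linear behaviour stated above.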

References

Arndt, P.F., and Hwa, T. 2005. Identification and measurement of neighbor-dependent nucleotide substitution processes. Bioinformatics 21, 2322–2328.
Bassino, F., Clément, J., Fayolle, J., et al. 2008. Constructions for clump statistics. Proc. 5th Colloq. Math. Comput. Sci. 183–198.
Behrens, S., and Vingron, M. 2010. Studying the evolution of promoters: a waiting time problem. J. Comput. Biol. 17, 1591–1606.
Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press, New York.
Dowell, R.D. 2010. Transcription factor binding variation in the evolution of gene regulation. Trends Genet. 26, 468–475.
Duret, L., and Arndt, P.F. 2008. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 4.
Durrett, R., and Schmidt, D. 2007. Waiting for regulatory sequences to appear. Ann. Appl. Probab. 17, 1–32.
Flajolet, P., and Sedgewick, R. 2009. Analytic Combinatorics. Cambridge University Press, New York.
Goulden, I., and Jackson, D. 1983. Combinatorial Enumeration. John Wiley, New York.
Guibas, L., and Odlyzko, A. 1981a. Periods in strings. J. Combin. Theory A, 19–42.
Guibas, L., and Odlyzko, A. 1981b. String overlaps, pattern matching, and non-transitive games. J. Combin. Theory A, 108–203.
Hopcroft, J., Motwani, R., and Ullman, J. 2001. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, New York.
Karlin, S., and Taylor, H. 1975. A First Course in Stochastic Processes, 2nd ed. Academic Press, New York.
Kunarso, G., Chia, N.-Y., Jeyakani, J., et al. 2010. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634.
Lothaire, M. 2005. Applied Combinatorics on Words. Cambridge University Press, New York.
Nicodème, P. 2011. A clump analysis for waiting times in DNA evolution [Personal communication]. www.lipn.univ-paris13.fr/~nicodeme/
Odom, D.T., Dowell, R.D., Jacobsen, E.S., et al. 2007. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat. Genet. 39, 730–732.
Portales-Casamar, E., Thongjuea, S., Kwon, A.T., et al. 2010. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38, D105–D110.
Schmidt, D., Wilson, M.D., Ballester, B., et al. 2010. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040.
Stone, J.R., and Wray, G.A. 2001. Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18, 1764–1770.
Wray, G.A., Hahn, M.W., Abouheif, E., et al. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20, 1377–1419.

Information & Authors

Information

Published In

Journal of Computational Biology
Volume 19, Issue Number 5, May 2012
Pages: 550–562
PubMed: 22468677

History

Published online: 7 May 2012
Published in print: May 2012
Published ahead of print: 2 April 2012

Authors

Affiliations

Sarah Behrens
Westfälische Wilhelms-Universität, Institute for Evolution and Biodiversity, Münster, Germany.
Cyril Nicaud
LIGM, CNRS-UMR 8049, Université Paris-Est, Marne-la-Vallée, France.
Pierre Nicodème
LIPN—Team CALIN, CNRS-UMR 7030, Institut Galilée, Université Paris-Nord, Villetaneuse, France.

Notes

Address correspondence to:
Dr. Pierre Nicodème
LIPN, Team CALIN
CNRS-UMR 7030
Université Paris-Nord, Institut Galilée
99, Avenue Jean-Baptiste Clément
93430 Villetaneuse, France

Disclosure Statement

No competing financial interests exist.
