Research Article
Published Online: 4 January 2012

An Overview of Population Genetic Data Simulation

Publication: Journal of Computational Biology
Volume 19, Issue Number 1

Abstract

Simulation studies in population genetics play an important role in helping to better understand the impact of various evolutionary and demographic scenarios on sequence variation and sequence patterns, and they also permit investigators to better assess and design analytical methods in the study of disease-associated genetic factors. To facilitate these studies, it is imperative to develop simulators with the capability to accurately generate complex genomic data under various genetic models. Currently, a number of efficient simulation software packages for large-scale genomic data are available, and new simulation programs with more sophisticated capabilities and features continue to emerge. In this article, we review the three basic simulation frameworks—coalescent, forward, and resampling—and some of the existing simulators that fall under these frameworks, comparing them with respect to their evolutionary and demographic scenarios, their computational complexity, and their specific applications. Additionally, we address some limitations in current simulation algorithms and discuss future challenges in the development of more powerful simulation tools.
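
As a rough illustration of how the three frameworks named above differ, the following minimal Python sketch contrasts them on toy problems. It is not taken from any of the reviewed simulators. The function names, parameters, and the single-locus neutral setting are assumptions chosen for brevity: the coalescent function works backward in time by drawing exponential waiting times between coalescence events, the forward function advances a neutral Wright-Fisher population one generation at a time, and the resampling function draws haplotypes with replacement from an observed panel.

```python
import random


def coalescent_tmrca(n, rng):
    """Backward in time: draw the time to the most recent common ancestor
    of n sampled lineages under the standard neutral coalescent
    (time measured in coalescent units)."""
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0  # number of lineage pairs that can coalesce
        t += rng.expovariate(rate)  # waiting time until the next coalescence
        k -= 1  # two lineages merge into one
    return t


def wright_fisher_forward(pop_size, p0, generations, rng):
    """Forward in time: neutral drift at one biallelic locus.
    Returns the allele-frequency trajectory, starting frequency included."""
    count = round(p0 * pop_size)
    trajectory = [count / pop_size]
    for _ in range(generations):
        p = count / pop_size
        # Each gene copy in the next generation picks its parent at random.
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        trajectory.append(count / pop_size)
    return trajectory


def resample_haplotypes(panel, n, rng):
    """Resampling: draw n haplotypes with replacement from an observed panel."""
    return [rng.choice(panel) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so runs are repeatable
    print("TMRCA of 10 lineages:", coalescent_tmrca(10, rng))
    print("Final frequency:", wright_fisher_forward(100, 0.5, 50, rng)[-1])
    print("Resampled:", resample_haplotypes(["0101", "1100", "0011"], 5, rng))
```

The sketch also hints at the trade-offs the review compares: the coalescent tracks only the sampled lineages, the forward approach tracks every gene copy in every generation, and resampling needs no evolutionary model but can only reuse variation already present in the panel.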

Information & Authors

Information

Published In

Journal of Computational Biology
Volume 19, Issue Number 1, January 2012
Pages: 42 - 54
PubMed: 22149682

History

Published online: 4 January 2012
Published in print: January 2012
Published ahead of print: 9 December 2011

    Authors

    Affiliations

    Xiguo Yuan*
    School of Computer Science and Technology, Xidian University, Xi'an, P.R. China.
    David J. Miller*
    Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania.
    Junying Zhang
    School of Computer Science and Technology, Xidian University, Xi'an, P.R. China.
    David Herrington
    Department of Internal Medicine, Wake Forest University, Winston Salem, North Carolina.
    Yue Wang
    Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia.

    Notes

    Address correspondence to: Dr. Yue Wang, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203. E-mail: [email protected]

    Disclosure Statement

    No competing financial interests exist.
