Abstract

The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

Published In

Journal of Computational Biology
Volume 19, Issue Number 5, May 2012
Pages: 455-477
PubMed: 22506599

History

Published online: 7 May 2012
Published in print: May 2012
Published ahead of print: 16 April 2012

Authors

Affiliations

Anton Bankevich
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
These authors contributed equally to this work.
Sergey Nurk
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
These authors contributed equally to this work.
Dmitry Antipov
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Alexey A. Gurevich
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Mikhail Dvorkin
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Alexander S. Kulikov
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Steklov Institute of Mathematics, St. Petersburg, Russia.
Valery M. Lesin
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Sergey I. Nikolenko
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Steklov Institute of Mathematics, St. Petersburg, Russia.
Son Pham
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California.
Andrey D. Prjibelski
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Alexey V. Pyshkin
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Alexander V. Sirotkin
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Nikolay Vyahhi
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Glenn Tesler
Department of Mathematics, University of California, San Diego, La Jolla, California.
Max A. Alekseyev
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina.
Pavel A. Pevzner
Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California.

Notes

Address correspondence to:
Dr. Max A. Alekseyev
Department of Computer Science and Engineering
University of South Carolina
301 Main St.
Columbia, SC 29208
E-mail: [email protected]

Disclosure Statement

No competing financial interests exist.
