Research Article
Published Online: 8 April 2010

Aligning Protein Sequences with Predicted Secondary Structure

Publication: Journal of Computational Biology
Volume 17, Issue Number 3

Abstract

Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequences annotated with predicted secondary structure: (1) more accurate models for scoring alignments, (2) efficient algorithms for optimal alignment under these models, and (3) improved learning criteria for setting model parameters through inverse alignment, as well as (4) in-depth experiments evaluating model variants on benchmark alignments. More specifically, the new models use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps. All models have efficient algorithms for optimal pairwise alignment that run in near-quadratic time. These models have many parameters, which are rigorously learned using inverse alignment under a new criterion that carefully balances score error and recovery error. We then evaluate these models by studying how accurately an optimal alignment under the model recovers benchmark reference alignments that are based on the known three-dimensional structures of the proteins. The experiments show that these new models provide a significant boost in accuracy over the standard model for distant sequences. The improvement for pairwise alignment is as much as 15% for sequences with less than 25% identity, while for multiple alignment the improvement is more than 20% for difficult benchmarks whose accuracy under standard tools is at most 40%.
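As an illustration of the general idea only, the following minimal sketch shows how predicted secondary structure and its confidence could modify both substitution and gap scores inside a standard quadratic-time dynamic program. It is not the authors' model: the scoring constants, the `Position` tuple layout, and the functions `sub_score`, `gap_cost`, and `align_score` are all hypothetical, and linear gap costs are assumed for brevity.

```python
from typing import List, Tuple

# Hypothetical input: (residue, predicted state 'H'/'E'/'C', confidence in [0, 1]).
Position = Tuple[str, str, float]

# Placeholder parameters; a real model would learn these (e.g., by inverse alignment).
MATCH, MISMATCH = 2.0, -1.0   # toy substitution scores
SS_BONUS = 1.0                # assumed reward when predicted states agree
GAP = 2.0                     # base cost of a gap column
GAP_SS_EXTRA = 1.5            # assumed extra cost for gapping a helix/strand residue


def sub_score(a: Position, b: Position) -> float:
    """Substitution score, raised when both predictions agree with high confidence."""
    score = MATCH if a[0] == b[0] else MISMATCH
    if a[1] == b[1]:
        score += SS_BONUS * a[2] * b[2]
    return score


def gap_cost(p: Position) -> float:
    """Gap cost, raised when the residue is confidently predicted as helix or strand."""
    structured = p[1] in ("H", "E")
    return GAP + (GAP_SS_EXTRA * p[2] if structured else 0.0)


def align_score(s: List[Position], t: List[Position]) -> float:
    """Optimal global alignment score via an O(len(s) * len(t)) dynamic program."""
    m, n = len(s), len(t)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] - gap_cost(s[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] - gap_cost(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),  # align two residues
                dp[i - 1][j] - gap_cost(s[i - 1]),                 # gap in t
                dp[i][j - 1] - gap_cost(t[j - 1]),                 # gap in s
            )
    return dp[m][n]
```

In this toy version, two identical residues score higher when both are confidently predicted to lie in the same structural state, and gaps are discouraged inside predicted helices and strands. Affine or more refined gap models would need additional dynamic programming tables.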

Published In

Journal of Computational Biology
Volume 17, Issue Number 3, March 2010
Pages: 561 - 580
PubMed: 20377464

History

Published online: 8 April 2010
Published in print: March 2010

Authors

Affiliations

John Kececioglu
Department of Computer Science, University of Arizona, Tucson, Arizona 85721.
Eagu Kim
Work done while at Department of Computer Science, University of Arizona. Present affiliation: Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706.
Travis Wheeler
Work done while at Department of Computer Science, University of Arizona. Present affiliation: Janelia Farm Research Campus, Ashburn, Virginia 20147.

Notes

Address correspondence to: Prof. John Kececioglu, Department of Computer Science, University of Arizona, Tucson, AZ 85721. E-mail: [email protected]
