Machine Learning Approaches for Predicting Protein Complex Similarity
Publication: Journal of Computational Biology
Volume 24, Issue Number 1
Abstract
Discriminating native-like structures from false positives with high accuracy is one of the biggest challenges in protein–protein docking. While it is generally agreed that a relationship exists between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure, the precise nature of this relationship is not known. Existing protein–protein docking methods typically formulate this relationship as a weighted sum of selected terms and calibrate the weights on a training set in order to evaluate and rank candidate complexes. Despite improvements in the predictive power of recent docking methods, even state-of-the-art methods produce a large number of false positives, which often leads to failure in predicting the correct binding of many complexes. With the aid of machine learning methods, we tested several approaches that not only rank candidate structures relative to each other but also predict how similar each candidate is to the native conformation. We trained a two-layer neural network, a multilayer neural network, and a network of Restricted Boltzmann Machines on extensive data sets of unbound complexes generated by RosettaDock and PyDock. We validated these methods with a set of refinement candidate structures. We were able to predict the root-mean-square deviations (RMSDs) of protein complexes with a very small error margin, often less than 1.5 Å, when trained with structures that have RMSD values of up to 7 Å. In our most recent experiments with protein samples having RMSD values of up to 27 Å, the average prediction error was still relatively small, attesting to the potential of our approach for predicting the correct binding of protein–protein complexes.
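The abstract contrasts a fixed weighted sum of interaction terms with a learned model that regresses each candidate's RMSD to the native structure. The Python sketch below illustrates that contrast under stated assumptions: each candidate is summarized by a small feature vector of interaction terms, and a network with one hidden layer of tanh units (one common reading of "two-layer") is trained by backpropagation on squared error. The names `rmsd`, `weighted_sum_score`, `TwoLayerRegressor`, and `demo`, the three-feature input, the layer size, the learning rate, and the synthetic targets are all hypothetical. They are not the authors' features, architecture, or data, and the multilayer and Restricted Boltzmann Machine variants are not shown.

```python
import math
import random


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates. Assumes both sets are already in the same frame: no
    superposition or atom matching is done here."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))


def weighted_sum_score(terms, weights):
    """Classical scoring: a fixed linear combination of interaction terms
    (e.g. van der Waals, electrostatics, desolvation)."""
    return sum(w * t for w, t in zip(weights, terms))


class TwoLayerRegressor:
    """Hypothetical RMSD regressor: tanh hidden layer plus a linear output,
    trained with per-sample gradient descent on 0.5 * (y - target)^2."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr=0.01):
        h = self._hidden(x)
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        err = y - target  # dLoss/dy
        for j, hj in enumerate(h):
            # Gradient at hidden unit j's pre-activation, computed with the
            # output weight before it is updated (tanh' = 1 - tanh^2).
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
            self.b1[j] -= lr * grad_pre
        self.b2 -= lr * err
        return 0.5 * err * err


def demo(epochs=200, lr=0.01):
    """Fit the regressor to SYNTHETIC data (not real docking decoys) and
    return the mean squared error before and after training."""
    rng = random.Random(1)
    data = []
    for _ in range(50):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # 3 toy "interaction terms"
        y = 4.0 + 2.0 * math.tanh(x[0] - x[1]) + x[2] ** 2  # arbitrary toy "RMSD"
        data.append((x, y))
    model = TwoLayerRegressor(n_in=3, n_hidden=8, seed=0)

    def mse():
        return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)

    before = mse()
    for _ in range(epochs):
        for x, y in data:
            model.train_step(x, y, lr)
    return before, mse()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the mean squared error on the toy set before and after training; the drop only shows that the regressor can fit a smooth synthetic function and says nothing about accuracy on real docking candidates. With real data, the RMSD targets would be computed against the known native complex, typically after a superposition step that `rmsd` here deliberately omits.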
Published In
Journal of Computational Biology
Volume 24 • Issue Number 1 • January 2017
Pages: 40–51
PubMed: 27748625
Copyright
Copyright 2017, Mary Ann Liebert, Inc.
History
Published in print: January 2017
Published online: 1 January 2017
Published ahead of print: 17 October 2016
Author Disclosure Statement
No competing financial interests exist.