Manifold Learning in Protein Interactomes
Publication: Journal of Computational Biology
Volume 18, Issue Number 1
Abstract
Many studies and applications in the post-genomic era have been devoted to analyzing complex biological systems with computational inference methods. We propose applying manifold learning methods to protein-protein interaction networks (PPINs). Despite their popularity in data-intensive applications, these methods have received limited attention in the context of biological networks. We show that adopting manifold learning for network inference offers both utility and unexplored potential. In particular, the following advantages are highlighted: (a) fusion with diagnostic statistical tools designed to assign significance to protein interactions based on pre-selected topological features; (b) dissection of the interactome into components in order to elucidate its global and local connectivity organization; (c) relevance of embedding the interactome in reduced dimensions for biological validation purposes. We have compared the performance of three well-known techniques (kernel-PCA, RADICAL ICA, and ISOMAP) with respect to their ability to map the interactome onto new coordinate dimensions where important associations among proteins can be detected, and then to back-project it such that the corresponding sub-interactomes are reconstructed. This recovery has been done selectively, using only the information found significant by a robust statistical procedure, and standard biological annotation has then been used to validate the results. We expect that a byproduct of subspace analysis with the proposed techniques is a possible calibration of interactome modularity studies. Supplementary Material is available online at www.libertonlinec.com.
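To make the idea of mapping an interactome onto new coordinates concrete, the following is a minimal sketch of the generic ISOMAP steps on a graph: shortest-path distances, double centering as in classical multidimensional scaling, and a leading eigenvector. It is not the authors' pipeline. The six-protein toy network, the function names, and the restriction to a single output dimension are all assumptions made for illustration, and only the Python standard library is used.

```python
from collections import deque
from math import sqrt


def hop_distances(n, edges):
    """All-pairs shortest-path (hop) distances of a connected, unweighted graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for source in range(n):
        d = [None] * n
        d[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if d[w] is None:
                    d[w] = d[u] + 1
                    queue.append(w)
        if None in d:
            raise ValueError("graph must be connected")
        dist.append(d)
    return dist


def double_center(dist):
    """Gram matrix B = -1/2 * J * D^2 * J used by classical MDS."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def matvec(mat, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]


def leading_eigenpair(mat, iters=200):
    """Power iteration: eigenpair of largest magnitude of a symmetric matrix."""
    vec = [float(i + 1) for i in range(len(mat))]  # deterministic start vector
    for _ in range(iters):
        nxt = matvec(mat, vec)
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space")
        vec = [x / norm for x in nxt]
    value = sum(x * y for x, y in zip(vec, matvec(mat, vec)))  # Rayleigh quotient
    return value, vec


def isomap_1d(n, edges):
    """One-dimensional ISOMAP-style embedding of an unweighted graph."""
    gram = double_center(hop_distances(n, edges))
    value, vec = leading_eigenpair(gram)
    if value <= 0:  # assumption: the dominant eigenvalue is positive
        raise ValueError("dominant eigenvalue is not positive")
    return value, [sqrt(value) * x for x in vec]


# Hypothetical toy network: two protein triangles joined by one bridging edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
eigenvalue, coords = isomap_1d(6, toy_edges)
for protein, x in enumerate(coords):
    print(f"protein {protein}: {x:+.3f}")
```

On this toy network the two triangles land on opposite sides of zero along the single embedding coordinate, which is the kind of module separation a low-dimensional embedding is meant to expose. A real analysis would keep several coordinates and would more likely rely on an established numerical library than on hand-written power iteration.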
Information & Authors
Information
Published In
Journal of Computational Biology
Volume 18 • Issue Number 1 • January 2011
Pages: 81 - 96
PubMed: 20666618
Copyright
Copyright 2011, Mary Ann Liebert, Inc.
History
Published online: 6 January 2011
Published in print: January 2011
Published ahead of print: 28 July 2010
Disclosure Statement
No competing financial interests exist.