A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees
Publication: Journal of Computational Biology
Volume 19, Issue Number 10
Abstract
Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called clique-based algorithm has been proposed, but it is not fast for large trees.
This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree structured data such as glycan structures. For large tree-structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved more than 100 times speed-up.
Get full access to this article
View all available purchase options and get full access to this article.
References
Akutsu T.Fukagawa D.Takasu A. et al.2011a. Exact algorithms for computing tree edit distance between unordered treesTheor Comput Sci.421352-364. Akutsu, T., Fukagawa, D., Takasu, A., et al. 2011a. Exact algorithms for computing tree edit distance between unordered trees. Theor Comput Sci. 421, 352–364.
Akutsu T.Mori T.Tamura T. et al.2011b. An improved clique-based method for computing edit distance between unordered trees and its application to comparison of glycan structures2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS)536-540. Akutsu, T., Mori, T., Tamura, T., et al. 2011b. An improved clique-based method for computing edit distance between unordered trees and its application to comparison of glycan structures. 2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), 536–540.
Aoki K.F.Yamaguchi A.Ueda N. et al.2004. KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chainsNucleic Acids Res.32267-272. Aoki, K.F., Yamaguchi, A., Ueda, N., et al. 2004. KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Res. 32, 267–272.
Bille P.2005. A survey on tree edit distance and related problemsTheor Comput Sci.337217-239. Bille, P. 2005. A survey on tree edit distance and related problems, Theor Comput Sci. 337, 217–239.
Demaine E.D.Mozes S.Rossman et al.2009. An optimal decomposition algorithm for tree edit distanceACM Transactions on Algorithms6Article 2. Demaine, E.D., Mozes, S., Rossman, et al. 2009. An optimal decomposition algorithm for tree edit distance, ACM Transactions on Algorithms 6, Article 2.
Fukagawa D.Tamura T.Takasu A. et al.2011. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structuresBMC BioinformaticsSuppl for APBC 201112S14. Fukagawa, D., Tamura, T., Takasu, A., et al. 2011. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures, BMC Bioinformatics (Suppl. for APBC 2011), 12, S14.
Horesh Y.Mehr R.Unger R.2006. Designing an A* algorithm for calculating edit distance between rooted-unordered treesJ Comput Biol.131165-1176. Horesh, Y., Mehr, R., and Unger, R. 2006. Designing an A* algorithm for calculating edit distance between rooted-unordered trees. J Comput Biol. 13, 1165–1176.
Jiang T.Lin G.Ma B. et al.2002. A general edit distance between RNA structures, 2002J Comput Biol.9371-388. Jiang, T., Lin, G., Ma, B., et al. 2002. A general edit distance between RNA structures, 2002. J Comput Biol. 9, 371–388.
Kanehisa M.Goto S.Furumichi F. et al.2010. KEGG for representation and analysis of molecular networksNucleic Acids Res.38D355-D360. Kanehisa, M., Goto, S., Furumichi, F., et al. 2010. KEGG for representation and analysis of molecular networks. Nucleic Acids Res. 38, D355–D360.
Matoušek J.Nešetřil J.1998Invitation to Discrete MathematicsOxford University PressNew York. Matoušek, J., and Nešetřil, J. 1998. Invitation to Discrete Mathematics, Oxford University Press, New York.
Nakamura T.Tomita E.2005. Efficient algorithms for finding a maximum clique with maximum vertex weight. Technical Report UEC-TR-CAS3-2005 (in Japanese)the University of Electro-CommunicationsTokyo. Nakamura, T., and Tomita, E. 2005. Efficient algorithms for finding a maximum clique with maximum vertex weight. Technical Report UEC-TR-CAS3-2005 (in Japanese), the University of Electro-Communications, Tokyo.
Ogawa H.1986. Labeled point pattern matching by Delaunay triangulation and maximal cliquesPattern Recognition35-40. Ogawa, H. 1986. Labeled point pattern matching by Delaunay triangulation and maximal cliques. Pattern Recognition. 35–40.
Pelillo M.Siddiqi K.Zucker S. W.1999. Matching hierarchical structures using association graphsIEEE Transactions on Pattern Analysis and Machine Intelligence211105-1119. Pelillo, M., Siddiqi, K., and Zucker, S. W. 1999. Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 21, 1105–1119.
Tai K.-C.1979. The tree-to-tree correction problemJournal of ACM26422-433. Tai, K.-C. 1979. The tree-to-tree correction problem. Journal of ACM. 26, 422–433.
Tomita E.Seki T.2003. An efficient branch-and-bound algorithm for finding a maximum cliqueProc. 4th International Conference on Discrete Mathematics and Theoretical Computer Science, Lecture Notes in Computer Science2731278-289. Tomita, E., and Seki, T. 2003. An efficient branch-and-bound algorithm for finding a maximum clique. Proc. 4th International Conference on Discrete Mathematics and Theoretical Computer Science, Lecture Notes in Computer Science, Vol. 2731, 278–289.
Tomita E.Sutani Y.Higashi T. et al.2010. A simple and faster branch-and-bound algorithm for finding a maximum cliqueProc. 4th International Workshop on Algorithms and Computation, Lecture Notes in Computer Science5942191-203. Tomita, E., Sutani, Y., Higashi, T., et al. 2010. A simple and faster branch-and-bound algorithm for finding a maximum clique. Proc. 4th International Workshop on Algorithms and Computation, Lecture Notes in Computer Science, Vol. 5942, 191–203.
Tomita E.Akutsu T.Matsunaga T.2011. Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics625-640Laskovski A.B.Biomedical Engineering, Trends in Electronics, Communications and Softwarewww.intechopen.com/articles/show/title/efficient-algorithms-for-finding-maximum-and-maximal-cliques-effective-tools-for-bioinformaticsSept.72012. Tomita, E., Akutsu. T., and Matsunaga, T. 2011. Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics, 625–640. In Laskovski, A.B. ed., Biomedical Engineering, Trends in Electronics, Communications and Software. Available at: www.intechopen.com/articles/show/title/efficient-algorithms-for-finding-maximum-and-maximal-cliques-effective-tools-for-bioinformatics. Accessed Sept. 7, 2012.
Torsello A.Hancock E. R.2003. Computing approximate tree edit distance using relaxation labelingPattern Recognition Letters241089-1097. Torsello, A., and Hancock, E. R. 2003. Computing approximate tree edit distance using relaxation labeling. Pattern Recognition Letters. 24, 1089–1097.
Yu K.-C.Ritman E.L.Higgns E.2007. System for the analysis and visualization of large 3D anatomical treesComputers in Biology and Medicine371802-1830. Yu, K.-C., Ritman, E.L., and Higgns, E. 2007. System for the analysis and visualization of large 3D anatomical trees. Computers in Biology and Medicine. 37, 1802–1830.
Zaki M.J.2005. Efficiently mining frequent trees in a forest: algorithms and applicationsIEEE Transactions on Knowledge and Data Engineering171021-1035. Zaki, M.J. 2005. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering. 17, 1021–1035.
Zhang K.Statman R.Shasha D.1992. On the editing distance between unordered labeled treesInformation Processing Letters42133-139. Zhang, K., Statman, R., and Shasha, D. 1992. On the editing distance between unordered labeled trees. Information Processing Letters. 42, 133–139.
Information & Authors
Information
Published In
Journal of Computational Biology
Volume 19 • Issue Number 10 • October 2012
Pages: 1089 - 1104
PubMed: 23057820
Copyright
Copyright 2012, Mary Ann Liebert, Inc.
History
Published online: 11 October 2012
Published in print: October 2012
Topics
Authors
Disclosure Statement
No competing financial interests exist.
Metrics & Citations
Metrics
Citations
Export Citation
Export citation
Select the format you want to export the citations of this publication.
View Options
Get Access
Access content
To read the fulltext, please use one of the options below to sign in or purchase access.⚠ Society Access
If you are a member of a society that has access to this content please log in via your society website and then return to this publication.