Research Article
No access
Published Online: 11 October 2012

A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees

Publication: Journal of Computational Biology
Volume 19, Issue Number 10

Abstract

Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called clique-based algorithm has been proposed, but it is not fast for large trees.
This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree structured data such as glycan structures. For large tree-structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved more than 100 times speed-up.

Get full access to this article

View all available purchase options and get full access to this article.

References

Akutsu T.Fukagawa D.Takasu A. et al.2011a. Exact algorithms for computing tree edit distance between unordered treesTheor Comput Sci.421352-364. Akutsu, T., Fukagawa, D., Takasu, A., et al. 2011a. Exact algorithms for computing tree edit distance between unordered trees. Theor Comput Sci. 421, 352–364.
Akutsu T.Mori T.Tamura T. et al.2011b. An improved clique-based method for computing edit distance between unordered trees and its application to comparison of glycan structures2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS)536-540. Akutsu, T., Mori, T., Tamura, T., et al. 2011b. An improved clique-based method for computing edit distance between unordered trees and its application to comparison of glycan structures. 2011 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), 536–540.
Aoki K.F.Yamaguchi A.Ueda N. et al.2004. KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chainsNucleic Acids Res.32267-272. Aoki, K.F., Yamaguchi, A., Ueda, N., et al. 2004. KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Res. 32, 267–272.
Bille P.2005. A survey on tree edit distance and related problemsTheor Comput Sci.337217-239. Bille, P. 2005. A survey on tree edit distance and related problems, Theor Comput Sci. 337, 217–239.
Demaine E.D.Mozes S.Rossman et al.2009. An optimal decomposition algorithm for tree edit distanceACM Transactions on Algorithms6Article 2. Demaine, E.D., Mozes, S., Rossman, et al. 2009. An optimal decomposition algorithm for tree edit distance, ACM Transactions on Algorithms 6, Article 2.
Fukagawa D.Tamura T.Takasu A. et al.2011. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structuresBMC BioinformaticsSuppl for APBC 201112S14. Fukagawa, D., Tamura, T., Takasu, A., et al. 2011. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures, BMC Bioinformatics (Suppl. for APBC 2011), 12, S14.
Horesh Y.Mehr R.Unger R.2006. Designing an A* algorithm for calculating edit distance between rooted-unordered treesJ Comput Biol.131165-1176. Horesh, Y., Mehr, R., and Unger, R. 2006. Designing an A* algorithm for calculating edit distance between rooted-unordered trees. J Comput Biol. 13, 1165–1176.
Jiang T.Lin G.Ma B. et al.2002. A general edit distance between RNA structures, 2002J Comput Biol.9371-388. Jiang, T., Lin, G., Ma, B., et al. 2002. A general edit distance between RNA structures, 2002. J Comput Biol. 9, 371–388.
Kanehisa M.Goto S.Furumichi F. et al.2010. KEGG for representation and analysis of molecular networksNucleic Acids Res.38D355-D360. Kanehisa, M., Goto, S., Furumichi, F., et al. 2010. KEGG for representation and analysis of molecular networks. Nucleic Acids Res. 38, D355–D360.
Matoušek J.Nešetřil J.1998Invitation to Discrete MathematicsOxford University PressNew York. Matoušek, J., and Nešetřil, J. 1998. Invitation to Discrete Mathematics, Oxford University Press, New York.
Nakamura T.Tomita E.2005. Efficient algorithms for finding a maximum clique with maximum vertex weight. Technical Report UEC-TR-CAS3-2005 (in Japanese)the University of Electro-CommunicationsTokyo. Nakamura, T., and Tomita, E. 2005. Efficient algorithms for finding a maximum clique with maximum vertex weight. Technical Report UEC-TR-CAS3-2005 (in Japanese), the University of Electro-Communications, Tokyo.
Ogawa H.1986. Labeled point pattern matching by Delaunay triangulation and maximal cliquesPattern Recognition35-40. Ogawa, H. 1986. Labeled point pattern matching by Delaunay triangulation and maximal cliques. Pattern Recognition. 35–40.
Pelillo M.Siddiqi K.Zucker S. W.1999. Matching hierarchical structures using association graphsIEEE Transactions on Pattern Analysis and Machine Intelligence211105-1119. Pelillo, M., Siddiqi, K., and Zucker, S. W. 1999. Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 21, 1105–1119.
Tai K.-C.1979. The tree-to-tree correction problemJournal of ACM26422-433. Tai, K.-C. 1979. The tree-to-tree correction problem. Journal of ACM. 26, 422–433.
Tomita E.Seki T.2003. An efficient branch-and-bound algorithm for finding a maximum cliqueProc. 4th International Conference on Discrete Mathematics and Theoretical Computer Science, Lecture Notes in Computer Science2731278-289. Tomita, E., and Seki, T. 2003. An efficient branch-and-bound algorithm for finding a maximum clique. Proc. 4th International Conference on Discrete Mathematics and Theoretical Computer Science, Lecture Notes in Computer Science, Vol. 2731, 278–289.
Tomita E.Sutani Y.Higashi T. et al.2010. A simple and faster branch-and-bound algorithm for finding a maximum cliqueProc. 4th International Workshop on Algorithms and Computation, Lecture Notes in Computer Science5942191-203. Tomita, E., Sutani, Y., Higashi, T., et al. 2010. A simple and faster branch-and-bound algorithm for finding a maximum clique. Proc. 4th International Workshop on Algorithms and Computation, Lecture Notes in Computer Science, Vol. 5942, 191–203.
Tomita E.Akutsu T.Matsunaga T.2011. Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics625-640Laskovski A.B.Biomedical Engineering, Trends in Electronics, Communications and Softwarewww.intechopen.com/articles/show/title/efficient-algorithms-for-finding-maximum-and-maximal-cliques-effective-tools-for-bioinformaticsSept.72012. Tomita, E., Akutsu. T., and Matsunaga, T. 2011. Efficient algorithms for finding maximum and maximal cliques: Effective tools for bioinformatics, 625–640. In Laskovski, A.B. ed., Biomedical Engineering, Trends in Electronics, Communications and Software. Available at: www.intechopen.com/articles/show/title/efficient-algorithms-for-finding-maximum-and-maximal-cliques-effective-tools-for-bioinformatics. Accessed Sept. 7, 2012.
Torsello A.Hancock E. R.2003. Computing approximate tree edit distance using relaxation labelingPattern Recognition Letters241089-1097. Torsello, A., and Hancock, E. R. 2003. Computing approximate tree edit distance using relaxation labeling. Pattern Recognition Letters. 24, 1089–1097.
Yu K.-C.Ritman E.L.Higgns E.2007. System for the analysis and visualization of large 3D anatomical treesComputers in Biology and Medicine371802-1830. Yu, K.-C., Ritman, E.L., and Higgns, E. 2007. System for the analysis and visualization of large 3D anatomical trees. Computers in Biology and Medicine. 37, 1802–1830.
Zaki M.J.2005. Efficiently mining frequent trees in a forest: algorithms and applicationsIEEE Transactions on Knowledge and Data Engineering171021-1035. Zaki, M.J. 2005. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering. 17, 1021–1035.
Zhang K.Statman R.Shasha D.1992. On the editing distance between unordered labeled treesInformation Processing Letters42133-139. Zhang, K., Statman, R., and Shasha, D. 1992. On the editing distance between unordered labeled trees. Information Processing Letters. 42, 133–139.

Information & Authors

Information

Published In

cover image Journal of Computational Biology
Journal of Computational Biology
Volume 19Issue Number 10October 2012
Pages: 1089 - 1104
PubMed: 23057820

History

Published online: 11 October 2012
Published in print: October 2012

Permissions

Request permissions for this article.

Topics

Authors

Affiliations

Tomoya Mori
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.
Takeyuki Tamura
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.
Daiji Fukagawa
Faculty of Culture and Information Science, Doshisha University, Kyoto, Japan.
Atsuhiro Takasu
National Institute of Informatics, Tokyo, Japan.
Etsuji Tomita
University of Electro-Communications, Tokyo, Japan.
Tatsuya Akutsu
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.

Notes

Address correspondence to:Tatsuya AkutsuBioinformatics CenterInstitute for Chemical ResearchKyoto UniversityGokasho, UjiKyoto, 611-0011Japan
E-mail: [email protected]

Disclosure Statement

No competing financial interests exist.

Metrics & Citations

Metrics

Citations

Export citation

Select the format you want to export the citations of this publication.

View Options

Get Access

Access content

To read the fulltext, please use one of the options below to sign in or purchase access.

Society Access

If you are a member of a society that has access to this content please log in via your society website and then return to this publication.

Restore your content access

Enter your email address to restore your content access:

Note: This functionality works only for purchases done as a guest. If you already have an account, log in to access the content to which you are entitled.

View options

PDF/EPUB

View PDF/ePub

Full Text

View Full Text

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share on social media

Back to Top