Research Article
No access
Published Online: 19 October 2020

Finding Path Motifs in Large Temporal Graphs Using Algebraic Fingerprints

Publication: Big Data
Volume 8, Issue Number 5

Abstract

We study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example, in recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with 5 colors and up to 100 million edges for a multiset query with 10 colors, despite the problems being non-deterministic polynomial time-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For instance, in a real-world graph dataset with >6 million edges and a multiset query with 10 colors, we can extract an optimal solution in <8 minutes on a Haswell desktop with four cores.

Get full access to this article

View all available purchase options and get full access to this article.

References

1. Lacroix V, Fernandes CG, Sagot M-F. Motif search in graphs: Application to metabolic networks. IEEE Trans Comput Biol Bioinform 2006;3:360–368.
2. Coletto M, Garimella K, Gionis A, Lucchese C. A motif-based approach for identifying controversy. In: Proceedings of the Eleventh International AAAI Conference on Web and Social Media, 2017.
3. Honey CJ, Kötter R, Breakspear M, Sporns O. Network structure of cerebral cortex shapes functional connectivity on multiple time scales. PNAS 2007;104:10240–10245.
4. Yang J, McAuley J, Leskovec J. Community detection in networks with node attributes. In: 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, December 7–10, 2013. pp. 1151–1156.
5. Liu L, Tang J, Han J, et al. Mining topic-level influence in heterogeneous networks. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, October 2010. pp. 199–208.
6. Holme P, Saramäki J. Temporal networks. Phys Rep 2012;519:97–125.
7. DeChoudhury M, Feldman M, Amer-Yahia S, et al. Automatic construction of travel itineraries using social breadcrumbs. In: Proceedings of the 21st ACM conference on Hypertext and Hypermedia, June 2010. pp. 35–44.
8. Björklund A, Husfeldt T, Kaski P, Koivisto M. Narrow sieves for parameterized paths and packings. JCSS 2017;87:119–139.
9. Björklund A, Kaski P, Kowalik Ł. Determinant sums for undirected Hamiltonicity. SIAM J Comput 2014;43:280–299.
10. Björklund A, Kaski P, Kowalik Ł. Constrained multilinear detection and generalized graph motifs. Algorithmica 2016;74:947–967.
11. Björklund A, Kaski P, Kowalik Ł, Lauri J. Engineering motif search for large graphs. In: Proceedings of the Meeting on Algorithm Engineering and Experiments, January 2015. pp. 104–118.
12. Kaski P, Lauri J, Thejaswi S. Engineering motif search for large motifs. In: D'Angelo G. (Ed.): 17th International Symposium on Experimental Algorithms. Dagstuhl, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. pp. 1–19.
13. Cygan M, Fomin FV, Kowalik Ł, et al. Parameterized algorithms. Berlin, Heidelberg: Springer, 2015.
14. Thejaswi S, Gionis A. 2019. Available online at https://github.com/suhastheju/temporal-patterns (last accessed July 31, 2020).
15. Thejaswi S, Gionis A, Lauri J. 2020. Available online at https://github.com/suhastheju/temporal-patterns-mk2 (last accessed July 31, 2020).
16. Cicalese F, Gagie T, Giaquinta E, et al. Indexes for jumbled pattern matching in strings, trees and graphs. In: Kurland O, Lewenstein M, Porat E (Eds.): String Processing and Information Retrieval. Cham: Springer, 2013. pp. 56–63.
17. Gagie T, Hermelin D, Landau GM, et al. Binary jumbled pattern matching on trees and tree-like structures. Algorithmica 2015;73:571–588.
18. Giaquinta E, Grabowski S. New algorithms for binary jumbled pattern matching. IPL 2013;113:538–542.
19. Kowalik Ł, Lauri J. On finding rainbow and colorful paths. Theor Comput Sci. 2016;628:110–114.
20. Benson A, Gleich D, Leskovec J. Higher-order organization of complex networks. Science 2016;353:163–166.
21. Coletto M, Garimella K, Gionis A, Lucchese C. Automatic controversy detection in social media: A content-independent motif-based approach. Online Soc Netw Med 2017;3–4:22–31.
22. Milo R, Shen-Orr S, Itzkovitz S, et al. Network motifs: Simple building blocks of complex networks. Science 2002;298:824–827.
23. Alon N, Dao P, Hajirasouliha I, et al. Biomolecular network motif counting and discovery by color coding. Bioinformatics 2008;24:241–249.
24. Bressan M, Leucci S, Panconesi A. Motivo: Fast motif counting via succinct color coding and adaptive sampling. PVLDB 2019;12:1651–1663.
25. Koutis I. Faster algebraic algorithms for path and packing problems. In: Proceedings of the 35th international colloquium on Automata, Languages and Programming - Volume Part I, July 2008. pp. 575–586.
26. Koutis I. The power of group algebras for constrained multilinear monomial detection. In: Dagstuhl meeting 10441, 2010.
27. Koutis I. Constrained multilinear detection for faster functional motif discovery. Inform Process Lett. 2012;112:889–892.
28. Williams R. Finding paths of length k in O*(2k) time. Inform Process Lett. 2009;109:315–318.
29. Koutis I, Williams R. Limits and applications of group algebras for parameterized problems. In: Albers S, Marchetti-Spaccamela A, Matias Y, et al. (eds) Automata, Languages and Programming, Berlin, Germany: Springer, 2009. pp. 653–664.
30. Koutis I, Williams R. Algebraic fingerprints for faster algorithms. Comm ACM 2016;59:98–105.
31. Dell H, Lapinskas J, Meeks K. Approximately counting and sampling small witnesses using a colourful decision oracle. In: Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, January 2020. pp. 2201–2211.
32. Dechter R, Meiri I, Pearl J. Temporal constraint networks. Artif Intell 1991;49:61–95.
33. Holme P. Modern temporal network theory: A colloquium. Eur Phys J B 2015;88:234.
34. Kostakos V. Temporal graphs. Phys A Stat Mech Appl 2009;388:1007–1023.
35. Latapy M, Viard T, Magnien C. Stream graphs and link streams for the modeling of interactions over time. Soc Netw Analysis Min 2018;8.
36. Liu P, Benson A, Charikar M. Sampling methods for counting temporal motifs. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, New York, USA, 2019, pp. 294–302.
37. Paranjape A, Benson A, Leskovec J. Motifs in temporal networks. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 2017. pp. 601–610.
38. Wackersreuther B, Wackersreuther P, Oswald A, et al. Frequent subgraph discovery in dynamic networks. In: Proceedings of the Eighth Workshop on Mining and Learning with Graphs, July 2010. pp. 155–162.
39. George B, Kim S, Shekhar S. Spatio-temporal network databases and routing algorithms: A summary of results. In: Papadias D, Zhang D, Kollios G. (Eds.): International Symposium on Spatial and Temporal Databases. Berlin, Heidelberg: Springer, 2007. pp. 460–477.
40. Wu H, Cheng J, Huang S, et al. Path problems in temporal graphs. Proc VLDB Endow (2014;7:721–732.
41. Wu H, Cheng J, Ke Y, et al. Efficient algorithms for temporal path computation. TKDE 2016;28:2927–2942.
42. Casteigts A, Himmel A, Molter H, Zschoche P. The computational complexity of finding temporal paths under waiting time constraints. CoRR, abs/1909.06437, 2019.
43. Gupta M, Aggarwal CC, Han J. Finding top-k shortest path distance changes in an evolutionary network. In: Advances in Spatial and Temporal Databases, Berlin, Germany: Springer, 2011. pp. 130–148.
44. Kovanen L, Karsai M, Kaski K, et al. Temporal motifs in time-dependent networks. J Stat Mech Theory Exp 2011;2011:P11005.
45. Aslay C, Nasir A, De Francisci Morales G, Gionis A. Mining frequent patterns in evolving graphs. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018. pp. 923–932.
46. Vansteenwegen P, Souffriau W, Oudheusden DV. The orienteering problem: A survey. EJOR 2011;209:1–10.
47. Garey MR, Johnson DS. Computers and intractability, vol. 29. W.H. Freeman and Co., New York, USA, 2002.
48. Gionis A, Lappas T, Pelechrinis K, Terzi E. Customized tour recommendations in urban areas. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, February 2014. pp. 313–322.
49. Thejaswi S, Gionis A. Pattern detection in large temporal graphs using algebraic fingerprints. In: Proceedings of the 2020 SIAM International Conference on Data Mining, Cincinnati, USA, 2020.
50. Chen L, Li X, Shi Y. The complexity of determining the rainbow vertex-connection of a graph. Theor Comput Sci 2011;412:4531–4535.
51. Uchizawa K, Aoki T, Ito T, et al. On the rainbow connectivity of graphs: Complexity and fpt algorithms. Algorithmica 2013;67:161–179.
52. Schwartz JT, Fast probabilistic algorithms for verification of polynomial identities. J ACM 1980;27:701–717.
53. Zippel R. Probabilistic algorithms for sparse polynomials. In: Proceedings of the International Symposium on Symbolic and Algebraic Computation, vol. 72 of LNCS, 1979. pp. 216–226.
54. Lin S-J, Al-Naffouri TY, Han YS, Chung W-H. Novel polynomial basis with fast Fourier transform and its application to Reed–Solomon erasure codes. IEEE Trans Inform Theory 2016;62:6284–6299.
55. Bell N, Garland M. Efficient sparse matrix-vector multiplication on CUDA. California: NVIDIA Corp., 2008.
56. Björklund A, Kaski P, Kowalik Ł. Fast Witness Extraction Using a Decision Oracle. In ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Berlin, Heidelberg: Springer, 2014.
57. Bollobás B. Random graphs, 2nd ed. Cambridge University Press, Cambridge, UK, 2001.
58. Kunegis J. KONECT: The Koblenz Network Collection. 2013. pp. 1343–1350. Available online at http://konect.uni-koblenz.de/networks (last accessed July 31, 2020).
59. Leskovec J, Krevl A. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. June 2014. Available online at http://snap.stanford.edu/data (last accessed July 31, 2020).
Cite this article as: Thejaswi S, Gionis A, Lauri J (2020) Finding path motifs in large temporal graphs using algebraic fingerprints. Big Data 8:5, 335–362, DOI: 10.1089/big.2020.0078.

Information & Authors

Information

Published In

cover image Big Data
Big Data
Volume 8Issue Number 5October 2020
Pages: 335 - 362
PubMed: 33017173

History

Published online: 19 October 2020
Published ahead of print: 5 October 2020
Published in print: October 2020

Permissions

Request permissions for this article.

Topics

Authors

Affiliations

Suhas Thejaswi* [email protected]
Department of Computer Science, Aalto University, Aalto, Finland.
Aristides Gionis
Department of Computer Science, Aalto University, Aalto, Finland.
Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
Juho Lauri

Notes

An earlier version of this work appeared in the SIAM International Conference on Data Mining (SDM20).
*
Address correspondence to: Suhas Thejaswi, Department of Computer Science, Aalto University, Aalto 00076, Finland, [email protected]

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This research was supported by the Academy of Finland project “Adaptive and Intelligent Data (AIDA)” (317085), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation. The authors acknowledge the use of computational resources funded by the project “Science-IT” at Aalto University, Finland.

Metrics & Citations

Metrics

Citations

Export citation

Select the format you want to export the citations of this publication.

View Options

Get Access

Access content

To read the fulltext, please use one of the options below to sign in or purchase access.

Society Access

If you are a member of a society that has access to this content please log in via your society website and then return to this publication.

Restore your content access

Enter your email address to restore your content access:

Note: This functionality works only for purchases done as a guest. If you already have an account, log in to access the content to which you are entitled.

View options

PDF/EPUB

View PDF/ePub

Full Text

View Full Text

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share on social media

Back to Top