A. Borusan

26.04.2017, 16 Uhr c.t. DFKI Projektbüro Berlin, Room: Weizenbaum, Alt-Moabit 91c, 10559 Berlin: "Distributed frequent sequence mining with declarative subsequence constraints" (Alexander Renz-Wieland, Universität Mannheim)

Frequent sequence mining extracts frequently occurring patterns from sequential data. Some algorithms allow users to specify constraints to control which sequences are of interest. Each algorithm usually supports a particular subset of available constraints. Recently, based on regular expressions, a declarative approach to specifying many of these constraints in a unified way was proposed.
Scalable algorithms are essential to mine large datasets efficiently. Such algorithms exist for particular subsets of constraints. However, no scalable algorithm with support for many constraints has been put forward yet.
In this talk, I present a distributed two-stage algorithm based on item-based partitioning. It processes input sequences in parallel and constructs partitions that can mine for frequent sequences in parallel. We use nondeterministic finite automata as an efficient representation for the intermediary sequences we send to the partitions.
The proposed algorithm outperforms naive approaches for declarative constraints, is competitive to a state-of-the-art algorithm for traditional constraints, offers linear scalability, and makes it possible to mine datasets that cannot be mined efficiently using sequential algorithms.