Encoding-based Algorithms in TSMD

Grammarviz

Grammarviz [Senin et al. 2018] uses grammar induction methods for motif detection. In practice, the time series is discretized using SAX, and grammar induction techniques, such as Sequitur or RE-PAIR, are applied to the discretized series to identify grammar rules. The most frequent and representative grammar rules are selected, and occurrences of the various motifs are then extracted.

class tsmd.competitors.grammarviz.Grammarviz(n_patterns: int, alphabet_size=5, numerosity='MINDIST', window_size=30, word_size=10, folder_java='./competitors/competitors_tools/grammarviz', file_exchange_location='target/file_exchange')

Grammarviz algorithm for motif discovery.

Parameters
  • n_patterns (int) – Number of patterns.

  • alphabet_size (int, optional (default=5)) – Alphabet size for the discretization of the time series.

  • numerosity (str, optional (default to "MINDIST")) – Numerosity reduction type.

  • window_size (int, optional (default=30)) – Window_size.

  • word_size (int, optional (default=10)) – Word size.

prediction_mask_

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type

np.ndarray of shape (n_patterns, n_samples)

fit(signal: ndarray) None

Fit Grammarviz

Parameters

signal (numpy array of shape (n_samples, )) – The input samples (time series length).

Returns

self – Fitted estimator.

Return type

object

Usage

Grammarviz is written in Java and requires a specific setup. Download it from the following GitHub: https://github.com/GrammarViz2/grammarviz2_src, place it in TSMD/tsmd/competitors/competitors_tools/, and follow the installation instructions in the repository. You can then use Grammarviz like any other method:

from tsmd.competitors.grammarviz import Grammarviz
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


gm=Grammarviz(n_patterns=2)
gm.fit(signal)

labels=transform_label(gm.prediction_mask_)
plot_signal_pattern(signal,labels)

Grammarviz output

Reference

[Senin et al. 2018] Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi, Arnold P Boedi-hardjo, Crystal Chen, and Susan Frankenstein. 2018. Grammarviz 3.0: Interactive discovery of variable-length time series patterns. ACM Transactions on Knowledge Discovery from Data (TKDD)12, 1 (2018), 1–28

MDL-Clust

The MDL-CLust method [Rakthanmanon et al. 2012] aims to perform clustering of subsequences. However, since clustering time series subsequences is generally ineffective, the authors propose disregarding data that does not fit into any cluster and avoiding overlapping subsequences. Thus, the output of MDL-CLust can be fully interpreted as motif sets. More specifically, the method utilizes the MDL principle to form clusters. In each iteration, we can either create a new cluster (by selecting the first two members using a classic PairMotif algorithm), add a subsequence to an existing cluster, or merge two clusters. We select the operation that most effectively reduces the description length. The algorithm terminates when no usable data remains or further reduction in the time series description length is no longer possible.

class tsmd.competitors.mdl.MDL(min_wlen, max_wlen, step=1, n_bins=6, n_neighbor=10, max_iter=50, n_jobs=1, verbose=False)

MDL algorithm for motif discovery.

Parameters
  • min_wlen (int) – Minimium window length.

  • max_wlen (int) – Maximum window length.

  • step (int, optional (default=1)) – step used to explore the different window lengths.

  • n_bins (int) – Number of bins used to binarize the signal.

  • n_neighbor (int, optional (default=10)) – Number of neighbors used when K nearest neighbors is called.

  • max_iter (int, optional (default=50)) – Max number of iteration.

  • n_jobs (int, optional (default=1)) – Number of jobs

  • verbose (bool, optional(default=False)) – Varbose mode.

prediction_mask_

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type

np.ndarray of shape (n_patterns, n_samples)

fit(signal)

Fit MDL

Parameters

signal (numpy array of shape (n_samples, )) – The input samples (time series length).

Returns

self – Fitted estimator.

Return type

object

Usage

from tsmd.competitors.mdl import MDL
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


mdl=MDL(min_wlen=180,max_wlen=220)
mdl.fit(signal)

labels=transform_label(mdl.prediction_mask_)
plot_signal_pattern(signal,labels)

MDL output

Reference

[Rakthanmanon et al. 2012] Thanawin Rakthanmanon, Eamonn J Keogh, Stefano Lonardi, and Scott Evans.2012. MDL-based time series clustering. Knowledge and information systems 33(2012), 371–399.

LoCoMotif

The LoCoMotif method [Wesenbeeck et al. 2024] addresses the challenge of variable length by searching for time-warped motifs at potentially different time scales within a time series. The process begins with the LoCo step, where the Self-Similarity Matrix of the time series is utilized to construct paths based on a principle similar to Dynamic Time Warping (DTW). The paths with the highest accumulated similarity in this matrix are identified. In the second step, these subpaths are grouped to create candidate Motifs. The method then assesses the encoding capacity of these candidates using a quality score that combines the similarity between occurrences with the overall coverage of the Motif set.

class tsmd.competitors.locomotif.LocoMotif(min_wlen, max_wlen, rho=0.8, n_patterns=None, start_mask=None, end_mask=None, overlap=0, warping=True)

LoCoMotif algorithm for motif discovery.

Parameters
  • min_wlen (int) – Minimium window length.

  • max_wlen (int) – Maximum window length.

  • n_patterns (int, optional (default is None)) – Maximum number of motif sets to find. If None, the number of patterns is inferred.

  • rho (float, optional (default=0.8)) – The strictness parameter between 0 and 1. It is the quantile of the similarity matrix to use as the threshold for the LoCo algorithm.

  • overlap (float, optional (default=0.0)) – Defines the maximal amount of overlap between subsequences accepted.

  • start_mask (np.ndarray, optional (default is None)) – Mask for the starting time points of representative motifs, where True means allowed. If None, all points are allowed.

  • start_mask – Mask for the ending time points of representative motifs, where True means allowed. If None, all points are allowed.

  • warping (bool, optional (default=True)) – Whether warping is allowed (True) or not (False).

prediction_mask_

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type

np.ndarray of shape (n_patterns, n_samples)

fit(signal)

Fit LoCoMotif

Parameters

signal (numpy array of shape (n_samples, )) – The input samples (time series length).

Returns

self – Fitted estimator.

Return type

object

Usage

from tsmd.competitors.locomotif import LocoMotif
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


loco=LocoMotif(n_patterns=2, min_wlen = 180, max_wlen =220)
loco.fit(signal)

labels=transform_label(loco.prediction_mask_)
plot_signal_pattern(signal,labels)

LoCoMotif output

Reference

[Wesenbeeck et al. 2024] Daan Van Wesenbeeck, Aras Yurtman, Wannes Meert, and Hendrik Blockeel.2024. LoCoMotif: Discovering time-warped motifs in time series.Data Mining and Knowledge Discovery (2024), 1–30.