Encoding-based Algorithms in TSMD¶

Grammarviz¶

Grammarviz [Senin et al. 2018] uses grammar induction methods for motif detection. In practice, the time series is discretized using SAX, and grammar induction techniques, such as Sequitur or RE-PAIR, are applied to the discretized series to identify grammar rules. The most frequent and representative grammar rules are selected, and occurrences of the various motifs are then extracted.

class tsmd.competitors.grammarviz.Grammarviz(n_patterns: int, alphabet_size=5, numerosity='MINDIST', window_size=30, word_size=10, folder_java='./competitors/competitors_tools/grammarviz', file_exchange_location='target/file_exchange')¶

Grammarviz algorithm for motif discovery.

Parameters

n_patterns (int) – Number of patterns.
alphabet_size (int, optional (default=5)) – Alphabet size for the discretization of the time series.
numerosity (str, optional (default to "MINDIST")) – Numerosity reduction type.
window_size (int, optional (default=30)) – Window_size.
word_size (int, optional (default=10)) – Word size.

prediction_mask_¶

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type: np.ndarray of shape (n_patterns, n_samples)

fit(signal: ndarray) → None¶

Fit Grammarviz

Parameters: signal (numpy array of shape (n_samples, )) – The input samples (time series length).
Returns: self – Fitted estimator.
Return type: object

Usage¶

Grammarviz is written in Java and requires a specific setup. Download it from the following GitHub: https://github.com/GrammarViz2/grammarviz2_src, place it in TSMD/tsmd/competitors/competitors_tools/, and follow the installation instructions in the repository. You can then use Grammarviz like any other method:

from tsmd.competitors.grammarviz import Grammarviz
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


gm=Grammarviz(n_patterns=2)
gm.fit(signal)

labels=transform_label(gm.prediction_mask_)
plot_signal_pattern(signal,labels)

Grammarviz output

Reference¶

[Senin et al. 2018] Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi, Arnold P Boedi-hardjo, Crystal Chen, and Susan Frankenstein. 2018. Grammarviz 3.0: Interactive discovery of variable-length time series patterns. ACM Transactions on Knowledge Discovery from Data (TKDD)12, 1 (2018), 1–28

MDL-Clust¶

The MDL-CLust method [Rakthanmanon et al. 2012] aims to perform clustering of subsequences. However, since clustering time series subsequences is generally ineffective, the authors propose disregarding data that does not fit into any cluster and avoiding overlapping subsequences. Thus, the output of MDL-CLust can be fully interpreted as motif sets. More specifically, the method utilizes the MDL principle to form clusters. In each iteration, we can either create a new cluster (by selecting the first two members using a classic PairMotif algorithm), add a subsequence to an existing cluster, or merge two clusters. We select the operation that most effectively reduces the description length. The algorithm terminates when no usable data remains or further reduction in the time series description length is no longer possible.

class tsmd.competitors.mdl.MDL(min_wlen, max_wlen, step=1, n_bins=6, n_neighbor=10, max_iter=50, n_jobs=1, verbose=False)¶

MDL algorithm for motif discovery.

Parameters

min_wlen (int) – Minimium window length.
max_wlen (int) – Maximum window length.
step (int, optional (default=1)) – step used to explore the different window lengths.
n_bins (int) – Number of bins used to binarize the signal.
n_neighbor (int, optional (default=10)) – Number of neighbors used when K nearest neighbors is called.
max_iter (int, optional (default=50)) – Max number of iteration.
n_jobs (int, optional (default=1)) – Number of jobs
verbose (bool, optional(default=False)) – Varbose mode.

prediction_mask_¶

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type: np.ndarray of shape (n_patterns, n_samples)

fit(signal)¶

Fit MDL

Parameters: signal (numpy array of shape (n_samples, )) – The input samples (time series length).
Returns: self – Fitted estimator.
Return type: object

Usage¶

from tsmd.competitors.mdl import MDL
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


mdl=MDL(min_wlen=180,max_wlen=220)
mdl.fit(signal)

labels=transform_label(mdl.prediction_mask_)
plot_signal_pattern(signal,labels)

MDL output

Reference¶

[Rakthanmanon et al. 2012] Thanawin Rakthanmanon, Eamonn J Keogh, Stefano Lonardi, and Scott Evans.2012. MDL-based time series clustering. Knowledge and information systems 33(2012), 371–399.

LoCoMotif¶

The LoCoMotif method [Wesenbeeck et al. 2024] addresses the challenge of variable length by searching for time-warped motifs at potentially different time scales within a time series. The process begins with the LoCo step, where the Self-Similarity Matrix of the time series is utilized to construct paths based on a principle similar to Dynamic Time Warping (DTW). The paths with the highest accumulated similarity in this matrix are identified. In the second step, these subpaths are grouped to create candidate Motifs. The method then assesses the encoding capacity of these candidates using a quality score that combines the similarity between occurrences with the overall coverage of the Motif set.

class tsmd.competitors.locomotif.LocoMotif(min_wlen, max_wlen, rho=0.8, n_patterns=None, start_mask=None, end_mask=None, overlap=0, warping=True)¶

LoCoMotif algorithm for motif discovery.

Parameters

min_wlen (int) – Minimium window length.
max_wlen (int) – Maximum window length.
n_patterns (int, optional (default is None)) – Maximum number of motif sets to find. If None, the number of patterns is inferred.
rho (float, optional (default=0.8)) – The strictness parameter between 0 and 1. It is the quantile of the similarity matrix to use as the threshold for the LoCo algorithm.
overlap (float, optional (default=0.0)) – Defines the maximal amount of overlap between subsequences accepted.
start_mask (np.ndarray, optional (default is None)) – Mask for the starting time points of representative motifs, where True means allowed. If None, all points are allowed.
start_mask – Mask for the ending time points of representative motifs, where True means allowed. If None, all points are allowed.
warping (bool, optional (default=True)) – Whether warping is allowed (True) or not (False).

prediction_mask_¶

Binary mask indicating the presence of motifs across the signal. Each row corresponds to one discovered motif, and each column to a time step. A value of 1 means the motif is present at that time step, and 0 means it is not.

Type: np.ndarray of shape (n_patterns, n_samples)

fit(signal)¶

Fit LoCoMotif

Parameters: signal (numpy array of shape (n_samples, )) – The input samples (time series length).
Returns: self – Fitted estimator.
Return type: object

Usage¶

from tsmd.competitors.locomotif import LocoMotif
from tsmd.tools.utils import transform_label
from tsmd.tools.plotting import plot_signal_pattern


loco=LocoMotif(n_patterns=2, min_wlen = 180, max_wlen =220)
loco.fit(signal)

labels=transform_label(loco.prediction_mask_)
plot_signal_pattern(signal,labels)

LoCoMotif output

Reference¶

[Wesenbeeck et al. 2024] Daan Van Wesenbeeck, Aras Yurtman, Wannes Meert, and Hendrik Blockeel.2024. LoCoMotif: Discovering time-warped motifs in time series.Data Mining and Knowledge Discovery (2024), 1–30.