# Similarity-based Algorithms in TSMD ## STOMP The STOMP algorithm [Yeh et al. 2016] is a similarity-based method and proposes a fast computation of the Matrix Profile by efficiently leveraging the Fast Fourier Transform (FFT). Once the Matrix Profile is computed, the center of the Motif Set is defined as the subsequence with the minimum distance to another non-overlapping subsequence. A full scan of the time-series subsequences is performed, and non-overlapping subsequences that are at a distance of less than $R$ from the center are identified as occurrences of the corresponding Motif set. ```{eval-rst} .. autoclass:: tsmd.competitors.matrixprofile.MatrixProfile :members: ``` ### Usage ```python from tsmd.competitors.matrixprofile import MatrixProfile from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern mp=MatrixProfile(n_patterns=2, wlen=200, radius_ratio=3, distance_name = 'UnitEuclidean') mp.fit(signal) labels=transform_label(mp.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![STOMP output](../../../assets/methodExample/stomp_example.png "STOMP output") ### Reference [Yeh et al. 2016] Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, YifeiDing, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2016. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM). Ieee, 1317–1322 ## PanMP PanMP [Madrid et al. 2019] aims to generalize the Matrix Profile approach to detect patterns at varying time scales without requiring prior knowledge of the Motif size. To achieve this, the PanMatrixProfile—a matrix that contains Matrix Profiles for a range of window lengths in a time series—is computed. Based on distance and regardless of window size, the best non-overlapping Motif Pairs are then iteratively selected. The Motif sets are constructed from these selected Motif Pairs in the same way as in STOMP. Note that if the range of window sizes is restricted to a single value, this method functions identically to STOMP. ```{eval-rst} .. autoclass:: tsmd.competitors.panmatrixprofile.PanMatrixProfile :members: ``` ### Usage ```python from tsmd.competitors.panmatrixprofile import PanMatrixProfile from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern pmp=PanMatrixProfile(n_patterns=2, min_wlen=180, max_wlen=220, radius_ratio=3, distance_name = 'UnitEuclidean') pmp.fit(signal) labels=transform_label(pmp.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![PANMP output](../../../assets/methodExample/panmp_example.png "PanMatrixprofile output") ### Reference [Madrid et al. 2019] Frank Madrid, Shima Imani, Ryan Mercer, Zachary Zimmerman, Nader Shakibay, and Eamonn Keogh. 2019. Matrix profile xx: Finding and visualizing time series motifs of all lengths using the matrix profile. In 2019 IEEE International conference on big knowledge (ICBK). IEEE, 175–182 ## VALMOD VALMOD [Linardi et al. 2018] has a similar goal to PanMP but employs a slightly different approach. It leverages pruning techniques to compute the Matrix Profile over a range of window lengths, $\ell$. Motif Pairs are iteratively selected based on distance normalized by the square root of the window length. Motif sets are then built from these top Motif Pairs by identifying non-overlapping subsequences within a distance $< R$ from one of the two centers. ```{eval-rst} .. autoclass:: tsmd.competitors.valmod.Valmod :members: ``` ### Usage ```python from tsmd.competitors.valmod import Valmod from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern valmod=Valmod(n_patterns=2, min_wlen=180, max_wlen=220, radius_ratio=3, distance_name = 'UnitEuclidean') valmod.fit(signal) labels=transform_label(valmod.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![Valmod output](../../../assets/methodExample/valmod_example.png "Valmod output") ### Reference [Linardi et al. 2018] Michele Linardi, Yan Zhu, Themis Palpanas, and Eamonn Keogh. 2018. Matrix profile X: VALMOD-scalable discovery of variable-length motifs in data series.In Proceedings of the 2018 international conference on management of data. 1053–1066. ## $k$-Motiflets Unlike most other algorithms in our benchmark that require the user to set a radius parameter $R$, the k-Motiflets [Schäfer et al. 2022] method aims to discover motifs without needing this parameter. Instead, the user specifies the desired number of occurrences $k$ for the target motif. With $k$ defined, the method identifies the set of $k$ non-overlapping subsequences with minimal extent, where extent is the maximum pairwise distance among elements in the set. ```{eval-rst} .. autoclass:: tsmd.competitors.motiflets.Motiflets :members: ``` ### Usage ```python from tsmd.competitors.motiflets import Motiflets from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern mf=Motiflets(k_max=5, min_wlen=180, max_wlen=220) mf.fit(signal) labels=transform_label(mf.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![Motiflets output](../../../assets/methodExample/motiflets_example.png "k-Motiflets output") ### Reference [Schäfer et al. 2022] Patrick Schäfer and Ulf Leser. 2022. Motiflets: Simple and Accurate Detection of Motifs in Time Series. Proceedings of the VLDB Endowment, 16, 4 (2022), 725–737. ## PEPA ```{eval-rst} .. autoclass:: tsmd.competitors.persistence.BasePersistentPattern :members: ``` This method [Germain et al. 2024] extracts the motifs through three computational steps: (i) the time series is transformed into a graph with nodes representing subsequences and edges weighted by the distance between subsequences; (ii) persistent homology is applied to detect significant clusters of nodes, isolating them from nodes that correspond to irrelevant parts of the time series; and (iii) a post-processing step merges temporally adjacent subsequences within each cluster to form variable length motif sets. Note also that this method utilizes the LT-Normalized Euclidean distance, a distance measure invariant to linear trends. ### Usage ```python from tsmd.competitors.persistence import BasePersistentPattern from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern pepa=BasePersistentPattern(wlen_for_persistence=180, n_patterns=2) pepa.fit(signal) labels=transform_label(pepa.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![PEPA output](../../../assets/methodExample/pepa_example.png "PEPA output") ### Reference [Germain et al. 2024] Thibaut Germain, Charles Truong, and Laurent Oudre. 2024. Persistence-based motif discovery in time series.IEEE Transactions on Knowledge and Data Engineering (2024). ## A-PEPA A variant of PEPA [Germain et al. 2024] that does not require the user to define the exact number of motif sets and estimates it automatically. ```{eval-rst} .. autoclass:: tsmd.competitors.adaptative_persistence.AdaptativeBasePersistentPattern :members: ``` ### Usage ```python from tsmd.competitors.adaptative_persistence import AdaptativeBasePersistentPattern from tsmd.tools.utils import transform_label from tsmd.tools.plotting import plot_signal_pattern apepa=AdaptativeBasePersistentPattern(wlen_for_persistence=180) apepa.fit(signal) labels=transform_label(apepa.prediction_mask_) plot_signal_pattern(signal,labels) ``` ![A-PEPA output](../../../assets/methodExample/apepa_example.png "A-PEPA output") ### Reference [Germain et al. 2024] Thibaut Germain, Charles Truong, and Laurent Oudre. 2024. Persistence-based motif discovery in time series.IEEE Transactions on Knowledge and Data Engineering (2024).