
distanceProfile

Compute distance profile between query subsequence and all other subsequences of a time series

Since R2024b

Description

Return Distance Profile

DP = distanceProfile(X,len,loc) returns the distance profile (vector of z-normalized Euclidean distances) between a query subsequence of the time series X and every subsequence in X with the same length len.

  • If X is a vector, then the software treats it as a single channel.

  • If X is a matrix, then you can control whether the software computes the distance profile for each time series channel individually or cumulatively using the Type name-value argument.

The query begins at the time series position loc. The query subsequence is therefore defined by X(loc:loc+len-1).
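The definition above can be illustrated with a brute-force computation on a toy series. This is a minimal sketch, not the toolbox implementation of distanceProfile. It assumes that z-normalization uses the population standard deviation, std(v,1), and it uses a hypothetical anonymous helper, znorm, that is not part of any toolbox.

```matlab
% Hypothetical brute-force sketch of a z-normalized distance profile for a
% single-channel series. Assumes population standard deviation, std(v,1);
% this is not the toolbox implementation of distanceProfile.
X = [0 1 2 3 2 1 0 1 2 3 2 1 0]';          % toy time series (column vector)
len = 4;                                     % subsequence length
loc = 1;                                     % starting position of the query
n = numel(X);
znorm = @(v) (v - mean(v)) ./ std(v, 1);     % hypothetical z-normalization helper
q = znorm(X(loc:loc+len-1));                 % z-normalized query X(loc:loc+len-1)
m = n - len + 1;                             % number of complete subsequences
DP = zeros(m, 1);
for i = 1:m
    s = znorm(X(i:i+len-1));                 % z-normalized candidate subsequence
    DP(i) = norm(q - s);                     % Euclidean distance after normalization
end
```

Here DP(1) is zero because the query matches itself, and DP(7) is also zero because X(7:10) repeats the query exactly. The self-match at loc is the kind of trivial match that an exclusion zone is meant to suppress.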


[DP,DPI] = distanceProfile(___) also returns the vector DPI of the starting indices of the subsequences, ordered from the best match to the worst match to the query subsequence.


[___] = distanceProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to exclude matches near the query starting position, set ExcludeTrivialMatch to true.

Plot Distance Profile

distanceProfile(___) creates an interactive plot of the distance profile, with overlays for the query, the motif (best match to the query), and the discord (worst match to the query). You can move the vertical selection lines in the plot to find the top motif and discord of any other data segment in the time series.

You can use this syntax with any of the previous input-argument combinations.


Examples


Load the data, which consists of the timetable T1. T1 contains armature current measurements from a degrading DC motor.

load matrix_profile_data T1

T1 is known to have an anomalous segment with length 100, starting at location 9797. Use this segment as the query segment.

X = T1.MotorCurrent;
len = 100;
loc = 9797;

Calculate the distance profile.

[DP,DPI] = distanceProfile(X,len,loc);

Display the first two elements of the index vector DPI and the corresponding distances in DP.

DPI(1:2)
ans = 2×1

        2617
        9368

DP(DPI(1)),DP(DPI(2))
ans = 8.3894
ans = 8.4532

For comparison, display the value of the largest distance.

max(DP)
ans = 18.4828

Plot the distance profile.

distanceProfile(X,len,loc);

Figure: interactive distance profile plot with three panels. Time Series (Time vs. Data) shows Data (Channel=1) with overlays for Query (i=9797), Motif (i=2617), and Discord (i=567). Distance Profile (Time vs. Distance) shows Distance (Channel=1) and the Exclusion Zone. Subsequences (Time vs. Data) overlays Query (i=9797), Motif (i=2617), and Discord (i=567).

  • The top plot shows the time series. The query appears at location 9797. The motif, or best match to the query, occurs at location 2617, and the discord, or worst match, occurs at location 567.

  • The middle plot shows the distance profile with an exclusion zone around the query location.

  • The bottom plot shows the query subsequence, the motif subsequence (best match), and the discord subsequence (worst match).

Move the vertical selection lines to find the top motif and discord of any other data segments in the time series.

The distanceProfile plot displays only the top match. To view more matches, extract, plot, and compare subsequences using the values in X and DPI.
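Extracting several matches from X and DPI takes only a short loop. The sketch below uses small toy stand-ins for the X, len, and DPI from the example, so it runs without the data file; the names k and S are hypothetical. Each column of S holds one matching subsequence, best match first.

```matlab
% Hypothetical sketch: pull the k best-matching subsequences out of X using
% DPI. Toy stand-ins replace the X, len, and DPI from the example.
X = (1:20)';                 % toy time series
len = 5;                     % subsequence length
DPI = [12; 3; 7; 1];         % toy index vector, best match first
k = 3;                       % number of matches to keep
S = zeros(len, k);           % column j holds the j-th best match
for j = 1:k
    S(:, j) = X(DPI(j):DPI(j)+len-1);
end
% plot(S) overlays the matches for visual comparison.
```

With the real example data, substituting the X, len, and DPI returned by distanceProfile gives the top k matches to the anomalous query segment.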

Input Arguments


X

Time series to evaluate, specified as a numeric vector of length n or a numeric matrix containing multiple columns of length n. X must not have any missing data.

len

Length of the query subsequence, specified as an integer. len must be less than the length n of the time series.

loc

Starting position of the query subsequence, specified as an integer. loc must be less than the length n of the time series, and the query X(loc:loc+len-1) must lie within X, that is, loc + len – 1 must not exceed n.
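The constraints on X, len, and loc can be checked before calling distanceProfile. This is a hypothetical pre-check sketch, not the toolbox's own validation; it assumes channels are stored as columns of X and that missing numeric data appears as NaN.

```matlab
% Hypothetical pre-checks for the documented constraints on X, len, and loc.
% Assumes channels are columns of X and missing data is stored as NaN.
X = sin((1:50)');                    % toy single-channel time series
len = 10;                            % query length
loc = 41;                            % query start
n = size(X, 1);                      % length n of each channel
assert(~any(isnan(X(:))), 'X must not contain missing data.')
assert(len < n, 'len must be less than n.')
assert(loc + len - 1 <= n, 'The query X(loc:loc+len-1) must lie inside X.')
query = X(loc:loc+len-1, :);         % works for vectors and multichannel matrices
```

Here loc = 41 is the largest valid start for len = 10 and n = 50, because the query then ends exactly at X(50).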

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: DP = distanceProfile(X,10,20,ExcludeTrivialMatch=true) excludes subsequence matches near the query subsequence starting position of 20.

ExcludeTrivialMatch

Option to set an exclusion zone around the starting position loc of the query subsequence, specified as true or false. Setting this option to true excludes matches of the query subsequence with itself.

ExclusionZoneLength

Length of the exclusion zone on either side of the query starting position loc, specified as the number of data points to exclude. When ExcludeTrivialMatch is true, the software sets the elements of DP within the exclusion zone to NaN.
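The effect of an exclusion zone on DP can be sketched by hand. This minimal sketch assumes the zone covers ezLen points on each side of loc, including loc itself, clipped to valid indices; the exact boundaries the toolbox uses are an assumption, and ezLen and zone are hypothetical names.

```matlab
% Hypothetical sketch of an exclusion zone: set DP to NaN within ezLen points
% on either side of loc. The exact zone boundaries are an assumption here.
DP = (1:20)' / 10;                                   % toy distance profile
loc = 8;                                             % query starting position
ezLen = 3;                                           % hypothetical exclusion zone length
m = numel(DP);
zone = max(1, loc - ezLen) : min(m, loc + ezLen);    % clipped to valid indices
DP(zone) = NaN;                                      % trivial matches no longer compete
```

After this step, DP(5:11) is NaN, so neither the self-match nor its immediate neighbors can be reported as the best match.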

EndPoints

Option for controlling output length when X ends with a partial subsequence, specified as one of the following options:

  • "discard" — Truncate the output vectors DP and DPI to length n – len + 1, where n is the length of X.

  • "fill" — Extend DP and DPI to length n by padding DP with len – 1 NaNs. The software sets the last len – 1 elements of DPI to the sequence n-len+2:n.
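The relationship between the two EndPoints options can be shown by padding a "discard"-length result into a "fill"-length one, following the rules stated above. The toy values and the names DPd, DPId, DPf, and DPIf are hypothetical; the sketch only reproduces the documented padding, not the toolbox's internal code.

```matlab
% Hypothetical sketch of "fill" padding applied to a "discard"-length result.
n = 10;                                     % length of X
len = 4;                                    % subsequence length
DPd = [0.3; 0; 0.8; 1.1; 0.6; 0.2; 0.9];    % toy profile with n-len+1 = 7 elements
[~, DPId] = sort(DPd);                      % toy index vector, best match first
DPf  = [DPd;  NaN(len-1, 1)];               % pad DP with len-1 NaNs to length n
DPIf = [DPId; (n-len+2:n)'];                % last len-1 indices are n-len+2:n
```

Both padded vectors have length n = 10, and the trailing indices 8, 9, and 10 mark starting positions whose subsequences would run past the end of X.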

Type

Computation options when X is a matrix, specified as one of the following approaches:

  • "individual" — Compute the distance profile of each channel separately.

  • "cumulative" — Combine the distance profiles at each time point using the cumulative average of sorted distance profile values.
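One reading of the "cumulative" description is: at each time point, sort the per-channel distances in ascending order and take running averages, so that column k holds the mean of the k smallest distances. The sketch below implements that reading on a toy matrix; it is an assumption about the combination rule, not a statement of how the toolbox arranges its output.

```matlab
% Hypothetical sketch of one reading of "cumulative": at each time point,
% sort the per-channel distances and take running averages.
D = [3 1 2; 0.5 2.5 1.5];                  % rows: time points, cols: per-channel profiles
Ds = sort(D, 2);                           % ascending across channels, row by row
C = cumsum(Ds, 2) ./ (1:size(D, 2));       % column k: mean of the k smallest distances
```

For the first time point, the sorted distances 1, 2, 3 give running averages 1, 1.5, and 2. Sorting first means that channels with poor matches only enter the average after the better-matching channels.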

Output Arguments


DP

Distance profile containing the z-normalized Euclidean distances between the query subsequence of X and each subsequence of X with the same length len, returned as a numeric vector.

The length of DP is n – len + 1 or n, depending on the setting of EndPoints ("discard" or "fill"). Here, n is the length of X.

When ExcludeTrivialMatch is true, elements of DP near the query starting location loc are set to NaN, with the number of elements determined by the value of ExclusionZoneLength.
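Under the assumption that each subsequence is z-normalized with its mean and population standard deviation, the z-normalized Euclidean distance between the query q and a candidate s of length len depends only on their Pearson correlation. This is a standard identity; the page does not state which standard deviation the toolbox uses.

$$
d(q,s) = \left\lVert \hat{q} - \hat{s} \right\rVert_2 = \sqrt{2\,\mathrm{len}\,\bigl(1 - \rho_{q,s}\bigr)}, \qquad \hat{q} = \frac{q - \mu_q}{\sigma_q}, \quad \hat{s} = \frac{s - \mu_s}{\sigma_s}
$$

Because the correlation lies between −1 and 1, each finite element of DP then lies between 0 and 2√len; for len = 100, that upper bound is 20.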

DPI

Starting indices of the subsequences X(DPI(k):DPI(k)+len-1) of X that best match the query subsequence X(loc:loc+len-1), returned as an integer vector.

DPI orders the subsequences by ascending distance, so DP(DPI) runs from the best match (smallest distance) to the worst match (largest distance). The best match therefore starts at DPI(1), with distance DP(DPI(1)), and the worst match starts at DPI(n-len+1).
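An index vector with the ordering described above can be obtained from a distance profile with MATLAB's sort. This minimal sketch uses a toy profile in which NaN stands for an excluded point; where the toolbox places excluded entries within DPI is an assumption here, while MATLAB's sort places NaN values last when sorting in ascending order.

```matlab
% Hypothetical sketch: build a DPI-like index vector from a distance profile.
DP = [0.9; NaN; 0.2; 1.4; 0.5];     % toy profile; NaN marks an excluded point
[~, DPI] = sort(DP, 'ascend');      % best match first; sort puts NaN last
bestStart = DPI(1);                 % starting index of the best match
bestDist  = DP(DPI(1));             % distance of the best match
```

For this toy profile, DPI is [3; 5; 1; 4; 2], so the best match starts at index 3 with distance 0.2.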



Version History

Introduced in R2024b