Time-stretch audio

Read in an audio signal. Listen to the audio signal and plot it over time.

```
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on

sound(audioIn,fs)
```

Use `stretchAudio` to apply a 1.5 speedup factor. Listen to the modified audio signal and plot it over time. The sample rate remains the same, but the duration of the signal has decreased.

```
audioOut = stretchAudio(audioIn,1.5);

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 1.5')
axis tight
grid on

sound(audioOut,fs)
```

Slow down the original audio signal by a 0.75 factor. Listen to the modified audio signal and plot it over time. The sample rate remains the same as the original audio, but the duration of the signal has increased.

```
audioOut = stretchAudio(audioIn,0.75);

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 0.75')
axis tight
grid on

sound(audioOut,fs)
```
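As a rough check on the two examples above, the output duration should be close to the input duration divided by the TSM factor. The following sketch uses only the nominal 15-second duration implied by the file name (an assumption, not a measured value), and the variable names are illustrative. The actual output length may differ slightly because of windowing.

```matlab
% Nominal input duration in seconds (assumed from the file name, not measured)
durationIn = 15;

% TSM factors used in the examples above
alphas = [1.5 0.75];

% Approximate output durations: duration scales by 1/alpha
durationOut = durationIn ./ alphas;

% Print one line per factor (columns of the matrix are consumed in order)
fprintf('alpha = %.2f -> about %.1f s\n',[alphas; durationOut])
```

A factor above 1 shortens the signal and a factor below 1 lengthens it, while the sample rate `fs` is unchanged in both cases.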

`stretchAudio` supports time-scale modification (TSM) on frequency-domain audio when using the default vocoder method. Applying TSM to frequency-domain audio enables you to reuse your STFT computation for multiple TSM factors.

Read in an audio signal. Listen to the audio signal and plot it over time.

```
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');

sound(audioIn,fs)

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
```

Convert the audio signal to the frequency domain.

```
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);
```

Speed up the audio signal by a factor of 1.4. Specify the window and overlap length used to create the frequency-domain representation.

```
alpha = 1.4;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 1.4')
axis tight
grid on
```

Slow down the audio signal by a factor of 0.8. Specify the window and overlap length used to create the frequency-domain representation.

```
alpha = 0.8;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 0.8')
axis tight
grid on
```
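Because the STFT is computed once, the same `S` can be passed to `stretchAudio` for any number of TSM factors. The following is a minimal sketch of that reuse, using the same file, window, and overlap length as above. The variable names `alphas` and `durations` are illustrative, and the loop only records each output duration.

```matlab
% Compute the STFT once, with the same settings as the example above
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);

% Reuse the same STFT for several TSM factors
alphas = [0.8 1.0 1.4];
durations = zeros(size(alphas));
for k = 1:numel(alphas)
    audioOut = stretchAudio(S,alphas(k),'Window',win,'OverlapLength',ovrlp);
    durations(k) = size(audioOut,1)/fs;   % output duration in seconds
end
```

The window and overlap length passed to `stretchAudio` must match the ones passed to `stft`; otherwise the function cannot interpret the frequency-domain input correctly.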

The default TSM method (vocoder) enables you to additionally apply phase-locking to increase the fidelity to the original audio.

Read in an audio signal. Listen to the audio signal and plot it over time.

```
[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");

sound(audioIn,fs)

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
```

Phase-locking adds a nontrivial computational load to TSM and is not always required. By default, phase-locking is disabled. Apply a speedup factor of 1.8 to the input audio signal. Listen to the audio signal and plot it over time.

```
alpha = 1.8;
tic
audioOut = stretchAudio(audioIn,alpha);
processingTimeWithoutPhaseLocking = toc
```

```
processingTimeWithoutPhaseLocking = 0.0798
```

```
sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = false')
axis tight
grid on
```

Apply the same 1.8 speedup factor to the input audio signal, this time enabling phase-locking. Listen to the audio signal and plot it over time.

```
tic
audioOut = stretchAudio(audioIn,alpha,"LockPhase",true);
processingTimeWithPhaseLocking = toc
```

```
processingTimeWithPhaseLocking = 0.1154
```

```
sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = true')
axis tight
grid on
```

The waveform similarity overlap-add (WSOLA) TSM method enables you to specify the maximum number of samples to search for the best signal alignment. By default, WSOLA delta is the number of samples in the analysis window minus the number of samples overlapped between adjacent analysis windows. Increasing the WSOLA delta increases the computational load but might also increase fidelity.

Read in an audio signal. Listen to the first 10 seconds of the audio signal.

```
[audioIn,fs] = audioread('RockGuitar-16-96-stereo-72secs.flac');
sound(audioIn(1:10*fs,:),fs)
```

Apply a TSM factor of 0.75 to the input audio signal using the WSOLA method. Listen to the first 10 seconds of the resulting audio signal.

```
alpha = 0.75;
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola");
processingTimeWithDefaultWSOLADelta = toc
```

```
processingTimeWithDefaultWSOLADelta = 19.4403
```

```
sound(audioOut(1:10*fs,:),fs)
```

Apply a TSM factor of 0.75 to the input audio signal, this time increasing the WSOLA delta to 1024. Listen to the first 10 seconds of the resulting audio signal.

```
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola","WSOLADelta",1024);
processingTimeWithIncreasedWSOLADelta = toc
```

```
processingTimeWithIncreasedWSOLADelta = 25.5306
```

```
sound(audioOut(1:10*fs,:),fs)
```

`audioIn` - Input signal
column vector | matrix | 3-D array

Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets `audioIn` depends on the complexity of `audioIn` and the value of `Method`:

- If `audioIn` is real, `audioIn` is interpreted as a time-domain signal. In this case, `audioIn` must be a column vector or matrix. Columns are interpreted as individual channels. This syntax applies when `Method` is set to `'vocoder'` or `'wsola'`.
- If `audioIn` is complex, `audioIn` is interpreted as a frequency-domain signal. In this case, `audioIn` must be an *L*-by-*M*-by-*N* array, where *L* is the FFT length, *M* is the number of individual spectra, and *N* is the number of channels. This syntax applies only when `Method` is set to `'vocoder'`.

**Data Types:** `single` | `double`

**Complex Number Support:** Yes

`alpha` - TSM factor
positive scalar

TSM factor, specified as a positive scalar.

**Data Types:** `single` | `double`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Window',kbdwin(512)`

`'Method'` - Method used to time-scale audio
`'vocoder'` (default) | `'wsola'`

Method used to time-scale audio, specified as the comma-separated pair consisting of `'Method'` and `'vocoder'` or `'wsola'`. Set `'Method'` to `'vocoder'` to use the phase vocoder method. Set `'Method'` to `'wsola'` to use the WSOLA method.

If `'Method'` is set to `'vocoder'`, `audioIn` can be real or complex. If `'Method'` is set to `'wsola'`, `audioIn` must be real.

**Data Types:** `char` | `string`

`'Window'` - Window applied in time domain
`sqrt(hann(1024,'periodic'))` (default) | real vector

Window applied in the time domain, specified as the comma-separated pair consisting of `'Window'` and a real vector. The number of elements in the vector must be in the range [1, `size(audioIn,1)`]. The number of elements in the vector must also be greater than `OverlapLength`.

**Note**

If using `stretchAudio` with frequency-domain input, you must specify `Window` as the same window used to transform `audioIn` to the frequency domain.

**Data Types:** `single` | `double`

`'OverlapLength'` - Number of samples overlapped between adjacent windows
`round(0.75*numel(Window))` (default) | integer in the range [0, `numel(Window)`)

Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of `'OverlapLength'` and an integer in the range [0, `numel(Window)`).

**Note**

If using `stretchAudio` with frequency-domain input, you must specify `OverlapLength` as the same overlap length used to transform `audioIn` to a time-frequency representation.

**Data Types:** `single` | `double`

`'LockPhase'` - Apply identity phase-locking
`false` (default) | `true`

Apply identity phase-locking, specified as the comma-separated pair consisting of `'LockPhase'` and `false` or `true`.

To enable this name-value pair argument, set `Method` to `'vocoder'`.

**Data Types:** `logical`

`'WSOLADelta'` - Maximum samples used to search for best signal alignment
`numel(Window)-OverlapLength` (default) | nonnegative scalar

Maximum number of samples used to search for the best signal alignment, specified as the comma-separated pair consisting of `'WSOLADelta'` and a nonnegative scalar.

To enable this name-value pair argument, set `Method` to `'wsola'`.

**Data Types:** `single` | `double`
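The defaults above depend on one another: `OverlapLength` is derived from the window length, and `WSOLADelta` is derived from both. The following sketch works through that arithmetic for the default 1024-sample window. The variable names are illustrative and are not arguments of `stretchAudio`.

```matlab
% Length of the default window, sqrt(hann(1024,'periodic'))
winLength = 1024;

% Default OverlapLength: round(0.75*numel(Window))
overlapLength = round(0.75*winLength);   % 768 samples

% Interval between adjacent analysis windows
hop = winLength - overlapLength;         % 256 samples

% Default WSOLADelta: numel(Window) - OverlapLength
wsolaDelta = hop;                        % 256 samples
```

With these defaults, the `'WSOLADelta',1024` setting used in the earlier example searches four times as many samples as the default of 256.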

`audioOut` - Time-scale modified audio
column vector | matrix

Time-scale modified audio, returned as a column vector or matrix of independent channels.

The phase vocoder algorithm is a frequency-domain approach to TSM [1][2]. The basic steps of the phase vocoder algorithm are:

1. The algorithm windows a time-domain signal at interval η, where `η = numel(Window) - OverlapLength`. The windows are then converted to the frequency domain.
2. To preserve horizontal (across time) phase coherence, the algorithm treats each bin as an independent sinusoid whose phase is computed by accumulating the estimates of its instantaneous frequency.
3. To preserve vertical (across an individual spectrum) phase coherence, the algorithm locks the phase advance of groups of bins to the phase advance of local peaks. This step applies only if `LockPhase` is set to `true`.
4. The algorithm returns the modified spectrogram to the time domain, with windows spaced at intervals of δ, where δ ≈ η/α. α is the speedup factor specified by the `alpha` input argument.
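The phase accumulation for horizontal phase coherence can be illustrated for a single frequency bin. The following is a minimal sketch of the standard phase vocoder update, not the `stretchAudio` source code. All variable names and numeric values are assumptions chosen for illustration, and the sinusoid frequency is placed between two bin centers so that the correction is visible.

```matlab
% Illustrative single-bin phase propagation (not the stretchAudio source code)
N     = 1024;                 % FFT length (assumed equal to the window length)
eta   = 256;                  % analysis interval: numel(Window) - OverlapLength
alpha = 1.5;                  % speedup factor
delta = round(eta/alpha);     % synthesis interval, delta ~ eta/alpha

k  = 10;                      % bin index (0-based)
wk = 2*pi*k/N;                % bin center frequency in rad/sample
w0 = 2*pi*10.3/N;             % true sinusoid frequency, between bins 10 and 11

% Measured phases of bin k in two consecutive analysis windows
phiPrev = 0.4;
phiCur  = angle(exp(1i*(phiPrev + w0*eta)));

% Deviation from the phase advance expected at the bin center,
% wrapped to the interval [-pi, pi)
dphi = phiCur - phiPrev - wk*eta;
dphi = mod(dphi + pi, 2*pi) - pi;

% Instantaneous frequency estimate for this bin
wInst = wk + dphi/eta;

% Accumulate the synthesis phase over the synthesis interval
phiSynPrev = phiPrev;                    % first window: reuse the analysis phase
phiSyn     = phiSynPrev + wInst*delta;
```

In this sketch `wInst` recovers `w0`, because the deviation from the bin center over one analysis interval is smaller than π. The output phase then advances by `wInst*delta` rather than by the measured phase difference, which is what keeps each bin coherent across time after the windows are respaced.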

The WSOLA algorithm is a time-domain approach to TSM [1][2]. WSOLA is an extension of the overlap and add (OLA) algorithm. In the OLA algorithm, a time-domain signal is windowed at interval η, where `η = numel(Window) - OverlapLength`. To construct the time-scale modified output audio, the windows are spaced at interval δ, where δ ≈ η/α. α is the TSM factor specified by the `alpha` input argument.

The OLA algorithm does a good job of recreating the magnitude spectra but can introduce phase jumps between windows. The WSOLA algorithm attempts to smooth the phase jumps by searching `WSOLADelta` samples around the η interval for a window that minimizes phase jumps. The algorithm searches for the best window iteratively, so that each successive window is chosen relative to the previously selected window.

If `WSOLADelta` is set to `0`, then the algorithm reduces to OLA.
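The alignment search for a single window can be sketched as follows. This is an illustration under stated assumptions, not the `stretchAudio` source code. It assumes that similarity is measured with a plain (unnormalized) cross-correlation and that the target is the natural continuation of the previously selected window, following the WSOLA description in [1]. The test tone, window length, and start positions are hypothetical values chosen so that the result is easy to check.

```matlab
% Illustrative WSOLA alignment search for one window (not the stretchAudio source code)
fs = 8000;
x  = sin(2*pi*200*(0:fs-1)'/fs);   % 1 s test tone with a 40-sample period

winLen     = 240;                  % window length (a whole number of tone periods)
eta        = 60;                   % analysis interval: numel(Window) - OverlapLength
alpha      = 0.75;                 % TSM factor
delta      = round(eta/alpha);     % synthesis interval, delta ~ eta/alpha
wsolaDelta = 32;                   % search radius in samples

prevStart    = 1937;               % start of the previously selected window
nominalStart = 2001;               % nominal start of the next analysis window

% Natural continuation of the previous window in the output
target = x(prevStart+delta : prevStart+delta+winLen-1);

% Score every candidate shift by its correlation with the target
shifts = -wsolaDelta:wsolaDelta;
score  = zeros(size(shifts));
for i = 1:numel(shifts)
    s = nominalStart + shifts(i);
    score(i) = target' * x(s : s+winLen-1);
end

% Keep the shift with the highest similarity
[~,best]  = max(score);
bestShift = shifts(best);
```

For this periodic tone, the selected window starts a whole number of periods away from the target, so the overlapping windows add in phase. Setting `wsolaDelta` to 0 leaves only the zero shift as a candidate, which corresponds to plain OLA.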

[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." *Applied Sciences*. Vol. 6, Issue 2, 2016.

[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's thesis, Saarland University, Saarbrücken, Germany, 2011.

Generate C and C++ code using MATLAB® Coder™.

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Using `gpuArray` (Parallel Computing Toolbox) input with the `stretchAudio` function is only recommended for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see `ComputeCapability` in the output from the `gpuDevice` (Parallel Computing Toolbox) function. For more information, see GPU Support by Release (Parallel Computing Toolbox).

For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

See also: `audioDataAugmenter` | `audioTimeScaler` | `reverberator` | `shiftPitch`
