msalign

Align peaks in signal to reference peaks

Syntax

IntensitiesOut = msalign(X, Intensities, RefX)
... = msalign(..., 'Rescaling', RescalingValue, ...)
... = msalign(..., 'Weights', WeightsValue, ...)
... = msalign(..., 'MaxShift', MaxShiftValue, ...)
... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...)
... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...)
... = msalign(..., 'Iterations', IterationsValue, ...)
... = msalign(..., 'GridSteps', GridStepsValue, ...)
... = msalign(..., 'SearchSpace', SearchSpaceValue, ...)
... = msalign(..., 'ShowPlot', ShowPlotValue, ...)
[IntensitiesOut, RefXOut] = msalign(..., 'Group', GroupValue, ...)

Input Arguments

X Vector of separation-unit values for a set of signals with peaks. The number of elements in the vector equals the number of rows in the matrix Intensities. The separation unit can quantify wavelength, frequency, distance, time, or m/z depending on the instrument that generates the signal data.
Intensities Matrix of intensity values for a set of peaks that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The number of rows equals the number of elements in vector X.
RefX Vector of separation-unit values of known reference masses in a sample signal.

Tip

For reference peaks, select compounds that are not expected to have significant shifts among the different signals. For example, in mass spectrometry, select compounds that do not undergo structural transformation, such as phosphorylation. Doing so increases the accuracy of your alignment and lets you detect compounds that exhibit structural transformations among the sample signals.

RescalingValue Controls the rescaling of X. Choices are true (default) or false. When false, the output signal is aligned only to the reference peaks by using constant shifts. By default, msalign estimates a rescaling factor, unless RefX contains only one reference peak.
WeightsValue Vector of positive values, with the same number of elements as RefX. The default vector is ones(size(RefX)).
MaxShiftValue Two-element vector, in which the first element is negative and the second element is positive, that specifies the lower and upper limits of a range, in separation units, relative to each peak. No peak shifts beyond these limits. Default is [-100 100].
WidthOfPulsesValue Positive value that specifies the width, in separation units, for all the Gaussian pulses used to build the correlating synthetic signal. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue. Default is 10.
WindowSizeRatioValue Positive value that specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic signal is compared to the input signal only within these regions, which saves computation time. The size of the window is given in separation units by WidthOfPulsesValue * WindowSizeRatioValue. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.
IterationsValue Positive integer that specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Default is 5.
GridStepsValue Positive integer that specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Default is 20.
SearchSpaceValue

Character vector or string that specifies the type of search space. Choices are:

  • 'regular' — Default. Evenly spaced lattice.

  • 'latin' — Random Latin hypercube with GridStepsValue^2 samples.

ShowPlotValue Controls the display of a plot of an original and aligned signal over the reference masses specified by RefX. Choices are true, false, or I, an integer specifying the index of a signal in Intensities. If you set it to true, the first signal in Intensities is plotted. Default is:
  • false — When return values are specified.

  • true — When return values are not specified.

GroupValue Controls the creation of RefXOut, a new vector of separation-unit values to be used as reference masses for aligning the peaks. This vector is created by adjusting the values in RefX, based on the sample data from multiple signals in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).

Tip

Set GroupValue to true only if Intensities contains data for a large number of signals, and you are not confident of the separation-unit values used for your reference peaks in RefX. Leave GroupValue set to false if you are confident of the separation-unit values used for your reference peaks in RefX.

Output Arguments

IntensitiesOut

Matrix of intensity values for a set of peaks that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The intensity values represent a shifting and scaling of the data.

RefXOut Vector of separation-unit values of reference masses, calculated from RefX and the sample data from multiple signals in Intensities, when you set GroupValue to true.

Description

Tip

Use the following syntaxes with data from any separation technique that produces signal data, such as spectroscopy, NMR, electrophoresis, chromatography, or mass spectrometry.

IntensitiesOut = msalign(X, Intensities, RefX) aligns the peaks in raw, noisy signal data, represented by Intensities and X, to reference peaks, provided by RefX. First, it creates a synthetic signal from the reference peaks using Gaussian pulses centered at the separation-unit values specified by RefX. Then, it shifts and scales the separation-unit scale to find the maximum alignment between the input signals and the synthetic signal. (It uses an iterative multiresolution grid search until it finds the best scale and shift factors for each signal.) Once the new separation-unit scale is determined, the corrected signals are created by resampling their intensities at the original separation-unit values, creating IntensitiesOut, a vector or matrix of corrected intensity values. The resampling method preserves the shape of the peaks.
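
The synthetic signal described above can be pictured with a short sketch. This is an illustration only, not the internal code of msalign: the axis x is a hypothetical separation-unit grid, the pulse width is taken as the Gaussian standard deviation (an assumption consistent with the 60.65% figure quoted for WidthOfPulsesValue), and the reference masses and weights are borrowed from the Examples section.

```matlab
% Illustrative sketch only; msalign's internal implementation may differ.
x     = (3000:0.5:10000)';         % hypothetical separation-unit axis
refX  = [3991.4 4598 7964 9160];   % reference peaks (from the Examples section)
w     = [60 100 60 100];           % relative weights (from the Examples section)
sigma = 10;                        % pulse width, assumed to act as a standard deviation

% Sum one weighted Gaussian pulse per reference peak.
synthetic = zeros(size(x));
for k = 1:numel(refX)
    synthetic = synthetic + w(k) * exp(-0.5 * ((x - refX(k)) / sigma).^2);
end

plot(x, synthetic)
xlabel('Separation units')
ylabel('Synthetic signal')
```

Because the pulses are narrow compared with the spacing between reference peaks, each pulse is effectively isolated, and the tallest point of the synthetic signal equals the largest weight.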

Tip

The msalign function works best with three to five reference peaks that you know will appear in the signal. If you use a single reference peak (internal standard), msalign might align sample peaks to the wrong reference peaks, because it both scales and shifts the X vector. With a single reference peak, you might need only to shift the X vector. To do this, use IntensitiesOut = interp1(X, Intensities, X-(ReferencePeak-ExperimentalPeak)).

... = msalign(..., 'PropertyName', PropertyValue, ...) calls msalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

... = msalign(..., 'Rescaling', RescalingValue, ...) controls the rescaling of X. Choices are true (default) or false. When false, the output signal is aligned only to the reference peaks by using constant shifts. By default, msalign estimates a rescaling factor, unless RefX contains only one reference peak.
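
As a minimal sketch of a shift-only alignment, the following call turns off rescaling. It assumes the sample_lo_res data set and the reference masses used in the Examples section; the output variable name YShiftOnly is chosen here for illustration.

```matlab
load sample_lo_res                 % sample data used in the Examples section
R = [3991.4 4598 7964 9160];       % reference masses from the Examples section

% Align with constant shifts only; no rescaling factor is estimated.
YShiftOnly = msalign(MZ_lo_res, Y_lo_res, R, 'Rescaling', false);
```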

... = msalign(..., 'Weights', WeightsValue, ...) specifies the relative weight for each mass in RefX, the vector of reference separation-unit values. WeightsValue is a vector of positive values, with the same number of elements as RefX. The default vector is ones(size(RefX)), which means each reference peak is weighted equally, so that more intense reference peaks have a greater effect in the alignment algorithm. If you have a less intense reference peak, you can increase its weight to emphasize it more in the alignment algorithm.

... = msalign(..., 'MaxShift', MaxShiftValue, ...) specifies the lower and upper limits of the range, in separation units, relative to each peak. No peak shifts beyond these limits. MaxShiftValue is a two-element vector, in which the first element is negative and the second element is positive. Default is [-100 100].

Note

Use these values to tune the robustness of the algorithm. Ideally, you should keep the range within the maximum expected shift. If you try to correct larger shifts by increasing the limits, you increase the possibility of picking incorrect peaks to align to the reference masses.

... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...) specifies the width, in separation units, for all the Gaussian pulses used to build the correlating synthetic signal. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width you specify with WidthOfPulsesValue. Choices are any positive value. Default is 10. WidthOfPulsesValue may also be a function handle. The function is evaluated at the respective separation-unit values and returns a variable width for the pulses. Its evaluation should give reasonable values from 0 to max(abs(Range)); otherwise, the function returns an error.
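
The function-handle form might be used as sketched below. The width model (pulses that widen in proportion to m/z) and its constant 0.002 are hypothetical choices made for illustration, and the sample data and reference masses come from the Examples section.

```matlab
load sample_lo_res                 % sample data used in the Examples section
R = [3991.4 4598 7964 9160];       % reference masses from the Examples section

% Hypothetical width model: about 8 at m/z 3991.4 and about 18 at m/z 9160.
widthFcn = @(mz) 0.002 * mz;

YA = msalign(MZ_lo_res, Y_lo_res, R, 'WidthOfPulses', widthFcn);
```

A handle like this keeps the pulses narrow where peaks are tightly packed and wider at high m/z, where larger shifts are more likely.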

Note

Tuning the spread of the Gaussian pulses controls a tradeoff between robustness (wider pulses) and precision (narrower pulses). However, the spread of the pulses is unrelated to the shape of the observed peaks in the signal. The purpose of the pulse spread is to drive the optimization algorithm.

... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...) specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic signal is compared to the sample signal only within these regions, which saves computation time. The size of the window is given in separation units by WidthOfPulsesValue * WindowSizeRatioValue. Choices are any positive value. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.
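
The 60.65% and 4.39% figures follow directly from the Gaussian shape when the pulse width is treated as the standard deviation. The short check below reproduces them for the default values; the variable and helper names are chosen here for illustration.

```matlab
widthOfPulses   = 10;    % default WidthOfPulsesValue
windowSizeRatio = 2.5;   % default WindowSizeRatioValue

% Unit-height Gaussian pulse evaluated at a distance d from its center.
gauss = @(d) exp(-0.5 * (d / widthOfPulses).^2);

atWidth  = gauss(widthOfPulses);                     % exp(-0.5)
atWindow = gauss(widthOfPulses * windowSizeRatio);   % exp(-3.125)

fprintf('%.2f%% at the pulse width, %.2f%% at the window limit\n', ...
        100 * atWidth, 100 * atWindow);
```

With the defaults, the comparison window therefore extends 25 separation units to either side of each reference peak.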

... = msalign(..., 'Iterations', IterationsValue, ...) specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Choices are any positive integer. Default is 5.

... = msalign(..., 'GridSteps', GridStepsValue, ...) specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Choices are any positive integer. Default is 20.

... = msalign(..., 'SearchSpace', SearchSpaceValue, ...) specifies the type of search space. Choices are:

  • 'regular' — Default. Evenly spaced lattice.

  • 'latin' — Random Latin hypercube with GridStepsValue^2 samples.
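
The search options can be combined in a single call. The sketch below assumes the sample data and reference masses from the Examples section; the specific values for Iterations and GridSteps are arbitrary illustrations, not recommendations.

```matlab
load sample_lo_res                 % sample data used in the Examples section
R = [3991.4 4598 7964 9160];       % reference masses from the Examples section

% Random Latin hypercube search with 10^2 samples per iteration, refined 3 times.
YA = msalign(MZ_lo_res, Y_lo_res, R, ...
             'SearchSpace', 'latin', ...
             'GridSteps', 10, ...
             'Iterations', 3);
```

Because the 'latin' search space is random, repeated runs may give slightly different alignments.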

... = msalign(..., 'ShowPlot', ShowPlotValue, ...) controls the display of a plot of an original and aligned signal over the reference masses specified by RefX. Choices are true, false, or I, an integer specifying the index of a signal in Intensities. If set to true, the first signal in Intensities is plotted. Default is:

  • false — When return values are specified.

  • true — When return values are not specified.

[IntensitiesOut, RefXOut] = msalign(..., 'Group', GroupValue, ...) controls the creation of RefXOut, a new vector of separation-unit values to use as reference masses for aligning the peaks. This vector is created by adjusting the values in RefX, based on the sample data from multiple signals in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).

Tip

Set GroupValue to true only if Intensities contains data for a large number of signals, and you are not confident of the separation-unit values used for your reference peaks in RefX. Leave GroupValue set to false if you are confident of the separation-unit values used for your reference peaks in RefX.
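
A minimal sketch of requesting the adjusted reference masses, assuming the sample data and reference masses from the Examples section (the output name RNew is chosen here for illustration):

```matlab
load sample_lo_res                 % sample data used in the Examples section
R = [3991.4 4598 7964 9160];       % reference masses from the Examples section

% Let msalign adjust the reference masses from the data in all signals.
[YA, RNew] = msalign(MZ_lo_res, Y_lo_res, R, 'Group', true);

% Compare the supplied and adjusted reference masses.
disp([R(:) RNew(:)])
```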

Examples

Example 24. Aligning a Mass Spectrum with Three or More Reference Peaks
  1. Load a MAT-file, included with the Bioinformatics Toolbox™ software, that contains sample data, and then define the reference masses R and the weights W for the reference peaks.

    load sample_lo_res
    R = [3991.4 4598 7964 9160];
    W = [60 100 60 100];

  2. Display a color image of the mass spectra before alignment.

    msheatmap(MZ_lo_res,Y_lo_res,'markers',R,'range',[3000 10000])
    title('before alignment')

  3. Align spectra with reference masses and display a color image of mass spectra after alignment.

    YA = msalign(MZ_lo_res,Y_lo_res,R,'weights',W);
    msheatmap(MZ_lo_res,YA,'markers',R,'range',[3000 10000])
    title('after alignment')

Example 25. Aligning a Mass Spectrum with One Reference Peak

It is not recommended to use the msalign function if you have only one reference peak. Instead, use the following procedure, which shifts the X input vector, but does not scale it.

  1. Load sample data and view the first sample spectrum.

    load sample_lo_res
    MZ = MZ_lo_res;
    Y = Y_lo_res(:,1);
    msviewer(MZ, Y)

  2. Use the tall peak around 4000 m/z as the reference peak. To determine the reference peak's m/z value, select the zoom tool, and then click-drag to zoom in on the peak. Right-click in the center of the peak, and then click Add Marker to label the peak with its m/z value.

  3. Shift a spectrum by the difference between RP, the known reference mass of 4000 m/z, and SP, the experimental mass of 4051.14 m/z.

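    RP = 4000;       % known reference mass
    SP = 4051.14;    % experimental mass read from the marker
    % Resample Y so that the peak observed at SP lands at RP.
    YOut = interp1(MZ, Y, MZ-(RP-SP));

  4. Plot the original spectrum in red and the shifted spectrum in dotted blue, and zoom in on the reference peak.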

    plot(MZ,Y,'r',MZ,YOut,'b:')
    xlabel('Mass/Charge (M/Z)')
    ylabel('Relative Intensity')
    legend('Y','YOut')
    axis([3600 4800 -2 60])

References

[1] Monchamp, P., Andrade-Cetto, L., Zhang, J.Y., and Henson, R. (2007) Signal Processing Methods for Mass Spectrometry. In Systems Bioinformatics: An Engineering Case-Based Approach, G. Alterovitz and M.F. Ramoni, eds. (Artech House Publishers).

Version History

Introduced before R2006a