swalign

Locally align two sequences using the Smith-Waterman algorithm

Syntax

Score = swalign(Seq1,Seq2)

[___, Alignment] = swalign(Seq1,Seq2)

[___,___,Start] = swalign(Seq1,Seq2)

swalign(___,Name,Value)

Description

Score = swalign(Seq1,Seq2) returns the optimal local alignment score in bits. The scale factor used to calculate the score is provided by the scoring matrix.

[___, Alignment] = swalign(Seq1,Seq2) returns a 3-by-N character array showing the two sequences, Seq1,Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
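As a rough illustration of how the symbol row can be read programmatically, the following sketch counts exact matches, related residues, and gap columns. It uses only base MATLAB and hard-codes the Alignment array shown in the PAM250 example on this page. The variable names (nIdentical, nSimilar, nGapCols, pctIdentity) are illustrative and not part of swalign.

```matlab
% Alignment array copied from the PAM250 example on this page; in practice
% it would be the second output of swalign.
Alignment = ['GAWGHE'; ':|| ||'; 'PAW-HE'];

nIdentical = sum(Alignment(2,:) == '|');   % columns with an exact match
nSimilar   = sum(Alignment(2,:) == ':');   % columns with related residues
% Columns where either sequence row contains a gap character
nGapCols   = sum(Alignment(1,:) == '-' | Alignment(3,:) == '-');

% Percent identity over the aligned columns (one possible definition)
pctIdentity = 100 * nIdentical / size(Alignment, 2);
```

Percent identity can be defined in more than one way. This sketch divides by the number of alignment columns, gaps included.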

[___,___,Start] = swalign(Seq1,Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.
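One way to use Start is to recover the aligned region from each original sequence. The sketch below uses only base MATLAB and hard-codes the Alignment and Start values shown in the Scale example on this page. The helper variables (len1, len2, sub1, sub2) are illustrative names, not swalign outputs.

```matlab
Seq1 = 'HEAGAWGHEE';
Seq2 = 'PAWHEAE';

% Values copied from the 'Scale',log(2) example on this page; in practice
% they would be the second and third outputs of swalign.
Alignment = ['AWGHE'; '|| ||'; 'AW-HE'];
Start = [5; 2];

% Number of residues each sequence contributes (gap characters excluded)
len1 = sum(Alignment(1,:) ~= '-');
len2 = sum(Alignment(3,:) ~= '-');

% Ungapped subsequences covered by the local alignment
sub1 = Seq1(Start(1) : Start(1) + len1 - 1);   % 'AWGHE'
sub2 = Seq2(Start(2) : Start(2) + len2 - 1);   % 'AWHE'
```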

swalign(___,Name,Value) calls swalign with optional properties specified as name-value pairs. You can specify one or more pairs in any order.

Examples

Locally Align Two Amino Acid Sequences and Return Optimal Local Alignment Score

Locally align two amino acid sequences using the BLOSUM50 scoring matrix. Return the optimal local alignment score.

[Score] = swalign('VSPAGMASGYD','IPGKASYD')

Score =

    8.6667

Locally Align Two Amino Acid Sequences Specifying Scoring Matrix and Open Gap Penalty

Locally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5. Return the optimal local alignment score in bits and the alignment character array.

[Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE',...
                             'ScoringMatrix', 'pam250',...
                             'GapOpen',5)

Score =

     8


Alignment =

  3×6 char array

    'GAWGHE'
    ':|| ||'
    'PAW-HE'

Locally Align Two Amino Acid Sequences Specifying Scale Factor and Alignment Score Units

Locally align two amino acid sequences, returning the score in nat units (nats) by specifying a scale factor of log(2). Return the optimal local alignment score in nats, the alignment character array, and the starting indices of the alignment in each sequence.

[Score, Alignment, Start] = swalign('HEAGAWGHEE','PAWHEAE',...
    'Scale',log(2))

Score =

    6.4694


Alignment =

  3×5 char array

    'AWGHE'
    '|| ||'
    'AW-HE'


Start =

     5
     2

Input Arguments

`Seq1,Seq2` — Amino acid or nucleotide sequences
character vector | string | integer vector | structure

Amino acid or nucleotide sequences, specified as one of the following:

Character vector or string of letters representing amino acids or nucleotides, such as returned by int2aa or int2nt.
Vector of integers representing amino acids or nucleotides, such as returned by aa2int or nt2int.
Structure containing a Sequence field.

Tip

For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

Example: 'HEAGAWGHEE','PAWHEAE'

Example: 'VSPAGMASGYD','IPGKASYD'

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: swalign('HEAGAWGHEE','PAWHEAE','Scale',log(2))

`Alphabet` — Type of sequence
`AA` (default) | `NT`

The type of sequence, specified as AA (amino acid) or NT (nucleotide) in a character vector or string.

Example: "AA"

`ScoringMatrix` — Local alignment scoring matrix
`BLOSUM50` (default) | `NUC44` | `BLOSUM62` | `BLOSUM30` increasing by 5 up to `BLOSUM90` | `BLOSUM100` | `PAM10` increasing by 10 up to `PAM500` | `DAYHOFF` | `GONNET`

The scoring matrix used for the local alignment, specified as one of the following:

BLOSUM50 — Default when Alphabet is AA.
NUC44 — Default when Alphabet is NT.
Name of another scoring matrix provided with the software: DAYHOFF, GONNET, BLOSUM62, BLOSUM30 through BLOSUM90 in increments of 5, BLOSUM100, or PAM10 through PAM500 in increments of 10.
Matrix representing the scoring matrix to use for the local alignment, such as returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

Note

The named scoring matrices provided with the software also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the Scale property to specify an additional scale factor to convert the output score from bits to another unit.

Note

If you supply a matrix, whether you created it yourself or obtained it from one of the functions listed above, the matrix does not include a scale factor. The output score is returned in the same units as the scoring matrix. You can use the Scale property to specify a scale factor to convert the output score to another unit.

Note

If you need to compile swalign into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a character vector or string for ScoringMatrix.

Example: "BLOSUM75"

Example: "PAM420"

Example: "GONNET"

Example: "DAYHOFF"

`Scale` — Output score scale factor
1 (default) | any positive value

The scale factor applied to the output score, specified as any positive value. This factor controls the units of the output score.

For example, if the output score is initially determined in bits, and you enter log(2), then swalign returns the Score in nats.

Note

If the ScoringMatrix property also specifies a scale factor, then swalign uses it first to scale the output score, then applies the provided scale factor to rescale the output score.
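The unit conversion described above is plain arithmetic and can be checked without calling swalign. The sketch below takes the score reported in the Scale example on this page (6.4694 nats) and divides by log(2) to recover the equivalent score in bits. It assumes that the Scale value multiplies the score, as the bits-to-nats description above implies.

```matlab
% Score reported on this page for swalign(...,'Scale',log(2)), in nats
scoreNats = 6.4694;

% Undo the extra scale factor to express the same score in bits.
% Assumption: Scale multiplies the score, so bits = nats / log(2).
scoreBits = scoreNats / log(2);

% Converting back reproduces the reported value
scoreNatsAgain = scoreBits * log(2);
```

Here scoreBits evaluates to approximately 9.3333, the value the same alignment would have without the additional scale factor.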

Tip

Before comparing alignment scores from multiple alignments, ensure the scores are in the same units. You can use the Scale property to control the units of the output scores.

Example: 5

Example: log(2)

`GapOpen` — Penalty for opening gap in alignment
`8` (default) | any positive value

The penalty for opening a gap in the alignment, specified as any positive value.

Example: 16

`ExtendGap` — Penalty for extending gap in alignment
any positive value

Penalty for extending a gap using the affine gap penalty scheme, specified as any positive value.

Note

If you specify this value, swalign uses the affine gap penalty scheme. That is, it scores the first position of a gap using the provided GapOpen value and scores each subsequent position of that gap using the ExtendGap value. If you do not specify this value, swalign scores all gap positions equally, using the GapOpen penalty.
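The difference between the two schemes can be sketched with a small calculation. The example below assumes the standard affine reading, in which the first position of a gap costs GapOpen and each further position in the same gap costs ExtendGap. The penalty values and gap length are hypothetical, chosen only for illustration.

```matlab
gapOpen   = 8;   % hypothetical GapOpen value (the documented default)
extendGap = 1;   % hypothetical ExtendGap value
gapLength = 3;   % a single gap spanning three positions

% ExtendGap not specified: every gap position costs GapOpen
linearCost = gapLength * gapOpen;

% ExtendGap specified: first position costs GapOpen, the rest ExtendGap
affineCost = gapOpen + (gapLength - 1) * extendGap;
```

With these values, the linear scheme charges 24 for the gap and the affine scheme charges 10, so long gaps are penalized far less heavily when ExtendGap is small.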

Example: 12

`Showscore` — Display scoring space and winning path of alignment
`false` (default) | `true`

Option to display the scoring space and winning path of the alignment, specified as true or false.

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(s1:n1) and Seq2(s2:n2), where n1 is a position in Seq1, n2 is a position in Seq2, s1 is any position in Seq1 between 1:n1, and s2 is any position in Seq2 between 1:n2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal local alignment. The color of the last point (lower right) of the winning path represents the optimal local alignment score for the two sequences and is the Score output returned by swalign.
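To make the idea of a scoring space concrete, here is a minimal base MATLAB sketch of the Smith-Waterman recurrence. It is not the swalign implementation: it assumes a simple match/mismatch score and a linear gap penalty instead of a scoring matrix, and all names (F, matchScore, mismatchScore, gapPenalty) are illustrative.

```matlab
% Illustrative sequences and scoring parameters (assumptions, not defaults)
seq1 = 'GAWGHE';
seq2 = 'PAWHE';
matchScore = 2; mismatchScore = -1; gapPenalty = 2;

n1 = numel(seq1); n2 = numel(seq2);
% F(i+1,j+1) holds the best score of any local alignment ending at
% seq1(i) and seq2(j); the first row and column represent empty prefixes.
F = zeros(n1 + 1, n2 + 1);
for i = 1:n1
    for j = 1:n2
        if seq1(i) == seq2(j)
            s = matchScore;
        else
            s = mismatchScore;
        end
        F(i+1, j+1) = max([0, ...              % start a new local alignment
            F(i, j) + s, ...                   % align seq1(i) with seq2(j)
            F(i, j+1) - gapPenalty, ...        % gap in seq2
            F(i+1, j) - gapPenalty]);          % gap in seq1
    end
end

% The optimal local alignment score is the largest entry in the space
bestScore = max(F(:));
[endRow, endCol] = find(F == bestScore, 1);
```

For these inputs the best score is 6, reached where the final E of each sequence is paired. Calling imagesc(F(2:end,2:end)) would render the matrix as a heat map similar in spirit to the scoring space display.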

Note

The scoring space visually shows tandem repeats, small segments that potentially align, and partial alignments of domains from rearranged sequences.

Example: true

Output Arguments

`Score` — Optimal local alignment score
double (default)

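Optimal local alignment score. The score is in bits unless you change the units with the Scale argument or supply a scoring matrix that does not include a scale factor.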

Example: 8.6667

`Alignment` — Optimal local alignment
3-by-N character array

3-by-N character array showing the two sequences, Seq1,Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row.

Example: 'AWGHE' '|| ||' 'AW-HE'

`Start` — Starting point in alignment
2-by-1 indices vector

2-by-1 vector of indices indicating the starting point in each sequence for the alignment.

Example: 3 2

References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

[2] Smith, T., and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197.

Version History

Introduced before R2006a
