swalign
Locally align two sequences using Smith-Waterman algorithm
Syntax
Score = swalign(Seq1, Seq2)
[Score, Alignment] = swalign(Seq1, Seq2)
[Score, Alignment, Start] = swalign(Seq1, Seq2)
... = swalign(Seq1, Seq2, ...'Alphabet', AlphabetValue)
... = swalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...)
... = swalign(Seq1, Seq2, ...'Scale', ScaleValue, ...)
... = swalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...)
... = swalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...)
... = swalign(Seq1, Seq2, ...'Showscore', ShowscoreValue, ...)
Input Arguments
Seq1, Seq2 | Amino acid or nucleotide sequences, in letter or integer representation. Tip: For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup. |
AlphabetValue | Character vector or string specifying the type of sequence. Choices are 'AA' (default) or 'NT'. |
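ScoringMatrixValue | Scoring matrix to use for the local alignment, such as 'pam250'. Default is 'BLOSUM50' when AlphabetValue equals 'AA' and 'NUC44' when AlphabetValue equals 'NT'. |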
ScaleValue | Positive value that specifies a scale factor that is applied to the output score. For example, if the output score is initially determined in bits and you enter log(2), swalign returns Score in nats. Tip: Before comparing alignment scores from multiple alignments, ensure the scores are in the same units. You can use the 'Scale' property to control the units of the output score. |
GapOpenValue | Positive value specifying the penalty for opening a gap in the alignment. Default is 8. |
ExtendGapValue | Positive value specifying the penalty for extending a gap using the affine gap penalty scheme. Note: If you specify this value, swalign uses the affine gap penalty scheme. |
ShowscoreValue | Controls the display of the scoring space and the winning path of the alignment. Choices are true or false (default). |
Output Arguments
Score | Optimal local alignment score in bits. |
Alignment | 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. |
Start | 2-by-1 vector of indices indicating the starting point in each sequence for the alignment. |
Description
Score = swalign(Seq1, Seq2) returns the optimal local alignment score in bits. The scale factor used to calculate the score is provided by the scoring matrix.
[Score, Alignment] = swalign(Seq1, Seq2) returns a 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
[Score, Alignment, Start] = swalign(Seq1, Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.
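As a hedged illustration of how the three outputs relate, the sketch below reuses the sequences from the first example on this page and checks that the residues in the first and third rows of Alignment (with gap characters removed) can be found in Seq1 and Seq2 beginning at the positions reported in Start. It assumes that the first element of Start refers to Seq1 and the second to Seq2, and that gaps are shown as '-' as in the examples on this page. The variable names are chosen for illustration only.

```matlab
% Requires Bioinformatics Toolbox. Sequences are taken from the first example.
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';
[score, alignment, start] = swalign(seq1, seq2);

% Remove gap characters from rows 1 and 3 to recover the aligned residues.
aligned1 = strrep(alignment(1,:), '-', '');
aligned2 = strrep(alignment(3,:), '-', '');

% Assumption: start(1) indexes seq1 and start(2) indexes seq2.
span1 = seq1(start(1) : start(1) + numel(aligned1) - 1);
span2 = seq2(start(2) : start(2) + numel(aligned2) - 1);

% Displays 1 (true) when both aligned segments match the original sequences.
disp(isequal(span1, aligned1) && isequal(span2, aligned2))
```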
... = swalign(Seq1, Seq2, ...'PropertyName', PropertyValue, ...) calls swalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
... = swalign(Seq1, Seq2, ...'Alphabet', AlphabetValue) specifies the type of sequences. Choices are 'AA' (default) or 'NT'.
... = swalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the local alignment. Default is:
'BLOSUM50' when AlphabetValue equals 'AA'
'NUC44' when AlphabetValue equals 'NT'
... = swalign(Seq1, Seq2, ...'Scale', ScaleValue, ...) specifies a scale factor that is applied to the output score, thereby controlling the units of the output score. Choices are any positive value.
... = swalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive value. Default is 8.
... = swalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap using the affine gap penalty scheme. Choices are any positive value.
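The gap properties can be combined in one call. The sketch below, using the sequences from the examples on this page, aligns the same pair twice: once with only GapOpen, and once with ExtendGap added, which selects the affine gap penalty scheme described above. The penalty values 8 and 1 are arbitrary illustrative choices, not recommendations. The second call also writes the property names in a different case and order, which the property rules above permit.

```matlab
% Requires Bioinformatics Toolbox.
seq1 = 'HEAGAWGHEE';
seq2 = 'PAWHEAE';

% Only the gap opening penalty is specified.
[scoreOpenOnly, alignOpenOnly] = swalign(seq1, seq2, 'GapOpen', 8);

% Gap opening and gap extension penalties are both specified.
% Property names are case insensitive and may appear in any order.
[scoreAffine, alignAffine] = swalign(seq1, seq2, 'extendgap', 1, 'GAPOPEN', 8);

% Show both results side by side for comparison.
disp(scoreOpenOnly); disp(alignOpenOnly)
disp(scoreAffine);   disp(alignAffine)
```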
... = swalign(Seq1, Seq2, ...'Showscore', ShowscoreValue, ...) controls the display of the scoring space and winning path of the alignment. Choices are true or false (default).
The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(s1:n1) and Seq2(s2:n2), where n1 is a position in Seq1, n2 is a position in Seq2, s1 is any position in Seq1 between 1:n1, and s2 is any position in Seq2 between 1:n2.
The best score for a pairing of specific subsequences is determined
by scoring all possible alignments of the subsequences by summing
matches and gap penalties.
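The way such a scoring space is filled can be sketched with a small dynamic-programming routine. The function below is a hypothetical helper, not part of the toolbox and not the implementation used by swalign: it assumes a simple match/mismatch score instead of a scoring matrix, and a single penalty charged for every gap position. Each cell holds the best score of any local alignment ending at that pair of positions, with 0 meaning that a new alignment starts there.

```matlab
% localScoreSketch.m
% Hypothetical helper for illustration only (not a toolbox function).
% Fills a local-alignment scoring space using a match/mismatch score
% and the same penalty for every gap position.
function [best, F] = localScoreSketch(seq1, seq2, matchScore, mismatchScore, gapPenalty)
    n1 = numel(seq1);
    n2 = numel(seq2);
    F = zeros(n1 + 1, n2 + 1);   % first row and column stand for empty prefixes
    for i = 1:n1
        for j = 1:n2
            if seq1(i) == seq2(j)
                s = matchScore;
            else
                s = mismatchScore;
            end
            F(i+1, j+1) = max([0, ...            % start a new local alignment
                F(i,   j)   + s, ...             % pair seq1(i) with seq2(j)
                F(i,   j+1) - gapPenalty, ...    % seq1(i) opposite a gap in seq2
                F(i+1, j)   - gapPenalty]);      % seq2(j) opposite a gap in seq1
        end
    end
    best = max(F(:));            % highest cell is the optimal local score
end
```

A call such as `[best, F] = localScoreSketch('HEAGAWGHEE', 'PAWHEAE', 2, -1, 2)` returns the full matrix F, which could be displayed with `imagesc(F)` to get a rough picture comparable to the heat map described here. The scores will not match those of swalign, because the scoring scheme is different.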
The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal local alignment. The color of the last point (lower right) of the winning path represents the optimal local alignment score for the two sequences and is the Score output returned by swalign.
Note
The scoring space visually shows tandem repeats, small segments that potentially align, and partial alignments of domains from rearranged sequences.
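A minimal sketch of requesting the scoring space display is shown below. The two nucleotide sequences are made up for illustration, so 'Alphabet' is set to 'NT'. Running it is expected to open a figure containing the heat map and the winning path in addition to returning the usual outputs.

```matlab
% Requires Bioinformatics Toolbox. The sequences are illustrative only.
seq1 = 'ACGTACGTTAGC';
seq2 = 'CGTACGAAGC';

% Align as nucleotides and display the scoring space with the winning path.
[score, alignment] = swalign(seq1, seq2, 'Alphabet', 'NT', 'Showscore', true);

disp(score)
disp(alignment)
```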
Examples
Locally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap properties. Return the optimal local alignment score in bits and the alignment character array.

    [Score, Alignment] = swalign('VSPAGMASGYD','IPGKASYD')

    Score =
        8.6667

    Alignment =
    PAGMASGYD
    | | || ||
    P-GKAS-YD

Locally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5.

    [Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE',...
                                 'ScoringMatrix', 'pam250',...
                                 'GapOpen',5)

    Score =
        8

    Alignment =
    GAWGHE
    :|| ||
    PAW-HE

Locally align two amino acid sequences returning the Score in nat units (nats) by specifying a scale factor of log(2).

    [Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE','Scale',log(2))

    Score =
        6.4694

    Alignment =
    AWGHE
    || ||
    AW-HE

References
[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).
[2] Smith, T., and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197.
Version History
Introduced before R2006a
See Also
aa2int | aminolookup | baselookup | blosum | dayhoff | gonnet | int2aa | int2nt | localalign | multialign | nt2aa | nt2int | nuc44 | nwalign | pam | pdbsuperpose | seqdotplot