# nwalign

Globally align two sequences using Needleman-Wunsch algorithm

## Syntax

`Score = nwalign(Seq1,Seq2)`

`Score = nwalign(Seq1,Seq2,Name=Value)`

`[Score,Alignment] = nwalign(Seq1,Seq2,___)`

`[Score,Alignment,Start] = nwalign(Seq1,Seq2,___)`

## Description

`Score = nwalign(Seq1,Seq2)` returns the optimal global alignment score in bits after aligning two sequences `Seq1` and `Seq2`. The scale factor used to calculate the score is provided by `ScoringMatrix`.

`Score = nwalign(Seq1,Seq2,Name=Value)` uses additional options specified by one or more name-value arguments.

`[Score,Alignment] = nwalign(Seq1,Seq2,___)` also returns a character array `Alignment` showing the alignment of `Seq1` and `Seq2`.

`[Score,Alignment,Start] = nwalign(Seq1,Seq2,___)` also returns a vector of indices `Start` as `[1;1]` indicating the starting point in each sequence for the alignment.

## Examples

Globally align two amino acid sequences using the `BLOSUM50` (default) scoring matrix and the default values for the `GapOpen` and `ExtendGap` properties. Return the optimal global alignment score in bits and the alignment character array.

```
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";
[Score, Alignment] = nwalign(seq1,seq2)
```
```
Score = 7.3333
```
```
Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | | || ||'
    'I-P-GKAS-YD'
```

Specify the `PAM250` scoring matrix and a gap open penalty of `5`.

```
[Score,Alignment] = nwalign(seq1,seq2,ScoringMatrix="PAM250",GapOpen=5)
```
```
Score = 6
```
```
Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | |:|| ||'
    'I-P-GKAS-YD'
```

Return the `Score` in nat units (nats) by specifying a scale factor of `log(2)`.

```
[Score,Alignment] = nwalign(seq1,seq2,Scale=log(2))
```
```
Score = 5.0831
```
```
Alignment = 3x11 char array
    'VSPAGMASGYD'
    ': | | || ||'
    'I-P-GKAS-YD'
```

## Input Arguments

`Seq1`: First amino acid or nucleotide sequence to align, specified as a character vector or string scalar of letters, a vector of integers, or a structure.

Tip

For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

Data Types: `char` | `string` | `double` | `struct`

`Seq2`: Second amino acid or nucleotide sequence to align, specified as a character vector or string scalar of letters, a vector of integers, or a structure. For details, see `Seq1`.

Data Types: `char` | `string` | `double` | `struct`

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `[s,a] = nwalign("HEAGAWGHEE","PAWHEAE",GapOpen=5,ShowScore=true)` sets the gap opening penalty to 5 and displays the scoring space and winning path.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `[s,a] = nwalign("HEAGAWGHEE","PAWHEAE",'GapOpen',5,'ShowScore',true)`

`Alphabet`: Type of sequence, specified as `"AA"` (amino acid) or `"NT"` (nucleotide).

Data Types: `char` | `string`

`ScoringMatrix`: Scoring matrix for the global alignment, specified as a character vector, string scalar, or numeric matrix.

You can specify a scoring matrix name. Valid choices are:

• `"BLOSUM50"` (default for amino acid sequences)

• `"NUC44"` (default for nucleotide sequences)

• `"BLOSUM62"`

• `"BLOSUM30"` increasing by `5` up to `"BLOSUM90"`

• `"BLOSUM100"`

• `"PAM10"` increasing by `10` up to `"PAM500"`

• `"DAYHOFF"`

• `"GONNET"`

Note

The above scoring matrices, provided with the software, also include a scale factor that converts the units of the output score to bits. You can also specify the `Scale` name-value argument to specify an additional scale factor to convert the output score from bits to another unit.

You can also specify a numeric matrix, such as the one returned by the `blosum`, `pam`, `dayhoff`, `gonnet`, or `nuc44` function.

Note

• If you use a scoring matrix that you created or that was created by one of these scoring matrix functions, the matrix does not include a scale factor. The output score is returned in the same units as the scoring matrix. You can use the `Scale` name-value argument to specify a scale factor that converts the output score to another unit.

• If you need to compile `nwalign` into a standalone application or software component using MATLAB® Compiler™, use a numeric matrix instead of the scoring matrix name.

Data Types: `double` | `char` | `string`
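As a minimal sketch of the difference between a matrix name and a numeric matrix, the following aligns the sequences from the Examples section with BLOSUM62 in both forms. It assumes Bioinformatics Toolbox is installed; the variable names `M`, `scoreRaw`, and `scoreBits` are illustrative only.

```matlab
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% Numeric BLOSUM62 matrix returned by the blosum function.
% Per the note above, a numeric matrix carries no scale factor.
M = blosum(62);

% Score is returned in the units of M, not necessarily in bits.
scoreRaw = nwalign(seq1,seq2,ScoringMatrix=M);

% Same matrix selected by name; the built-in scale factor
% converts the output score to bits.
scoreBits = nwalign(seq1,seq2,ScoringMatrix="BLOSUM62");
```

Because the two scores can be in different units, convert them to a common unit (for example with the `Scale` argument) before comparing them.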

`Scale`: Scale factor applied to the output score, specified as a numeric scalar or vector. If you specify a vector, the function returns `Score` as a vector of the same length. By default, there is no scaling or change in the units of the output score.

Use this argument to control the units of the output scores. For example, if the output score is initially determined in bits, you can specify `Scale=log(2)` to return the output score in nats instead.

Note

• If the `ScoringMatrix` argument also specifies a scale factor, then the function uses it first to scale the output score, then applies the scale factor specified by the `Scale` argument to rescale the output score.

• Before comparing alignment scores from multiple alignments, ensure that the scores are in the same units.

Data Types: `double`
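The vector form of `Scale` can return the same alignment score in several units at once. The sketch below, which assumes Bioinformatics Toolbox and reuses the sequences from the Examples section, requests the score in bits (factor `1`) and in nats (factor `log(2)`); the variable names are illustrative.

```matlab
seq1 = "VSPAGMASGYD";
seq2 = "IPGKASYD";

% Score in bits, using only the scale factor built into BLOSUM50.
scoreBits = nwalign(seq1,seq2);

% One score per element of Scale: bits first, then nats.
scores = nwalign(seq1,seq2,Scale=[1 log(2)]);

% The second element should equal the bit score times log(2).
difference = abs(scores(2) - scoreBits*log(2));
```

For these sequences, the Examples section reports 7.3333 bits and 5.0831 nats, which is consistent with multiplying the bit score by `log(2)`.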

`GapOpen`: Penalty for opening a gap, specified as a positive scalar.

Data Types: `double`

`ExtendGap`: Penalty for extending a gap using the affine gap penalty scheme, specified as a positive scalar.

If you specify this value, the function uses the affine gap penalty scheme: it scores the first position of a gap using the `GapOpen` value and each subsequent position of that gap using the `ExtendGap` value. If you do not specify this value, the function scores all gap positions equally, using the `GapOpen` penalty.

Data Types: `double`
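To make the two gap schemes concrete, the following sketch computes the penalty for a single gap of length `k` under the reading given above: the first gap position costs `GapOpen` and each later position costs `ExtendGap`. The helper `gapCost` is hypothetical and not part of `nwalign`, and the penalty values 8 and 1 are arbitrary illustrations.

```matlab
% Hypothetical helper (not part of nwalign): penalty for one gap of length k.
gapCost = @(k,gapOpen,extendGap) gapOpen + (k-1).*extendGap;

% Affine scheme: GapOpen = 8, ExtendGap = 1, gap of length 3.
affineCost = gapCost(3,8,1);   % 8 + 2*1 = 10

% No ExtendGap specified: every gap position costs GapOpen.
linearCost = gapCost(3,8,8);   % 3*8 = 24
```

With Bioinformatics Toolbox installed, the corresponding alignment call would be `nwalign(Seq1,Seq2,GapOpen=8,ExtendGap=1)`. A lower `ExtendGap` makes one long gap cheaper than several short ones.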

`Glocal`: Flag to perform a semiglobal alignment, specified as a numeric or logical `1` (`true`) or `0` (`false`).

In a semiglobal alignment, gaps at the end of the sequences are not penalized.
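A short sketch of the semiglobal option, assuming Bioinformatics Toolbox is installed and that the flag is passed as the `Glocal` name-value argument. It aligns a longer and a shorter sequence both ways so the two results can be compared side by side; the variable names are illustrative.

```matlab
seq1 = "HEAGAWGHEE";
seq2 = "PAWHEAE";

% Standard global alignment: end gaps are penalized like any other gap.
[globalScore,globalAlignment] = nwalign(seq1,seq2);

% Semiglobal alignment: end gaps carry no penalty.
[semiScore,semiAlignment] = nwalign(seq1,seq2,Glocal=true);
```

Semiglobal alignment is typically useful when one sequence is expected to sit inside the other, because the unmatched overhang no longer drags the score down.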

`ShowScore`: Flag to display the scoring space and winning path of the alignment, specified as a numeric or logical `1` (`true`) or `0` (`false`).

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (`n1,n2`) coordinate in the scoring space represents the best score for the pairing of subsequences `Seq1(1:n1)` and `Seq2(1:n2)`, where `n1` is a position in `Seq1` and `n2` is a position in `Seq2`. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal global alignment. The color of the last point (lower right) of the winning path represents the optimal global alignment score for the two sequences and is the `Score` output.

Note

The scoring space visually indicates if there are potential alternate winning paths, which is useful when aligning sequences with big gaps. Visual patterns in the scoring space can also indicate a possible sequence rearrangement.

## Output Arguments

`Score`: Optimal global alignment score, returned as a numeric scalar or vector. It is returned as a vector when you specify a numeric vector for the `Scale` name-value argument.

`Alignment`: Aligned sequences, returned as a character array. The first and third rows are `Seq1` and `Seq2`, respectively. The second row shows symbols representing the optimal global alignment for the two sequences. The symbol `|` indicates amino acids or nucleotides that match exactly. The symbol `:` indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
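Because the alignment is an ordinary three-row character array, its rows can be summarized with basic indexing. The sketch below does not call `nwalign`; it hard-codes the array shown in the first example so that it runs without the toolbox, and the variable names are illustrative.

```matlab
% Alignment array copied from the first example (3x11 char array).
Alignment = ['VSPAGMASGYD'
             ': | | || ||'
             'I-P-GKAS-YD'];

% Count exact matches and related (nonidentical) pairs in the middle row.
numMatches = nnz(Alignment(2,:) == '|');   % 6
numRelated = nnz(Alignment(2,:) == ':');   % 1

% Count columns where either sequence has a gap character.
numGaps = nnz(Alignment(1,:) == '-' | Alignment(3,:) == '-');   % 3

% Fraction of alignment columns that are exact matches.
fractionIdentical = numMatches / size(Alignment,2);   % 6/11
```

The same indexing applies to any `Alignment` output, since the first and third rows always hold the gapped sequences and the second row holds the match symbols.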

`Start`: Starting point in each sequence for the alignment, returned as a vector of indices. Because the function performs a global alignment, `Start` is always returned as `[1;1]`. The function returns this output to be consistent with the `swalign` function.

## References

Durbin, Richard, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1st ed. Cambridge University Press, 1998.