blastncbi

Create remote NCBI BLAST report request ID or link to NCBI BLAST report

Syntax

blastncbi(Seq, Program)
RID = blastncbi(Seq, Program)
[RID, RTOE] = blastncbi(Seq, Program)

... blastncbi(Seq, Program, ...'Database', DatabaseValue, ...)
... blastncbi(Seq, Program, ...'Descriptions', DescriptionsValue, ...)
... blastncbi(Seq, Program, ...'Alignments', AlignmentsValue, ...)
... blastncbi(Seq, Program, ...'Filter', FilterValue, ...)
... blastncbi(Seq, Program, ...'Expect', ExpectValue, ...)
... blastncbi(Seq, Program, ...'Word', WordValue, ...)
... blastncbi(Seq, Program, ...'Matrix', MatrixValue, ...)
... blastncbi(Seq, Program, ...'GapOpen', GapOpenValue, ...)
... blastncbi(Seq, Program, ...'ExtendGap', ExtendGapValue, ...)
... blastncbi(Seq, Program, ...'GapCosts', GapCostsValue, ...)
... blastncbi(Seq, Program, ...'Inclusion', InclusionValue, ...)
... blastncbi(Seq, Program, ...'Pct', PctValue, ...)
... blastncbi(Seq, Program, ...'Entrez', EntrezValue, ...)

Input Arguments

Seq

Nucleotide or amino acid sequence specified by any of the following:

  • GenBank®, GenPept, or RefSeq accession number

  • GI sequence identifier

  • FASTA file

  • URL pointing to a sequence file

  • String

  • Character array

  • MATLAB® structure containing a Sequence field

Program

String specifying a BLAST program. Choices are:

  • 'blastn' — Search nucleotide query versus nucleotide database.

  • 'blastp' — Search protein query versus protein database.

  • 'blastx' — Search translated nucleotide query versus protein database.

  • 'megablast' — Quickly search for highly similar nucleotide sequences.

  • 'psiblast' — Search protein query using position-specific iterative BLAST.

  • 'tblastn' — Search protein query versus translated nucleotide database.

  • 'tblastx' — Search translated nucleotide query versus translated nucleotide database.

DatabaseValue

String specifying a database. Compatible databases depend on the type of sequence specified by Seq, and the program specified by Program.

For the database choices for nucleotide sequences and amino acid sequences, see the lists in the Description section.

DescriptionsValue

Value specifying the number of short descriptions to include in the report. Default is 100, or 500 when Program = 'psiblast'.

AlignmentsValue

Value specifying the number of sequences for which high-scoring segment pairs (HSPs) are reported. Default is 100, or 500 when Program = 'psiblast'.

FilterValue

String specifying a filter. Possible choices are:

  • 'L' (default) — Low complexity.

  • 'R' — Human repeats.

  • 'm' — Mask for lookup table.

  • 'lcase' — Turn on the lowercase mask.

Choices vary depending on the selected Program. For more information, see the table Choices for Optional Properties by BLAST Program.

ExpectValue

Value specifying the statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10.

WordValue

Value specifying a word length for the query sequence.

Choices for amino acid sequences are:

  • 2

  • 3 (default)

Choices for nucleotide sequences are:

  • 7

  • 11 (default)

  • 15

Choices when Program = 'megablast' are:

  • 11

  • 12

  • 16

  • 20

  • 24

  • 28 (default)

  • 32

  • 48

  • 64

MatrixValue

String specifying the substitution matrix for amino acid sequences only. The matrix assigns the score for a possible alignment of any two amino acid residues. Choices are:

  • 'PAM30'

  • 'PAM70'

  • 'BLOSUM45'

  • 'BLOSUM62' (default)

  • 'BLOSUM80'

GapOpenValue

Integer that specifies the penalty for opening a gap in the alignment of amino acid sequences.

Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.

ExtendGapValue

Integer that specifies the penalty for extending a gap in the alignment of amino acid sequences.

Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.

GapCostsValue

Vector containing two integers: the first is the penalty for opening a gap, and the second is the penalty for extending the gap, in the alignment of amino acid sequences.

Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.

InclusionValue

Value specifying the statistical significance threshold for including a sequence in the Position-Specific Scoring Matrix (PSSM) created by PSI-BLAST for the subsequent iteration. Default is 0.005.

    Note:   Specify an InclusionValue only when Program = 'psiblast'.

PctValue

Value specifying the percent identity and the corresponding match and mismatch score for matching existing sequences in a public database. Choices are:

  • None

  • 99 (default): 99, 1, -3

  • 98: 98, 1, -3

  • 95: 95, 1, -3

  • 90: 90, 1, -2

  • 85: 85, 1, -2

  • 80: 80, 2, -3

  • 75: 75, 4, -5

  • 60: 60, 1, -1

    Note:   Specify a PctValue only when Program = 'megablast'.

EntrezValue

String specifying Entrez query syntax to search a subset of the selected database.

    Tip   Use this property to limit searches based on molecule types, sequence lengths, organisms, and so on.

Output Arguments

RID

Request ID for the NCBI BLAST report.

RTOE

Request Time Of Execution, which is an estimate of the time (in minutes) until completion.

    Tip   Use this time estimate with the 'WaitTime' property when using the getblast function.

Description

The Basic Local Alignment Search Tool (BLAST) offers a fast and powerful comparative analysis of protein and nucleotide sequences against known sequences in online databases.

blastncbi(Seq, Program) sends a BLAST request to NCBI for Seq, a nucleotide or amino acid sequence, using Program, a specified BLAST program, and then returns a link to the NCBI BLAST report in the Command Window. For help in selecting an appropriate BLAST program, visit:

http://blast.ncbi.nlm.nih.gov/producttable.shtml

RID = blastncbi(Seq, Program) returns RID, the Request ID for the report.

[RID, RTOE] = blastncbi(Seq, Program) returns both RID, the Request ID for the NCBI BLAST report, and RTOE, the Request Time Of Execution, which is an estimate of the time until completion.

    Tip   Use RTOE with the 'WaitTime' property when using the getblast function.
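The tip above can be sketched as follows. This is a minimal example, not a definitive workflow: it reuses the accession number 'AAA59174' from the Examples section, requires a network connection to NCBI, and assumes that getblast accepts RTOE directly as the value of its 'WaitTime' property.

```matlab
% Submit the search and keep both the request ID and the time estimate.
[RID, RTOE] = blastncbi('AAA59174', 'blastp', 'Expect', 1e-10);

% Pass the estimate to getblast as its 'WaitTime' value so that it
% waits for the report instead of querying NCBI immediately.
report = getblast(RID, 'WaitTime', RTOE);
```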

... blastncbi(..., 'PropertyName', PropertyValue,...) calls blastncbi with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are explained below. Additional information on these optional properties can be found at:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml


... blastncbi(Seq, Program, ...'Database', DatabaseValue, ...) specifies a database for the alignment search. Compatible databases depend on the type of sequence specified by Seq, and the program specified by Program.

Database choices for nucleotide sequences are:

  • 'nr' (default)

  • 'refseq_rna'

  • 'refseq_genomic'

  • 'est'

  • 'est_human'

  • 'est_mouse'

  • 'est_others'

  • 'gss'

  • 'htgs'

  • 'pat'

  • 'pdb'

  • 'month'

  • 'alu_repeats'

  • 'dbsts'

  • 'chromosome'

  • 'wgs'

  • 'env_nt'

Database choices for amino acid sequences are:

  • 'nr' (default)

  • 'refseq_protein'

  • 'swissprot'

  • 'pat'

  • 'month'

  • 'pdb'

  • 'env_nr'

For help in selecting an appropriate database, visit:

http://blast.ncbi.nlm.nih.gov/producttable.shtml

... blastncbi(Seq, Program, ...'Descriptions', DescriptionsValue, ...) specifies the number of short descriptions to include in the report, when you do not specify return values.

... blastncbi(Seq, Program, ...'Alignments', AlignmentsValue, ...) specifies the number of sequences for which high-scoring segment pairs (HSPs) are reported, when you do not specify return values.

... blastncbi(Seq, Program, ...'Filter', FilterValue, ...) specifies the filter to apply to the query sequence.

... blastncbi(Seq, Program, ...'Expect', ExpectValue, ...) specifies a statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10. You can learn more about the statistics of local sequence comparison at:

http://blast.ncbi.nlm.nih.gov/tutorial/Altschul-1.html#head2

... blastncbi(Seq, Program, ...'Word', WordValue, ...) specifies a word size for the query sequence.

... blastncbi(Seq, Program, ...'Matrix', MatrixValue, ...) specifies the substitution matrix for amino acid sequences only. This matrix assigns the score for a possible alignment of two amino acid residues.

... blastncbi(Seq, Program, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment of amino acid sequences. Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.

For more information about allowed gap penalties for various matrices, see:

http://blast.ncbi.nlm.nih.gov/html/sub_matrix.html

... blastncbi(Seq, Program, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap greater than one space in the alignment of amino acid sequences. Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.

... blastncbi(Seq, Program, ...'GapCosts', GapCostsValue, ...) specifies the penalty for opening and extending a gap in the alignment of amino acid sequences. GapCostsValue is a vector containing two integers: the first is the penalty for opening a gap, and the second is the penalty for extending the gap. Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapCosts Property by Matrix.
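As an illustration of pairing 'Matrix' with 'GapCosts', the following sketch requests a protein search with the PAM70 matrix and a gap cost pair taken from the PAM70 row of the table Choices for the GapCosts Property by Matrix. The accession number is the one used in the Examples section, and the particular pair [11 1] is an arbitrary choice for illustration.

```matlab
% Protein search with a non-default matrix and a matching gap cost pair:
% [11 1] means a penalty of 11 to open a gap and 1 to extend it.
RID = blastncbi('AAA59174', 'blastp', ...
                'Matrix', 'PAM70', ...
                'GapCosts', [11 1]);
```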

... blastncbi(Seq, Program, ...'Inclusion', InclusionValue, ...) specifies the statistical significance threshold for including a sequence in the Position-Specific Scoring Matrix (PSSM) created by PSI-BLAST for the subsequent iteration. Default is 0.005.

    Note:   Specify an InclusionValue only when Program = 'psiblast'.
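For example, a PSI-BLAST request with a stricter inclusion threshold than the default might look like the sketch below. The threshold 0.001 is an arbitrary illustrative value, and the accession number is borrowed from the Examples section.

```matlab
% PSI-BLAST search that admits sequences into the PSSM only when their
% significance is at or below 0.001 (the default threshold is 0.005).
RID = blastncbi('AAA59174', 'psiblast', 'Inclusion', 0.001);
```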

... blastncbi(Seq, Program, ...'Pct', PctValue, ...) specifies the percent identity and the corresponding match and mismatch score for matching existing sequences in a public database. Default is 99.

    Note:   Specify a PctValue only when Program = 'megablast'.
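A MEGABLAST request that combines 'Pct' with 'Word' could be sketched as follows. The query 'NM_000518' is an assumption made for illustration (a RefSeq nucleotide accession that does not appear elsewhere on this page), and the values 95 and 32 are simply picked from the listed choices.

```matlab
% MEGABLAST search for highly similar nucleotide sequences, using the
% 95 percent identity setting and a word size of 32.
RID = blastncbi('NM_000518', 'megablast', ...
                'Pct', 95, ...
                'Word', 32);
```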

... blastncbi(Seq, Program, ...'Entrez', EntrezValue, ...) specifies Entrez query syntax to search a subset of the selected database.
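As a sketch of restricting a search by organism, the following request limits a protein search of the 'swissprot' database to human sequences. The Entrez term 'Homo sapiens[Organism]' is an illustrative choice; any valid Entrez query can be substituted.

```matlab
% Search only the human subset of the swissprot database.
RID = blastncbi('AAA59174', 'blastp', ...
                'Database', 'swissprot', ...
                'Entrez', 'Homo sapiens[Organism]');
```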

Choices for Optional Properties by BLAST Program

When BLAST program is..., then choices for the following properties are...

'blastn'

    Database: 'nr' (default), 'est', 'est_human', 'est_mouse', 'est_others', 'gss', 'htgs', 'pat', 'pdb', 'month', 'alu_repeats', 'dbsts', 'chromosome', 'wgs', 'refseq_rna', 'refseq_genomic', 'env_nt'
    Filter: 'L' (default), 'R', 'm', 'lcase'
    Word: 7, 11 (default), 15

'megablast'

    Database: same as for 'blastn'
    Filter: 'L'
    Word: 11, 12, 16, 20, 24, 28 (default), 32, 48, 64
    Pct: None, 99 (default), 98, 95, 90, 85, 80, 75, 60

'tblastn'

    Database: same as for 'blastn'
    Filter: 'L' (default), 'm', 'lcase'
    Word: 2, 3 (default)
    Matrix: 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80'
    GapCosts: see the table Choices for the GapCosts Property by Matrix

'tblastx'

    Database: same as for 'blastn'
    Filter: 'L' (default), 'R', 'm', 'lcase'
    Word, Matrix, GapCosts: same as for 'tblastn'

'blastp', 'blastx', 'psiblast'

    Database: 'nr' (default), 'swissprot', 'pat', 'pdb', 'month', 'refseq_protein', 'env_nr'
    Filter: 'L' (default), 'm', 'lcase'
    Word, Matrix, GapCosts: same as for 'tblastn'


Choices for the GapCosts Property by Matrix

When substitution matrix is...    Then choices for GapCosts are...

'PAM30'                           [7 2], [6 2], [5 2], [10 1], [9 1] (default), [8 1]
'PAM70', 'BLOSUM80'               [8 2], [7 2], [6 2], [11 1], [10 1] (default), [9 1]
'BLOSUM45'                        [13 3], [12 3], [11 3], [10 3], [15 2] (default), [14 2], [13 2], [12 2], [19 1], [18 1], [17 1], [16 1]
'BLOSUM62'                        [9 2], [8 2], [7 2], [12 1], [11 1] (default), [10 1]
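Because the allowed gap cost pairs differ by matrix, it can help to check a pair before sending a request. The following is a minimal sketch of such a check, not part of blastncbi: the names allowed and isValidGapCosts are hypothetical, the pairs are copied from the table above, and 'BLOSUM80' is assumed to share the 'PAM70' choices because the table lists no separate values for it.

```matlab
% Allowed [open extend] pairs for each matrix, one pair per row.
allowed = containers.Map();
allowed('PAM30')    = [7 2; 6 2; 5 2; 10 1; 9 1; 8 1];
allowed('PAM70')    = [8 2; 7 2; 6 2; 11 1; 10 1; 9 1];
allowed('BLOSUM80') = allowed('PAM70');   % assumption: same as PAM70
allowed('BLOSUM45') = [13 3; 12 3; 11 3; 10 3; 15 2; 14 2; ...
                       13 2; 12 2; 19 1; 18 1; 17 1; 16 1];
allowed('BLOSUM62') = [9 2; 8 2; 7 2; 12 1; 11 1; 10 1];

% True when gapCosts matches one of the rows listed for matrixName.
isValidGapCosts = @(matrixName, gapCosts) ...
    ismember(gapCosts, allowed(matrixName), 'rows');

okPair  = isValidGapCosts('PAM70', [11 1]);     % listed for PAM70
badPair = isValidGapCosts('BLOSUM62', [15 2]);  % listed only for BLOSUM45
```

A pair that passes this check can then be passed to blastncbi through the 'GapCosts' property together with the corresponding 'Matrix' value.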

Examples

% Get a sequence from the Protein Data Bank and create
% a MATLAB structure.
S = getpdb('1CIV')

% Use the structure as input for a BLAST search with an
% expectation of 1e-10.
blastncbi(S,'blastp','expect',1e-10)

% Click the URL link (Link to NCBI BLAST Request) to go
% directly to the NCBI request.

% You can also perform a typical BLAST protein search directly
% with an accession number and an alternative scoring matrix.
RID = blastncbi('AAA59174','blastp','matrix','PAM70',...
                             'expect',1e-10)

% You can pass the RID to GETBLAST to parse the report and
% load it into a MATLAB structure.
Struct = getblast(RID)

References

[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
