bioinfo.pipeline.block.SeqFilter
Description
A SeqFilter block enables you to filter sequences based on a specified criterion.
Creation
Syntax
b = bioinfo.pipeline.block.SeqFilter
b = bioinfo.pipeline.block.SeqFilter(options)
b = bioinfo.pipeline.block.SeqFilter(Name=Value)
Description
b = bioinfo.pipeline.block.SeqFilter creates a SeqFilter block.
b = bioinfo.pipeline.block.SeqFilter(options) also specifies additional options.
b = bioinfo.pipeline.block.SeqFilter(Name=Value) specifies additional options as the property names and values of a SeqFilterOptions object. This object is set as the value of the Options property of the block.
Note
The block always overwrites existing output files, unlike the seqfilter function.
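The three creation syntaxes can be sketched as follows. This is a minimal illustration that assumes Bioinformatics Toolbox is installed. The variable names (b1, b2, b3, opts) and the threshold value are arbitrary choices for the example, not recommended settings, and the no-argument SeqFilterOptions constructor call is an assumption inferred from the description of the Options property.

```matlab
% Default block: its Options property holds a default SeqFilterOptions object.
b1 = bioinfo.pipeline.block.SeqFilter;

% Block built from an options object. Calling the options class with no
% arguments is an assumption based on the documented default value of the
% Options property.
opts = bioinfo.pipeline.options.SeqFilterOptions;
b2 = bioinfo.pipeline.block.SeqFilter(opts);

% Block built from name-value arguments, which set properties of the
% block's Options object. The values here are illustrative only.
b3 = bioinfo.pipeline.block.SeqFilter(Method="MinLength",Threshold=50);

% Inspect the option that was set on the block.
b3.Options.Threshold
```

In all three cases the filtering settings end up in the Options property of the block, so they can also be changed later with dot notation.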
Input Arguments
options
— SeqFilter options
bioinfo.pipeline.options.SeqFilterOptions
SeqFilter options, specified as a SeqFilterOptions object.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Note
The following list of arguments is a partial list. For the complete list, refer to the properties of the SeqFilterOptions object.
Method
— Criterion to filter sequences
'MaxNumberLowQualityBases' (default) | 'MaxPercentLowQualityBases' | 'MeanQuality' | 'MinLength'
Criterion to filter sequences, specified as one of the following options. Specify only one filtering criterion per function call.
'MaxNumberLowQualityBases' – applies a maximum threshold on the number of low-quality bases allowed.
'MaxPercentLowQualityBases' – applies a maximum threshold on the percentage of low-quality bases allowed.
'MeanQuality' – applies a minimum threshold on the average base quality across each sequence.
'MinLength' – applies a minimum threshold on the sequence length.
Use this name-value pair argument together with 'Threshold' to specify the appropriate threshold value. Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. See the 'Threshold' option for the default values. If you do not specify 'Threshold', then the function uses the default threshold value of the specified method. For each filtering criterion, the function uses the base quality encoding format specified by the 'Encoding' name-value pair argument.
Threshold
— Threshold value for filtering criterion
scalar | vector
Threshold value for the filtering criterion, specified as a scalar or vector. Use this name-value pair to define the threshold value for the filtering criterion specified by 'Method'.
Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. If you do not specify 'Threshold', then the function uses the default threshold value of the corresponding method. For each filtering criterion, the function uses the encoding format of the base quality specified by the 'Encoding' name-value pair argument.
'Method' | 'Threshold' | Default 'Threshold' value |
---|---|---|
'MaxNumberLowQualityBases' | Two-element vector [V1 V2]. V1 is a nonnegative integer that specifies the maximum number of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a number of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] |
'MaxPercentLowQualityBases' | Two-element vector [V1 V2]. V1 is a scalar between 0 and 100 that specifies the maximum percentage of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a percentage of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] |
'MeanQuality' | Positive scalar that specifies the minimum threshold on the average base quality across each sequence. Any sequence with average base quality less than this value is filtered out. | 0 |
'MinLength' | Nonnegative integer that specifies the minimum threshold on the sequence length allowed. Any sequence with length less than this value is filtered out. | 1 |
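The four criteria in the table can be illustrated on a single read with plain MATLAB arithmetic. This is only a sketch of the rules as described above, not the implementation used by the block. The quality scores and the threshold values are made up for the example, and the variable names are hypothetical.

```matlab
% Hypothetical base quality scores for one read (not taken from a real file).
qual = [30 32 8 25 9 40 12 35];

% 'MaxNumberLowQualityBases' with Threshold = [V1 V2].
V1 = 2;                        % maximum number of low-quality bases allowed
V2 = 10;                       % bases with quality below V2 are low quality
numLow = sum(qual < V2);       % count of low-quality bases in the read
keepByCount = numLow <= V1;    % removed only if the count is greater than V1

% 'MaxPercentLowQualityBases' with Threshold = [20 10].
pctLow = 100 * numLow / numel(qual);
keepByPercent = pctLow <= 20;  % removed if the percentage is greater than 20

% 'MeanQuality' with Threshold = 20.
keepByMean = mean(qual) >= 20; % removed if the mean quality is less than 20

% 'MinLength' with Threshold = 10.
keepByLength = numel(qual) >= 10; % removed if the read is shorter than 10
```

For this read, two bases fall below a quality of 10, so it passes the count criterion with V1 = 2 but fails the percentage criterion at 20 percent, because 2 of 8 bases is 25 percent.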
Properties
ErrorHandler
— Function to handle errors from run method
function handle
Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:
- Structure with these fields:
Field | Description |
---|---|
identifier | Identifier of the error that occurred |
message | Text of the error message |
index | Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension. |
- Input structure passed to the run method when it fails
Data Types: function_handle
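A minimal sketch of an error handler with the two-input signature described above is shown next. The handler name (fallback), the placeholder values it returns, and the demo inputs are all assumptions made for illustration. In particular, which placeholder values count as compatible with the output ports is not specified here, so treat the returned values as a guess to adapt.

```matlab
% Hypothetical handler: ignores the error details and returns a structure
% whose field names match the SeqFilter output ports.
fallback = @(errInfo,inputs) struct( ...
    'FilteredFASTQFiles',strings(0,1), ... % assumed placeholder: no files
    'NumFilteredIn',0, ...
    'NumFilteredOut',0);

% Call the handler directly with made-up inputs to check its shape.
demoError = struct('identifier','demo:failure', ...
    'message','Demo error message','index',1);
demoInputs = struct('FASTQFiles',"reads.fastq");
s = fallback(demoError,demoInputs);
fieldnames(s)

% To attach the handler to a block (requires Bioinformatics Toolbox):
% SF = bioinfo.pipeline.block.SeqFilter;
% SF.ErrorHandler = fallback;
```

Returning a structure with one field per output port is what allows downstream blocks to keep running after this block fails.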
Inputs
— Input ports
structure
This property is read-only.
Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.
The SeqFilter block Inputs structure has the following field:
FASTQFiles — Names of FASTQ-formatted files with sequence and quality information. This input is required. The default value is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.
Data Types: struct
Outputs
— Output ports
structure
This property is read-only.
Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.
The SeqFilter block Outputs structure has the following fields:
FilteredFASTQFiles — Output file names. By default, the name of each output file consists of the input file name followed by the output suffix ('_filtered').
Tip
To see the actual location of these files, first get the results of the block. Then use the unwrap method, as shown in the Examples section.
NumFilteredIn — Number of sequences selected from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredIn corresponds to the order of the input files.
NumFilteredOut — Number of sequences excluded from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredOut corresponds to the order of the input files.
Data Types: struct
Options
— SeqFilter options
bioinfo.pipeline.options.SeqFilterOptions object (default)
SeqFilter options, specified as a SeqFilterOptions object. The default value is a default SeqFilterOptions object.
Object Functions
compile | Perform block-specific additional checks and validations |
copy | Copy array of handle objects |
emptyInputs | Create input structure for use with run method |
eval | Evaluate block object |
run | Run block object |
Examples
Filter Out Low Quality Sequences Using SeqFilter Block
Use a SeqFilter block to filter out sequences with low-quality bases, where a base is considered low-quality if its quality score is less than 10 (default).
import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline
FC = FileChooser(which("SRR005164_1_50.fastq"));
SF = SeqFilter;
P = Pipeline;
addBlock(P,[FC,SF]);
connect(P,FC,SF,["Files","FASTQFiles"]);
run(P);
R = results(P,SF)
R = struct with fields:
FilteredFASTQFiles: [1×1 bioinfo.pipeline.datatypes.File]
NumFilteredIn: 3
NumFilteredOut: 47
Call unwrap on FilteredFASTQFiles to see the location of the output file.
unwrap(R.FilteredFASTQFiles)
ans = "C:\PipelineResults\SeqFilter_1\1\SRR005164_1_50_filtered.fastq"
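The two count outputs together account for every read in the input file, which gives a quick way to summarize how strict the filter was. The sketch below copies the counts from the displayed results into a plain structure so that it runs without the pipeline. The variable names total and fractionKept are introduced here for illustration.

```matlab
% Counts copied from the results displayed above for a single input file.
R = struct('NumFilteredIn',3,'NumFilteredOut',47);

% Total number of reads seen by the block and the fraction that was kept.
total = R.NumFilteredIn + R.NumFilteredOut;
fractionKept = R.NumFilteredIn / total;
fprintf('%d of %d reads kept (%.0f%%)\n', ...
    R.NumFilteredIn,total,100*fractionKept);
```

With the default threshold, a read is removed as soon as it contains a single low-quality base, which is consistent with only a small fraction of reads being kept here.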
Create a Simple Pipeline to Plot Sequence Quality Data
Import the Pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
qcpipeline = Pipeline;
Select an input FASTQ file using a FileChooser block.
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
Create a SeqFilter block.
sequencefilter = SeqFilter;
Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.
sequencefilter.Options.Threshold = [10 20];
Add the blocks to the pipeline.
addBlock(qcpipeline,[fastqfile,sequencefilter]);
Connect the output of the first block to the input of the second block. To do so, you need to first check the input and output port names of the corresponding blocks.
View the Outputs (port of the first block) and Inputs (port of the second block).
fastqfile.Outputs
ans = struct with fields:
Files: [1×1 bioinfo.pipeline.Output]
sequencefilter.Inputs
ans = struct with fields:
FASTQFiles: [1×1 bioinfo.pipeline.Input]
Connect the Files output port of the fastqfile block to the FASTQFiles port of the sequencefilter block.
connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);
Next, create a UserFunction block that calls the seqqcplot function to plot the quality data of the filtered sequence data. In this case, inputFile is the required argument for the seqqcplot function. The required argument name can be anything as long as it is a valid variable name.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");
Alternatively, you can use dot notation to set up your UserFunction block.
qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";
Add the block.
addBlock(qcpipeline,qcplot);
Check the port names of the sequencefilter block and the qcplot block.
sequencefilter.Outputs
ans = struct with fields:
FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
NumFilteredIn: [1×1 bioinfo.pipeline.Output]
NumFilteredOut: [1×1 bioinfo.pipeline.Output]
qcplot.Inputs
ans = struct with fields:
inputFile: [1×1 bioinfo.pipeline.Input]
Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.
connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);
Run the pipeline to plot the sequence quality data.
run(qcpipeline);
Version History
Introduced in R2023a