BioIndexedFile Class
Allow quick and efficient access to large text files with nonuniform-size entries
Description
The BioIndexedFile class allows access to text files with nonuniform-size entries, such as sequences, annotations, and cross-references to data sets. It lets you quickly and efficiently access this data without loading the source file into memory.
This class lets you access individual entries or a subset of entries when the source file is too big to fit into memory. You can access entries using indices or keys. You can read and parse one or more entries using provided interpreters or a custom interpreter function.
Construction
BioIFobj = BioIndexedFile(Format,SourceFile) returns a BioIndexedFile object BioIFobj that indexes the contents of SourceFile following the parsing rules defined by Format, where SourceFile and Format specify the names of a text file and a file format, respectively. It also constructs an auxiliary index file to store information that allows efficient, direct access to SourceFile. By default, the index file is stored in the same location as the source file and has the same name as the source file, but with an IDX extension. The BioIndexedFile constructor uses the index file to construct subsequent objects from SourceFile, which saves time.
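As a minimal sketch of the two-argument syntax, the following writes a tiny tab-delimited file and indexes it with the 'TABLE' format. The folder, the file name genes_demo.txt, and the file contents are made up for illustration, and the code assumes that Bioinformatics Toolbox is installed.

```matlab
% Create a small tab-delimited source file in a scratch folder
% (hypothetical name and contents, used only for this sketch).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'genes_demo.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'YAT2\tcarnitine acetyltransferase\n');
fprintf(fid, 'ADH1\talcohol dehydrogenase\n');
fprintf(fid, 'PGK1\tphosphoglycerate kinase\n');
fclose(fid);

% Index the file. The constructor also writes an auxiliary index file.
bif = BioIndexedFile('TABLE', srcFile);
disp(bif.IndexFile)     % where the index file was saved
disp(bif.NumEntries)    % how many entries were indexed

% Constructing a second object from the same source file can reuse
% the index file that already exists.
bif2 = BioIndexedFile('TABLE', srcFile);
```

Checking the IndexFile property is a quick way to confirm where the index ended up before relying on it in later sessions.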
BioIFobj = BioIndexedFile(Format,SourceFile,IndexDir) returns a BioIndexedFile object BioIFobj by specifying the relative or absolute path to a folder to use when searching for or saving the index file.
BioIFobj = BioIndexedFile(Format,SourceFile,IndexFile) returns a BioIndexedFile object BioIFobj by specifying a file name, optionally including a relative or absolute path, to use when searching for or saving the index file.
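The two three-argument syntaxes differ only in whether the third input names a folder or a file. The sketch below tries both on a small made-up file. The folder name indexes and the file name custom_index.idx are hypothetical, and the folder is created up front on the assumption that the constructor expects it to exist already.

```matlab
% Small tab-delimited source file (hypothetical name and contents).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'genes_demo.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'YAT2\tcarnitine acetyltransferase\n');
fprintf(fid, 'ADH1\talcohol dehydrogenase\n');
fclose(fid);

% IndexDir syntax: keep the index file in a separate, existing folder.
idxDir = fullfile(workDir, 'indexes');
mkdir(idxDir);
bifDir = BioIndexedFile('TABLE', srcFile, idxDir);

% IndexFile syntax: choose the index file name (and path) explicitly.
idxFile = fullfile(workDir, 'custom_index.idx');
bifFile = BioIndexedFile('TABLE', srcFile, idxFile);

% Confirm which index file each object is associated with.
disp(bifDir.IndexFile)
disp(bifFile.IndexFile)
```

A separate index folder can be useful when the source file lives in a read-only location.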
BioIFobj = BioIndexedFile(___,Name,Value) returns a BioIndexedFile object BioIFobj by using any input arguments from the previous syntaxes and additional options, specified as one or more Name,Value pair arguments.
Input Arguments
Format | Character vector or string specifying a file format. Choices include general-purpose formats, such as 'TABLE', 'MRTAB', and 'FLAT', and application-specific formats, such as 'SAM'. Note: For all file formats, the file contents must use only ASCII text characters. Non-ASCII characters might not be indexed properly.
SourceFile | Character vector or string specifying the name of a text file. It can include a relative or absolute path.
IndexDir | Character vector or string specifying the relative or absolute path to a folder to use when searching for or saving the index file.
IndexFile | Character vector or string specifying a file name, optionally including a relative or absolute path, to use when searching for or saving the index file.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
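The two ways of writing name-value arguments can be compared side by side. This sketch assumes the option name Verbose, taken from the BioIndexedFile reference page, with a logical value. The source file is a made-up example, and the first call needs R2021a or later.

```matlab
% Small tab-delimited source file (hypothetical name and contents).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'genes_demo.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'YAT2\tcarnitine acetyltransferase\n');
fprintf(fid, 'ADH1\talcohol dehydrogenase\n');
fclose(fid);

% R2021a and later: Name=Value syntax.
bifNew = BioIndexedFile('TABLE', srcFile, Verbose=false);

% Before R2021a (and still valid afterward): quoted name, comma, value.
bifOld = BioIndexedFile('TABLE', srcFile, 'Verbose', false);
```

Both calls request the same option, so the resulting objects should describe the same entries.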
IndexedByKeys | Specifies whether you can access the entries in the object using keys.
MemoryMappedIndex | Specifies whether the constructor stores the indices in the auxiliary index file and accesses them via memory maps, or loads the indices into memory. Tip: If memory is not an issue and you want to maximize performance when accessing entries in the object, keep the indices in memory instead of using memory maps.
Interpreter | Handle to a function that the read method uses to parse entries from the source file.
Verbose | Controls the display of the status of the object construction.
Note
The following name-value pair arguments apply only when both of the following are true:
There is no pre-existing index file associated with your source file.
Your source file has a general-purpose format such as 'TABLE', 'MRTAB', or 'FLAT'.
For source files with application-specific formats, the following name-value pairs are pre-defined and you cannot change them.
KeyColumn | Positive integer specifying the column in the source file that contains the keys.
KeyToken | Character vector or string that occurs in each entry before the key.
HeaderPrefix | Character vector or string specifying a prefix that denotes header lines in the source file so the constructor ignores them when creating the object.
CommentPrefix | Character vector or string specifying a prefix that denotes comment lines in the source file so the constructor ignores them when creating the object.
ContiguousEntries | Specifies whether entries in the source file are on contiguous lines, which means they are not separated by empty lines or comment lines.
TableDelimiter | Character vector or string specifying a delimiter symbol to use as a column separator.
EntryDelimiter | Character vector or string specifying a delimiter symbol to use as an entry separator.
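The parsing options for general-purpose formats are easiest to see on a file that departs from the plain tab-delimited layout. The sketch below indexes a comma-separated file with a header line and keys in the second column. The option names KeyColumn, TableDelimiter, HeaderPrefix, and Verbose are taken from the BioIndexedFile reference page and should be treated as assumptions here, as should the comma being an accepted delimiter. The file name and contents are made up.

```matlab
% Comma-separated file with a '!' header line and keys in column 2
% (hypothetical name and contents).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'annotations_demo.csv');
fid = fopen(srcFile, 'w');
fprintf(fid, '!exported demo annotations\n');
fprintf(fid, 'chrI,YAT2,carnitine acetyltransferase\n');
fprintf(fid, 'chrXV,ADH1,alcohol dehydrogenase\n');
fprintf(fid, 'chrIII,PGK1,phosphoglycerate kinase\n');
fclose(fid);

% No index file exists yet, so the parsing options take effect.
bif = BioIndexedFile('TABLE', srcFile, ...
    'KeyColumn', 2, ...          % keys are in the second column
    'TableDelimiter', ',', ...   % columns are separated by commas
    'HeaderPrefix', '!', ...     % skip lines that start with '!'
    'Verbose', false);

keys = getKeys(bif);             % keys collected during indexing
disp(keys)
```

Because these options apply only when no index file exists yet, deleting a stale index file is one way to make changed options take effect.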
Properties
FileFormat | File format of the source file. This information is read only.
IndexedByKeys | Whether or not the entries in the source file can be indexed by an alphanumeric key. This information is read only.
IndexFile | Path and file name of the auxiliary index file. This information is read only. Use this property to confirm the name and location of the index file associated with the object.
InputFile | Path and file name of the source file. This information is read only. Use this property to confirm the name and location of the source file from which the object was constructed.
Interpreter | Handle to a function used by the read method to parse entries. This interpreter function must accept a character vector of one or more concatenated entries and return a structure or an array of structures containing the interpreted data. Set this property when your source file has a general-purpose format, such as 'TABLE', 'MRTAB', or 'FLAT'.
MemoryMappedIndex | Whether the indices to the source file are stored in a memory-mapped file or in memory.
NumEntries | Number of entries indexed by the object. This information is read only.
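A custom interpreter can be sketched as an anonymous function that turns the text of an entry into a structure. The example below assumes that the Interpreter property can be assigned after construction and that read accepts a key as its second input. The field name Fields and the file contents are made up for illustration.

```matlab
% Small tab-delimited source file (hypothetical name and contents).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'genes_demo.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'YAT2\tcarnitine acetyltransferase\n');
fprintf(fid, 'ADH1\talcohol dehydrogenase\n');
fprintf(fid, 'PGK1\tphosphoglycerate kinase\n');
fclose(fid);

bif = BioIndexedFile('TABLE', srcFile, 'Verbose', false);

% Interpreter for a single entry: trim the line ending, split the text
% at tab characters, and return the pieces in a scalar structure.
bif.Interpreter = @(txt) struct('Fields', ...
    {strsplit(strtrim(txt), sprintf('\t'))});

% read passes the raw entry text through the interpreter.
rec = read(bif, 'ADH1');
disp(rec.Fields)
```

This interpreter handles one entry at a time. An interpreter meant for several concatenated entries would need to split the text into lines first and return an array of structures.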
Methods
getDictionary | Retrieve reference sequence names from SAM-formatted source file associated with BioIndexedFile object |
getEntryByIndex | Retrieve entries from source file associated with BioIndexedFile object using numeric index |
getEntryByKey | Retrieve entries from source file associated with BioIndexedFile object using alphanumeric key |
getIndexByKey | Retrieve indices from source file associated with BioIndexedFile object using alphanumeric key |
getKeys | Retrieve alphanumeric keys from source file associated with BioIndexedFile object |
getSubset | Create object containing subset of elements from BioIndexedFile object |
read | Read one or more entries from source file associated with BioIndexedFile object |
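The access methods listed above can be exercised together on a small file. This is a sketch under stated assumptions: the file name and contents are made up, and each method is assumed to take the object as its first input followed by a key or a numeric index.

```matlab
% Small tab-delimited source file (hypothetical name and contents).
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'genes_demo.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'YAT2\tcarnitine acetyltransferase\n');
fprintf(fid, 'ADH1\talcohol dehydrogenase\n');
fprintf(fid, 'PGK1\tphosphoglycerate kinase\n');
fclose(fid);

bif = BioIndexedFile('TABLE', srcFile, 'Verbose', false);

keys = getKeys(bif);                      % all keys in the object
idx = getIndexByKey(bif, 'PGK1');         % numeric index for a key
entryByKey = getEntryByKey(bif, 'PGK1');  % raw entry text, looked up by key
entryByIdx = getEntryByIndex(bif, idx);   % same entry, looked up by index

% New object that covers only the first and third entries.
sub = getSubset(bif, [1 3]);
disp(sub.NumEntries)
```

The key-based and index-based calls retrieve the same entry here, which makes it easy to check that a key maps to the index you expect.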
Copy Semantics
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® Programming Fundamentals documentation.
Examples
See Also
memmapfile | fastaread | fastqread | samread | genbankread