
One of the differences between tall arrays and in-memory MATLAB® arrays is that tall arrays typically remain *unevaluated* until you request that calculations be performed. (The exceptions to this rule include plotting functions like `plot` and `histogram` and some statistical fitting functions like `fitlm`, which automatically evaluate tall array inputs.) While a tall array is in an unevaluated state, MATLAB might not know its size, its data type, or the specific values it contains. However, you can still use unevaluated arrays in your calculations as if the values were known. This allows you to work quickly with large data sets instead of waiting for each command to execute. For this reason, it is recommended that you use `gather` only when you require output.
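A minimal sketch of deferred evaluation follows. It assumes a small synthetic array wrapped with `tall` stands in for a large data set; the data and variable names are illustrative and not part of the original example. Each operation on the tall array is only queued, and the computation happens when `gather` is called.

```matlab
X = tall(rand(1000, 1));   % tall array backed by synthetic in-memory data
Y = 2*X + 1;               % queued: no computation happens yet
m = mean(Y);               % still queued; its value is not known yet
mVal = gather(m);          % evaluation happens here; mVal is an ordinary double
```

Until the last line runs, `Y` and `m` are unevaluated tall arrays; `mVal` is an in-memory scalar.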

MATLAB keeps track of all the operations you perform on unevaluated tall arrays as you enter them. When you eventually call `gather` to evaluate the queued operations, MATLAB uses the history of unevaluated commands to optimize the calculation by minimizing the number of passes through the data. Used properly, this optimization can save huge amounts of execution time by eliminating unnecessary passes through large data sets.
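The sketch below queues three reductions and passes them to a single `gather` call, assuming a synthetic tall array in place of real data. Evaluating the results together gives MATLAB the chance to compute them in fewer passes than three separate `gather` calls would need.

```matlab
X = tall(rand(1e4, 1));   % synthetic stand-in for a large data set
mn = min(X);              % each reduction is queued, not evaluated
mx = max(X);
mu = mean(X);

% One gather call with several inputs lets MATLAB evaluate them together
[mnVal, mxVal, muVal] = gather(mn, mx, mu);
```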

The display of unevaluated tall arrays varies depending on how much MATLAB knows about the array and its values. Three pieces of information are reflected in the display:

- **Array size**: Unknown dimension sizes are represented by the variables `M` or `N` in the display. If no dimension sizes are known, then the size appears as `M×N×...`.
- **Array data type**: If the array has an unknown underlying data type, then its type appears as `tall array`. If the type is known, it is listed as, for example, `tall double array`.
- **Array values**: If the array values are unknown, then they appear as `?`. Known values are displayed.

MATLAB might know all, some, or none of these pieces of information about a given tall array, depending on the nature of the calculation.

For example, if the array has a known data type but unknown size and values, then the unevaluated tall array might look like this:

```
M×N×... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

If the type and relative size are known, then the display could be:

```
1×N tall char array

    ?    ?    ?    ...
```

If some of the data is known, then MATLAB displays the known values:

```
100×3 tall double matrix

    0.8147    0.1622    0.6443
    0.9058    0.7943    0.3786
    0.1270    0.3112    0.8116
    0.9134    0.5285    0.5328
    0.6324    0.1656    0.3507
    0.0975    0.6020    0.9390
    0.2785    0.2630    0.8759
    0.5469    0.6541    0.5502
      :         :         :
      :         :         :
```
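To see how the display changes as MATLAB loses information, you could run a sketch like the following. It assumes a synthetic 100-by-3 array; the exact values shown depend on your random number generator state and MATLAB version, so no specific output is claimed here. A tall array backed by in-memory data has known size, type, and values; a reduction such as `mean` typically displays `?` for its values until evaluated; and filtering rows makes the row count unknown, so it appears as `M`.

```matlab
rng default                       % make the random values reproducible
t = tall(rand(100, 3))            % size, type, and values are known
colMeans = mean(t)                % requires a pass through the data before values are known
u = t(t(:, 1) > 0.5, :)           % filtered rows: the row count is not known until evaluation
```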


The `gather` function is used to evaluate tall arrays. `gather` accepts tall arrays as inputs and returns in-memory arrays as outputs. For this reason, you can think of this function as a bridge between tall arrays and in-memory arrays. For example, you cannot control `if` or `while` statements using a tall logical array, but once the array is evaluated with `gather`, it becomes an in-memory logical value that you can use in these contexts.
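A short sketch of that bridge, assuming a synthetic tall array and an arbitrary threshold of 0.99 chosen for illustration: the tall logical result of `any` is gathered first, and only the in-memory value controls the `if` statement.

```matlab
X = tall(rand(1000, 1));        % synthetic stand-in for a large data set
hasLarge = any(X > 0.99);       % unevaluated tall logical scalar

% Bring the logical into memory before using it in control flow
hasLargeMem = gather(hasLarge);
if hasLargeMem
    disp('At least one value exceeds 0.99.')
else
    disp('No value exceeds 0.99.')
end
```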

`gather` performs all queued operations on a tall array and returns the *entire* result in memory. Since `gather` returns results as in-memory MATLAB arrays, standard memory considerations apply. MATLAB might run out of memory if the result returned by `gather` is too large.

Most of the time you can use `gather` to see the entire result of a calculation, particularly if the calculation includes a reduction operation such as `sum` or `mean`. However, if the result is too large to fit in memory, then you can use `gather(head(X))` or `gather(tail(X))` to perform the calculation and look at only the first or last few rows of the result.
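The following sketch contrasts the two approaches on a small tall column, assumed here as a stand-in for a large one: a reduction returns a single value, while `head` and `tail` limit how many rows `gather` brings into memory.

```matlab
X = tall((1:100)');              % small stand-in for a large tall column
total = gather(sum(X));          % reduction: the result is a single value
firstRows = gather(head(X));     % first 8 rows (the default for head)
lastRows = gather(tail(X, 3));   % last 3 rows
```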


If you enter an erroneous command and `gather` fails to evaluate a tall array variable, then you must delete the variable from your workspace and recreate the tall array using *only* valid commands. This is because MATLAB keeps track of all the operations you perform on unevaluated tall arrays as you enter them. The only way to make MATLAB “forget” about an erroneous statement is to reconstruct the tall array from scratch.
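One way to make that recovery straightforward is to keep the original data source in the workspace and rebuild the tall array from it when evaluation fails. This is a sketch under stated assumptions: the synthetic in-memory array `data` stands in for a datastore or other source, and the queued operation is valid here, so the `catch` branch only shows the recovery pattern.

```matlab
data = rand(1000, 1);          % stand-in for the original data source
X = tall(data);
Y = 2*X;                       % queued operation

try
    result = gather(mean(Y));
catch ME
    % Discard every variable that carries the failed operation history
    warning('Tall evaluation failed: %s', ME.message);
    clear X Y
    % Recreate the tall array from the source using only valid commands
    X = tall(data);
    Y = 2*X;
    result = gather(mean(Y));
end
```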

This example shows what an unevaluated tall array looks like, and how to evaluate the array.

Create a datastore for the data set `airlinesmall.csv`. Convert the datastore into a tall table and then calculate the size.

```matlab
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds)
```

```
tt =

  M×4 tall table

    ArrDelay    DepDelay    Origin    Dest 
    ________    ________    ______    _____

        8          12       'LAX'     'SJC'
        8           1       'SJC'     'BUR'
       21          20       'SAN'     'SMF'
       13          12       'BUR'     'SJC'
        4          -1       'SMF'     'LAX'
       59          63       'LAX'     'SJC'
        3          -2       'SAN'     'SFO'
       11          -1       'SEA'     'LAX'
        :          :          :         :
        :          :          :         :
```

```matlab
s = size(tt)
```

```
s =

  1×2 tall double row vector

    ?    ?

Preview deferred. Learn more.
```

Calculating the size of a tall array returns a small answer (a 1-by-2 vector), but the display indicates that an entire pass through the data is still required to calculate the size of `tt`.

Use the `gather` function to fully evaluate the tall array and bring the results into memory. As the command executes, a dynamic progress display appears in the Command Window, which is particularly helpful with long calculations.

**Note**

Always ensure that the result returned by `gather` will be able to fit in memory. If you use `gather` directly on a tall array without reducing its size using a function such as `mean`, then MATLAB might run out of memory.

```matlab
tableSize = gather(s)
```

```
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.42 sec
Evaluation completed in 0.48 sec

tableSize =

      123523           4
```
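The advice in the Note can be turned into a guard before gathering a full array. In this sketch, the row limit `maxRows` is a hypothetical value you would choose for your own machine, and the synthetic tall array stands in for real data. Checking the row count costs an extra pass through the data, so this pattern trades some time for protection against running out of memory.

```matlab
X = tall(rand(5000, 3));          % synthetic stand-in for a large data set
maxRows = 1e6;                    % hypothetical limit chosen for your machine

nRows = gather(size(X, 1));       % a single scalar: safe to bring into memory
if nRows <= maxRows
    Xmem = gather(X);             % the full result is expected to fit
else
    Xmem = gather(head(X, 1000)); % otherwise bring back only a preview
end

colMeans = gather(mean(X));       % reduced 1-by-3 result, always small
```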

This example shows how several calculations can be combined to minimize the total number of passes through the data.

Create a datastore for the data set `airlinesmall.csv`. Convert the datastore into a tall table.

```matlab
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds)
```

```
tt =

  M×4 tall table

    ArrDelay    DepDelay    Origin    Dest 
    ________    ________    ______    _____

        8          12       'LAX'     'SJC'
        8           1       'SJC'     'BUR'
       21          20       'SAN'     'SMF'
       13          12       'BUR'     'SJC'
        4          -1       'SMF'     'LAX'
       59          63       'LAX'     'SJC'
        3          -2       'SAN'     'SFO'
       11          -1       'SEA'     'LAX'
        :          :          :         :
        :          :          :         :
```

Subtract the mean value of `DepDelay` from `ArrDelay` to create a new variable `AdjArrDelay`. Then calculate the mean value of `AdjArrDelay` and subtract this mean value from `AdjArrDelay`. If these calculations were all evaluated separately, then MATLAB would require four passes through the data.

```matlab
AdjArrDelay = tt.ArrDelay - mean(tt.DepDelay,'omitnan');
AdjArrDelay = AdjArrDelay - mean(AdjArrDelay,'omitnan')
```

```
AdjArrDelay =

  M×1 tall double column vector

    ?
    ?
    ?
    :
    :

Preview deferred. Learn more.
```

Evaluate `AdjArrDelay` and view the first few rows. Because some calculations can be combined, only three passes through the data are required.

```matlab
gather(head(AdjArrDelay))
```

```
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 3: Completed in 0.4 sec
- Pass 2 of 3: Completed in 0.39 sec
- Pass 3 of 3: Completed in 0.23 sec
Evaluation completed in 1.2 sec

ans =

    0.8799
    0.8799
   13.8799
    5.8799
   -3.1201
   51.8799
   -4.1201
    3.8799
```
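To check what the two-step adjustment computes, the sketch below repeats it on a tiny hypothetical table built in memory with `tall(table(...))`; the delay values and the names `tSmall`, `adj`, and `adjMem` are made up for illustration so they do not overwrite the example's variables. Because `mean(a - c, 'omitnan')` equals `mean(a, 'omitnan') - c` for a constant `c`, the second subtraction cancels the first, and the result is `ArrDelay` centered on its own mean.

```matlab
% Hypothetical delay values; the NaN stands in for a missing 'NA' entry
arr = [8; 8; 21; NaN; 4];
dep = [12; 1; 20; 13; -1];
tSmall = tall(table(arr, dep, 'VariableNames', {'ArrDelay', 'DepDelay'}));

% Same two-step adjustment as in the example, queued lazily
adj = tSmall.ArrDelay - mean(tSmall.DepDelay, 'omitnan');
adj = adj - mean(adj, 'omitnan');
adjMem = gather(adj);

% In-memory reference: ArrDelay centered on its own mean
expected = arr - mean(arr, 'omitnan');
maxDiff = max(abs(adjMem - expected))   % max ignores the NaN entry
```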

- Tall arrays remain unevaluated until you request output using `gather`.
- Use `gather` in most cases to evaluate tall array calculations. If you believe the result of the calculations might not fit in memory, then use `gather(head(X))` or `gather(tail(X))` instead.
- Work primarily with unevaluated tall arrays and request output only when necessary. The more unevaluated calculations MATLAB has queued, the more it can optimize to minimize the number of passes through the data.
- If you enter an erroneous tall array command and `gather` fails to evaluate a tall array variable, then you must delete the variable from your workspace and recreate the tall array using *only* valid commands.