# How to understand and apply arrayfun - getting intuition beyond the documentation.

5 views (last 30 days)

Show older comments

I've been trying to write a piece of code based on someone else's work for the same project, and I keep struggling with understanding how/ why their use of arrayfun is able to achieve the desired task.

I looked into the documentation and ran some of my own tests, but was still left with some confusion.

Basically, the data set being worked with is a 2 column matrix (arbitrary number of rows) that corresponds to a set of ordered pair data points. The goal is to find the mean of all the "y" (2nd column) measurements that correspond to the same "x" (1st column) value. So if theres 4 data points with an x value of 6, we'd want the mean of all their respective y values.

What my collaborator did was essentially as follows (I just used a random, simple data set here). Specifically, I'm confused about the L == x and how the unique(L) is being correlated to x. I understand the unique function overall I think, just not how it realtes to x here:

X = [1, 12; 1, 13; 1, 12; 2, 4; 2, 4; 2, 4; 3, 9; 3, 9; 3, 9];

L = X(:,1);

M = arrayfun(@(x) mean(X(L == x, 2)), unique(L));

##### 0 Comments

### Accepted Answer

Steven Lord
on 6 Jul 2022

Here's your original code:

X = [1, 12; 1, 13; 1, 12; 2, 4; 2, 4; 2, 4; 3, 9; 3, 9; 3, 9];

L = X(:,1);

U = unique(L); % Define this once, use it in multiple places

M = arrayfun(@(x) mean(X(L == x, 2)), U)

Another way to write this arrayfun call using a for loop would be:

M2 = zeros(numel(U), 1); % Preallocate the result

for k = 1:numel(U)

% Choose the rows where the first column of X is equal to a specific

% value from U, then extract the data in the second column of X for

% those rows

column2OfSelectedRows = X(L == U(k), 2);

% Take the mean of that data and store it in the appropriate row of M2

M2(k, 1) = mean(column2OfSelectedRows);

end

We can check that M and M2 are equal.

isequal(M, M2)

But there's another way to do this. What you want to do is use L as a grouping variable and summarize the rows in X that have the same value for that grouping variable. The groupsummary function will let you do this. Note that we're not using U or the unique function here; groupsummary will make one group for each unique value in the grouping variable.

groupingVariable = L;

dataVariable = X(:, 2);

% Apply the mean function to each subset of values in dataVariable whose

% corresponding elements in groupingVariable have the same value

M3 = groupsummary(dataVariable, groupingVariable, @mean)

Check:

isequal(M, M3)

##### 3 Comments

Jan
on 7 Jul 2022

### More Answers (1)

Jan
on 6 Jul 2022

Edited: Jan
on 6 Jul 2022

You can replace arrayfun by a loop:

X = [1, 12; 1, 13; 1, 12; 2, 4; 2, 4; 2, 4; 3, 9; 3, 9; 3, 9];

L = X(:,1);

M = arrayfun(@(x) mean(X(L == x, 2)), unique(L));

% As a loop:

uL = unique(L); % 2nd input of arrayfun

M = zeros(1, numel(uL)); % pre-allocate the output

for k = 1:numel(uL) % Loop over elements

M(k) = mean(X(L == uL(k), 2)); % Function to be applied

end

So arrayfun applies the function to each element of the 2nd argument. The result of the k.th argument is replied as k.th output.

A speed comparison:

X = randi([1, 255], 1e5, 2); % Test data

% Arrayfun:

tic

for rep = 1:20 % Loops just to improve the accuracy of the timings

L = X(:,1);

M = arrayfun(@(x) mean(X(L == x, 2)), unique(L));

end

toc

% Loop over unique values:

tic

for rep = 1:20

uL = unique(L);

M2 = zeros(numel(uL), 1);

for k = 1:numel(uL)

M2(k) = mean(X(L == uL(k), 2));

end

end

toc

arrayfun is usually slower than a simple loop. It is questionable if the more compact code is woth to use it here.

But there are faster methods:

% Loop over original values:

tic

for rep = 1:20

m = max(L);

s = zeros(m, 1); % Sum elements

n = zeros(m, 1); % Sum number of occurences

for k = 1:numel(L)

q = L(k);

s(q) = s(q) + X(k, 2);

n(q) = n(q) + 1;

end

M3 = s ./ n;

end

toc % 28 times faster than the ARRAYFUN method:

This is nicely fast, but ugly - C-style like somebody has rolled an angry armadillo over the keyboard.

% Accumarray

tic

for rep = 1:20

M4 = accumarray(X(:, 1), X(:, 2), [], @mean);

end

toc

Accumarray is slower than the loop here and as compact as arrayfun, but without its slowness.

tic % Another option:

for rep = 1:20

M5 = splitapply(@mean, X(:, 2), X(:, 1)); % Different order than in ACCUMARRAY

end

toc

Internally splitapply calls arrayfun , so it is expected to be slower.

The results are equal:

isequal(M, M2, M3, M4, M5)

By the way, replace @mean by: @(x) sum(x)/numel(x) to avoid the overhead of the mean() function. This can save 10% of the runtime.

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!