Fast way to perform multiple searches on a large array

I have a large time series array (10,000,000 elements):
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above):
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large, ~30,000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many, many finds. The logic is:
results = NaN(300, numel(st));
for i = 1:numel(st)
    temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
    results(:,i) = temp;
end
Is there any way I can do this faster (ideally without a loop)?
  • I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel

8 Comments

I assume "time" should be "times" inside the loop.
you are absolutely right - apologies for the typo.
st is also fairly large - 30000 or so records
Do the intervals [st(i):en(i)] overlap?
Just to confirm times, st and en are all sorted?
Yes, they are sorted, and en(i) - st(i) = 300 seconds.


 Accepted Answer

I think that by dumping the past times you might be able to speed up the find. If st(i+1) > en(i), then you could dump even more elements, but I think the savings would be small. This code relies on times, st, and en being sorted.
results = NaN(300, numel(st));
offset = 0;
for i = 1:numel(st)
    idx = find(times > st(i), 1, 'first');
    offset = offset + idx - 1;     % absolute index into the original ts
    times = times(idx:end);        % drop the times we have passed
    results(:,i) = ts(offset+1 : offset+300);
end

1 Comment

Hi Daniel,
I used a modified version of your solution. Indeed it is a LOT quicker to search over smaller sized arrays.
Thank you all for your help.
N.


More Answers (2)

Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st)  % Not size(st), which is a vector!
    temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
    if length(temp) == 300
        results(:, i) = temp;
    else
        results(1:length(temp), i) = temp;
    end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But vectorizing this would need a very large matrix, so I assume it would be slower than the loop.
Can you solve the problem by using HISTC?
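If times and st are both sorted, HISTC can indeed locate every window start in a single call, which also removes the loop entirely. A sketch, assuming every st(i) lies inside the range of times and every window really has a full 300 samples (as confirmed above):
```matlab
% histc with the sample times as bin edges returns, for each st(i), the
% index of the last sample with times <= st(i) (0 if st(i) < times(1)).
[~, lastAtOrBefore] = histc(st, times);
firstAfter = lastAtOrBefore + 1;            % first index with times > st(i)
% Build all 300 sample indices per window at once: 300-by-numel(st)
idxMat = bsxfun(@plus, firstAfter(:).', (0:299).');
results = ts(idxMat);                       % same shape as idxMat, no loop
```
The index matrix holds 300*numel(st) doubles (roughly 72 MB for 30,000 windows), which should be comfortable on a 64-bit machine.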

6 Comments

Unfortunately this method will not improve much in terms of speed because my results are already pre-allocated. I used vertcat to simplify the illustrated example.
The histc will also not work because I have an additional search element in the find that I have not mentioned in the question.
I can guarantee that the length of temp will always be 300 and never less, so the if statement is also not needed.
As you guessed correctly both st and times are sorted.
Well, then this answer wasted your and my time. Please include the pre-allocation and all other relevant details if you post a code snippet and ask for a speed improvement. Otherwise the attempts to optimize your code start from a very unrealistic point.
My apologies Jan. I do really appreciate you taking time to view this.
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
may as well be this:
find(times > st(i), 300, 'first');
and since times and st are sorted:
find(times > st(i), 1, 'first') + (0:299)
WOW, by removing the < en(i) the processing time nearly halved!!
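For reference, here is how that simplified search slots back into the original loop — a sketch that assumes times and st are sorted and every window really has 300 samples:
```matlab
results = NaN(300, numel(st));
for i = 1:numel(st)
    % One comparison per element instead of two; en(i) is no longer needed
    first = find(times > st(i), 1, 'first');
    results(:, i) = ts(first : first+299);   % the next 300 samples
end
```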


Certainly, taking away the < en(i) helped. I'm a little hesitant to implement the dumping of past times, though, because I need that data for something a little later on.
Just for my own learning I would really like to know how could I vectorise this operation such that I didn't need to do this in a loop.
Thank you all once again for taking the time to look at and respond to my question.
N.

2 Comments

Well, then at least do the consecutive finds on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset+1:end) > st(i), 1, 'first');
Then you'd get the benefit of searching increasingly shorter arrays, but without losing the data.
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
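Fleshing that comment out, the whole loop might look like this — a sketch assuming sorted inputs and full 300-sample windows; times itself is left intact:
```matlab
results = NaN(300, numel(st));
offset = 0;
for i = 1:numel(st)
    % Search only the tail that can still contain the next window start
    idx = find(times(offset+1:end) > st(i), 1, 'first');
    offset = offset + idx - 1;               % window start index minus 1
    results(:, i) = ts(offset+1 : offset+300);
end
```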

