Fast way to perform multiple searches on a large array
Show older comments
I have a large time series array (10,000,000 elements) :
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above) :
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large ~ 30000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many many finds. Logic is :
results = NaN(300, numel(st));
for i=1:numel(st);
temp = ts(find(times > st(i) & times < en(i) , 300,'first');
results(:,i) = temp;
end;
Is there any ay I do this faster (ideally without a loop) ?
- I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel
8 Comments
Jan
on 4 Oct 2011
I assume "time" should be "times" inside the loop.
Nigel
on 4 Oct 2011
Andrei Bobrov
on 4 Oct 2011
What size of 'st'?
Nigel
on 4 Oct 2011
Jan
on 4 Oct 2011
Do the intervals [st(i):en(i)] overlap?
Nigel
on 4 Oct 2011
Daniel Shub
on 4 Oct 2011
Just to confirm times, st and en are all sorted?
Nigel
on 4 Oct 2011
Accepted Answer
More Answers (2)
Jan
on 4 Oct 2011
Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st) % Not size(st), which is a vector!
temp = ts(find(times > st(i) & times < en(i), 300, 'first');
if length(temp) == 300
results(:, i) = temp;
else
results(1:length(temp), i) = temp;
end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But for vectorizing this, a very large matrix would be needed, such that I assume it will be slower than the loop.
Can you solve the problem by using HISTC?
6 Comments
Nigel
on 4 Oct 2011
Jan
on 4 Oct 2011
Well, then this answer wasted your and my time. Please include the pre-allocation and all other relevant details, if you post a code-snippet and ask for a speed improvement. Otherwise the trials to optimize your code start from a very unrealistic point.
Nigel
on 4 Oct 2011
Teja Muppirala
on 4 Oct 2011
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
May as well be this:
find(times > st(i), 300, 'first');
Daniel Shub
on 4 Oct 2011
and since times and st are sorted
0:299+find(times > st(i), 1, 'first')
Nigel
on 4 Oct 2011
Nigel
on 4 Oct 2011
0 votes
2 Comments
Bjorn Gustavsson
on 10 Oct 2011
Well then at least do the consequtive 'find's on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset:end) > st(i), 1,'first');
Then you'd get the benefit from increasingly shorter arrays to search over but without loosing the data.
Daniel Shub
on 10 Oct 2011
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
Categories
Find more on Operators and Elementary Operations in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!