Two Loops versus One Combined Performance Time?

15 views (last 30 days)
Hello, I got a very big file that I need to process, and the current script works with for loops for each variable. For example:
for i = 1:10
% do some string mapping on two unsorted arrays.
end
for i = 1:10
% do some more string mapping on two unsorted arrays.
end
My question is would this be faster than the above?:
for i = 1:10
% do some string mapping on two unsorted arrays.
% do some more string mapping on two unsorted arrays.
end
Please don't say try it, and see it for yourself. I am looking for sort of a theoretical answer, not a practical one. I would have tested it if I could. Difference might be minimal, but I got 50 million iterations to run through, so yes anything is appreciated.
EDIT Changed do sth to do string mapping on an unsorted array.
  2 Comments
Guillaume
Guillaume on 11 Sep 2018
Doing something once is usually faster than doing it twice, so I don't really understand why you even ask the question.
Guillaume
Guillaume on 11 Sep 2018
Edited: Guillaume on 11 Sep 2018
This is a computer science principle answer. Yes, matlab does optimise loops. That doesn't remove them. So counting twice is always going to be slower than counting once regardless of what happen at each step of the count.

Sign in to comment.

Answers (2)

Steven Lord
Steven Lord on 11 Sep 2018
It depends a lot on what "do sth" and "do sth else" are.
Picture "do sth" as "if the file doesn't exist, create it" and "do sth else" as "if the file exists, delete it." In that case, the first approach creates the file once and deletes it once. The second approach creates and deletes the file ten times.
If on the other hand "do sth" were "x = 1:10;" and "do sth else" were "y = 1:10;", those are probably going to take the same (or a very similar) amount of time.
So the theoretical answer is "It depends." The practical answer is "Try it several times with smaller data sets and/or fewer iterations than 50 million, and use that information to explore how the specific operations you're performing behave."
  2 Comments
Birtan Derin
Birtan Derin on 11 Sep 2018
I understand your point, so I am gonna try to be more specific. I have about 30 for loops that I use to map a string array to a bigger string array. The variables are unsorted, so per my basic CS knowledge, indexing execution time should be linear, O(n).
Although it is easy to change number of iterations from 50 million to 1000, it is not easy at all to rewrite the code to combine all the 30 loops together, considering how much preprocessing I do before putting them inside loops.
Steven Lord
Steven Lord on 11 Sep 2018
There's still not enough information here, and short of posting your code I don't think you will be able to provide enough information to receive an answer more specific than "it depends".
Exactly what you mean by "map a string array to a bigger string array", how you've implemented it (which functions and/or objects you're using), whether you're using a cell array of char vectors or a string array, all these and many other details may be relevant to how your code performs.
That leaves out completely the question of whether one or more of your for loops is suitable to be converted into a parfor loop, which may make the multiple loops more attractive (if some but not all of the loops can be executed in parallel, putting all your computations in one loop could prevent it from being run in parallel using parfor.)
If you're looking for advice on how to measure the performance of your code and identify potential efficiency improvements, I recommend reading through this section in the documentation.

Sign in to comment.


OCDER
OCDER on 11 Sep 2018
"Please don't say try it, and see it for yourself. "
Try it and see for yourself.
"I am looking for sort of a theoretical answer, not a practical one."
Why? In theory, a single loop is faster than a double loop. In practice, theory is not always right.
"I would have tested it if I could."
Perhaps use a simpler test case...
a = rand(1, 1000000);
tic
for j = 1:length(a)
b = sin(a(j));
end
for j= 1:length(a)
c = cos(a(j));
end
toc %.050427 sec
tic
for j = 1:length(a)
b = sin(a(j));
c = cos(a(j));
end
toc %0.041288 sec
" there is an explanation based on computer science principles that I want to learn."
That's too hard to predict... You need to understand JIT compilers which is the execution engine behind Matlab. There is a small overhead for starting a new for loop.
tic
for j = 1:length(a)
for k = 1
end
end
toc %0.938168 sec
tic
for j = 1:length(a)
end
toc %0.023095 sec
Here's a good article to get started on code optimization.
  2 Comments
Birtan Derin
Birtan Derin on 11 Sep 2018
Although I appreciate your effort, the examples you provide are not relevant at all. In the last one you use nested loop which is a completely different thing that I don't even want to get into.
In the other one with sin and cos, the difference in execution time is so small that it is safe to say that it is not caused by algorithmic efficiency. I bet if you run the same code 3-4 times you will see what I mean.
And why would theoretical explanations be hard too predict? That goes against all sciences. And no, I do not need to understand how the MATLAB engine works, because the question is about algorithmic efficiency, not compiling efficiency.
To brighten your brain little bit, I've been doing research since I posted this question, and seems like there might be no difference at all theoretically, because indexing on an unsorted array is linear so the number of loops might not matter.
But my understanding of how Data Structures work is very basic so I can be wrong, that's why I asked this question here.
OCDER
OCDER on 11 Sep 2018
In science, theory and fact are different words. A synonym for theory is speculation, and as you said correctly "prediction". It is not guaranteed to work out in practice. The miracle drug should work in theory, but it failed (and the stock market prices drops 40%...)
"I bet if you run the same code 3-4 times you will see what I mean."
We have been encouraging YOU to do it. We don't know what you want to do exactly. Your question is not clear. You have 3 people, all answering, all not satisfactory. It means your question needs more details.
Perhaps spend more time showing more code about what you want to do, instead of side tracking us with the speed of 2 for loops.
"...do string mapping on an unsorted array."
Looks like you want a mapping function? If so, I think JAVA's hashmap would work. Matlab has a containers.Map, but seems like people are having speed issues.
For JAVA util hashmap tests, see this:

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!