The str2double function is taking too long?

The built-in function "str2double" is very slow when I convert a string array to a numeric array, especially when the string array has many elements (46259*503 in my case). Is there any way to improve the performance?
My OS: Windows 10
MATLAB R2021a

4 Comments

Why and how do you have a 46259*503 array of numbers stored as strings?
Joel Lynch on 10 Jun 2021
Edited: Joel Lynch on 10 Jun 2021
If you are actually extracting that much data, and it is formatted in a regular pattern, you should be able to speed up reading it with sscanf(), which is faster because you supply the format and no time is spent working out how to interpret each string.
At present, a better implementation is as follows.
IS = "_040825_1735_IS.log";
lines = strip(readlines(IS));        % read every line, trim surrounding whitespace
lines = lines(strlength(lines)>0);   % drop empty lines
lg = startsWith(lines,"%");          % flag comment lines
data = split(lines(~lg));            % 46259*503 string array
myNumericData = double(data);        % convert the string array to double
@KSSV Because I need to import the experimental dataset _040825_1735_IS.log from "http://eia.udg.es/~dribas/" for my research. The data are log files recorded by the instrument, and the pure data has size 46259*503.
@KSSV @Walter Roberson Thanks very much!
T3 = readmatrix('_040825_1735_IS.log', 'delimiter',' ');
That gives 46264 rows, 502 variables, everything already numeric.
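A minimal sketch of the readmatrix route on a small stand-in file, assuming the log uses '%' comment lines and space-separated values as the code earlier in this thread suggests. The file contents are made up, and the 'FileType' and 'CommentStyle' options are assumptions that were not part of the call above.

```matlab
% Write a tiny stand-in for the instrument log (contents are made up).
fname = [tempname '.log'];
fid = fopen(fname, 'w');
fprintf(fid, '%% header line, skipped as a comment\n');
fprintf(fid, '1.5 2 3\n');
fprintf(fid, '4 5.25 6\n');
fclose(fid);

% Read the numbers directly, without building a string array first.
% 'FileType','text' is given because .log is not a standard text extension.
T = readmatrix(fname, 'FileType', 'text', ...
    'Delimiter', ' ', 'CommentStyle', '%');

delete(fname);   % clean up the temporary file
```

On the real file it is worth comparing size(T) against the expected 46259*503, since the row and column counts reported above differ slightly from the string-array route.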


 Accepted Answer

There have been some test results posted showing that double() of a string() object is even faster than str2double().
format long g
S = compose("%.16g", randn(1000,50));
S(1:3,1:3)
ans = 3×3 string array
    "0.6965617957186385"     "0.7061472333823291"     "-0.7023246730823328"
    "0.02816411095732173"    "1.507324316664719"      "-1.236968837728482"
    "-0.834231469338338"     "-0.8834500860277891"    "-1.431844364400984"
time_for_double = timeit(@()double(S), 0)
time_for_double =
0.020335829
time_for_str2double = timeit(@()str2double(S), 0)
time_for_str2double =
0.574124829
time_for_sscanf = timeit(@()arrayfun(@(V)sscanf(V, '%f'),S))
time_for_sscanf =
3.451887829
t1 = tic;
arrayfun(@str2double,S);
time_for_str2double = toc(t1)
time_for_str2double =
5.699698
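As a small sanity check on the comparison above, the following sketch confirms that double() and str2double() return the same values on a tiny string array, including NaN for text that is not a number. The sample values are made up for illustration.

```matlab
% Made-up sample: three numeric strings and one non-numeric entry.
S = ["3.14" "-2e3"; "abc" "42"];

A = double(S);       % direct conversion of the string array
B = str2double(S);   % the slower function from the question

% isequaln treats NaN as equal to NaN, unlike isequal.
same = isequaln(A, B);
```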

4 Comments

Why on earth would you put poor SSCANF inside ARRAYFUN, unless you intentionally want to slow it down?
Here is much more efficient use of SSCANF, with conversion speed on the same order as DOUBLE(S):
M = sscanf(sprintf(' %s',S.'), '%f', [503,Inf]).'
This is faster than your incorrectly named TIME_FOR_SSCANF (which should be named TIME_FOR_ARRAYFUN, as this is what 99.8% of its time is measuring).
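To make the one-liner above easier to follow, here is a minimal sketch on a small made-up string array. sprintf joins all elements into one space-separated char vector (row by row, because of the transpose), sscanf parses it in a single call, and the size argument plus the final transpose restore the original shape. It assumes every element is a valid number.

```matlab
% Made-up 2-by-2 string array of numeric text.
S = ["1.5" "2"; "-3" "4e2"];

% Transpose so that column-major traversal walks S row by row,
% then join everything into one char vector: ' 1.5 2 -3 4e2'.
txt = sprintf(' %s', S.');

% Parse all numbers at once, filling a matrix with size(S,2) rows,
% then transpose back to the shape of S.
M = sscanf(txt, '%f', [size(S,2), Inf]).';
```

Unlike double(S), which returns NaN for text it cannot convert, sscanf stops reading at the first token that does not match '%f', so this form only suits clean numeric data.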
Why? Because sscanf() does not operate on string arrays. I temporarily had a more complicated arrangement with nested calls and cell arrays, but realized that I was operating on the wrong datatype and simplified the code drastically.
Stephen23 on 10 Jun 2021
Edited: Stephen23 on 10 Jun 2021
"Because sscanf() does not operate on string arrays"
True, but using ARRAYFUN is an inefficient workaround.
The variable name is misleading, because 99.8% of that time is ARRAYFUN.
format long g
S = compose("%.16g", randn(1000,50));
S(1:3,1:3)
ans = 3×3 string array
    "-0.87290576061866"     "0.04846286053980744"    "-2.710234520504661"
    "-1.026614391407934"    "-0.0983918203860604"    "-0.1155795524871058"
    "1.340938807862134"     "0.2622135116577263"     "0.7506333968994494"
t = tic;
double(S);
time_for_double = toc(t)
time_for_double =
0.029172
t0 = tic;
str2double(S);
time_for_str2double = toc(t0)
time_for_str2double =
0.624089
t1 = tic;
arrayfun(@(V) sscanf(V, '%f'),S);
time_for_sscanf = toc(t1)
time_for_sscanf =
3.423017
t2 = tic;
arrayfun(@(V) 1, S);
time_for_arrayfun = toc(t2)
time_for_arrayfun =
2.218439
t3 = tic;
arrayfun(@str2double,S);
time_for_str2double = toc(t3)
time_for_str2double =
5.802806
t4 = tic;
sscanf(sprintf(' %s',S.'), '%f', [size(S,2),Inf]).';
time_for_stephen_sscanf = toc(t4)
time_for_stephen_sscanf =
0.068427
t5 = tic;
reshape(str2num(strjoin(S)),size(S));
time_for_str2num = toc(t5)
time_for_str2num =
0.06454
The last of those is marginally better than your sscanf/sprintf approach... on this run.
