Correcting date strings in cell arrays

I have a CSV file containing some financial market data.
The CSV file data looks like the following
1763753202,D,EUR/USD,2011-11-06 17:00:12.500000000,1.382700,1.382900
I am using textscan to read this data in as follows (to get the last three fields)
data = textscan(fid,'%*u %*s %*s %s %f %f','Delimiter',',','HeaderLines',1);
This gives me the timestamp as the first cell in the cell array. My problem is that I need to ensure that the data format is consistent so that I can convert them to serial dates. The code that wrote the CSV files appears to have a fault in it when the timestamp is exact to the second. This means that instead of getting
'2011-11-06 17:21:34.000000000'
I see
'2011-11-06 17:21:34'
Is it possible to fix this in a vector type statement or do I have to loop over all stamps and fix them individually? (speed is of essence as there is lots of data)

1 Comment

I see, that somebody has added the tag "for loop" - this might be the problem. Using a FOR loop would slow down the processing remarkably in this case!

Sign in to comment.

Answers (1)

C = {'2011-11-06 17:21:34.000000000'; ...
'2011-11-07 17:21:34'; ...
'2011-11-08 17:21:34.010000000'};
This can be directly converted:
Num = datenum(C);
So what is exactly the problem? You can create the date vectors manually by:
Str = sprintf('%s ', C{:});
M = sscanf(Str, '%d-%d-%d %d:%d:%g ');
M = transpose(reshape(M, 6, []));
Num = datenum(M); % Better: datenummx(M)
But I do not see a big advantage compared to using datenum directly.
Matlab's date functions are powerful and inconsequence they are slow. A dedicated M- or Mex-file can be much faster. E.g. datenummx(M) will save some time already, because it calls the underlying Mex file directly.
[EDITED]: Another method is appending the fractional seconds manually:
index = (cellfun('length', C) == 19);
C(index) = strcat(C(index), '.0');
But it wastes time to modify the original string list.
Some timings:
C = cell(1, 1000);
for i = 1:1000
C{i} = datestr(now, 31);
end
C(1:10:end) = strcat(C(1:10:end), '.000');
tic;
for i = 1:10
Num = datenum(C);
end
toc % 3.62 sec
tic;
for i = 1:10
Str = sprintf('%s ', C{:});
M = sscanf(Str, '%d-%d-%d %d:%d:%g ');
Num = datenum(transpose(reshape(M, 6, [])));
end
toc % 0.038 sec
If speed matters or you want to reduce your CO2 production, use simple string conversions instead of the really smart datestr function or the automatic internal conversion.

1 Comment

I guess you answered my question and showed my inexperience. I was using a conversion template as follows
>> timeStamp = datenum('2011-11-06 17:21:34','yyyy-mm-dd HH:MM:SS.FFF');
??? Error using ==> datenum at 178
DATENUM failed.
Caused by:
Error using ==> dtstr2dtnummx
Failed on converting date string to date number.
>> timeStamp = datenum('2011-11-06 17:21:34.000000000','yyyy-mm-dd HH:MM:SS.FFF');
>> timeStamp
timeStamp =
7.3481e+005
Clearly it seems bright enough to figure tings out on its own. Not sure how though.....
Many thanks

Sign in to comment.

Categories

Asked:

on 8 Jan 2012

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!