reading mixed format csv data with empty value '-'

Jiali on 18 Dec 2014
Edited: per isakson on 19 Dec 2014
The data is in a mixed format. I list only the floating-point columns to show my problem.
1 1 100.3 -45000 -0.23 0.2555 600000
2 1 100.3 -45000 -0.23 0.2555 800000
3 1 100.3 -45000 -0.23 0.2555 800000
4 1 - - - - -
5 1 - - - - -
I cannot delete the lines with empty values, since I want to know their location. But when I use
textscan(fid,'%f %f %f %f %f %f %f'),
I have trouble with the class of every column. And if I use
'TreatAsEmpty', '-'
inside textscan, all the negative values are read incorrectly. Does anyone have any suggestions?

Answers (1)

per isakson on 19 Dec 2014
Edited: per isakson on 19 Dec 2014
If the file, together with the parsed result, fits in memory, try
>> out = cssm('cssm.txt')
out =
1.0e+05 *
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 6.0000
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 8.0000
0.0000 0.0000 0.0010 -0.4500 -0.0000 0.0000 8.0000
0.0000 0.0000 NaN NaN NaN NaN NaN
0.0001 0.0000 NaN NaN NaN NaN NaN
>>
where
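function out = cssm( filespec )
% Read the whole file into one character vector
str = fileread( filespec );
% Replace each stand-alone '-' (whitespace on both sides) with 'nan'
str = regexprep( str, '(?<=\s)\-(?=\s)', 'nan' );
% Parse seven numeric columns and collect them into one matrix
out = cell2mat(textscan(str,'%f%f%f%f%f%f%f','CollectOutput',true ));
end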
and where cssm.txt contains
1 1 100.3 -45000 -0.23 0.2555 600000
2 1 100.3 -45000 -0.23 0.2555 800000
3 1 100.3 -45000 -0.23 0.2555 800000
4 1 - - - - -
5 1 - - - - -
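Since the question asks for the location of the lines with empty values, the NaN entries in the result can be used to find them. This is only a minimal sketch: the matrix below is a hand-typed copy of what cssm returns for the sample file, so that the snippet runs on its own, and the names missingRows and validRows are made up for illustration.

```matlab
% Hand-typed copy of the matrix that cssm returns for the sample file
out = [ 1 1 100.3 -45000 -0.23 0.2555 600000
        2 1 100.3 -45000 -0.23 0.2555 800000
        3 1 100.3 -45000 -0.23 0.2555 800000
        4 1 NaN   NaN    NaN   NaN    NaN
        5 1 NaN   NaN    NaN   NaN    NaN ];

% Row numbers in which at least one value was the placeholder '-'
hasMissing  = any( isnan( out ), 2 );
missingRows = find( hasMissing );

% Rows in which every column holds a number
validRows = out( ~hasMissing, : );

% Display only: avoid the common scale factor (1.0e+05 *) shown above
format shortG
disp( out )
```

The format command changes only how the values are displayed, not the stored numbers.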
The function cssm above fails if the last character of the file is '-'. To fix that, replace
str = regexprep( str, '(?<=\s)\-(?=\s)', 'nan' );
by
str = regexprep( str, '(?<=\s)\-(?=\s|$)', 'nan' );
so that '-' is also matched when it is the last character of the string.
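The difference between the two patterns can be checked on a short in-memory string. The following is a rough sketch under the assumption of a four-column sample that ends in '-' with no trailing newline; the sample text and the variable names are invented for this illustration.

```matlab
% Hypothetical four-column sample whose last character is '-'
str = sprintf( '1 1 100.3 -45000\n2 1 - -' );

% First pattern: the trailing '-' is not followed by whitespace, so it stays
fixedOld = regexprep( str, '(?<=\s)\-(?=\s)', 'nan' );

% Second pattern: '$' also accepts the end of the string after '-'
fixedNew = regexprep( str, '(?<=\s)\-(?=\s|$)', 'nan' );

% The minus sign of -45000 is followed by a digit and is left alone
out = cell2mat( textscan( fixedNew, '%f%f%f%f', 'CollectOutput', true ) );

% Rows that contained the placeholder
missingRows = find( any( isnan( out ), 2 ) );
```

With the first pattern fixedOld still ends in '-', while fixedNew ends in 'nan' and parses into a 2-by-4 matrix.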
