different number of delimiters error using readtable function

I use the following code for reading text file:
fileID = fopen(full_file_name);
fclose(fileID);
tCOD=readtable(full_file_name,'FileType','text', ...
'headerlines',25,'readvariablenames',0,'MultipleDelimsAsOne', true);
The above code works for most of the text files I read; I attached one of them (data_without_problem.txt). But for some text files, I receive the following error:
Error using readtable (line 216)
Reading failed at line 121. All lines of a text file must have the same number of delimiters. Line 121 has 6 delimiters, while
preceding lines have 5.
Note: readtable detected the following parameters:
'Delimiter', '\t ', 'MultipleDelimsAsOne', true, 'Format', '%q%f%f%f%f%f'
I attached this kind of text file (data_with_problem.txt).
How can I modify the above readtable call so that it works with text files whose lines have different numbers of delimiters?
My Matlab version is 2019a.
  2 Comments
dpb on 15 Oct 2021
#dP2019 9 7 0 0 0.00000000 576 u+U IGS14 FIT GFZ
## 2069 518400.00000000 300.00000000 58733 0.0000000000000
+ 95 C01C02C03C04C05C06C07C08C09C10C11C12C13C14C16E01E02
+ E03E04E05E07E08E09E11E12E13E14E15E18E19E21E24E25E26
+ E27E30E31E33E36G01G02G03G04G05G06G07G08G09G10G11G12
+ G13G14G15G16G17G18G19G20G21G22G23G24G25G26G27G28G29
+ G30G31G32J02J03J07R01R02R03R05R07R08R09R11R12R13R14
+ R15R16R17R18R19R20R21R22R23R24 00 00 00 00 00 00 00
++ 10 10 10 10 10 6 8 6 6 8 10 8 8 6 6 6 6
++ 6 6 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
++ 6 6 6 6 6 8 10 6 8 8 8 8 6 6 6 6 6
++ 6 6 6 6 8 8 6 6 6 6 0 0 0 0 0 0 0
%c M cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%c cc cc ccc ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%f 1.2500000 1.025000000 0.00000000000 0.000000000000000
%f 0.0000000 0.000000000 0.00000000000 0.000000000000000
%i 0 0 0 0 0 0 0 0 0
%i 0 0 0 0 0 0 0 0 0
/* PCV:IGS14_2062 OL/AL:FES2004 NONE YN CLK:CoN ORB:CoN
/* GeoForschungsZentrum Potsdam
/*
/*
* 2019 9 7 0 0 0.00000000
PC01 -32247.666769 27128.253711 852.734449 -142.854736
PC02 4291.889841 41959.462941 -227.014442 886.812431
PC03 -14756.270737 39468.969011 529.367042 -15.526662
PC04 -39608.430012 14398.971601 684.035369 -18.784543
...
is the beginning of the so-called "problem" file -- what do you expect to be able to read from it?
It clearly has header information and different kinds of data in it; a "one size fits all" solution is unlikely to be possible unless you can just skip the header and read the regular data after the header information.
sermet OGUTCU on 15 Oct 2021
Edited: sermet OGUTCU on 15 Oct 2021
The problem isn't related to the header part. The problem is related to the following parts from the data_with_problem.txt:
* 2019 9 8 0 0 0.00000000
PC01 -32239.736154 27137.541640 844.727407 -138.707032 P P
PC02 4294.572473 41959.818862 -211.999776 884.760306 P P
PC03 -14769.274349 39464.569794 538.154451 -7.386696 P P
PC04 -39609.638005 14394.049565 685.546634 -20.124467 P P
PC05 21849.957336 36044.780362 -426.557427 34.985740 P P
PC06 -13524.102154 21018.274072 -33636.478150 425.878264 P P
PC07 -22156.702195 33972.389053 10694.698624 -115.964428 P P
PC08 -7463.496078 34069.209662 23990.306083 -2.364877 P P
PC09 2195.129733 26020.780507 -32809.344188 -49.805776 P P
PC10 -10965.396049 34662.341002 21172.021893 -155.406014 P P
PC11 3987.733082 19949.125865 19183.812893 125.613081 P P
PC12 -19966.340237 149.143914 19517.047857 -242.357648 P P
The trailing P P fields cause the problem when using readtable. data_without_problem.txt doesn't include the P P fields, and readtable reads it without any problem.
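One way to confirm which records differ is to count the whitespace-separated fields on each line. The following is only a minimal sketch: the four sample records are copied from the two epochs quoted above, and the variable names (lines, nFields) are illustrative, not part of the original code.

```matlab
% Sample records: two from the first epoch, two from the epoch with "P P".
lines = { ...
    'PC01 -32247.666769 27128.253711 852.734449 -142.854736'; ...
    'PC02 4291.889841 41959.462941 -227.014442 886.812431'; ...
    'PC01 -32239.736154 27137.541640 844.727407 -138.707032 P P'; ...
    'PC02 4294.572473 41959.818862 -211.999776 884.760306 P P'};
% For a real file one could instead use (assumption: the file fits in memory):
%   lines = splitlines(fileread('data_with_problem.txt'));

% Count whitespace-separated fields on each line.
nFields = cellfun(@(s) numel(strsplit(strtrim(s))), lines);

% Report every line whose field count differs from that of the first line.
for k = find(nFields ~= nFields(1)).'
    fprintf('Line %d has %d fields, first line has %d\n', ...
        k, nFields(k), nFields(1));
end
```

On a full file the header and epoch lines would also be reported, so it may be easier to restrict the check to lines that start with 'P'.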


Accepted Answer

dpb on 15 Oct 2021
Edited: dpb on 16 Oct 2021
Use an import options object -- although to write fully generic import code you'll have to scan each file to find its number of header lines, as detectImportOptions isn't clever enough to know what you intend about the header data on its own...
I used the explicit number of header lines here
optW=detectImportOptions('data_with_problem.txt','NumHeaderLines',25,"CommentStyle",'*','ReadVariableNames',0);
optW.MissingRule='omitrow';
optW.SelectedVariableNames=optW.SelectedVariableNames(1:5);
tDW=readtable('data_with_problem.txt',optW);
This produces a table whose head and tail look like--
>> [head(tDW);tail(tDW)]
ans =
16×5 table
Var1 Var2 Var3 Var4 Var5
________ _______ _______ _______ _______
{'PC01'} -32248 27128 852.73 -142.85
{'PC02'} 4291.9 41959 -227.01 886.81
{'PC03'} -14756 39469 529.37 -15.527
{'PC04'} -39608 14399 684.04 -18.785
{'PC05'} 21860 36038 -443.83 39.711
{'PC06'} -13165 21105 -33718 427.66
{'PC07'} -22236 33734 11284 -113.86
{'PC08'} -7668.8 34363 23503 -2.0925
{'PR17'} -10797 3515.3 22840 258.4
{'PR18'} 2372.7 16249 19510 6.6669
{'PR19'} 13760 20695 5772.9 -52.585
{'PR20'} 17905 12385 -13261 -389.79
{'PR21'} 10304 -4036.6 -22990 -70.999
{'PR22'} -4738.6 -17784 -17720 -36.732
{'PR23'} -16617 -19348 7.969 252.43
{'PR24'} -17845 -10556 14834 -184.49
>>
>> whos tDW
Name Size Bytes Class Attributes
tDW 380x5 56568 table
>>
The same logic will work for the files without the trailing 'P' in the records; the key is to tell it to only import the field name and the four numeric variables.
That assumes you don't need those based on your above description. If you do need them, then use
optW.ExtraColumnsRule='addvars';
and don't restrict SelectedVariableNames.
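A minimal sketch of that variant, assuming the attached data_with_problem.txt is in the current folder and still has 25 header lines. The names optA and tAll are illustrative. It also drops the EOF record by name rather than with 'omitrow', on the assumption that 'omitrow' might discard records that lack the trailing fields; that behaviour was not checked here.

```matlab
fname = 'data_with_problem.txt';   % attached file from the question
if isfile(fname)
    optA = detectImportOptions(fname, 'NumHeaderLines', 25, ...
        'CommentStyle', '*', 'ReadVariableNames', false);
    optA.ExtraColumnsRule = 'addvars';   % keep trailing fields as extra variables
    tAll = readtable(fname, optA);       % SelectedVariableNames left at its default
    % Assumption: Var1 is imported as text, so the EOF record can be removed by name.
    tAll = tAll(~strcmp(tAll.Var1, 'EOF'), :);
end
```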
With the number of header lines determined first for each file, the above will work for either file. Note that I used 'CommentStyle','*' to get rid of the date stamp rows; if you want to keep those to parse them separately, remove it. With that in place, readtable is not flexible enough to take more than one comment character, so I used 'omitrow' for MissingRule to eliminate the last EOF record. If you keep the time stamp rows, you could set the comment character to 'E' for that purpose instead.
ADDENDUM:
A little routine to return the number of header lines could look something like --
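function nHdr=getNumHeaderLines(file)
% Returns the line number of the first record starting with '* '
fid=fopen(file);
nHdr=1;
l=fgetl(fid);
while ischar(l) && ~startsWith(l,'* ')   % fgetl returns -1 at end of file
  nHdr=nHdr+1;
  l=fgetl(fid);
end
fclose(fid);
end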
The above logic at the command line for the problem data file returns--
>> fid=fopen('data_with_problem.txt');
>> nHdr=1;
>> while ~startsWith(fgetl(fid),'* '),nHdr=nHdr+1;end
>> nHdr
nHdr =
25
>> fid=fclose(fid);
to illustrate it returns the value you want/need...
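Putting the two pieces together might look like the sketch below. It assumes both attached files are in the current folder and skips any that are not. The anonymous helper countHeaderLines is hypothetical: it is a vectorised stand-in for the loop above, not the same code, and it reads the whole file into memory to find the first line starting with '* '.

```matlab
% Hypothetical helper: line number of the first record starting with '* ',
% which is the same value the loop above returns as nHdr.
countHeaderLines = @(f) find(startsWith(splitlines(fileread(f)), '* '), 1);

files  = {'data_without_problem.txt', 'data_with_problem.txt'};
tables = cell(size(files));
for k = 1:numel(files)
    if ~isfile(files{k}), continue, end          % skip files that are not present
    nHdr = countHeaderLines(files{k});
    opts = detectImportOptions(files{k}, 'NumHeaderLines', nHdr, ...
        'CommentStyle', '*', 'ReadVariableNames', false);
    opts.MissingRule = 'omitrow';                % drops the trailing EOF record
    opts.SelectedVariableNames = opts.SelectedVariableNames(1:5);
    tables{k} = readtable(files{k}, opts);       % name plus four numeric columns
end
```

Reading the file twice (once for the header count, once in readtable) is wasteful for very large files; the fgetl loop stops at the first epoch line and may be preferable there.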
