MATLAB text file import options (opts) differ for similar files
Hi, I have 2 text files with the same number of columns/headers. When a measurement is not completed, the field is filled with an "UND" entry, which can be "UND. -60001" or "UND. -62011". My script usually has no problems, but when it does, the cause has been very difficult to pin down. By reading the opts I have noticed that it treats the two files differently (m-file and 2 data files attached). I don't see why the files should be treated any differently; any ideas?
The file that reads in OK has this in its opts:
opts =
  DelimitedTextImportOptions with properties:
   Format Properties:
      Delimiter: {'\t'}
      Whitespace: '\b '
      LineEnding: {'\n' '\r' '\r\n'}
      CommentStyle: {}
      ConsecutiveDelimitersRule: 'split'
      LeadingDelimitersRule: 'keep'
      EmptyLineRule: 'skip'
      Encoding: 'ISO-8859-1'
   Replacement Properties:
      MissingRule: 'fill'
      ImportErrorRule: 'fill'
      ExtraColumnsRule: 'addvars'
   Variable Import Properties: Set types by name using setvartype
      VariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
      VariableTypes: {'char', 'double', 'double' ... and 4 more}
      SelectedVariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
      VariableOptions: Show all 7 VariableOptions
The file that does not load properly has this in its opts instead:
opts =
  DelimitedTextImportOptions with properties:
   Format Properties:
      Delimiter: {'\t' ' '}
      Whitespace: '\b'
      LineEnding: {'\n' '\r' '\r\n'}
      CommentStyle: {}
      ConsecutiveDelimitersRule: 'join'
      LeadingDelimitersRule: 'ignore'
      EmptyLineRule: 'skip'
      Encoding: 'ISO-8859-1'
   Replacement Properties:
      MissingRule: 'fill'
      ImportErrorRule: 'fill'
      ExtraColumnsRule: 'addvars'
   Variable Import Properties: Set types by name using setvartype
      VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
      VariableTypes: {'char', 'double', 'char' ... and 6 more}
      SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
      VariableOptions: Show all 9 VariableOptions
Accepted Answer
dpb on 18 Jun 2018
Edited: dpb on 18 Jun 2018
The difference is that the second file has the UND indicator in its first data line, whereas the first file has a complete record there. That first record is what the options detection uses to try to parse the file. Because "UND. -60001" contains a space, the detection also appears to pick the space as a delimiter (note Delimiter: {'\t' ' '} in the second opts), so that data record seems to contain nine variables while there are only seven column names. That mismatch creates the confusion.
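A rough illustration of where the nine comes from, using made-up values (assumption: the problem file's first record has two UND fields; the real data file is not reproduced here). Splitting the record on tab alone gives seven fields, matching the seven column names; also splitting on space, as the second opts' Delimiter {'\t' ' '} does, breaks each "UND. -xxxxx" into two fields.

```matlab
% Hypothetical first data line: 7 tab-separated fields, two of them "UND" entries.
tab = char(9);
names = {'Nozzle_number','Frequency_khz','Velocity_ms','Volume_pl', ...
         'Trajectory_deg','X_coordinate_mm','Y_coordinate_mm'};
rec = strjoin({'1','4','UND. -60001','5.2','UND. -2011','0.1','0.2'}, tab);

nNames    = numel(names);                       % 7 column names in the header
nTab      = numel(strsplit(rec, tab));          % 7 fields when only tab delimits
nTabSpace = numel(strsplit(rec, {tab, ' '}));   % 9 fields: each "UND. -xxxxx" splits in two
fprintf('names: %d, tab only: %d, tab+space: %d\n', nNames, nTab, nTabSpace);
```

This prints names: 7, tab only: 7, tab+space: 9, which is the same 7-versus-9 mismatch seen in the two opts displays.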
In this case I would suggest not calling detectImportOptions(files(jj).name), but instead using a specific hand-built options object for these files, or dispensing with it entirely and passing everything needed as name-value pairs in the readtable call.
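One middle ground, sketched here under the assumption that the real files are tab-delimited: still call detectImportOptions, but tell it the delimiter instead of letting it guess, then pin the measurement columns to double. The file name, the column subset and the values below are made up for illustration; only detectImportOptions, setvartype and readtable are real MATLAB functions.

```matlab
% Hypothetical sample file: tab-delimited, a subset of the real headers,
% with "UND. ..." entries in the very first data line (the problem case).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Nozzle_number\tFrequency_khz\tVelocity_ms\tVolume_pl\n');
fprintf(fid, 'N1\t4\tUND. -60001\tUND. -62011\n');
fprintf(fid, 'N2\t4\t6.5\t12.1\n');
fclose(fid);

% State the delimiter instead of letting detectImportOptions guess it, so the
% space inside "UND. -60001" is not treated as a field separator.
opts = detectImportOptions(fname, 'FileType', 'text', 'Delimiter', '\t');

% Pin the measurement columns to double: the UND text then fails to convert
% and is filled with NaN (ImportErrorRule is 'fill' by default).
opts = setvartype(opts, opts.VariableNames(2:end), 'double');

t = readtable(fname, opts);
delete(fname);
disp(t)
```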
ADDENDUM
After looking at your files, I think I'd approach this somewhat differently: just let readtable bring in the file with every variable as text (cell arrays), do the substitution on the bad data, and then convert. Is it likely there's ever a file that doesn't have at least one UND in the numeric data fields?
I don't know exactly what the rest of your code does after reading a file, but I'd do that portion more nearly as:
d=dir('/Users/imagexpertinc/Desktop/odds/freq_sweeps/*.txt');
% opts was built beforehand from RECORD.txt (described below) so every variable imports as text
for i=1:numel(d)
  t=readtable(fullfile(d(i).folder,d(i).name),opts);  % full path, since dir() was given another folder; table of cellstr variables
  % turn every UND... entry into 'NaN', then convert columns 3:end to doubles in one call
  v=str2double(regexprep(table2cell(t(:,3:end)),'UND.*','NaN'));
  for j=1:size(v,2)   % put the numeric columns back into the existing table
    t.(j+2)=v(:,j);
  end
  % ...
  % Now do what needs done with this table here before going on to next...
end
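The substitute-then-convert step in isolation, on a small made-up cell array standing in for table2cell(t(:,3:end)). It shows that regexprep works element-wise on a cell array of text and that str2double then converts the whole cell array to a numeric matrix in one call.

```matlab
% Made-up stand-in for table2cell(t(:,3:end)): all text, some "UND. <code>" markers.
c = {'6.5',         'UND. -60001';
     'UND. -62011', '12.1'};

% Replace any field starting with "UND" by 'NaN', then convert everything to double.
v = str2double(regexprep(c, 'UND.*', 'NaN'));
disp(v)
```

str2double already returns NaN for text it cannot parse, so the regexprep step mainly makes the intent explicit. The cellfun(@str2num, ...) form also works, but str2double is vectorized over the cell array, does not rely on eval, and returns NaN (rather than an empty result that breaks cellfun) for empty fields.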
The opts object was created from an artificial RECORD.txt file that looks like a single record:
Nozzle_number Frequency_khz Velocity_ms Volume_pl Trajectory_deg X_coordinate_mm Y_coordinate_mm
- 4 UND. -60001 UND. -62011 UND. -60001 UND. -2011 UND. -2011
so that all the variables are recognized and imported as text. That way the same conversion is performed on every column of every file; otherwise, if in some file a particular variable happened to be valid for every observation, it would by default be imported as numeric, and extra logic would have to be written to handle that case.
Unless, of course, the substituted missing-value code itself has some significance; then you would need to convert it as well, but your current solution doesn't appear to distinguish between those codes either.
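If maintaining an artificial RECORD.txt is a nuisance, a sketch of another way to get the same all-text import, assuming tab-delimited files: detect options from the real file with the delimiter stated, then call setvartype with no variable selection so every variable becomes char. The sample file is made up and deliberately has no UND entries, which is exactly the case where detection would otherwise choose numeric columns.

```matlab
% Hypothetical file in which every value happens to be valid (no UND entries),
% so detection alone would make the measurement columns numeric.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Nozzle_number\tFrequency_khz\tVelocity_ms\n');
fprintf(fid, 'N1\t4\t6.5\n');
fprintf(fid, 'N2\t4\t7.0\n');
fclose(fid);

opts = detectImportOptions(fname, 'FileType', 'text', 'Delimiter', '\t');
opts = setvartype(opts, 'char');   % no selection given: applies to ALL variables

t = readtable(fname, opts);        % every variable is now a cell array of text
delete(fname);

% The same conversion can now run on every measurement column of every file.
v = str2double(regexprep(table2cell(t(:, 2:end)), 'UND.*', 'NaN'));
disp(v)
```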
dpb on 19 Jun 2018
Edited: dpb on 19 Jun 2018
Hmmm... you could achieve the same effect more easily by creating and using a fixed import options object, but with the variable types defined the other way round (numeric rather than text).
It turns out (somewhat to my surprise) that this actually works cleanly. Creating the opts object on the fly like that is excessively complex, but using a fixed opts object with every variable except the first defined as numeric does seem to replace the non-convertible fields with NaN. If this holds for the various previous problem files, it's by far the better implementation.
Some testing shows, however, that it takes a full-blown opts object to set the myriad of options; trying to use just a minimal number of name-value parameters fails miserably. Possibly one could eventually figure out enough parameters to make that work, but I'm not absolutely sure one has sufficient control that way, and it is surely far more effort than just munging the self-derived object a little.
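A sketch of the fixed-object approach, assuming tab-delimited files and using two made-up files in a scratch folder: the options are detected once from a file whose first record is complete, the types are pinned (first variable text, the rest numeric), and that same object is then reused for every file, including one whose first data line starts with UND entries. File names and values are hypothetical.

```matlab
% Scratch folder with two made-up files: 'a.txt' has a complete first record,
% 'b.txt' starts with an UND entry (the case that confused detection).
folder = tempname;
mkdir(folder);
hdr = 'Nozzle_number\tFrequency_khz\tVelocity_ms\n';
fid = fopen(fullfile(folder, 'a.txt'), 'w');
fprintf(fid, [hdr 'N1\t4\t6.5\nN2\t4\tUND. -60001\n']);
fclose(fid);
fid = fopen(fullfile(folder, 'b.txt'), 'w');
fprintf(fid, [hdr 'N1\t4\tUND. -62011\nN2\t4\t7.5\n']);
fclose(fid);

% Build the options ONCE from the known-good file, then fix the types:
% first variable text, everything else numeric.
opts = detectImportOptions(fullfile(folder, 'a.txt'), 'FileType', 'text', 'Delimiter', '\t');
opts = setvartype(opts, opts.VariableNames(1), 'char');
opts = setvartype(opts, opts.VariableNames(2:end), 'double');

% Reuse the same fixed object for every file; unconvertible UND fields become NaN.
d = dir(fullfile(folder, '*.txt'));
tables = cell(numel(d), 1);
for i = 1:numel(d)
    tables{i} = readtable(fullfile(d(i).folder, d(i).name), opts);
end
rmdir(folder, 's');
```

Because the object is no longer re-detected per file, whatever happens to be in a given file's first data line cannot change the delimiter or the number of variables.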