readtable is ignoring import options to get variable names

Am I being stupid or is this function not logical?
I want to import a CSV file. There are 3 header lines. The actual variable names are on line 3. The units are on line 2. Line 1 is to be ignored.
So I call opts = detectImportOptions(file) and then set opts.VariableNamesLine = 3 and opts.VariableUnitsLine = 2. It picks up the units line but completely ignores the names line, and just uses the variable names it originally picked up from line 1.
If I call detectImportOptions(file,'NumHeaderLines',1), it then picks up the units line as the names.
If I call it again and tell it to skip 2 lines, it picks the right names. I can then set opts.VariableUnitsLine to 2, and it does go back and pick up the units correctly.
So I do get what I want in the end. But the function doesn't seem to work as expected, i.e. create the options and then modify the line options. It seems that whatever it initially picks up as the names gets set in stone, and you can't do anything about it (except the somewhat hacky way I just worked out).

I cannot do this; it is confidential.
Even the first 3-4 lines? I'll see if I can reproduce your problem from scratch.
You could anonymize the data or create a working example that has the same structure but fake data that produces the same behavior as your current file. Providing the relevant code would also be helpful.
I have not been able to solve it using readtable, but I was able to reproduce the problem easily using the attached text file. So if anyone else wants to give it a try...
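Since the real file is confidential, a file with the same layout and fake data should be enough to reproduce the behaviour. The sketch below writes such a file to a temporary location. The file name, column names, units and values are invented for illustration; only Exhaust_Flow_1 is borrowed from the poster's code later in the thread, and the empty first two columns on line 1 mirror the poster's later description. It only builds the file and prints what detectImportOptions chose, without assuming what that will be, since detection differs between releases.

```matlab
% Hypothetical stand-in for the confidential file:
% line 1 = ignored, line 2 = units, line 3 = variable names, line 4+ = data
fname = [tempname '.csv'];
fileLines = { ...
    ',,Logger export', ...           line 1: ignored (first two columns empty)
    's,m/s,kg/s', ...                line 2: units
    'Time,Speed,Exhaust_Flow_1', ... line 3: variable names
    '0,1.5,0.2', ...                 line 4: first data row
    '1,2.5,'};                     % line 5: last value missing
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', fileLines{:});
fclose(fid);

% Let MATLAB guess the layout, then inspect what it decided
opts = detectImportOptions(fname);
disp(opts.VariableNames)
disp(opts.VariableNamesLine)
```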

 Accepted Answer

This works with Jonas's testfile.txt. You need to specify that there are 3 header lines when you call detectImportOptions().
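opts = detectImportOptions(filename, 'NumHeaderLines', 3);  % skip all 3 header lines during detection
opts.VariableNamesLine = 3;   % names are on line 3
opts.VariableUnitsLine = 2;   % units are on line 2
opts.VariableNames            % display the names currently stored in opts
c = readtable(filename, opts);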

But do the 2nd and 3rd lines have any effect at all here? It seems Alex wants to pick up the units as well.
Sure. The 2nd line contains the units, which are stored in the readtable output 'c':
c.Properties.VariableUnits
The 3rd line contains the headers, which become the table's variable names and can be found in
c.Properties.VariableNames
Hi all,
I had worked out the solution before I posted. I just wanted to know whether that is how the function was designed to work. It just doesn't seem straightforward to me.
But yes, asking the function to ignore the first two lines with 'NumHeaderLines', 2 makes detectImportOptions pick up the right line for the names. I then set opts.VariableUnitsLine to the correct line, et voilà, it works.
I tried telling it that there were 3 header lines (as there are), and it ends up choosing the first line of data for the variable names, which is even less useful. And of course I cannot set opts.VariableNamesLine = 3, because that gets ignored.
This is what I now use:
opts = detectImportOptions(useP{k},'NumHeaderLines',2);
opts.VariableUnitsLine = 2;
opts = setvaropts(opts,'Exhaust_Flow_1','FillValue',0);
The last line just stops it from inserting NaN; filling with 0 instead makes it easier to concatenate things later on.
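The poster's final approach can be wrapped in a loop over several identically formatted files, with the FillValue keeping the per-file tables free of NaN before they are stacked with vertcat. This is a sketch under stated assumptions: the two temporary files and their contents are hypothetical stand-ins for the files in useP, and it relies on the poster's report that 'NumHeaderLines', 2 makes detectImportOptions take line 3 as the variable names. Options are detected once and reused, since the poster says all files share the same layout.

```matlab
% Two hypothetical files with the same 3-line header layout
contents = { ...
    {',,Logger export', 's,m/s,kg/s', 'Time,Speed,Exhaust_Flow_1', '0,1.5,0.2', '1,2.5,'}, ...
    {',,Logger export', 's,m/s,kg/s', 'Time,Speed,Exhaust_Flow_1', '2,3.5,0.4'}};
files = cell(size(contents));
for k = 1:numel(contents)
    files{k} = [tempname '.csv'];
    fid = fopen(files{k}, 'w');
    fprintf(fid, '%s\n', contents{k}{:});
    fclose(fid);
end

% Detect once (layout is identical), skipping 2 lines so line 3 is taken as names
opts = detectImportOptions(files{1}, 'NumHeaderLines', 2);
opts.VariableUnitsLine = 2;
% Replace missing Exhaust_Flow_1 values with 0 instead of NaN
opts = setvaropts(opts, 'Exhaust_Flow_1', 'FillValue', 0);

tables = cell(size(files));
for k = 1:numel(files)
    tables{k} = readtable(files{k}, opts);
end
allData = vertcat(tables{:});   % identical variable names, so rows stack directly
disp(allData)
```

Detecting per file, as the poster does with useP{k}, is more robust if the layouts might ever drift; detecting once is cheaper when they are guaranteed identical.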
That sounds kludgy. If your data are formatted as you described and match the format of testfile.txt, the solution above should work. Maybe you're using the 'TreatAsEmpty' or 'ReadVariableNames' parameters in readtable(), which may interfere with the opts input. Just be vigilant when applying this to other files.
Yes, possibly because I notice that in the first line, the first two columns are actually empty. Well, they have [], [], but no actual text or numbers. Perhaps this is the reason.
Either way, all files are the same, and I don't need to use this on a different type of file.

More Answers (2)

I had the same issue. I stepped through it in the debugger several times; I believe this is a bug. Here is what I found:
Open TextImportOptions.m and go to line 211, which reads:
% Read Names
if opts.VariableNamesLine > 0 && rvn
names = readVariableNames(parser);
else
names = opts.SelectedVariableNames;
end
% Read Metadata
units = readVariableUnits(parser);
descr = readVariableDescriptions(parser);
The problem is that 'rvn' gets its value from a persistent variable, which means unless that parameter is specified on the first function call, it will always be false.
Change the && in the if statement to OR logic (read the NOTES below before doing so). The code will then work as intended. This is what it should look like:
% Read Names
if opts.VariableNamesLine > 0 || rvn
names = readVariableNames(parser);
else
names = opts.SelectedVariableNames;
end
% Read Metadata
units = readVariableUnits(parser);
descr = readVariableDescriptions(parser);
Also, I'm not sure why the programmer decided to use an 'if else' statement to decide how to get the variable names, yet only calls a function to get the units and descriptions.
NOTES: (1) Making this change requires administrative access, (2) the .m file must be edited with a non-MATLAB editor (e.g. Notepad++), (3) this change only affects your local machine (i.e. other computers will have difficulties running your code if they do not have this change), (4) any updates that MATLAB installs may revert this code.
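If the condition quoted above is the culprit, it may be possible to avoid patching the toolbox by making rvn true from the calling side. The sketch below assumes that rvn holds the value of readtable's 'ReadVariableNames' argument (the name suggests so, but the thread does not confirm it) and that your release accepts 'ReadVariableNames' together with an options object. The test file is a hypothetical stand-in with the layout described in the question.

```matlab
% Hypothetical file: line 1 ignored, line 2 units, line 3 names, data from line 4
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ',,Logger export', 's,m/s,kg/s', ...
    'Time,Speed,Exhaust_Flow_1', '0,1.5,0.2', '1,2.5,0.3');
fclose(fid);

opts = detectImportOptions(fname, 'NumHeaderLines', 3);
opts.VariableNamesLine = 3;
opts.VariableUnitsLine = 2;

% Passing 'ReadVariableNames', true explicitly (assumed to be what sets rvn)
% asks readtable to read names from opts.VariableNamesLine
T = readtable(fname, opts, 'ReadVariableNames', true);
delete(fname);
disp(T.Properties.VariableNames)
disp(T.Properties.VariableUnits)
```

Unlike the edited toolbox file, this keeps the fix in your own code, so it travels with your scripts to other machines and survives MATLAB updates.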

What is TextImportOptions.m? That's not a matlab file. Even detectImportOptions.m doesn't have the variable name opts.VariableNamesLine so I'm not sure what file you're working with.
Hmm, that's interesting. What version of matlab are you running? I'm on 9.3.0.713579 (R2017b).
TextImportOptions.m should exist in this folder:
C:\Program Files\MATLAB\R2017b\toolbox\shared\io\+matlab\+io\+text
The parent folders may be different depending on your setup.
readtable seems to have had quite a few updates over the last couple of releases, or a major one at some point recently. I always run into trouble when helping my colleagues with imports, as their releases are missing several key features.
detectImportOptions has been improved with every version since it was introduced, so I wouldn't expect the code to be similar from version to version. There's no TextImportOptions.m in R2018b; there's a getTextOpts.m instead, which delegates the heavy lifting to a built-in function (hence you can't see the actual detection code).
I see now: TextImportOptions is stored in a package directory, which isn't allowed on the MATLAB path, which is why it doesn't appear when I search for it using which() or similar methods (even in R2017b). It's a classdef .m file. When you google "matlab TextImportOptions", there is almost no information about this file.
Anyway, how did you end up in this classdef file? What function called it and how did you end up stepping through this file during debugging?
@Guillaume: That explains it. The fact that there are several different versions is unfortunate as it becomes difficult to write complex importopts for beginners on this forum. Many times people just reply with an error message, and therefore I usually opt for something more reliable such as textscan despite readtable usually being the more practical choice for semi-complex imports.
Sorry for interrupting your discussion, I will be on my way now :)
@Adam Danz I just kept stepping into every function that resulted in an error. I called the readtable function with arguments for both the fileName and the OPTS.

The help for the function detectImportOptions() says
% "ReadVariableNames" - Whether or not to expect variable names in
% the file. Defaults to true.
However, for one large database, it did not get the variable names until I explicitly specified that option as true on the command line, like this:
opts = detectImportOptions(path_filename,'NumHeaderLines',0,'ReadVariableNames',1)
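A quick way to notice when detection has silently fallen back to generic names is to compare the detected names with the default VarN pattern. This sketch assumes that detectImportOptions names unnamed columns Var1, Var2 and so on; the small file with its names on line 1 is made up for illustration.

```matlab
% Hypothetical file whose variable names are on line 1
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'a,b', '1,2', '3,4');
fclose(fid);

% As in the answer above, state explicitly that line 1 holds the names
opts = detectImportOptions(fname, 'NumHeaderLines', 0, 'ReadVariableNames', true);
delete(fname);

% Warn if every name is just the generic VarN default
n = numel(opts.VariableNames);
defaultNames = arrayfun(@(k) sprintf('Var%d', k), 1:n, 'UniformOutput', false);
if all(strcmp(opts.VariableNames, defaultNames))
    warning('No variable names were detected; check VariableNamesLine.');
end
disp(opts.VariableNames)
```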

Release

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!