Readtable having problems with text files

EDIT: running the code here
I'm trying to read several hundred text files using readtable. Each file has six numeric columns (date and time), and some of the lines have a text comment at the end. Not all files have text comments. The text files were generated by a Linux bash script, and the text comments were added using vi.
readtable has problems with some, but not all, of the text files that contain comments, and I can't work out why.
Two example files are attached. Here is what I get when I use readtable:
>> readtable("20231128-0841.txt")
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the
table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans =
3×1 table
x20231128085253_6RF
_________________________
{'2023 11 28 08 53 57.3'}
{'2023 11 28 08 54 51.9'}
{'2023 11 28 09 02 29.4'}
>> readtable("20231227-1456.txt")
ans =
22×7 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7
____ ____ ____ ____ ____ ____ __________
2023 12 27 14 56 38.7 {0×0 char}
2023 12 27 15 0 51.6 {0×0 char}
2023 12 27 15 0 54.5 {0×0 char}
2023 12 27 15 1 13.3 {0×0 char}
2023 12 27 15 1 31.8 {0×0 char}
: : : : : : :
Any suggestions as to why this is happening?

Accepted Answer

Cris LaPierre on 15 Jan 2024
Edited: Cris LaPierre on 15 Jan 2024
When you don't specify your options, MATLAB has to automatically determine the file format. It won't always 'guess' correctly.
In the first case, it looks like it considers the one line with a comment to be your variable name line. Everything before it is skipped (treated as headerlines) and everything after is considered data. I suspect that, since there are 7 variable names, but only 6 columns of data, MATLAB has 'decided' to treat everything as a single text variable.
The second case has a slightly different format (2 lines with comments) which allows MATLAB to determine the 7th column is an additional variable, and reads in 7 columns.
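One way to see what MATLAB guessed for a given file is to inspect the result of detectImportOptions before reading anything. The following is only a minimal sketch: the sample file and its contents are made up to mimic the format described in the question, and the detected values will depend on the actual file and MATLAB release.

```matlab
% Write a small sample file (hypothetical data mimicking the question's format).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2023 11 28 08 52 30.0\n');
fprintf(fid, '2023 11 28 08 52 53.6 RF?\n');
fprintf(fid, '2023 11 28 08 53 57.3\n');
fclose(fid);

% Ask MATLAB what it would guess without any hints.
opts = detectImportOptions(fname);

disp(class(opts))             % which kind of import options object was chosen
disp(opts.VariableNames)      % detected variable names
disp(opts.VariableTypes)      % detected type of each variable
disp(opts.DataLines)          % range of lines treated as data
disp(opts.VariableNamesLine)  % line taken as the header (0 if none)

delete(fname);
```

If the number of variables, DataLines or VariableNamesLine are not what you expect, that points to which name-value arguments need to be set explicitly.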
By specifying a few import options, you can make the import consistent.
readtable("20231128-0841.txt",'NumHeaderLines',0,'ReadVariableNames',0, 'ExpectedNumVariables',7)
ans =
16×7 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7
____ ____ ____ ____ ____ ____ __________
2023 11 28 8 41 16.5 {0×0 char}
2023 11 28 8 41 18.8 {0×0 char}
2023 11 28 8 41 47.8 {0×0 char}
2023 11 28 8 41 49.4 {0×0 char}
2023 11 28 8 42 28.2 {0×0 char}
2023 11 28 8 43 29.1 {0×0 char}
2023 11 28 8 44 5.2 {0×0 char}
2023 11 28 8 44 21.3 {0×0 char}
2023 11 28 8 45 20 {0×0 char}
2023 11 28 8 51 48.8 {0×0 char}
2023 11 28 8 52 9.8 {0×0 char}
2023 11 28 8 52 30 {0×0 char}
2023 11 28 8 52 53.6 {'RF?' }
2023 11 28 8 53 57.3 {0×0 char}
2023 11 28 8 54 51.9 {0×0 char}
2023 11 28 9 2 29.4 {0×0 char}
readtable("20231227-1456.txt",'NumHeaderLines',0,'ReadVariableNames',0, 'ExpectedNumVariables',7)
ans =
22×7 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7
____ ____ ____ ____ ____ ____ __________
2023 12 27 14 56 38.7 {0×0 char}
2023 12 27 15 0 51.6 {0×0 char}
2023 12 27 15 0 54.5 {0×0 char}
2023 12 27 15 1 13.3 {0×0 char}
2023 12 27 15 1 31.8 {0×0 char}
2023 12 27 15 2 39.1 {0×0 char}
2023 12 27 15 3 39.8 {0×0 char}
2023 12 27 15 4 18.4 {0×0 char}
2023 12 27 15 4 43.1 {0×0 char}
2023 12 27 15 5 27.7 {0×0 char}
2023 12 27 15 5 36.2 {0×0 char}
2023 12 27 15 5 45.1 {0×0 char}
2023 12 27 15 6 8.6 {0×0 char}
2023 12 27 15 6 24.2 {0×0 char}
2023 12 27 15 11 6.7 {0×0 char}
2023 12 27 15 11 11.1 {0×0 char}
If you don't want or need the comments, you can use the ExtraColumnsRule name-value argument to ignore them:
readtable("20231227-1456.txt",'NumHeaderLines',0,'ReadVariableNames',0, 'ExpectedNumVariables',6,'ExtraColumnsRule','ignore')
ans =
22×6 table
Var1 Var2 Var3 Var4 Var5 Var6
____ ____ ____ ____ ____ ____
2023 12 27 14 56 38.7
2023 12 27 15 0 51.6
2023 12 27 15 0 54.5
2023 12 27 15 1 13.3
2023 12 27 15 1 31.8
2023 12 27 15 2 39.1
2023 12 27 15 3 39.8
2023 12 27 15 4 18.4
2023 12 27 15 4 43.1
2023 12 27 15 5 27.7
2023 12 27 15 5 36.2
2023 12 27 15 5 45.1
2023 12 27 15 6 8.6
2023 12 27 15 6 24.2
2023 12 27 15 11 6.7
2023 12 27 15 11 11.1
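Since the question mentions several hundred files, the same options could be applied in a loop. This is a rough sketch rather than a tested recipe: the folder, file names and file contents are invented for illustration, the Time variable is my own addition, and it assumes the options behave on every file the way they do on the two examples above.

```matlab
% Create a temporary folder with two small sample files (hypothetical data).
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'a.txt'), 'w');
fprintf(fid, '2023 11 28 08 52 30.0\n2023 11 28 08 52 53.6 RF?\n2023 11 28 08 53 57.3\n');
fclose(fid);
fid = fopen(fullfile(folder, 'b.txt'), 'w');
fprintf(fid, '2023 12 27 14 56 38.7\n2023 12 27 15 00 51.6 RF?\n2023 12 27 15 00 54.5\n');
fclose(fid);

% Read every text file in the folder with the same explicit options.
files = dir(fullfile(folder, '*.txt'));
tables = cell(numel(files), 1);
for k = 1:numel(files)
    fname = fullfile(files(k).folder, files(k).name);
    T = readtable(fname, 'NumHeaderLines', 0, 'ReadVariableNames', false, ...
        'ExpectedNumVariables', 7);
    % Combine the six numeric columns into one datetime variable.
    T.Time = datetime(T.Var1, T.Var2, T.Var3, T.Var4, T.Var5, T.Var6);
    tables{k} = T;
end

rmdir(folder, 's');  % clean up the temporary sample files
```

Keeping one table per file in a cell array avoids problems if the comment column ends up with a different type in different files; check a few files before concatenating.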

More Answers (2)

Hassaan on 15 Jan 2024
  1. Inconsistent Number of Columns: If some lines have comments (an extra column of text) and others don't, readtable might not know how to handle lines with different numbers of columns. You can tell readtable to treat any extra text as additional variable(s) by using the TextType and Delimiter options.
  2. Non-standard Delimiters: If the delimiter used in the files is not a tab or comma (the standard delimiters readtable looks for), you need to specify it using the Delimiter parameter.
  3. Headers: If your files do not contain headers, or the headers are inconsistent, you need to handle this using the HeaderLines parameter (to skip header lines) or by specifying the VariableNames directly.
  4. Text Qualifiers: If your comments are enclosed in qualifiers like quotes, you should specify this using the TextType parameter.
  5. Malformed Lines: Sometimes files have hidden or non-printable characters that can cause readtable to misinterpret lines. Ensure that your files are clean and uniform in their structure.
% Detect the import options, reading text data as strings
opts = detectImportOptions('20231227-1456.txt', 'TextType', 'string');
opts = setvartype(opts, 'Var7', 'string'); % Set the type of the comment column to string
opts.Delimiter = ' '; % Set the delimiter if it's not the default
opts.MissingRule = 'fill'; % Fill missing data with NaN or empty strings
% Read the table with the specified options
dataTable = readtable('20231227-1456.txt', opts);
If readtable still doesn't work as expected, you may need to preprocess the files to make them uniform. This could involve using fgetl in a loop to read each line of the file, parsing the line manually, and then constructing the table row by row.
To handle comments, you might use regular expressions with regexp to separate the numeric data from the comments and then concatenate the numeric data into a matrix and the comments into a cell array.
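The manual fgetl/regexp approach described above might look something like the following. It is only a sketch under assumptions: the sample file is made up, the pattern assumes six whitespace-separated numbers followed by an optional comment, and the variable names (Year, Month, ..., Comment) are my own choice.

```matlab
% Write a small sample file (hypothetical data mimicking the question's format).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2023 11 28 08 52 30.0\n');
fprintf(fid, '2023 11 28 08 52 53.6 RF?\n');
fprintf(fid, '2023 11 28 08 53 57.3\n');
fclose(fid);

% Six whitespace-separated numbers, then an optional free-text comment.
pattern = '^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s*(.*?)\s*$';

nums = zeros(0, 6);
comments = strings(0, 1);

fid = fopen(fname, 'r');
txt = fgetl(fid);
while ischar(txt)                      % fgetl returns -1 at end of file
    tok = regexp(txt, pattern, 'tokens', 'once');
    if ~isempty(tok)                   % skip lines that do not match
        nums(end+1, :) = str2double(tok(1:6));  %#ok<AGROW>
        comments(end+1, 1) = string(tok{7});    %#ok<AGROW>
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);

% Assemble the numeric matrix and the comments into one table.
T = array2table(nums, 'VariableNames', ...
    {'Year', 'Month', 'Day', 'Hour', 'Minute', 'Second'});
T.Comment = comments;
```

Unlike a space-delimited import, this keeps a multi-word comment together in one variable, at the cost of being slower on large files.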
  1 Comment
Stephen23 on 15 Jan 2024
Edited: Stephen23 on 15 Jan 2024
"You can tell readtable to treat any extra text as additional variable(s) by using the TextType and Delimiter options."
How does setting the TextType make any difference to how many variables are recognised?
"Non-standard Delimiters: If the delimiter used in the files is not a tab or comma (the standard delimiters readtable looks for), you need to specify it using the Delimiter parameter."
Actually READTABLE looks for more than just those, see the table under "Delimiter" here:
"Text Qualifiers: If your comments are enclosed in qualifiers like quotes, you should specify this using the TextType parameter."
Setting the TextType only changes the output class. It has no effect on recognising quoted text.
The Format can be used to specify quoted text.
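As a small illustration of that last point, here is a sketch using the %q conversion in a Format specification. The comma-delimited sample file is made up and is not the format of the files in the question; it only shows quoted text being read as a single field.

```matlab
% Write a small comma-delimited sample file with quoted text (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,"first, with a comma"\n');
fprintf(fid, '2,"second"\n');
fclose(fid);

% %f reads a number, %q reads double-quoted text as one field.
T = readtable(fname, 'Format', '%f%q', 'Delimiter', ',', ...
    'ReadVariableNames', false);
disp(T)

delete(fname);
```

Changing TextType here would only change whether the second variable comes back as a cell array of character vectors or as a string array.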



dormant on 16 Jan 2024
Many thanks to both answerers. I thought I was going to have to do it the old-fashioned way, line-by-line.

Release

R2023b
