
readmatrix returns NaNs instead of numeric values for nearly indistinguishable .txt file

Input Files
The attached file1.txt and file2.txt have identical structure (9 lines of header followed by data arranged in 15 columns). The header looks like this:
ITEM: TIMESTEP
881000
ITEM: NUMBER OF ATOMS
37
ITEM: BOX BOUNDS pp pp pp
-9.6850194863609573e-01 1.0509150710611115e+02
-8.0199580669506787e-01 8.7024035559953262e+01
-2.0781435615505643e+02 2.0781435615505643e+02
ITEM: ATOMS mass id type x y z c_1 c_2 f_eco[1] f_eco[2] c_sv[1] c_sv[2] c_sv[3] c_sv[4] backforth
Expected Behavior
I want to extract the numerical data contained in lines 6-8 of the header (3 rows and 2 columns). I use readmatrix to do this as follows (for file1.txt):
simcell = readmatrix('file1.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9646  105.0876
   -0.7987   87.0208
 -207.7989  207.7989
It also works to extract the data that appears after the header as follows:
data_raw = readmatrix('file1.txt','FileType','text','NumHeaderLines',9); % output hidden for brevity
I have hundreds of thousands of files like this, and this approach works for almost all of them, but occasionally it fails...
Unexpected Behavior
When I do the same thing for file2.txt, it returns NaNs and I can't figure out why:
simcell = readmatrix('file2.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×1
   NaN
   NaN
   NaN
In an effort to debug the issue, I compared hidden characters, delimiters, and character encoding, and all appear identical between the two input files. However, I did find that if I manually delete all of the data after the header (attached as file2short.txt), I get the correct result:
simcell = readmatrix('file2short.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9685  105.0915
   -0.8020   87.0240
 -207.8144  207.8144
Question
I know there are many other ways one could accomplish the desired result, but that is not my question. My question is: why does this unexpected behavior occur in this example?
  3 Comments
Kangming Xu 10/181 on 10 Aug 2022
Edited: Kangming Xu 10/181 on 10 Aug 2022
Hi Oliver,
Thank you for reaching out.
I successfully reproduced the issue in MATLAB R2022a and reported the issue to our development team. I will let you know once I have an update. Let me know if you have any questions in the meantime!


Accepted Answer

Kangming Xu 10/181 on 11 Aug 2022
Moved: Walter Roberson on 19 Aug 2022
Here is the update.
The difference between the two files is that, by default, the delimiter is detected as {','} for file2.txt but as {'\t' ' '} for file1.txt. The reason is that the provided "Range" covers only a few rows, so rows outside the Range are also used to determine the format of the file, to ensure the best result. Because file1.txt contains more numeric, space-delimited rows of data, its delimiter is selected as {'\t' ' '}.
As for why the function works properly for file2short.txt without the "Delimiter" property: when a file has only a few rows, the detection heuristics depend strongly on the selected data.
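To see what the detection described above decides for a given file, one option is to call detectImportOptions with the same arguments and look at the result. The following is only a sketch: it writes a small synthetic dump file (a hypothetical stand-in, not the attached file2.txt) and assumes detectImportOptions accepts the same 'Range' argument that readmatrix does. Replace fname with the path to your own file to inspect it.

```matlab
% Build a small synthetic dump file (hypothetical stand-in for the real data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ITEM: TIMESTEP\n881000\nITEM: NUMBER OF ATOMS\n2\n');
fprintf(fid, 'ITEM: BOX BOUNDS pp pp pp\n');
% fprintf consumes the matrix column by column, hence the transpose.
fprintf(fid, '%.16e %.16e\n', [-0.9685 105.0915; -0.8020 87.0240; -207.8144 207.8144]');
fprintf(fid, 'ITEM: ATOMS mass id type x y z\n');
fprintf(fid, '%g %d %d %g %g %g\n', [1.0 1 1 0.1 0.2 0.3; 1.0 2 1 0.4 0.5 0.6]');
fclose(fid);

% Ask MATLAB which import options it would detect for this file and range.
opts = detectImportOptions(fname, 'FileType', 'text', 'Range', [6 1 8 2]);

% Delimited-text options expose the detected delimiter(s) as a cell array.
if isa(opts, 'matlab.io.text.DelimitedTextImportOptions')
    disp(opts.Delimiter)
else
    disp(class(opts))   % some other options class was detected
end

delete(fname);
```

Comparing the displayed delimiter for a file that reads correctly and one that returns NaNs should show whether detection is the cause.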
If the file format and the range of selected data are the same for all files, you can capture the import options in a "DelimitedTextImportOptions" object instead of relying on detection. See the delimitedTextImportOptions documentation for more information.
For example:
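% Specify the delimiter, data lines, and variable types explicitly so that nothing is auto-detected
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], 'NumVariables', 2, 'VariableTypes', {'double', 'double'})
% The same options object can be reused for every file with this layout
data1 = readmatrix('file1.txt', opts)
data2 = readmatrix('file2.txt', opts)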
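Since the question mentions a very large number of files, the same options object could be built once and reused in a loop. This is a minimal sketch under stated assumptions: the two files are synthetic header-only stand-ins written to a temporary folder (the folder, the simcells array, and the loop are illustrative, not part of the original answer), and every file is assumed to have its box bounds on lines 6-8.

```matlab
% Create two synthetic dump headers (hypothetical stand-ins for the real files).
folder = tempname;
mkdir(folder);
bounds = {[-0.9646 105.0876; -0.7987 87.0208; -207.7989 207.7989], ...
          [-0.9685 105.0915; -0.8020 87.0240; -207.8144 207.8144]};
for k = 1:numel(bounds)
    fid = fopen(fullfile(folder, sprintf('file%d.txt', k)), 'w');
    fprintf(fid, 'ITEM: TIMESTEP\n881000\nITEM: NUMBER OF ATOMS\n37\n');
    fprintf(fid, 'ITEM: BOX BOUNDS pp pp pp\n');
    fprintf(fid, '%.16e %.16e\n', bounds{k}');   % transpose: fprintf reads column-wise
    fprintf(fid, 'ITEM: ATOMS mass id type x y z\n');
    fclose(fid);
end

% Define the import options once; no per-file detection is involved.
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], ...
    'NumVariables', 2, 'VariableTypes', {'double', 'double'});

% Read the 3-by-2 box bounds from every file into one 3-D array.
files = dir(fullfile(folder, '*.txt'));
simcells = zeros(3, 2, numel(files));
for k = 1:numel(files)
    simcells(:, :, k) = readmatrix(fullfile(files(k).folder, files(k).name), opts);
end

rmdir(folder, 's');   % clean up the temporary files
```

Because the options are fixed up front, the result for one file no longer depends on how many data rows follow its header.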

Release

R2022a
