readmatrix returns NaNs instead of numeric values for nearly indistinguishable .txt file

Question

Oliver Johnson on 4 Aug 2022

https://in.mathworks.com/matlabcentral/answers/1774285-readmatrix-returns-nans-instead-of-numeric-values-for-nearly-indistinguishable-txt-file

Moved: Walter Roberson on 19 Aug 2022

Input Files

The attached file1.txt and file2.txt have identical structure (9 lines of header followed by data arranged in 15 columns). The header looks like this:

ITEM: TIMESTEP
881000
ITEM: NUMBER OF ATOMS
37
ITEM: BOX BOUNDS pp pp pp
-9.6850194863609573e-01 1.0509150710611115e+02
-8.0199580669506787e-01 8.7024035559953262e+01
-2.0781435615505643e+02 2.0781435615505643e+02
ITEM: ATOMS mass id type x y z c_1 c_2 f_eco[1] f_eco[2] c_sv[1] c_sv[2] c_sv[3] c_sv[4] backforth 

Expected Behavior

I want to extract the numerical data in the header contained on lines 6-8 (3 rows and 2 columns). I use readmatrix to do this as follows (for file1.txt):

simcell = readmatrix('file1.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9646  105.0876
   -0.7987   87.0208
 -207.7989  207.7989

It also works to extract the data that appears after the header as follows:

data_raw = readmatrix('file1.txt','FileType','text','NumHeaderLines',9); % output hidden for brevity

I have hundreds of thousands of files like this, and this approach works for almost all of them, but occasionally it fails...

Unexpected Behavior

When I do the same thing for file2.txt, it returns NaNs and I can't figure out why:

simcell = readmatrix('file2.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×1
   NaN
   NaN
   NaN

To debug the issue, I compared hidden characters, delimiters, and character encoding, and all appear identical between the two input files. However, I did find that if I manually delete all of the data after the header (attached as file2short.txt), I get the correct result:

simcell = readmatrix('file2short.txt','Filetype','text','Range',[6 1 8 2]) % read row 6 column 1 through row 8 column 2
simcell = 3×2
   -0.9685  105.0915
   -0.8020   87.0240
 -207.8144  207.8144

Question

I know there are many other ways one could accomplish the desired result, but that is not my question. My question is: why does this unexpected behavior occur in this example?

3 Comments

Kangming Xu on 10 Aug 2022

Edited: Kangming Xu on 10 Aug 2022

Hi Oliver,

Thank you for reaching out.

I successfully reproduced the issue in MATLAB R2022a and reported the issue to our development team. I will let you know once I have an update. Let me know if you have any questions in the meantime!

Oliver Johnson on 18 Aug 2022

@Kangming, thank you for the explanation. If you submit it as an answer, I am happy to accept it.

Answer 1

Kangming Xu on 11 Aug 2022

https://in.mathworks.com/matlabcentral/answers/1774285-readmatrix-returns-nans-instead-of-numeric-values-for-nearly-indistinguishable-txt-file#answer_1029025

Moved: Walter Roberson on 19 Aug 2022

Here is the update.

The two files differ in the delimiter that readmatrix detects by default: {','} for file2.txt and {'\t' ' '} for file1.txt. Because the specified "Range" covers only a few rows, rows outside the Range are also used to determine the format of the file. file1.txt contains more rows of numeric, space-delimited data, so its delimiter is detected as {'\t' ' '}.

As for why readmatrix works on file2short.txt without the "Delimiter" option: when a file has only a few rows, the detection heuristics depend strongly on the selected data.
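One way to check this on a given file is to look at what detection picks and then pass the delimiter explicitly so that detection is bypassed. The following is only a minimal sketch: the temporary file is a hypothetical, shortened stand-in for the attached files (same 9-line header, two made-up data rows), and the delimiter reported without "Range" may differ from what readmatrix detects when "Range" is given.

```matlab
% Hypothetical stand-in for the attached files: same 9-line header, two made-up data rows.
fname = [tempname '.txt'];
contents = [ ...
    "ITEM: TIMESTEP"
    "881000"
    "ITEM: NUMBER OF ATOMS"
    "2"
    "ITEM: BOX BOUNDS pp pp pp"
    "-9.6850194863609573e-01 1.0509150710611115e+02"
    "-8.0199580669506787e-01 8.7024035559953262e+01"
    "-2.0781435615505643e+02 2.0781435615505643e+02"
    "ITEM: ATOMS mass id x y z"
    "1.0 1 0.1 0.2 0.3"
    "1.0 2 0.4 0.5 0.6"];
writelines(contents, fname);   % writelines requires R2022a or later

% Inspect what automatic detection chooses for this file.
detected = detectImportOptions(fname, 'FileType', 'text');
if isprop(detected, 'Delimiter')   % fixed-width options have no Delimiter property
    disp(detected.Delimiter)
end

% State the delimiter explicitly so the result does not depend on detection.
simcell = readmatrix(fname, 'FileType', 'text', 'Delimiter', ' ', 'Range', [6 1 8 2]);

delete(fname);   % clean up the temporary file
```

Running the detectImportOptions step on file1.txt and file2.txt themselves should show whether the two files are really being parsed with different delimiters.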

If the file format and the range of selected data are the same for all files, you can capture the import settings in a "DelimitedTextImportOptions" object and reuse it, so that nothing is left to detection. Please refer to the link below for more information.

https://www.mathworks.com/help/matlab/ref/matlab.io.text.delimitedtextimportoptions.html

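For example:

% Fix the delimiter and the data lines up front so that no format detection is needed
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], 'NumVariables', 2, 'VariableTypes', {'double', 'double'})
% The same options object can be reused for every file
data1 = readmatrix('file1.txt', opts)
data2 = readmatrix('file2.txt', opts)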
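Since the question mentions a very large number of files, the same options object could be built once and reused in a loop. This is a rough sketch rather than a tested workflow: the folder, the dump*.txt file names, and the file contents are hypothetical demo data created only so that the loop has something to read.

```matlab
% Create two hypothetical demo files with the same layout as the real ones.
folder = tempname;
mkdir(folder);
header = ["ITEM: TIMESTEP"; "881000"; "ITEM: NUMBER OF ATOMS"; "1"; ...
          "ITEM: BOX BOUNDS pp pp pp"];
footer = ["ITEM: ATOMS mass id x y z"; "1.0 1 0.1 0.2 0.3"];
for k = 1:2
    lo = -k * [1; 2; 3];                       % made-up lower bounds
    hi =  k * [1; 2; 3];                       % made-up upper bounds
    bounds = compose("%.16e %.16e", lo, hi);   % three lines, two numbers each
    writelines([header; bounds; footer], fullfile(folder, sprintf('dump%d.txt', k)));
end

% Build the import options once; no per-file format detection takes place.
opts = delimitedTextImportOptions('Delimiter', ' ', 'DataLines', [6 8], ...
    'NumVariables', 2, 'VariableTypes', {'double', 'double'});

% Read lines 6-8 of every file into one 3-by-2-by-N array.
files = dir(fullfile(folder, '*.txt'));
simcells = zeros(3, 2, numel(files));
for k = 1:numel(files)
    simcells(:, :, k) = readmatrix(fullfile(files(k).folder, files(k).name), opts);
end

rmdir(folder, 's');   % remove the demo folder and its files
```

Preallocating simcells and reusing opts keeps the per-file work down to a single readmatrix call.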
