Textscan: read large text files with varying format

Question

ch555 on 31 Jan 2016

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/265968-textscan-read-large-text-files-with-varying-format

Edited: Stephen23 on 1 Feb 2016

Hello,

I'm trying to read a large text file (up to several GB) using textscan. The file is divided into several blocks, each consisting of a comment block followed by a data block. However, the format of the data blocks may vary.

The text file looks as follows:

    $comment c1                                                                    1
    $comment c2 - important                                                        2
    $comment c3                                                                    3
    $comment c4 - important                                                        4
    $comment c5 - important                                                        5
          0.000000E+00     -6.000000E-01     -1.734401E+01      0.000000E+00       6
    -CONT-                 -3.022156E+02      0.000000E+00     -5.884746E+01       7
    -CONT-                  5.884746E+01      0.000000E+00                         8
          4.120000E+00     -6.000000E-01     -1.735009E+01      2.538575E-02       9
    -CONT-                 -3.023943E+02      6.774698E-01     -5.885033E+01      10
    -CONT-                  5.885033E+01     -3.824576E-02                        11
          5.056700E+01     -6.000000E-01     -1.736840E+01      5.097235E-02      12
    -CONT-                 -3.029319E+02      1.360927E+00     -5.885897E+01      13
    -CONT-                  5.885897E+01     -7.653293E-02                        14
          9.570000E+01     -6.000000E-01     -1.739909E+01      7.696529E-02      15
    -CONT-                 -3.038334E+02      2.056497E+00     -5.887338E+01      16
    -CONT-                  5.887338E+01     -1.149036E-01                        17
  ...more data...
    $comment c1                                                                   55
    $comment c2 - important                                                       56
    $comment c3                                                                   57
    $comment c4 - important                                                       58
          230500           -6.000000E-01     -1.736840E+01      5.097235E-02      60
    -CONT-                 -3.029319E+02                                          61
          630500            5.000000E-01     -1.936840E+01      5.197235E-02      62
    -CONT-                 -4.029319E+02                                          63
  etc.

The comments give information about the upcoming data format. Hence, I want to read the comment block using e.g.

    commentBlock = textscan(fid,'%s',5,'delimiter','\n')

and then define the format string(s) to read the data block. However, due to the "-CONT-" fields (and empty fields as well), I cannot define one format spec that is able to read the whole data block. Since the number of lines in each data block is unknown and quite large, is it possible to read this kind of file in an easy (and fast) manner?
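Note that the second block in the sample above has four comment lines rather than five, so a fixed count in textscan may not fit every block. A minimal sketch of one alternative follows, assuming every comment line starts with "$" after leading whitespace. It reads lines with fgetl until the first non-comment line and then rewinds the file position with fseek, so that whatever reads the data block starts at the right place. The temporary file and its contents are hypothetical, written only to make the sketch self-contained.

```matlab
% Write a tiny sample file (hypothetical content modelled on the question).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '$comment c1                        1');
fprintf(fid, '%s\n', '$comment c2 - important            2');
fprintf(fid, '%s\n', '  0.000000E+00  -6.000000E-01  -1.734401E+01  0.000000E+00  3');
fclose(fid);

% Read comment lines until the first line that is not a comment.
fid = fopen(fname, 'r');
comments = {};
while true
    pos = ftell(fid);               % remember where this line starts
    tline = fgetl(fid);
    if ~ischar(tline)               % fgetl returns -1 at end of file
        break
    end
    if strncmp(strtrim(tline), '$', 1)
        comments{end+1} = tline;    % still inside the comment block
    else
        fseek(fid, pos, 'bof');     % rewind so the data reader sees this line
        break
    end
end

% The file position now points at the first data line.
nextLine = fgetl(fid);
fclose(fid);
delete(fname);
```

The collected comment lines can then be inspected to choose the format for the data block that follows.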

My ideas:

1) Use textscan to read the comment block; then define two format strings for each part of the data block to make use of the repeating sequences, e.g.

    % first two lines
    formatSpec1 = '%s %f %f %f %d';
    % third line (empty value at the end)
    formatSpec2 = '%s %f %f %d';

and iterate until the next comment block (read "all at once"). However, I know neither how to create such a loop inside a "while ~feof(fid)" loop nor how to stop when the next comment block is reached.

2) Use textscan to read the comment block; then read the data block line by line using textscan or fgetl until the next comment block. How do I specify the format and stop when the comments start?

Is this even possible?
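Idea 2 might look roughly like the sketch below. It is not a tested solution for multi-GB files, only an illustration of the stopping logic under these assumptions: comment lines start with "$", continuation lines start with "-CONT-", and every data line ends with a line number that should be discarded. The cell array sampleLines stands in for successive fgetl calls, and the names records and current are made up for this example.

```matlab
% Sample lines copied from the first data block of the question;
% the last entry is the start of the next comment block.
sampleLines = { ...
    '      0.000000E+00     -6.000000E-01     -1.734401E+01      0.000000E+00       6', ...
    '-CONT-                 -3.022156E+02      0.000000E+00     -5.884746E+01       7', ...
    '-CONT-                  5.884746E+01      0.000000E+00                         8', ...
    '      4.120000E+00     -6.000000E-01     -1.735009E+01      2.538575E-02       9', ...
    '-CONT-                 -3.023943E+02      6.774698E-01     -5.885033E+01      10', ...
    '-CONT-                  5.885033E+01     -3.824576E-02                        11', ...
    '$comment c1                                                                   55'};

records = {};   % one numeric row vector per logical (unfolded) record
current = [];   % values collected so far for the record being built
for k = 1:numel(sampleLines)
    t = strtrim(sampleLines{k});
    if isempty(t)
        continue                    % skip blank lines
    end
    if t(1) == '$'
        break                       % next comment block reached: stop here
    end
    isCont = strncmp(t, '-CONT-', 6);
    if isCont
        t = t(7:end);               % drop the continuation marker
    end
    vals = sscanf(t, '%f').';       % all numbers on the line, as a row
    vals = vals(1:end-1);           % drop trailing line number (assumed present)
    if isCont
        current = [current, vals];  % append to the record in progress
    else
        if ~isempty(current)
            records{end+1} = current;   % store the finished record
        end
        current = vals;             % start a new record
    end
end
if ~isempty(current)
    records{end+1} = current;       % store the last record of the block
end
```

With a real file, the loop body would take each line from fgetl(fid) instead of the cell array. If all records in a block have the same length, vertcat(records{:}) turns them into a numeric matrix.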

Thank you very much!

1 Comment

Stephen23 on 31 Jan 2016

Edited: Stephen23 on 1 Feb 2016

Can you please upload a sample file for us to try? Edit your question, click the paperclip button, then click both the Choose file and Attach file buttons.

I will have a look at it now if you attach a (small) sample file. It does not have to be the whole file, just a representative sample. A few questions:

Do the folded data blocks always have three columns?
"The comments give information about the upcoming data format." What information do the comments contain?
Does each line really end with a line-number? (Just before the newline character).

Answers (0)
