- Do the folded data blocks always have three columns?
- "The comments give information about the upcoming data format." What information do the comments contain?
- Does each line really end with a line number (just before the newline character)?
Textscan: read large text files with varying format
Hello,
I'm trying to read a large text file (up to several GB) using textscan. The file is divided into several blocks of comments and data; each comment block is followed by a data block. However, the format of the data blocks may vary.
The text file looks as follows:
$comment c1 1
$comment c2 - important 2
$comment c3 3
$comment c4 - important 4
$comment c5 - important 5
0.000000E+00 -6.000000E-01 -1.734401E+01 0.000000E+00 6
-CONT- -3.022156E+02 0.000000E+00 -5.884746E+01 7
-CONT- 5.884746E+01 0.000000E+00 8
4.120000E+00 -6.000000E-01 -1.735009E+01 2.538575E-02 9
-CONT- -3.023943E+02 6.774698E-01 -5.885033E+01 10
-CONT- 5.885033E+01 -3.824576E-02 11
5.056700E+01 -6.000000E-01 -1.736840E+01 5.097235E-02 12
-CONT- -3.029319E+02 1.360927E+00 -5.885897E+01 13
-CONT- 5.885897E+01 -7.653293E-02 14
9.570000E+01 -6.000000E-01 -1.739909E+01 7.696529E-02 15
-CONT- -3.038334E+02 2.056497E+00 -5.887338E+01 16
-CONT- 5.887338E+01 -1.149036E-01 17
...more data...
$comment c1 55
$comment c2 - important 56
$comment c3 57
$comment c4 - important 58
230500 -6.000000E-01 -1.736840E+01 5.097235E-02 60
-CONT- -3.029319E+02 61
630500 5.000000E-01 -1.936840E+01 5.197235E-02 62
-CONT- -4.029319E+02 63
etc.
The comments give information about the upcoming data format. Hence, I want to read the comment block using e.g.
commentBlock = textscan(fid,'%s',5,'delimiter','\n')
and then define the format string(s) to read the data block. However, because of the "-CONT-" fields (and the empty fields as well), I cannot define a single format spec that reads the whole data block. The amount of data (i.e. the number of lines) in each block is unknown and quite large. Is it possible to read this kind of file in an easy (and fast) manner?
My ideas:
1) Use textscan to read the comment block; then define two format strings for each part of the data block to make use of the repeating sequences, e.g.
% first two lines
formatSpec1 = '%s %f %f %f %d';
% third line (empty value at the end)
formatSpec2 = '%s %f %f %d';
and iterate until the next comment block (read "all at once"). However, I know neither how to create such a loop within a "while ~feof(fid)" loop nor how to stop when the next comment block is reached.
2) Use textscan to read the comment block; then read the data block line by line using textscan or fgetl until the next comment block. How do I specify the format and stop when the comments start?
Is this even possible?
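To make idea 2) concrete, here is a minimal sketch of the line-by-line variant. It rests on several assumptions that the excerpt alone does not confirm: every comment line starts with "$", every data line ends with a line number that can be dropped, fields are separated by whitespace, and the file starts with a comment block. The sample lines are copied from the excerpt above; the temporary file and the names `blocks`, `inData` and `vals` are only illustrative.

```matlab
% Write a few lines from the excerpt to a temporary file (illustration only).
sampleLines = {
    '$comment c1 1'
    '$comment c2 - important 2'
    '0.000000E+00 -6.000000E-01 -1.734401E+01 0.000000E+00 6'
    '-CONT- -3.022156E+02 0.000000E+00 -5.884746E+01 7'
    '-CONT- 5.884746E+01 0.000000E+00 8'
    '$comment c1 55'
    '230500 -6.000000E-01 -1.736840E+01 5.097235E-02 60'
    '-CONT- -3.029319E+02 61'
    };
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n',sampleLines{:});
fclose(fid);

% One struct element per block: its comment lines and its data records.
blocks = struct('comments',{},'data',{});
inData = true;                 % so the first comment line opens a new block

fid = fopen(fname,'r');
tline = fgetl(fid);
while ischar(tline)            % fgetl returns -1 (not char) at end of file
    if strncmp(tline,'$',1)
        if inData              % a comment following data starts a new block
            blocks(end+1).comments = {};
            blocks(end).data = {};
            inData = false;
        end
        blocks(end).comments{end+1} = tline;
    elseif ~isempty(strtrim(tline))
        inData = true;
        % Remove the continuation marker, then read all numbers on the line.
        vals = sscanf(strrep(tline,'-CONT-',''),'%f').';
        vals = vals(1:end-1);  % assumption: last number is the line number
        if strncmp(tline,'-CONT-',6)
            % Continuation line: append to the current record.
            blocks(end).data{end} = [blocks(end).data{end} vals];
        else
            % New record.
            blocks(end).data{end+1} = vals;
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Each record ends up as one row vector in `blocks(k).data`, so records of different lengths in different blocks do not need a format string that is known in advance. Reading line by line is likely to be slow for files of several GB, and truly empty fields inside a line cannot be detected this way, so this only shows how to stop at the next comment block, not how to do it fast.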
Thank you very much!
1 Comment
Stephen23 on 31 Jan 2016 (edited by Stephen23 on 1 Feb 2016)
Can you please upload a sample file for us to try: edit your question, click the paperclip button, then both Choose file and Attach file buttons.
I will have a look at it now if you attach a (small) sample file. It does not have to be the whole file, just a representative sample. A few questions:
Answers (0)