Extracting data from an ASCII file using regexp

Bob Thompson on 8 Jun 2018
Edited: per isakson on 14 Sep 2019
I have an ASCII file that contains the following text.
1 $FLAG1 NP1=8.,NP2=1.,P2=2.36,P3=3000000.,
2 P1=0.,4.,8.,12.,
3 P1(5)=16.,20.,24.,28.,$
4 $FLAG2 G1=18.75,$
5 $FLAG3 G2=11.25,G3=3.75,G4=26.25,G5=2.,$
6 $FLAG3 G6=.TRUE.,G7=10.,G8=2.5,G9=4.,G10=4.,$
I would like to extract the values of P2 and P3 into an array, but each may hold a variable number of values. Additionally, the definition of P2 or P3 might span multiple lines, and I don't want to capture the line numbers that appear on the left. I have been attempting to use regexp() to pull out the desired values but have run into several problems. (I apologize for the inconvenience, but no, I cannot provide a sample file.)
So far my code looks something like the following:
inputs = fileread('filename');
P2s = regexp(inputs, '\<NP2=\d+.,P2=(.{1,100})P3=','tokens');
P3s = regexp(inputs, '\<P3=([1234567890., ]+)','tokens');
The challenges I have are:
1) P2s seems to recognize only the first occurrence of the pattern, despite the fact that it occurs at least twice in the file.
2) P3s is intended to terminate at the next alphabetic character, but since \w includes digits (so \W would not match them) I chose to define the allowable characters manually. I do not know how to include a newline character '\n' in the brackets to allow for multiple lines of P3 values.
3) I would like to exclude the line numbers if multiple lines need to be read in. I cannot accurately predict the values of the line numbers, nor can I predict the name of the P variable that follows (except P3, which has followed P2 in all files I have seen). There is a duplicate of this portion of the file shortly after it, without the line numbers; however, the regexp() results do not seem to reliably pick up on this repeat (P2s was a 1x1 cell, while P3s was a 1x2 cell).
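One possible cause of challenge 1, which cannot be confirmed without the file, is that the greedy quantifier in .{1,100} runs to the last 'P3=' within reach and swallows a nearby second occurrence. The sketch below uses an invented string (not the real file) to compare the greedy form with a lazy one, and also escapes the dot after \d+ so that it matches a literal period.

```matlab
% Hypothetical text with two occurrences close together (not from the real file).
txt = 'NP2=1.,P2=2.36,P3=3.,NP2=1.,P2=4.5,P3=6.,';

% Greedy: .{1,100} extends to the LAST 'P3=' within reach, so both
% occurrences are merged into a single match.
greedy = regexp(txt, '\<NP2=\d+\.,P2=(.{1,100})P3=', 'tokens');

% Lazy: .{1,100}? stops at the FIRST 'P3=' that follows each 'P2='.
lazy = regexp(txt, '\<NP2=\d+\.,P2=(.{1,100}?)P3=', 'tokens');

fprintf('greedy matches: %d, lazy matches: %d\n', numel(greedy), numel(lazy));
```

If the two occurrences in the real file are more than 100 characters apart, the greedy form would not merge them, and the cause would lie elsewhere.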
Any advice would be appreciated.
  1 Comment
Paolo on 13 Jun 2018
You mentioned that P2 and P3 may take up multiple lines. What does your data look like in that situation?
Is it something like this:
1 $FLAG1 NP1=8.,NP2=1.,P2=2.36,P3=3000000.,
2 123124, P1=...
or
1 $FLAG1 NP1=8.,NP2=1.,P2=2.36,P3=3000000
2 123124, P1=...
Basically, what determines whether P2 or P3 carries on to the following line?
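Either layout could plausibly be handled in two steps: strip the leading line numbers first, then capture with a character class that includes \s so the run may cross line breaks. This is only a sketch. The sample text is invented to resemble the first layout, and it assumes that a line number is an integer at the start of a line followed by a space or tab.

```matlab
% Hypothetical excerpt resembling the first layout; the continuation value
% on the second line is invented for illustration.
raw = strjoin({'1 $FLAG1 NP1=8.,NP2=1.,P2=2.36,P3=3000000.,', ...
               '2 123124., P1=0.,4.,'}, char(10));

% Assumption: a line number is an integer at the start of a line followed by
% a space or tab. 'lineanchors' makes ^ match at the start of every line.
clean = regexprep(raw, '^[ \t]*\d+[ \t]+', '', 'lineanchors');

% \s inside the brackets lets the capture continue across line breaks; it
% ends at the first character that is not a digit, dot, comma, or whitespace.
tok = regexp(clean, '\<P3=([\d.,\s]+)', 'tokens');

% Convert each captured run to numbers (plain decimals only; signs and
% exponents are not handled in this sketch).
P3 = cellfun(@(t) str2double(regexp(t{1}, '\d+\.?\d*', 'match')), tok, ...
             'UniformOutput', false);
```

Here P3 is a cell array with one numeric vector per occurrence of P3. The same pattern with P2 in place of P3 should work for P2, since \< keeps it from matching NP2. One caveat: in a section without line numbers, a continuation line that begins with an integer followed by a space would be mistaken for a line number and stripped.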


Answers (0)
