How to match patterns beginning with : and ending with a space

Hello,
I'm trying to extract data from output files. A data line looks like this:
OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2
I would like to extract the values after each : and before the next space, and tried the following:
a = regexp(tline(6:end),'^(.*):[.*]\s+','match')
But it doesn't work.
I also have trouble getting the last value, "master2".
Could someone help me with this?
Thank you.
Jane
  2 Comments
Azzi Abdelmalek on 3 Sep 2013
Edited: Azzi Abdelmalek on 3 Sep 2013
str={'OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2'}
What result are you expecting?
Jane on 3 Sep 2013
I'm expecting to get
3 0 194 -10.900 9.118 -1.782 135764.002 P4105473 4319 0 master2
Now, with the following a = regexp(tline(6:end),':\s*(-)*\d*\w*(\.*)\d*','match')
I can get
a =
Columns 1 through 10
':3' ':0' ':194' ':-10.900' ': 9.118' ': -1.782' ':135764.002' ':P4105473' ': 4319' ': 0'
Column 11
':master2'
but I still can't get rid of the leading : in each match.
Jane
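One way to drop the colon, sketched here under the assumption that tline holds the sample line as a char vector, is to wrap only the value in a capture group and ask regexp for 'tokens' instead of 'match'. The colon and any padding are still consumed by the pattern, but only the parenthesized part is returned. The value class [^ :]+ is a simplification of the pattern above, not the poster's own expression.

```matlab
% Sample line from the question, stored as a char vector (not a cell array)
tline = 'OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2';

% tline(6:end) skips the leading 'OPT: ' label, as in the original attempt.
% ':\s*' consumes the colon and any padding; the parentheses capture only
% the value, which is a run of characters that are neither space nor colon.
tok = regexp(tline(6:end), ':\s*([^ :]+)', 'tokens');

% Each match comes back as a 1x1 cell; flatten into one 1xN cell array
vals = [tok{:}];
disp(vals)
```

With 'tokens', the text matched outside the parentheses is discarded, so no post-processing is needed to strip the colon.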


Accepted Answer

Kelly Kearney on 3 Sep 2013
This one isn't a one-liner, but should capture everything (including empty entries like the OPT one). The inconsistent spacing makes it easier to look for the patterns before the : than after:
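% str must be a char vector here, not a cell array
% a: matched labels (with colon), b: start index, c: end index of each label
[a,b,c] = regexp(str, '[A-Z_0-9]*:', 'match', 'start', 'end');
% Row 1: first character after each label; row 2: last character before the next label
idx = [c+1; b(2:end)-1 length(str)];
val = cell(1, size(idx,2));
for ii = 1:size(idx,2)
    val{ii} = str(idx(1,ii):idx(2,ii));   % value text, surrounding spaces included
end
[a' val']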
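The same label/value pairing can also be sketched with two capture groups in a single call. This is an alternative to the loop above, not part of the accepted answer, and it assumes labels consist only of upper-case letters, digits, and underscores and that values never contain a space or a colon. The lookahead (?=\s|$) forces each value to end at a space or at the end of the line, which is what keeps "ATTC_RD" from being taken as the value of "OPT".

```matlab
% Sample line from the question as a char vector
str = 'OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2';

% Group 1: the label. Group 2: the value (possibly empty, as for OPT).
% ':\s*' skips the colon and optional padding between them.
tok = regexp(str, '([A-Z_0-9]+):\s*([^ :]*)(?=\s|$)', 'tokens');

% Each match is a 1x2 cell {label, value}; stack them into an N-by-2 cell array
pairs = vertcat(tok{:});
disp(pairs)
```

Because the padding is consumed outside the second group, the values come back without leading or trailing spaces.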

More Answers (2)

Azzi Abdelmalek on 3 Sep 2013
Edited: Azzi Abdelmalek on 3 Sep 2013
regexp(str,'\d+(\s)?\d','match')
  2 Comments
Jane on 3 Sep 2013
Hi, Azzi,
This does not solve my problem. Here's the result:
>> a =regexp(tline,'\d+(\s)?\d','match')
a =
'194' '10' '900' '118' '782' '135764' '002' '4105473' '4319'
The key point is that the values are not always numbers. They can start with letters as well.
Jane
Azzi Abdelmalek on 3 Sep 2013
str={'OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2'}
out=regexp(str,'[\w-]+(\.)?[\w-]+','match')
celldisp(out)



Walter Roberson on 3 Sep 2013
regexp(str, '(?<=:)[^ :]+', 'match')
  3 Comments
Walter Roberson on 3 Sep 2013
Your requirement was for patterns beginning with : and ending with space. A space immediately after the : matches the termination requirement. Perhaps you have not completely specified your problem.
In your substring "OPT: ATTC_RD:3", is "OPT: ATTC_RD" considered one single label whose associated value is "3", or is "OPT" considered a label whose associated value is "" and "ATTC_RD" a label whose associated value is "3"? In the substring "VGA: 9.118 NET:", why is the value associated with "VGA" not considered to be the empty string, followed by a label that is "9.118 NET"? Are there any characters that cannot appear in labels? Are there any characters that cannot appear in associated values? Is it the case that when there is a non-numeric value such as "master2", no space is allowed between the label and the value?
Jane on 3 Sep 2013
Hi, Walter,
Thank you for your comments. I should have attached the result to explain the issue better.
>> str
str =
OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2
>> a=regexp(str, '(?<=:)[^ :]+', 'match')
a =
'3' '0' '194' '-10.900' '135764.002' 'P4105473' 'master2'
As you can see, the numbers with a space after the : were not caught. The first "OPT:" is confusing; it should be ignored, which is why I tried str(6:end) to skip it. Apart from this first label, every label is followed by a value that is either a number or a word mixed with digits, and there may be one or more spaces before the value. The value could be described by the pattern below, but I don't know how to remove the colon:
:\s*(-)*\d*\w*(\.*)\d*
Your pattern is simple. I'm still trying to understand it.
Thanks, Jane
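For readers also working through the pattern: (?<=:) is a lookbehind that requires a colon immediately before the match without including it, and [^ :]+ matches one or more characters that are neither a space nor a colon. A value padded with a space after the colon therefore never starts a match. A minimal sketch of one workaround, assuming the leading "OPT:" label can simply be discarded as described above, is to normalize the line with regexprep first and then apply the same pattern unchanged. The variable name body is illustrative.

```matlab
% Sample line from the thread as a char vector
str = 'OPT: ATTC_RD:3 ATT2R:0 VGARSH:194 ATT:-10.900 VGA: 9.118 NET: -1.782 METRIC:135764.002 SN:P4105473 CYL: 4319 HD: 0 ONB:master2';

% Drop the leading label that carries no value
body = regexprep(str, '^OPT:\s*', '');

% Remove the padding after each colon so every value directly follows its ':'
body = regexprep(body, ':\s+', ':');

% The lookbehind pattern now sees every value
vals = regexp(body, '(?<=:)[^ :]+', 'match');
disp(vals)
```

Normalizing first keeps the matching pattern short, at the cost of two extra passes over the line.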
