MATLAB Answers

How to extract before and after a character up to a certain limit?

Brett Baxter on 29 Sep 2020 at 18:37
Commented: madhan ravi on 29 Sep 2020 at 20:29
Hey everyone, I'm playing around with extractBefore and extractAfter, and I was wondering whether I could get MATLAB to extract everything before and after a character, stopping at a specified boundary character. For example:
str = 'aazbbkkcbbsszaa'
I want to take a string like this one and extract all the characters before and after "c", stopping when I reach the letter "z". So my outputs would look like:
textAfter = 'bbss'
textBefore = 'bbkk'
How can I do this?

Accepted Answer

madhan ravi on 29 Sep 2020 at 19:25
Before = regexp(str, '(?<=\z)(?:.*)(?=\c)', 'match', 'once')
After = regexp(str, '(?<=\c)(?:.*)(?=\z)', 'match', 'once')

Walter Roberson on 29 Sep 2020 at 19:46
The backslashes are not needed in the code above. They would be needed, however, if you were looking for a literal period, for example.
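To illustrate the point about a literal period, here is a small sketch using a made-up string in which '.' plays the role of the boundary character instead of 'z'. Unescaped, '.' in the lookbehind matches any character, so the match starts too early. regexptranslate('escape', ...) is one way to build the escaped form for an arbitrary delimiter.

```matlab
% Made-up input: '.' is the boundary character here instead of 'z'
s = 'aa.bbkkcbbss.aa';

% Escaped: \. matches only a literal period
escaped = regexp(s, '(?<=\.)[^c]+', 'match', 'once')     % 'bbkk'

% Unescaped: . matches any character, so the lookbehind succeeds at position 2
unescaped = regexp(s, '(?<=.)[^c]+', 'match', 'once')    % 'a.bbkk'

% regexptranslate escapes regex metacharacters in a delimiter for you
delim = regexptranslate('escape', '.');                  % '\.'
viaTranslate = regexp(s, ['(?<=' delim ')[^c]+'], 'match', 'once')
```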
Walter Roberson on 29 Sep 2020 at 19:51
Before = regexp(str, '(?<=\z)(?:.*)(?=\c)', 'match', 'once')
In that code, the .* followed by (?=\c) tells regexp to start at the current position (immediately following a z), go as far as possible towards the end of the string, and then "back up" until it is just before a c. One implication is that if there is more than one c after the z, the .* part will match everything up to the last c instead of everything up to the first c.
You can fix that by changing (?:.*) to the lazy form (?:.*?), or by using the construct from my answer, [^c]+ .
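A quick way to see the difference is a hypothetical string with two c characters after the z (a sketch, not from the original thread). The greedy version runs to the last c; the lazy version and the negated character class both stop at the first c.

```matlab
% Hypothetical input with two 'c' characters after the 'z'
s = 'zabcdec';

greedy   = regexp(s, '(?<=z)(?:.*)(?=c)',  'match', 'once')  % 'abcde' (up to the last c)
lazy     = regexp(s, '(?<=z)(?:.*?)(?=c)', 'match', 'once')  % 'ab' (up to the first c)
negClass = regexp(s, '(?<=z)[^c]+',        'match', 'once')  % 'ab' (stops at the first c)
```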

More Answers (2)

Walter Roberson on 29 Sep 2020 at 19:17
regexp(str, {'(?<=z)[^c]+', '(?<=c)[^z]+'}, 'match','once')
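A sketch of how the result might be used: because the patterns are passed as a cell array, regexp returns a 1-by-2 cell, which can be unpacked into two variables. The sprintf step and the names midChar and boundChar are illustrative additions, and they assume both delimiters are plain letters rather than regex metacharacters.

```matlab
str = 'aazbbkkcbbsszaa';
midChar   = 'c';   % hypothetical name: the character to split around
boundChar = 'z';   % hypothetical name: the boundary character
% Assumes both are plain letters; metacharacters would need escaping
pats = {sprintf('(?<=%s)[^%s]+', boundChar, midChar), ...
        sprintf('(?<=%s)[^%s]+', midChar, boundChar)};
% With a cell array of patterns, regexp returns one result per pattern
parts = regexp(str, pats, 'match', 'once');
[before, after] = parts{:}   % before = 'bbkk', after = 'bbss'
```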

Walter Roberson on 29 Sep 2020 at 19:53
If you want to allow for an empty match, where z is immediately followed by c, change [^c]+ to [^c]* . Similarly, if c might be the last character in the string and you want an empty result in that case, change [^z]+ to [^z]* .
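One MATLAB-specific detail worth checking: regexp ignores zero-length matches by default (the 'noemptymatch' behavior), so with [^c]* you would also pass the 'emptymatch' option to get the empty result. A sketch with a made-up string in which z is followed directly by c, and a later z is followed by other text:

```matlab
% Made-up input: the first z is immediately followed by c
s = 'zcbbzqq';

% With +, nothing can match between the first z and c, so regexp keeps
% scanning and returns the text after the second z instead
plusResult = regexp(s, '(?<=z)[^c]+', 'match', 'once')                 % 'qq'

% With * plus 'emptymatch', the zero-length match right after the
% first z is reported, giving an empty result
starResult = regexp(s, '(?<=z)[^c]*', 'match', 'once', 'emptymatch');
isempty(starResult)                                                    % true
```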

Image Analyst on 29 Sep 2020 at 19:42
If you want to use those specific functions, you can do it by calling each of them twice, once with c and once with z.
str = 'aazbbkkcbbsszaa'
sb = extractBefore(str, 'c')
sa = extractAfter(str, 'c')
stringBefore = extractAfter(sb, 'z')
stringAfter = extractBefore(sa, 'z')
Of course, you could combine them into fewer lines (2 instead of 4), at the cost of making the code somewhat more cryptic:
stringBefore = extractAfter(extractBefore(str, 'c'), 'z')
stringAfter = extractBefore(extractAfter(str, 'c'), 'z')
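The same nested calls also work element-wise on string arrays, so several inputs can be processed at once. This is a sketch assuming a release with string arrays and double-quoted literals (R2017a or later); the second input string is made up.

```matlab
% Column of strings; the second one is a made-up example
strs = ["aazbbkkcbbsszaa"; "zxxcyyz"];

% Each call operates on every element of the string array
stringBefore = extractAfter(extractBefore(strs, "c"), "z")   % ["bbkk"; "xx"]
stringAfter  = extractBefore(extractAfter(strs, "c"), "z")   % ["bbss"; "yy"]
```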
