Looking for an alternative to regexp.
I'm looking for an alternative way to parse through strings to find bits of information, or for a way to use regexp that doesn't give me nested cells. I'm tired of dealing with the nested cells.
I've got a string that contains node numbers and locations. I would like to capture all of the node numbers, and then put them into a double array. I can identify and extract the numbers with regexp, but any time I use regexp with tokens I end up with cells inside of cells for a reason that I don't entirely understand. Am I doing something to create the extra layer of cells, or is there another command that can parse and extract the information I want?
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(\d+)','tokens');
The nodes variable will contain a 1x5 cell array, where each cell contains a 1x1 cell, which in turn contains the node number as a character vector.
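A quick way to see the two layers is to inspect the output level by level. This is only a sketch: the shortened input string below is a made-up stand-in for the original one, chosen so the example stays readable.

```matlab
% Shortened stand-in for the original string (an assumption for illustration)
singlestring  = 'Nearestnodeis7736664atadistanceof4.68e-03';
repeatstrings = repmat(singlestring, 1, 5);
nodes = regexp(repeatstrings, 'Nearestnodeis(\d+)', 'tokens');

class(nodes)      % outer layer: a cell array with one cell per match
size(nodes)
class(nodes{1})   % inner layer: a cell array with one cell per token
size(nodes{1})
nodes{1}{1}       % the captured digits, as a character vector
```

The outer layer is indexed by match and the inner layer by token, which is why a single capture group still produces a cell inside a cell.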
2 Comments
Stephen23
on 24 Mar 2021
Edited: Stephen23
on 25 Mar 2021
Tokens are always returned in a cell array whose size equals the number of tokens in the expression (scalar in your case, because you specified only one token). When multiple matches are allowed (the default), each output is additionally wrapped in a cell array whose size equals the number of matches, so you get nested cell arrays of tokens.
FYI, if you only need to match the regular expression exactly once, then you can specify the 'once' option and the outputs are not nested in cell arrays. This does not apply to your example, but is useful in other cases.
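As a rough illustration of the 'once' option, the sketch below runs it on a single occurrence. The shortened string is a made-up stand-in, not the original data.

```matlab
% Shortened stand-in containing exactly one occurrence (an assumption)
str = 'Nearestnodeis7736664atadistanceof4.68e-03';

% With 'once', the tokens come back as one cell array (one cell per token),
% without the extra per-match layer.
tok = regexp(str, 'Nearestnodeis(\d+)', 'tokens', 'once');

% With 'once', the match comes back directly as a character vector.
mat = regexp(str, '(?<=Nearestnodeis)\d+', 'match', 'once');

n1 = str2double(tok{1});
n2 = str2double(mat);
```

Note that 'once' stops at the first match, so on the repeated string from the question it would return only the first node number.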
As well as concatenating the output data or using named tokens as the answers below show, you can also use a look-behind assertion and return the matched string (no nested cell arrays), which makes post-processing much simpler:
nodes = regexp(repeatstrings,'(?<=Nearestnodeis)\d+','match')
vec = str2double(nodes)
Answers (2)
Star Strider
on 23 Mar 2021
See if adding either:
Out = cell2mat([nodes{:}].')
or:
Out = str2num(cell2mat([nodes{:}].'))
to the posted code provides the desired result.
Note that str2num is not generally recommended because it relies on eval. However, it works in cases where str2double produces an unacceptable result.
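Another possible variation, offered only as a sketch, is to skip cell2mat and pass the flattened cell array straight to str2double, which accepts a cell array of character vectors. The input string here is invented so that the node numbers have different lengths, a case where stacking them into a char matrix with cell2mat would fail.

```matlab
% Made-up input with node numbers of different lengths (an assumption)
str   = 'Nearestnodeis12atadistanceNearestnodeis7736664atadistance';
nodes = regexp(str, 'Nearestnodeis(\d+)', 'tokens');

flat = [nodes{:}];        % removes the inner layer: cell array of char vectors
Out  = str2double(flat);  % converts every element at once to a double array
```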
0 Comments
Walter Roberson
on 23 Mar 2021
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(?<NN>\d+)','names');
str2double({nodes.NN})
3 Comments
Walter Roberson
on 23 Mar 2021
(?<WORD>PATTERN)
creates a named token; whatever is matched by PATTERN gets stored as text in a struct field named WORD. Even though it is called a "named token", oddly enough, to get back the struct you have to ask for "names" instead of "tokens".
You get back a struct array with one element for each match of the overall pattern, in this case one for each time Nearestnodeis is followed by a sequence of digits. So here it is a 5-element struct array, each element having the field named as indicated, NN. As usual with struct arrays, you can pull out all of the entries by expanding the field into a comma-separated list inside {}, which creates a cell array of character vectors, and then you can convert them all at once by calling str2double() on the cell array.
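The expansion step can be broken out into separate statements to make each stage visible. This is a minimal sketch using a shortened, made-up input string repeated three times.

```matlab
% Shortened stand-in for the original string, repeated 3 times (an assumption)
str   = repmat('Nearestnodeis7736664atadistanceof4.68e-03', 1, 3);
nodes = regexp(str, 'Nearestnodeis(?<NN>\d+)', 'names');

n = numel(nodes);     % one struct element per match
c = {nodes.NN};       % comma-separated list collected into a cell array
v = str2double(c);    % numeric row vector of node numbers
```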