Regexp help for separating numbers, matrices, strings

Hi all,
I have been trying to separate strings made up of arbitrary fields, including numbers, matrices, and further strings. The fields form comma-separated lists, and the lists are separated from each other by semicolons.
samplestr = '{ ''auto'', 1 ,1; 1, ''load(''var.mat'')'', ''rand(nodes,1)'' }'
Some of my strings also contain nested strings, such as the argument of the load call.
I had regular expressions that originally worked to separate the comma-separated lists from each other:
%Identifies arrays and matrices
matexp = '\[.*?(?=\])\]';
%Identifies strings, including strings in parentheses within strings (i.e. 'load('var.mat')' is found as one string)
strexp = '''.*?(?(?='')(?<!\()))''';
%Identifies numbers
numexp = '(?:+|-)?\d*\.?\d*';
%numexp = '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?';
%Identifies rows containing a comma-separated list of (matexp|strexp|numexp),
%each row ending with ';' or the end of the text
rowexp = ['((\s*(?:(?:' matexp ')|(?:' strexp ')|(?:' numexp '))\s*,?)*(?=(?:;|$))'];
rowgroups = regexp(samplestr,rowexp,'match')
rowgroups = ' 'auto', 1 ,1' ' 1, 'load('var.mat')', 'rand(nodes,1)' '
This separates the rows correctly as long as none of the numbers includes an exponent (it finds '1' but not '1e-6'). The commented-out numexp matches all numbers, including those with exponents, but with it the rows are no longer separated at the ';'.
Does anyone know why this happens and how to fix it? I have spent a lot of time debugging this but am not sure why it is happening. If anyone has a better way of doing it, please let me know.
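The difference between the two number patterns can be checked in isolation, away from rowexp. The following is only a minimal sketch: the test string and the variable names numexpExp, nums, and simple are made up for illustration, and the simple pattern is written without the sign prefix used in the question.

```matlab
% Exponent-aware pattern (the commented-out numexp from the question).
numexpExp = '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?';

% Hypothetical test string with plain, decimal, and exponent numbers.
nums = regexp('1e-6, 2.5, -3E4', numexpExp, 'match');

% Simple pattern (sign prefix omitted here): stops before the exponent.
simple = regexp('1e-6', '\d*\.?\d*', 'match', 'once');

disp(nums)     % cell array holding '1e-6', '2.5', '-3E4'
disp(simple)   % 1
```

On its own, the exponent-aware pattern picks up each number in full, which suggests the row-splitting problem comes from how numexp interacts with the rest of rowexp and not from the number pattern itself.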
  3 Comments
Joseph on 14 Oct 2011
I am trying to parse entries from a batch file. All of the input parameters are in a text file, and the user may enter free-form cells (i.e. without needing matching indices, and with scalar expansion of parameters to reach the required number of parameters). I have already separated the var/val pairs and removed the outer '{}'. All other parts of my code work correctly except for identifying numbers with an exponent: if I change to the more complicated regexp to catch those numbers, it no longer separates the rowgroups. I subsequently use named tokens to capture the numbers, strings, and matrices of each rowgroup into element groups.
elementexp = ['(?<mat>' matexp ')|(?<string>''.*?(?='')(?<!\()'')|(?<number>' numexp ')'];
rowelements = regexp(rowgroups{1},elementexp,'names');
Ultimately, I should be able to build this:
Initial text:
example = {4,5,'auto';6;7}
Final output:
example = {4, 5, 'auto'; 6, 6, 6; 7, 7, 7}
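The scalar expansion step in this example could be sketched as follows. This assumes the rows have already been parsed into a cell array of row cells; the names rows, ncols, and expanded are hypothetical and not taken from my actual code.

```matlab
% Hypothetical result of parsing '{4,5,'auto';6;7}' into one cell per row.
rows = {{4, 5, 'auto'}, {6}, {7}};

% The widest row determines the number of columns.
ncols = max(cellfun(@numel, rows));

expanded = cell(numel(rows), ncols);
for k = 1:numel(rows)
    r = rows{k};
    if isscalar(r)
        % Scalar expansion: repeat the single entry across all columns.
        r = repmat(r, 1, ncols);
    end
    % Errors if a row is neither scalar nor ncols long, which is intended.
    expanded(k, :) = r;
end

disp(expanded)
```

A row whose length is neither 1 nor ncols makes the assignment fail, which is a simple way of rejecting input that cannot be expanded.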
Joseph on 14 Oct 2011
I have found a way to make it separate the samplestr properly, by changing
numexp = '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?'
to
numexp = '[-+]?[0-9]*\.?[0-9]*(?:[eE][-+]?[0-9]+)?'
I don't know why this works, but it does. The '+' should (?) work fine there.
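A possible lead, which is a reading of the pattern text and has not been confirmed against MATLAB's parser: the original strexp appears to contain one more ')' than '('. The sketch below counts the unescaped parentheses; the names nOpen and nClose are made up for illustration.

```matlab
% Original strexp from the question.
strexp = '''.*?(?(?='')(?<!\()))''';

% Count parentheses that are not preceded by a backslash.
nOpen  = numel(regexp(strexp, '(?<!\\)\('));
nClose = numel(regexp(strexp, '(?<!\\)\)'));

fprintf('open: %d, close: %d\n', nOpen, nClose);   % open: 3, close: 4
```

If the conditional is parsed as (?(cond)expr), the surplus ')' would close the '(?:' that rowexp opens around strexp, and the ')' that rowexp intended for that purpose would instead close the group wrapping all three alternatives. Under that reading, numexp becomes an alternative that is not preceded by the leading \s*, so a number that follows a space (as in ' 1 ,1') can only be reached when numexp is allowed to match the empty string and the trailing \s* consumes the space. That would be consistent with '[0-9]*' working where '[0-9]+' does not.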
But when I use elementexp (see the comment above), it splits 'load('var.mat')' into
'load('var.mat' ', '
My scheme is to find the first ''', lazily match all characters until a lookahead finds the next ''', and then check that the character before that quote is not '('. The scheme breaks down at the third ''': the previous character there is not '(' either, so the match ends early. I am unsure what an elegant way of capturing the 'load('var.mat')' string would be. The MATLAB documentation could use a little more detail on the precedence ordering of the lookaround operators in regexp: for example, once you have tested ahead for ''' and then back for '(', how can you also check ahead of the ''' for ')'?
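One possible alternative, sketched here under the assumption that a closing quote is always followed (after optional whitespace) by ',', ';', or the end of the text: let the lookahead test what comes after the closing quote instead of testing the character before it. The names strexp2 and row are hypothetical.

```matlab
% Assumption: a string ends at the first quote that is followed by
% optional whitespace and then ',', ';', or the end of the text.
strexp2 = '''.*?''(?=\s*(?:[,;]|$))';

% Second row of samplestr, with the quotes doubled for a MATLAB literal.
row = ' 1, ''load(''var.mat'')'', ''rand(nodes,1)'' ';

strs = regexp(row, strexp2, 'match');
disp(strs)   % cell array holding 'load('var.mat')' and 'rand(nodes,1)'
```

This keeps 'load('var.mat')' in one piece because neither inner quote is followed by a delimiter. It would still split a string that contains a quote directly followed by a comma, such as a nested call with two quoted arguments, so it is not a general parser.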

Answers (0)
