Extracting numbers from strings

I am trying to read my experimental data from Excel files. The content of one column decides whether a number should be read from another column; this has to be done over multiple rows and multiple files, and the result should be a double array. (I will omit the part of the code that loads the files.)
The code below works well for numerical values:
NegativeView = [];
for nl = 1:size(logs, 1)
    % keep rows whose column 5 starts with 'negative_view_start'
    if strfind(logs{nl, 5}, 'negative_view_start') == 1
        NegativeView(end+1) = str2num(logs{nl, 6});
    end
end
That is, if the content of column 6 is a number, I get what I want.
However, for another variable the column to be read holds a mixed string such as output_33 or output_66, and I'd like to get a double containing just the numerical value 33 or 66.
I tried using the regexprep function to transform output_33 into 33, etc., with no success. Help!
Here is an example of what I tried:
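A self-contained version of the working loop, run on a small hypothetical `logs` cell array (the sample rows are invented for illustration, not the asker's data). As a sketch, it swaps in `strncmp` as an explicit "starts with" test and `str2double`, which returns NaN instead of raising an error on non-numeric text:

```matlab
% Hypothetical sample rows standing in for the data read from Excel:
% column 5 holds the event label, column 6 the number stored as text
logs = {'', '', '', '', 'negative_view_start', '1.5'; ...
        '', '', '', '', 'positive_view_start', '7'; ...
        '', '', '', '', 'negative_view_start', '2'};

prefix = 'negative_view_start';
NegativeView = [];
for nl = 1:size(logs, 1)
    % strncmp is true only when column 5 starts with the prefix
    if strncmp(logs{nl, 5}, prefix, numel(prefix))
        % str2double returns NaN (instead of an error) for non-numeric text
        NegativeView(end+1) = str2double(logs{nl, 6});
    end
end
disp(NegativeView)
```

One difference from `strfind(...) == 1`: if the label appeared twice in the same cell, `strfind` would return several indices, the comparison would give a vector like `[1 0]`, and the `if` test would be false.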
rate = {}
output = []
for nl = 1:length(logs(:,1))
    if strcmp(logs{nl,4}, 'output_')
        rate(end+1) = (regexprep(logs{nl,5}, 'output_', ''))
        output(end+1) = str2num(rate)
    end
end
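The attempt above fails for a few reasons: `strcmp` is true only for an exact match, so a cell containing more than `'output_'` never passes; assigning a character vector with `rate(end+1) = ...` to a cell array raises an error (braces are needed); and `str2num` expects one character vector, not the whole cell array. A minimal corrected sketch that keeps the `regexprep` idea, assuming (as in the attempt) that column 4 holds a marker starting with `'output_'` and column 5 holds text like `'output_33'`; the sample rows are hypothetical:

```matlab
% Hypothetical sample data: column 4 is the marker, column 5 the value text
logs = {'', '', '', 'output_', 'output_33'; ...
        '', '', '', 'input_',  'input_12'; ...
        '', '', '', 'output_', 'output_66'};

rate = {};      % cell array of the stripped strings
output = [];    % double array of the extracted numbers
for nl = 1:size(logs, 1)
    % strncmp compares only the first 7 characters, i.e. "starts with 'output_'"
    if strncmp(logs{nl, 4}, 'output_', 7)
        rate{end+1} = regexprep(logs{nl, 5}, 'output_', '');  % braces store a char in a cell
        output(end+1) = str2double(rate{end});                % convert just this string
    end
end
disp(output)
```

If column 4 holds a different marker (the follow-up comment mentions `'outcome_'`), adjust the prefix and its length in the `strncmp` call.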

Answers (2)

If your strings are always of the form
someString_someNumber
then you can use something simpler, such as
splitStr = strsplit( str, '_' );
n = str2double( splitStr{2} )
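One way to put the `strsplit` suggestion inside the loop from the question. This is a sketch: the sample `logs` rows are hypothetical, and it uses `splitStr{end}` (the last piece) rather than `splitStr{2}`, so a label containing extra underscores, such as `'my_output_33'`, still yields the trailing number:

```matlab
% Hypothetical sample data: column 4 is the marker, column 5 the value text
logs = {'', '', '', 'output_', 'output_33'; ...
        '', '', '', 'output_', 'output_66'};

output = [];
for nl = 1:size(logs, 1)
    if strncmp(logs{nl, 4}, 'output_', 7)
        splitStr = strsplit(logs{nl, 5}, '_');      % e.g. {'output', '33'}
        output(end+1) = str2double(splitStr{end});  % last piece is the number
    end
end
```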

1 Comment

I still don't know how to include either strsplit or the regexprep function in my code so that it does what I want. I need MATLAB to extract the number from the someString_someNumber form in column 5 each time that column 4 in the same row contains the string 'outcome_', and I need the output to be a double array.
Using regexprep seems roundabout. Why not use regexp to extract what you need rather than replacing what you don't need?
One possible regex:
output(end+1) = str2double(regexp(logs{nl, 5}, '(?<=output_).*', 'match', 'once'))
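The one-liner above is meant to replace the two lines inside the `if` block of the question's loop. A sketch of the whole loop, followed by a loop-free variant, on hypothetical sample data; the second version relies on `regexp` and `str2double` both accepting cell arrays of character vectors, and assumes column 4 starts with `'output_'` on the rows to keep:

```matlab
% Hypothetical sample data: column 4 is the marker, column 5 the value text
logs = {'', '', '', 'output_', 'output_33'; ...
        '', '', '', 'input_',  'input_12'; ...
        '', '', '', 'output_', 'output_66'};

% Loop version: one regexp call per matching row
output = [];
for nl = 1:size(logs, 1)
    if strncmp(logs{nl, 4}, 'output_', 7)
        output(end+1) = str2double(regexp(logs{nl, 5}, '(?<=output_).*', 'match', 'once'));
    end
end

% Loop-free version: select the rows first, then convert them all at once
rows = strncmp(logs(:, 4), 'output_', 7);                            % logical column vector
matches = regexp(logs(rows, 5), '(?<=output_).*', 'match', 'once');  % cell array of char
outputVec = str2double(matches).';                                   % row vector of doubles
```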

2 Comments

Thank you. Could you explain what these stand for: '(?<=output_).*', 'match', 'once'?
The regular expression language is documented in detail in MATLAB's documentation and, if that's not enough, there are plenty of tutorials on the web.
(?<=...) is a lookbehind. It means that the match must be preceded by the expression inside the lookbehind, in this case output_, but that expression is not itself part of the match.
. matches any single character. * is a quantifier that means match 0 or more of the preceding element. Actually, I should have used + (1 or more).
So the regular expression matches a sequence of 0 or more characters of any kind immediately following output_. There are many other ways you could write the expression, depending on what you want to accept or reject. E.g.:
regexp(logs{nl, 5}, '\d+', 'match', 'once')
may also work for you if you're only looking for integers (it simply extracts the first sequence of numeric digits).
As per the documentation of regexp, 'match' tells it to return the matched text (by default it just returns the start position), and 'once' tells it to return only the first match, as a character vector rather than a cell array. It's not strictly necessary in your case.
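A quick comparison of the patterns discussed above. The second string, `'output2_33'`, is a hypothetical label chosen to show where `\d+` can pick up the wrong digits, and the `(?<=_)\d+` pattern is just one illustrative alternative:

```matlab
s = 'output_33';
a = regexp(s, 'output_.*', 'match', 'once');       % 'output_33': the prefix is part of the match
b = regexp(s, '(?<=output_).*', 'match', 'once');  % '33': the lookbehind checks the prefix but excludes it
c = regexp(s, '\d+', 'match', 'once');             % '33': first run of digits

t = 'output2_33';
d = regexp(t, '\d+', 'match', 'once');             % '2': first run of digits, not the value
e = regexp(t, '(?<=_)\d+', 'match', 'once');       % '33': digits that follow an underscore
```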
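A small sketch of what the `'match'` and `'once'` options change in the return value of `regexp`, on a sample string:

```matlab
s = 'output_33';
startIdx = regexp(s, '\d+');                  % 8: by default regexp returns the start index
asCell   = regexp(s, '\d+', 'match');         % {'33'}: all matches, in a cell array
asChar   = regexp(s, '\d+', 'match', 'once'); % '33': first match only, as a char vector
n1 = str2double(asCell);                      % 33: str2double also accepts a cell array
n2 = str2double(asChar);                      % 33
```

Because `str2double` accepts either form, `'once'` does not change the number you get here; it only changes whether the intermediate result is a cell or a character vector.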

This question is closed.

Asked: 15 Dec 2016
Closed: 20 Aug 2021