Find repeated expression in array of strings, return logical.

Question

0 votes

I have data of the type

looking_for = ["apple", "melon"]

in

my_data = ["The apple is red", "The bee was yellow", "I am eating a melon", "The melon is sweet"]

with

timing = [2.5, 5, 10, 18]

I want to find where one of the expressions in looking_for occurs in consecutive strings, and then return a logical index that pertains to the first observation of the repetition.

My approach:

1) Find out if the string contains one of the regular expressions in looking_for, e.g. melon. I solve this using

idx = cellfun(@(x)( ~isempty(x) ), regexp(my_data, "apple"));

2) Then I transpose and multiply my index with the timing to get the relevant timings, and remove the zeros (not shown here)

apple_timing = transpose(idx).*timing;

After removing the zeros, this gives me a variable called apple_timing with a value of 2.5, which is exactly what I want.

I would like a bit of code that returns a variable called repeat_timing. In the case of the melon, this would return 18 - the first observed consecutive repeat of the regular expression melon.


Jos (10584) on 22 Dec 2017

Huh, I don't see apple being repeated in your strings?

And why do you use cellfun and regexp rather than the dedicated function contains, which returns a logical array directly?

contains(my_data, looking_for) % → [1 0 1 1]
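
Note that passing all patterns at once, as above, only reports whether any of them matches. A minimal sketch of keeping one logical row per pattern instead, assuming R2016b or later (where contains and string arrays are available); the variable name BM is only illustrative:

```matlab
% Sample data from the question, stored as string arrays.
looking_for = ["apple", "melon"];
my_data = ["The apple is red", "The bee was yellow", ...
           "I am eating a melon", "The melon is sweet"];

% One row per pattern, one column per string.
BM = false(numel(looking_for), numel(my_data));
for k = 1:numel(looking_for)
    BM(k,:) = contains(my_data, looking_for(k));
end
disp(BM)
```

Row 1 then marks the strings containing apple and row 2 those containing melon, which is the layout needed to look for repeats per pattern.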


Answer 1

Stephen23 on 22 Dec 2017

Edited: Stephen23 on 22 Dec 2017

1 vote

Here is one solution based around cumsum:

% Data:
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};
TV = [2.5, 5, 10, 18];
% Locate patterns:
fun = @(p)~cellfun('isempty',strfind(MD,p));
BM = cell2mat(cellfun(fun,LF(:),'uni',0));
CS = cumsum(BM,2);
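
For the sample data, the intermediate matrices look as follows. This is only an illustration, reusing the names from the code above, of why combining the running count with BM picks out the n-th match of each pattern; the variable second is introduced here purely for the example.

```matlab
% Same data as above.
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};

% BM(i,j) is true when pattern i occurs in string j.
fun = @(p) ~cellfun('isempty', strfind(MD, p));
BM = cell2mat(cellfun(fun, LF(:), 'UniformOutput', false));
% BM = [1 0 0 0
%       0 0 1 1]

% Running count of matches along each row.
CS = cumsum(BM, 2);
% CS = [1 1 1 1
%       0 0 1 2]

% The count stays at n after the n-th match, so the AND with BM keeps
% only the column where the n-th match actually happens.
second = CS == 2 & BM;
% second = [0 0 0 0
%           0 0 0 1]
```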

You can use this to identify the first, second, third, etc. times that a pattern occurs, and find the related timing value:

>> [R1,C1] = find(CS==1 & BM); % First occurrence.
>> LF{R1}
ans = apple
ans = melon
>> TV(C1)
ans =
    2.5000   10.0000
>> [R2,C2] = find(CS==2 & BM); % Second occurrence.
>> LF{R2}
ans = melon
>> TV(C2)
ans =  18

You can easily automate this for an arbitrary number of matches. Here I locate the first, second, and third occurrences (your sample data contains no third occurrence of either pattern):

baz = @(n)find(CS==n & BM);
[row,col] = arrayfun(baz,1:3,'uni',0);
typ = cellfun(@(r)LF(r),row,'uni',0);
val = cellfun(@(c)TV(c),col,'uni',0);

giving:

>> typ{:}
ans =
  'apple'
  'melon'
ans =
  'melon'
ans = {}
>> val{:}
ans =
    2.5000   10.0000
ans =  18
ans = []
>>


Tobias on 27 Dec 2017

Edited: Tobias on 29 Dec 2017

Hi Stephen, and thanks for the answer.

However, your code does not seem to address the constraint of finding consecutive repeats. E.g.:

{'The apple is good', 'The apple is red', 'The bee has stripes'}

should lead to one consecutively repeated instance, while

{'The apple is good', 'The bee has stripes', 'The apple is red'}

should lead to none.

Stephen23 on 3 Jan 2018

Edited: Stephen23 on 3 Jan 2018

Ah, if you only want to identify matches in adjacent cells then you do not need cumsum. A simple logical AND will do the trick:

>> CS = BM & circshift(BM,1,2);
>> CS(:,1) = false;
>> [R1,C1] = find(CS)
R1 =  2
C1 =  4
>> LF{R1}
ans = melon
>> TV(C1)
ans =  18
>>
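
The same adjacency test can also be written without circshift, by comparing each column with its left neighbour. The sketch below is one possible way to do that, not the method from the answer: adjacentRepeats and match are hypothetical helper names (neither is a MATLAB built-in), and the two extra inputs are the examples from the comment above.

```matlab
% Hypothetical helper: mark column j when a pattern matches both
% string j-1 and string j. The first column can never be a repeat.
adjacentRepeats = @(BM) [false(size(BM,1),1), BM(:,1:end-1) & BM(:,2:end)];

% Hypothetical helper: pattern-by-string match matrix, built the same
% way as BM in the answer.
match = @(MD,LF) cell2mat(cellfun(@(p) ~cellfun('isempty', strfind(MD,p)), ...
                                  LF(:), 'UniformOutput', false));

LF = {'apple', 'melon'};

% Adjacent repeat: the second string repeats 'apple'.
A = adjacentRepeats(match({'The apple is good','The apple is red','The bee has stripes'}, LF));

% Same strings, but the bee separates the apples: no repeat.
B = adjacentRepeats(match({'The apple is good','The bee has stripes','The apple is red'}, LF));

% Original data: 'melon' repeats at the fourth string.
TV = [2.5, 5, 10, 18];
C = adjacentRepeats(match({'The apple is red','The bee was yellow', ...
                           'I am eating a melon','The melon is sweet'}, LF));
[~, col] = find(C);
repeat_timing = TV(col);
```

Note that a run of three or more consecutive matches marks every string after the first one in the run; if only the first repeat of each run is wanted, an extra step is needed to drop the later ones.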
