Search/Find substrings in a string

Philipp Mueller on 9 Feb 2021
Edited: Stephen23 on 12 Feb 2021
I have two cell arrays:
  1. The first is a 1x279 cell array containing component abbreviations such as Ra, Bt, Rag, Rg, Vm, SzF and so on.
  2. The second is a 1x439 cell array containing strings like 'F.y_1T.Rg1_N7' or 'F.y_1T.Ra1.SzF2_N590'.
The component abbreviations from the first cell array appear in the strings of the second cell array as substrings.
  1. First example: 'F.y_1T.Rg1_N7' from the second cell array contains Rg, a component abbreviation from the first cell array. (The rest of the string is something else and not important.)
  2. Second example: 'F.y_1T.Ra1.SzF2_N590' from the second cell array contains Ra and SzF from the first cell array.
My goal is to find out which of the 279 component abbreviations occur in the second array. I have tried a lot, but without a working solution. My MATLAB skills are rusty because I do not need them very often.
Here is a small part of my code (not working, and not the whole code):
for t = 1:size % 279
    for h = 1:numCols1 % 439
        locigal(t,h) = strcmp(Messstelle{1,h}, abbrev_components{1,t})
        %locigal_1{t,h} = strcmp(Messstelle{1,h}, abbrev_components{1,t})
        %locigal{t,h} = strmatch(Messstelle{1,h},abbrev_components{1,t})
    end
end
What is the difference between 'F.y_1T.Rg1_N7' and "F.y_1T.Rg1_N7"? Both are stored in cell arrays. How can I convert one into the other?
Thank you
Philipp Mueller on 9 Feb 2021
Thank you for your answer. Rag should match only Rag; reporting Ra for it is wrong in this case. I want to prevent Ra from matching inside Rag.


Answers (2)

Stephen23 on 9 Feb 2021
Edited: Stephen23 on 9 Feb 2021
This matches abbreviations followed by one digit (thus avoiding the Ra/Rag matching problem):
C = {'Ra','Bt','Rag','Rg','Vm','SzF'};
D = {'F.y_1T.Rg1_N7','F.y_1T.Rag1.SzF2_N590'}; % Ra changed to Rag !
rgx = sprintf('|%s',C{:});
rgx = sprintf('(%s)%s',rgx(2:end),'(?=\d)');
tmp = regexp(D,rgx,'match');
tmp{:}
ans = 1x1 cell array
    {'Rg'}
ans = 1x2 cell array
    {'Rag'}    {'SzF'}
If other characters can trail the abbreviations, then adapt the lookahead assertion as required.
After this you can simply do ismember on each cell of tmp:
boo = cellfun(@(c)ismember(C,c),tmp,'uni',0);
boo = vertcat(boo{:})
boo = 2x6 logical array
   0   0   0   1   0   0
   0   0   1   0   0   1
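One possible adaptation of the lookahead, as a rough sketch only: if an abbreviation may be followed by any non-letter character rather than just a digit, a negative lookahead could be used instead. The second test string below is made up for illustration (its '_x' part does not come from the question), and the sketch assumes the abbreviations contain no regular-expression metacharacters.

```matlab
% Sketch: accept an abbreviation unless it is directly followed by a letter.
C = {'Ra','Bt','Rag','Rg','Vm','SzF'};
D = {'F.y_1T.Rg1_N7','F.y_1T.Rag_x.SzF2_N590'};  % second string is hypothetical

alt = strjoin(C,'|');                     % 'Ra|Bt|Rag|Rg|Vm|SzF'
rgx = sprintf('(%s)(?![A-Za-z])', alt);   % negative lookahead: no letter may follow
tmp = regexp(D, rgx, 'match');            % one cell of matches per string in D
```

With these inputs, 'Rag_x' still yields Rag rather than Ra, because the Ra alternative is rejected when the letter g follows it.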
Stephen23 on 12 Feb 2021
Edited: Stephen23 on 12 Feb 2021
"I just want to have a simple array where are no duplicate entries."
Each row of array boo contains true to indicate if an abbreviation occurs one or more times in the corresponding string. It does not contain duplicate true values for any one abbreviation.
Please give an example string with duplicate values, and also the expected output.
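If the aim is just one list of the abbreviations that occur anywhere, regardless of which string they came from, the rows of boo could be collapsed with any. A minimal sketch, reusing C and D from the answer; the names found and notFound are only illustrative:

```matlab
% Sketch: reduce the per-string logical matrix to one list of abbreviations.
C = {'Ra','Bt','Rag','Rg','Vm','SzF'};
D = {'F.y_1T.Rg1_N7','F.y_1T.Rag1.SzF2_N590'};

rgx = ['(' strjoin(C,'|') ')(?=\d)'];     % abbreviation followed by a digit
tmp = regexp(D, rgx, 'match');
boo = cellfun(@(c) ismember(C,c), tmp, 'UniformOutput', false);
boo = vertcat(boo{:});                    % rows: strings, columns: abbreviations

found    = C(any(boo,1));                 % abbreviations seen in at least one string
notFound = C(~any(boo,1));                % abbreviations that never occur
```

Each abbreviation appears at most once in found, however many strings contain it.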


Jan on 9 Feb 2021
Edited: Jan on 9 Feb 2021
KeyList = {'Ra', 'Bt', 'Rag', 'Rg', 'Vm', 'SzF'};
StringList = {'F.y_1T.Rg1_N7', 'F.y_1T.Ra1.SzF2_N590', ...
nKey = numel(KeyList);
nString = numel(StringList);
L = false(nString, nKey);
for t = 1:nKey
    % Mask other keys that contain the current key (e.g. 'Rag' when searching 'Ra'):
    exclude = find(contains(KeyList, KeyList{t}));
    exclude(exclude == t) = [];
    S = StringList;
    for k = exclude
        S = strrep(S, KeyList{k}, '*');
    end
    L(:, t) = contains(S, KeyList{t});
    % Matlab < R2016b:
    % L(:, t) = ~strcmp(S, strrep(S, KeyList{t}, ''));
end
Now "Rag" is masked whenever "Ra" is searched for.
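For reference, a self-contained sketch of this masking loop on a tiny input. The third string is a made-up example, the variable hits is introduced here only for illustration, and the sketch assumes R2016b or later (for contains) and that '*' never occurs inside a key.

```matlab
% Sketch of the masking approach on a small, partly hypothetical input.
KeyList    = {'Ra', 'Bt', 'Rag', 'Rg', 'Vm', 'SzF'};
StringList = {'F.y_1T.Rg1_N7', 'F.y_1T.Ra1.SzF2_N590', ...
              'F.y_1T.Rag1_N3'};          % third entry is hypothetical
nKey    = numel(KeyList);
nString = numel(StringList);
L = false(nString, nKey);                 % rows: strings, columns: keys
for t = 1:nKey
    % Longer keys that contain the current key must be hidden first
    exclude = find(contains(KeyList, KeyList{t}));
    exclude(exclude == t) = [];
    S = StringList;
    for k = exclude
        S = strrep(S, KeyList{k}, '*');   % assumes '*' is not part of any key
    end
    L(:, t) = contains(S, KeyList{t});
end

% Keys found in each string, as one cell per string
hits = arrayfun(@(s) KeyList(L(s,:)), 1:nString, 'UniformOutput', false);
```

Here hits{2} holds Ra and SzF, while the third string reports only Rag because Rag was masked during the search for Ra.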

