Replacing only certain instances of text within a MATLAB character array

Method One: split at the end of the first match

>> str = '___6002__6002___6002___6002__';
>> idx = regexp(str,'6002','once','end');
>> strcat(str(1:idx),strrep(str(idx+1:end),'6002','0'))
ans =
___6002__0___0___0__
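
The same indexing idea can be extended to keep the first n matches instead of only the first. The following is a minimal sketch, not part of the original answer: the variable names n, ends, cut and out are hypothetical. It uses bracket concatenation instead of strcat, because strcat strips trailing whitespace from character inputs, which could matter if the untouched head of the string ends in a space.

```matlab
str = '___6002__6002___6002___6002__';
n = 2;                                % hypothetical: number of leading matches to keep
ends = regexp(str,'6002','end');      % end index of every match
if numel(ends) >= n
    cut = ends(n);                    % last character that stays untouched
    out = [str(1:cut), strrep(str(cut+1:end),'6002','0')];
else
    out = str;                        % fewer than n matches: nothing to replace
end
disp(out)
```

With n = 1 this gives the same result as the two-line version above.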

Method Two: use a placeholder

>> str = '___6002__6002___6002___6002__';
>> str = regexprep(str,'6002','\b','once');
>> str = strrep(str,'6002','0');
>> regexprep(str,'\b','6002')
ans =
___6002__0___0___0__

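Note that the original string must not already contain the placeholder: in MATLAB regular expressions \b stands for the backspace character, and any backspace already present in the string would be turned into 6002 by the last step.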
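
A guarded variant of the placeholder approach might look like the sketch below. It assumes the placeholder is the backspace character, char(8), and checks for it before doing anything else; the names ph, tmp and out are hypothetical, and the error message text is only illustrative.

```matlab
str = '___6002__6002___6002___6002__';
ph  = char(8);                              % backspace, the character '\b' denotes
% Stop early if the placeholder is already part of the input.
assert(~any(str == ph), 'Placeholder already occurs in the input.');
tmp = regexprep(str,'6002',ph,'once');      % protect the first match
tmp = strrep(tmp,'6002','0');               % replace all remaining matches
out = strrep(tmp,ph,'6002');                % restore the protected match
disp(out)
```

Any other character that is known not to occur in the data would serve equally well as ph.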

Method Three: dynamic regular expression

>> str = '___6002__6002___6002___6002__';
>> regexprep(str,'(.*?6002)(.*)','$1${strrep($2,''6002'',''0'')}')
ans =
___6002__0___0___0__
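
The dynamic expression is compact but hard to read: the lazy token (.*?6002) captures everything up to and including the first match, (.*) captures the rest, and ${...} runs strrep on that second token. As a sketch of the same logic spelled out in separate steps (the names tok and out are hypothetical):

```matlab
str = '___6002__6002___6002___6002__';
% Token 1: text up to and including the first '6002'; token 2: the remainder.
tok = regexp(str,'(.*?6002)(.*)','tokens','once');
if isempty(tok)
    out = str;                                   % no match: leave the string alone
else
    out = [tok{1}, strrep(tok{2},'6002','0')];   % replace only in the remainder
end
disp(out)
```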


John Leal on 16 Oct 2017

I have a similar problem. I need to replace some words with others in a very large array. My code works, but it is too slow. Can you help me find a way to make it faster?

textData = regexprep(textData, '[@$/#.-:-&*+=[]?!(){},''">_<;%]|', ' ');
% Remove any non-alphanumeric characters
textData = regexprep(textData, '[^a-zA-Zñ ]', '');
textData = regexprep(textData, '[0-9]+', ' ');
textData = regexprep(textData, '<[^<>]+>', ' ');
textData = regexprep(textData, 'á', 'a');
textData = regexprep(textData, 'é', 'e');
textData = regexprep(textData, 'í', 'i');
textData = regexprep(textData, 'ó', 'o');
textData = regexprep(textData, 'ú', 'u');
textData = regexprep(textData, 'ñ', 'n');
textData = regexprep(textData, 'x', 's');
textData = regexprep(textData, 'cc', 'c');
textData = regexprep(textData, 'ci', 'si');
% deletedWords = ["helllo","hello";"moter","mother"] ... 50000 rows
% excludedWords = ["father","three", "tree"] ... words I don't want to replace
% textData = ["my mother lives with my father";"hello Word"] ... 2 million rows
m = length(deletedWords(:,1));
for idx = 1:m
    w_new = deletedWords{idx,1};
    w_ok = deletedWords{idx,2};
    f = find(excludedWords==w_new, 1);
    % Only if it is not in excludedWords
    if isempty(f)
        % Replace exact whole-word matches only
        textData = regexprep(textData,"(?<![\w])"+w_new+"(?![\w])" ,w_ok );
    end
end

John Leal on 16 Oct 2017

The main idea is to correct misspelled words in Spanish. It is like a handmade stemmer adjusted to my specific data. deletedWords contains each misspelled word and its correction. These pairs are extracted from textData itself, using Jaro-Winkler similarity to map each less frequent word to a more frequent word that is more than 95% similar.

Ty
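
One possible way to avoid running regexprep once per dictionary row is to split each line into words and look every word up in a hash map. The sketch below is only an illustration under several assumptions: string arrays are available (R2016b or later), the text has already been reduced to letters separated by single spaces, and the three arrays are tiny made-up stand-ins for the real data. The names keep, pairs, fixes, words and hit are hypothetical, and the speed-up on the real 2 million rows has not been measured.

```matlab
% Hypothetical miniature data standing in for the real arrays.
textData      = ["my moter lives with my father"; "helllo Word"];
deletedWords  = ["helllo","hello"; "moter","mother"];
excludedWords = ["father","three","tree"];

% Drop pairs whose misspelling is in the exclusion list, then build the lookup.
keep  = ~ismember(deletedWords(:,1), excludedWords);
pairs = deletedWords(keep,:);
fixes = containers.Map(cellstr(pairs(:,1)), cellstr(pairs(:,2)));

for r = 1:numel(textData)
    words = split(textData(r), " ");           % whole-word tokens (column vector)
    hit   = isKey(fixes, cellstr(words));      % which tokens have a correction
    if any(hit)
        words(hit)  = string(values(fixes, cellstr(words(hit))));
        textData(r) = join(words, " ");        % reassemble the line
    end
end
disp(textData)
```

Each word is looked up once, so the work grows with the amount of text rather than with the text multiplied by the 50000 dictionary rows.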
