Replacing only certain instances of text within a MATLAB character array
I have a large character array in MATLAB, 'lineDataA', containing many different numbers.
I would like to replace every instance of the number '6002' with '0', apart from the very first instance.
lineData = replace(lineDataA, '6002', '0');
This replaces all instances, including the first.
And
where6002 = strfind(lineDataA, '6002');
gives the positions of all the instances. However, I am not sure how to replace all of the instances except the first.
Many thanks for your help,
Rob
Accepted Answer
Stephen23
on 20 Jan 2017
Edited: Stephen23
on 20 Jan 2017
Method One: split the string
>> str = '___6002__6002___6002___6002__';
>> idx = regexp(str,'6002','once','end');
>> strcat(str(1:idx),strrep(str(idx+1:end),'6002','0'))
ans =
___6002__0___0___0__
Method Two: use a placeholder
>> str = '___6002__6002___6002___6002__';
>> str = regexprep(str,'6002','\b','once');
>> str = strrep(str,'6002','0');
>> regexprep(str,'\b','6002')
ans =
___6002__0___0___0__
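Note that the original string must not already contain a backspace character (\b), because that character serves as the temporary placeholder for the first match.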
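The placeholder restriction can be checked before relying on it. The following is only a sketch of Method Two with a guard added; the variable name placeholder and the error text are illustrative and not part of the original answer.

```matlab
str = '___6002__6002___6002___6002__';
placeholder = char(8);   % backspace, the same character that \b produces

% Guard (an addition, not in the original answer): stop if the input
% already contains the placeholder character.
assert(~any(str == placeholder), 'Input already contains the placeholder.');

str = regexprep(str, '6002', placeholder, 'once');  % protect the first match
str = strrep(str, '6002', '0');                     % replace the remaining matches
str = strrep(str, placeholder, '6002');             % restore the first match
```

If the assertion fires, any other character that cannot occur in the data could be used as the placeholder instead.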
Method Three: dynamic regular expression
>> str = '___6002__6002___6002___6002__';
>> regexprep(str,'(.*?6002)(.*)','$1${strrep($2,''6002'',''0'')}')
ans =
___6002__0___0___0__
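The strfind idea from the question can also be carried through directly. This is a minimal sketch, not part of the accepted answer: it cuts the string just before the second occurrence and only replaces in the tail. The names pos, cut and out are illustrative.

```matlab
str = '___6002__6002___6002___6002__';
pos = strfind(str, '6002');          % start index of every occurrence

if numel(pos) > 1
    cut = pos(2) - 1;                % last character before the second occurrence
    out = [str(1:cut), strrep(str(cut+1:end), '6002', '0')];
else
    out = str;                       % zero or one occurrence: nothing to replace
end
```

Like Method One, this leaves everything up to and including the first match untouched, and it also handles inputs with no match or a single match.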
2 Comments
John Leal
on 16 Oct 2017
I have a similar problem. I need to replace some words with others in a large array. I have code that works, but it is too slow. Can you help me find a way to improve it?
% Strip HTML-style tags first, while "<" and ">" are still present
textData = regexprep(textData, '<[^<>]+>', ' ');
% Replace punctuation with a space
textData = regexprep(textData, '[@$/#.:\-&*+=\[\]?!(){},''">_<;%]', ' ');
% Replace digits with a space
textData = regexprep(textData, '[0-9]+', ' ');
% Transliterate accented letters before the strip step below removes them
textData = regexprep(textData, 'á', 'a');
textData = regexprep(textData, 'é', 'e');
textData = regexprep(textData, 'í', 'i');
textData = regexprep(textData, 'ó', 'o');
textData = regexprep(textData, 'ú', 'u');
textData = regexprep(textData, 'ñ', 'n');
% Remove any remaining non-alphabetic characters
textData = regexprep(textData, '[^a-zA-Zñ ]', '');
textData = regexprep(textData, 'x', 's');
textData = regexprep(textData, 'cc', 'c');
textData = regexprep(textData, 'ci', 'si');
% deletedWords = ["helllo","hello";"moter","mother"] ... 50000 rows
% excludedWords = ["father","three", "tree"] ... words I don't want to replace
% textData = ["my mother lives with my father";"hello Word"] ... 2 million rows
m = size(deletedWords, 1);
for idx=1:m
    w_new = deletedWords{idx,1};
    w_ok = deletedWords{idx,2};
    f = find(excludedWords==w_new, 1);
    % Only replace if the word is not in excludedWords
    if isempty(f)
        % Replace EXACT word matches
        textData = regexprep(textData,"(?<![\w])"+w_new+"(?![\w])" ,w_ok );
    end
end
John Leal
on 16 Oct 2017
The main idea is to correct misspelled words in Spanish. It is like a hand-made stemmer adjusted to my specific data. deletedWords contains each misspelled word and its correction. These pairs are extracted from the same textData using Jaro-Winkler similarity, mapping a less frequent word to a more frequent word when they are more than 95% similar.
Ty
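One possible direction for the speed problem, offered only as an unbenchmarked sketch: instead of running regexprep over all of textData once per dictionary entry, split each row into words and look the words up with ismember. This assumes the words are separated by whitespace (which the cleaning steps above leave them as) and that the data are string arrays. The sample data and the names keep, wrong and right are illustrative.

```matlab
% Small stand-ins for the real arrays described in the comment above
deletedWords  = ["helllo","hello"; "moter","mother"];
excludedWords = ["father","three","tree"];
textData      = ["my moter lives with my father"; "helllo word"];

% Drop dictionary rows whose misspelled form is in excludedWords
keep  = ~ismember(deletedWords(:,1), excludedWords);
wrong = deletedWords(keep,1);
right = deletedWords(keep,2);

for r = 1:numel(textData)
    words = split(textData(r));           % split the row at whitespace
    [tf, loc] = ismember(words, wrong);   % which words are known misspellings
    words(tf) = right(loc(tf));           % swap in the corrections
    textData(r) = join(words);            % rebuild the row with single spaces
end
```

With the sample data this turns the rows into "my mother lives with my father" and "hello word". The loop now runs once per row rather than once per dictionary entry, and matching whole words replaces the lookaround pattern; whether that is faster on 2 million rows would need to be measured.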