Best solution to finding repeating characters on a line.

rogox on 13 Jul 2021
Commented: rogox on 29 Jul 2021
I am looking for any instances of two characters (e/d) being repeated in a row 10 or more times. I just want to either print every line on which this occurs to the command line, or stop and print the location every time a run is detected. Basically, I am trying to find where e and d show up ten or more times grouped together in a large data file. For example:
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
The script would then print out line 2 and line 4 in the command line.
Thank you for your help
rogox on 29 Jul 2021
@Walter Roberson I understand and won't do it again, sorry for the trouble. Thank you for posting the link @Stephen Cobeldick.


Accepted Answer

Stephen on 13 Jul 2021
Edited: Stephen on 13 Jul 2021
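% Example data (with 'a' in place of 'e'); the pattern works for any pair of characters
inp = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf';'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf'};
% (.) captures one character and (??$1*) matches any repeats of it; (.?) captures an optional
% second character and (??[$1$2]*) matches any further mix of the two
rgx = '(.)(??$1*)(.?)(??[$1$2]*)';
spl = regexp(inp,rgx,'match');                      % each line split into runs of at most two distinct characters
idx = cellfun(@(c)any(cellfun(@numel,c)>9),spl);    % true where some run is 10 or more characters long
find(idx)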
ans = 2×1
     2
     4
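The dynamic expression above flags runs made of any two characters, not only e and d. If only e and d matter, as in the question, a regex-free sketch could build a logical mask with ismember and get run lengths from diff. The names data, minRun and hit are illustrative choices, not from the thread.

```matlab
% Example lines from the question, as a cell array of char vectors
data = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'; 'asseefadfefeeedddeeedddasdfsdf'; ...
        'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'; 'asseefadfefeeedddeeedddasdfsdf'};
minRun = 10;                          % "greater than or equal to 10"
hit = false(size(data));              % one flag per line
for k = 1:numel(data)
    isDE = ismember(data{k}, 'de');   % true where the character is 'd' or 'e'
    % Pad with false at both ends; diff is +1 at each run start and -1 just past each run end
    edges = diff([false, isDE, false]);
    runLengths = find(edges == -1) - find(edges == 1);
    hit(k) = any(runLengths >= minRun);
end
find(hit)
```

On the example lines this returns lines 2 and 4, since each contains the 12-character run eeedddeeeddd.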
rogox on 13 Jul 2021
I didn't realize it was that simple, thank you so much.


More Answers (1)

Walter Roberson on 13 Jul 2021
You say "10 or over", so is it correct that the program needs to find all possible patterns? For example, should 'adadadadaaaadadadadaaa' (length 22) be located if it exists?
S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf'}
S = 1×2 cell array
{'asseefadfefaaadddaaadddasdfsdf'} {'asseeadadadadaaaadadadadaaadfsdf'}
matches = regexp(S, '([ad]{5,})\1', 'match');
celldisp(matches)
matches{1}{1} =
aaadddaaaddd
matches{2}{1} =
adadadadaaaadadadadaaa
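The pattern ([ad]{5,})\1 matches a block of five or more a/d characters that is immediately repeated, which is narrower than "10 or more a/d characters in a row". A minimal check on a made-up string (s is hypothetical, not from the thread) shows a run that the plain run pattern [ad]{10,} finds but the repeated-block pattern does not:

```matlab
s = 'xaaaaaddddddx';   % hypothetical: 11 a/d characters in a row, not a repeated block
repeated = regexp(s, '([ad]{5,})\1', 'match')   % repeated-block pattern used above
anyRun   = regexp(s, '[ad]{10,}', 'match')      % any run of 10 or more a/d characters
```

Here repeated comes back empty while anyRun holds 'aaaaadddddd', so which pattern fits depends on which reading of the question is intended.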
Walter Roberson on 14 Jul 2021
Example of reading from a file:
%create a file for demonstration purposes only
tname = [tempname() '.txt'];
fid = fopen(tname, 'w');
T = regexprep('asseefadfefaaadddaaadddasdfsdf\nasseeadadadadaaaadadadadaaadfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\n', 'a', 'e');
fprintf(fid, T);
fclose(fid);
%okay, main function
filename = tname;
S = readlines(filename);
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 4×1 string array
    "esseefedfefeeedddeeedddesdfsdf"
    "esseeededededeeeededededeeedfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
%alternative without readlines
S = regexp(fileread(filename), '\r?\n', 'split');
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 1×4 cell array
{'esseefedfefeeedddeeedddesdfsdf'} {'esseeededededeeeededededeeedfsdf'} {'esseefedfefeeedddeeedddesdfsdf'} {'esseefedfefeeedddeeedddesdfsdf'}
%alternative without splitting
S = fileread(filename);
matches = regexp(S, '^.*[de]{10}.*$', 'match', 'dotexceptnewline', 'lineanchors');
matches
matches = 1×4 cell array
{'esseefedfefeeedddeeedddesdfsdf'} {'esseeededededeeeededededeeedfsdf'} {'esseefedfefeeedddeeedddesdfsdf'} {'esseefedfefeeedddeeedddesdfsdf'}
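The examples above return whole lines. The question also mentions printing the location of each hit; a sketch of that, assuming the file is read one line at a time with fgetl (so a large file is not held in memory at once), could look like the following. The demo file contents, minRun and the report matrix are illustrative assumptions.

```matlab
% Hypothetical demo file: three short lines written to a temporary file
fname = [tempname() '.txt'];
demo = {'asdfsdfsdfsasdfsdfsdfs', 'asseefadfefeeedddeeedddasdfsdf', 'xxdeddeedeedexx'};
fid = fopen(fname, 'w');
for k = 1:numel(demo)
    fprintf(fid, '%s\n', demo{k});
end
fclose(fid);

minRun = 10;                              % threshold from the question
pattern = sprintf('[de]{%d,}', minRun);   % a run of minRun or more d/e characters

fid = fopen(fname, 'r');
lineNo = 0;
report = zeros(0, 3);                     % one row per hit: [line, first column, last column]
while true
    ln = fgetl(fid);                      % next line without its newline; -1 at end of file
    if ~ischar(ln)
        break
    end
    lineNo = lineNo + 1;
    [startIdx, endIdx] = regexp(ln, pattern, 'start', 'end');
    for k = 1:numel(startIdx)
        fprintf('line %d, columns %d-%d: %s\n', lineNo, startIdx(k), endIdx(k), ln(startIdx(k):endIdx(k)));
        report(end+1, :) = [lineNo, startIdx(k), endIdx(k)]; %#ok<AGROW>
    end
end
fclose(fid);
delete(fname);   % remove the demo file
```

For the demo file this reports line 2, columns 12-23 (eeedddeeeddd) and line 3, columns 3-13 (deddeedeede).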


Release

R2021a
