Finding strings with common character
I would like to find, in a text file, all unique strings with a common first character, e.g. "G". By unique I mean without repetition: if the same string occurs several times, I need to print it only once.
Any help would be appreciated.
1 Comment
madhan ravi
on 22 Dec 2023
Edited: madhan ravi
on 22 Dec 2023
Give an example or attach your text file and show the expected result.
Accepted Answer
Hassaan
on 22 Dec 2023
Edited: Hassaan
on 22 Dec 2023
You can use a regular expression to separate the strings and then filter out the unique ones that start with 'G'.
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single string
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
% Match commonChar followed by any word characters.
% regexptranslate escapes commonChar in case it is a regex metacharacter.
pattern = [regexptranslate('escape', commonChar) '\w*'];
allMatches = regexp(fileContent, pattern, 'match');
% Find unique strings
uniqueStrings = unique(allMatches);
% Print the unique strings
disp(['Unique strings starting with the character ' commonChar ':']);
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
Output:
Unique strings starting with the character G:
G123
G123Gabc
G321Yo
Gabc
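Note that the pattern above matches the character anywhere in the text, which is why G321Yo (from Yo123G321Yo) appears in the output. If only whole words beginning with the character are wanted, one option is to anchor the match at the start of a word with \<. The following is a minimal sketch, assuming the file content is already in memory; the sample text is a hypothetical stand-in for the file:

```matlab
% Hypothetical in-memory stand-in for the file content
fileContent = sprintf('Gabc\nabcde\nG123\nG123Gabc\nYo123G321Yo');

% Match 'G' plus the following word characters anywhere in the text
anywhere = unique(regexp(fileContent, 'G\w*', 'match'));

% Match only where 'G' begins a word (\< anchors at the start of a word)
wordStart = unique(regexp(fileContent, '\<G\w*', 'match'));

disp(anywhere);
disp(wordStart);
```

On this sample the second pattern should no longer report G321Yo, since that G sits in the middle of a word. Which behavior is correct depends on what counts as a "string" in the file.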
------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, please consider accepting the answer. An upvote or a comment is also a welcome way to give feedback.
2 Comments
Hassaan
on 22 Dec 2023
One of the many approaches without using regexp:
% The character to search for
searchChar = 'G';
% Specify the file name
filename = 'code.txt'; % Replace with your text file name
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single string
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
% Remove newlines and carriage returns
fileContent = strrep(fileContent, newline, '');
fileContent = strrep(fileContent, char(13), ''); % Carriage return
% Split the text into individual words assuming 'G' is the delimiter
words = strsplit(fileContent, searchChar);
% Reattach 'G' to the start of each non-empty word
words = words(~cellfun('isempty', words));
words = strcat(searchChar, words);
% Find unique words that start with 'G'
uniqueWords = unique(words);
% Print the unique strings
disp(['Unique strings starting with the character ' searchChar ':']);
for i = 1:length(uniqueWords)
    disp(uniqueWords{i});
end
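This approach splits the text at every occurrence of searchChar, drops the empty pieces that strsplit produces, and reattaches searchChar to each remaining piece before finding the unique words and printing them. Because line breaks are removed first, text from adjacent lines is joined (hence Gabcabcde in the output below), and any text before the first searchChar would also receive the prefix. Make sure to adjust the filename to the actual file you're reading from.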
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
G123GabcYo123G321Yo
Output:
Unique strings starting with the character G:
G123
G321Yo
Gabc
GabcYo123
Gabcabcde
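If joined entries such as Gabcabcde are not wanted, one possible variation is to split the text into lines first and discard whatever precedes the first searchChar on each line. This is only a sketch under that assumption, using a hypothetical in-memory sample instead of a file:

```matlab
searchChar = 'G';
% Hypothetical in-memory stand-in for the file content
fileContent = sprintf('Gabc\nabcde\nG123');

lines = strsplit(fileContent, newline);   % process one line at a time
words = {};
for k = 1:numel(lines)
    pieces = strsplit(lines{k}, searchChar);
    pieces(1) = [];                                 % text before the first searchChar
    pieces = pieces(~cellfun('isempty', pieces));   % drop empty pieces
    if ~isempty(pieces)
        words = [words, strcat(searchChar, pieces)]; %#ok<AGROW>
    end
end
uniqueWords = unique(words);
disp(uniqueWords);
```

With this sample, the line abcde contributes nothing because it contains no searchChar, so it is not glued onto Gabc.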
More Answers (3)
Steven Lord
on 22 Dec 2023
Read the data into MATLAB, split it into separate words if necessary, then use startsWith to determine which words start with your desired character.
L = readlines('bench.dat');
oneLine = L(1) % Just operate on the first line
s = split(oneLine)
startsWithB = startsWith(s, "B")
wordStartingWithB = s(startsWithB)
The unique function likely will be useful to you as well.
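As a rough sketch of how these pieces might fit together across all lines rather than just the first, assuming the words are separated by whitespace (the sample lines below are hypothetical and stand in for the result of readlines):

```matlab
% Hypothetical sample standing in for the output of readlines
L = ["Gabc abcde"; "G123 Gabc"; "Yo123G321Yo"];

s = split(join(L));                 % join the lines with spaces, then split on whitespace
startsWithG = startsWith(s, "G");   % logical mask of words beginning with "G"
uniqueG = unique(s(startsWithG))    % each matching word listed once
```

Here Yo123G321Yo is not reported, because startsWith only looks at the beginning of each whitespace-delimited word.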
Hassaan
on 22 Dec 2023
Edited: Hassaan
on 22 Dec 2023
To achieve this in MATLAB, you would typically read the text file into a string array or cell array, then use string manipulation functions to find and list the unique strings. Here's a step-by-step guide with code snippets:
Read the Text File: Load the contents of the text file into MATLAB.
filename = 'yourfile.txt'; % Replace with your text file name
fileID = fopen(filename, 'r');
data = textscan(fileID, '%s');
fclose(fileID);
extractedStrings = data{1};
Filter Strings by First Character: Find strings that start with the specified character.
commonChar = 'G'; % Replace with the common character you're looking for
startsWithG = strncmp(extractedStrings, commonChar, 1);
filteredStrings = extractedStrings(startsWithG);
Find Unique Strings: Get the unique strings from the filtered list.
uniqueStrings = unique(filteredStrings);
Print Unique Strings: Display or print the unique strings.
disp(uniqueStrings);
In MATLAB, you can run this script after replacing 'yourfile.txt' with the actual path to your text file and commonChar with the character you're interested in. It prints every string that starts with that character, each one only once.
Full Code:
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the content of the file into a cell array of strings
data = textscan(fileID, '%s');
fclose(fileID); % Close the file after reading
extractedStrings = data{1}; % Extract the strings from the cell array
% Filter strings by the first character
startsWithCommonChar = strncmp(extractedStrings, commonChar, 1);
% Get the unique strings that start with the specified character
filteredStrings = extractedStrings(startsWithCommonChar);
uniqueStrings = unique(filteredStrings);
% Print the unique strings
disp('Unique strings starting with the specified character:');
disp(uniqueStrings);
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the specified character:
{'G123'}
{'Gabc'}
If you need the output as a plain list, without the curly braces and single quotes, you can loop through the cell array and print each string:
disp('Unique strings starting with the character G:');
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the character G:
G123
Gabc
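A loop-free alternative for the same plain listing is to pass the cell contents to fprintf, which reuses the format for each argument. A small sketch, with uniqueStrings filled in by hand to mirror the example above:

```matlab
% Example values mirroring the output above (normally produced by unique)
uniqueStrings = {'G123'; 'Gabc'};

% Expand the cell array into separate arguments; '%s\n' is applied to each one
fprintf('%s\n', uniqueStrings{:});
```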
4 Comments
Dyuman Joshi
on 22 Dec 2023
You've only updated for the 2nd point I raised.
Say the input is -
G123Gabc
Yo123G321Yo
What should be the output then?
Paul
on 22 Dec 2023
type Gfile.txt
% assuming strings to return are space delimited
text = split(string(fileread('Gfile.txt')));
unique(text(startsWith(text,"G")))