Grouping and Reading Files Sharing Unique Strings

I'm trying to reduce the processing time needed to read through the files in a folder.
I need to group files that share the same numeric string (a case number), pull variables from each group of related files to create one row in a table, and repeat this for every unique case number in the folder. A case number can have anywhere from one to three files, so I can't step through the folder at a fixed stride.
What corrections or alternative structure would decrease the processing time?
Here is my code. Even without the main file-reading part, it takes a long time:
reports = dir(fullfile(reports_folder, '*.doc'));
case_regex = '\d+\-\d+';
k = 1;
while k <= length(reports)
    baseFileName = reports(k).name;
    base_no = regexp(baseFileName, case_regex, 'match'); % ID case
    list_same_case = {baseFileName}; % cell array, because names can differ in length
    % Compare with the next two files, which are adjacent if the listing is alphabetical
    if k + 1 <= length(reports)
        possibleMatchFile = reports(k+1).name;
        Match_1 = regexp(possibleMatchFile, case_regex, 'match'); % ID case
        if isequal(Match_1, base_no)
            list_same_case{end+1} = possibleMatchFile;
        end
    end
    if k + 2 <= length(reports)
        possibleMatchFile2 = reports(k+2).name;
        Match_2 = regexp(possibleMatchFile2, case_regex, 'match'); % ID case
        if isequal(Match_2, base_no)
            list_same_case{end+1} = possibleMatchFile2; % file name only, not the full path
        end
    end
    filename = fullfile(reports_folder, baseFileName);
    % Read and grab variables from files of interest, store
    k = k + length(list_same_case);
end

Answers (1)

Zinea on 23 Feb 2024
You can use a map data structure (containers.Map). This can reduce processing time because it avoids the need to compare each file with every other file, as explained below:
  1. One-time scan: The map is populated by scanning through the list of files only once. Each file’s case number is extracted and used as a key in the map. If the case number has already been encountered, the file is appended to the list associated with that case number; otherwise, a new list is created.
  2. Constant-time access: Maps provide near-constant-time insertion and retrieval of values by key. This is much faster than searching through a list or array to check whether a case number is already present.
The following code groups the files using a map:
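reports = dir(fullfile(reports_folder, '*.doc'));
num_reports = length(reports);
case_regex = '\d+-\d+';
% Use a map to group files by their numeric string
file_map = containers.Map('KeyType', 'char', 'ValueType', 'any');
for i = 1:num_reports
    baseFileName = reports(i).name;
    case_number = regexp(baseFileName, case_regex, 'match', 'once'); % Extract case number
    if isempty(case_number)
        continue % Skip files whose name contains no case number
    end
    % Check if the case number is already in the map
    if isKey(file_map, case_number)
        % file_map(key){end+1} = ... is not valid MATLAB, so read the list,
        % append to it, and store it back
        files = file_map(case_number);
        files{end+1} = baseFileName;
        file_map(case_number) = files;
    else
        file_map(case_number) = {baseFileName};
    end
end
% Now iterate over each unique case number
for case_number = keys(file_map)
    list_same_case = file_map(case_number{1}); % Cell array of file names for this case
    % Read and grab variables from the files in list_same_case here
end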
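
As an alternative to the map, the same grouping can probably be done without an explicit insertion loop by letting unique assign a group index to each file. The sketch below is only an illustration: it swaps containers.Map for unique, and the file names are hypothetical stand-ins for {reports.name}, not files from the original question.

```matlab
% Hypothetical file names standing in for {reports.name}
names = {'12-345_summary.doc', '12-345_labs.doc', '98-001_summary.doc', ...
         '77-410_summary.doc', '77-410_labs.doc', '77-410_notes.doc'};

% Extract the case number from every name in a single regexp call
case_ids = regexp(names, '\d+-\d+', 'match', 'once');

% Drop names that contain no case number
has_id = ~cellfun(@isempty, case_ids);
names = names(has_id);
case_ids = case_ids(has_id);

% unique returns the sorted distinct case numbers and, as its third
% output, the index of the group each file belongs to
[unique_cases, ~, group_idx] = unique(case_ids);

% Collect the file names of each group into one cell per case number
grouped = cell(numel(unique_cases), 1);
for g = 1:numel(unique_cases)
    grouped{g} = names(group_idx == g);
    % Read the files in grouped{g} here and build one table row
end
```

Here grouped{g} holds the one to three file names that belong to unique_cases{g}, so the file-reading step runs once per case number. Whether this is faster than the map for a given folder would need to be timed, for example with tic and toc.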
