Grouping and Reading Files Sharing Unique Strings

Question

I'm currently trying to simplify and reduce the processing time needed to read through files in a folder.

One of the problems is that I need to group certain files together based on their sharing the same numeric string, pull variables from these related files to create a row in a table, and repeat this for every unique file number in the folder. However, the number of files sharing a number ranges from one to three, so I can't simply step through the folder a fixed number of files at a time.

What corrections or alternative structures would decrease the processing time?

Here is my code so far. Even without the main processing step, it takes a long time:

reports = dir(fullfile(reports_folder, '*.doc'));
case_regex = '\d+-\d+'; % Defined once, outside the loop
k = 1;
while k <= numel(reports)
    baseFileName = reports(k).name;
    base_no = regexp(baseFileName, case_regex, 'match', 'once'); % ID case

    % The listing is assumed to be in alphabetical order, so files with the
    % same case number are adjacent. Check up to two following files without
    % indexing past the end of the list.
    list_same_case = {baseFileName};
    for j = k+1 : min(k+2, numel(reports))
        possibleMatchFile = reports(j).name;
        if ~strcmp(regexp(possibleMatchFile, case_regex, 'match', 'once'), base_no)
            break
        end
        list_same_case{end+1} = possibleMatchFile; % File name only, not the full path
    end

    filename = fullfile(reports_folder, baseFileName);
    % Read and grab variables from files of interest, store
    k = k + numel(list_same_case);
end

Answer 1

Zinea on 23 Feb 2024

Hi Connie Chang-Chien,

You can group the files with a map data structure (containers.Map). This reduces processing time because no file has to be compared against the other files:

One-time scan: The map is populated by scanning through the list of files only once. Each file’s case number is extracted and used as a key in the map. If the case number has already been encountered, the file is appended to the list associated with that case number; otherwise, a new list is created.
Constant-time access: Maps provide near-constant-time insertion and retrieval by key. This is much faster than searching through a list or array to check whether a case number is already present.

The following code applies this to the reports folder:

reports = dir(fullfile(reports_folder, '*.doc'));
num_reports = numel(reports);
case_regex = '\d+-\d+';
% Use a map to group files by their numeric string
file_map = containers.Map('KeyType', 'char', 'ValueType', 'any');
for i = 1:num_reports
    baseFileName = reports(i).name;
    case_number = regexp(baseFileName, case_regex, 'match', 'once'); % Extract case number
    if isempty(case_number)
        continue % Skip files whose names contain no case number
    end

    % Check if the case number is already in the map
    if isKey(file_map, case_number)
        % containers.Map supports only one level of indexing, so the
        % list is read, extended, and written back
        files = file_map(case_number);
        files{end+1} = baseFileName;
        file_map(case_number) = files;
    else
        file_map(case_number) = {baseFileName};
    end
end
% Now iterate over each unique case number
case_numbers = keys(file_map);
for c = 1:numel(case_numbers)
    list_same_case = file_map(case_numbers{c});
    % Read and grab variables from the files in list_same_case here
end
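
As a quick way to check the grouping logic without touching the file system, here is a minimal sketch that runs the same map-based approach on a made-up list of file names. The names, the variable names groups and summary, and the two-column table layout are assumptions for illustration only; they are not taken from the question.

```matlab
% Hypothetical file names, for illustration only; in practice they come from dir().
names = {'12-345_summary.doc', '12-345_labs.doc', '98-001_summary.doc', ...
         '12-345_notes.doc', '77-210_summary.doc'};
case_regex = '\d+-\d+';

% Group the names by case number in a single pass.
groups = containers.Map('KeyType', 'char', 'ValueType', 'any');
for i = 1:numel(names)
    case_number = regexp(names{i}, case_regex, 'match', 'once');
    if isempty(case_number)
        continue % This name carries no case number
    end
    if isKey(groups, case_number)
        % Read the existing list, extend it, and write it back.
        files = groups(case_number);
        files{end+1} = names{i};
        groups(case_number) = files;
    else
        groups(case_number) = names(i); % 1-by-1 cell holding the first name
    end
end

% Build one table row per case number: the case ID and the number of files sharing it.
case_ids = keys(groups)';                        % Keys are returned in sorted order
file_counts = cellfun(@numel, values(groups))';  % Values follow the same order as keys
summary = table(case_ids, file_counts, 'VariableNames', {'CaseNumber', 'FileCount'});
disp(summary)
```

Note that '12-345_notes.doc' is deliberately placed after a file from another case: the map still collects it with the other 12-345 files, so this approach does not depend on related files being adjacent in the listing.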
