How to extract data from files in subdirectories

I have a main directory that contains 20 subdirectories, and each subdirectory contains 4 files. I would like to read those 4 files from each subdirectory into MATLAB so that I can process every file without handling each one manually. Are there any suggestions for this?

 Accepted Answer

project_dir = pwd(); %or give a particular directory name
mainDirContents = dir(project_dir);
mainDirContents(~[mainDirContents.isdir]) = []; %remove non-folders
mask = ismember( {mainDirContents.name}, {'.', '..'} );
mainDirContents(mask) = []; %remove . and .. folders
num_subfolder = length(mainDirContents);
for subfold_idx = 1 : num_subfolder
    this_folder = fullfile( project_dir, mainDirContents(subfold_idx).name );
    fprintf('Now working with folder "%s"\n', this_folder );
    at2Contents = dir( fullfile(this_folder, '*.AT2') );
    num_at2 = length(at2Contents);
    for at2idx = 1 : num_at2
        this_at2 = fullfile( this_folder, at2Contents(at2idx).name );
        %now do something with the file whose complete name is given by this_at2
    end
end

7 Comments

Hi Robertson,
Thank you for your answer.
How can I store the data from every subdirectory in matrix form so that I can run computations on the stored data?
subfolder_results = cell(num_subfolder, 1);
for subfold_idx = 1 : num_subfolder
    this_folder = fullfile( project_dir, mainDirContents(subfold_idx).name );
    fprintf('Now working with folder "%s"\n', this_folder );
    at2Contents = dir( fullfile(this_folder, '*.AT2') );
    num_at2 = length(at2Contents);
    each_file_results = cell(num_at2, 1);
    for at2idx = 1 : num_at2
        this_at2 = fullfile( this_folder, at2Contents(at2idx).name );
        %now do something with the file whose complete name is given by this_at2
        each_file_results{at2idx} = []; %replace [] with the results to be saved
    end
    subfolder_results{subfold_idx} = each_file_results;
end
The output would be a cell array, with one element per subfolder. The cell elements would each be cell arrays with one element per file in the subfolder (the number of files per subfolder is not necessarily the same.) The results for the file would be whatever you assigned. You might want to keep track of the file name as part of the results.
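One way to keep the file name with the results, as suggested above, is to store a small struct per file. This is only a sketch: the file names and the `magic(3)` "result" are made-up stand-ins, and the field names `name` and `data` are arbitrary choices, not part of the original answer.

```matlab
% Sketch: store each file's name alongside its result in a struct.
file_names = {'a.AT2', 'b.AT2'};            % hypothetical file names
num_files = numel(file_names);
each_file_results = cell(num_files, 1);
for file_idx = 1 : num_files
    result = struct();
    result.name = file_names{file_idx};     % which file this result came from
    result.data = file_idx * magic(3);      % placeholder for the real computation
    each_file_results{file_idx} = result;
end
% Later, the name and the data can be read back together:
fprintf('Result for "%s" has %d rows\n', ...
    each_file_results{2}.name, size(each_file_results{2}.data, 1));
```

In the real loop, `result.name` would be set from `this_at2` (or `at2Contents(at2idx).name`) instead of the hypothetical list.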
In the case that you are sure that the results for one particular subfolder are all going to be numeric and the same size, you might want to do something like,
nd = ndims(each_file_results{1});
subfolder_results_as_array = cat(nd+1, each_file_results{:});
and then store that into subfolder_results{subfold_idx} instead of storing each_file_results there.
At the end, if you were sure that all of the subfolders had the same number of files and that the output for each file was numeric and exactly the same size, you could use the same kind of technique to produce an overall numeric array where the second last index was the file number within the subfolder and the last index was the subfolder number.
nd = ndims(subfolder_results{1});
all_results_as_array = cat(nd+1, subfolder_results{:});
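To make the indexing concrete, here is a small sketch of both concatenation steps using made-up data: 2 hypothetical subfolders, 3 files each, every file yielding a 4-by-2 numeric result. The values are chosen only so that each slice is easy to recognise.

```matlab
% Sketch with synthetic data: 2 subfolders, 3 files each, 4-by-2 result per file.
num_subfolder = 2;
subfolder_results = cell(num_subfolder, 1);
for subfold_idx = 1 : num_subfolder
    % Stand-in for the per-file results of one subfolder
    each_file_results = {1*subfold_idx*ones(4,2), ...
                         2*subfold_idx*ones(4,2), ...
                         3*subfold_idx*ones(4,2)};
    nd = ndims(each_file_results{1});                 % 2 for a matrix
    subfolder_results{subfold_idx} = cat(nd+1, each_file_results{:});  % 4-by-2-by-3
end
nd = ndims(subfolder_results{1});                     % 3
all_results_as_array = cat(nd+1, subfolder_results{:});  % 4-by-2-by-3-by-2
disp(size(all_results_as_array))
% all_results_as_array(:,:,f,s) is the result for file f of subfolder s
```

Note that `ndims` ignores trailing singleton dimensions, so this pattern assumes each subfolder really does contain more than one file.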
I appreciate your answer. I have a follow-up question. I have 22 subdirectories, and we have extracted the *.AT2 files of each one. I also have an Excel sheet with 22 scale factors (scalar quantities), one per subdirectory. I want to multiply the .AT2 files in each subdirectory (2 AT2 files) by that subdirectory's scale factor from the Excel sheet and store the data (you have already given a solution for the storing part). Is there a way to do this?
Which scale factor corresponds to which sub-directory? You have the same number, but what links the order?
Are your subdirectories numbered 1 to 22? Or do they have the names given? Those names include slashes, and slashes mark directory delimiters, so you cannot be directly using those names.
num = xlsread('NormalizationFactors.xls');
norm_factors = num(:,end); %scale factors are taken from the last numeric column
subfolder_results = cell(num_subfolder, 1);
for subfold_idx = 1 : num_subfolder
    subfolder_name = mainDirContents(subfold_idx).name;
    this_folder = fullfile( project_dir, subfolder_name );
    fprintf('Now working with folder "%s"\n', this_folder );
    folder_id = sscanf(subfolder_name, '%d', 1); %assumes the folder name starts with its number
    this_norm_value = norm_factors(folder_id);
    at2Contents = dir( fullfile(this_folder, '*.AT2') );
    num_at2 = length(at2Contents);
    each_file_results = cell(num_at2, 1);
    for at2idx = 1 : num_at2
        this_at2 = fullfile( this_folder, at2Contents(at2idx).name );
        at2_content = ReadAT2File( this_at2 ); %might as well put it inside a function
        each_file_results{at2idx} = this_norm_value * at2_content;
    end
    subfolder_results{subfold_idx} = each_file_results;
end
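The lookup above depends on each folder name beginning with its number, which is what `sscanf(subfolder_name, '%d', 1)` extracts. The sketch below shows that step in isolation and adds a guard for names that do not start with a number. The folder names and scale factors are invented for illustration and are not the asker's data.

```matlab
% Sketch: map folder names that start with a number to a scale factor.
subfolder_names = {'1_Northridge', '12_Kobe', '22_Landers'};  % hypothetical names
norm_factors = (1:22).' / 10;                                 % made-up scale factors
folder_ids = zeros(1, numel(subfolder_names));
for k = 1 : numel(subfolder_names)
    folder_id = sscanf(subfolder_names{k}, '%d', 1);   % leading integer, [] if none
    if isempty(folder_id) || folder_id < 1 || folder_id > numel(norm_factors)
        error('Folder "%s" does not start with a valid number.', subfolder_names{k});
    end
    folder_ids(k) = folder_id;
    fprintf('%s -> factor %.1f\n', subfolder_names{k}, norm_factors(folder_id));
end
```

Without the guard, a name such as `'Kobe_12'` would give an empty `folder_id`, and the scaled result would silently become empty rather than raising an error.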


More Answers (1)

Geoff Hayes on 3 Jan 2017
sam - use dir to get an array of the sub-directories (from the main directory). Then iterate over each element in this array using isdir to ensure that it is a directory. If so, then again use dir on this sub-directory to get an array of files within this sub-directory. Iterate over this array, and perform whatever manipulation that you need for the file depending upon its type. See Data Import and Export for details.

6 Comments

sam moor on 3 Jan 2017
Edited: sam moor on 3 Jan 2017
I could not follow your answer. I want to go through all of these subdirectories, and in each subdirectory there are two files I want to read. For reference, I have attached an image.
What have you tried? From the main directory, use dir to grab the list of directories and files. Something like
mainDirContents = dir;
Now iterate over each element in this array and check to see if it is a directory
for k=1:length(mainDirContents)
    if mainDirContents(k).isdir
        fprintf('dir name: %s\n',mainDirContents(k).name);
    end
end
Depending upon your OS, you may have a couple of invalid directories that you will need to avoid. On my Mac, I would need to do
for k=1:length(mainDirContents)
    if mainDirContents(k).isdir && ~strcmpi(mainDirContents(k).name,'.') && ~strcmpi(mainDirContents(k).name,'..')
        fprintf('dir name: %s\n',mainDirContents(k).name);
    end
end
Now, you mention in your question that you are looking for four files but in your comment you are looking for two files. Which is it? Looking at your other question, there seems to be a bunch of files of which you are only interested in RSN953_NORTHR_MUL009.AT2 and RSN953_NORTHR_MUL279.AT2. So create the path to these files and do something with them
for k=1:length(mainDirContents)
    if mainDirContents(k).isdir && ~strcmpi(mainDirContents(k).name,'.') && ~strcmpi(mainDirContents(k).name,'..')
        % get the path to the first file
        pathToFileA = fullfile(pwd,mainDirContents(k).name,'RSN953_NORTHR_MUL009.AT2');
        fprintf('path to file A %s\n',pathToFileA);
        % get the path to the second file
        pathToFileB = fullfile(pwd,mainDirContents(k).name,'RSN953_NORTHR_MUL279.AT2');
        fprintf('path to file B %s\n',pathToFileB);
    end
end
What you want to do with the file is up to you...
The file names are different in every subdirectory, but with your code I get the same file name for every subdirectory.
if mainDirContents(k).isdir && ~strcmpi(mainDirContents(k).name,'.') && ~strcmpi(mainDirContents(k).name,'..')
can be simplified to
if mainDirContents(k).isdir && ~ismember(mainDirContents(k).name, {'.', '..'})
This code is valid for all operating systems that MATLAB is currently supported on (it might not work if you are still stuck using the VMS version of MATLAB, but it would work for every other operating system that MATLAB has ever been implemented on.)
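As a quick check of the name test, the following sketch applies `ismember` to a made-up list of names and compares it with the pairwise `strcmp` form. The names are hypothetical. Note that a name like `.git` is not filtered out, since only exact matches to `.` and `..` count.

```matlab
% Sketch: ismember flags exactly the '.' and '..' entries.
names = {'.', '..', 'RSN953', '.git'};               % hypothetical directory names
is_dot = ismember(names, {'.', '..'});               % logical mask, one entry per name
is_dot_strcmp = strcmp(names, '.') | strcmp(names, '..');
kept = names(~is_dot);                               % names that survive the filter
disp(kept)
```

Because `ismember` accepts a whole cell array of names, the same test can be applied to `{mainDirContents.name}` in one call instead of inside the loop.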
Hi Geoff, let's say I only want to import the data files with the extension .AT2 from each subfolder, looping over every subfolder. I used fullfile, but it gives me an error.
sam - as Walter has shown in his answer, use a filter with dir to find all those files that have an extension of AT2.
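A minimal, self-contained sketch of that filter is shown here. It builds a scratch folder with a few empty files so it can be run anywhere. The folder and file names are made up for the demonstration.

```matlab
% Sketch: list only the *.AT2 files in a folder by passing a wildcard to dir.
demo_dir = tempname;                         % unique scratch folder path
mkdir(demo_dir);
demo_files = {'rec1.AT2', 'rec2.AT2', 'notes.txt'};   % hypothetical file names
for k = 1 : numel(demo_files)
    fid = fopen(fullfile(demo_dir, demo_files{k}), 'w');   % create an empty file
    fclose(fid);
end
at2Contents = dir(fullfile(demo_dir, '*.AT2'));   % wildcard keeps only .AT2 files
at2_names = sort({at2Contents.name});
for k = 1 : numel(at2_names)
    fprintf('Found %s\n', fullfile(demo_dir, at2_names{k}));
end
rmdir(demo_dir, 's');                        % remove the scratch folder again
```

The key point is that the wildcard goes inside `fullfile` as the last path component, and the result of `fullfile` is then passed to `dir`. Calling `fullfile` with a wildcard does not list files by itself.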
