How to read text files from each subfolder

Hi,
I have a main folder that contains several subfolders. I want to read the text files in each subfolder and save the extracted data to an ".xlsx" file named after that subfolder. For example, data read from subfolder1 is saved as "subfolder1.xlsx", and data from subfolder2 as "subfolder2.xlsx".
For each text file, extract the data as follows:
1. Above the first dotted line (-----------), extract the fields RainFallID, IINT, Rain Result, and Start Time.
2. Between the two dotted lines (-------), extract the first and third columns. If a value in the third column is mixed, keep only the first part (for example, 0.67 mm --> 0.67 and 60.67+34e % --> 60.67+34e); if it is text such as "False End", keep it as is (False End).
Could someone please help?

Accepted Answer

Cedric Wannaz on 9 Sep 2017
Edited: Cedric Wannaz on 9 Sep 2017
Try something along these lines:
% - Define output header.
header = {'RainFallID', 'IINT', 'Rain Result', 'Start Time', 'Param1.pipe', ...
          '10 Un Para2.pipe', 'Verti 2 mixing.dis', 'Rate.alarm times'} ;
nHeaderCols = numel( header ) ;
% - Build listing sub-folders of main folder.
D_main = dir( 'Mainfolder' ) ;
% - Keep folders only and eliminate "." and ".." by name; they are not
%   guaranteed to be the first two entries of the listing.
D_main = D_main([D_main.isdir] & ~ismember( {D_main.name}, {'.', '..'} )) ;
% - Make sure that the output folder exists.
if ~exist( 'OutputFolder', 'dir' ), mkdir( 'OutputFolder' ) ; end
% - Iterate through sub-folders and process.
for dId = 1 : numel( D_main )
    % - Build listing files of sub-folder.
    D_sub = dir( fullfile( 'Mainfolder', D_main(dId).name, '*.txt' )) ;
    nFiles = numel( D_sub ) ;
    % - Prealloc output cell array.
    data = cell( nFiles, nHeaderCols ) ;
    % - Iterate through files and process.
    for fId = 1 : nFiles
        % - Read input text file.
        inLocator = fullfile( 'Mainfolder', D_main(dId).name, D_sub(fId).name ) ;
        content = fileread( inLocator ) ;
        % - Extract relevant data.
        rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;
        iint = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;
        rainResult = regexp( content, '(?<=Rain Result\s+:\s*)\S+', 'match', 'once' ) ;
        startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?= -)', 'match', 'once' )) ;
        param1Pipe = str2double( regexp( content, '(?<=Param1.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        tenUn = str2double( regexp( content, '(?<=10 Un Para2.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        verti2 = regexp( content, '(?<=Verti 2 mixing.dis\s+\S+\s%\s+)\S+', 'match', 'once' ) ;
        rateAlarm = strtrim( regexp( content, '(?<=Rate.alarm times\s+\S+\s+)[^\r\n]+', 'match', 'once' )) ;
        % - Populate data cell array.
        data(fId,:) = {rainfallId, iint, rainResult, startTime, ...
                       param1Pipe, tenUn, verti2, rateAlarm} ;
    end
    % - Output to XLSX.
    outLocator = fullfile( 'OutputFolder', sprintf( '%s.xlsx', D_main(dId).name )) ;
    fprintf( 'Output XLSX: %s ..\n', outLocator ) ;
    xlswrite( outLocator, [header; data] ) ;
end
Note that if you have a recent version of MATLAB (R2016b or later), you can use the `folder` field of the struct array returned by DIR to simplify most FULLFILE calls.
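As a rough illustration of the `folder` field, here is a minimal sketch. It builds a scratch folder under TEMPNAME so that it can run anywhere; the names root, subDir and report1.txt are placeholders introduced for this example and are not part of the original data.

```matlab
% Hypothetical scratch tree standing in for Mainfolder/subfolder1.
root   = tempname ;
subDir = fullfile( root, 'subfolder1' ) ;
mkdir( root ) ;
mkdir( subDir ) ;
fclose( fopen( fullfile( subDir, 'report1.txt' ), 'w' )) ;

% Each entry returned by DIR carries the folder it was found in, so the
% full path does not have to be rebuilt from 'Mainfolder' and the
% sub-folder name.
D_sub = dir( fullfile( subDir, '*.txt' )) ;
for fId = 1 : numel( D_sub )
    inLocator = fullfile( D_sub(fId).folder, D_sub(fId).name ) ;
    fprintf( 'Input file: %s\n', inLocator ) ;
end
```

In the main loop, the same idea would replace the three-argument FULLFILE call that builds inLocator.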
EDIT 4:09pm
Just a few extra comments. While it may look complicated, you should be fine with most of the code here. The general approach is:
Iterate through the sub-folders of 'Mainfolder'
    Iterate through the files of the sub-folder
        Extract data from the file and store it in the data array
    Export the data array to the relevant Excel file
The part that will likely be the most complex for you is the data extraction. One quick option for this is pattern matching using regular expressions. You can see a series of calls to REGEXP:
.. = regexp( content, pattern, option1, option2, .. )
This extracts from content a string that matches the pattern. When we need to export a number, we convert the match to double using STR2DOUBLE. When the match may include extra white space, we trim it using STRTRIM.
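To make the roles of STR2DOUBLE and STRTRIM concrete, here is a small sketch on a made-up snippet. The layout of content below is an assumption about what the files look like, and NoSuchField is a hypothetical field name used only to show what happens when nothing matches.

```matlab
% Made-up file content; the real files may be laid out differently.
content = sprintf( 'RainFallID : 1234\nStart Time  :  12:30:05 - local\n' ) ;

% The match is text ('1234'); STR2DOUBLE turns it into a number.
rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;

% The match starts right after the ':' and so includes leading spaces;
% STRTRIM removes them.
rawTime   = regexp( content, '(?<=Start Time\s+:\s*).*?(?= -)', 'match', 'once' ) ;
startTime = strtrim( rawTime ) ;

% With the 'once' option, a pattern that matches nothing returns an
% empty char array, which STR2DOUBLE converts to NaN.
missing = str2double( regexp( content, '(?<=NoSuchField\s+:\s*)\d+', 'match', 'once' )) ;
```

A NaN in the numeric columns of the output is therefore a hint that a pattern did not match the file.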
Regular expressions are a big topic, so it is normal if you don't really understand the patterns. In short,
aAb,123 etc  : are literals; they are simply matched and have no special
               meaning
\s, \S, \d   : match a single white-space character, non-white-space
               character, and numeric digit respectively
*, +         : mean zero or more, and one or more, repetitions of the
               element that precedes
               \d+ hence means one or more numeric digits
[..], [^..]  : define a set of characters to match or not to match respectively
               [\d\s]+ hence means one or more characters that are either \d or \s
(?<=..)      : defines a look-behind
               (?<=hello )world matches 'world' when it is preceded by 'hello '
(?=..)       : defines a look-ahead
               hello(?= world) matches 'hello' when it is followed by ' world'
.            : matches any character. To match the character '.' itself, it
               has to be escaped with \
               .[\d\.]+ matches any character followed by one or more
               characters that are either a numeric digit or a '.'
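The elements listed above can be tried one at a time at the command line. The following is a small sketch on made-up input strings; the variable names are placeholders chosen for this example.

```matlab
% One or more numeric digits.
digits   = regexp( 'abc 123 de', '\d+', 'match', 'once' ) ;                % '123'

% Look-behind: 'world' only when it is preceded by 'hello '.
behind   = regexp( 'hello world', '(?<=hello )world', 'match', 'once' ) ;   % 'world'
noBehind = regexp( 'goodbye world', '(?<=hello )world', 'match', 'once' ) ; % empty

% Look-ahead: 'hello' only when it is followed by ' world'.
ahead    = regexp( 'hello world', 'hello(?= world)', 'match', 'once' ) ;    % 'hello'

% Character set: digits or a literal '.', which keeps the number and
% drops the unit.
value    = regexp( '0.67 mm', '[\d\.]+', 'match', 'once' ) ;                % '0.67'

% Negated set: runs of characters that are neither digits nor white space.
% Without 'once', all matches are returned in a cell array.
letters  = regexp( 'a1 b22', '[^\d\s]+', 'match' ) ;                        % {'a', 'b'}
```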
Given this information, you can understand the pattern for extracting e.g. the value of IINT:
'(?<=IINT\s+:\s*)\S+'
which is, match
(?<=..)\S+ : one or more non white-space preceded by something
and the something is
IINT\s+:\s* : the literal 'IINT' followed by one or more white-spaces,
followed by the literal ':', followed by zero or more white-spaces
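Here is a quick check of this pattern, as a sketch on made-up lines; the sample values and the spacing around ':' are assumptions about the file layout.

```matlab
pattern = '(?<=IINT\s+:\s*)\S+' ;

% Made-up content with several spaces around the ':'.
content = sprintf( 'RainFallID : 1234\nIINT   :   A7-x\nRain Result : Pass\n' ) ;
iint = regexp( content, pattern, 'match', 'once' ) ;            % 'A7-x'

% \s+ requires at least one white space between 'IINT' and ':', so a
% line written without it is not matched by this pattern.
noSpace = regexp( 'IINT: A7-x', pattern, 'match', 'once' ) ;    % empty
```

If some files do omit the space before the ':', changing \s+ to \s* in the look-behind would accept both forms.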
Cheers,
Cedric
  22 Comments
Cedric Wannaz on 28 Sep 2017
Awesome, congratulations for what you have learned in the process!
