Selective columns from multiple text folders

Hey guys,
I would really appreciate your help here.
I have a dataset containing 400 folders, each with one text file inside. Each text file has 13 columns and I want only one of them. At the end, I want a single text file with 400 columns. So far, I have it working for one folder (with the help of the Import Tool), but I don't know how to implement the loop. The folders and the text files (which share the same name) are in this order:
m0000000
m0000001
m0000002
...
m0000399
Here is the code so far:
filename = 'D:\work\1ST EXP\4.2\m0000000\m0000000.TXT';
delimiter = '\t';
formatSpec = '%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
fclose(fileID);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
rawData = dataArray{1};
for row=1:size(rawData, 1);
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData{row}, regexstr, 'names');
numbers = result.numbers;
invalidThousandsSeparator = false;
if any(numbers==',');
thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'));
numbers = NaN;
invalidThousandsSeparator = true;
end
end
if ~invalidThousandsSeparator;
numbers = textscan(strrep(numbers, ',', ''), '%f');
numericData(row, 1) = numbers{1};
raw{row, 1} = numbers{1};
end
catch me
end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Allocate imported array to column variable names
Rho = cell2mat(raw(:, 1));
%clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
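diary merged.txt % log the command-window display of Rho to merged.txt
Rho
diary off % stop logging, otherwise all later output is appended too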
Afshin sadeghi on 10 Mar 2021
Thanks a lot for answering.
I need the Rho columns.
Cheers.

Accepted Answer

Stephen23 on 11 Mar 2021
Edited: Stephen23 on 11 Mar 2021
Here is one simple and efficient approach (untested, but should get you started):
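P = 'D:\work\1ST EXP\4.2'; % parent folder containing m0000000 ... m0000399
V = 0:399;                 % folder/file indices
N = numel(V);
C = cell(1,N);             % one Rho column per file
for k = 1:N
F = sprintf('m%07d',V(k));             % e.g. 'm0000000'
F = fullfile(P,F,sprintf('%s.txt',F)); % folder and file share the same name
T = readtable(F,'VariableNamingRule','preserve');
C{k} = T.Rho;
end
M = [C{:}] % only if all files have the same number of rows.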
You can then save matrix M, e.g.:
writematrix(M,'myfile.txt')
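The `[C{:}]` concatenation requires every file to have the same number of rows. If some files are shorter, one option is to pad each column with NaN before concatenating. A minimal sketch, assuming each `C{k}` is a numeric column vector; the contents of `C` here are made-up stand-ins for the Rho columns, not data from the thread:

```matlab
% Hypothetical example data standing in for the Rho columns read in the loop.
C = {[1;2;3], [4;5], [6;7;8;9]};

nRows = max(cellfun(@numel, C));  % length of the longest column
M = NaN(nRows, numel(C));         % preallocate with NaN so missing rows stay NaN
for k = 1:numel(C)
    M(1:numel(C{k}), k) = C{k};   % copy column k into the top of column k of M
end
disp(M)
```

Columns from shorter files end in NaN rows, so every file still occupies exactly one column of `M`.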
Just out of interest, why do you need that complex string parsing and thousands-separator handling? The sample file does not include any thousands separators that I can see:
T = readtable('m0000000.txt','VariableNamingRule','preserve')
T = 465x17 table

C1(xm)  C1(ym)  C1(zm)  C2(xm)  C2(ym)  C2(zm)  P1(xm)  P1(ym)  P1(zm)  P2(xm)  P2(ym)  P2(zm)     Rho       I        U     D  Time
______  ______  ______  ______  ______  ______  ______  ______  ______  ______  ______  ______  ______  ______  _______  ____  ___________________
     0       0       0       1       0       0       2       0       0      26       0       0    -0.3  147.22   -44.88  7.63  01/21/2021 01:00:03
     0       0       0       1       0       0       3       0       0      27       0       0    0.84  146.98    123.9  2.76  01/21/2021 01:00:06
     0       0       0       1       0       0       4       0       0      28       0       0   -4.34  147.13  -638.77  0.39  01/21/2021 01:00:08
     0       0       0       1       0       0       5       0       0      29       0       0    1.68  147.17   246.72  1.32  01/21/2021 01:00:10
     0       0       0       1       0       0       2       0       0       3       0       0    0.03  147.17     4.28  5.18  01/21/2021 01:00:11
     0       0       0       1       0       0       3       0       0       4       0       0   -0.81  147.19  -119.69  0.41  01/21/2021 01:00:13
     0       0       0       1       0       0       4       0       0       5       0       0    1.22   147.2   179.94  0.52  01/21/2021 01:00:15
     0       0       0       1       0       0       2       0       0      27       0       0    0.87  147.19   128.65  2.57  01/21/2021 01:00:16
     0       0       0       1       0       0       3       0       0      28       0       0   -5.16  147.07  -758.41  0.39  01/21/2021 01:00:18
     0       0       0       1       0       0       4       0       0      29       0       0     2.9  147.15    427.2  0.58  01/21/2021 01:00:20
     0       0       0       1       0       0       2       0       0       6       0       0  -24.59  147.12  -3617.8   0.1  01/21/2021 01:00:21
     0       0       0       1       0       0       3       0       0       7       0       0  -24.44  146.97  -3592.2  0.11  01/21/2021 01:00:23
     0       0       0       1       0       0       4       0       0       8       0       0  -24.26  146.99  -3565.7  0.09  01/21/2021 01:00:25
     0       0       0       1       0       0       5       0       0       9       0       0  -24.91  146.98  -3661.2  0.11  01/21/2021 01:00:27
     0       0       0       1       0       0       2       0       0      30       0       0  -27.07  146.97  -3978.1  0.14  01/21/2021 01:00:28
     0       0       0       1       0       0       3       0       0      31       0       0  -22.67  146.99  -3331.9  0.17  01/21/2021 01:00:30
(first 16 of 465 rows shown)
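Following on from that point: if only Rho is needed, the import options can restrict readtable to that single variable, which avoids the custom parsing entirely. A minimal sketch, assuming R2020b or later (for 'VariableNamingRule'); it writes a small hypothetical tab-delimited file (the name rho_demo.txt is made up) to tempdir so that it runs on its own:

```matlab
% Hypothetical sample file standing in for one of the m0000xxx files.
F = fullfile(tempdir, 'rho_demo.txt');
fid = fopen(F, 'w');
fprintf(fid, 'C1(xm)\tRho\tI\n');
fprintf(fid, '0\t-0.3\t147.22\n');
fprintf(fid, '0\t0.84\t146.98\n');
fclose(fid);

% Detect the file layout, then keep only the Rho variable.
opts = detectImportOptions(F, 'VariableNamingRule', 'preserve');
opts.SelectedVariableNames = {'Rho'};
T = readtable(F, opts);   % table with a single variable, Rho
Rho = T.Rho               % numeric column vector
delete(F)                 % remove the temporary file
```

In a loop over the 400 files, `opts` could be created once from the first file and reused for the rest, assuming all files share the same column layout.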

More Answers (1)

ANKUR KUMAR on 10 Mar 2021
Edited: ANKUR KUMAR on 10 Mar 2021
Since you have not attached a text file, I created a sample text file with multiple rows in each of several folders, and used the code below to build a matrix containing the first column from every file.
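files = dir('D:\matlab_ask');
dirFlags = [files.isdir] & ~ismember({files.name}, {'.','..'}); % real subfolders only, skip . and ..
subFolders = files(dirFlags);
merged_matrix = [];
for i = 1:length(subFolders)
filename = fullfile(subFolders(i).folder, subFolders(i).name, 'test.txt'); % full path, so no cd is needed
fileID = fopen(filename,'r');
dataArray = textscan(fileID, '%f%*s%[^\n\r]', 'Delimiter', ',');
fclose(fileID); % close each file after reading it
merged_matrix(:,i) = dataArray{1}; % first column of this file
end
merged_matrix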
ANKUR KUMAR on 12 Mar 2021
I am still at the learning stage. Thanks for your suggestions. I appreciate it.
