Selective columns from multiple text folders
Afshin sadeghi
on 10 Mar 2021
Commented: Afshin sadeghi
on 15 Mar 2021
Hey guys,
I would really appreciate your help here.
I have a dataset of 400 folders, each with one text file inside. Each text file has 13 columns and I want just one of them. In the end, I want a single text file with 400 columns. So far I have it working for one folder (with the help of the importer), but I don't know how to implement the loop. The folders and the text files (which share the same name) are numbered in this order:
m0000000
m0000001
m0000002
...
m0000399
Here is the code so far ...
filename = 'D:\work\1ST EXP\4.2\m0000000\m0000000.TXT';
delimiter = '\t';
formatSpec = '%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
fclose(fileID);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
rawData = dataArray{1};
for row=1:size(rawData, 1)
    regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
    try
        result = regexp(rawData{row}, regexstr, 'names');
        numbers = result.numbers;
        invalidThousandsSeparator = false;
        if any(numbers==',')
            thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
            if isempty(regexp(numbers, thousandsRegExp, 'once'))
                numbers = NaN;
                invalidThousandsSeparator = true;
            end
        end
        if ~invalidThousandsSeparator
            numbers = textscan(strrep(numbers, ',', ''), '%f');
            numericData(row, 1) = numbers{1};
            raw{row, 1} = numbers{1};
        end
    catch me
    end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Allocate imported array to column variable names
Rho = cell2mat(raw(:, 1));
%clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
diary merged.txt
Rho
Accepted Answer
Stephen23
on 11 Mar 2021
Edited: Stephen23
on 11 Mar 2021
Here is one simple and efficient approach (untested, but should get you started):
P = 'D:\work\1ST EXP\4.2';
V = 0:399;
N = numel(V);
C = cell(1,N);
for k = 1:N
    F = sprintf('m%07d',V(k));
    F = fullfile(P,F,sprintf('%s.txt',F));
    T = readtable(F,'VariableNamingRule','preserve');
    C{k} = T.Rho;
end
M = [C{:}] % only if all files have the same number of rows.
You can then save matrix M, e.g.:
writematrix(M,'myfile.txt')
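If the files do not all have the same number of rows, the concatenation above will fail. One possible workaround is to pad the shorter columns with NaN. This is only a sketch: the three vectors in C are made-up stand-ins for the T.Rho columns collected in the loop.

```matlab
% Made-up stand-ins for the columns collected in C (unequal lengths on purpose)
C = {[1;2;3], [4;5], [6;7;8;9]};

nRows = max(cellfun(@numel, C));   % length of the longest column
M = NaN(nRows, numel(C));          % preallocate, so unused cells stay NaN
for k = 1:numel(C)
    M(1:numel(C{k}), k) = C{k};    % copy each column into the top rows
end
```

Whether NaN padding is acceptable depends on what reads the merged file afterwards.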
Just out of interest, why do you need that complex string parsing and thousands-separator handling? The sample file does not include any thousands separators that I can see:
T = readtable('m0000000.txt','VariableNamingRule','preserve')
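Since only one column is wanted, import options can restrict the read to that column. The following is a minimal sketch, not a tested solution for the real data: it assumes a tab-delimited file with a header row containing Rho, a release in which detectImportOptions accepts 'VariableNamingRule', and it writes its own made-up sample file to a temporary folder so that it runs on its own.

```matlab
% Write a small tab-delimited sample file (made-up names and values)
F = fullfile(tempdir, 'm_sample.txt');
fid = fopen(F, 'w');
fprintf(fid, 'Depth\tRho\tPhase\n');
fprintf(fid, '%g\t%g\t%g\n', [1 10.5 0.1; 2 11.5 0.2; 3 12.5 0.3].');
fclose(fid);

% Detect the file layout, then ask readtable for the Rho column only
opts = detectImportOptions(F, 'Delimiter', '\t', 'VariableNamingRule', 'preserve');
opts.SelectedVariableNames = {'Rho'};
T = readtable(F, opts);
rho = T.Rho;

delete(F);   % remove the temporary sample file
```

Inside a loop over many files with identical layout, the options could be detected once from the first file and reused for the rest.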
More Answers (1)
ANKUR KUMAR
on 10 Mar 2021
Edited: ANKUR KUMAR
on 10 Mar 2021
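Since you have not attached a text file, I put a sample text file with multiple rows into each of several folders. The code below builds a matrix containing the first column from every file.
root = 'D:\matlab_ask';
files = dir(root);
subFolders = files([files.isdir]);
% Drop the '.' and '..' entries by name rather than assuming they come first
subFolders = subFolders(~ismember({subFolders.name}, {'.','..'}));
for i = 1:numel(subFolders)
    % Build the full path instead of changing the current folder with cd
    filename = fullfile(root, subFolders(i).name, 'test.txt');
    fileID = fopen(filename,'r');
    dataArray = textscan(fileID, '%f%*s%[^\n\r]', 'Delimiter', ',');
    fclose(fileID); % close each file once it has been read
    % Assumes every file has the same number of rows
    merged_matrix(:,i) = [dataArray{1:end-1}];
end
merged_matrix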
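When the folder names follow a known pattern, a wildcard passed to dir can list just those folders, so the '.' and '..' entries never appear. This is a rough sketch under assumptions: the folder tree is a throwaway one created in a temporary location, with made-up names that follow the m0000000 pattern from the question.

```matlab
% Build a throwaway folder tree (made-up names following the m%07d pattern)
P = tempname;
mkdir(P);
for v = 0:2
    mkdir(fullfile(P, sprintf('m%07d', v)));
end

S = dir(fullfile(P, 'm*'));   % only entries whose names start with 'm'
S = S([S.isdir]);             % keep folders, discard any plain files
names = sort({S.name});       % zero-padded names sort in numeric order
for k = 1:numel(names)
    F = fullfile(P, names{k}, [names{k} '.txt']);   % file shares the folder name
    fprintf('%s\n', F);
end

rmdir(P, 's');   % clean up the throwaway tree
```

Discovering the folders this way also copes with gaps in the numbering, which a hard-coded index range would not.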
2 Comments
ANKUR KUMAR
on 12 Mar 2021
I am still at learning stage. Thanks for your suggestions. I appreciate it.