Auto Detect different file types?

Stuart Nezlek on 21 Jan 2022
Edited: Stuart Nezlek on 2 Feb 2022
Hello,
I am trying to edit a program so that it can automatically detect different types of text files. Currently, I am using two different programs to open and report the separate text files, using the following bits of code:
Program 1:
filespec=[fpath char(fnameALL(2))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDdata, ~]= readtext(filespec, delimiter, comment, quotes, options);
pVelocity = TDdata(6,62)*100; pCadence = (TDdata(6,21)+TDdata(6,48))/2;
pStride = TDdata(6,65)*100; pStepWidth = TDdata(6,68)*100;
pGSR=(pCadence/60)/(pVelocity/100);
pRTO = (TDdata(6,39)/TDdata(6,33))*100; pLTO = (TDdata(6,12)/TDdata(6,9))*100;
pRSS = (TDdata(6,30)/TDdata(6,33))*100; pLSS = (TDdata(6,57)/TDdata(6,9))*100;
pRSTEP = TDdata(6,42)*100; pLSTEP = TDdata(6,15)*100;
pROTO = (TDdata(6,36)/TDdata(6,33))*100; pLOTO = (TDdata(6,60)/TDdata(6,9))*100;
ToeOff = [pRTO pLTO];
filespec=[fpath char(fnameALL(3))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDalldata, result]= readtext(filespec, delimiter, comment, quotes, options);
Num_trialstd=(length(TDalldata(1,:))-1)/68;
Program 2:
[fname fpath]=uigetfile('*.txt','Please select the _td file');
conditionid=input('Enter the condition (no spaces): ','s');
cd(fpath);
[dataALL,results]=readtext(fname,';','','','numeric');
[row, col]=find(dataALL(:,3)>0);
data=dataALL(row:length(dataALL),:);
What I am wondering is whether there is a function I am unaware of that can automatically distinguish between the different text files.
If what I'm asking is unclear, I can provide clarification.
Thank you.
  2 Comments
Stuart Nezlek on 24 Jan 2022
In theory, yes, that's what I am attempting to do. I've never used try/catch in MATLAB before, so I am wondering: if I try to open file 1 with the first read, would it automatically try to read file 2 in the same manner? I've attached the two different text files I'm working with as an example.
My idea so far was to use the different endings of the files to differentiate the two and do this for however many files I specify via input. Does that make sense?
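For the file-ending idea, here is a minimal sketch of how the ending of the name could select the delimiter before readtext() is called at all. The pairing of endings with delimiters is an assumption based on the two example file names (_td.txt read with a tab, _ss.txt read with a semicolon); notes.txt is a made-up name included only to show the no-match case, and file_rules, file_type, and file_delim are illustrative names.

```matlab
% Sketch: choose a delimiter per file from the file-name ending.
% The patterns and their delimiters are assumptions, not a confirmed convention.
my_files = {'111111111d_test_HT_td.txt', '111111111e_ss.txt', 'notes.txt'};

% Column 1: regular expression for the ending, column 2: delimiter for readtext()
file_rules = {'_td\.txt$', '[\t]'; ...
              '_ss\.txt$', ';'};

file_type  = zeros(1, numel(my_files));      % 0 means "no rule matched"
file_delim = cell(1, numel(my_files));
for i = 1:numel(my_files)
    for r = 1:size(file_rules, 1)
        if ~isempty(regexp(my_files{i}, file_rules{r, 1}, 'once'))
            file_type(i)  = r;
            file_delim{i} = file_rules{r, 2};
            break
        end
    end
    if file_type(i) == 0
        fprintf('%s: unknown ending, skipping\n', my_files{i});
    else
        fprintf('%s: type %d, delimiter ''%s''\n', my_files{i}, file_type(i), file_delim{i});
    end
end
```

file_delim{i} could then be passed straight to readtext(). A file whose name does not follow the naming convention would still need a check based on its contents.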


Accepted Answer

Voss on 24 Jan 2022
Since the only difference between the way the two file types are read with readtext() is the delimiter, you can try different delimiters until you find one that works. With those two files you posted, I found that readtext() returns all NaNs if you use the wrong delimiter, so I'm using that as the condition that determines whether the file was read correctly or not. (If you have any file that returns all NaNs which needs to be considered valid, then you'd have to use a different condition.)
The following code loops over a set of files and for each file tries readtext() with each different delimiter (in this case just '[\t]' and ';' but the code will work for any number of delimiters) until one gives something that's not all NaNs. Then, for the next file, the delimiter that worked is tried first.
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:numel(my_files)
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
        continue
    end
    if delimiter_idx == 1
        % do file type 1 stuff
    else
        % do file type 2 stuff
    end
end
preparing to read file 111111111d_test_HT_td.txt:
    trying readtext() with delimiter '[\t]' ...
    success
preparing to read file 111111111e_ss.txt:
    trying readtext() with delimiter '[\t]' ...
    failed
    trying readtext() with delimiter ';' ...
    success
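As an alternative to trial reads, the delimiter can sometimes be guessed from the raw text. The following is a minimal sketch and not part of the code above: it counts tabs and semicolons in the first non-empty line and picks whichever occurs more often. The demo file contents and the variable names are made up for illustration. The guess can be wrong if a header line happens to contain the other character, so the all-NaN check is still useful as a confirmation.

```matlab
% Sketch: guess the delimiter by counting candidate characters in the first
% non-empty line. This inspects the raw text only and does not call readtext().
candidates = {sprintf('\t'), ';'};           % tab and semicolon, as in this thread

% Write a small semicolon-separated demo file so the sketch runs on its own
% (the contents are made up for illustration).
demo_file = [tempname '.txt'];
fid = fopen(demo_file, 'w');
fprintf(fid, '1;2;3\n4;5;6\n');
fclose(fid);

% Read lines until one is not blank or the end of the file is reached
fid = fopen(demo_file, 'r');
first_line = '';
while isempty(strtrim(first_line))
    next_line = fgetl(fid);
    if ~ischar(next_line)                    % fgetl returns -1 at end of file
        break
    end
    first_line = next_line;
end
fclose(fid);

% Count how often each candidate appears and keep the most frequent one
counts = cellfun(@(c) sum(first_line == c), candidates);
[n_hits, guess_idx] = max(counts);
if n_hits == 0
    guess_idx = 0;                           % no candidate found on that line
end
fprintf('guessed delimiter index: %d\n', guess_idx);
delete(demo_file);
```

A nonzero guess_idx could be used as the starting value of delimiter_idx in the loop above, so that the most likely delimiter is tried first.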
  6 Comments
Voss on 1 Feb 2022
When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?
If that's the case, and now you want to know how to figure out from the contents of each cell whether it was a type 1 or type 2 file, then you'd have to be able to distinguish between the two file types based on what comes from readtext() for each file type. readtext() returns a matrix, so you'd have to know something about the size of possible matrices returned by readtext() in each case or the possible locations of the NaNs in the matrix, etc. I have no idea about the range of possibilities for what those files could contain, so I wouldn't be able to put any conditions on the matrices from readtext() in order to distinguish one type from another. But you may know more about what the possibilities are for those file types and hence what the matrices from readtext() should look like, so you may be able to come up with some condition to distinguish the two types.
However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above:
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
N = numel(my_files);
file_type = zeros(1,N);
file_data = cell(1,N);
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:N
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        file_type(i) = delimiter_idx;
        file_data{i} = data;
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
    end
end
Then you could run through your subsequent operations with the data from the files like this:
for i = 1:N
    if file_type(i) == 1
        % do file type 1 stuff with file_data{i}
    elseif file_type(i) == 2
        % do file type 2 stuff with file_data{i}
    end
end
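If more file types are added later, the if/elseif chain could be swapped for a cell array of function handles indexed by file_type. This is only a sketch under made-up inputs: the two handlers are placeholders that just return a matrix dimension, the stand-in matrices are invented, and handlers and results are illustrative names. It also makes the file_type == 0 case (a file that could not be read) explicit.

```matlab
% Sketch: dispatch on file_type with a cell array of function handles.
% The handlers are placeholders; real ones would hold the type-specific work.
file_type = [1 2 0];                         % 0 = file could not be read
file_data = {magic(3), ones(2, 4), []};      % made-up stand-in matrices

handlers = {@(d) size(d, 1), ...             % placeholder for file type 1 stuff
            @(d) size(d, 2)};                % placeholder for file type 2 stuff

results = cell(1, numel(file_type));
for i = 1:numel(file_type)
    t = file_type(i);
    if t >= 1 && t <= numel(handlers)
        results{i} = handlers{t}(file_data{i});
    else
        fprintf('file %d was not read, skipping\n', i);
    end
end
```

Adding a third file type would then mean appending one delimiter and one handler, without touching the loop.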
I'm not sure if that answers your question. If not, let me know.


More Answers (1)

Stuart Nezlek on 2 Feb 2022
Edited: Stuart Nezlek on 2 Feb 2022
"When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?"
  • Correct. I've changed it so that I am given back a 1x(number of files) cell array, covering the files read in with the two different types of delimiters.
"However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above"
  • I've implemented this small change, and after reading your example I see where I was going wrong! I kept getting the same error that the cell contents couldn't be read, but that was because I was reading the files without storing the data anywhere afterwards, so the data was being overwritten on every pass through the cell array. By assigning variables (as your example did), I have now been able to define the different data file types and can continue to work with the data.
Again, thank you for pointing out my simple errors! You have helped me a lot.
