Searching through files for missing data
Hi,
I have a set of 8000 files in the format of
YYYYMMDDHHMMSS
year/month/day/hour/minute/second
The timestamps should increase by 5 minutes from one file to the next. I need to write a function that checks that the file names follow this pattern and identifies any files that are missing.
The functions strcmp and join have been recommended to me.
Does anyone know how to do this?
Accepted Answer
KSSV
on 4 Feb 2021
You may follow something like this:
files = dir('*.txt') ; % give your extension
% Create a datetime vector for the files present with the names mentioned
[~,N,~] = cellfun(@fileparts,{files.name},'UniformOutput',0) ;
t = datetime(N,'InputFormat','yyyyMMddHHmmss') ; % datetime for the files present ('ss' is seconds; 'SS' would be fractional seconds)
%% Create 5 mins possible datetime vector
file1 = files(1).name ;
[~, name1, ~] = fileparts(file1) ;
t0 = datetime(name1,'InputFormat','yyyyMMddHHmmss') ;
file2 = files(end).name ;
[~, name2, ~] = fileparts(file2) ;
t1 = datetime(name2,'InputFormat','yyyyMMddHHmmss') ;
% Make datetime array
tExpected = t0:minutes(5):t1 ; % this is used for comparison
% Get the indices which are present
idx = ismember(tExpected,t) ;
% Dates which do not exist
tExpected(~idx)
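The same comparison can be tried without any files on disk. The following is only a sketch: the four names are made-up examples, assumed to be already stripped of path and extension, and the variable names (expected, missing, offGrid) are illustrative. It also flags names that do not fall on the 5-minute grid, which covers the "named in a logical way" part of the question.

```matlab
% Made-up example names (14-digit timestamps, extension already removed)
names = {'20200910040000','20200910040500','20200910041500','20200910042000'};

% Timestamps that are present
t = datetime(names,'InputFormat','yyyyMMddHHmmss');

% Every 5-minute slot between the earliest and latest name
expected = min(t):minutes(5):max(t);

% Slots with no matching file name
missing = expected(~ismember(expected,t));

% Names that do not sit on the 5-minute grid
offGrid = t(~ismember(t,expected));

disp(missing)
```

With these example names the only missing slot is 04:10:00 on 10 Sep 2020, and offGrid is empty.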
7 Comments
Adam Danz
on 5 Feb 2021
That's because this solution is missing step 2 in my modified list of 4 clear steps, as I also mentioned in a comment under this answer.
More Answers (1)
Adam Danz
on 4 Feb 2021
>Does anyone know how to do this?
Lots of people know how to do this and we're here to help but few people will devote a portion of their day to do it for you.
Let's start by figuring out where you're stuck. There are just a few basic steps in your process and you can find lots of information in this forum, on the web, and in the documentation for each step.
- Get a list of files. See dir()
- Read in the file. There are lots of ways to read files depending on the file type and content.
- Are your time stamps in datetime format? If not convert them to datetime.
- If all you want to do is check whether a file is missing, you just need to store 3 data points for each file in 2 separate variables inside your loop: the first and last datetime values, stored in an nx2 matrix for n files, and the filename, stored in an nx1 string array.
- Once all files are read and the 3 data points are stored for each file, you can sort the datetime values in case the files are read out of order and then compare the first datetime of file n with the last datetime from file n-1. If that difference is more than 5 minutes, you know you're missing a file and you can use the filename array to help identify which file is missing.
If you get stuck on any step leave a comment below and show us where you're at with the code and what the problem is.
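As a rough sketch of the last two steps, assuming the first and last timestamps of each file have already been read: the file names and times below are invented for illustration, and firstLast, fileNames and afterIdx are hypothetical variable names.

```matlab
% Invented per-file data: file name plus first and last timestamp in each file
fileNames = ["c.hdf"; "a.hdf"; "b.hdf"];
firstLast = [datetime(2020,9,10,4,20,0), datetime(2020,9,10,4,24,0);
             datetime(2020,9,10,4, 0,0), datetime(2020,9,10,4, 4,0);
             datetime(2020,9,10,4, 5,0), datetime(2020,9,10,4, 9,0)];

% Sort by each file's first timestamp in case the files were read out of order
[~, order] = sort(firstLast(:,1));
firstLast  = firstLast(order,:);
fileNames  = fileNames(order);

% First timestamp of file n minus last timestamp of file n-1
gap = firstLast(2:end,1) - firstLast(1:end-1,2);

% A gap of more than 5 minutes suggests a missing file after fileNames(afterIdx)
afterIdx = find(gap > minutes(5));
for k = afterIdx'
    fprintf('Possible missing file between %s and %s\n', fileNames(k), fileNames(k+1));
end
```

The filename array is what turns a bare gap into something actionable: it tells you which two files the missing one should sit between.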
9 Comments
Adam Danz
on 11 Feb 2021
Looks like you're making progress.
1. In this line, you could add the file extension if it's the same for all files. That should list all of the files you need, assuming they are all in the same folder.
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
files = dir('C:\Users\drb17135\Documents\August_Radar\*.hdf') % change to this
2. This is where things go wrong. "tmp" should be the "files" variable above. You don't need this line. Replace tmp with "files".
tmp=dir;
3. Instead of these 3 lines,
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
file = myfile(5:5); %isolates the file in terms of 'yyyyMMddHHmmSS'
datetime(file,'InputFormat','yyyyMMddHHmmSS'); %gives name in format of '10-Sep-2020 04:00:00' - for example
Use these two, based on example: 'T_PAAH72_C_EIDB_19991231134501.hdf' (YYYYMMDDHHmmSS)
% >>files(idx).name
% ans =
% 'T_PAAH72_C_EIDB_19991231134501.hdf'
[~, timestamp] = regexp(files(idx).name, '([0-9]+)\.hdf','match','once','tokens'); % '\.' matches a literal dot
% Which returns
% timestamp =
% {'19991231134501'}
timestampDT = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss')
% Which returns
% timestampDT =
% datetime
% 31-Dec-1999 13:45:01
4. Instead of assuming you have 8000 files, use the actual number of files identified to define the loop.
for idx = [1:8000] % Not this
for idx = 1:numel(files) % use this, where "files" is defined in my step #1 above.
5. Lastly, you need to store the datetime stamps in the loop so the loop should be structured like this (using variable names above).
timestampDT = NaT(numel(files),1); % preallocate the loop variable
for idx = 1:numel(files)
% < PUT YOUR OTHER STUFF HERE >
% store all datetimes from the file names
timestampDT(idx) = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss');
end
Then you can take the difference between consecutive timestamps:
dt = diff(timestampDT)
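One way to turn those differences into a list of missing timestamps is sketched below. The timestampDT values here are synthetic stand-ins for the vector built in the loop, and gapIdx and missing are illustrative names.

```matlab
% Synthetic timestamps standing in for the vector built in the loop
timestampDT = datetime(2020,9,10,4,[0 5 20 25],0)';

% Sort in case the files were listed out of order
timestampDT = sort(timestampDT);

% Durations between consecutive file names
dt = diff(timestampDT);

% A gap lies between element gapIdx and element gapIdx+1
gapIdx = find(dt > minutes(5));

for k = gapIdx'
    % Every 5-minute slot strictly between the two neighbouring files
    missing = (timestampDT(k)+minutes(5) : minutes(5) : timestampDT(k+1)-minutes(5))';
    disp(missing)
end
```

Differences shorter than 5 minutes (dt < minutes(5)) would point to duplicate or off-schedule names rather than missing files.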