Sorting problem with months in file names, while taking average

When I read the file names from a selected directory, they come back in alphabetical order rather than date order.
I tried:
folder_path=uigetdir;
cd(folder_path)
flist=dir('*.dat');
flist.name gives
A01-Apr-2014.dat
A01-Apr-2015.dat
A01-Apr-2016.dat
A01-Aug-2014.dat
A01-Aug-2015.dat
A01-Aug-2016.dat
A01-Dec-2014.dat
A01-Dec-2015.dat
A01-Dec-2016.dat
....
Is there a way in MATLAB to read and access the file names in the actual month order?
E.g:
A01-Jan-2014.dat
A01-Feb-2014.dat
A01-Mar-2014.dat
A01-Apr-2014.dat
...
A01-Jan-2015.dat
A01-Feb-2015.dat
A01-Mar-2015.dat
....
With the names in that order, taking hourly averages for each month and writing all of the averages to a single file would be straightforward.
Thank you.

 Accepted Answer

Two solutions:
1) Use ISO 8601 dates (yyyy-mm-dd) in the filenames; they can then be sorted using a trivial character sort.
2) Convert those dates to date numbers, sort them, and then apply those indices to the filenames. Like this:
C = {flist.name};
V = datenum(C,'Add-mmm-yyyy.dat');
[~,idx] = sort(V); % sort the date numbers.
C = C(idx); %sort the filenames into date order.
for k = 1:numel(C)
C{k} % filename
...
end
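To illustrate option 1, here is a minimal sketch of how names like the ones in the question could be mapped to ISO 8601 style names, so that a plain sort gives date order. The sample names and the new pattern 'Ayyyy-mm-dd.dat' are assumptions for illustration only; actually renaming the files on disk (for example with movefile) is left out.

```matlab
% Sample names in the same pattern as the question (illustrative only).
C = {'A01-Aug-2014.dat','A01-Apr-2015.dat','A01-Jan-2014.dat','A01-Apr-2014.dat'};
% Pull out the dd-mmm-yyyy part of each name.
D = regexp(C,'\d{2}-[A-Za-z]{3}-\d{4}','match','once');
% Convert to ISO 8601 (yyyy-mm-dd) date strings.
iso = cellstr(datestr(datenum(D,'dd-mmm-yyyy'),'yyyy-mm-dd'));
% Build hypothetical new names: leading 'A', ISO date, original extension.
newNames = strcat('A',iso(:).','.dat');
% A plain character sort now gives chronological order.
sortedNames = sort(newNames);
```

With names in this form, the order returned by dir should already be chronological, so no extra sorting step would be needed.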

13 Comments

Hi Stephen,
Thank you for your response.
However, I get the following error:
Error using datenum (line 178)
DATENUM failed.
Error in hourly_mean (line 9)
V = datenum(C,'dd-mmm-yyyy.dat');
Caused by:
Error using dtstr2dtnummx
Failed on converting date string to date number.
@Venkata: look at the date format string that you used in your code. Now look at the date format string that I used in my answer: are they the same? You need to include the literal characters (e.g. like the leading 'A' at the start, like I used).
Another option would be to extract the date substrings, e.g. using a regular expression:
D = regexpi(C,'\d{2}-[A-Z]{3}-\d{4}','match','once');
V = datenum(D,'dd-mmm-yyyy');
@Stephen: I am sorry, I did use the exact lines you suggested. While experimenting I removed the 'A', and I pasted that modified version by mistake.
However, neither version worked.
Error using datenum (line 106)
The input to DATENUM was not an array of strings.
@Venkata: that is a completely different error. Clearly your input to datenum is not suitable. You will have to show the code that you used, because it is difficult to guess what bug/s you have in your code.
What version of MATLAB are you using?
@Stephen: Thank you for your response.
I basically want to take hourly averages for a single day, do the same for a couple of years of data, and finally write all of those averages to a single file.
Given the format of my input file names, the order returned by the 'dir' command is not helpful in my case.
I listed the commands I am using in the question.
The value of length(flist) tells me how many files the loop has to run over. But the output will not be in a meaningful order if April is processed first and October last, with the averages written to a single file sequentially.
I am stuck at this starting point, before even getting to the full code.
This is where I need help.
This is what the data in a file looks like (12 columns):
01 04 2014 00 00 00 41.8603336 -57.2303136 6853977.84 9357.64375169823 -10173.3276724836 -28518.7725948112
01 04 2014 00 00 01 41.866182 -57.16687 6853976.07 9358.75585824525 -10166.9937661957 -28500.127702996
01 04 2014 00 00 02 41.8719959 -57.1034255 6853974.29 9359.91156592362 -10160.687210838 -28481.5237440649
01 04 2014 00 00 03 41.8777758 -57.0399801 6853972.51 9361.00012212707 -10154.3361307134 -28462.929827532
01 04 2014 00 00 04 41.8835217 -56.9765337 6853970.71 9362.13378864187 -10147.6519871217 -28444.4854711224
I want to take hourly averages of columns 7 through 12.
The output file should look like:
01 01 2014 01 avg1 avg2 avg3 ...
01 01 2014 02 avg1 avg2 avg3 ...
... ...
01 02 2014 01 avg1 avg2 avg3 ...
01 02 2014 02 avg1 avg2 avg3 ...
... ...
When I use the current output of flist.name I will have:
01 04 2014 01 avg1 avg2 avg3 ...
01 04 2014 02 avg1 avg2 avg3 ...
... ...
01 08 2014 01 avg1 avg2 avg3 ...
01 08 2014 02 avg1 avg2 avg3 ...
... ...
I don't want this.
I am using version 2016a.
@Venkata: thank you for your explanation, which was interesting but did not actually give either of the two pieces of information that I asked for in my last comment.
Please:
  • upload this cell array in a .mat file: C = {flist.name}
  • tell us exactly what MATLAB version you are using.
@Stephen: Please find attached the .mat file (C.mat) with the file names.
I am using
>> version
ans =
9.0.0.341360 (R2016a)
Thank you.
@Venkata: both of the methods I have given you worked with your data:
>> load C.mat
>> D = regexpi(C,'\d{2}-[A-Z]{3}-\d{4}','match','once');
>> V = datenum(D,'dd-mmm-yyyy');
>> [~,idx] = sort(V);
or with just datenum and literal characters:
>> V = datenum(C,'Add-mmm-yyyy.dat');
>> [~,idx] = sort(V);
You then just need to use idx to sort the cell array C and/or the structure flist.
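As a minimal sketch of that last step, the same index vector can reorder both the cell array and the structure array. The structure below is a hand-made stand-in for the output of dir, with only the name field filled in, so the snippet runs without any files on disk.

```matlab
% Stand-in for the output of dir('*.dat') (only the name field is filled in).
flist = struct('name',{'A01-Apr-2014.dat','A01-Aug-2014.dat','A01-Jan-2014.dat','A01-Jan-2015.dat'});
C = {flist.name};
% Extract the date part and convert it to serial date numbers.
D = regexpi(C,'\d{2}-[A-Z]{3}-\d{4}','match','once');
V = datenum(D,'dd-mmm-yyyy');
[~,idx] = sort(V);
% Apply the same permutation to the names and to the structure array.
C = C(idx);
flist = flist(idx);
```

Sorting flist as well keeps the other fields that dir returns aligned with the reordered names.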
@Stephen: Yes, the first solution worked, but somehow the second one did not. Nevertheless, it doesn't matter; I have one working solution now.
Thank you so much for your time and concern.
One query:
To get all the filenames with a given extension I am using the following procedure.
folder_path=uigetdir;
cd(folder_path)
flist=dir('*.dat');
Is there a better way of doing it?
The input data and the code are in different directories. Every time I run the code I am asked to change the path.
Thank you.
"Is there a better way of doing it?"
Yes: avoid using cd, which slows down your code and makes debugging harder. It is more efficient and more robust to use absolute filenames:
D = uigetdir();
S = dir(fullfile(D,'*.dat'));
You will also have to use fullfile inside the loop where you read the files (using the sorted names in the cell array C, built from {S.name}):
for k = 1:numel(C)
F = fullfile(D,C{k});
...
end
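Inside such a loop, the hourly averages described earlier in the thread could be computed along these lines. This is only a sketch under assumptions: the matrix M stands in for the contents of one file (for example as read from F), with columns 1 to 4 holding day, month, year and hour, columns 5 and 6 holding minute and second, and columns 7 to 12 holding the values to average. The numbers are made up.

```matlab
% Made-up stand-in for one file: [day month year hour min sec v7 ... v12].
M = [1 4 2014 0  0 0  1  2  3  4  5  6;
     1 4 2014 0 30 0  3  4  5  6  7  8;
     1 4 2014 1  0 0 10 20 30 40 50 60;
     1 4 2014 1 15 0 20 30 40 50 60 70];
% One group per distinct (day, month, year, hour) combination,
% kept in order of first appearance.
[G,~,grp] = unique(M(:,1:4),'rows','stable');
% Average columns 7 to 12 within each group.
avg = zeros(size(G,1),6);
for c = 1:6
    avg(:,c) = accumarray(grp,M(:,6+c),[],@mean);
end
% One output row per hour: day, month, year, hour, then the six averages.
out = [G avg];
```

For a file covering a full day this gives a 24x10 matrix, which can be collected in a cell array, one cell per file, in the sorted file order.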
@Stephen: That really helped.
One final query regarding this task:
At the end, I will have a cell array of matrices like this:
array_data=
[24x10 double]
[24x10 double]
[24x10 double]
[24x10 double]
[24x10 double]
To write them onto a text file I am doing:
array_data=cell2mat(array_data)
dlmwrite('outfile.dat',array_data,'delimiter',' ','precision','%10.6f')
dlmwrite is adding decimal places to the integer columns too. I understand that this is what I asked the function to do, but can I avoid it and apply the precision only to the non-integer columns?
Thank you.
@Venkata: here are two solutions:
1) use format '%g', e.g. '%10.5g', or something similar.
2) write the file using fprintf and define your own format string.
Read the fprintf help to learn how to define the format string, both for dlmwrite's precision option and for fprintf itself.
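A minimal sketch of option 2, assuming each row holds four integer columns (day, month, year, hour) followed by six averages, as in the 24x10 matrices above. The data values and the output location are placeholders.

```matlab
% Placeholder data: [day month year hour avg1 ... avg6].
array_data = [1 4 2014 0 41.8603336 -57.2303136 6853977.84 9357.6437 -10173.3277 -28518.7726;
              1 4 2014 1 41.8661820 -57.1668700 6853976.07 9358.7559 -10166.9938 -28500.1277];
% Integer format for the first four columns, fixed-point for the remaining six.
fmt = [repmat('%02d ',1,2) '%4d %02d' repmat(' %.6f',1,6) '\n'];
outfile = fullfile(tempdir,'outfile.dat'); % placeholder location
fid = fopen(outfile,'w');
% fprintf walks through its data column by column, so pass the transpose
% to write the matrix one row per line.
fprintf(fid,fmt,array_data.');
fclose(fid);
```

The format string is reused for every row, so it only needs to describe one line of output. The %d conversions rely on the first four columns holding whole numbers.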
@Stephen: Thanks a ton for your help and time.
Sure, I did mark it as accepted answer and voted too :).
I make sure to properly acknowledge and respect the people who help me, especially in tough times.

Asked: on 16 Oct 2018
Commented: on 17 Oct 2018