Matlab slowing down while reading netcdf (in for loop)
This seems to be a problem that has cropped up once or twice on this forum, but I am not sure I saw a satisfactory answer. I am reading many netCDF files in a for loop using ncread. With each successive iteration, MATLAB slows down considerably and is crawling by iteration 4 or 5. Within each iteration, I am doing the following:
(i) Read four 3d matrices that are not even huge (122x122x62)
(ii) Read two 2d matrices (122x122)
(iii) A few calculations
(iv) No figures
And that's it! Any help is appreciated.
PS: I will try reading using netcdf.getVar instead of ncread to see if that makes a difference.
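For anyone curious, a minimal sketch of what the low-level route might look like (hypothetical: it assumes the same pop_histpath, start3d, count3d and stride3d variables used in the code further down this thread, and note that netcdf.getVar takes zero-based start indices, unlike ncread's one-based ones):

```matlab
% Sketch only: low-level netCDF interface instead of ncread.
% netcdf.getVar uses zero-based start indices, hence start3d-1.
ncid  = netcdf.open(pop_histpath, 'NOWRITE');   % open the file once
varid = netcdf.inqVarID(ncid, 'TEMP');
T     = netcdf.getVar(ncid, varid, start3d-1, count3d, stride3d);
netcdf.close(ncid);                             % release the file handle
```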
20 Comments
Adam Danz
on 11 Feb 2020
Do the files differ in any way?
Are the arrays (no such thing as a 3D matrix) all the same size and class?
Are you storing variables within the loop?
Are you opening files without closing them (i.e., fopen, which isn't required for ncread, but just checking)?
Can you point to some other threads in this forum on this topic ("This seems to be a problem that has cropped up once or twice on this forum") ?
Can you provide a minimal working example (attach a zip file with 4 or 5 netcdf files and the minimal code required to reproduce the problem)?
jessupj
on 11 Feb 2020
I've had this problem before too when reading local files. In addition to making sure that I was indexing the subsets correctly, closing the files when I was done reading them to memory solved it.
Stephen23
on 11 Feb 2020
oceanmod's "Answer" moved here:
Am pasting some of the code below (only relevant bits) but want to briefly respond to your questions first:
The files do not differ in any way, they are all output files from an ocean model at different timesteps.
The files have a ton of variables including scalar, 1d, 2d and 3d arrays. But I only need a small subset that I call within the code.
I am storing variables within the loop. More specifically, after reading in the fields, I have a nested FOR loop that does some calculations.
I am not using fopen.
I am attaching a profile report. Most of the time spent is for the ncread calls, it seems.
Here is an earlier thread, but it is old (2011) and the responses appear to suggest that upgrading MATLAB should solve the problem.
https://www.mathworks.com/matlabcentral/answers/13803-matlab-slowing-down-while-reading-netcdf
I am unable to attach any netcdf files as the output is massive, each output file is 16G.
A few lines about the code below. It is not self-contained but I have added comments that hopefully convey the essential parts of the code.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
files=dir(strcat(outdir,caseid,year,'*nc'));
nfiles=max(size(files));
% 1st FOR loop begins here------------------------------------------------------------------
for ifile=1:nfiles;
pop_histpath=strcat(outdir,files(ifile).name)
time=ncread(pop_histpath,'time');
ntimes=max(size(time));
% 2nd FOR loop begins here------------------------------------------------------------------
for tid=1:ntimes;
counter=(count_file-1)*ntimes+tid;
if (counter ==1)
tlon=ncread(pop_histpath,'TLONG',start2d,count2d,stride2d);
tlat=ncread(pop_histpath,'TLAT', start2d,count2d,stride2d);
zt= ncread(pop_histpath,'z_t');
time=ncread(pop_histpath,'time');
ntimes=max(size(time));
end
%---READ IN NETCDF FIELDS--------------------------------------------------------------
pv = ncread(pop_histpath,'PV', start3d,count3d,stride3d); % Units in 1/cm/s
T = ncread(pop_histpath,'TEMP', start3d,count3d,stride3d);
S = ncread(pop_histpath,'SALT', start3d,count3d,stride3d);
rho = ncread(pop_histpath,'PD', start3d,count3d,stride3d);
%-----FOR loop for calculating/storing derived variables-----------------------------------
for i=1:ilen;
for j=1:jlen;
[Tinv(i,j),kinv]=get_Tinv(squeeze(T(i,j,:)));
Hinv(i,j)=zt(kinv);
PVinv(i,j)=pv(kinv);
end
end
pos=find(Hinv<=40);
Tinv_le40=Tinv(pos);
pos=find(Hinv>40);
Tinv_gt40=Tinv(pos);
try
Tinv_maxup(counter)=max(Tinv_le40);
catch
Tinv_maxup(counter)=NaN;
end
try
Tinv_maxlo(counter)=max(Tinv_gt40);
catch
Tinv_maxlo(counter)=NaN;
end
clear pv T S rho Tinv Hinv PVinv Tinv_maxup Tinv_maxlo
end % for tid loop
end % ifile loop
Stephen23
on 11 Feb 2020
Edited: Stephen23
on 11 Feb 2020
@oceanmod: your code is badly aligned. Poorly aligned code is one way that users hide basic errors and bugs in their code. You should align your code consistently. I recommend using the MATLAB editor's default alignment settings (you can also use them to align existing code: select the code text, press ctrl+i).
jessupj
on 11 Feb 2020
Each of your 4 x NFILES x NTIMES ncread statements requires seeking to the right spot of a 16 GB file, even if the subset you want is small. Is your pop_histpath on your local disk or on a remote file server?
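One way to cut that cost down (a hypothetical restructuring; variable names follow the posted code) would be to open each file once per outer iteration, pull all four fields through the same handle, and close it explicitly before moving on:

```matlab
% Sketch: one open/close per file instead of one implicit open per ncread.
% netcdf.getVar takes zero-based starts, hence the start3d-1.
ncid = netcdf.open(pop_histpath, 'NOWRITE');
pv  = netcdf.getVar(ncid, netcdf.inqVarID(ncid,'PV'),   start3d-1, count3d, stride3d);
T   = netcdf.getVar(ncid, netcdf.inqVarID(ncid,'TEMP'), start3d-1, count3d, stride3d);
S   = netcdf.getVar(ncid, netcdf.inqVarID(ncid,'SALT'), start3d-1, count3d, stride3d);
rho = netcdf.getVar(ncid, netcdf.inqVarID(ncid,'PD'),   start3d-1, count3d, stride3d);
netcdf.close(ncid);   % make sure file handles cannot accumulate across iterations
```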
oceanmod
on 11 Feb 2020
Apologies for the indenting. In my editor, the indenting is a lot better but while copying and pasting, some of it got messed up.
The file that pop_histpath refers to is on the same machine as the code, not on a remote file server.
I should add that I realize repeatedly subsetting large netCDF files (e.g. climate model output) within MATLAB might not be a smart thing to do. There is dedicated software (like NCO) that can do various kinds of subsetting with simple command-line statements. The catch is that it works better if the data is organized on a regular grid. My output is not, because 'dx' and 'dy' (grid spacing in the x and y directions) themselves vary with longitude and latitude, if that makes sense. I am still trying to figure out a way to subset using NCO and then call only the subsetted files within MATLAB for whatever calculations I want to do. But until that happens, I will have to make do with MATLAB.
Adam Danz
on 11 Feb 2020
Edited: per isakson
on 31 May 2021
Here's the code properly aligned, otherwise it's just too difficult to read as Stephen pointed out.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
files  = dir(strcat(outdir,caseid,year,'*nc'));
nfiles = max(size(files));
% 1st FOR loop begins here -----------------------------------------------
for ifile = 1:nfiles
    pop_histpath = strcat(outdir,files(ifile).name);
    time   = ncread(pop_histpath,'time');
    ntimes = max(size(time));
    % 2nd FOR loop begins here -------------------------------------------
    for tid = 1:ntimes
        counter = (count_file-1)*ntimes + tid;
        if (counter == 1)
            tlon = ncread(pop_histpath,'TLONG',start2d,count2d,stride2d);
            tlat = ncread(pop_histpath,'TLAT', start2d,count2d,stride2d);
            zt   = ncread(pop_histpath,'z_t');
            time = ncread(pop_histpath,'time');
            ntimes = max(size(time));
        end
        %---READ IN NETCDF FIELDS-----------------------------------------
        pv  = ncread(pop_histpath,'PV',  start3d,count3d,stride3d); % Units in 1/cm/s
        T   = ncread(pop_histpath,'TEMP',start3d,count3d,stride3d);
        S   = ncread(pop_histpath,'SALT',start3d,count3d,stride3d);
        rho = ncread(pop_histpath,'PD',  start3d,count3d,stride3d);
        %-----FOR loop for calculating/storing derived variables----------
        for i = 1:ilen
            for j = 1:jlen
                [Tinv(i,j),kinv] = get_Tinv(squeeze(T(i,j,:)));
                Hinv(i,j)  = zt(kinv);
                PVinv(i,j) = pv(kinv);
            end
        end
        pos = find(Hinv<=40);
        Tinv_le40 = Tinv(pos);
        pos = find(Hinv>40);
        Tinv_gt40 = Tinv(pos);
        try
            Tinv_maxup(counter) = max(Tinv_le40);
        catch
            Tinv_maxup(counter) = NaN;
        end
        try
            Tinv_maxlo(counter) = max(Tinv_gt40);
        catch
            Tinv_maxlo(counter) = NaN;
        end
        clear pv T S rho Tinv Hinv PVinv Tinv_maxup Tinv_maxlo
    end % for tid loop
end % ifile loop
Adam Danz
on 11 Feb 2020
Edited: Adam Danz
on 11 Feb 2020
What are some typical values of ilen and jlen? I'm wondering if the Tinv, Hinv, PVinv arrays are growing very large. Also, I'm wondering if your full code is preallocating those arrays which would likely speed up the processing.
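As a sketch of what that preallocation might look like (assuming ilen and jlen are defined before the nested loops, as in the posted code):

```matlab
% Hypothetical preallocation: allocate once per timestep before the nested
% loops so Tinv, Hinv, PVinv are not grown element by element.
Tinv  = NaN(ilen, jlen);
Hinv  = NaN(ilen, jlen);
PVinv = NaN(ilen, jlen);
for i = 1:ilen
    for j = 1:jlen
        [Tinv(i,j),kinv] = get_Tinv(squeeze(T(i,j,:)));
        Hinv(i,j)  = zt(kinv);
        PVinv(i,j) = pv(kinv);
    end
end
```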
Have you tried stepping through the code in debug mode iteration by iteration (or even line by line) to get a sense which lines/sections are becoming slow?
oceanmod
on 11 Feb 2020
"Have you tried stepping through the code in debug mode iteration by iteration (or even line by line) to get a sense which lines/sections are becoming slow?"
Based on the profile output I attached (as a screenshot) in an earlier response, the ncread calls are taking up the most time.
jessupj
on 11 Feb 2020
Edited: jessupj
on 11 Feb 2020
Pre-allocating the arrays is almost always a good idea. You could also set 'X = 0.*X' instead of clearing the arrays, to keep their memory allocated. I would put a tic/toc around, e.g., the vorticity ncread statement and see how much time it takes to read from the 16 GB file, and how much it slows during each iteration. NCL definitely works better with huge files. In similar applications with smaller ~8 GB files, I used NCL (within a MATLAB script, via system calls) to write smaller temporary netCDF files with ONLY the variables I needed as an intermediate step; it made spatiotemporal subsetting within MATLAB much faster.
Adam Danz
on 11 Feb 2020
Edited: Adam Danz
on 11 Feb 2020
That's not so big but you should definitely be pre-allocating those arrays.
If I were you I'd run the program in debug mode and step through sections until it starts to get slow. Then I'd step through single lines until I find the bottleneck. This is a deeper look than what the profiler report gives you. Unfortunately I can't debug this remotely.
oceanmod
on 11 Feb 2020
"If I were you I'd run the program in debug mode and step through section until it starts to get slow. Then I'd step through single lines until I find the bottleneck. Unfortunately I cann't debug this remotely. "
It appears the bottleneck is the ncread calls. I will try pre-allocating the arrays like some of you have suggested.
Adam Danz
on 11 Feb 2020
Edited: Adam Danz
on 11 Feb 2020
"It appears the bottleneck is the ncread calls."
Yes, I see that from the profiler report you shared, which was helpful. You mentioned that the program gets slow after reading 4-5 files. If the files contain the same amount of data, it doesn't make sense that ncread would take noticeably more time on some files than on others. That's why I'm suggesting that something else is slowing things down. It could be memory resources, for example.
If you find that a certain file is slow, say file number 5, try running only that file and see if it is just as slow relative to the other files when it is the first file you run. If it is, I'd doubt that the files really are the same size, etc. If it isn't as slow when it's the first file you run, that may point to memory resources, for example.
jessupj had a good idea to put a tic/toc at the top/bottom of the loop to measure the processing time between files.
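A minimal version of that timing idea (hypothetical; only the timing lines are new, and the loop body is elided here):

```matlab
% Hypothetical timing harness: record per-file wall time to see
% whether later files really take longer to process.
readTime = zeros(nfiles, 1);        % one timing slot per file
for ifile = 1:nfiles
    t0 = tic;                       % start the clock for this file
    % ... ncread calls and processing for file ifile go here ...
    readTime(ifile) = toc(t0);      % elapsed seconds for this file
end
plot(readTime);  xlabel('file #');  ylabel('seconds');
```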
oceanmod
on 11 Feb 2020
Thanks for the detailed response!
In addition to pre-allocating, I will also add tic/toc to check for file-specific differences.
Dardag
on 19 May 2020
Hi there,
Were you able to find a solution to this problem?
I am also trying to read 3D arrays (the netCDF files are around 20 GB each) and the code slows down eventually. I am also using the start, count and stride options (although I have read claims that the stride option might be slowing the code down).
The odd part is, if I restart the code after, say, 20 steps (files), those 20 files are read super fast. That made me think of some sort of buffer issue, but I wasn't able to resolve or address it. I am already preallocating arrays.
Suggestions on the following link (clear mex, clear functions, close all force) didn't help:
https://www.mathworks.com/matlabcentral/answers/13803-matlab-slowing-down-while-reading-netcdf
oceanmod
on 19 May 2020
Hi Dardag,
In my case I ended up figuring out how to use NCO to slice the big file into smaller chunks that cover only the region I am interested in (e.g., the Pacific Ocean). I had wrongly assumed that NCO could not extract smaller chunks when the data is on an irregular horizontal grid with varying grid spacing, but that is not true. So I never found the reason why the code was crawling when it read in the entire (global) dataset.
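For reference, the kind of ncks call involved looks roughly like this (the file names, dimension names and index ranges here are placeholders, not taken from the thread):

```sh
# Hypothetical hyperslab extraction with NCO's ncks:
# keep only an index box in the horizontal dimensions.
ncks -d nlon,1000,1400 -d nlat,600,900 big_global_file.nc pacific_subset.nc
```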
I realize this might not be possible in your case as you might actually need the entire 20gb of data that your code is reading in. Sorry if this isn't of much help.
Dardag
on 19 May 2020
I was using NCO before, perhaps I should return to just using that.
I work with curvilinear grid and ncks works for that like you said.
Themistoklis Chronis
on 30 May 2021
hey guys did you find a solution to this? I have an identical problem. MATLAB starts crawling after opening a couple of hundred netcdf files...
Answers (0)