How to read multiple grb2 files on a webpage?
Dear all,
I would like to download all the grb2 files from the following website:
Each file link can be identified; for instance, the first one is:
Is there a way to automatically update the link to download these files?
Thanks for the help.
2 Comments
Rik
on 14 Jun 2021
What have you tried so far? You can probably either use a regex, or use strfind to find all pairs of http and .grb2.
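As a rough illustration of the `strfind` idea, here is a minimal sketch that pairs each `.grb2"` ending with the nearest preceding `href="`. The HTML snippet and file names are made up for the example, and pairing on `href="` rather than `http` is an assumption about how the listing page writes its links.

```matlab
% Hypothetical HTML standing in for the real directory listing
html = ['<a href="multi_1.glo_30m.dp.200502.grb2">dp</a>', ...
        '<a href="readme.txt">readme</a>', ...
        '<a href="multi_1.glo_30m.hs.200502.grb2">hs</a>'];

% Index of the first character after every href="
starts = strfind(html, 'href="') + numel('href="');
% Index of the last character of every .grb2 that is followed by a quote
stops = strfind(html, '.grb2"') + numel('.grb2') - 1;

names = cell(1, numel(stops));
for k = 1:numel(stops)
    % The matching start is the last href=" that begins before this ending
    s = starts(find(starts <= stops(k), 1, 'last'));
    names{k} = html(s:stops(k));
end
```

Here `names` ends up holding only the two `.grb2` entries; the `readme.txt` link is skipped because no `.grb2"` ending pairs with it.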
Answers (1)
Chetan
on 30 Apr 2024
Hi @shukui liu
It appears you are seeking an automated method to detect and download `.grb2` files from a specified URL, without the need to manually list the file names. You've previously attempted this with an FTP server and are now interested in utilizing web server functionalities.
To accomplish this, MATLAB's web functionalities can be leveraged to read webpage content, extract URLs for `.grb2` files using regular expressions as suggested by Rik, and then download these files. Here is how you can proceed:
- Read Webpage Content: Utilize `webread` to fetch the HTML content of the page listing the `.grb2` files.
- Extract URLs with Regular Expressions: Employ MATLAB's `regexp` function to identify all occurrences of `.grb2` file URLs or paths within the webpage content.
- Download Files: Iterate through the extracted URLs or file paths and use `websave` to download each file.
Below is an example script illustrating this process:
% URL of the page listing the .grb2 files
pageUrl = 'https://polar.ncep.noaa.gov/waves/hindcasts/multi_1/200502/gribs/';
% Directory to save the downloaded files
dataFolder = 'testing';
if ~exist(dataFolder, 'dir')
    mkdir(dataFolder);
end
% Read the webpage content
pageContent = webread(pageUrl);
% Regular expression to match .grb2 file links
% Adjust the regex pattern if the webpage structure changes
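% Note: MATLAB char vectors do not process backslash escapes, so a single
% backslash is enough to make the dot literal in the regular expression
pattern = 'href="([^"]+\.grb2)"';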
% Find all matches
fileLinks = regexp(pageContent, pattern, 'tokens');
% Flatten the cell array if necessary
fileLinks = [fileLinks{:}];
% Base URL for constructing the full file URL if needed
baseUrl = pageUrl;
% Download each file
for i = 1:length(fileLinks)
    fileUrl = [baseUrl, fileLinks{i}];
    [~, name, ext] = fileparts(fileUrl);
    fileName = [name, ext];
    filePath = fullfile(dataFolder, fileName);
    % Check if the file already exists to avoid re-downloading
    if ~exist(filePath, 'file')
        fprintf('Downloading %s\n', fileName);
        websave(filePath, fileUrl);
    else
        fprintf('File %s already exists. Skipping download.\n', fileName);
    end
end
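The script above assumes every `href` is a bare file name relative to the page URL. If the listing instead uses root-relative or absolute links, prepending `baseUrl` would produce a wrong address. Below is a minimal sketch of one way to handle the three cases; the entries in `links` are hypothetical and only illustrate the forms a link might take.

```matlab
pageUrl = 'https://polar.ncep.noaa.gov/waves/hindcasts/multi_1/200502/gribs/';

% Hypothetical href values: relative, root-relative, and absolute
links = {'a.grb2', '/waves/x/b.grb2', 'https://example.com/c.grb2'};

% Scheme and host of the page, e.g. https://polar.ncep.noaa.gov
hostRoot = regexp(pageUrl, '^https?://[^/]+', 'match', 'once');

fullUrls = cell(size(links));
for k = 1:numel(links)
    link = links{k};
    if startsWith(link, {'http://', 'https://'})
        fullUrls{k} = link;                 % already a complete URL
    elseif startsWith(link, '/')
        fullUrls{k} = [hostRoot, link];     % relative to the server root
    else
        fullUrls{k} = [pageUrl, link];      % relative to the listing page
    end
end
```

The resulting `fullUrls` could then replace the `[baseUrl, fileLinks{i}]` concatenation in the download loop. Note that `startsWith` requires R2016b or later.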
Refer to the following MathWorks documentation for detailed usage of the functions:
- https://www.mathworks.com/help/matlab/ref/webread.html
- https://www.mathworks.com/help/matlab/ref/websave.html
- https://www.mathworks.com/help/matlab/ref/regexp.html
I hope this helps.