MATLAB Answers

Extract documents from a website's hyperlinks

dsmalenb on 22 May 2019
Answered: Koundinya on 29 May 2019
Hello folks!
I have a few websites from which I am trying to download the files that their embedded hyperlinks point to. Does MATLAB have a way to do this? For example, the page
https://en.wikipedia.org/wiki/Quantum_mechanics lists several hyperlinks at the bottom as references. In this case, disregard the earlier hyperlinks in the page that lead to other articles or that jump to these references.
Is there a way to extract these documents automatically with MATLAB?


Accepted Answer

Koundinya on 29 May 2019
You can do this with webread, which retrieves the page's HTML as text, and regexp, which extracts all the hyperlinks from that text.
% The URL must be passed as a character vector, so it needs quotes
html_text = webread('https://en.wikipedia.org/wiki/Quantum_mechanics');
hyperlinks = regexp(html_text,'<a.*?/a>','match');
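
The anchor tags alone are not the documents yet. One possible next step, shown here only as a rough sketch, is to capture the href value of each link and pass the ones that look like files to websave. The PDF-only filter, the double-quoted href pattern, and the output folder name downloads are my own assumptions for illustration, not something the question specifies. Relative links and links to ordinary web pages are skipped.

```matlab
% Sketch: collect href targets from the page and save those that look like PDFs.
url = 'https://en.wikipedia.org/wiki/Quantum_mechanics';
html_text = webread(url);

% Capture the value of each double-quoted href attribute inside an <a ...> tag.
tokens = regexp(html_text, '<a\s[^>]*?href="([^"]+)"', 'tokens');
links = unique(cellfun(@(t) t{1}, tokens, 'UniformOutput', false));

% Assumed filter: keep absolute http(s) links that end in .pdf.
is_pdf = ~cellfun(@isempty, regexp(links, '^https?://.*\.pdf$', 'once'));
pdf_links = links(is_pdf);

out_dir = 'downloads';   % hypothetical output folder
if ~exist(out_dir, 'dir')
    mkdir(out_dir);
end

for k = 1:numel(pdf_links)
    % Reuse the last part of the URL as the local file name.
    [~, name, ext] = fileparts(pdf_links{k});
    target = fullfile(out_dir, [name ext]);
    try
        websave(target, pdf_links{k});
    catch err
        % Some servers refuse automated requests; report and continue.
        warning('Could not download %s: %s', pdf_links{k}, err.message);
    end
end
```

To restrict this to the reference list only, you would probably need to cut html_text down to the references section before running regexp; how to do that depends on the page's markup, so I have left it out.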
