Extract text from webpage (webread)

Valentin Blanc on 22 Jan 2022
Commented: Rik on 24 Jan 2022
Hello everyone, I'm trying to extract text data from a URL with the webread function. However, the result is just the HTML code of the page, and the text that I want to extract is not even present in that HTML.
An example of a webpage from which I would like to extract the text is: https://worldwide.espacenet.com/patent/search/family/076708046/publication/EP3936426A1?q=EP3936426 The data of interest is the text at the bottom of the webpage (below "Abstract" or "Abrégé").
I naively tried something like this:

    url = 'https://worldwide.espacenet.com/patent/search/family/076708046/publication/EP3936426A1?q=EP3936426';
    test_data = webread(url)

I hoped to get all the text of the webpage (something like select all and copy) and then extract the part of interest. I guess the issue is linked to the website itself, because the same method works with other websites.
Thank you in advance for your help!
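
For pages where the text really is in the HTML that webread returns, the "select all and copy" idea from the question can be approximated by stripping the markup. The following is only a minimal sketch: htmlToText is a hypothetical helper name (not a built-in), the HTML string is a made-up stand-in so the example runs offline, and regular expressions are a rough substitute for a real HTML parser. It will not recover text that the browser loads after the initial page, which appears to be the situation with the Espacenet page.

```matlab
% Made-up HTML standing in for the char vector returned by, e.g.:
%   html = webread(url, weboptions('ContentType', 'text'));
html = ['<html><head><style>p{color:red}</style></head><body>' ...
        '<h1>Abstract</h1><p>A bicycle &amp; frame.</p>' ...
        '<script>var x = 1;</script></body></html>'];

txt = htmlToText(html);
disp(txt)

% Quick check of whether the text of interest is in the download at all.
hasAbstract = contains(txt, 'Abstract');

function txt = htmlToText(html)
    % Hypothetical helper: rough HTML-to-text conversion.
    % Drop script and style blocks together with their contents.
    txt = regexprep(html, '<(script|style)[^>]*>.*?</\1>', ' ', 'ignorecase');
    % Replace every remaining tag with a space.
    txt = regexprep(txt, '<[^>]+>', ' ');
    % Decode a few common entities (not an exhaustive list).
    txt = strrep(txt, '&nbsp;', ' ');
    txt = strrep(txt, '&lt;', '<');
    txt = strrep(txt, '&gt;', '>');
    txt = strrep(txt, '&amp;', '&');
    % Collapse runs of whitespace and trim the ends.
    txt = strtrim(regexprep(txt, '\s+', ' '));
end
```

If hasAbstract comes back false for a real page even though the text is visible in the browser, the content is most likely being added after the initial download.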
  1 Comment
Rik on 24 Jan 2022
The actual data is fetched by JavaScript from a backend API, so webread only receives the initial page without that content. You would have to download the page and run that JavaScript yourself, which will probably not work.
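
One alternative that is sometimes possible in this situation, offered only as a sketch under assumptions: if the request the page makes to its backend can be found in the browser's developer tools (Network tab), that URL can be passed to webread directly, and a JSON response is then decoded into a struct. Everything specific here is a placeholder: apiUrl is not a real Espacenet endpoint, the field name "abstract" is invented, and the sample string merely stands in for a response so the example runs offline. The real endpoint may also require headers or cookies that this sketch does not supply.

```matlab
% Offline stand-in for a JSON response (placeholder content, invented field names).
sample = '{"publication":"EP3936426A1","abstract":"placeholder abstract text"}';
data = jsondecode(sample);      % struct with fields publication and abstract
abstractText = data.abstract;
disp(abstractText)

% Live version, once the actual request URL is known (apiUrl is hypothetical):
%   opts = weboptions('ContentType', 'json', 'Timeout', 15);
%   data = webread(apiUrl, opts);   % returns the decoded JSON
%   abstractText = data.abstract;   % field name depends on the real response
```

With 'ContentType' set to 'json', webread returns the decoded data rather than raw text, so no separate jsondecode call is needed in the live version.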


Answers (0)
