Extract text from webpage (webread)
Hello everyone, I'm trying to extract text data from a URL with the webread function. However, when I do this, the result is essentially the HTML code of the page, and the text I want to extract does not appear anywhere in that HTML code.
Here is an example of a webpage from which I would like to extract the text data: https://worldwide.espacenet.com/patent/search/family/076708046/publication/EP3936426A1?q=EP3936426 The data of interest is the text at the bottom of the page (below "Abstract" or "Abrégé").
I naively tried something like this:

url = 'https://worldwide.espacenet.com/patent/search/family/076708046/publication/EP3936426A1?q=EP3936426';
test_data = webread(url)

I hoped to get all the data on the webpage (something like "select all and copy") and then extract the part of interest. I suspect the issue is linked to the website itself, because the same method works with other websites.
Thank you in advance for your help!
1 Comment
Rik
on 24 Jan 2022
The actual data is fetched by a JavaScript backend API, so you would have to download the page and run that JavaScript yourself. That will probably not work.
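One way to confirm this diagnosis is to pull the visible text out of whatever webread returns and search it for the section heading. The sketch below assumes the Text Analytics Toolbox is installed (it provides htmlTree and extractHTMLText) and that the machine has network access; the variable names are illustrative. webread does not execute JavaScript, so any text the page loads through its backend API should be missing from the result.

```matlab
% Sketch: inspect the text that webread actually receives.
% Assumes the Text Analytics Toolbox (htmlTree, extractHTMLText) and network access.
url = 'https://worldwide.espacenet.com/patent/search/family/076708046/publication/EP3936426A1?q=EP3936426';

html = webread(url);                 % raw HTML sent by the server; no JavaScript is run
tree = htmlTree(html);               % parse the HTML into a tree
visibleText = extractHTMLText(tree); % text a browser would display from this HTML alone

% Search for the heading of interest. If the abstract is loaded by
% JavaScript, this will likely be false for this page.
hasAbstract = contains(visibleText, "Abstract")
```

On websites that render their text on the server, visibleText is roughly the "select all and copy" result described in the question, which may explain why the same approach works elsewhere.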
Answers (0)