MATLAB Answers

Perform Google Search in Matlab

dsmalenb
dsmalenb on 4 Jun 2019
Answered: Monika Phadnis on 27 Jun 2019
Hi!
I am trying to figure out how to perform a Google search automatically in MATLAB and save the results in an array.
Say I wanted to save the paths to the PDF files returned by: "site:www.cnn.com filetype:pdf"
Some answers in the list should then be:
...
I have seen some scripts (links below) but unfortunately they are outdated or simply do not work. I am guessing it may be possible to do this but I cannot seem to figure it out. Any assistance would be very welcome!
Links:

  3 Comments

Joel Handy
Joel Handy on 4 Jun 2019
I think what you want is possible. I checked out your second link to the File Exchange. It doesn't work for me either, but it looks like it could be brought back to working order.
The key line, below, captures the Google results of a search defined in q:
html_txt = urlread(['https://www.google.com/search?q=',q]);
The problem is that those results come back as HTML, which the rest of the code no longer knows how to mine for the desired data. With some sleuthing, you could update the getInfo and getlinks functions and have a worthy contribution to the File Exchange.
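For anyone retracing this today: `urlread` has since been deprecated in favor of `webread`. A minimal sketch of fetching the results page, assuming Google still serves plain HTML to this kind of request (it may block or alter responses for automated clients):

```matlab
% Fetch a Google results page for a query string.
% urlread is deprecated, so use webread with explicit web options.
% NOTE: Google may refuse or change responses for non-browser clients.
q = 'site:www.cnn.com filetype:pdf';
opts = weboptions('UserAgent', 'Mozilla/5.0', 'Timeout', 15);
html_txt = webread(['https://www.google.com/search?q=', urlencode(q)], opts);
```

The `UserAgent` and `Timeout` settings are just reasonable guesses; the real obstacle, as discussed below, is parsing (and being allowed to fetch) what comes back.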
dsmalenb
dsmalenb on 4 Jun 2019
Joel,
Thank you for your response. Perhaps I am missing something significant, but after parsing through the HTML I tried to compare the parts so I could make the necessary changes. However, not all the necessary parts of the link seem to be available. I have included an example below; it is for the first article that the search displays.
We have:
  1. The file type is in GREEN
  2. The Article's title is in YELLOW
  3. The parts of the link are in MAGENTA
I am missing "2004" and "01/23/" to complete the link. These parts do not seem to be listed in the HTML code.
Any idea how to get these pieces?
snippet.jpg
Joel Handy
Joel Handy on 10 Jun 2019
After doing some more research, it looks like scraping (that's what we are doing, scraping Google's search results) is against their terms of service, and they actively attempt to thwart it. That would explain why some older tools are no longer maintained. I'm not a web expert; there appear to be ways of doing what you want, but I don't think any of them are simple.
Sorry I couldn't be more help.
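One route that stays within Google's terms is the Custom Search JSON API, which returns results as JSON instead of scraped HTML. A sketch, assuming you have registered for an API key and a custom search engine ID (the 'YOUR_API_KEY' and 'YOUR_CX_ID' values below are placeholders you must obtain from Google):

```matlab
% Sketch: query Google's Custom Search JSON API instead of scraping HTML.
% apiKey and cx are PLACEHOLDERS - obtain real values from Google's console.
apiKey = 'YOUR_API_KEY';
cx     = 'YOUR_CX_ID';
q      = 'site:www.cnn.com filetype:pdf';
% webread decodes the JSON response into a MATLAB struct automatically.
data  = webread('https://www.googleapis.com/customsearch/v1', ...
                'key', apiKey, 'cx', cx, 'q', q);
links = {data.items.link};   % cell array of result URLs
```

The API is free only for a limited number of queries per day, and the exact fields on `data.items` should be checked against Google's reference documentation.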


Answers (1)

Monika Phadnis
Monika Phadnis on 27 Jun 2019
I followed the example given on this link to extract data from the URL.
For the URL, I used "http://www.google.com/search?q=cnn.com+filetype%3Apdf" as the url parameter for webread in your example. This gives a string array of the href links; you can try parsing the array for the required links.
In my output, strings starting with "/url" contained the search links.
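Those "/url" entries can be pulled out with a regular expression. A small self-contained sketch on an illustrative snippet (the sample HTML below is made up to show the shape of Google's result links; the real page markup may differ and can change at any time):

```matlab
% Extract "/url?q=..." link targets from Google-style result HTML.
% The html_txt string here is a made-up sample; real markup may differ.
html_txt = ['<a href="/url?q=https://www.cnn.com/a.pdf&amp;sa=U">x</a>' ...
            '<a href="/url?q=https://www.cnn.com/b.pdf&amp;sa=U">y</a>'];
% Capture everything after /url?q= up to the next & or closing quote.
tokens = regexp(html_txt, 'href="/url\?q=([^&"]+)', 'tokens');
links  = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
% links is now {'https://www.cnn.com/a.pdf', 'https://www.cnn.com/b.pdf'}
```

The captured URLs may still be percent-encoded in real output; `urldecode` (R2019b+) or a manual `strrep` pass can clean them up.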

  0 Comments



Release

R2019a