I'm new to creating VBA code and I'm slowly getting a basic understanding of it, however I'm unable to pass this point of my project without assistance. I have the code below and runs great up until I need to continue the code with the new page that opens. I have no idea on how to be able to continue the code and the plan is to be able to click on the odds comparison tab and extract data from that page. Any assistance would be much appreciated.
Sub odd_comparison()
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "http://www.flashscore.com/basketball/"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
objIE.document.getElementById("fs").Children(0) _
.Children(2).Children(2).Children(0).Children(2).Click
End Sub
Try to make loop until the webpage ready as described in this and this answers (you know, replace WScript.Sleep with DoEvents for VBA).
Inspect the target element on the webpage with Developer Tools (using context menu or pressing F12). HTML content is as follows:
bwin.fr Odds
As you can see there is onclick attribute, and actually you can try to execute jscript code from it instead of invoking click method:
objIE.document.parentWindow.execScript "setNavigationCategory(4);pgenerate(true, 0,false,false,2); e_t.track_click('iframe-bookmark-click', 'odds');", "javascript"
Going further you can find the following spinner element, which appears for the short time while data is being loaded after the tab clicked:
<div id="preload" class="preload pvisit" style="display: none;"><span>Loading ...</span></div>
So you can detect when the data loading is completed by checking the visibility state:
Do Until objIE.document.getElementById("preload").style.display = "none"
DoEvents
Loop
The next step is extracting the data you need. You can get all tables from central block: .document.getElementById("fs").getElementsByTagName("table"), loop through tables and get all rows oTable.getElementsByTagName("tr"), and finally get all cells .getElementsByTagName("td") and innerText.
The below example shows how to extract all table data from the webpage odds comparison tab to Excel worksheet:
Option Explicit
Sub Test_Get_Data_www_flashscore_com()
Dim aData()
' clear sheet
Sheets(1).Cells.Delete
' retrieve content from web site, put into 2d array
aData = GetData()
' output array to sheet
Output Sheets(1).Cells(1, 1), aData
MsgBox "Completed"
End Sub
Function GetData()
Dim oIE As Object
Dim cTables As Object
Dim oTable As Object
Dim cRows As Object
Dim oRow As Object
Dim aItems()
Dim aRows()
Dim cCells As Object
Dim i As Long
Dim j As Long
Set oIE = CreateObject("InternetExplorer.Application")
With oIE
' navigate to target webpage
.Visible = True
.navigate "http://www.flashscore.com/basketball/"
' wait until webpage ready
Do While .Busy Or Not .readyState = 4: DoEvents: Loop
Do Until .document.readyState = "complete": DoEvents: Loop
Do While TypeName(.document.getElementById("fscon")) = "Null": DoEvents: Loop
' switch to odds tab
.document.parentWindow.execScript _
"setNavigationCategory(4);pgenerate(true, 0,false,false,2); e_t.track_click('iframe-bookmark-click', 'odds');", "javascript"
Do Until .document.getElementById("preload").Style.display = "none": DoEvents: Loop
' get all table nodes
Set cTables = .document.getElementById("fs").getElementsByTagName("table")
' put all rows into dictionary to compute total rows count
With CreateObject("Scripting.Dictionary")
' process all tables
For Each oTable In cTables
' get all row nodes within table
Set cRows = oTable.getElementsByTagName("tr")
' process all rows
For Each oRow In cRows
' put each row into dictionary
Set .Item(.Count) = oRow
Next
Next
' retrieve array from dictionary
aItems = .Items()
End With
' redim 1st dimension equal total rows count
ReDim aRows(1 To UBound(aItems) + 1, 1 To 1)
' process all rows
For i = 1 To UBound(aItems) + 1
Set oRow = aItems(i - 1)
' get all cell nodes within row
Set cCells = aItems(i - 1).getElementsByTagName("td")
' process all cells
For j = 1 To cCells.Length
' enlarge 2nd dimension if necessary
If UBound(aRows, 2) < j Then ReDim Preserve aRows(1 To UBound(aItems) + 1, 1 To j)
' put cell innertext into array
aRows(i, j) = Trim(cCells(j - 1).innerText)
DoEvents
Next
Next
.Quit
End With
' return populated array
GetData = aRows
End Function
Sub Output(objDstRng As Range, arrCells As Variant)
With objDstRng
.Parent.Select
With .Resize( _
UBound(arrCells, 1) - LBound(arrCells, 1) + 1, _
UBound(arrCells, 2) - LBound(arrCells, 2) + 1)
.NumberFormat = "#"
.Value = arrCells
.Columns.AutoFit
End With
End With
End Sub
Webpage odds comparison tab content for me is as follows:
It gives the output:
Related
I'm trying to understand why my references arent working well to scrape this data.
Here is the site as an example:
http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear
And as a target:
<div id="data_i6" class="rf_crow"><div id="Y_1" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="741131269">741</div><div id="Y_2" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="836611464">837</div><div id="Y_3" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="939841654">940</div><div id="Y_4" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1010472512">1,010</div><div id="Y_5" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1100344312">1,100</div><div id="Y_6" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1115401551">1,115</div></div>
What I need to extract is the actual value in rawvalue="741131269" and the following is what I've gotten to work so far.
'Cells(1, 1) = Document.getElementsByClassName("rf_crow")'returns the rows of data into one cell
'Cells(1, 1) = Document.getElementById("Y_1").innerText 'returns the text for the year
'Cells(1, 1) = Document.getElementById("data_i1").innerText 'returns to first row of data
I know the above doesn't return what I want, because the comment tells you what it extracts into Excel. The sub-element doesn't seem to work as it does in other macros I've built. I thought something like this would work:
Cells(1, 1) = Document.getElementById("Y_1").getAttribute("rawvalue")
but that doesn't work, also, I tried:
Cells(1, 1) = Document.getElementById("data_i6").getElementById("Y_1").innertext
and that doesn't work either.
The solution is very easy. Just call it using it's attribute which is `rawvalue.
This is how you can go:
Using Hardcoded delay and for loop to check the availability of the desired value:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object, elem As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
''using hardcoded delay
Application.Wait Now + TimeValue("00:00:05")
For Each elem In HTML.getElementsByTagName("div")
If elem.innerText = "741" Then MsgBox elem.getAttribute("rawvalue"): Exit For
Next elem
End Sub
Using Explicit Wait:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
Do: Set post = HTML.querySelector("#data_i6 #Y_1"): DoEvents: Loop While post Is Nothing
MsgBox post.getAttribute("rawvalue")
End Sub
Output at this moment:
741131269
The following should illuminate some of the problems you were having.
.querySelectorAll
The exact element you mention is the second index returned by .querySelectorAll method of .document using the CSS selector #Y_1. The # means Id.
From that webpage it returns the following (sample shown - not all):
From the above you can see the string you want is returned by the index of 2 in the result.
querySelectorAll with Id? Isn't Id a unique identifier for a single element?
This Id, unexpectedly, is not unique to a single element on the page. It occurs a whopping 27 times:
This means you can use the .querySelectorAll method to return a nodeList of all matching items and take the item at index 2 to get your result.
Note:
If you want the long number next to rawvalue, 741131269, then parse the outerHTML of the returned element.
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.querySelector
Alternatively, you can target the id which is specific data_i6 with
.document.querySelector("#data_i6")
This CSS selector (#data_i6) returns the entire row as it has each year within. If using .querySelector you will only get the first item anyway which is year 1.
You can be more specific with the CSS selector and add the additional year Id to get just the year of interest:
#data_i6 #Y_1
Code: (querySelector method commented out next to querySelectorAll)
Option Explicit
Public Sub Get_Information()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date
exitTime = Now + TimeSerial(0, 0, 2)
Do
DoEvents
On Error Resume Next
Set a = .document.querySelectorAll("#Y_1") ' .document.querySelector("#data_i6 #Y_1")
On Error GoTo 0
If Now > exitTime Then Exit Do
Loop While a Is Nothing
If a Is Nothing Then Exit Sub
Debug.Print Split(Split(a.item(2).innerText, "rawvalue=")(0), ">")(0) 'Split(Split(a.innerText, "rawvalue=")(0), ">")(0)
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString) 'Replace(Split(Split(a.outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.Quit
End With
End Sub
Try and declare "objCollection" as an object, strValue as string, and in the code below, replace in the first line the name of the http-request you declared:
Document.body.innerHTML = YourHTTPRequest.responseText
Set objCollection = Document.getElementsByClassName("rf_crow")
For Each objElement In objCollection
If objElement.ID = "Y_1" Then
strValue = objElement.getAttribute("rawvalue")
Exit For
End If
Next
Cells(1, 1) = strValue
Does this work for you?
Sub web_table_option_two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub
'start a new subroutine called SearchBot
Sub SearchBot()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim x As Integer
Application.ScreenUpdating = False
' Set numrows = number of rows of data.
NumRows = Range("A1", Range("A1").End(xlDown)).Rows.Count
' Select cell a1.
Range("A1").Select
' Establish "For" loop to loop "numrows" number of times.
For x = 1 To NumRows
' Insert your code here.
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "http://ec.europa.eu/taxation_customs/vies/vatResponse.html"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("countryCombobox").Value = "GB"
objIE.document.getElementById("number").Value = ActiveCell.Value
'click the 'go' button
objIE.document.getElementById("submit").Click
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim vatResponse As String
vatResponse = objIE.document.getElementById("vatResponseFormTable").getElementsByTagName("tr")(0).Children(0).textContent
ActiveCell.Offset(0, 2).Value = vatResponse
' Selects cell down 1 row from active cell.
'Next
Application.ScreenUpdating = True
'close the browser
objIE.Quit
ActiveCell.Offset(1, 0).Select
'End
Next
'exit our SearchBot subroutine
End Sub
So basically on this code line:
vatResponse = objIE.document.getElementById("vatResponseFormTable").getElementsByTagName("tr")(0).Children(0).textContent
I am getting an error message saying that I have an error code 424
Sometimes the pages gets loaded internally through some scripts so the html element you are trying to get actually isn't found on the document as the code runs very fast. So somehow you have to wait until page loads completely.
Please try this approach and see if the code runs without producing an error.
Sub SearchBot()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim vatFormTable As IHTMLElement
Dim tr As IHTMLElement
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim x As Integer
Application.ScreenUpdating = False
' Set numrows = number of rows of data.
NumRows = Range("A" & Rows.Count).End(xlUp).Row
' Select cell a1.
Range("A1").Select
' Establish "For" loop to loop "numrows" number of times.
For x = 1 To NumRows
' Insert your code here.
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "http://ec.europa.eu/taxation_customs/vies/vatResponse.html"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("countryCombobox").Value = "GB"
objIE.document.getElementById("number").Value = ActiveCell.Value
'click the 'go' button
objIE.document.getElementById("submit").Click
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
On Error Resume Next
Dim vatResponse As String
Do While vatFormTable Is Nothing
Set vatFormTable = objIE.document.getElementById("vatResponseFormTable")
Loop
Do While tr Is Nothing
Set tr = vatFormTable.getElementsByTagName("tr")(0)
Loop
vatResponse = tr.Children(0).innerText
ActiveCell.Offset(0, 2).Value = vatResponse
' Selects cell down 1 row from active cell.
'Next
Application.ScreenUpdating = True
'close the browser
objIE.Quit
ActiveCell.Offset(1, 0).Select
'End
Next
'exit our SearchBot subroutine
End Sub
I am trying to extract table from an internal website to excel by giving some input parameters. Everything works until it refreshes the website data with my inputs. The part I get the run-time error 438 is marked (For r = 1 To elemCollection.Rows.Length - 1). I also tried to load the data from website to excel using web query and the table wasn't showing up on my excel spreadsheet. "It gives the following error-This page might not function correctly because either your browser does not support scripts or active scripting is disabled. Your browser does not support scripts or has been configured not to allow scripts. The report viewer web control http handler has not been registered in the application's web config file."
Wondering if this has anything to do with permissions.
VBA code below:
Option Explicit
Sub Macro1()
Dim IE As Object, obj As Object
Dim StartDate As Object
Dim EndDate As Object
Dim myState As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object, curHTMLRow As Object
Dim wb As Workbook
Dim ws As Worksheet
Dim objCollection As Object
Dim objElement As Object
Dim i As Long
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate ("http://internalwebsite_SSRSReport")
' we ensure that the web page downloads completely before we fill the form automatically
While IE.ReadyState <> 4: DoEvents: Wend
IE.Document.All.Item("ctl31_ctl03_txtValue").InnerText = Format("7/1/2016", "m/d/yyyy")
IE.Document.All.Item("ctl31_ctl05_txtValue").InnerText = Format("7/31/2016", "m/d/yyyy")
Wait 2
IE.Document.getElementsByName("ctl31_ctl04_divDropDown").Item.Click
Wait 2
' accessing the button
IE.Document.getElementsByName("ctl31_ctl04_ctl00").Item.Click
Wait 2
' again ensuring that the web page loads completely before we start scraping data
While IE.busy: DoEvents: Wend
Wait 2
'Clearing any unnecessary or old data in Sheet1
ThisWorkbook.Sheets("Sheet1").Activate
Range("A1:K500").ClearContents
Set elemCollection = IE.Document.getelementbyId("ctl31_ctl09_ReportArea")
'error here
For r = 1 To elemCollection.Rows.Length - 1
Set curHTMLRow = elemCollection.Rows(r)
For c = 0 To curHTMLRow.Cells.Length - 1
Cells(r + 1, c + 1) = curHTMLRow.Cells(c).InnerText
Next
Next
' cleaning up memory
IE.Quit
Set IE = Nothing
End Sub
Private Sub Wait(ByVal nSec As Long)
nSec = nSec + Timer
While nSec > Timer
DoEvents
Wend
End Sub
Below is some code that should be able to grab the data from the HTML table from the SSRS report and extract it to Excel.
Basically the code will iterate through all the TRs and TDs in the Table Element, and output the InnerText to Excel. If you are moving a lot of data over, please consider writing to an array, then doing the write all at once by setting to an equally sized range object.
I also cleaned up the code, mostly removing variables that were not referenced and reduced some of the lines by combining some statements together
Option Explicit
Public Sub GetSSRSData()
On Error GoTo errhand:
Application.ScreenUpdating = False
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim TR_Elements As Object
Dim TR As Object ' Table Row
Dim TD_Elements As Object
Dim TD As Object ' Table Data
Dim RowNumb As Integer
Dim Columns As Integer
Dim ColumnNumb As Integer
With IE
.Visible = True
.Navigate ("http://internalwebsite_SSRSReport")
While .ReadyState <> 4: DoEvents: Wend ' Wait for page load
'Fill the form out with dates
.Document.All.Item("ctl31_ctl03_txtValue").InnerText = Format("7/1/2016", "m/d/yyyy")
.Document.All.Item("ctl31_ctl05_txtValue").InnerText = Format("7/31/2016", "m/d/yyyy")
Wait 2
'Click the DropDown
.Document.getElementsByName("ctl31_ctl04_divDropDown").Item.Click
Wait 2
' Click the other button
.Document.getElementsByName("ctl31_ctl04_ctl00").Item.Click
End With
Wait 2
While IE.busy: DoEvents: Wend ' Wait for page load
Wait 2
'Clearing any unnecessary or old data in Sheet1
Sheets("Sheet1").Range("A1:K500").ClearContents
Set TR_Elements = IE.Document.getelementbyId("ctl31_ctl09_ReportArea").getElementsByTagName("tr")
RowNumb = 1
ColumnNumb = 1
'Tables usually consists of TR (Table Rows), and -
'TD (Table Data)
For Each TR In TR_Elements
Set TD_Elements = TR.getElementsByTagName("td")
ColumnNumb = 1
For Each TD In TD_Elements
'Consider using an array to save the values to memory if there is going
'to be a lot of data to be moved over
ActiveSheet.Cells(RowNumb, ColumnNumb).Value = TD.InnerText
ColumnNumb = ColumnNumb + 1
Next
RowNumb = RowNumb + 1
Next
' cleaning up memory
IE.Quit
Set IE = Nothing
Set TD_Elements = Nothing
Set TR_Elements = Nothing
Set TD = Nothing
Set TR = Nothing
Application.ScreenUpdating = True
errhand:
Application.ScreenUpdating = True
End Sub
Private Sub Wait(ByVal nSec As Long)
nSec = nSec + Timer
While nSec > Timer
DoEvents
Wend
End Sub
Goal: Make a VBA Macro that opens up a webpage via IE, loops through entire page, uses the Ctrl+F function to find MULTIPLE keywords, if those keywords are found, locates the row those keywords are in, and grabs a certain number of rows above and below that keyword row location and posts them to an excel sheet to be emailed out.
I have code that goes to the webpage, and uses Ctrl+F to find the keyword. This part works correctly. I don't know how to loop through the whole webpage and do this for multiple keywords. I also am having trouble finding the row location of each keyword 'hit' and posting it to excel (not that skilled with VBA).
Sub Find()
'create a variable to refer to an IE application, and
'start up a new copy of IE
Dim ieApp As New SHDocVw.InternetExplorer
Dim objectIE As Object
'make sure you can see
ieApp.Visible = True
'go to the website of interest
ieApp.Navigate "URL HERE"
'wait for page to finish loading
Do While ieApp.Busy
Loop
'Declare Keywords
Dim keyword1 As String
Dim found As Boolean
keyword1 = "keyword"
For i = 1 To ie.document.all.Length
Application.Wait (Now + TimeValue("0:00:02"))
SendKeys "^f"
Application.Wait (Now + TimeValue("0:00:01"))
SendKeys (keyword1)
Application.Wait (Now + TimeValue("0:00:01"))
SendKeys ("{ENTER}")
Next i
End Sub
Here is the example, which implements keyword look up in webpage document text nodes, if found - expands the range to the entire table cell, then outputs all matches on to worksheet:
Sub Find()
sKeyword = "language"
sUrl = "http://stackoverflow.com/tags"
Set oList = CreateObject("Scripting.Dictionary")
With CreateObject("InternetExplorer.Application")
.Visible = True
' Navigating to url
.Navigate sUrl
' Wait for IE ready
Do While .ReadyState <> 4 Or .Busy
DoEvents
Loop
' Wait for document complete
Do While .Document.ReadyState <> "complete"
DoEvents
Loop
' ' Look up in the specified node - optional
' ' Wait for target node created
' Do While TypeName(.Document.getElementById("Content")) = "Null" ' replace Content with your Id
' DoEvents
' Loop
' ' Get target node
' Set oRoot = .Document.getElementById("Content")
' Look up in the entire document
Set oRoot = .Document.getElementsByTagName("html")(0)
Set oWalker = .Document.createTreeWalker(oRoot, 4, Null, False) ' NodeFilter.SHOW_TEXT = 4
Set oNode = oWalker.currentNode
Do
Select Case True
Case IsNull(oNode.NodeValue)
Case oNode.NodeValue = ""
Case InStr(oNode.NodeValue, sKeyword) = 0
Case Else
' Text node contains keyword
Debug.Print oNode.NodeValue
Do
' Expand the range until thenode of the necessary type is enclosed
Set oNode = oNode.ParentNode
Debug.Print TypeName(oNode)
Select Case TypeName(oNode)
' ' Non-table structures
' Case "HTMLHtmlElement", "HTMLBody", "HTMLDivElement", "HTMLParagraphElement", "HTMLHeadingElement"
' For tables
Case "HTMLHtmlElement", "HTMLBody", "HTMLTableRow", "HTMLTableCell"
Exit Do
End Select
Loop
' Add to list
sText = oNode.innerText
Debug.Print sText
oList(oList.Count) = sText
End Select
' Get next node
oWalker.NextNode
Set oPrev = oNode
Set oNode = oWalker.currentNode
Loop Until oNode Is oPrev
.Quit
End With
' Results output
aList = oList.Items()
Cells(1, 1).Resize(UBound(aList) + 1, 1).Value = aList
End Sub
As an example, for source page as follows
the output is
For multiple keywords search please elaborate the rule: do all keywords, or at least one of the keywords should be found in one sentence?
I have a list of 1000 keywords in A1:A1000. I want to get the Google search result snippets of first page in corresponding cells of each keyword. Ex: search result snippets of A1 cell should be in B1...*1 and so on. Any help is much appreciated.
Consider the below example:
Option Explicit
Const TargetItemsQty = 30 ' results for each keyword
Sub GWebSearchIECtl()
Dim objSheet As Worksheet
Dim objIE As Object
Dim x As Long
Dim y As Long
Dim strSearch As String
Dim lngFound As Long
Dim st As String
Dim colGItems As Object
Dim varGItem As Variant
Dim strHLink As String
Dim strDescr As String
Dim strNextURL As String
Set objSheet = Sheets("Sheet1")
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True ' for debug or captcha request cases
y = 1 ' start searching for the keyword in the first row
With objSheet
.Select
.Range(.Columns("B:B"), .Columns("B:B").End(xlToRight)).Delete ' clear previous results
.Range("A1").Select
Do Until .Cells(y, 1) = ""
x = 2 ' start writing results from column B
.Cells(y, 1).Select
strSearch = .Cells(y, 1) ' current keyword
With objIE
lngFound = 0
.Navigate "https://www.google.com/search?q=" & EncodeUriComponent(strSearch) ' go to first search results page
Do
Do While .Busy Or Not .readyState = 4: DoEvents: Loop ' wait IE
Do Until .document.readyState = "complete": DoEvents: Loop ' wait document
Do While TypeName(.document.getElementById("res")) = "Null": DoEvents: Loop ' wait [#res] element
Set colGItems = .document.getElementById("res").getElementsByClassName("g") ' collection of search result [.g] items
For Each varGItem In colGItems ' process each item in collection
If varGItem.getElementsByTagName("a").Length > 0 And varGItem.getElementsByClassName("st").Length > 0 Then ' must have hyperlink and description
strHLink = varGItem.getElementsByTagName("a")(0).href ' get first hyperlink [a] found in current item
strDescr = GetInnerText(varGItem.getElementsByClassName("st")(0).innerHTML) ' get first description [span.st] found in current item
lngFound = lngFound + 1
With objSheet ' put result into cell
.Hyperlinks.Add .Cells(y, x), strHLink, , , strDescr
.Cells(y, x).WrapText = True
x = x + 1 ' next column
End With
If lngFound = TargetItemsQty Then Exit Do ' continue with next keyword - necessary quantity of the results for current keyword found
End If
DoEvents
Next
If TypeName(.document.getElementById("pnnext")) = "Null" Then Exit Do ' continue with next keyword - no [a#pnnext.pn] next page button exists
strNextURL = .document.getElementById("pnnext").href ' get next page url
.Navigate strNextURL ' go to next search results page
Loop
End With
y = y + 1 ' next row
Loop
End With
objIE.Quit
' google web search page contains the elements:
' [div#res] - main search results block
' [div.g] - each result item block within [div#res]
' [a] - hyperlink ancor(s) within each [div.g]
' [span.st] - description(s) within each [div.g]
' [a#pnnext.pn] - hyperlink ancor to the next search results page
End Sub
Function EncodeUriComponent(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
End If
EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function
Function GetInnerText(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.Open
objHtmlfile.Write "<body></body>"
End If
objHtmlfile.body.innerHTML = strText
GetInnerText = objHtmlfile.body.innerText
End Function