Use VBA to click html button and then scrape refreshed data - vba

I am trying to write a procedure which enters a date into an input box
<input name="Mdate" type="text" id="Mdate" size="30" value="" /></td>
clicks a submit button
<input type="submit" name="button" id="button" value="Submit" />
then scrapes the resulting data, which appears in the "a" tags.
<center>
<b>Tuesday, 6 January 2015</b><br />
Ruakaka
This data is not available until the submit button has been entered. My attempt is posted in full below. The problem I seem to be having is that i am not able to access the modified html code (modified by clicking submit). Can anyone provide any suggestions?
'dimension variables
Dim ie As InternetExplorer
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim inputs As MSHTML.IHTMLElementCollection 'Element collection for "input" tags
Dim eles1, eles2 As MSHTML.IHTMLElementCollection 'Element collection for th tags
Dim element As MSHTML.IHTMLElement 'input elements
Dim ele1, ele2 As MSHTML.IHTMLElement 'Header elements
'Open InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False 'make IE invisible
'Navigate to webpage
Dim ieURL As String: ieURL = "http://www.racenet.com.au/horse-racing-results/" 'set URL from which to retrieve racemeet and date data
ie.navigate ieURL 'navigate to URL
Do While ie.Busy Or ie.readyState <> 4 'wait for page to load
DoEvents
Loop
Set htmldoc = ie.document 'Document webpage
Set inputs = htmldoc.getElementsByTagName("input") 'Find all input tags
Dim dd, mm, yyyy As Integer
Dim startdate, enddate As Date
Dim i, j, k As Long
Dim raceMeet, raceURL As String
startdate = #1/1/2008#: enddate = Date - 1
Dim racemeetArr As Variant
ReDim racemeetArr(1 To 2, 1)
For i = startdate To enddate
dd = Day(i): mm = Month(i): yyyy = Year(i)
For Each element In inputs
If element.Name = "Mdate" Then
element.Value = yyyy & "-" & mm & "-" & dd
Else
If element.Name = "button" Then
element.Click
'insert scraper
Set eles1 = htmldoc.getElementsByTagName("a") 'Find all centre tags
For Each ele1 In eles1
If InStr(ele1.href, "/horse-racing-results/") > 0 Then
raceMeet = ele1.innerText
raceURL = ele1.innerHTML
ReDim Preserve racemeetArr(1 To 2, UBound(racemeetArr, 2) + 1)
racemeetArr(1, UBound(racemeetArr, 2)) = raceMeet
racemeetArr(2, UBound(racemeetArr, 2)) = raceURL
End If
Next ele1
Else
End If
End If
Next element
Stop
Next i
ie.Quit

Insert a condition to wait while the page is loading.
The following rewrite successfully fetches data from the target page on my pc:
Private Sub CommandButton1_Click()
'dimension variables
Dim ie As InternetExplorer
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim inputs As MSHTML.IHTMLElementCollection 'Element collection for "input" tags
Dim eles1, eles2 As MSHTML.IHTMLElementCollection 'Element collection for th tags
Dim element As MSHTML.IHTMLElement 'input elements
Dim ele1, ele2 As MSHTML.IHTMLElement 'Header elements
'Open InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True 'make IE invisible
'Navigate to webpage
Dim ieURL As String: ieURL = "http://www.racenet.com.au/horse-racing-results/" 'set URL from which to retrieve racemeet and date data
ie.navigate ieURL 'navigate to URL
Do While ie.Busy Or ie.readyState <> 4 'wait for page to load
DoEvents
Loop
Set htmldoc = ie.document 'Document webpage
Set inputs = htmldoc.getElementsByTagName("input") 'Find all input tags
Dim dd, mm, yyyy As Integer
Dim startdate, enddate As Date
Dim i, j, k As Long
Dim raceMeet, raceURL As String
startdate = #1/1/2008#: enddate = Date - 1
Dim racemeetArr As Variant
ReDim racemeetArr(1 To 2, 1)
For i = startdate To enddate
dd = Day(i): mm = Month(i): yyyy = Year(i)
For Each element In inputs
If element.Name = "Mdate" Then
element.Value = yyyy & "-" & mm & "-" & dd
Else
If element.Name = "button" Then
element.Click
Exit For
End If
End If
Next element
Do
' Wait until the Browser is loaded'
Loop Until ie.readyState = READYSTATE_COMPLETE
'insert scraper
Set eles1 = htmldoc.getElementsByTagName("a") 'Find all centre tags
For Each ele1 In eles1
If InStr(ele1.href, "/horse-racing-results/") > 0 Then
raceMeet = ele1.innerText
raceURL = ele1.innerHTML
ReDim Preserve racemeetArr(1 To 2, UBound(racemeetArr, 2) + 1)
racemeetArr(1, UBound(racemeetArr, 2)) = raceMeet
racemeetArr(2, UBound(racemeetArr, 2)) = raceURL
End If
Next ele1
Stop
Next i
ie.Quit
End Sub
Edit:
After analyzing the HTTP requests I managed to slim down the code a little bit (results can be queried directly without filling the form and submitting the page)
I am not a huge fan of expensive array ReDims, so I created a class instead, and save the results in a collection of that class (feel free to use it or not).
Add a new class module, call it clRaceMeet and paste this code:
Option Explicit
Private pMeet As String
Private pUrl As String
Public Property Let Meet(ByVal Val As String)
pMeet = Val
End Property
Public Property Get Meet() As String
Meet = pMeet
End Property
Public Property Let URL(ByVal Val As String)
pUrl = Val
End Property
Public Property Get URL() As String
URL = pUrl
End Property
Then, use this modified code version to scrape the data and dump it to the debugging window:
Option Explicit
Private Sub CommandButton1_Click()
'dimension variables
Dim ie As InternetExplorer
Dim ieURL As String
Dim dd As Integer
Dim mm As Integer
Dim yyyy As Integer
Dim startDate As Date
Dim endDate As Date
Dim i As Long
Dim htmlDoc As MSHTML.IHTMLDocument
Dim colLeftEleColl As MSHTML.IHTMLElementCollection
Dim colLeftEle As MSHTML.IHTMLElement
Dim centerEleColl As MSHTML.IHTMLElementCollection
Dim centerEle As MSHTML.IHTMLElement
Dim raceMeet As String
Dim raceURL As String
Dim objRaceMeet As clRaceMeet
Dim raceMeetColl As New Collection
'Open InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
startDate = #1/1/2009#
endDate = Date - 1
For i = startDate To endDate
dd = Day(i)
mm = Month(i)
yyyy = Year(i)
ieURL = "http://www.racenet.com.au/horse-racing-results-search.asp?Mdate=" & yyyy & "-" & mm & "-" & dd
ie.navigate ieURL
Do
' Wait until the Browser is loaded'
Loop Until ie.readyState = READYSTATE_COMPLETE
Set htmlDoc = ie.document
'insert scraper
Set colLeftEleColl = htmlDoc.getElementById("ColLeft").all
'Loop through elements of ColLeft div
For Each colLeftEle In colLeftEleColl
If colLeftEle.tagName = "CENTER" Then
Set centerEleColl = colLeftEle.all
'Loop through elements of <center> tag
For Each centerEle In centerEleColl
If centerEle.tagName = "A" Then
If InStr(centerEle.href, "/horse-racing-results/") > 0 Then
raceMeet = centerEle.innerText
raceURL = centerEle.href
Set objRaceMeet = New clRaceMeet
objRaceMeet.Meet = raceMeet
objRaceMeet.URL = raceURL
raceMeetColl.Add objRaceMeet
End If
End If
Next centerEle
Exit For
End If
Next colLeftEle
' Dump results to immediate window:
For Each objRaceMeet In raceMeetColl
Debug.Print objRaceMeet.Meet & " - " & objRaceMeet.URL
Next objRaceMeet
'Stop
Next i
ie.Quit
End Sub
Happy betting! :)

I toyed around with the last one and the for each loop within the for next loop has to go after it. I then also made it list into sheet1 and it worked. I did a few minor adjustments such as adding a variable to increment the cells.
this code didn't produce the actual results just the websites, not sure if that is what you were aiming for.

Related

Error While using Macro to add google's first image link to excel

I'm using the below Code to input Google's first images link in B1
for certain values in A1.
Public Sub Test()
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim imgElements As IHTMLElementCollection
Dim imgElement As HTMLImg
Dim aElement As HTMLAnchorElement
Dim N As Integer, I As Integer
Dim Url As String, Url2 As String
Dim LastRow As Long
Dim m, sImageSearchString
LastRow = Range("A" & Rows.Count).End(xlUp).Row
For I = 1 To LastRow
Url = "http://www.google.co.in/search?q=" & Cells(I, 1) & "&source=lnms&tbm=isch&sa=X&rnd=1"
Set IE = New InternetExplorer
With IE
.Visible = False
.Navigate Url 'sWebSiteURL
Do Until .readyState = 4: DoEvents: Loop
'Do Until IE.document.readyState = "Complete": DoEvents: Loop
Set HTMLdoc = .document
Set imgElements = HTMLdoc.getElementsByTagName("IMG")
N = 1
For Each imgElement In imgElements
If InStr(imgElement.src, sImageSearchString) Then
If imgElement.ParentNode.nodeName = "A" Then
Set aElement = imgElement.ParentNode
Url2 = imgElement.src
N = N + 1
End If
End If
Next
Cells(I, 2) = Url2
IE.Quit
Set IE = Nothing
End With
Next
End Sub
however I'm receiving the below error, can you please advise?
I'm using Windows 10, Excel 365
In VBA Menu - Tools - References - tick MS Internet Controls.
Or
Using Late Binding
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")

Excel VBA to search a website's search box and click a link/button

I'm using Excel VBA to log into a website, then type a .txt file name in a search box and click a button/link to search for it in the resulting second page. The resulting third page has an icon you have to click to download the file. Lastly I want to input the info in the .txt file to a new excel spreadsheet.
My getLogin function works and I successfully log into the website. Stepping through with F8 I can see each line of code entering in my username and password values in the respective fields. I'm on the second page level where I want to search in a searchbox using a filename, but at
If Not FileN Is Nothing And FileN.Length > 0 Then
FileN(0).Value = fileName
in the SearchFile function, when I step through it using F8, I see it keeps skipping FileN(0).Value = fileName and doesn't input the file name. The same happens for the link I click to search,
Set ElementCol = ie.document.getElementsByTagName("a")
For Each l In ElementCol
If l.href = "javascript:myFunction('/mailbox/jsp/MBIList.jsp')" Then
l.Click
It doesn't enter the If Statement and therefore doesn't click it.
My code so far:
Sub getComponents()
Dim WebAddressIn As String
Dim ie As Object
WebAddressIn = "https://..."
'get ie instance to work with that is logged in
Set ie = getLogin(WebAddressIn, "usern", "pw")
Do Until ie.readyState = 4
Loop
ie.Visible = True
Dim fileName As String
fileName = Format(Now(), "yyyyMMdd") & ".TXT"
Set ie = searchFile(fileName, ie)
End Sub
Function searchFile(fileName As String, ie As Object)
Dim Doc As Object, lastRow As Long, tblTR As Object
Dim UserString As String
Dim FileN As Object ' MSHTML.IHTMLElement
Dim ElementCol As Object ' MSHTML.IHTMLElementCollection
Dim AllText As String
Do While ie.Busy
Loop
Set FileN = ie.document.getElementsByName("MsgNamePattern")
If Not FileN Is Nothing And FileN.Length > 0 Then
FileN(0).Value = fileName
End If
Do While ie.Busy
Loop
Set ElementCol = ie.document.getElementsByTagName("a")
For Each l In ElementCol
If l.href = "javascript:myFunction('/mailbox/jsp/MBIList.jsp')" Then
l.Click
Exit For
End If
Next l
Do While ie.Busy
Loop
Set searchFile = ie
Set ie = Nothing
End Function
Function getLogin(WebAddressIn As String, UserNameIn As String, PasswordIn As String)
Dim Doc As Object, lastRow As Long, tblTR As Object
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
Dim UserString As String
Dim PWString As String
ie.Navigate2 WebAddressIn
ie.Visible = True
Dim UserN As Object ' MSHTML.IHTMLElement
Dim PW As Object ' MSHTML.IHTMLElement
Dim ElementCol As Object ' MSHTML.IHTMLElementCollection
Dim AllText As String
Do While ie.Busy
Loop
' enter username and password in textboxes
Set UserN = ie.document.getElementsByName("userid")
If Not UserN Is Nothing And UserN.Length > 0 Then
' fill in first element named "username", assumed to be the login name field
UserN(0).Value = UserNameIn
End If
Set PW = ie.document.getElementsByName("password")
' password
If Not PW Is Nothing And PW.Length > 0 Then
' fill in first element named "password", assumed to be the password field
PW(0).Value = PasswordIn
End If
Do While ie.Busy
Loop
'Clicks the Sign in button
Set ElementCol = ie.document.getElementsByName("submit")
For Each btnInput In ElementCol
If btnInput.Value = "*Sign In" Then
btnInput.Click
Exit For
End If
Next btnInput
Do While ie.Busy
Loop
Set getLogin = ie
Set ie = Nothing
End Function
And here's the relevant HTML code on this second resulting page:
The Search Box -
<input type="text" name="MsgNamePattern" size="20" onblur="validateMessageName(this)">
The link to search -
<td align="center" valign="center"> <img border="0" src="/mailbox/images/go_off.gif" vspace="7" name="Go" align="top">
</td>

how to get the meta name keywords -vba

I am trying to get the meta name keywords from a webpage
meta name="keywords" content="Mitch Albom,For One More Day,Little, Brown Book Group,0751537535,Fiction / General,General & Literary Fiction,Modern & contemporary fiction (post c 1945),USA
I need to get the contents from it need help.
Option Explicit
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Sheet1
str = wk.Range("Link").value
ie.Visible = True
ie.Navigate str
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = ie.Document
Dim kwd As String
kwd = Trim(Doc.getElementsByTagName("keywords").innerText)
MsgBox kwd
End Sub
The best way to do that is by finding meta-element with name keyword and referring to its content property. You can do it like that:
Option Explicit
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Sheet1
str = wk.Range("Link").value
ie.Visible = True
ie.Navigate str
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
'Find the proper meta element --------------
Const META_TAG As String = "META"
Const META_NAME As String = "keywords"
Dim Doc As HTMLDocument
Dim metaElements As Object
Dim element As Object
Dim kwd As String
Set Doc = ie.Document
Set metaElements = Doc.all.tags(META_TAG)
For Each element In metaElements
If element.Name = META_NAME Then
kwd = element.Content
End If
Next
MsgBox kwd
End Sub

VBA to get the href value

I am writing macro to extract the href value from a website, example here is to extract the value: '/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf' from the html code below. The href is one of the attributes of the html tag 'a', I have add the code getElementbyTagName'a' but it did not work, my question is how to extract that href value to column L. Anyone could help? Thanks in advance!
<a id="ctl00_gvMain_ctl03_hlTitle" class="news" href="/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf" target="_blank">二零一四年年報</a>
Sub Download_From_HKEX()
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String
Dim IE As Object
Dim i As Object
Dim ieDoc As Object
Dim selectItems As Variant
Dim h As Variant
Dim LocalFileName As String
Dim B As Boolean
Dim ErrorText As String
Dim x As Variant
'Key Ratios
For x = 1 To 1579
Set IE = New InternetExplorerMedium
IE.Visible = True
URL = "http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main_c.aspx"
IE.navigate URL
Do
DoEvents
Loop Until IE.readyState = 4
Application.Wait (Now + TimeValue("0:00:05"))
Call IE.Document.getElementById("ctl00_txt_stock_code").setAttribute("value", Worksheets("Stocks").Cells(x, 1).Value)
Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_1")
For Each i In selectItems
i.Value = "4"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_2")
For Each i In selectItems
i.Value = "159"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_d")
For Each i In selectItems
i.Value = "01"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_m")
For Each i In selectItems
i.Value = "04"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_y")
For Each i In selectItems
i.Value = "1999"
i.FireEvent ("onchange")
Next i
Application.Wait (Now + TimeValue("0:00:02"))
Set ieDoc = IE.Document
With ieDoc.forms(0)
Call IE.Document.parentWindow.execScript("document.forms[0].submit()", "JavaScript")
.submit
End With
Application.Wait (Now + TimeValue("0:00:03"))
'Start here to extract the href value.
Set internetdata = IE.Document
Set div_result = internetdata.getElementById("ctl00_gvMain_ctl03_hlTitle")
Set header_links = div_result.getElementsByTagName("a")
For Each h In header_links
Set link = h.ChildNodes.Item(0)
Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = link.href
Next
Next x
End Sub
For Each h In header_links
Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = h.href
Next
EDIT: The id attribute is supposed to be unique in the document: there should only be a single element with any given id. So
IE.Document.getElementById("ctl00_gvMain_ctl03_hlTitle").href
should work.
WB.Document.GetElementById("ctl00_gvMain_ctl04_hlTitle").GetAttribute("href").ToString
Use a CSS selector to get the element then access its href attribute.
#ctl00_gvMain_ctl03_hlTitle
The above is element with id ctl00_gvMain_ctl03_hlTitle. "#" means id.
Debug.Print IE.document.querySelector("#ctl00_gvMain_ctl03_hlTitle").href

VBA Script pull data from website

I want to pull the data from http://www.buyshedsdirect.co.uk/ to get the most recent prices of specific items.
I have an excel spreadsheet with the following:
|A | B
1 |Item |Price
2 |bfd/garden-structures/arches/premier-arches-pergola
and the VBA script:
Dim ie As New InternetExplorer
Dim item As String
item = Sheet1.Range("A2").Value
Dim doc As HTMLDocument
ie.Visible = True
ie.navigate "http://www.buyshedsdirect.co.uk/" & item
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
On Error Resume Next
output = doc.getElementByClass("NowValue").innerText
Sheet1.Range("B2").Value = output
ie.Quit
End Sub
i am new to VBA scripting and have no idea why it isnt pulling the value form the class "NowValue"
Any help would be appreciated :)
The On Error Resume Next line is stopping an error message from being displayed. That error message would be that there is no method on HTMLDocument called "getElementByClass". You probably want "getElementsByClassName" instead and will have to handle the fact that this returns a collection rather than a single element. Code like this would work:
Option Explicit
Sub foo()
Dim ie As New InternetExplorer
Dim item As String
item = Sheet1.Range("A2").Value
Dim doc As HTMLDocument
ie.Visible = True
ie.navigate "http://www.buyshedsdirect.co.uk/" & item
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Dim results As IHTMLElementCollection
Dim result As IHTMLElement
Dim output As String
Set results = doc.getElementsByClassName("NowValue")
output = ""
For Each result In results
output = output & result.innerText
Next result
Sheet1.Range("B2").Value = output
ie.Quit
End Sub
You would then find that there are multiple elements with class "NowValue" on that page. It looks as though the one you want might be enclosed in a div called "VariantPrice" so this code should work:
Option Explicit
Sub bar()
Dim ie As New InternetExplorer
Dim item As String
item = Sheet1.Range("A2").Value
Dim doc As HTMLDocument
ie.Visible = True
ie.navigate "http://www.buyshedsdirect.co.uk/" & item
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Dim results As IHTMLElementCollection
Dim results2 As IHTMLElementCollection
Dim result As IHTMLElement
Dim result2 As IHTMLElement
Dim output As String
Set results = doc.getElementsByClassName("VariantPrice")
output = ""
For Each result In results
Set results2 = result.getElementsByClassName("NowValue")
For Each result2 In results2
output = output & result2.innerText
Next result2
Next result
Sheet1.Range("B2").Value = output
ie.Quit
End Sub
edit: as the code above works perfectly for me but fails to work for the question asker, it may be the case that they are using an older version of Internet Explorer which does not support getElementsByClassName. It may be the case that using querySelector will work instead. To be certain, go to this QuirksMode page to determine exactly what your browser supports.
New code using querySelector:
Option Explicit
Sub bar()
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
Dim result As IHTMLElement
Dim result2 As IHTMLElement
Dim item As String
item = Sheet1.Range("A2").Value
ie.Visible = True
ie.navigate "http://www.buyshedsdirect.co.uk/" & item
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Set result = doc.querySelector(".VariantPrice")
Set result2 = result.querySelector(".NowValue")
Sheet1.Range("B2").Value = result2.innerText
ie.Quit
End Sub
further edit: to make the macro loop through all of the entries in column A, here are the relevant bits to add or change:
Option Explicit
Sub bar()
Dim ie As New InternetExplorer
Dim doc As HTMLDocument
Dim result As IHTMLElement
Dim result2 As IHTMLElement
Dim item As String
Dim lRow As Long
ie.Visible = True
lRow = 2
item = Sheet1.Range("A" & lRow).Value
Do Until item = ""
ie.navigate "http://www.buyshedsdirect.co.uk/" & item
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Set result = doc.querySelector(".VariantPrice")
Set result2 = result.querySelector(".NowValue")
Sheet1.Range("B" & lRow).Value = result2.innerText
lRow = lRow + 1
item = Sheet1.Range("A" & lRow).Value
Loop
ie.Quit
End Sub