Get data out of a webpage with VBA

I'm trying to get some data out of this webpage:
http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html
I want the whole table on the right side for every smartphone and every contract type:
One-time setup fee: Anschlusspreis
One-time payment for the phone: Smartphone
Total amount: Gesamt
Monthly payment for the contract: Basispreis
Monthly payment for the phone: Smartphone-Zuzahlung
All of this is stored in the JavaScript part of the page, which is a huge amount of text.
I'm trying to use Excel VBA:
Sub Button1_Click()
'On Error GoTo Errorhandler
Dim ie As Object, doc As Object, rng As Object, ticker As String, quote As String
Set ie = CreateObject("InternetExplorer.Application")
i = 1
'Application.StatusBar = "Your request is loading. Please wait..."
ie.navigate "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
'ie.navigate ticker
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
quote = doc.getElementsByID("connectionFeeVal").innerText
Cells(3, 3).Value = quote
MsgBox ("done")
'Errorhandler:
'If Err = 91 Then MsgBox "Error Message"
ie.Application.Quit
End Sub
But it loops forever at "DoEvents".
Does anyone have an idea why, and how I can solve this? Or maybe another idea for getting all this data out of the page?
Thank you in advance.

Instead of using IE automation, you can also use an HTTP request object:
Dim oRequest As Object
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.Open "GET", "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
oRequest.Send
MsgBox oRequest.ResponseText
It's faster and needs fewer resources than the IE-based solution.
If you are behind a proxy server, you can use something like this:
Const HTTPREQUEST_PROXYSETTING_PROXY = 2
Dim oRequest As Object
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.setProxy HTTPREQUEST_PROXYSETTING_PROXY, "http://proxy.intern:8080"
oRequest.Open "GET", "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
oRequest.Send
MsgBox oRequest.ResponseText
Of course, you have to adjust the proxy to your own values.
Since you are interested in a German page, here is also a short explanation in German: http://cboden.de/softwareentwicklung/vba/tipps-tricks/27-webseite-per-vba-ansprechen
It also explains how to pass form values to the web server, which might be helpful for you too.
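Whichever request object you use, the prices the question is after are embedded in the page's JavaScript, so ResponseText still has to be searched as a plain string. A minimal sketch with VBScript.RegExp; the helper name FirstSubMatch is hypothetical, and any pattern you pass it is an assumption about how the script stores a value, so look at the real ResponseText first.

```vba
' Returns the first capture group of the first match of regexPattern in text,
' or an empty string when nothing matches.
Public Function FirstSubMatch(ByVal text As String, ByVal regexPattern As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = regexPattern
    re.IgnoreCase = True
    re.Global = False          ' only the first match is needed
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        If matches(0).SubMatches.Count > 0 Then FirstSubMatch = matches(0).SubMatches(0)
    End If
End Function
```

For example, `Cells(3, 3).Value = FirstSubMatch(oRequest.ResponseText, "connectionFeeVal[^0-9]*([0-9]+,[0-9]{2})")`, where the pattern is purely illustrative.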

Instead of:
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
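You could try:
Do While ie.Busy: DoEvents: Loop
Do Until ie.ReadyState = 4: DoEvents: Loop ' 4 = READYSTATE_COMPLETE; the named constant only exists with a reference to Microsoft Internet Controls
and, to avoid repeating the object name, put the calls in a With block:
With ie
...
End With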
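The original loop most likely never ends because READYSTATE_COMPLETE is a constant from the "Microsoft Internet Controls" type library. With CreateObject and no such reference, VBA (without Option Explicit) treats the name as an undeclared, empty Variant, so `ie.readyState = READYSTATE_COMPLETE` compares 4 with Empty and is never True. Below is a sketch of a wait helper that declares the value itself and gives up after a timeout; WaitForBrowser is a hypothetical name and 30 seconds is an arbitrary default. Note also that the DOM method is getElementById, not getElementsByID.

```vba
Option Explicit

' Only defined by the "Microsoft Internet Controls" reference; declare it for late binding.
Private Const READYSTATE_COMPLETE As Long = 4

' Waits until the browser is idle and the document is fully loaded.
' Returns False if that does not happen within timeoutSeconds.
Public Function WaitForBrowser(ByVal ie As Object, Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#     ' convert seconds to a fraction of a day
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        If Now > deadline Then Exit Function     ' result stays False
        DoEvents
    Loop
    WaitForBrowser = True
End Function
```

Call it right after `ie.navigate`, e.g. `If Not WaitForBrowser(ie) Then Exit Sub`. Adding Option Explicit to the module turns the undefined-constant mistake into a compile error instead of an endless loop.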

Related

Why is XMLHTTP.readyState not equal to 4?

Running the following code, I get .readyState = 1 and I don't understand why. Can anyone help? More generally, I'm looking for documentation on using MSXML2.ServerXMLHTTP or XMLHTTP60. Is there a beginner-friendly website on this topic?
Thanks in advance!
Sub a()
Dim wwwPage As Object
Dim Email As String
DoEvents
On Error Resume Next
Set wwwPage = CreateObject("MSXML2.ServerXMLHTTP")
With wwwPage
.Open "GET", "https://runninggeek.be/annuaire-des-clubs/", False
.send
Position = 1
If .Status = 200 And .readyState = 4 Then
Email = "xxx"
End If
End With
End Sub
I am expecting .readyState 4. I see that sometimes we need to set headers, but I don't know why or how, or even how to find out which header to use.
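readyState 1 means open was called but send never completed. Because of On Error Resume Next, whatever error .send raised is silently skipped and the code carries on with a half-finished request. A minimal sketch that reports the error instead; TryGet is a hypothetical helper, and the User-Agent header is an assumption (some servers refuse requests that don't look like they come from a browser), not something MSXML requires.

```vba
' Fetches url synchronously and returns "Status: ..." on success
' or "Error: <description>" when open/send raises an error.
Public Function TryGet(ByVal url As String) As String
    Dim http As Object
    On Error GoTo Failed
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    http.Open "GET", url, False            ' False = wait for the response
    ' Assumed header; some sites reject requests without a browser-like User-Agent
    http.setRequestHeader "User-Agent", "Mozilla/5.0"
    http.send
    TryGet = "Status: " & http.Status & " (readyState " & http.readyState & ")"
    Exit Function
Failed:
    TryGet = "Error: " & Err.Description
End Function
```

`Debug.Print TryGet("https://runninggeek.be/annuaire-des-clubs/")` then shows either the HTTP status or the error text, which is usually enough to tell whether headers, TLS or something else is the problem.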

Download google script results to Excel VBA

I have a web app (Google Sheets) that should transfer data to an Excel file.
In the script I'm currently returning something with ContentService.createTextOutput(); see this link.
In VBA I tried different code, for example:
Sub Test()
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", "https://script.google.com/macros/s/AKfycbxU2ZL39IdtMzQXu0OLJZz3shSOx1JNTCbe1_SCxunIimLJVqY/exec", False
WinHttpReq.Send
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.ResponseBody
oStream.SaveToFile "c:\test.txt", 2
oStream.Close
End If
End Sub
This fails with runtime error 70. I also tried WinHttp.WinHttpRequest.5.1, but got other errors (I suppose a timeout).
I can download the whole file with this link without problems. I see that ContentService redirects the browser page (and the direct download does not), but I really don't know how to handle that in VBA (and I didn't find anything useful by googling).
Thanks for your help.
I have just found a solution with InternetExplorer.Application that works, after hours of attempts and 10 minutes after asking the question:
Sub Test2()
'This will load a webpage in IE
Dim i As Long
Dim URL As String
Dim IE As Object
'Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = False
'Define URL
URL = "https://script.google.com/macros/s/AKfycbxU2ZL39IdtMzQXu0OLJZz3shSOx1JNTCbe1_SCxunIimLJVqY/exec"
'Navigate to URL
IE.Navigate URL
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
MsgBox (IE.document.body.innerText)
'Close IE; setting the variable to Nothing alone leaves the hidden IE process running
IE.Quit
Set IE = Nothing
End Sub
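Run-time error 70 is VBA's "Permission denied". Two plausible causes in the first attempt: the client-side Microsoft.XMLHTTP object refusing to follow the redirect that Apps Script uses for ContentService output (as far as I know it points to another host, script.googleusercontent.com), and SaveToFile writing to the root of C:, which usually needs administrator rights. WinHttpRequest follows redirects by default, so a sketch without IE could look like this; DownloadToFile and SaveBytes are hypothetical names and the timeout values are arbitrary.

```vba
' Downloads url and saves the raw response bytes to filePath.
' Returns False if the final status is not 200.
Public Function DownloadToFile(ByVal url As String, ByVal filePath As String) As Boolean
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    ' Timeouts in milliseconds: resolve, connect, send, receive
    http.SetTimeouts 10000, 10000, 30000, 60000
    http.Open "GET", url, False
    http.Send
    If http.Status <> 200 Then Exit Function
    SaveBytes filePath, http.ResponseBody
    DownloadToFile = True
End Function

' Writes a byte array to filePath, overwriting an existing file.
Public Sub SaveBytes(ByVal filePath As String, ByVal bytes As Variant)
    With CreateObject("ADODB.Stream")
        .Type = 1                  ' adTypeBinary
        .Open
        .Write bytes
        .SaveToFile filePath, 2    ' adSaveCreateOverWrite
        .Close
    End With
End Sub
```

Saving to a user-writable folder avoids the permission issue, e.g. `DownloadToFile(URL, Environ$("TEMP") & "\test.txt")`.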

Unable to parse some links lying within an iframe

I've written a script in VBA using IE to parse some links from a webpage. The thing is, the links are within an iframe. I've tweaked my code so that the script first finds the iframe's link, navigates to that new page and parses the required content from there. Done this way, I can get all the links.
Webpage URL: weblink
Successful approach (working one):
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim elem As Object, post As Object
With IE
.Visible = True
.navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set elem = .document.getElementById("compInfo") ' it is within the iframe
.navigate elem.src
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
For Each post In HTML.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've seen a few sites where no such link exists within the iframe, so I would have no link to use to track down the content.
If you look at the approach linked below, you can see that I've parsed content from a webpage that sits within an iframe. There is no link within that iframe to navigate to a new page to locate the content, so I used contentWindow.document instead and it worked flawlessly.
Link to the working code that parses iframe content from another site:
contentWindow approach
However, my question is: why should I navigate to a new webpage to collect the links when I can see the content on the landing page? I tried using contentWindow.document, but it gives me an "access denied" error. How can I make my code below work using contentWindow.document, as I did above?
I tried this, but it throws an "access denied" error:
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim frm As Object, post As Object
With IE
.Visible = True
.Navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
''the code breaks when it hits the following line "access denied error"
Set frm = HTML.getElementById("compInfo").contentWindow.document
For Each post In frm.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've attached an image showing which links (marked with a pencil) I'm after.
These are the elements within which one such link (that I'd like to grab) is found:
<div class="news">
<span class="news-date_time"><img src="images/arrow.png" alt="">19 Jan 2018 00:01</span>
<a style="color:#5b5b5b;" href="/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9">ABB India Limited - Press Release</a>
</div>
Image of the links of that page I would like to grab:
From the very first day of this thread, I explicitly asked not to use this URL http://hindubusiness.cmlinks.com/Companydetails.aspx?cocode=INE117A01022 to locate the data. I asked for a solution that starts from the main page link without touching the link within the iframe. However, everyone is providing solutions I've already shown in my post. What did I put up a bounty for, then?
You can see the links within the <iframe> in the browser, but you can't access them programmatically because of the same-origin policy.
Here is an example showing how to retrieve the links using XHR and RegEx:
Option Explicit
Sub Test()
Dim sContent As String
Dim sUrl As String
Dim aLinks() As String
Dim i As Long
' Retrieve initial webpage HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/", False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Extract target iframe URL via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Match the src of the iframe that points to Companydetails
.Pattern = "<iframe[\s\S]*?src=""([^""]*?Companydetails[^""]*)""[^>]*>"
sUrl = .Execute(sContent).Item(0).SubMatches(0)
End With
' Retrieve iframe HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sUrl, False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Parse links via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Process all anchors within div.news
.Pattern = "<div class=""news"">[\s\S]*?href=""([^""]*)"
With .Execute(sContent)
If .Count = 0 Then Exit Sub ' no links found; ReDim aLinks(0 To -1) would fail
ReDim aLinks(0 To .Count - 1)
For i = 0 To .Count - 1
aLinks(i) = .Item(i).SubMatches(0)
Next
End With
End With
Debug.Print Join(aLinks, vbCrLf)
End Sub
Regular expressions generally aren't recommended for HTML parsing, hence this disclaimer. The data in this case is quite simple, which is why it is parsed with RegEx.
The output for me is as follows:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17047038016&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17046039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17045039006&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043010019&opt=9
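If you'd rather not rely on regular expressions, the same div.news links can be read with the htmlfile document acting as a parser once the iframe HTML has been fetched. A sketch assuming the class attribute is exactly "news"; NewsLinks is a hypothetical name. The flag 2 passed to getAttribute asks IE for the attribute value as written in the source, so relative hrefs are not resolved against about:.

```vba
' Collects the href of the first <a> inside every <div class="news"> in html.
Public Function NewsLinks(ByVal html As String) As Collection
    Dim doc As Object, div As Object, anchors As Object
    Set NewsLinks = New Collection
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    For Each div In doc.getElementsByTagName("div")
        If div.className = "news" Then         ' exact match only
            Set anchors = div.getElementsByTagName("a")
            If anchors.Length > 0 Then
                ' "" & ... turns a missing (Null) attribute into an empty string
                NewsLinks.Add "" & anchors(0).getAttribute("href", 2)
            End If
        End If
    Next div
End Function
```

Usage: `For Each v In NewsLinks(sContent): Debug.Print v: Next` after the iframe HTML has been retrieved.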
I also tried to copy the content of the <iframe> from IE to the clipboard (for pasting into the worksheet afterwards) using these commands:
IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
But these commands actually select and copy the main document, excluding the frame, unless I click on the frame manually. So this approach might work if the click on the frame could be reproduced from VBA (frame node methods like .focus and .click didn't help).
Something like this should work. The key is to realize that the iframe is technically another document. Looking at the iframe on the page you listed, you can easily use a web request to get the data you need. As already mentioned, the reason you get an error is the same-origin policy. You could write something to get the src of the iframe and then make the web request as shown below, or use IE to scrape the page, get the src and then load that page, which looks like what you have done.
I would recommend the web request approach; Internet Explorer can get annoying, fast.
Code
Public Sub SOExample()
Dim html As Object 'To store the HTML content
Dim Elements As Object 'To store the anchor collection
Dim Element As Object 'To iterate the anchor collection
Set html = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP")
'Navigate to the source of the iFrame, it's another page
'View the source for the iframe. Alternatively -
'you could navigate to this page and use IE to scrape it
'False = synchronous, so .Status is available as soon as .send returns
.Open "GET", "https://stocks.thehindubusinessline.com/Companydetails.aspx?&cocode=INE117A01022", False
.send ""
'See if the request was OK; exit if there was an error
If Not .Status = 200 Then Exit Sub
'Assign the page's HTML to an HTML object
html.body.InnerHTML = .responseText
Set Elements = html.getElementById("hmstockchart_CompanyNews1_updateGLVV")
Set Elements = Elements.getElementsByTagName("a")
For Each Element In Elements
'Print out the data to the Immediate window
Debug.Print Element.InnerText
Next
End With
End Sub
Results
ABB India Limited - AGM/Book Closure
Board of ABB India recommends final dividend
ABB India to convene AGM
ABB India to pay dividend
ABB India Limited - Outcome of Board Meeting
More ?
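To stay on the landing page as the question asks, the iframe's URL can be read from that page's HTML instead of being hard-coded, and then fetched with the same XHR code. A sketch; IframeSrcById is hypothetical, and it assumes the iframe still carries the id compInfo used in the question. A relative src would still need the site's origin prepended.

```vba
' Returns the src attribute of the <iframe> with the given id, or "" if it is absent.
Public Function IframeSrcById(ByVal html As String, ByVal iframeId As String) As String
    Dim doc As Object, frames As Object, i As Long
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    Set frames = doc.getElementsByTagName("iframe")
    For i = 0 To frames.Length - 1
        If frames(i).id = iframeId Then
            ' Flag 2: value as written in the source, not resolved against about:
            IframeSrcById = "" & frames(i).getAttribute("src", 2)
            Exit Function
        End If
    Next i
End Function
```

Usage: fetch the landing page with MSXML2.XMLHTTP, then `sUrl = IframeSrcById(.responseText, "compInfo")` and request sUrl.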
The simple solution, as everyone suggested, is to go directly to the link. This takes the iframe out of the picture and makes it easier to loop through the links. But if you still don't like that approach, you need to dig a bit deeper.
Below is a function from a library I wrote a long time ago in VB.NET:
https://github.com/tarunlalwani/ScreenCaptureAPI/blob/2646c627b4bb70e36fe2c6603acde4cee3354b39/Source%20Code/ScreenCaptureAPI/ScreenCaptureAPI/ScreenCapture.vb#L803
Private Function _EnumIEFramesDocument(ByVal wb As HTMLDocumentClass) As Collection
Dim pContainer As olelib.IOleContainer = Nothing
Dim pEnumerator As olelib.IEnumUnknown = Nothing
Dim pUnk As olelib.IUnknown = Nothing
Dim pBrowser As SHDocVW.IWebBrowser2 = Nothing
Dim pFramesDoc As Collection = New Collection
_EnumIEFramesDocument = Nothing
pContainer = wb
Dim i As Integer = 0
' Get an enumerator for the frames
If pContainer.EnumObjects(olelib.OLECONTF.OLECONTF_EMBEDDINGS, pEnumerator) = 0 Then
pContainer = Nothing
' Enumerate and refresh all the frames
Do While pEnumerator.Next(1, pUnk) = 0
On Error Resume Next
' Clear errors
Err.Clear()
' Get the IWebBrowser2 interface
pBrowser = pUnk
If Err.Number = 0 Then
pFramesDoc.Add(pBrowser.Document)
i = i + 1
End If
Loop
pEnumerator = Nothing
End If
_EnumIEFramesDocument = pFramesDoc
End Function
So basically this is a VB.NET version of the C++ code in:
Accessing body (at least some data) in a iframe with IE plugin Browser Helper Object (BHO)
Now you just need to port it to VBA. The only problem you may have is finding the olelib reference; most of the rest is VBA-compatible.
Once you get the collection of frame documents, find the one that belongs to your frame and use just that one:
frames = _EnumIEFramesDocument(IE.Document)
frames.Item(1).getElementsByTagName("A").length

VBA WebScraping content from nested frames: access denied

This isn't an important task, but it's one I thought would be easy and has instead been very frustrating. I'm trying to grab the current print counter values from our MFP, but I can't get past the second frame on the page.
In fact, it seems to keep looping me back to the top as I try to go "deeper" into the page.
The data I'm after is stored in the "contents" frame.
HTML Source - Nested Frames
Sub Copy_Count()
Dim IE As InternetExplorerMedium
Dim strURL() As String
Dim HTML_Doc As HTMLDocument
Dim HTML_Doc2 As HTMLDocument
' There will be additional machines to retrieve data from, current copier must navigate to main page first before counter.
strURL = Split("http://192.168.50.26/?MAIN=DEVICE,http://192.168.50.26/?MAIN=COUNTER&SUB=TOTAL", ",")
Set IE = New InternetExplorerMedium
IE.Navigate2 strURL(0)
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
IE.Navigate2 strURL(1)
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
Set HTML_Doc = IE.Document
' This gets me to "TopLevelFrame"
Debug.Print HTML_Doc.getElementsByTagName("frameset")(0).getElementsByTagName("frame")(0).Document.Body.innerHTML
' Assign the document of the TopLevelFrame
Set HTML_Doc2 = HTML_Doc.getElementsByTagName("frameset")(0).getElementsByTagName("frame")(0).Document
Debug.Print HTML_Doc2.getElementById("TotalFullColor")
Debug.Print HTML_Doc.getElementById("TotalFullColor").innerText
Debug.Print IE.Document.GetElementsByID("TotalFullColor")(1)
End Sub
Windows 10 Pro, IE 11, Office Pro Plus 2016
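One likely reason for being sent back to the top: in IE's DOM, an element's non-standard .document property returns the document that contains the element, not the document loaded inside a frame, so `getElementsByTagName("frame")(0).Document` is just HTML_Doc again. Frame content is reached through the window's frames collection instead (or via .contentWindow.document). A sketch of a recursive search; FindInFrames is a hypothetical helper, and it only works while every frame comes from the same origin, which should hold here since everything is served by 192.168.50.26.

```vba
' Searches doc and, recursively, every frame inside it for an element with the given id.
' Returns Nothing if not found. Cross-origin frames would raise "access denied".
Public Function FindInFrames(ByVal doc As Object, ByVal elementId As String) As Object
    Dim el As Object, found As Object, i As Long
    Set el = doc.getElementById(elementId)
    If Not el Is Nothing Then
        Set FindInFrames = el
        Exit Function
    End If
    For i = 0 To doc.frames.Length - 1
        ' frames(i) is a window object; its .document is the frame's own document
        Set found = FindInFrames(doc.frames(i).document, elementId)
        If Not found Is Nothing Then
            Set FindInFrames = found
            Exit Function
        End If
    Next i
    Set FindInFrames = Nothing
End Function
```

After the counter page has loaded: `Debug.Print FindInFrames(IE.Document, "TotalFullColor").innerText`.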

VBA access remote website, automate clicking submit after already automating clicking button on other webpage

I'm working with VBA in Excel 2010, Internet Explorer 8 and Vista. The code below goes to a remote website and posts a form. On the resulting page, the code should click the "get estimates" button. Instead I get the error "Object variable or With block variable not set". The highlighted line is estimate = ieApp.document.getElementById("btnRequestEstimates").
I think part of the problem might be that the button that isn't working is a submit button that isn't part of a form. I'm also wondering whether the variables need to be reset before the second button click. The error message suggests a qualification problem, but I think this is a pretty standard way of qualifying an element in this situation. I've been googling these ideas to no avail; I'm not really sure what the problem is.
Sub btn_version()
Dim ieApp As Object
Dim ieDoc As Object
Dim ieForm As Object
Dim ieObj As Object
Dim URL As String
Dim estimate As Object
URL = "http://www.craft-e-corner.com/p-2688-new-testament-cricut-cartridge.aspx"
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.navigate URL
While ieApp.Busy Or ieApp.readyState <> 4: DoEvents: Wend
Set ieDoc = ieApp.document
Set ieForm = ieDoc.forms(1)
For Each ieObj In ieForm.Elements
If ieObj.ClassName = "AddToCartButton" Then
ieObj.Click
End If
Next ieObj
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
While ieApp.Busy Or ieApp.readyState <> 4: DoEvents: Wend
estimate = ieApp.document.getElementById("btnRequestEstimates")
estimate.submit
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Sub
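The error on the highlighted line is what VBA raises when an object reference is assigned without Set: `estimate = ...` tries to assign to the default property of an object variable that is still Nothing, which gives run-time error 91. A button element also has no submit method; Click is what presses it. A small sketch; ClickById is a hypothetical helper that also copes with the button not being present yet.

```vba
' Clicks the element with the given id and returns True,
' or returns False when the element is not (yet) in the document.
Public Function ClickById(ByVal doc As Object, ByVal elementId As String) As Boolean
    Dim el As Object
    Set el = doc.getElementById(elementId)   ' object assignment needs Set
    If el Is Nothing Then Exit Function
    el.Click
    ClickById = True
End Function
```

In the question's code this would replace the last two lines with `If Not ClickById(ieApp.document, "btnRequestEstimates") Then MsgBox "Button not found"`.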
The code below combines your xmlhttp code from "automate submitting a post form that is on a website with vba and xmlhttp" (which gives you better control over the POST, i.e. it skips the Set ieDoc = ieApp.document section in your question) with clicking the btnRequestEstimates button on the final URL from this page.
Sub Scrape2()
Dim objIE As Object
Dim xmlhttp As Object
Dim ieButton As Object
Dim strResponse As String
Dim strUrl As String
strUrl = "http://www.craft-e-corner.com/addtocart.aspx?returnurl=showproduct.aspx%3fProductID%3d2688%26SEName%3dnew-testament-cricut-cartridge"
Set objIE = CreateObject("InternetExplorer.Application")
objIE.navigate "about:blank"
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
'~~> Indicates that page that will receive the request and the type of request being submitted
xmlhttp.Open "POST", strUrl, False
'~~> Indicate that the body of the request contains form data
xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
'~~> Send the data as name/value pairs
xmlhttp.Send "Quantity=1&VariantID=2705&ProductID=2688"
strResponse = xmlhttp.responseText
objIE.navigate strUrl
objIE.Visible = True
Do While objIE.readystate <> 4
DoEvents
Loop
objIE.document.Write strResponse
Set xmlhttp = Nothing
Set ieButton = objIE.document.getelementbyid("btnRequestEstimates")
ieButton.Click
End Sub