Working on a html reader for a webpage, the problem is: The webpage got a loading screen. Any way to do this without using Internet Explorer application ?.
While IE.Busy
DoEvents
Wend
Do Until IE.statusText = "Done"
DoEvents
Loop
Do Until IE.readyState = 4
DoEvents
Loop
I would perfer to use something like this reader.
Function GetHTML(URL As String) As String
Dim HTML As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
GetHTML = .responseText
End With
End Function
Any ideas or leads will be most appreciated.
Related
I have a webApp (google sheets) that should transfer data to an Excel file.
In the script i'm currently returning something with ContentService.createTextOutput(), see this link
In the VBA i tried with different codes, for example with
Sub Test()
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", "https://script.google.com/macros/s/AKfycbxU2ZL39IdtMzQXu0OLJZz3shSOx1JNTCbe1_SCxunIimLJVqY/exec", False
WinHttpReq.Send
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.ResponseBody
oStream.SaveToFile "c:\test.txt", 2
oStream.Close
End If
End Sub
With Runtime error 70, I tried also with WinHttp.WinHttpRequest.5.1 but with other errors (I suppose timeout)
I can download the whole file with this link without problems, I see that the ContentService redirect the browser page (and the direct download does not) but I really don't know how to handle it with VBA (and I didn't find anything useful googleing)
Thanks for help
I found right now a solution with InternetExplorer.Application that is working, after hours of attempts and 10 minutes after making the question
Sub Test2()
'This will load a webpage in IE
Dim i As Long
Dim URL As String
Dim IE As Object
'Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
'Set IE.Visible = True to make IE visible, or False for IE to run in the background
IE.Visible = False
'Define URL
URL = "https://script.google.com/macros/s/AKfycbxU2ZL39IdtMzQXu0OLJZz3shSOx1JNTCbe1_SCxunIimLJVqY/exec"
'Navigate to URL
IE.Navigate URL
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
MsgBox (IE.document.body.innerText)
'Unload IE
Set IE = Nothing
End Sub
I've written a script in vba using IE to parse some links from a webpage. The thing is the links are within an iframe. I've twitched my code in such a way so that the script will first find a link within that iframe and navigate to that new page and parse the required content from there. If i do this way then I can get all the links.
Webpage URL: weblink
Successful approach (working one):
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim elem As Object, post As Object
With IE
.Visible = True
.navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set elem = .document.getElementById("compInfo") #it is within iframe
.navigate elem.src
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
For Each post In HTML.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've seen few sites where no such links exist within iframe so, I will have no option to use any link to track down the content.
If you take a look at the below approach by tracking the link then you can notice that I've parsed the content from a webpage which are within Iframe. There is no such link within Iframe to navigate to a new webpage to locate the content. So, I used contentWindow.document instead and found it working flawlessly.
Link to the working code of parsing Iframe content from another site:
contentWindow approach
However, my question is: why should i navigate to a new webpage to collect the links as I can see the content in the landing page? I tried using contentWindow.document but it is giving me access denied error. How can I make my below code work using contentWindow.document like I did above?
I tried like this but it throws access denied error:
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim frm As Object, post As Object
With IE
.Visible = True
.Navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
''the code breaks when it hits the following line "access denied error"
Set frm = HTML.getElementById("compInfo").contentWindow.document
For Each post In frm.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've attached an image to let you know which links (they are marked with pencil) I'm after.
These are the elements within which one such link (i would like to grab) is found:
<div class="news">
<span class="news-date_time"><img src="images/arrow.png" alt="">19 Jan 2018 00:01</span>
<a style="color:#5b5b5b;" href="/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9">ABB India Limited - Press Release</a>
</div>
Image of the links of that page I would like to grab:
From the very first day while creating this thread I strictly requested not to use this url http://hindubusiness.cmlinks.com/Companydetails.aspx?cocode=INE117A01022 to locate the data. I requested any solution from this main_page_link without touching the link within iframe. However, everyone is trying to provide solutions that I've already shown in my post. What did I put a bounty for then?
You can see the links within <iframe> in browser but can't access them programmatically due to Same-origin policy.
There is the example showing how to retrieve the links using XHR and RegEx:
Option Explicit
Sub Test()
Dim sContent As String
Dim sUrl As String
Dim aLinks() As String
Dim i As Long
' Retrieve initial webpage HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/", False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Extract target iframe URL via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Process all a within div.news
.Pattern = "<iframe[\s\S]*?src=""([^""]*?Companydetails[^""]*)""[^>]*>"
sUrl = .Execute(sContent).Item(i).SubMatches(0)
End With
' Retrieve iframe HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sUrl, False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Parse links via XHR
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Process all anchors within div.news
.Pattern = "<div class=""news"">[\s\S]*?href=""([^""]*)"
With .Execute(sContent)
ReDim aLinks(0 To .Count - 1)
For i = 0 To .Count - 1
aLinks(i) = .Item(i).SubMatches(0)
Next
End With
End With
Debug.Print Join(aLinks, vbCrLf)
End Sub
Generally RegEx's aren't recommended for HTML parsing, so there is disclaimer. Data being processed in this case is quite simple that is why it is parsed with RegEx.
The output for me as follows:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17047038016&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17046039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17045039006&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043010019&opt=9
I also tried to copy the content of the <iframe> from IE to clipboard (for further pasting to the worksheet) using commands:
IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
But actually that commands select and copy the main document, excluding the frame, unless I click on the frame manually. So that might be applied if click on the frame could be reproduced from VBA (frame node methods like .focus and .click didn't help).
Something like this should work. They key is to realize the iFrame is technically another Document. Reviewing the iFrame on the page you listed, you can easily use a web request to get at the data you need. As already mentioned, the reason you get an error is due to the Same-Origin policy. You could write something to get the src of the iFrame then do the web request as I've shown below, or, use IE to scrape the page, get the src, then load that page which looks like what you have done.
I would recommend using a web request approach, Internet Explorer can get annoying, fast.
Code
Public Sub SOExample()
Dim html As Object 'To store the HTML content
Dim Elements As Object 'To store the anchor collection
Dim Element As Object 'To iterate the anchor collection
Set html = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP")
'Navigate to the source of the iFrame, it's another page
'View the source for the iframe. Alternatively -
'you could navigate to this page and use IE to scrape it
.Open "GET", "https://stocks.thehindubusinessline.com/Companydetails.aspx?&cocode=INE117A01022"
.send ""
'See if the request was ok, exit it there was an error
If Not .Status = 200 Then Exit Sub
'Assign the page's HTML to an HTML object
html.body.InnerHTML = .responseText
Set Elements = html.body.document.getElementByID("hmstockchart_CompanyNews1_updateGLVV")
Set Elements = Elements.getElementsByTagName("a")
For Each Element In Elements
'Print out the data to the Immediate window
Debug.Print Element.InnerText
Next
End With
End Sub
Results
ABB India Limited - AGM/Book Closure
Board of ABB India recommends final dividend
ABB India to convene AGM
ABB India to pay dividend
ABB India Limited - Outcome of Board Meeting
More ?
The simple of solution like everyone suggested is to directly go the link. This would take the IFRAME out of picture and it would be easier for you loop through links. But in case you still don't like the approach then you need to get a bit deeper into the hole.
Below is a function from a library I wrote long back in VB.NET
https://github.com/tarunlalwani/ScreenCaptureAPI/blob/2646c627b4bb70e36fe2c6603acde4cee3354b39/Source%20Code/ScreenCaptureAPI/ScreenCaptureAPI/ScreenCapture.vb#L803
Private Function _EnumIEFramesDocument(ByVal wb As HTMLDocumentClass) As Collection
Dim pContainer As olelib.IOleContainer = Nothing
Dim pEnumerator As olelib.IEnumUnknown = Nothing
Dim pUnk As olelib.IUnknown = Nothing
Dim pBrowser As SHDocVW.IWebBrowser2 = Nothing
Dim pFramesDoc As Collection = New Collection
_EnumIEFramesDocument = Nothing
pContainer = wb
Dim i As Integer = 0
' Get an enumerator for the frames
If pContainer.EnumObjects(olelib.OLECONTF.OLECONTF_EMBEDDINGS, pEnumerator) = 0 Then
pContainer = Nothing
' Enumerate and refresh all the frames
Do While pEnumerator.Next(1, pUnk) = 0
On Error Resume Next
' Clear errors
Err.Clear()
' Get the IWebBrowser2 interface
pBrowser = pUnk
If Err.Number = 0 Then
pFramesDoc.Add(pBrowser.Document)
i = i + 1
End If
Loop
pEnumerator = Nothing
End If
_EnumIEFramesDocument = pFramesDoc
End Function
So basically this is a VB.NET version of below C++ version
Accessing body (at least some data) in a iframe with IE plugin Browser Helper Object (BHO)
Now you just need to port it to VBA. The only problem you may have is finding the olelib rerefernce. Rest most of it is VBA compatible
So once you get the array of object, you will find the one which belongs to your frame and then you can just that one
frames = _EnumIEFramesDocument(IE)
frames.Item(1).document.getElementsByTagName("A").length
I'm trying to get some data out of this Webpage:
http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html
I want to have the whole table on the right site for every Smartphone and every contract type.
one time payment for Setting up: Anschlusspreis
one time payment for the phone: Smartphone
total amount: Gesamt
Monthly payment for the contract: Basispreis
Monthly payment for the mobile phone: Smartphone-Zuzahlung
This is all stored in the JavaScript part which is a huge amout of letters.
I´m trying to use Excel VBA:
Sub Button1_Click()
'On Error GoTo Errorhandler
Dim ie As Object, doc As Object, rng As Object, ticker As String, quote As String
Set ie = CreateObject("InternetExplorer.Application")
i = 1
'Application.StatusBar = "Your request is loading. Please wait..."
ie.navigate "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
'ie.navigate ticker
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
quote = doc.getElementsByID("connectionFeeVal").innerText
Cells(3, 3).Value = quote
MsgBox ("done")
'Errorhandler:
'If Err = 91 Then MsgBox "Error Message"
ie.Application.Quit
End Sub
But it is continuously Looping at "DoEvents".
Does someone have an idea why and how I can solve this and maybe another idea how to get all this data out of this page.
Thank you in advance.
instead of using the IE automation you can also use a http request object:
Dim oRequest As Object
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.Open "GET", "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
oRequest.Send
MsgBox oRequest.ResponseText
it's faster and doesn't need as many ressources as the solution with the IE
if you are behind a proxy server you can use something like this:
Const HTTPREQUEST_PROXYSETTING_PROXY = 2
Dim oRequest As Object
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
oRequest.setProxy HTTPREQUEST_PROXYSETTING_PROXY, "http://proxy.intern:8080"
oRequest.Open "GET", "http://www.vodafone.de/privat/handys-tablets-tarife/smartphone-tarife.html"
oRequest.Send
MsgBox oRequest.ResponseText
of course you have to adjust the proxy to your values.
As you are interessted in a german page here also a short explanation in german language: http://cboden.de/softwareentwicklung/vba/tipps-tricks/27-webseite-per-vba-ansprechen
There is also explained how to pass values of a form to the webserver which might also be helpfull for you.
Instead of:
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
You could try:
Do While ie.Busy: DoEvents: Loop
Do Until ie.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
with ie
...
end with
I have the following function which returns HTML document for the URL passed. I am using the returned HTML Doc in some other function.
The function works perfectly on Windows 7 but NOT on windows 8 unfortunately. How can I write code which works on Windows 7 and 8 both? I think I need to use a different version of XML HTTP object.
Function GetHtmlDoc(ByVal URL As String) As Object
Dim msg As String
' Reset the Global variables.
PageSrc = ""
Set htmlDoc = Nothing
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, True
.Send
While .readyState <> 4: DoEvents: Wend
a = .statusText
' Check for any server errors.
If .statusText <> "OK" Then
' Check for WinHTTP error.
'msg = GetWinHttpErrorMsg(.Status)
'If msg = "" Then msg = .statusText
' Return the error number and message.
GetPageSource = "ERROR: " & .Status & " - " & msg
Exit Function
End If
' Save the HTML code in the Global variable.
PageSrc = .responseText
End With
' Create an empty HTML Document.
Set htmlDoc = CreateObject("htmlfile")
htmlDoc.Open URL:="text/html", Replace:=False
' Convert the HTML code into an HTML Document Object.
htmlDoc.write PageSrc
' Terminate Input-Output with the Global variable.
htmlDoc.Close
' Return the HTML text of the web page.
Set GetHtmlDoc = htmlDoc
End Function
Example function call:
Set htmlDoc = GetHtmlDoc("http://www.censusdata.abs.gov.au/census_services/getproduct/census/2011/quickstat/POA2155?opendocument&navpos=220")
XMLHTTP no longer likes accessing remote servers from local scripts, switch to ServerXMLHTTP:
With CreateObject("MSXML2.ServerXMLHTTP")
.Open "GET", URL, False
(Using False performs the operation synchronously ngating the need for the readyState loop.)
I had same error turns out my code was working fine, just firewall blocked Excel so it failed.
I'm working with vba in excel 2010 and internet explorer 8 and Vista. The code below works to go to a remote website and post a form. On the resulting page, the code should click the "get estimates" button. Instead I get this error "object variable or with block variable not set". The highlighted problem line in the code is "estimate = ieApp.document.getElementById("btnRequestEstimates")".
I think part of the problem might be that the button that isn't working is a submit button that isn't part of a form. I am also wondering if the variables need to be reset before the 2nd button click. The error message implies this is a qualification problem, but I think it's a pretty standard way of qualifying an element in this situation. Those are some things I've been googling to no avail, I'm not really sure what the problem is.
Sub btn_version()
Dim ieApp As Object
Dim ieDoc As Object
Dim ieForm As Object
Dim ieObj As Object
Dim URL As String
Dim estimate As Object
URL = "http://www.craft-e-corner.com/p-2688-new-testament-cricut-cartridge.aspx"
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.navigate URL
While ieApp.Busy Or ieApp.readyState <> 4: DoEvents: Wend
Set ieDoc = ieApp.document
Set ieForm = ieDoc.forms(1)
For Each ieObj In ieForm.Elements
If ieObj.ClassName = "AddToCartButton" Then
ieObj.Click
End If
Next ieObj
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
While ieApp.Busy Or ieApp.readyState <> 4: DoEvents: Wend
estimate = ieApp.document.getElementById("btnRequestEstimates")
estimate.submit
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
End Sub
The code below combines your xmlhttp code from automate submitting a post form that is on a website with vba and xmlhttp (to give you better control on the POST, ie skipping your Set ieDoc = ieApp.document section in our question) with clicking the "btnRequestEstimates button on the final URl from this page
Sub Scrape2()
Dim objIE As Object
Dim xmlhttp As Object
Dim ieButton As Object
Dim strResponse As String
Dim strUrl As String
strUrl = "http://www.craft-e-corner.com/addtocart.aspx?returnurl=showproduct.aspx%3fProductID%3d2688%26SEName%3dnew-testament-cricut-cartridge"
Set objIE = CreateObject("InternetExplorer.Application")
objIE.navigate "about:blank"
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
'~~> Indicates that page that will receive the request and the type of request being submitted
xmlhttp.Open "POST", "http://www.craft-e-corner.com/addtocart.aspx?returnurl=showproduct.aspx%3fProductID%3d2688%26SEName%3dnew-testament-cricut-cartridge", False
'~~> Indicate that the body of the request contains form data
xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
'~~> Send the data as name/value pairs
xmlhttp.Send "Quantity=1&VariantID=2705&ProductID=2688"
strResponse = xmlhttp.responseText
objIE.navigate strUrl
objIE.Visible = True
Do While objIE.readystate <> 4
DoEvents
Loop
objIE.document.Write strResponse
Set xmlhttp = Nothing
Set ieButton = objIE.document.getelementbyid("btnRequestEstimates")
ieButton.Click
End Sub