Reading HTML file in VBA Excel - vba

I want to read the HTML code of a page in VBA (something like URL in Java). I need to save it in a string so that I can parse it afterwards.
alpan67

Here's a function for you. It returns the HTML served at a given URL as a String.
Function GetHTML(URL As String) As String
    ' Late-bound, synchronous GET request; no library reference is needed.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .Send
        GetHTML = .ResponseText
    End With
End Function
Just make sure the URL you provide is well formed, i.e. that it includes the http:// or https:// prefix as appropriate.
For example, GetHTML("www.google.com") is incorrect; you would want GetHTML("https://www.google.com/").

For the URL decoding, you may use that post as a reference.
Here is an article that uses MS Word to save the HTML of a web page as text.
Code in Excel VBA, from VBAExpress: I would rather not copy his code here. You may give it a try and comment.

Related

VB macro changing encoding on REST request

I have an Excel file with a button associated with a VB macro like this:
Sub button_macro()
    Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    MyRequest.Open "GET", "http://theurl/service?param=""Gestión"""
    MyRequest.Send
End Sub
But the response I get is something like:
"Gestión" is not a valid value for param.
How can I avoid VB converting the 'ó' character to another encoding?
If I send the request via browser like:
http://theurl/service?param="Gestión"
The service answers as desired.
EDIT:
Curiously MsgBox "Código único" works as expected showing the 'ó' and 'ú' characters correctly.
It works in the browser because the browser encodes the request for you. You can see that if you use your browser's developer tools, e.g. for your example URL of
http://theurl/service?param="Gestión"
I see
http://theurl/service?param=%22Gesti%C3%B3n%22
You can encode your parameters using, for example, the UTF-8-supporting method from the accepted answer here: How can I URL encode a string in Excel VBA?
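If you are on Excel 2013 or later for Windows, one way to sketch this without a custom encoder is the built-in ENCODEURL worksheet function, which VBA exposes as WorksheetFunction.EncodeURL. The helper name BuildServiceUrl is made up for illustration, the URL is the placeholder from the question, and the idea that the service expects UTF-8 percent-encoding is an assumption based on what the browser sends.

```vba
' Hypothetical helper: appends a percent-encoded parameter to the
' placeholder service URL from the question.
' Assumes Excel 2013+ on Windows, where WorksheetFunction.EncodeURL exists.
Function BuildServiceUrl(ByVal param As String) As String
    BuildServiceUrl = "http://theurl/service?param=" & _
                      Application.WorksheetFunction.EncodeURL(param)
End Function

Sub button_macro_encoded()
    Dim MyRequest As Object
    Dim param As String

    ' ChrW(243) is the accented o; building the string this way avoids
    ' depending on the code page of the VBA editor.
    param = """Gesti" & ChrW(243) & "n"""

    Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
    MyRequest.Open "GET", BuildServiceUrl(param)
    MyRequest.Send
    Debug.Print MyRequest.ResponseText
End Sub
```

On older Excel versions, or outside Excel, a hand-written encoder such as the one in the linked answer would be needed instead.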

Why do i get "Argument is not optional" error on simple url call?

My goal is to check the internet connection in VBA, but when I try to call a URL with Request.Open I get the error message "Argument is not optional".
Sadly, searching the internet has not yielded any results.
This is my code:
On Error GoTo NoConnectionErrorHandling
Dim Request As MSXML2.XMLHTTP60
Request.Open "http://www.google.com"
Request.send
MsgBox Request.Status
It fails on the third line of the code shown.
I hope someone can help me, as I have very little experience with VBA so far.
You have to specify the type of the request. It can be "GET", "POST" or something else.
See here for the open Method:
https://msdn.microsoft.com/en-us/library/ms757849(v=vs.85).aspx
These are the parameters:
bstrMethod
The HTTP method used to open the connection, such as GET, POST, PUT, or PROPFIND. For XMLHTTP, this parameter is not case-sensitive. The verbs TRACE and TRACK are not allowed when IXMLHTTPRequest is hosted in the browser. What is the difference between POST and GET?
bstrUrl
The requested URL. This can be either an absolute URL, such as "http://Myserver/Mypath/Myfile.asp", or a relative URL, such as "../MyPath/MyFile.asp".
varAsync[optional]
bstrUser[optional]
bstrPassword[optional]
As you can see, the method needs at least two parameters (the other three are optional), so passing only one is not enough.
You also need to declare Request with the keyword New. Something like the following works:
Public Sub TestMe()
    Dim Request As New MSXML2.XMLHTTP60
    ' False makes the call synchronous, so Status is available right after send.
    Request.Open "GET", "http://www.bbc.com", False
    Request.send
    MsgBox Request.Status
End Sub
Whenever you use libraries outside the standard VBA libraries, it is a good idea to do one of the following two things:
State which library reference is required (here, VBE > Tools > References > Microsoft XML, v6.0).
Use late binding:
Dim Request As Object
Set Request = CreateObject("Msxml2.ServerXMLHTTP.6.0")
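Putting the pieces above together, a connection check could be sketched roughly as follows. The function name HasInternetConnection is made up for illustration, and treating only HTTP status 200 as "connected" is an assumption; adjust both to your needs.

```vba
' Hypothetical helper: returns True if a synchronous GET to the test URL
' completes with HTTP status 200, and False on any error.
' Uses late binding, so no library reference is required.
Public Function HasInternetConnection() As Boolean
    Dim Request As Object

    On Error GoTo NoConnection
    Set Request = CreateObject("MSXML2.XMLHTTP.6.0")
    Request.Open "GET", "http://www.google.com", False   ' method first, then URL
    Request.send
    HasInternetConnection = (Request.Status = 200)
    Exit Function

NoConnection:
    HasInternetConnection = False
End Function
```

It can then be called as, for example, If Not HasInternetConnection() Then MsgBox "No connection".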

Web-Crawler for VBA

I am trying to program a web crawler using Visual Basic. I have a list of links stored in an Excel sheet (column 1). The macro should open each link and add certain information from the website to the Excel file.
Here's the first link (stored in field A2).
The Macro should identify and insert the name of the hotel into column 2 (B2), the rating in column 3 (C2) and the address in column 4 (D2). This process could then be repeated with a loop for all other links (all websites have the same structure).
My code so far (I did not add the loop yet):
Sub Hoteldetails()
Dim IEexp As Object
Set IEexp = CreateObject("InternetExplorer.Application")
IEexp.Visible = True
Range("A2").Select
Selection.Hyperlinks(1).Follow NewWindow:=False, AddHistory:=True
End Sub
How can I "select" the specific data I want and insert it into the excel file? I tried to record the macro via "Add Data", but was not able to import the data from the website. I also tried to do it by using various example codes, but it did not work out for my specific website.
Thanks a lot for any assistance!
tl;dr;
I am not going to do all the work for you but this is fairly easy if the pages have the same structure.
You can issue a browserless XMLHTTP request, to get a nice fast response, and then select the items of interest using either id or classname and collection index.
Here is an example, using the link you provided, which you can adapt into a loop over all links.
VBA:
Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, HTML As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.tripadvisor.co.uk/Hotel_Review-g198832-d236315-Reviews-Grand_Hotel_Kronenhof-Pontresina_Engadin_St_Moritz_Canton_of_Graubunden_Swiss_Alps.html", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    With HTML
        .body.innerHTML = sResponse
        Debug.Print "HotelName: " & .getElementById("HEADING").innerText
        Debug.Print "Address: " & .getElementsByClassName("detail")(0).innerText
        Debug.Print "Rating: " & .getElementsByClassName("overallRating")(0).innerText
    End With
End Sub
References:
VBE > Tools > References > Microsoft HTML Object Library
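As a rough sketch of how the example above might be adapted into a loop, the following reads each link from column A starting at row 2 and writes name, rating and address to columns B to D. The names GetInfoForAllLinks and TrimToDoctype are made up for illustration, and it assumes every page still exposes the HEADING id and the overallRating and detail class names used above; a missing element would raise a run-time error.

```vba
' Hypothetical helper: drops anything before the doctype, as in the code
' above, but leaves the text unchanged when no doctype is found.
Public Function TrimToDoctype(ByVal s As String) As String
    Dim pos As Long
    pos = InStr(1, s, "<!DOCTYPE ")
    If pos > 0 Then TrimToDoctype = Mid$(s, pos) Else TrimToDoctype = s
End Function

' Requires the same Microsoft HTML Object Library reference as above.
Public Sub GetInfoForAllLinks()
    Dim ws As Worksheet, r As Long, lastRow As Long
    Dim url As String, sResponse As String
    Dim HTML As HTMLDocument

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    For r = 2 To lastRow
        ' Use the hyperlink target if the cell has one, else the cell text.
        If ws.Cells(r, 1).Hyperlinks.Count > 0 Then
            url = ws.Cells(r, 1).Hyperlinks(1).Address
        Else
            url = ws.Cells(r, 1).Value
        End If

        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", url, False
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With

        ' Fresh document per page so results do not carry over.
        Set HTML = New HTMLDocument
        With HTML
            .body.innerHTML = TrimToDoctype(sResponse)
            ws.Cells(r, 2).Value = .getElementById("HEADING").innerText
            ws.Cells(r, 3).Value = .getElementsByClassName("overallRating")(0).innerText
            ws.Cells(r, 4).Value = .getElementsByClassName("detail")(0).innerText
        End With
    Next r
End Sub
```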
You have several options:
Option 1: IEObject
With the IE object, use the getElementBy methods and then string manipulation to extract the data you need. There are two options for the string extraction:
Extract a top-level element by name or by ID, then use string manipulation functions such as Mid, InStr, Left and Right.
Use Regex (the VBScript RegExp object, available from VBA) to extract the data (recommended).
Option 2: Scrape HTML Add-In
Some time ago I developed an add-in for Excel that lets you scrape HTML data from within an Excel formula. The process is similar to the one above, as you still need to create a suitable Regex. See an example below for TripAdvisor:
The formula in B2 looks like this (A2 is the link, and the second argument is the Regex):
=GetElementByRegex(A2;"<h1 id=""HEADING"".*?>(?:(?:.|\n)*?)</div>((?:.|\n)*?)</h1>")
You can download the AddIn here:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/

Use live feeds (RSS?) to convert USD to GBP in VB form

I've searched the web and can't find the exact thing that I'm looking for - this could be because it doesn't exist, but I'll ask here anyway...
I want to use a pretty simple form written in VB. I've worked with things like RSS feeds before, but only in an HTML environment. I was wondering if there is a way to have a section of the form act as a USD -> GBP converter, using a live exchange rate. Is this possible? If so, does anyone know a good source to get the live feed from?
Any ideas, code, suggestions and criticism are welcome.
Thanks for your time.
Cal.
The simplest way is to query the Yahoo currency converter using this function:
Public Function GetHTML(ByVal sURL As String) As String
    'Function returns the contents of the web page at sURL
    '(or the contents of the 404/error info, if sURL is
    'invalid, incorrectly formatted, etc.)
    Dim xmlHttp As Object
    Set xmlHttp = CreateObject("MSXML2.ServerXmlHttp")
    xmlHttp.Open "GET", sURL, False
    xmlHttp.send
    GetHTML = xmlHttp.responseText
    Set xmlHttp = Nothing
End Function
Notice that it's using ServerXmlHttp rather than just XmlHttp. This is because the latter returned an Access denied error when I tried it; I'm not sure why.
Call it from your form code using something like this:
lblRate.caption = GetHTML("http://finance.yahoo.com/d/quotes.csv?&f=l1&s=USDGBP=X")
Note:
AFAICT there isn't any official documentation on the Yahoo API, but there are plenty of examples of its usage if you search online.
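To turn the returned rate into an actual conversion, something along these lines might do. ConvertUsdToGbp is a made-up name, and it assumes the response is just the rate as plain text with a "." decimal separator, possibly followed by a line break.

```vba
' Hypothetical helper: converts a USD amount using the rate text returned
' by GetHTML. Val parses the leading number with "." as the decimal
' separator regardless of regional settings, stops at a trailing line
' break, and returns 0 if the text does not start with a number.
Public Function ConvertUsdToGbp(ByVal amountUsd As Double, ByVal rateText As String) As Double
    ConvertUsdToGbp = amountUsd * Val(rateText)
End Function
```

From the form this could be used as, say, lblResult.Caption = Format$(ConvertUsdToGbp(CDbl(txtUsd.Text), GetHTML(sRateUrl)), "0.00"), where lblResult, txtUsd and sRateUrl are placeholder names.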

ASPTwitter library fails when using special characters

I am trying to update an old classic ASP Twitter program that my workplace currently uses so that it works with the new OAuth. I am not an ASP programmer, but I managed to find the ASPTwitter library posted online by Tim Acheson at http://www.timacheson.com/Blog/2013/jun/asptwitter
Everything works, as we have our own code searching our database and passing on a built string to the ASPTwitter code to tweet.
The catch is that it will fail with the
{"errors":[{"message":"Could not authenticate you","code":32}]}
error message if there is so much as a "." period in the string. Every possible special character besides letters and numbers causes a fail.
We have many posts that will include various symbols as well as URLs.
I have searched all over and have not been able to find a solution. Comments on Tim's site have mentioned it but no solutions yet. Everyone here has been very helpful so I was hoping someone might have a solution.
I can't post the code as there are about 6 files and I don't know which one is causing the issue.
Thank you so much for the help!
Edit:
This is a block of the 300+ line file where the issue happens, I hope that the cause can be found here too.
' Gets bearer token for application-only authentication from Twitter API 1.1.
' Application-user authentication: https://dev.twitter.com/docs/auth/using-oauth
' and: https://dev.twitter.com/docs/auth/authorizing-request
' API endpoint statuses/update (post a tweet): https://dev.twitter.com/docs/api/1.1/post/statuses/update
Private Function UpdateStatusJSON(sStatus)
    Dim sURL : sURL = API_BASE_URL + "/1.1/statuses/update.json"
    Dim oXmlHttp: Set oXmlHttp = Server.CreateObject("MSXML2.ServerXMLHTTP")
    oXmlHttp.open "POST", sURL, False
    sStatus = "this is from the ASPTwitter dot asp file"
    oXmlHttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded;charset=UTF-8"
    oXmlHttp.setRequestHeader "User-Agent", "emiratesjobpost.com"
    oXmlHttp.setRequestHeader "Authorization", "OAuth " & GetOAuthHeader(sURL, sStatus)
    oXmlHttp.send "status=" & Server.URLEncode(sStatus) ' Encoded spaces as + in request body.
    UpdateStatusJSON = oXmlHttp.responseText
    Set oXmlHttp = Nothing
    REM: A JSON viewer can be useful here: http://www.jsoneditoronline.org/
    ' To fix error message "Read-only application cannot POST" go to your application's "Application Type" settings at dev.twitter.com/apps and set "Access" to "Read and Write".
    ' After changing access to read/write you must click the button to generate new auth tokens and then use those.
    Response.Write "<textarea cols=""100"" rows=""3"" >" & UpdateStatusJSON & "</textarea>" : Response.Flush()
End Function
If I replace the "dot" with "." in the "sStatus" line, it breaks.
Do your pages use UTF-8 encoding? Open them in Notepad, select Save As from the File menu, and if ANSI encoding is selected, change it to UTF-8.
See this link for more things you can do:
http://www.hanselman.com/blog/InternationalizationAndClassicASP.aspx