Web Scraping using VBA and MSXML2.XMLHTTP library - vba

I'm trying to scrap data from a website using MSXML2.XMLHTTP object on VBA environment (Excel) and I cannot figure out how to solve this problem! The website is the following:
http://www.detran.ms.gov.br/consulta-de-debitos/
You guys can use the following test data to fill the form:
Placa: oon5868
Renavam: 1021783231
I want to retrieve data like "chassi", with the data above that would be " 9BD374121F5068077".
I do not have problems parsing the html document, the difficult is actually getting the information as response! Code below:
Sub SearchVehicle()
Dim strPlaca As String
Dim strRenavam As String
strPlaca = "oon5868"
strRenavam = "01021783231"
Dim oXmlPage As MSXML2.XMLHTTP60
Dim strUrl As String
Dim strPostData As String
Set oXmlPage = New MSXML2.XMLHTTP60
strUrl = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
strPostData = "placa=" & strPlaca & "&renavam=" & strRenavam
oXmlPage.Open "POST", strUrl, False
oXmlPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
oXmlPage.send strPostData
Debug.Print oXmlPage.responseText
End Sub
The strURL used in the POST method ".../retornooooveiculos.asp" is the one google developer tools and fiddler showed me that was the correct address the website was posting the payload.
When manually accessed, the website retrieve the correct information, but running my code I always get the following response on the .responseText:
<html>Acesse: <b><a href='http://www.detran.ms.gov.br target='_parent'>www.detran.ms.gov.br</a></b></html>
HELP PLEASE, I'm getting crazy trying to solve this puzzle! Why do I get redirected like this?
I need the "CHASSI" information and can't find the correct http Request to do this!

Try the below approach. It should fetch you the content you are after. The thing is you need to supply the Cookie copied from your Request Headers fields in order for your script to work which you can find using devtools.
Sub SearchVehicle()
Const URL As String = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
Dim HTTP As New ServerXMLHTTP60, HTML As New HTMLDocument
Dim elem As Object, splaca$, srenavam$, qsp$
splaca = "oon5868"
srenavam = "01021783231"
qsp = "placa=" & splaca & "&renavam=" & srenavam
With HTTP
.Open "POST", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Cookie", "ISAWPLB{07D08995-E67C-4F44-91A1-F6A16337ECD6}={286E0BB1-C5F9-4439-A2CE-A7BE8C3955E0}; ASPSESSIONIDSCSDSCTB=AGDPOBEAAPJLLMKKIGPLBGMJ; 69137927=967930978"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send qsp
HTML.body.innerHTML = .responseText
End With
For Each elem In HTML.getElementsByTagName("b")
If InStr(elem.innerText, "Chassi:") > 0 Then MsgBox elem.ParentNode.NextSibling.innerText: Exit For
Next elem
End Sub
Once again: fill in the Cookie field by collecting it using your devtools (from Request Headers section), if for some reason my provided Cookie doesn't work for you. Thanks.
Output I'm getting:
9BD374121F5068077

Related

Catch the POST Request Response and the Redirected URL from XMLHTTP request with VBA

I'm trying to catch a Response to a POST Request using XMLHTTP using the code below
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim htmlEle1 As MSHTML.IHTMLElement
Dim htmlEle2 As MSHTML.IHTMLElement
Dim URL As String
Dim elemValue As String
URL = "https://www.informadb.pt/pt/pesquisa/?search=500004064"
XMLPage.Open "GET", URL, False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
For Each htmlEle1 In HTMLDoc.getElementsByTagName("div")
Debug.Print htmlEle1.className
If htmlEle1.className = "styles__SCFileModuleFooter-e6rbca-1 kUUNkj" Then
elemValue = Trim(htmlEle1.innerText)
If InStr(UCase$(elemValue), "CONSTITU") > 0 Then
'Found Value
Exit For
End If
End If
Next htmlEle1
The problem is that I can't find the ClassName "styles__SCFileModuleFooter-e6rbca-1 kUUNkj", because I notice that when I manually insert the value (500004064) in the search box of the URL : https://www.informadb.pt/pt/pesquisa/, the Web Page generates addicinal traffic and turns up at an end point URL : https://www.informadb.pt/pt/pesquisa/empresa/?Duns=453060832, where that className can be found in the Request ResponseText.
My goal is to use the First URL to retrieve the Duns number '453060832', to be able to access the information in the ResponseText of the EndPoint URL. And to catch Duns Number, I need to find a way to get the Endpoint URL, or try to get The POST request response below, and get that value using JSON parser:
{'TotalResults': 1,
'NumberOfPages': 1,
'Results': [{'Duns': '453060832',
'Vat': '500004064',
'Name': 'A PANIFICADORA CENTRAL EBORENSE, S.A.',
'Address': 'BAIRRO DE NOSSA SENHORA DO CARMO,',
'Locality': 'ÉVORA',
'OfficeType': 'HeadOffice',
'FoundIn': None,
'Score': 231.72766,
'PageUrl': '/pt/pesquisa/empresa/?Duns=453060832'}]}
I'm not being able to capture what is really happening using the XMLHTTP Browser request, that seems to be the below steps:
navigate to https://www.informadb.pt/pt/pesquisa/?search=500004064
Webpage generates additional traffic
Amongst that additional traffic is an API POST XHR request which
returns search results as JSON. That request goes to
https://www.informadb.pt/Umbraco/Api/Search/Companies and includes
the 500004064 identifier amongst the arguments within the post body
Based on the API results the browser ends up at the following URI
https://www.informadb.pt/pt/pesquisa/empresa/?Duns=453060832
Can someone help me please, I have to do it using VBA.
Thanks in advance.
A small example how to POST data to your website using VBA, and how to use bare-bones string processing to extract data from the result, as outlined in my comments above.
Function GetVatId(dunsNumber As String) As String
With New MSXML2.XMLHTTP60
.open "POST", "https://www.informadb.pt/Umbraco/Api/Search/Companies", False
.setRequestHeader "Content-Type", "application/json"
.send "{""Page"":0,""PageSize"":5,""SearchTerm"":""" & dunsNumber & """,""Filters"":[{""Key"":""districtFilter"",""Name"":""Distrito"",""Values"":[]},{""Key"":""legalFormFilter"",""Name"":""Forma Jurídica"",""Values"":[]}],""Culture"":""pt""}"
If .status = 200 Then
MsgBox "Response: " & .responseText, vbInformation
GetVatId = Mid(.responseText, InStr(.responseText, """Vat"":""") + 7, 9)
Else
MsgBox "Repsonse status " & .status, vbExclamation
End If
End With
End Function
Usage:
Dim vatId As String
vatId = GetVatId("453060832") ' => "500004064"
For a more robust solution, you should use a JSON parser and -serializer, something like https://github.com/VBA-tools/VBA-JSON.

Can't access some tabular content using xmlhttp requests

I've created two scripts using vba to fetch tabular content from a webpage. To access the tabular content, I had to first fill in with 410611 right next to UID# under Search by UID. Then, I hit the go button and I got to see the tabular content.
I found success using selenium:
Sub FetchContent()
Dim oInput As Object, oElem As Object
With CreateObject("Selenium.ChromeDriver")
.get "https://www.theclearinghouse.org/uid-lookup"
.SwitchToFrame .FindElementById("uidlookup", timeout:=10000)
Set oInput = .FindElementByCss("input[name='UidNum']", timeout:=5000)
oInput.SendKeys "410611"
.FindElementByCss("input#SubmitUid", timeout:=5000).Click
Set oElem = .FindElementByCss("table#uid", timeout:=5000)
MsgBox oElem.Text
End With
End Sub
However, I wish to get the same content using xmlhttp requests. I tried like the following but unfortunately it doesn't work. The script neither throws any error nor prints anything in the immediate window. I found success using the same logic in python, though.
Using xmlhttp requests (doesn't work):
Sub FetchContent()
Const URL = "https://uid.theclearinghouse.org/list.php"
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", URL, False
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.setRequestHeader "referer", "https://uid.theclearinghouse.org/index.php"
.send ("UidNum=410611&SubmitUid=Go%21")
Debug.Print .responseText
End With
End Sub
How can I access the tabular content using xmlhttp requests?

Connecting to Web API with Cookie Authentication and CSRF Token

*UPDATE AT THE END
I need help with using an API to authenticate into https://connect.garmin.com/signin/.
I am using VBA and Power Query to automate the collecting of workout data from my Garmin account. As far as I can tell, the website uses cookie based authentication and a CSRF token.
I am testing my API calls in Postman, building the authentication request in VBA, and performing data collection in Power Query with the authenticated cookies. (I would use Power Query for the whole project, but it is not able to return the response headers/authenticated cookies)
I am trying to replicate the browser actions by collecting the cookies from response headers and CSRF token from the HTML body before making the authentication POST request, but I am not having much success. When I try to authenticate using this method the response status is 200 and response body is the login page with a "Something went wrong" message.
I have tried to follow OmegaStripes answer from this question to get a handle on the cookies,
How to set and get JSESSIONID cookie in VBA
Is there a special way to handle CSRF tokens in Postman or VBA MSXML2.ServerXMLHTTP? Is there something I am fundamentally not understanding about cookie authentication?
If you need any further info from me, please let me know. Any help is greatly appreciated!
My modified version of OmegaStripes code is below,
Option Explicit
Sub GetCookie()
Dim sUrl, sRespHeaders, sRespText, aSetHeaders, aList, aSetBody, sCSRFToken, sBody
sBody = "username=USERNAME&password=PASSWORD&embed=false&_csrf="
' get cookie 1
sUrl = "https://connect.garmin.com/signin/"
XmlHttpRequest "GET", sUrl, Array(), "", sRespHeaders, sRespText
ParseResponse "^Set-(Cookie): (\S*?=\S*?);[\s\S]*?$", sRespHeaders, aSetHeaders
' get cookie 2 and CSRF Token
sUrl = "https://sso.garmin.com/sso/signin?service=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&webhost=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&source=https%3A%2F%2Fconnect.garmin.com%2Fsignin%2F&redirectAfterAccountLoginUrl=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&redirectAfterAccountCreationUrl=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&gauthHost=https%3A%2F%2Fsso.garmin.com%2Fsso&locale=en_US&id=gauth-widget&cssUrl=https%3A%2F%2Fconnect.garmin.com%2Fgauth-custom-v1.2-min.css&privacyStatementUrl=https%3A%2F%2Fwww.garmin.com%2Fen-US%2Fprivacy%2Fconnect%2F&clientId=GarminConnect&displayNameShown=false&consumeServiceTicket=false&generateExtraServiceTicket=true&generateTwoExtraServiceTickets=false&generateNoServiceTicket=false&globalOptInShown=true&globalOptInChecked=false&connectLegalTerms=true&locationPromptShown=true&showPassword=true"
XmlHttpRequest "GET", sUrl, aSetHeaders, "", sRespHeaders, sRespText
' parse project names
ParseResponse "^Set-(Cookie): (\S*?=\S*?);[\s\S]*?$", sRespHeaders, aSetHeaders
sCSRFToken = GetCSRFToken("name=" & Chr(34) & "_csrf" & Chr(34) & " value=" & Chr(34), sRespText)
' get authenticated cookies
sUrl = "https://sso.garmin.com/sso/signin?service=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&webhost=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&source=https%3A%2F%2Fconnect.garmin.com%2Fsignin%2F&redirectAfterAccountLoginUrl=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&redirectAfterAccountCreationUrl=https%3A%2F%2Fconnect.garmin.com%2Fmodern%2F&gauthHost=https%3A%2F%2Fsso.garmin.com%2Fsso&locale=en_US&id=gauth-widget&cssUrl=https%3A%2F%2Fconnect.garmin.com%2Fgauth-custom-v1.2-min.css&privacyStatementUrl=https%3A%2F%2Fwww.garmin.com%2Fen-US%2Fprivacy%2Fconnect%2F&clientId=GarminConnect&displayNameShown=false&consumeServiceTicket=false&generateExtraServiceTicket=true&generateTwoExtraServiceTickets=false&generateNoServiceTicket=false&globalOptInShown=true&globalOptInChecked=false&connectLegalTerms=true&locationPromptShown=true&showPassword=true"
XmlHttpRequest "POST", sUrl, aSetHeaders, sBody & sCSRFToken, sRespHeaders, sRespText
' parse project names
ParseResponse "^Set-(Cookie): (\S*?=\S*?);[\s\S]*?$", sRespHeaders, aSetHeaders
End Sub
Sub XmlHttpRequest(sMethod, sUrl, aSetHeaders, sPayload, sRespHeaders, sRespText)
Dim aHeader
With CreateObject("MSXML2.ServerXMLHTTP")
'.SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
.Open sMethod, sUrl, False
For Each aHeader In aSetHeaders
.SetRequestHeader aHeader(0), aHeader(1)
Next
.Send (sPayload)
sRespHeaders = .GetAllResponseHeaders
sRespText = .ResponseText
End With
End Sub
Sub ParseResponse(sPattern, sResponse, aData)
Dim oMatch, aTmp, sSubMatch
If IsEmpty(aData) Then
aData = Array()
End If
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = sPattern
For Each oMatch In .Execute(sResponse)
If oMatch.SubMatches.Count = 1 Then
PushItem aData, oMatch.SubMatches(0)
Else
aTmp = Array()
For Each sSubMatch In oMatch.SubMatches
PushItem aTmp, sSubMatch
Next
PushItem aData, aTmp
End If
Next
End With
End Sub
Function GetCSRFToken(sPattern, sResponse)
Dim lStart, lLength, sSubMatch
lStart = InStr(1, sResponse, sPattern) + Len(sPattern)
lLength = InStr(lStart, sResponse, Chr(34)) - lStart
GetCSRFToken = Mid(sResponse, lStart, lLength)
End Function
Sub PushItem(aList, vItem)
ReDim Preserve aList(UBound(aList) + 1)
aList(UBound(aList)) = vItem
End Sub
EDIT
I have changed approach to creating an instance of Internet Explorer, and automating the log in process. Not ideal, but I am able to sign in successfully. My issue now is I am not able to retrieve the HttpOnly Cookies required to maintain the logged in session using,
getCookie = objIE.document.Cookie
I have tried another of OmegaStripes answers (coincidentally) Retrieve ALL cookies from Internet Explorer but was unsuccessful.
Any suggestions on how to get this working with MSXML2.ServerXMLHTTP, an IE instance, or any other method would be greatly appreciated!

REST Interacting with BigCommerce from VB EXCEL

NOTE: Further research (and thick skin, and lots of coffee) has lead me here where it suggests that the store cannot receive the uid and pwd in the URL, and I gather from the PHP examples here that the transmission needs to be encrypted. Is this my problem? If so I have not been able to find anything on how to work around this.
I would be happy if the open() command prompted the UID & PWD window to open and I can manually input the values. Does anyone have a suggestion. Thanks!
I am learning how to use the BigCommerce APIs and I am programming in VB / XL to be able to directly post / retrieve from all fields in the store's DB.
I have never coded on this before (although I get around VB OK) and I am stuck. I have the following code:
Const URL As String = "https://www.myurl.com/api/v2/brands.json"
Public Sub Test()
Dim xmlHttp As Object
Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
xmlHttp.Open "GET", URL, False, "myid", "mytoken"
xmlHttp.setRequestHeader "Content-Type", "application/json"
xmlHttp.send
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
html.body.innerHTML = xmlHttp.ResponseText
Range("A1").Value = html.body.innerHTML
End Sub
What I get back is "401" or more precisely:
[{"status":401,"message":"No credentials were supplied in the request."}]
The credentials are valid and working, as if I place the URL in my browser and submit it, the pop up for UID & PWD comes up, and once I place the values I am using in the code, the resulting BRANDS list appears in the browser.
Additionally, if I add a "rand num" to the URL to ensure the returned data is not cached as suggested here, then the request looks like this:
xmlHttp.Open "GET", URL & "&t=" & WorksheetFunction.RandBetween(1, 99), False, `"myid", "mytoken"`
and the response back from BC changes to:
406</STATUS><MESSAGE>The requested content type is not available.</MESSAGE></ERROR></ERRORS>
I have also tried this (where UID and PWD are the correct values):
xmlHttp.Open "GET", URL & "&user=UID" & "&password=PWD", False
and i get:
406</STATUS><MESSAGE>The requested content type is not available.</MESSAGE></ERROR></ERRORS>
Can anyone help me understand pls. I have followed examples at MSDN, but I am still stuck. The BC API site is poor in examples (none actually!) for VB.
Thank you
Sorted! BC required this:
Set xmlHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
xmlHttp.Open "GET", URL_Cat, UID, PWD
xmlHttp.setRequestHeader "Content-Type", "application/json"
xmlHttp.setRequestHeader "Authorization", "Basic " & Base64Encode(UID & ":" & PWD)
xmlHttp.send
I got Base64Encode code from here. Hope this helps someone.

How to browse through box.com folders to select a file and download the selected file

I need a excel macro (vba) to select a file from box.com by iterating though existing folders and at the same time I need to upload the file from my machine to box.com folder using excel macro. I am searching for a long time on net. But no use. Please help or try to give some ideas how to achieve this.
Thanks in advance.
-Edit
I am using the below code for getting authentication token. But I am getting an error message at the place of .send(url). Error message is "The server name or address could not be resolved".
Function getAuthToken()
Dim WinHttpReq As WinHttp.WinHttpRequest
Dim api_key As String
api_key = "{api_key}"
Set WinHttpReq = New WinHttp.WinHttpRequest
strUrl = "https://www.box.net/api/1.0/rest?action=get_ticket&api_key=" & api_key
WinHttpReq.Open Method:="GET", url:=strUrl, async:=False
WinHttpReq.Send
getTicket = WinHttpReq.responseText
Debug.Print getTicket
End Function
Not being a vba expert, I suspect that you'll get more answers if you tag your question with a vba tag. However, some quick scanning around shows that vba can call REST apis by doing something like this:
Dim MyURL as String
MyURL = "http://api.box.com/2.0/folders/0.xml"
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
With objHTTP
.Open "GET", MyURL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "Authorization", "BoxAuth api_key=<your api key>&auth_token=<your auth token>
.send (MyURL)
End With
I'll defer to a real VBA expert, but something roughly along these lines should work.
Yeah, this is frustrating. I tried code like Peter's using both WinHttp.WinHttpRequest.5.1 and MSXML2.ServerXMLHTTP and with both I just get a zero-length string back. No error message or anything.
I installed cURL and tested the URL. It works fine there. The script below also works fine with a generic JSON web service, like jsonplaceholder.typicode.com.
All this makes me think that Box.com is receiving the message, detecting it is coming from a non-approved source, and returning nothing . . . probably for security reasons.
Option Explicit
Const URL As String = "https://api.box.com/2.0/folders/0 -H ""Authorization: Bearer MyToken"""
'Const URL As String = "https://jsonplaceholder.typicode.com/posts/1"
Sub Test()
Dim winHTTP As Object
' Set winHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
Set winHTTP = CreateObject("MSXML2.ServerXMLHTTP")
winHTTP.Open "GET", URL
winHTTP.setRequestHeader "Content-Type", "application/json"
winHTTP.send
Debug.Print winHTTP.ResponseText
If Len(winHTTP.ResponseText) = 0 Then
MsgBox "blank string returned"
Else
Dim objResponse As Object
Set objResponse = JsonConverter.ParseJson(winHTTP.ResponseText) 'Converter from Tim Hall - https://github.com/VBA-tools/VBA-JSON
End If
End Sub