Post request can't fetch response from targeted page - vba

I've written a macro that sends a POST request, but when I run it I get an unexpected response. Perhaps it is unable to fetch the response from the targeted page. I can't identify the mistake I'm making. The original URL is pasted below my code.
Boxes to be checked before performing the search:
Industry Role = Professional Services Providers
Other Criterion = APEX
Sub Xmlpost()
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim Items As Object, Item As Object, Elem As Object
Dim postdata As String
postdata = "DoMemberSearch=1&mas_last=&mas_comp=&mas_city=&mas_stat=&mas_cntr=&mas_type=Professional+Services+Providers&OtherCriteria=1"
With http
.Open "POST", "https://www.infocomm.org/cps/rde/xchg/infocomm/hs.xsl/memberdirectory.htm", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
.send postdata
html.body.innerHTML = .responseText
End With
Set Items = html.getElementById("paginationDataPool").getElementsByTagName("a")
For Each Item In Items
x = x + 1
Cells(x, 1) = Item.innerText
Next Item
End Sub
Original Page:"https://www.infocomm.org/cps/rde/xchg/infocomm/hs.xsl/memberdirectory.htm"
Search should be made like:
The output I'm getting is this:

You are looking for elements that use the class paginationDisplayItem, but this class is only added dynamically by JavaScript running in your browser, looking like this:
<div class="paginationDisplayItem">
In your html object, however, there is just the plain HTML response from your POST request. Save it to a file and have a look for yourself: instead of the class attribute, the same div carries an id attribute:
<div id="paginationItem_1">
Each successive entry has that trailing number increased by one.
If you adapt your loop to retrieve elements based on that id, everything will work as you expect.
Proof of concept:
For x = 1 To 57
Set Item = html.getElementById("paginationItem_" & x)
Cells(x, 1) = Item.getElementsByTagName("a")(0).innerText
Next x
You will obviously not want to loop explicitly to 57 in all cases, so feel free to refactor this to your liking.
By the way, you should declare Items As IHTMLElementCollection and Item As IHTMLElement. This way IntelliSense will work on your objects and you'll have type safety.
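One possible way to avoid the hard-coded 57 is to keep requesting the next id until getElementById returns Nothing. This is only a sketch: WritePaginationItems is a hypothetical name, and it assumes the ids are numbered consecutively from 1 and that each div contains at least one link.

```vba
' Sketch: walk paginationItem_1, paginationItem_2, ... until an id is missing.
' Assumes html already holds the POST response (e.g. the HTMLDocument from the question).
Sub WritePaginationItems(ByVal html As Object)
    Dim x As Long
    Dim Item As Object

    x = 1
    Set Item = html.getElementById("paginationItem_" & x)
    Do While Not Item Is Nothing
        ' First link inside the div, as in the proof of concept above
        Cells(x, 1) = Item.getElementsByTagName("a")(0).innerText
        x = x + 1
        Set Item = html.getElementById("paginationItem_" & x)
    Loop
End Sub
```

Call it as WritePaginationItems html right after the With block that fills html.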

Related

Macro gets partial response using serverxmlhttp requests

I'm trying to extract the street address along with the builder name from a webpage. When I use xmlhttp60 requests, I get both fields. However, when I use serverxmlhttp60 requests, I get a partial response most of the time, and as a result the script only prints the street address. I used a JSON converter to parse the builder name out of the JSON content from that site.
Here is the proof of concept:
Sub GrabPropertyInfo()
Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
Dim oPost As Object, oData As Object, Html As HTMLDocument
Dim jsonObject As Object, jsonStr As Object, propertyMainRaw$
Dim itemStr As Variant, sResp As String, oElem As Object
Dim propertyContainer As Object, propertyMain As Object
Set Html = New HTMLDocument
' With CreateObject("MSXML2.XMLHTTP")
' .Open "GET", siteLink, False
' .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
' .send
' sResp = .responseText
' Html.body.innerHTML = .responseText
' End With
With CreateObject("MSXML2.ServerXMLHTTP.6.0")
.Open "GET", siteLink, True
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
While .readyState < 4: DoEvents: Wend
sResp = .responseText
Html.body.innerHTML = .responseText
End With
Debug.Print "Street address: " & Html.querySelector("h1.homeAddress > .street-address").innerText
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "reactServerState\.InitialContext = (.*);"
.MultiLine = True
Set jsonStr = .Execute(sResp)
End With
itemStr = jsonStr(0).submatches(0)
Set jsonObject = JsonConverter.ParseJson(itemStr)
Set propertyMain = jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/mainHouseInfoPanelInfo")("res")
propertyMainRaw = Replace(propertyMain("text"), "{}&&", "")
On Error Resume Next
Set propertyContainer = JsonConverter.ParseJson(propertyMainRaw)("payload")("mainHouseInfo")("amenitiesInfo")("superGroups")
On Error GoTo 0
If Not propertyContainer Is Nothing Then
For Each oElem In propertyContainer
For Each oPost In oElem("amenityGroups")
If InStr(oPost("groupTitle"), "Building Information") > 0 Then
For Each oData In oPost("amenityEntries")
If InStr(oData("amenityName"), "Builder Name") > 0 Then
Debug.Print "Builder Name: " & oData("amenityValues")(1)
End If
Next oData
End If
Next oPost
Next oElem
End If
End Sub
Using xmlhttp requests, I always get:
Street address: 604 Amesbury Ln #2,
Builder Name: Zach Savage
Using serverxmlhttp requests, I get the following result most of the time:
Street address: 604 Amesbury Ln #2,
How can I get complete response using serverxmlhttp requests?
EDIT:
According to the answer and comments, it is clear that if I scrape browserid from that site using xmlhttp requests and send its value as a cookie with serverxmlhttp requests, I'll get the desired results. However, the problem is that the browserid value I get using xmlhttp requests is sz9u0xmCQKKV9Wu0jRa3Yg, whereas I can see the value v-J5D2IUSyqXizI7MG67fQ in the page source. How can I get the latter value? This is how I parsed browserid.
Sub FetchBrowserId()
Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
Dim Rxp As Object, browserId As Object, sRes$, cookie$
Set Rxp = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", siteLink, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36"
.send
sRes = .responseText
End With
With Rxp
.Global = True
.Pattern = "window.__rfBrowserId=""(.*?)"";"
.MultiLine = True
Set browserId = .Execute(sRes)
End With
cookie = browserId(0).submatches(0)
Debug.Print cookie
End Sub
So the actual question, I guess, is why MSXML2.XMLHTTP and MSXML2.ServerXMLHTTP return different responses for requests made to the same URL.
MSXML2.XMLHTTP uses the WinINet stack and MSXML2.ServerXMLHTTP uses the WinHTTP stack. Check the WinINet vs. WinHTTP article for more details.
WinINet provides full processing of cookies (BTW, IE also relies on it). So the first reason you get different responses is that the cookies sent to the server may affect the flow. You can easily check this with a service such as Webhook.site: when you make a second request with MSXML2.XMLHTTP, the web service logs the cookies that were accepted from the first response.
Also take SSL conditions into account. Make requests to How's My SSL? with MSXML2.XMLHTTP and MSXML2.ServerXMLHTTP, then follow the link in a browser (e.g. Chrome) and compare the results.
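To see the difference for yourself, a minimal sketch like the following sends the same GET through both objects and prints the status, body length and response headers side by side. CompareStacks is a hypothetical name, and the shortened User-Agent is an arbitrary choice; what each stack actually prints will depend on the site and your machine.

```vba
' Sketch: issue the same request through WinINet (XMLHTTP) and WinHTTP (ServerXMLHTTP)
' and print what each one received, so the two responses can be compared.
Sub CompareStacks()
    Const siteLink As String = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
    Dim progIds As Variant, progId As Variant

    progIds = Array("MSXML2.XMLHTTP.6.0", "MSXML2.ServerXMLHTTP.6.0")
    For Each progId In progIds
        With CreateObject(progId)
            .Open "GET", siteLink, False          ' synchronous, so no readyState loop is needed
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            Debug.Print progId & ": status " & .Status & ", body length " & Len(.responseText)
            Debug.Print .getAllResponseHeaders
        End With
    Next progId
End Sub
```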

Can't grab the first image link out of an array of image links

I'm trying to figure out a way to fetch the images from a webpage using xmlhttp requests in VBA. After digging deeper, I noticed that I can access those images through the data-lazy-srcset attribute. However, this attribute produces an array of image links. What I wish to do is capture the first image link from the array.
Sub GetImage()
Const Url = "https://rasamalaysia.com/grilled-honey-cajun-shrimp/"
Dim Http As Object, Html As HTMLDocument, oImage As Object
Set Html = New HTMLDocument
Set Http = CreateObject("MSXML2.XMLHTTP")
With Http
.Open "Get", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Set oImage = Html.querySelectorAll("p > img")
Debug.Print oImage(0).getAttribute("data-lazy-srcset")
End Sub
Current output:
https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3.jpg 1200w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-200x300.jpg 200w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-300x450.jpg 300w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-768x1152.jpg 768w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-1024x1536.jpg 1024w
Expected output (the first one):
https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3.jpg
How can I scrape the first image link out of an array of image links?
You've described the problem well and it at least looks like a simple array index problem.
Turn the string into an array by splitting it on spaces and take the first element.
Add this to the top of your declarations:
Dim varArray As Variant
Then add the lines
' Split into an array using blank spaces as delimiter
varArray = Split(oImage(0).getAttribute("data-lazy-srcset"), " ")
' This should return your first image
Debug.Print varArray(0)
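Splitting on spaces works for the output shown in the question. A slightly more defensive sketch splits on commas first, so each candidate ("url width") is handled as a unit, and returns an empty string when the attribute is empty. FirstSrcsetUrl is a hypothetical helper name, not part of any library.

```vba
' Sketch: return the URL of the first candidate in a srcset-style string
' such as "https://a/x.jpg 1200w, https://a/x-200.jpg 200w".
Function FirstSrcsetUrl(ByVal srcset As String) As String
    Dim candidates() As String

    If Len(Trim$(srcset)) = 0 Then Exit Function   ' nothing to parse, returns ""

    candidates = Split(srcset, ",")                ' one entry per "url width" pair
    ' The URL is everything before the first space of the first candidate
    FirstSrcsetUrl = Split(Trim$(candidates(0)), " ")(0)
End Function
```

Usage: Debug.Print FirstSrcsetUrl(oImage(0).getAttribute("data-lazy-srcset")). Note that getAttribute returns Null when the attribute is missing, which would raise an error when passed to a String parameter.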
There is a more efficient way. Simply select the element by its size-full class: then there is no string to split, and you can read the appropriate image URL directly from an attribute:
Option Explicit
Sub GetImage()
Const Url = "https://rasamalaysia.com/grilled-honey-cajun-shrimp/"
Dim Http As Object, Html As HTMLDocument
Set Html = New HTMLDocument
Set Http = CreateObject("MSXML2.XMLHTTP")
With Http
.Open "Get", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Debug.Print Html.querySelector(".size-full").getAttribute("data-pin-media")
End Sub

Can't access some tabular content using xmlhttp requests

I've created two scripts using VBA to fetch tabular content from a webpage. To access the tabular content, I first had to enter 410611 in the box next to UID# under Search by UID. Then I hit the Go button and got to see the tabular content.
I found success using selenium:
Sub FetchContent()
Dim oInput As Object, oElem As Object
With CreateObject("Selenium.ChromeDriver")
.get "https://www.theclearinghouse.org/uid-lookup"
.SwitchToFrame .FindElementById("uidlookup", timeout:=10000)
Set oInput = .FindElementByCss("input[name='UidNum']", timeout:=5000)
oInput.SendKeys "410611"
.FindElementByCss("input#SubmitUid", timeout:=5000).Click
Set oElem = .FindElementByCss("table#uid", timeout:=5000)
MsgBox oElem.Text
End With
End Sub
However, I wish to get the same content using xmlhttp requests. I tried the following, but unfortunately it doesn't work. The script neither throws an error nor prints anything in the Immediate window. I found success using the same logic in Python, though.
Using xmlhttp requests (doesn't work):
Sub FetchContent()
Const URL = "https://uid.theclearinghouse.org/list.php"
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", URL, False
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.setRequestHeader "referer", "https://uid.theclearinghouse.org/index.php"
.send ("UidNum=410611&SubmitUid=Go%21")
Debug.Print .responseText
End With
End Sub
How can I access the tabular content using xmlhttp requests?

Web Scraping using VBA and MSXML2.XMLHTTP library

I'm trying to scrape data from a website using the MSXML2.XMLHTTP object in VBA (Excel) and I cannot figure out how to solve this problem! The website is the following:
http://www.detran.ms.gov.br/consulta-de-debitos/
You guys can use the following test data to fill the form:
Placa: oon5868
Renavam: 1021783231
I want to retrieve data like "chassi", which for the data above would be "9BD374121F5068077".
I do not have problems parsing the HTML document; the difficulty is actually getting the information in the response! Code below:
Sub SearchVehicle()
Dim strPlaca As String
Dim strRenavam As String
strPlaca = "oon5868"
strRenavam = "01021783231"
Dim oXmlPage As MSXML2.XMLHTTP60
Dim strUrl As String
Dim strPostData As String
Set oXmlPage = New MSXML2.XMLHTTP60
strUrl = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
strPostData = "placa=" & strPlaca & "&renavam=" & strRenavam
oXmlPage.Open "POST", strUrl, False
oXmlPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
oXmlPage.send strPostData
Debug.Print oXmlPage.responseText
End Sub
The strUrl used in the POST method, ".../retornooooveiculos.asp", is the address that Google developer tools and Fiddler showed the website posting the payload to.
When accessed manually, the website returns the correct information, but when I run my code I always get the following response in .responseText:
<html>Acesse: <b><a href='http://www.detran.ms.gov.br target='_parent'>www.detran.ms.gov.br</a></b></html>
HELP PLEASE, I'm getting crazy trying to solve this puzzle! Why do I get redirected like this?
I need the "CHASSI" information and can't find the correct http Request to do this!
Try the approach below; it should fetch the content you are after. The catch is that you need to supply the Cookie copied from your Request Headers for the script to work. You can find it using devtools.
Sub SearchVehicle()
Const URL As String = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
Dim HTTP As New ServerXMLHTTP60, HTML As New HTMLDocument
Dim elem As Object, splaca$, srenavam$, qsp$
splaca = "oon5868"
srenavam = "01021783231"
qsp = "placa=" & splaca & "&renavam=" & srenavam
With HTTP
.Open "POST", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Cookie", "ISAWPLB{07D08995-E67C-4F44-91A1-F6A16337ECD6}={286E0BB1-C5F9-4439-A2CE-A7BE8C3955E0}; ASPSESSIONIDSCSDSCTB=AGDPOBEAAPJLLMKKIGPLBGMJ; 69137927=967930978"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send qsp
HTML.body.innerHTML = .responseText
End With
For Each elem In HTML.getElementsByTagName("b")
If InStr(elem.innerText, "Chassi:") > 0 Then MsgBox elem.ParentNode.NextSibling.innerText: Exit For
Next elem
End Sub
Once again: if my provided Cookie doesn't work for you, fill in the Cookie field with the value you collect using your devtools (from the Request Headers section). Thanks.
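If copying the Cookie by hand becomes tedious, one option to experiment with is building the header from the Set-Cookie lines of an earlier response. The sketch below is only a string helper; CookieHeaderFrom is a hypothetical name, and whether this site issues a usable session cookie on a plain request is an assumption that the answer above does not confirm.

```vba
' Sketch: turn the "Set-Cookie:" lines of a raw header block into a single
' "name=value; name2=value2" string suitable for a Cookie request header.
Function CookieHeaderFrom(ByVal allHeaders As String) As String
    Dim headerLines() As String, i As Long
    Dim pair As String, result As String

    headerLines = Split(allHeaders, vbCrLf)
    For i = LBound(headerLines) To UBound(headerLines)
        If LCase$(Left$(headerLines(i), 11)) = "set-cookie:" Then
            pair = Trim$(Mid$(headerLines(i), 12))
            ' Keep only name=value; drop attributes such as path or expires
            If InStr(pair, ";") > 0 Then pair = Left$(pair, InStr(pair, ";") - 1)
            If Len(result) > 0 Then result = result & "; "
            result = result & pair
        End If
    Next i
    CookieHeaderFrom = result
End Function
```

A possible use is to send a first request with ServerXMLHTTP60, pass its .getAllResponseHeaders to this function, and set the result as the Cookie header of the POST.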
Output I'm getting:
9BD374121F5068077

Scraping a HTML POST request using VBA

I'm trying to scrape quotes of Moroccan stocks from this website using VBA :
http://www.casablanca-bourse.com/bourseweb/en/Negociation-History.aspx?Cat=24&IdLink=225
Where you select a security, check "By period", specify the date interval and finally click the "Submit" button.
I went first with the easy method : using an Internet Explorer object :
Sub method1()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.Navigate "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"
Do While IE.Busy
DoEvents
Loop
'Picking the security
Set obj1 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DDValeur")
obj1.Value = "4100 " 'Security code taken from the source html
'Specifying "By period"
Set obj2 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_RBSearchDate")
obj2.Checked = True
'Start date
Set obj3 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DateTimeControl1_TBCalendar")
obj3.Value = "07/03/2016"
'End date
Set obj4 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DateTimeControl2_TBCalendar")
obj4.Value = "07/03/2016"
'Clicking the button
Set objs = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_Image1")
objs.Click
'Setting the data <div> as an object
Set obj5 = IE.document.getElementById("HistoriqueNegociation1_UpdatePanel1")
s = obj5.innerHTML
'Looping until the quotes pop up
Do While InStr(s, "HistoriqueNegociation1_HistValeur1_RptListHist_ctl01_Label3") = 0
Application.Wait DateAdd("s", 0.1, Now)
s = obj5.innerHTML
Loop
'Printing the value
Set obj6 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_RptListHist_ctl01_Label3")
Cells(1, 1).Value = CDbl(obj6.innerText)
IE.Quit
Set IE = Nothing
End Sub
This webpage being dynamic, I had to make the application wait, until the data pops up (until the data pops in the HTML code), and that's why I used that second Do while loop.
Now, what I want to do, is to use the harder way : sending the form request through VBA, which is pretty easy when it comes to GET requests, but this site uses a POST request that I found pretty hard to mimic in VBA.
I used this simple code :
Sub method2()
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"
objHTTP.Open "POST", URL, False
objHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
objHTTP.send ("encoded request params go here")
Cells(1, 1).Value = objHTTP.ResponseText
End Sub
I used Chrome DevTools (F12) to record the POST request, but I had a hard time figuring out what the params should be. (The form data is too long for me to screenshot or copy here, so please feel free to record it yourself.) I went with only the params that I needed (security code, the radio button and the two dates), but the response didn't match the one in DevTools and didn't contain any usable data. Here are the params that I used:
HistoriqueNegociation1$HistValeur1$DDValeur=9000%20%20&HistoriqueNegociation1$HistValeur1$historique=RBSearchDate&HistoriqueNegociation1$HistValeur1$DateTimeControl1$TBCalendar=07%2F03%2F2016&HistoriqueNegociation1$HistValeur1$DateTimeControl2$TBCalendar=07%2F03%2F2016
Obviously, I'm not getting something (or everything) right here.
Actually, I can't just pick "some of the params"; I have to send all of them. I didn't do that at first because the params line that I got from DevTools was too long (47012 characters), and Excel VBA doesn't accept a line that long. So I copied the params to a text file and sent the request using that file, and it worked.
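For reference, reading the body from a file and sending it could look roughly like this. It is a sketch rather than the exact code used: method3 and the file path are hypothetical, and the file is assumed to be non-empty and to contain the already URL-encoded form data copied from DevTools, as plain ASCII on a single line.

```vba
' Sketch: load the recorded form data from a text file and POST it.
Sub method3()
    Const URL As String = "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"
    Const paramsPath As String = "C:\temp\params.txt"   ' hypothetical location of the saved params
    Dim fileNo As Integer, postData As String

    ' Read the whole file into a string
    fileNo = FreeFile
    Open paramsPath For Binary Access Read As #fileNo
    postData = Space$(LOF(fileNo))
    Get #fileNo, , postData
    Close #fileNo

    ' Drop any trailing line break the text editor may have added
    Do While Right$(postData, 1) = vbCr Or Right$(postData, 1) = vbLf
        postData = Left$(postData, Len(postData) - 1)
    Loop

    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "POST", URL, False
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        .send postData
        Debug.Print Left$(.responseText, 500)            ' preview of the response
    End With
End Sub
```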