Macro gets partial response using serverxmlhttp requests - vba

I'm trying to extract the street address along with the builder name from a webpage. When I use XMLHTTP60 requests, I get both fields. However, when I switch to ServerXMLHTTP60 requests, I get a partial response most of the time, and as a result the script only prints the street address. I used a JSON converter to parse the builder name out of the JSON content from that site.
Here is the proof of concept:
Sub GrabPropertyInfo()
Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
Dim oPost As Object, oData As Object, Html As HTMLDocument
Dim jsonObject As Object, jsonStr As Object, propertyMainRaw$
Dim itemStr As Variant, sResp As String, oElem As Object
Dim propertyContainer As Object, propertyMain As Object
Set Html = New HTMLDocument
' With CreateObject("MSXML2.XMLHTTP")
' .Open "GET", siteLink, False
' .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
' .send
' sResp = .responseText
' Html.body.innerHTML = .responseText
' End With
With CreateObject("MSXML2.ServerXMLHTTP.6.0")
.Open "GET", siteLink, True
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
While .readyState < 4: DoEvents: Wend
sResp = .responseText
Html.body.innerHTML = .responseText
End With
Debug.Print "Street address: " & Html.querySelector("h1.homeAddress > .street-address").innerText
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "reactServerState\.InitialContext = (.*);"
.MultiLine = True
Set jsonStr = .Execute(sResp)
End With
itemStr = jsonStr(0).submatches(0)
Set jsonObject = JsonConverter.ParseJson(itemStr)
Set propertyMain = jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/mainHouseInfoPanelInfo")("res")
propertyMainRaw = Replace(propertyMain("text"), "{}&&", "")
On Error Resume Next
Set propertyContainer = JsonConverter.ParseJson(propertyMainRaw)("payload")("mainHouseInfo")("amenitiesInfo")("superGroups")
On Error GoTo 0
If Not propertyContainer Is Nothing Then
For Each oElem In propertyContainer
For Each oPost In oElem("amenityGroups")
If InStr(oPost("groupTitle"), "Building Information") > 0 Then
For Each oData In oPost("amenityEntries")
If InStr(oData("amenityName"), "Builder Name") > 0 Then
Debug.Print "Builder Name: " & oData("amenityValues")(1)
End If
Next oData
End If
Next oPost
Next oElem
End If
End Sub
Using xmlhttp requests, I always get:
Street address: 604 Amesbury Ln #2,
Builder Name: Zach Savage
Using serverxmlhttp requests, I get the following result most of the time:
Street address: 604 Amesbury Ln #2,
How can I get the complete response using serverxmlhttp requests?
EDIT:
According to the answer and comments, it is clear that if I scrape the browserid from that site using xmlhttp requests and send its value as a cookie with the serverxmlhttp request, I'll get the desired results. However, the problem is that the browserid value I get using xmlhttp requests is sz9u0xmCQKKV9Wu0jRa3Yg, whereas the value I can see in the page source is v-J5D2IUSyqXizI7MG67fQ. How can I get the latter value? This is how I parsed the browserid.
Sub FetchBrowserId()
Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
Dim Rxp As Object, browserId As Object, sRes$, cookie$
Set Rxp = CreateObject("VBScript.RegExp")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", siteLink, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36"
.send
sRes = .responseText
End With
With Rxp
.Global = True
.Pattern = "window.__rfBrowserId=""(.*?)"";"
.MultiLine = True
Set browserId = .Execute(sRes)
End With
cookie = browserId(0).submatches(0)
Debug.Print cookie
End Sub

So the actual question I guess is why there are different responses returned by MSXML2.XMLHTTP and MSXML2.ServerXMLHTTP requests made to the same URL.
MSXML2.XMLHTTP uses the WinINet stack, while MSXML2.ServerXMLHTTP uses the WinHTTP stack. See the WinINet vs. WinHTTP article for more details.
WinINet provides full cookie processing (IE also relies on it). So the first reason you get different responses is that the cookies sent to the server may affect the flow. You can compare this easily with a service such as Webhook.site: when you make a second request with MSXML2.XMLHTTP, the web service logs the cookies that were accepted from the first response.
Also take SSL conditions into account. Make requests to How's My SSL? with MSXML2.XMLHTTP and MSXML2.ServerXMLHTTP, then open the same link in a browser (e.g. Chrome) and compare the results.
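Since ServerXMLHTTP runs on WinHTTP and does not use the WinINet cookie store, one workaround is to send the cookie explicitly. The following is only a minimal sketch, assuming the cookie pair is already known; the URL and the name=value pair are placeholders and are not taken from the site in the question:

```vba
Sub FetchWithExplicitCookie()
    ' Placeholder values (assumptions): replace with the real page URL
    ' and the cookie name/value pair the server expects.
    Const siteLink As String = "https://example.com/some/page"
    Const cookiePair As String = "name=value"
    Dim sResp As String

    With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", siteLink, False              ' synchronous request
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Cookie", cookiePair    ' cookie supplied by hand
        .send
        sResp = .responseText
    End With

    ' Print the response length so runs with and without the cookie can be compared
    Debug.Print "Response length: " & Len(sResp)
End Sub
```

Running it once with the Cookie line and once without gives a quick way to see whether the cookie is what changes the response.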

Related

Can't grab the first image link out of an array of image links

I'm trying to figure out a way to fetch the images from a webpage using xmlhttp requests in VBA. After some digging, I noticed that I can access those images through the data-lazy-srcset attribute. However, this attribute produces an array of image links. What I wish to do is capture the first image link from the array.
Sub GetImage()
Const Url = "https://rasamalaysia.com/grilled-honey-cajun-shrimp/"
Dim Http As Object, Html As HTMLDocument, oImage As Object
Set Html = New HTMLDocument
Set Http = CreateObject("MSXML2.XMLHTTP")
With Http
.Open "Get", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Set oImage = Html.querySelectorAll("p > img")
Debug.Print oImage(0).getAttribute("data-lazy-srcset")
End Sub
Current output:
https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3.jpg 1200w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-200x300.jpg 200w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-300x450.jpg 300w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-768x1152.jpg 768w, https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3-1024x1536.jpg 1024w
Expected output (the first one):
https://rasamalaysia.com/wp-content/uploads/2021/06/honey-cajun-grilled-shrimp3.jpg
How can I scrape the first image link out of an array of image links?
You've described the problem well, and it looks like a simple array index problem.
Turn the string into an array by splitting it on spaces and take the first element.
Add this to the top of your declarations:
Dim varArray As Variant
Then add the lines:
' Split into an array using blank spaces as delimiter
varArray = Split(oImage(0).getAttribute("data-lazy-srcset"), " ")
' This should return your first image
Debug.Print varArray(0)
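A slightly more defensive variant of the same idea is sketched below. It assumes the attribute follows the usual srcset layout of comma-separated "URL width" pairs; the function name FirstSrcsetUrl is made up for this illustration:

```vba
' Hypothetical helper: returns the URL of the first candidate in a srcset string.
Function FirstSrcsetUrl(ByVal srcset As String) As String
    Dim candidates() As String
    Dim parts() As String

    ' An empty attribute value yields an empty result instead of an error
    If Len(Trim$(srcset)) = 0 Then Exit Function

    candidates = Split(srcset, ",")             ' "url 1200w", " url 200w", ...
    parts = Split(Trim$(candidates(0)), " ")    ' "url", "1200w"
    FirstSrcsetUrl = parts(0)
End Function
```

It could be called as Debug.Print FirstSrcsetUrl(oImage(0).getAttribute("data-lazy-srcset")). Note that getAttribute returns Null when the attribute is missing, which would raise an error when passed to a String parameter, so that case needs its own check.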
There is a more efficient and faster way. Simply select the element by its size-full class: there is no need to split a string, and you can extract the appropriate image URL directly from an attribute:
Option Explicit
Sub GetImage()
Const Url = "https://rasamalaysia.com/grilled-honey-cajun-shrimp/"
Dim Http As Object, Html As HTMLDocument
Set Html = New HTMLDocument
Set Http = CreateObject("MSXML2.XMLHTTP")
With Http
.Open "Get", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Debug.Print Html.querySelector(".size-full").getAttribute("data-pin-media")
End Sub

Http request VBA response

I'm taking my first steps with HTTP in VBA.
I have already managed to get the cookie value needed to make a request. The problem is that the file I want to download is not in the first response. When I analyse the traffic using HTTP Header Live, the browser receives several responses and only the last one is the file, a PDF that is generated after a query sent by the user. The only thing I'm getting is the first response, which I'm displaying with a MsgBox. Can someone help me solve this problem? I have searched the web but haven't found a solution yet.
The code I am using is:
Sub Test()
Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("WinHTTP.WinHTTPrequest.5.1")
With WinHttpReq
.Open "POST", myURL, False ', "username", "password"
.send
x = .getResponseHeader("Set-Cookie")
i = InStr(x, ";")
x = Left(x, i - 1)
MsgBox x
End With
With WinHttpReq
.Open "GET", myURL, False
.Option(WinHttpRequestOption_EnableRedirects) = True
.SetRequestHeader "Content-Type", "application/pdf"
.SetRequestHeader "Accept-Encoding", "gzip, deflate, br"
.SetRequestHeader "Cookie", x
.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0"
.send
End With
MsgBox (WinHttpReq.getAllResponseHeaders())
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
With oStream
.Type = 1
.Open
.Write WinHttpReq.responseBody
.SaveToFile "C:\Users\xxx\Desktop\file.pdf", 2
.Close
End With
End If
Set WinHttpReq = Nothing
Set oStream = Nothing
End Sub
Analyzing with Firefox, when I enter the URL, I receive two responses with 200 OK. How can I get the second response? The site uses JavaScript that interprets the query I send and returns a PDF file. The file name changes according to the user and the query.
I have now reached the following point. I make a first request to get the cookie, then a second to get an ETag from a different address. The problem is that when I make the third request to download the file, the filename is apparently generated by the server (Apache?), based on the ETag and something I have not been able to find. The last 5 digits of the ETag change in the filename.
For example:
ETag - 1543932096000
File - 1543932095115.xxxxx.address.com.5245.idp.pdf
Since I don't have the filename, I cannot download the file with an HTTP request.
Help?

POST request created in vba brings back nothing as result

I've written a very small script in VBA using a POST request. However, when I run it, I get nothing as a result except a blank message. I've tried to fill in the request parameters accordingly, but perhaps I'm missing something that should be included. The page I'm dealing with contains several images in its right panel. When an image is clicked, the request I'm talking about is sent to the server, which returns the result, and new information about the product's flavor is displayed under the image. My goal is to parse all the flavors connected to each image. I'm attaching everything that seems necessary to find out what I'm missing. Thanks in advance.
This is what I got from chrome developer tools to prepare the POST request:
"https://www.dropbox.com/s/zjn0ahixhu58miq/RequestStatus.txt?dl=0"
Here is what I'm trying with:
Sub PostReq()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim ArgumentStr As String
ArgumentStr = "opt=flavor&opt1=207&opt2=47&ip=105"
With http
.Open "POST", "https://www.optigura.com/product/ajax/details.php", False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
.send ArgumentStr
html.body.innerHTML = .responseText
End With
MsgBox http.responseText
End Sub
This is the original link to the webpage:
"https://www.optigura.com/uk/product/gold-standard-100-whey/"
Your code sets a request header like so:
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
so the server is going to expect the argument string to be URL encoded (which it isn't).
Try either encoding the string or sending the request using "GET" instead.
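If a parameter value did contain reserved characters (spaces, &, = and so on), the body could be encoded along these lines. This is only a sketch: BuildFormBody is a made-up helper name, and it relies on Excel's ENCODEURL worksheet function, which is available in Excel 2013 and later on Windows:

```vba
' Hypothetical helper: builds an application/x-www-form-urlencoded body
' from two parallel arrays of keys and values.
Function BuildFormBody(ByVal keys As Variant, ByVal values As Variant) As String
    Dim i As Long
    Dim parts() As String

    ReDim parts(LBound(keys) To UBound(keys))
    For i = LBound(keys) To UBound(keys)
        ' Encode each key and value separately, then join them with "="
        parts(i) = WorksheetFunction.EncodeURL(CStr(keys(i))) & "=" & _
                   WorksheetFunction.EncodeURL(CStr(values(i)))
    Next i

    ' Pairs are separated with "&"
    BuildFormBody = Join(parts, "&")
End Function
```

For example, BuildFormBody(Array("opt", "opt1"), Array("flavor", "207")) produces a string that can be passed to .send.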
Finally, I got it working. To receive the required response, it is necessary to send a GET request first and then send a POST request that builds on that GET request. Here is the working version:
Sub httpPost()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim ArgumentStr As String
ArgumentStr = "opt=flavor&opt1=207&opt2=47&ip=105"
With http
.Open "GET", "https://www.optigura.com/uk/product/gold-standard-100-whey/", False
.send
End With
With http
.Open "POST", "https://www.optigura.com/product/ajax/details.php", False
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
.send ArgumentStr
html.body.innerHTML = .responseText
End With
MsgBox http.responseText
End Sub

Post request can't fetch response from targeted page

I've written a macro that sends a POST request, but when I run it, it returns an unexpected response. Perhaps it is unable to fetch the response from the targeted page. I can't identify the mistake I'm making. The original URL is pasted under my code.
Box to be checked before performing search:
Industry Role = Professional Services Providers
Other Criterion = APEX
Sub Xmlpost()
Dim http As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim Items As Object, Item As Object, Elem As Object
Dim postdata As String
postdata = "DoMemberSearch=1&mas_last=&mas_comp=&mas_city=&mas_stat=&mas_cntr=&mas_type=Professional+Services+Providers&OtherCriteria=1"
With http
.Open "POST", "https://www.infocomm.org/cps/rde/xchg/infocomm/hs.xsl/memberdirectory.htm", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
.send postdata
html.body.innerHTML = .responseText
End With
Set Items = html.getElementById("paginationDataPool").getElementsByTagName("a")
For Each Item In Items
x = x + 1
Cells(x, 1) = Item.innerText
Next Item
End Sub
Original Page:"https://www.infocomm.org/cps/rde/xchg/infocomm/hs.xsl/memberdirectory.htm"
You are looking for elements that use the class paginationDisplayItem, but this class is only added dynamically by JavaScript running in your browser, looking like this:
<div class="paginationDisplayItem">
In your html object, however, there is just the plain HTML response from your POST request. Save it to a file and have a look for yourself: instead of the class attribute, the same div contains an id attribute:
<div id="paginationItem_1">
Each successive entry has that trailing number increased by one.
If you adapt your loop to retrieve elements based on that id, everything will work as you expect.
Proof of concept:
For x = 1 To 57
Set Item = html.getElementById("paginationItem_" & x)
Cells(x, 1) = Item.getElementsByTagName("a")(0).innerText
Next x
You will obviously not want to explicitly loop to 57 in all cases, so feel free to refactor this to your liking.
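One possible way to drop the hard-coded upper bound is to keep going until getElementById stops returning an element. The sketch below assumes the ids are numbered consecutively from 1 with no gaps; the Sub name is made up for this illustration:

```vba
' Hypothetical helper: writes the first link text of each paginationItem_N to column A.
Sub WritePaginationItems(ByVal html As HTMLDocument)
    ' Late bound on purpose so getElementsByTagName is resolved at run time
    Dim item As Object
    Dim links As Object
    Dim x As Long

    x = 1
    Set item = html.getElementById("paginationItem_" & x)
    Do While Not item Is Nothing
        Set links = item.getElementsByTagName("a")
        ' Skip entries that contain no link at all
        If links.Length > 0 Then Cells(x, 1).Value = links(0).innerText
        x = x + 1
        Set item = html.getElementById("paginationItem_" & x)
    Loop
End Sub
```

It would be called as WritePaginationItems html after the response has been loaded into the document.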
Btw.: You should declare Items As IHTMLElementCollection and Item As IHTMLElement - this way IntelliSense will work on your objects and you'll have type safety.

#Value error on Winhttp.Winhttprequest in Excel VBA

I have written some code to retrieve a URL, but I am getting a #VALUE! error. Is anything wrong with this code?
Public Function Rurl(ByVal URL As String)
Dim http As Object
Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
http.Option(WinHttpRequestOption_UserAgentString) = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
http.Option(WinHttpRequestOption_EnableRedirects) = True
If Not InStr(URL, "://") > 0 Then URL = "http://" & URL
http.Open "GET", URL
http.Send
Rurl = http.GetResponseHeader("Location")
Set http = Nothing
End Function
You don't say where you're getting the error, so I'm going to assume it's at this line:
Rurl = http.GetResponseHeader("Location")
Something to ask yourself: what will your code do if the site at the supplied URL doesn't redirect?
The answer is that your code will give you an error at the above line which you don't handle anywhere in your code, very likely resulting in the #VALUE! error that you're seeing.
I'd suggest adding some error checking to ensure your function works in all situations. So, give this a go:
Public Function Rurl(ByVal URL As String)
On Error GoTo ErrorHandler
Dim http As Object
Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
http.Option(WinHttpRequestOption_UserAgentString) = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
http.Option(WinHttpRequestOption_EnableRedirects) = True
If Not InStr(URL, "://") > 0 Then URL = "http://" & URL
http.Open "GET", URL
http.Send
Rurl = http.GetResponseHeader("Location")
Set http = Nothing
Exit Function
ErrorHandler:
Rurl = "" ' or you can say something like: "No redirection".
Resume Next
End Function
If an error occurs anywhere in your function, the error handler will set the return value of your function to something sensible, clean up and exit the function. If no error occurs, everything should work like before. We're just adding a bit of code to handle potential errors.
Trap the 302 status code, then read the Location value from the HTTP response headers.
strUrl = "https://xx123.abc.com/"
Dim http As Object
Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
With http
.Open "GET", strUrl, False
.setRequestHeader "Content-Type", "text/css" '"application/x-www-form-urlencoded"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
.Option(WinHttpRequestOption_EnableRedirects) = False
.Send
d = .waitForResponse()
If (.Status = 302) Then
temp = .responseText
headers = .getAllResponseHeaders()
cookie = .getResponseHeader("Set-Cookie")
redirectedURL = .getResponseHeader("Location")
End If
End With
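Wrapped up as a worksheet function, the same approach might look like the sketch below. The name RedirectTarget is made up for this illustration. Because the object is created with CreateObject (late binding), the WinHttpRequestOption_EnableRedirects constant is not defined unless a reference to the WinHTTP library is set, so it is declared locally here with its documented value of 6:

```vba
' Hypothetical helper: returns the redirect target of a URL, or "" if there is none.
Public Function RedirectTarget(ByVal URL As String) As String
    ' Needed with late binding; 6 is the value of this option in the WinHTTP type library
    Const WinHttpRequestOption_EnableRedirects As Long = 6
    Dim http As Object

    On Error GoTo Fail
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    If InStr(URL, "://") = 0 Then URL = "http://" & URL

    http.Open "GET", URL, False
    ' Do not follow redirects, so the 3xx response itself can be inspected
    http.Option(WinHttpRequestOption_EnableRedirects) = False
    http.Send

    Select Case http.Status
        Case 301, 302, 303, 307, 308
            RedirectTarget = http.GetResponseHeader("Location")
    End Select
    Exit Function

Fail:
    ' Any failure (bad URL, no network, missing header) yields an empty string
    RedirectTarget = vbNullString
End Function
```

In a cell it could then be used as =RedirectTarget(A1), returning an empty string rather than #VALUE! when the URL does not redirect.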