Scraping an HTML POST request using VBA - vba

I'm trying to scrape quotes of Moroccan stocks from this website using VBA:
http://www.casablanca-bourse.com/bourseweb/en/Negociation-History.aspx?Cat=24&IdLink=225
On that page you select a security, check "By period", specify the date interval, and finally click the "Submit" button.
I started with the easy method, using an Internet Explorer object:
Sub method1()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.Navigate "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"
Do While IE.Busy Or IE.readyState <> 4 '4 = READYSTATE_COMPLETE
DoEvents
Loop
'Picking the security
Set obj1 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DDValeur")
obj1.Value = "4100 " 'Security code taken from the source html
'Specifying "By period"
Set obj2 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_RBSearchDate")
obj2.Checked = True
'Start date
Set obj3 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DateTimeControl1_TBCalendar")
obj3.Value = "07/03/2016"
'End date
Set obj4 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_DateTimeControl2_TBCalendar")
obj4.Value = "07/03/2016"
'Clicking the button
Set objs = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_Image1")
objs.Click
'Setting the data <div> as an object
Set obj5 = IE.document.getElementById("HistoriqueNegociation1_UpdatePanel1")
s = obj5.innerHTML
'Looping until the quotes pop up
Do While InStr(s, "HistoriqueNegociation1_HistValeur1_RptListHist_ctl01_Label3") = 0
'DateAdd rounds 0.1 s down to 0, so wait a full second between polls instead
Application.Wait Now + TimeSerial(0, 0, 1)
s = obj5.innerHTML
Loop
'Printing the value
Set obj6 = IE.document.getElementById("HistoriqueNegociation1_HistValeur1_RptListHist_ctl01_Label3")
Cells(1, 1).Value = CDbl(obj6.innerText)
IE.Quit
Set IE = Nothing
End Sub
Because the page is dynamic, I had to make the code wait until the data appears in the HTML, which is why I used the second Do While loop.
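A polling loop like the one above never exits if the element fails to appear. Below is a minimal sketch of the same wait with a timeout. WaitForElement is a hypothetical helper name, not part of the original code, and it assumes the object passed in exposes a document with getElementById, as the InternetExplorer.Application object does.

```vba
' Hypothetical helper: polls the browser's document until an element with the
' given id exists, or until timeoutSeconds have elapsed.
' Returns Nothing on timeout.
Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                        ByVal timeoutSeconds As Double) As Object
    Dim startTime As Double
    Dim elapsed As Double

    startTime = Timer                        ' seconds since midnight
    Do
        Set WaitForElement = ie.document.getElementById(elementId)
        If Not WaitForElement Is Nothing Then Exit Function

        DoEvents                             ' keep Excel responsive while polling
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped at midnight
    Loop While elapsed < timeoutSeconds

    Set WaitForElement = Nothing
End Function
```

In method1 it could replace the second loop with something like `Set obj6 = WaitForElement(IE, "HistoriqueNegociation1_HistValeur1_RptListHist_ctl01_Label3", 30)`, followed by a check for Nothing before reading innerText.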
Now I want to do it the harder way: sending the form request directly from VBA. That is pretty easy for GET requests, but this site uses a POST request that I found hard to mimic in VBA.
I used this simple code:
Sub method2()
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"
objHTTP.Open "POST", URL, False
objHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
objHTTP.send ("encoded request params go here")
Cells(1, 1).Value = objHTTP.ResponseText
End Sub
I used Chrome DevTools (F12) to record the POST request, but I had a hard time figuring out what the params should be. (The form data is too long to screenshot or copy here, so please feel free to record it yourself.) I sent only the params I thought I needed (the security code, the radio button, and the two dates), but the response didn't match the one in DevTools and didn't contain any usable data. Here are the params that I used:
HistoriqueNegociation1$HistValeur1$DDValeur=9000%20%20&HistoriqueNegociation1$HistValeur1$historique=RBSearchDate&HistoriqueNegociation1$HistValeur1$DateTimeControl1$TBCalendar=07%2F03%2F2016&HistoriqueNegociation1$HistValeur1$DateTimeControl2$TBCalendar=07%2F03%2F2016
Obviously, I'm not getting something (or everything) right here.

Actually, I can't just pick some of the params; I have to send all of them. I didn't do that at first because the params line I got from DevTools was too long (47012 characters), and the Excel VBA editor doesn't accept a line that long. So I copied the params to a text file and sent the request using the contents of that file, and it worked.
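A minimal sketch of that approach follows. It assumes the full form data recorded in DevTools was saved as a single URL-encoded line in a text file. The path in BODY_PATH and the name method2FromFile are placeholders, not taken from the original code.

```vba
Sub method2FromFile()
    ' Assumption: the complete form data copied from DevTools was saved,
    ' as one URL-encoded line, to this hypothetical path.
    Const BODY_PATH As String = "C:\temp\post_body.txt"
    Const URL As String = "http://www.casablanca-bourse.com/bourseweb/Negociation-Historique.aspx?Cat=24&IdLink=302"

    Dim fileNum As Integer
    Dim body As String
    Dim objHTTP As Object

    ' Read the whole file into a string (the body is plain ASCII)
    fileNum = FreeFile
    Open BODY_PATH For Binary Access Read As #fileNum
    body = Space$(LOF(fileNum))
    Get #fileNum, , body
    Close #fileNum

    ' Drop any line breaks the text editor may have added
    body = Replace(Replace(body, vbCr, ""), vbLf, "")

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "POST", URL, False
    objHTTP.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    objHTTP.send body

    ' Print the status and size rather than the body itself: a full page
    ' can exceed what a single worksheet cell can hold
    Debug.Print objHTTP.Status, Len(objHTTP.responseText)
End Sub
```

Reading the body from a file sidesteps the editor's line-length limit because the long string never appears in the source code.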

Related

Can't access some tabular content using xmlhttp requests

I've created two scripts using VBA to fetch tabular content from a webpage. To reach the tabular content, I first had to enter 410611 in the box next to UID# under Search by UID and then hit the Go button.
I found success using Selenium:
Sub FetchContent()
Dim oInput As Object, oElem As Object
With CreateObject("Selenium.ChromeDriver")
.get "https://www.theclearinghouse.org/uid-lookup"
.SwitchToFrame .FindElementById("uidlookup", timeout:=10000)
Set oInput = .FindElementByCss("input[name='UidNum']", timeout:=5000)
oInput.SendKeys "410611"
.FindElementByCss("input#SubmitUid", timeout:=5000).Click
Set oElem = .FindElementByCss("table#uid", timeout:=5000)
MsgBox oElem.Text
End With
End Sub
However, I wish to get the same content using xmlhttp requests. I tried the following, but unfortunately it doesn't work. The script neither throws an error nor prints anything in the Immediate window. I found success using the same logic in Python, though.
Using xmlhttp requests (doesn't work):
Sub FetchContent()
Const URL = "https://uid.theclearinghouse.org/list.php"
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", URL, False
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.setRequestHeader "referer", "https://uid.theclearinghouse.org/index.php"
.send ("UidNum=410611&SubmitUid=Go%21")
Debug.Print .responseText
End With
End Sub
How can I access the tabular content using xmlhttp requests?

Get Data using MSXML2.XMLHTTP

I am trying to get data using MSXML2.XMLHTTP, but it doesn't work.
Any ideas?
Sub getdata()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As String
Dim sht As Worksheet
Application.DisplayAlerts = False
Set sht = ActiveSheet
On Error Resume Next
website = "https://shopee.co.id/AFI-EC-Tshirt-Yumia-(LD-90-P-57)-i.10221730.5568491283"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "GET", website, False
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.DocumentElement.innerHTML = response
price = html.querySelector("div.AJyN7v")(0).innerText
Debug.Print price
Application.StatusBar = ""
On Error GoTo 0
Application.DisplayAlerts = True
End Sub
I have tried many approaches, but it still isn't working.
I hope someone can help me.
Pretty much everything on that page requires JavaScript to load. JavaScript doesn't run when you make an xmlhttp request to the landing page, so the price never gets retrieved.
The price is retrieved dynamically from an additional API call that returns JSON.
If you examine the url you will have the following:
https://shopee.co.id/AFI-EC-Tshirt-Yumia-(LD-90-P-57)-i.10221730.5568491283
The last set of consecutive numbers is the product id i.e. 5568491283.
If you open the Network tab of dev tools (F12), press F5 to refresh so that the traffic that builds the page is recorded, filter to XHR-only traffic, and then type your product id into the search box, the first result is the XHR that returns the price:
https://shopee.co.id/api/v2/item/get?itemid=5568491283&shopid=10221730
The response is JSON, so you will need a JSON parser to extract the result (or use a regex on the string, which is less preferable).
In the Headers sub-tab you can view info about the XHR request made.
Check the terms and conditions to see whether scraping is allowed and whether there is a public API for retrieving this data.
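As a rough sketch of the steps above, the API URL can be derived from the product URL. This assumes the product URL always ends in "-i.<shopid>.<itemid>", which is inferred only from the two URLs shown here. BuildItemApiUrl and PrintItemJson are hypothetical names. The code prints the raw JSON and does not assume any field names in the response.

```vba
' Hypothetical helper: derives the API URL from a product URL that ends in
' "-i.<shopid>.<itemid>". Returns "" if the URL does not have that shape.
Function BuildItemApiUrl(ByVal productUrl As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "i\.(\d+)\.(\d+)$"

    If re.Test(productUrl) Then
        Set matches = re.Execute(productUrl)
        ' SubMatches(0) = shop id, SubMatches(1) = item (product) id
        BuildItemApiUrl = "https://shopee.co.id/api/v2/item/get?itemid=" & _
            matches(0).SubMatches(1) & "&shopid=" & matches(0).SubMatches(0)
    End If
End Function

Sub PrintItemJson()
    Dim apiUrl As String

    apiUrl = BuildItemApiUrl( _
        "https://shopee.co.id/AFI-EC-Tshirt-Yumia-(LD-90-P-57)-i.10221730.5568491283")
    If apiUrl = "" Then Exit Sub

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", apiUrl, False
        .send
        ' Raw JSON; pass it to a JSON parser to pull out the price
        Debug.Print .responseText
    End With
End Sub
```

For the product URL in the question, BuildItemApiUrl produces the same address as the XHR shown above.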

Web Scraping using VBA and MSXML2.XMLHTTP library

I'm trying to scrape data from a website using the MSXML2.XMLHTTP object in VBA (Excel) and I cannot figure out how to solve this problem! The website is the following:
http://www.detran.ms.gov.br/consulta-de-debitos/
You guys can use the following test data to fill the form:
Placa: oon5868
Renavam: 1021783231
I want to retrieve data like "chassi", with the data above that would be " 9BD374121F5068077".
I have no problem parsing the HTML document; the difficulty is actually getting the information back in the response! Code below:
Sub SearchVehicle()
Dim strPlaca As String
Dim strRenavam As String
strPlaca = "oon5868"
strRenavam = "01021783231"
Dim oXmlPage As MSXML2.XMLHTTP60
Dim strUrl As String
Dim strPostData As String
Set oXmlPage = New MSXML2.XMLHTTP60
strUrl = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
strPostData = "placa=" & strPlaca & "&renavam=" & strRenavam
oXmlPage.Open "POST", strUrl, False
oXmlPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
oXmlPage.send strPostData
Debug.Print oXmlPage.responseText
End Sub
The strUrl used in the POST method, ".../retornooooveiculos.asp", is the address that both Google developer tools and Fiddler showed the website posting the payload to.
When accessed manually, the website returns the correct information, but when I run my code I always get the following in .responseText:
<html>Acesse: <b><a href='http://www.detran.ms.gov.br target='_parent'>www.detran.ms.gov.br</a></b></html>
Please help, this puzzle is driving me crazy! Why do I get redirected like this?
I need the "CHASSI" information and can't find the correct HTTP request to get it!
Try the approach below. It should fetch the content you are after. The key is that you need to supply the Cookie header, copied from the Request Headers in devtools, for the script to work.
Sub SearchVehicle()
Const URL As String = "http://www2.detran.ms.gov.br/detranet/nsite/veiculo/veiculos/retornooooveiculos.asp"
Dim HTTP As New ServerXMLHTTP60, HTML As New HTMLDocument
Dim elem As Object, splaca$, srenavam$, qsp$
splaca = "oon5868"
srenavam = "01021783231"
qsp = "placa=" & splaca & "&renavam=" & srenavam
With HTTP
.Open "POST", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Cookie", "ISAWPLB{07D08995-E67C-4F44-91A1-F6A16337ECD6}={286E0BB1-C5F9-4439-A2CE-A7BE8C3955E0}; ASPSESSIONIDSCSDSCTB=AGDPOBEAAPJLLMKKIGPLBGMJ; 69137927=967930978"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send qsp
HTML.body.innerHTML = .responseText
End With
For Each elem In HTML.getElementsByTagName("b")
If InStr(elem.innerText, "Chassi:") > 0 Then MsgBox elem.ParentNode.NextSibling.innerText: Exit For
Next elem
End Sub
Once again: if the Cookie I provided doesn't work for you, fill in the Cookie header with the value from the Request Headers section of your devtools. Thanks.
Output I'm getting:
9BD374121F5068077

Trouble scraping the names from a certain website

I've come across a webpage that seems a bit tricky to scrape. When I go to the address "https://jobboerse2.arbeitsagentur.de/jobsuche/?s=1" it takes me to a page with a "suchen" option. Clicking "suchen" opens a new layout within the same tab and takes me to a page with lots of names, but the site address is still "https://jobboerse2.arbeitsagentur.de/jobsuche/?s=1".
I would like to scrape the names of that page, as in "Mitarbeiter für die Leerguttrennung (m/w)". Any help would be highly appreciated.
What I wrote so far:
Sub WebData()
Dim http As New MSXML2.xmlhttp60
Dim html As New htmldocument, source As Object, item As Object
With http
.Open "GET", "https://jobboerse2.arbeitsagentur.de/jobsuche/?s=1", False
.send
html.body.innerHTML = .responseText
End With
Set source = html.getElementsByClassName("ng-binding ng-scope")
For Each item In source
x = x + 1
Cells(x, 1) = item.innerText
Next item
Set html = Nothing: Set source = Nothing
End Sub
According to the XHR entries in the developer tools, the links increment as shown below, but I can't figure out the number of the last link.
"https://jobboerse2.arbeitsagentur.de/jobsuche/pc/v1/jobs"
"https://jobboerse2.arbeitsagentur.d...00&FCT.ANGEBOTSART=ARBEIT&FCT.BEHINDERUNG=AUS"
"https://jobboerse2.arbeitsagentur.d...EBOTSART=ARBEIT&FCT.BEHINDERUNG=AUS&offset=12"
"https://jobboerse2.arbeitsagentur.d...EBOTSART=ARBEIT&FCT.BEHINDERUNG=AUS&offset=24"
"https://jobboerse2.arbeitsagentur.d...EBOTSART=ARBEIT&FCT.BEHINDERUNG=AUS&offset=36"

Excel VBA get data from web with msxml2.xmlhttp - how do I accept cookies automatically?

I need your expertise and help, as I've looked around and couldn't find a solution.
I am loading information into Excel from a website using the msxml2.xmlhttp method (I did it earlier via web query, but that gets stuck after a few iterations and is slower). My problem is that on every iteration a Windows Security warning now pops up asking me to accept a cookie from the website. Note that the website doesn't require a login/password. I understood from an earlier post that the msxml2.xmlhttp method strips cookies for security reasons, but I get the same message even if I change the method to winhttp. I also changed the settings in IE to accept all cookies from the website automatically, but it didn't help.
My question is: what code do I need to add so that the cookies are accepted automatically? I am looping this code in bulk and can't have it hang waiting for me to accept the cookie manually. Your help will be very much appreciated! Below is the code snippet (which I actually found here on Stack Overflow).
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", "http://finance.yahoo.com/q/ae?s=" & Ticker & "+Analyst+Estimates", False
.send
htm.body.innerHTML = .responseText
End With
Set elemCollection = htm.getElementsByTagName("td")
For Each itm In elemCollection
If itm.className = "yfnc_tabledata1" Then
ActiveCell = itm.innerText
If ActiveCell.Column = 7 Then
ActiveCell.Offset(1, -6).Select
Else
ActiveCell.Offset(0, 1).Select
End If
End If
Next
I had the same problem this week.
After googling and trying some ideas, I added two MsgBox statements to my code.
objXMLDoc.Open "GET", strURL, False
objXMLDoc.send
MsgBox "After XMLDoc.send", vbOKOnly, "Test"
objHTMLDoc.body.innerHTML = objXMLDoc.responseText
MsgBox "After .innerHTML assignment", vbOKOnly, "Test"
I found that the pop-up security warnings always appear after the .innerHTML assignment, i.e., the problem has nothing to do with XMLHttp. It is the HTMLDocument that causes the pop-ups.
I guess objHTMLDoc.body.innerHTML = objXMLDoc.responseText does not just do a simple value assignment. It must also trigger some action based on the contents of the webpage.
I checked the webpage and found some code like this:
YUI().use('node','event','event-mouseenter','substitute','oop','node-focusmanager','node','event','substitute','**cookie**','event-resize','node', 'event', 'querystring-stringify','node','event','node','event','event-custom','event-valuechange','classnamemanager','node', function(Y) {})
Then I changed my code as follows and the pop-up warnings disappeared.
objXMLDoc.Open "GET", strURL, False
objXMLDoc.send
objHTMLDoc.body.innerHTML = Replace(objXMLDoc.responseText, "cookie", "")
Hope this can be helpful if you still have the problem.