Parsing HTTP response to spreadsheet (error '94': Invalid use of Null) - VBA

I am using VBA to send an HTTP request and parse the response into my spreadsheet. While decoding the table data from the response, I get "error '94': Invalid use of Null".
The code is given below; the HTML structure of the response page is also shared. Please help.
Here is a link: https://drive.google.com/file/d/1_ubH3zyPiZ8crK-KYs_-RipkGDoFci9G/view?usp=sharing
URL = "https://xx.xxxx.com/itrade/user/watch.exe?action=C "
xmlhttp.Open "POST", URL, False
xmlhttp.setRequestHeader "Host", "xx.xxxx.com"
.....
xmlhttp.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
xmlhttp.setRequestHeader "Cookie", cook
xmlhttp.send bodyk
html.body.innerHTML = xmlhttp.responseText
'---------Done-----------
getcookres = xmlhttp.GetResponseHeader("Set-Cookie")
Worksheets("Action").Cells(3, 24).Value = getcookres
Dim nodeList As Object, i As Long
Set nodeList = html.querySelectorAll("#data_table tr.tinside td")
For i = 1 To nodeList.Length - 1
Debug.Print nodeList.Item(i).innerText
Sheets("Sheet1").Cells(1, i).Value = nodeList.Item(i).innerText
Next
Worksheets("Action").Cells(6, 26).Value = Worksheets("Action").Cells(6, 27).Value
End Sub
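Run-time error 94 ("Invalid use of Null") is raised whenever a Null value is used where VBA expects a concrete value, for example in a string assignment or concatenation. Without the live response it is hard to say which expression goes Null here; as a hedged workaround sketch, any suspect value can be coerced to an empty string first (NzStr below is a hypothetical helper, not part of the original code):

' Minimal sketch, not the confirmed fix: coerce possibly-Null values to ""
' before writing them to cells or String variables. NzStr is a hypothetical
' helper, not part of the original code.
Function NzStr(ByVal v As Variant) As String
    If IsNull(v) Then NzStr = vbNullString Else NzStr = CStr(v)
End Function

' Possible usage inside the code above (assumption: one of these values is Null):
'   Worksheets("Action").Cells(3, 24).Value = NzStr(xmlhttp.GetResponseHeader("Set-Cookie"))
'   Sheets("Sheet1").Cells(1, i).Value = NzStr(nodeList.Item(i).innerText)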


Excel VBA: alter url based on cell value

Parent post:
VBA: Selecting from dropdown menu to reload page and scraping data
Apparently I don't know how to use Stack Overflow: I deleted my profile, thinking I was 'muting' the email updates. I just started coding VBA today, so I'm not quite sure what I'm doing. With the help of the awesome user SIM, the code is working.
I was trying to tinker with the code further, to alter the web URL so that it inserts whatever ticker symbol I put in cell J1. In this case, I'm trying to look up more than just jpm.
The goal here is to put any ticker symbol in J1 and have it reflected in the query URL.
For example:
J1 would hold AAPL, and the .Open command would be
.Open "POST", "https://www.nasdaq.com/symbol/AAPL/historical", False
Or J1 would hold WFC, and the .Open command would be
.Open "POST", "https://www.nasdaq.com/symbol/WFC/historical", False
However, my attempts are not working so hot.
Here's what I have so far.
Sub Get_Data()
    Dim tabd As Object, trow As Object, r&, c&
    Dim QueryString$, S$

    QueryString = "10y|false|" & Range("J1").Value & ""    ''change the "year" and the "ticker" name here as necessary
    ''Set web_url = "https://www.nasdaq.com/symbol/" & Range("J1").Value & "/historical"
    Range("A:F").ClearContents

    With New XMLHTTP
        .Open "POST", "https://www.nasdaq.com/symbol/jpm/historical", False
        ''.Open "POST", "web_url", False
        .setRequestHeader "User-Agent", "IE"
        .setRequestHeader "Content-Type", "application/json"
        .send QueryString
        S = .responseText
    End With

    With New HTMLDocument
        .body.innerHTML = S
        For Each tabd In .getElementById("quotes_content_left_pnlAJAX").getElementsByTagName("table")(0).Rows
            For Each trow In tabd.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next tabd
    End With
End Sub
I commented out the section that did not work.
Actually, your commented-out part is almost right. You can't use Set since you are creating a string, the ticker inside the URL must be lower case, and you were passing "web_url" as a string literal in the Open method.
This is how you would do it:
Sub Get_Data()
    Dim tabd As Object, trow As Object, r&, c&
    Dim QueryString$, S$, web_url$

    QueryString = "10y|false|" & Range("J1").Value & ""    ''change the "year" and the "ticker" name here as necessary
    web_url = "https://www.nasdaq.com/symbol/" & LCase(Range("J1").Value) & "/historical"
    Range("A:F").ClearContents

    With New XMLHTTP
        .Open "POST", web_url, False
        .setRequestHeader "User-Agent", "IE"
        .setRequestHeader "Content-Type", "application/json"
        .send QueryString
        S = .responseText
    End With

    With New HTMLDocument
        .body.innerHTML = S
        For Each tabd In .getElementById("quotes_content_left_pnlAJAX").getElementsByTagName("table")(0).Rows
            For Each trow In tabd.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next tabd
    End With
End Sub

Creating a facebook scraper

I tried to make a Facebook bot to parse profile links. However, it signs in and parses the content of the left-hand sidebar inconsistently, and I can't get any further. Could anyone point me in the right direction so I can fix the mistakes I've already made in my code and parse the profile links? Here is the code:
Sub ScrapeProfileLinks()
    Dim http As Object, html As Object
    Dim topics As Object, topic As Object
    Dim strdata As String, x As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    Set html = CreateObject("htmlfile")
    x = 1

    strdata = "email=sth.com&pass=xxx"
    http.Open "POST", "https://www.facebook.com/login.php?login_attempt=1&lwv=110", False
    http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    http.send strdata

    html.body.innerHTML = http.responseText
    Set topics = html.getElementsByClassName("_li")(0).getElementsByTagName("a")
    For Each topic In topics
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
End Sub
Does this help?
Sub DumpData()
    Dim IE As Object, itm As Object
    Dim URL As String, RowCount As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    URL = "https://www.facebook.com/login.php?login_attempt=1&lwv=110"

    'Wait for site to fully load
    IE.Navigate2 URL
    Do While IE.Busy = True
        DoEvents
    Loop

    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        For Each itm In IE.document.all
            .Range("A" & RowCount) = itm.tagName
            .Range("B" & RowCount) = itm.ID
            .Range("C" & RowCount) = itm.className
            .Range("D" & RowCount) = Left(itm.innerText, 1024)
            RowCount = RowCount + 1
        Next itm
    End With
End Sub
To get content from Facebook, you either have to deal with the API or use Selenium, which is definitely the worst option. However, the script below can log in to the page and parse some titles:
Sub Grabbing_item()
    Dim driver As New ChromeDriver, html As New HTMLDocument
    Dim post As Object, x As Long

    With driver
        .get "https://www.facebook.com/"
        .FindElementById("email").SendKeys ("email_id")
        .FindElementById("pass").SendKeys ("Pass_word")
        .FindElementById("u_0_2").Click
        .Wait 5
        html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
        .Quit
    End With

    For Each post In html.getElementsByClassName("profileLink")
        x = x + 1: Cells(x, 1) = post.innerText
    Next post
End Sub

Extracting file URL from a Hyperlinked Image

Sub DownloadFile()
    Dim myURL As String
    Dim WinHttpReq As Object, oStream As Object

    myURL = "http://data.bls.gov/timeseries/LNS14000000"

    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", myURL, False, "username", "password"
    WinHttpReq.send
    myURL = WinHttpReq.responseBody

    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile "C:\Downloads\abc.xlsx", 2
        oStream.Close
    End If
End Sub
I am trying to download data using VBA and found this code, which runs pretty well. The webpage URL I am trying to download data from is the one used in the code. Please take a moment to open the webpage: the Excel file I am trying to download is linked behind an image, so I am not able to find the URL to download the file from that image. Please advise. Thanks.
You might be able to hit the form target directly with a POST (action="/pdq/SurveyOutputServlet"), but it is expecting a POST string of the <input> elements together with their values. Most, if not all, of these input elements have been filled out for you simply by going to that page. All you need to do is collect and concatenate them into a POST string to be shoved back at the form.
Option Explicit

'base web page
Public Const csBLSGOVpg = "http://data.bls.gov/timeseries/LNS14000000"
'form's action target
Public Const csXLSDLpg = "http://data.bls.gov/pdq/SurveyOutputServlet"

Sub mcr_Stream_Buyer_Documents()
    Dim xmlDL As New MSXML2.ServerXMLHTTP60, xmlBDY As New HTMLDocument, adoFILE As Object
    Dim xmlSend As String, strFN As String, f As Long, i As Long

    With xmlDL
        .SetTimeouts 5000, 5000, 15000, 25000
        'start by going to the base web page
        .Open "GET", csBLSGOVpg, False
        .setRequestHeader "Content-Type", "text/javascript"
        .send
        If .Status <> "200" Then GoTo bm_Exit

        'get the source HTML for examination; zero the post string var
        xmlBDY.body.innerHTML = .responseText
        xmlSend = vbNullString

        'loop through the forms until you find the right one
        'then loop through the input elements and construct a post string
        For f = 0 To xmlBDY.getElementsByTagName("form").Length - 1
            If xmlBDY.getElementsByTagName("form")(f).Name = "excel" Then
                With xmlBDY.getElementsByTagName("form")(f)
                    For i = 0 To .getElementsByTagName("input").Length - 1
                        xmlSend = xmlSend & Chr(38) & _
                                  .getElementsByTagName("input")(i).Name & Chr(61) & _
                                  .getElementsByTagName("input")(i).Value
                    Next i
                    xmlSend = "?.x=5&.y=5" & xmlSend
                End With
                Exit For
            End If
        Next f
        'Debug.Print xmlSend 'check the POST string

        'send the POST string back to the form's action target
        .Open "POST", csXLSDLpg, False
        xmlDL.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        xmlDL.send xmlSend
        If xmlDL.Status <> "200" Then GoTo bm_Exit

        'pick up the response as a stream and save it as a .XLSX
        strFN = Environ("USERPROFILE") & "\Documents\LNS14000000" & Format(Date, "yyyymmdd") & ".xlsx"
        On Error Resume Next
        Kill strFN
        On Error GoTo 0

        Set adoFILE = CreateObject("ADODB.Stream")
        adoFILE.Type = 1
        adoFILE.Open
        adoFILE.Write .responseBody
        adoFILE.SaveToFile strFN, 2
        Set adoFILE = Nothing
    End With

    Set xmlBDY = Nothing
    Set xmlDL = Nothing
    Exit Sub

bm_Exit:
    Debug.Print Err.Number & ":" & Err.Description
End Sub
This is pretty minimalist, but it is all you need. There is at least one non-standard input element that does not have a name, but I elected to send its value back anyway. I did not sequentially remove things until it broke; I just built the POST string from what I retrieved and sent it back.
The downloaded workbook ends up as, for example, LNS1400000020150916.xlsx.
You will probably be moving this code into some sort of loop. Adjust the receiving file name accordingly; each new page will supply its own form input elements.
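As a minimal sketch of that loop, assuming the Sub above is refactored into a hypothetical DownloadSeries(seriesID) helper that builds the page URL and the file name from the BLS series ID (neither the helper nor the series list below is part of the original answer):

Sub DownloadAllSeries()
    'hypothetical driver loop: the series IDs and the DownloadSeries helper
    'are assumptions, not part of the original code
    Dim seriesIDs As Variant, s As Variant
    seriesIDs = Array("LNS14000000", "LNS11300000")    'example BLS series IDs

    For Each s In seriesIDs
        'DownloadSeries would do what mcr_Stream_Buyer_Documents does, but with
        '  csBLSGOVpg = ".../timeseries/" & s
        '  strFN      = "...\" & s & Format(Date, "yyyymmdd") & ".xlsx"
        DownloadSeries CStr(s)
    Next s
End Sub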
Once the response is stored in an HTMLDocument object, you can use a CSS selector of
#download_xlsx
The "#" means id. You can then click on this element:
htmlDocument.querySelector("#download_xlsx").Click
VBA:
Option Explicit

Public Sub DownloadFile()
    Dim ie As New InternetExplorer

    With ie
        .Visible = True
        .navigate "https://data.bls.gov/timeseries/LNS14000000"
        While .Busy Or .readyState < 4: DoEvents: Wend
        .document.querySelector("#download_xlsx").Click
        .Quit
    End With
End Sub
Other:
You could even target the form and submit:
.document.forms("excel").submit
This triggers the POST request mentioned in the other answer (which is an awesome answer btw).
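For completeness, here is a minimal sketch of that variant, reusing the IE routine shown above (the form name "excel" comes from the other answer; whether IE can be quit immediately after the submit without interrupting the download depends on the browser's download handling):

Public Sub DownloadFileViaFormSubmit()
    'same IE flow as DownloadFile above, but submitting the "excel" form
    'instead of clicking the #download_xlsx element
    Dim ie As New InternetExplorer

    With ie
        .Visible = True
        .navigate "https://data.bls.gov/timeseries/LNS14000000"
        While .Busy Or .readyState < 4: DoEvents: Wend
        .document.forms("excel").submit    'fires the same POST as the download button
        .Quit
    End With
End Sub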

Strip multiple li elements from google with VBA

My goal is to scrape all Google search results from page 1 into Excel with VBA. So far I have managed to scrape only the first result. The heading, link and date are stored in columns 4, 5 and 6. I now need a loop for the other results, but I can't get it right. Also, I think the part that stores the date isn't coded very optimally. Does anyone know the answer?
Sub XMLHTTP()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object, Objdatum As Object, Ddatum As Object
    Dim start_time As Date
    Dim end_time As Date
    Dim cookie As String
    Dim result_cookie As String

    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow
        url = "https://www.google.co.in/search?q=" & Cells(i, 3) & "Skipr" & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
        Set link = objH3.getelementsbytagname("a")(0)
        Set Objdatum = objResultDiv.getelementsbytagname("span")(2)

        str_text = Replace(link.innerHTML, "<EM>", "")
        str_text = Replace(str_text, "</EM>", "")
        dat_text = Objdatum.innerHTML

        Cells(i, 4) = str_text
        Cells(i, 5) = link.href
        Cells(i, 6) = dat_text
        DoEvents
    Next

    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
You need to iterate the collections returned by your getelementsbytagname calls instead of taking only the first element with index (0).
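A minimal, hedged sketch of that loop, reusing the objects from the question and sitting inside its For i loop (the column offsets are illustrative, and Google's markup may well have changed since):

'iterate every H3 under #rso instead of only the first one
Dim h3s As Object, h3 As Object, lnk As Object, col As Long

Set h3s = objResultDiv.getelementsbytagname("H3")
col = 4                                   'start in column D, as in the question
For Each h3 In h3s
    If h3.getelementsbytagname("a").Length > 0 Then
        Set lnk = h3.getelementsbytagname("a")(0)
        Cells(i, col) = Replace(Replace(lnk.innerHTML, "<EM>", ""), "</EM>", "")
        Cells(i, col + 1) = lnk.href
        col = col + 2
    End If
Next h3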
I had a similar project; below are some tips and my approach for your reference, which might help you work with and maintain the code in the future.
First, instead of using CreateObject I prefer to reference the Object Libraries that expose the COM objects. This gives me the ability to browse the functions and properties of each object (F2), plus documentation and code completion (speed and fewer bugs) within the VBA editor (F7 takes you back to the code view).
Also, use these constants for clarity:
'see ready state : https://msdn.microsoft.com/en-us/library/ie/ms534361(v=vs.85).aspx
Const READYSTATE_UNINITIALIZED = 0
Const READYSTATE_LOADING = 1
Const READYSTATE_LOADED = 2
Const READYSTATE_INTERACTIVE = 3
Const READYSTATE_COMPLETE = 4
Finally, I use DOMDocument60 to parse the XML into a Document Object Model in memory, and MSHTML.HTMLDocument to parse the HTML document and iterate the table rows.
Below is code where I iterate all rows of a table within an HTML document that is embedded in the initial XML document returned by the web server:
Dim xmlDoc As DOMDocument60
Set xmlDoc = GetXMLDocument("http://www.nbg.ge/rss.php")

'extract publication date
Debug.Print xmlDoc.getElementsByTagName("pubDate")(0).Text

'unwrap html document from CDATA in "//item/description" element
Dim htmlDoc As New MSHTML.HTMLDocument
htmlDoc.body.innerHTML = xmlDoc.SelectNodes("//item/description")(0).Text

'extract table data from html document
Dim tr As IHTMLElement, td As IHTMLElement
For Each tr In htmlDoc.getElementsByTagName("tr")
    For Each td In tr.Children
        'each cell in current row
        Debug.Print " " & td.innerHTML
    Next td
    'next row
    Debug.Print "-----"
Next tr
Sample data returned by the web service I was calling:
<rss version="2.0">
  <channel>
    <title>RSS NBG Currency Rates</title>
    <link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
    <description>Currency Rates</description>
    <language>geo</language>
    <copyright>Copyright 2015, NBG</copyright>
    <pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
    <lastBuildDate>Wed, 29 Apr 2015 12:39:50 +0400</lastBuildDate>
    <managingEditor>alex#proservice.ge</managingEditor>
    <webMaster>alex#proservice.ge</webMaster>
    <item>
      <title>Currency Rates 2015-04-29</title>
      <link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
      <description>
        <![CDATA[
        <table border="0">
          <tr>
            <td>AED</td>
            <td>10 არაბეთის გაერთიანებული საამიროების დირჰამი</td>
            <td>6.2858</td>
            <td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
            <td>0.0640</td>
          </tr>
          <tr>
            <td>AMD</td>
            <td>1000 სომხური დრამი</td>
            <td>4.8676</td>
            <td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
            <td>0.0414</td>
          </tr>
        </table>
        ]]>
      </description>
      <pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
      <guid>
        https://www.nbg.gov.ge/index.php?m=236&lang=geo&date=2015-04-29
      </guid>
    </item>
  </channel>
</rss>
And here is the function that actually gets the document from the web server (it only works if you have added the references as shown above):
Function GetXMLDocument(url As String) As MSXML2.DOMDocument60
    Dim xhr As New XMLHTTP60
    Dim doc As New DOMDocument60
    Dim msg As String

    With xhr
        .Open bstrMethod:="GET", bstrUrl:=url, varAsync:=False
        On Error GoTo SendError
        .send
        On Error GoTo 0

        'http status codes - http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
        '200 = SUCCESS - OK
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            'Debug.Print .responseText
            doc.LoadXML (.responseText)
        Else
            msg = "Error" & vbNewLine & "Ready state: " & .readyState & _
                  vbNewLine & "HTTP request status: " & .Status
            GoTo Error
        End If
    End With

    Set GetXMLDocument = doc
    Exit Function

SendError:
    'by default, access to data sources across the internet is disabled
    'go to Internet Options and, under Security > Custom level > Miscellaneous,
    'enable "Access data sources across domains"
    'see: http://stackoverflow.com/a/17402920
    MsgBox "Make sure 'Access data sources across domains' is enabled under Internet Options > Security > Custom level", vbOKOnly, "Could not send request to server"
    Exit Function

Error:
    MsgBox msg, vbOKOnly, "Unexpected Error"
End Function

How to google image search from excel cell content and return url?

Is it possible to write a macro or VBA code that will run a Google image search on the cell content in column A and return the image's URL? I've been able to return the first search result in the past, but I can't figure out how to make one that returns a relevant image such as:
http://www.gamexchange.co.uk/images/pictures/products/ps3/littlebigplanet-3-ps3-(product-photography).jpg?v=b13dfc3644d28cd2cff7d5729a35fc5e
Any help would be appreciated, as I think I've exhausted the search function now.
Thank you.
Here is the code I've used, which doesn't work as desired.
Sub XMLHTTP()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date
    Dim cookie As String
    Dim result_cookie As String

    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow
        url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
        Set link = objH3.getelementsbytagname("a")(0)

        str_text = Replace(link.innerHTML, "<EM>", "")
        str_text = Replace(str_text, "</EM>", "")

        Cells(i, 2) = str_text
        Cells(i, 3) = link.href
        DoEvents
    Next

    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
The problem is that every search is going to take you to a different domain, with a completely different HTML structure.
There is no reliable way to do this and bring in the correct picture, because you cannot know in advance the structure of the returned HTML that you need to parse for the image. Of course, you could try to return, for instance, the first image on each page, but since the pages will likely contain many images that are irrelevant to your search, that is not going to be very reliable.
One alternative would be to do what you are currently doing to obtain the str_text and the link.href to columns B and C of your spreadsheet, and then run a second XMLHTTP request to a URL of the following structure:
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&source=lnms&tbm=isch&sa=X" & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
This will return a Google Image search. Again, this will not be perfect since the application has no way of knowing which is the "best" picture to return, but you could more reliably take the first image, since these results are more likely to be relevant to your search term.
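As a minimal, hedged sketch of that second request, placed inside the existing For i loop after the first request (Google's image-results markup changes frequently, so taking the first <img> in the response is only an approximation and the parsing will likely need adjusting):

'second request against the image-search URL; write the first <img> src to column D
Dim imgHtml As Object, imgs As Object

url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&source=lnms&tbm=isch&sa=X" & _
      "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send

Set imgHtml = CreateObject("htmlfile")
imgHtml.body.innerHTML = XMLHTTP.ResponseText
Set imgs = imgHtml.getElementsByTagName("img")
If imgs.Length > 0 Then
    Cells(i, 4) = imgs(0).src             'first image on the results page; may be a thumbnail
End If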