Excel VBA: alter URL based on cell value

Parent post:
VBA: Selecting from dropdown menu to reload page and scraping data
Apparently I don't know how to use Stack Overflow: I deleted my profile, thinking I was 'muting' the email updates. I just started coding VBA today, so I'm not quite sure what I'm doing. With the help of the awesome user SIM, the code is working.
I was trying to tweak the code further, to alter the web URL so that it inserts whatever ticker symbol I put in cell J1. In this case, I'm trying to look up more than just jpm.
The goal here is to put any ticker symbol in J1 and have it reflected in the query URL.
For example:
J1 would hold AAPL, and the .Open command would be
.Open "POST", "https://www.nasdaq.com/symbol/aapl/historical", False
Or J1 would hold WFC, and the .Open command would be
.Open "POST", "https://www.nasdaq.com/symbol/wfc/historical", False
However, my attempts are not working so hot.
Here's what I have so far.
Sub Get_Data()
    Dim tabd As Object, trow As Object, r&, c&
    Dim QueryString$, S$
    QueryString = "10y|false|" & Range("J1").Value ''change here the "year" and the "ticker" name as necessary
    ''Set web_url = "https://www.nasdaq.com/symbol/" & Range("J1").Value & "/historical"
    Range("A:F").ClearContents
    With New XMLHTTP
        .Open "POST", "https://www.nasdaq.com/symbol/jpm/historical", False
        ''.Open "POST", "web_url", False
        .setRequestHeader "User-Agent", "IE"
        .setRequestHeader "Content-Type", "application/json"
        .send QueryString
        S = .responseText
    End With
    With New HTMLDocument
        .body.innerHTML = S
        For Each tabd In .getElementById("quotes_content_left_pnlAJAX").getElementsByTagName("table")(0).Rows
            For Each trow In tabd.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next tabd
    End With
End Sub
I commented out the section that did not work.

Actually, your commented-out part is almost right. You can't use Set since you are assigning a string, not an object, and the ticker inside the URL must be lower case. Also, you were passing "web_url" as a string literal in the Open method rather than the variable itself.
This is how you would do it:
Sub Get_Data()
    Dim tabd As Object, trow As Object, r&, c&
    Dim QueryString$, S$, web_url$
    QueryString = "10y|false|" & Range("J1").Value ''change here the "year" and the "ticker" name as necessary
    web_url = "https://www.nasdaq.com/symbol/" & LCase(Range("J1").Value) & "/historical"
    Range("A:F").ClearContents
    With New XMLHTTP
        .Open "POST", web_url, False
        .setRequestHeader "User-Agent", "IE"
        .setRequestHeader "Content-Type", "application/json"
        .send QueryString
        S = .responseText
    End With
    With New HTMLDocument
        .body.innerHTML = S
        For Each tabd In .getElementById("quotes_content_left_pnlAJAX").getElementsByTagName("table")(0).Rows
            For Each trow In tabd.Cells
                c = c + 1: Cells(r + 1, c) = trow.innerText
            Next trow
            c = 0: r = r + 1
        Next tabd
    End With
End Sub

Related

Parsing HTTP Response to spreadsheet (error '94': invalid use of null)

I am using VBA to send an HTTP request and parse the response into my spreadsheet. While decoding table data from the response, I am getting "error '94': invalid use of null".
The code is given below; the HTML structure of the response is also shared. Please help.
Here is a link: https://drive.google.com/file/d/1_ubH3zyPiZ8crK-KYs_-RipkGDoFci9G/view?usp=sharing
URL = "https://xx.xxxx.com/itrade/user/watch.exe?action=C "
xmlhttp.Open "POST", URL, False
xmlhttp.setRequestHeader "Host", "xx.xxxx.com"
.....
xmlhttp.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
xmlhttp.setRequestHeader "Cookie", cook
xmlhttp.send bodyk
html.body.innerHTML = xmlhttp.responseText
'---------Done-----------
getcookres = xmlhttp.GetResponseHeader("Set-Cookie")
Worksheets("Action").Cells(3, 24).Value = getcookres
Dim nodeList As Object, i As Long
Set nodeList = html.querySelectorAll("#data_table tr.tinside td")
For i = 1 To nodeList.Length - 1
Debug.Print nodeList.Item(i).innerText
Sheets("Sheet1").Cells(1, i).Value = nodeList.Item(i).innerText
Next
Worksheets("Action").Cells(6, 26).Value = Worksheets("Action").Cells(6, 27).Value
End Sub
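No answer was captured for this question, but a common cause of error '94' in this kind of loop is dereferencing something that came back empty or Null. Below is a minimal defensive sketch of the same loop. The selector and sheet names are taken from the question; the specific guards are my assumption about where the Null arises (another candidate is GetResponseHeader("Set-Cookie") returning Null when the header is absent). Note it also walks the node list from index 0 rather than 1:

```vba
Dim nodeList As Object, i As Long
Set nodeList = html.querySelectorAll("#data_table tr.tinside td")

'querySelectorAll may come back Nothing or empty if the expected
'markup never arrived, so check before touching .Length or .Item
If Not nodeList Is Nothing Then
    For i = 0 To nodeList.Length - 1
        'innerText can be Null for empty cells; skip those rather
        'than assign Null to a cell (which raises error 94)
        If Not IsNull(nodeList.Item(i).innerText) Then
            Sheets("Sheet1").Cells(1, i + 1).Value = nodeList.Item(i).innerText
        End If
    Next i
End If
```

If the guarded version writes nothing at all, the table likely isn't in the response text, which would point at the request (cookies, headers, body) rather than the parsing.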

How can I web-scrape data outside of div tags

I'm having trouble web-scraping a value from https://next-episode.net/star-trek-picard. I'm looking to pull the season number from the "previous_episode" tag ID. The number seems to sit between two tags and I can't seem to find a way to reference it. I've previously been pulling all of the inner text of the previous_episode tag and using regex functions to isolate the values that I want, giving me the following:
Previous Episode
Name:Broken Pieces
Date:
Thu Mar 12, 2020
Season:
1
Episode:8
Summary:Episode Summary
(The above has no blank lines between the items in Excel; the question box is placing them all on one line for some reason.)
Before tonight, the number 1 was on the same line as "Season:", which allowed my regex function to work. Alternatively, a regex pattern that would let me get the 1 on the line after "Season:" would also solve it.
Dim XML_05 As New MSXML2.XMLHTTP60
Dim HTML_05 As New MSHTML.HTMLDocument
XML_05.Open "GET", Cells(Row, NextEpisodeURL).Value, False
XML_05.send
HTML_05.body.innerHTML = XML_05.responseText
Dim NETC_05 As MSHTML.IHTMLElementCollection
Dim NET_05 As MSHTML.IHTMLElement
Dim REC_05 As MSHTML.IHTMLElement
Dim CEC_05 As MSHTML.IHTMLElementCollection
Dim CE_05 As MSHTML.IHTMLElement
Dim REO_05 As VBScript_RegExp_55.RegExp
Dim MO_05 As Object
Dim SN_05() As String
Dim ENA_05() As String
Dim EN_05() As String
Dim LatestEpisodeName As String
Set NET_05 = HTML_05.getElementById("previous_episode")
Set REO_05 = New VBScript_RegExp_55.RegExp
REO_05.Global = True
REO_05.IgnoreCase = True
REO_05.Pattern = "(Name:(.*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(0).Value
ENA_05 = Split(MO_05(0), ":")
Debug.Print ENA_05(1)
LatestEpisodeName = ENA_05(1)
REO_05.Pattern = "(Episode:([0-9]*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(0).Value
EN_05 = Split(MO_05(0), ":")
Debug.Print EN_05(1)
Cells(Row, EpisodeNet).Value = EN_05(1)
REO_05.Pattern = "(Season:\s+([0-9]*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(0).Value
SN_05 = Split(MO_05(0), ":")
Debug.Print SN_05(1)
SN_05(1) = Trim(SN_05(1))
Cells(Row, SeasonNet).Value = SN_05(1)
Set NETC_05 = HTML_05.getElementById("next_episode").Children
Cells(Row, CountDown).Value = NETC_05(5).innerText
Debug.Print NETC_05(5).innerText
I suppose this is something that might help you grab the required fields:
Sub FetchData()
    Const Url$ = "https://next-episode.net/star-trek-picard"
    Dim HTML As New HTMLDocument, post As Object
    Dim rxp As New RegExp, R&
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        HTML.body.innerHTML = .responseText
    End With
    Set post = HTML.getElementById("previous_episode")
    With rxp
        .Pattern = "(Name:(.*))"
        If .Execute(post.innerText).Count > 0 Then
            R = R + 1: Cells(R, 1) = .Execute(post.innerText)(0).SubMatches(0)
        End If
        .Pattern = "(Season:\s*([0-9]*))"
        If .Execute(post.innerText).Count > 0 Then
            Cells(R, 2) = .Execute(post.innerText)(0).SubMatches(0)
        End If
        .Pattern = "(Episode:([0-9]*))"
        If .Execute(post.innerText).Count > 0 Then
            Cells(R, 3) = .Execute(post.innerText)(0).SubMatches(0)
        End If
    End With
End Sub
Sub FetchData2() ''the opening Sub line was missing in the original; the name is assumed
    Const Url$ = "https://next-episode.net/star-trek-picard"
    Dim HTML As New HTMLDocument, post As Object
    Dim rxp As New RegExp, R&
    Dim CleanString As String, SeasonNumber() As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        HTML.body.innerHTML = .responseText
    End With
    Set post = HTML.getElementById("previous_episode")
    With rxp
        .Pattern = "(Name:(.*))"
        If .Execute(post.innerText).Count > 0 Then
            R = R + 1: Cells(R, 1) = .Execute(post.innerText)(0).SubMatches(0)
        End If
        .Pattern = "(Season:\s+([0-9]*))"
        If .Execute(post.innerText).Count > 0 Then
            CleanString = Application.WorksheetFunction.Clean(.Execute(post.innerText)(0).SubMatches(0))
            SeasonNumber = Split(CleanString, ":")
            Cells(R, 2) = SeasonNumber(1)
        End If
        .Pattern = "(Episode:([0-9]*))"
        If .Execute(post.innerText).Count > 0 Then
            Cells(R, 3) = .Execute(post.innerText)(0).SubMatches(0)
        End If
    End With
End Sub
It is cleaner and faster to use the DOM to your advantage. The nodes you want are NextSiblings of the items matched by the nice and fast CSS selector #previous_episode .subheadline. You get back a nodeList of the left-hand-side elements, e.g. "Name", "Date" etc. Simply navigate with NextSibling to jump to the right-hand-side elements, e.g. "Et in Arcadia Ego, Part 1".
The returned nodeList starts at 0 and you can index in to get individual items. Note that the second item you want is a textNode, so you must use the NodeValue property rather than innerText.
This is faster and more correct in terms of dealing with HTML parsing.
Option Explicit
Public Sub GetPriorEpisodeInfo()
    Const URL = "https://next-episode.net/star-trek-picard"
    Dim html As New mshtml.HTMLDocument, previousEpisodeItems As Object
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With
    Set previousEpisodeItems = html.querySelectorAll("#previous_episode .subheadline")
    With ActiveSheet
        .Cells(1, 1) = previousEpisodeItems.item(0).NextSibling.innerText
        .Cells(1, 2) = previousEpisodeItems.item(2).NextSibling.NodeValue 'textNode
        .Cells(1, 3) = previousEpisodeItems.item(3).NextSibling.innerText
    End With
End Sub

Creating a facebook scraper

I tried to make a Facebook bot to parse the profile links. However, it signs in and parses the content of the left-side bar inconsistently, and I can't get any further. Could anyone point me in the right direction so that I can rectify the mistakes I've made in my code and parse the profile links? Here is the code:
strdata = "email=sth.com&pass=xxx"
http.Open "POST", "https://www.facebook.com/login.php?login_attempt=1&lwv=110", False
http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
http.send strdata
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("_li")(0).getElementsByTagName("a")
For Each topic In topics
Cells(x, 1) = topic.innerText
x = x + 1
Next topic
Does this help?
Sub DumpData()
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    URL = "https://www.facebook.com/login.php?login_attempt=1&lwv=110"
    'Wait for site to fully load
    IE.Navigate2 URL
    Do While IE.Busy = True
        DoEvents
    Loop
    RowCount = 1
    With Sheets("Sheet1")
        .Cells.ClearContents
        RowCount = 1
        For Each itm In IE.document.all
            .Range("A" & RowCount) = itm.tagname
            .Range("B" & RowCount) = itm.ID
            .Range("C" & RowCount) = itm.classname
            .Range("D" & RowCount) = Left(itm.innertext, 1024)
            RowCount = RowCount + 1
        Next itm
    End With
End Sub
To get content from Facebook, you either have to deal with the API or use Selenium, which is definitely the worst way. However, the script below can log in to the page and parse some titles:
Sub Grabbing_item()
    Dim driver As New ChromeDriver, html As New HTMLDocument
    Dim post As Object, x As Long
    With driver
        .get "https://www.facebook.com/"
        .FindElementById("email").SendKeys ("email_id")
        .FindElementById("pass").SendKeys ("Pass_word")
        .FindElementById("u_0_2").Click
        .Wait 5
        html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
        .Quit
    End With
    For Each post In html.getElementsByClassName("profileLink")
        x = x + 1: Cells(x, 1) = post.innerText
    Next post
End Sub

Extracting file URL from a Hyperlinked Image

Sub DownloadFile()
    Dim myURL As String
    Dim WinHttpReq As Object, oStream As Object
    myURL = "http://data.bls.gov/timeseries/LNS14000000"
    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", myURL, False, "username", "password"
    WinHttpReq.send
    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile "C:\Downloads\abc.xlsx", 2
        oStream.Close
    End If
End Sub
I am trying to download data using VBA and found this code runs pretty well. The webpage URL from which I am trying to download data is the one used in the code. Please take a moment and open the webpage: the Excel file I am trying to download is linked behind an image, so I am not able to find the URL to download the file from that image. Please advise. Thanks.
You might be able to hit the form target directly with a POST (action="/pdq/SurveyOutputServlet"), but it expects a POST string of the <input> elements together with their values. Most if not all of these input elements have already been filled out for you simply by going to that page. All you need to do is collect and concatenate them into a POST string to be shoved back at the form.
Option Explicit
'base web page
Public Const csBLSGOVpg = "http://data.bls.gov/timeseries/LNS14000000"
'form's action target
Public Const csXLSDLpg = "http://data.bls.gov/pdq/SurveyOutputServlet"
Sub mcr_Stream_Buyer_Documents()
    Dim xmlDL As New MSXML2.ServerXMLHTTP60, xmlBDY As New HTMLDocument, adoFILE As Object
    Dim xmlSend As String, strFN As String, f As Long, i As Long
    With xmlDL
        .SetTimeouts 5000, 5000, 15000, 25000
        'start by going to the base web page
        .Open "GET", csBLSGOVpg, False
        .setRequestHeader "Content-Type", "text/javascript"
        .send
        If .Status <> "200" Then GoTo bm_Exit
        'get the source HTML for examination; zero the post string var
        xmlBDY.body.innerHTML = .responseText
        xmlSend = vbNullString
        'loop through the forms until you find the right one
        'then loop through the input elements and construct a post string
        For f = 0 To xmlBDY.getElementsByTagName("form").Length - 1
            If xmlBDY.getElementsByTagName("form")(f).Name = "excel" Then
                With xmlBDY.getElementsByTagName("form")(f)
                    For i = 0 To .getElementsByTagName("input").Length - 1
                        xmlSend = xmlSend & Chr(38) & _
                                  .getElementsByTagName("input")(i).Name & Chr(61) & _
                                  .getElementsByTagName("input")(i).Value
                    Next i
                    xmlSend = "?.x=5&.y=5" & xmlSend
                End With
                Exit For
            End If
        Next f
        'Debug.Print xmlSend 'check the POST string
        'send the POST string back to the form's action target
        .Open "POST", csXLSDLpg, False
        xmlDL.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        xmlDL.send xmlSend
        If xmlDL.Status <> "200" Then GoTo bm_Exit
        'pick up the response as a stream and save it as a .XLSX
        strFN = Environ("USERPROFILE") & "\Documents\LNS14000000" & Format(Date, "yyyymmdd") & ".xlsx"
        On Error Resume Next
        Kill strFN
        On Error GoTo 0
        Set adoFILE = CreateObject("ADODB.Stream")
        adoFILE.Type = 1
        adoFILE.Open
        adoFILE.Write .responseBody
        adoFILE.SaveToFile strFN, 2
        Set adoFILE = Nothing
    End With
    Set xmlBDY = Nothing
    Set xmlDL = Nothing
    Exit Sub
bm_Exit:
    Debug.Print Err.Number & ":" & Err.Description
End Sub
This is pretty minimalist, but it is all you need. There is at least one non-standard input element that does not have a name, but I elected to send its value back anyway. I did not sequentially remove things until it broke; I just built the POST string from what I retrieved and sent it back.
Resulting file: LNS1400000020150916.xlsx
You will probably be moving this code into some sort of loop. Adjust the receiving file name accordingly; each new page should adjust its own form input elements accordingly.
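As a sketch of such a loop, the snippet below assumes the series IDs live in column A of the active sheet and that the routine above has been refactored into a hypothetical mcr_Stream_One_Series taking the page URL and destination file as parameters; both the sheet layout and the refactored routine are assumptions, not part of the original answer:

```vba
Sub mcr_Stream_All_Series()
    Dim r As Long, sSeries As String
    'walk the series IDs listed in column A, starting under a header row
    For r = 2 To Cells(Rows.Count, "A").End(xlUp).Row
        sSeries = Cells(r, "A").Value   'e.g. LNS14000000
        'hypothetical refactored routine: (page URL, destination file)
        mcr_Stream_One_Series "http://data.bls.gov/timeseries/" & sSeries, _
            Environ("USERPROFILE") & "\Documents\" & sSeries & Format(Date, "yyyymmdd") & ".xlsx"
    Next r
End Sub
```

Since each page carries its own pre-filled form inputs, the POST string must be rebuilt per iteration rather than reused.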
Once the response is stored in an HTMLDocument object, you can use a CSS selector of
#download_xlsx
The "#" means id. You can then click on this element:
htmlDocument.querySelector("#download_xlsx").Click
VBA:
Option Explicit
Public Sub DownloadFile()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .navigate "https://data.bls.gov/timeseries/LNS14000000"
        While .Busy Or .readyState < 4: DoEvents: Wend
        .document.querySelector("#download_xlsx").Click
        .Quit
    End With
End Sub
Other:
You could even target the form and submit:
.document.forms("excel").submit
This triggers the POST request mentioned in the other answer (which is an awesome answer btw).

Strip multiple li elements from google with VBA

My goal is to scrape all Google search results from page 1 into Excel with VBA. So far I have managed to scrape the first result. The head, link and date are stored in cells 4, 5 and 6. I now have to make a loop for the other li elements, but I can't get it straight. Also, the function that stores the date isn't coded very optimally, I think. Anyone who knows the answer?
Sub XMLHTTP()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object, Objdatum As Object, Ddatum As Object
    Dim start_time As Date
    Dim end_time As Date
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    Dim cookie As String
    Dim result_cookie As String
    start_time = Time
    Debug.Print "start_time:" & start_time
    For i = 2 To lastRow
        url = "https://www.google.co.in/search?q=" & Cells(i, 3) & "Skipr" & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
        Set link = objH3.getelementsbytagname("a")(0)
        Set Objdatum = objResultDiv.getelementsbytagname("span")(2)
        str_text = Replace(link.innerHTML, "<EM>", "")
        str_text = Replace(str_text, "</EM>", "")
        dat_text = Objdatum.innerHTML
        Cells(i, 4) = str_text
        Cells(i, 5) = link.href
        Cells(i, 6) = dat_text
        DoEvents
    Next
    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
You need to iterate the collections returned by your getelementsbytagname calls instead of taking only the first element with the array index (0).
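A minimal sketch of that change, reusing the variables from the question; the choice to write each subsequent result three columns further right is my assumption about the desired layout:

```vba
Dim objH3s As Object, n As Long
Set objH3s = objResultDiv.getelementsbytagname("H3")

'walk every H3 in the results div instead of only index (0)
For n = 0 To objH3s.Length - 1
    Set link = objH3s(n).getelementsbytagname("a")(0)
    str_text = Replace(link.innerHTML, "<EM>", "")
    str_text = Replace(str_text, "</EM>", "")
    Cells(i, 4 + n * 3) = str_text   'head
    Cells(i, 5 + n * 3) = link.href  'link
Next n
```

Pairing each result with its date is trickier, because the span elements are collected across the whole div rather than per result; that part is left as in the original code.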
I had a similar project; below are some tips and my approach for your reference, which might help you work with and maintain the code in the future.
First, instead of using CreateObject I prefer to add references to the Object Libraries that expose the COM objects. This lets me browse the functions and properties of each object (F2 opens the Object Browser; F7 takes you back to the code view) and gives me code completion within the VBA editor (speed and fewer bugs), i.e. documentation and code completion.
Also, use these constants for clarity:
'see ready state : https://msdn.microsoft.com/en-us/library/ie/ms534361(v=vs.85).aspx
Const READYSTATE_UNINITIALIZED = 0
Const READYSTATE_LOADING = 1
Const READYSTATE_LOADED = 2
Const READYSTATE_INTERACTIVE = 3
Const READYSTATE_COMPLETE = 4
Finally, use DOMDocument60 to parse the XML into a Document Object Model in memory, and MSHTML.HTMLDocument to parse the HTML document and iterate the table rows.
Below is code where I iterate all rows of a table within an HTML document embedded in the initial XML document returned by the web server:
Dim xmlDoc As DOMDocument60
Set xmlDoc = GetXMLDocument("http://www.nbg.ge/rss.php")
'extract publication date
Debug.Print xmlDoc.getElementsByTagName("pubDate")(0).Text
'unwrap html document from CDATA in "//item/description" element
Dim htmlDoc As New MSHTML.HTMLDocument
htmlDoc.body.innerHTML = xmlDoc.SelectNodes("//item/description")(0).Text
'extract table data from html document
Dim tr As IHTMLElement, td As IHTMLElement
For Each tr In htmlDoc.getElementsByTagName("tr")
    For Each td In tr.Children
        'each cell in current row
        Debug.Print " " & td.innerHTML
    Next td
    'next row
    Debug.Print "-----"
Next tr
Sample data returned by the web service I was calling:
<rss version="2.0">
<channel>
<title>RSS NBG Currency Rates</title>
<link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
<description>Currency Rates</description>
<language>geo</language>
<copyright>Copyright 2015, NBG</copyright>
<pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
<lastBuildDate>Wed, 29 Apr 2015 12:39:50 +0400</lastBuildDate>
<managingEditor>alex@proservice.ge</managingEditor>
<webMaster>alex@proservice.ge</webMaster>
<item>
<title>Currency Rates 2015-04-29</title>
<link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
<description>
<![CDATA[
<table border="0">
<tr>
<td>AED</td>
<td>10 არაბეთის გაერთიანებული საამიროების დირჰამი</td>
<td>6.2858</td>
<td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
<td>0.0640</td> </tr><tr> <td>AMD</td> <td>1000 სომხური დრამი</td>
<td>4.8676</td>
<td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
<td>0.0414</td>
</tr>
</table>
]]>
</description>
<pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
<guid>
https://www.nbg.gov.ge/index.php?m=236&lang=geo&date=2015-04-29
</guid>
</item>
</channel>
</rss>
And here is the function that actually gets the document from the web server (it only works if you have added the references described above):
Function GetXMLDocument(url As String) As MSXML2.DOMDocument60
    Dim xhr As New XMLHTTP60
    Dim doc As New DOMDocument60
    Dim msg As String
    With xhr
        .Open bstrMethod:="GET", bstrUrl:=url, varAsync:=False
        On Error GoTo SendError
        .send
        On Error GoTo 0
        'http status codes - http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
        '200 = SUCCESS - OK
        If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
            'Debug.Print .responseText
            doc.LoadXML (.responseText)
        Else
            msg = "Error" & vbNewLine & "Ready state: " & .readyState & _
                  vbNewLine & "HTTP request status: " & .Status
            GoTo Error
        End If
    End With
    Set GetXMLDocument = doc
    Exit Function
SendError:
    'by default, access to data sources across the internet is disabled;
    'go to Internet Options and, under Security > Custom level > Misc,
    'enable "Access data sources across domains" - see: http://stackoverflow.com/a/17402920
    MsgBox "Make sure access to data sources across domains is enabled under Internet Options > Security > Custom", vbOKOnly, "Could not send request to server"
    Exit Function
Error:
    MsgBox msg, vbOKOnly, "Unexpected Error"
End Function