Sub DownloadFile()
Dim myURL As String
myURL = "http://data.bls.gov/timeseries/LNS14000000"
Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", myURL, False, "username", "password"
WinHttpReq.send
myURL = WinHttpReq.responseBody
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.responseBody
oStream.SaveToFile "C:\Downloads\abc.xlsx", 2
oStream.Close
End If
End Sub
I am trying to download data using VBA and found this code running pretty well. The webpage URL from which I am trying to download data is the one I have used in the code. Please take a moment and open the webpage as the Excel file I am trying to download is linked in an image and so I am not able to find the URL to download the file from that image. Please advice. Thanks.
You might be able to hit the form target directly with a POST (action="/pdq/SurveyOutputServlet") but it is expecting a post string of the <input> elements together with their values. Most if not all of these input elements have been filled out for you simply by going to that page. All you need to do is collect and concatenate them into a post string to be shoved back at the form.
Option Explicit
'base web page
Public Const csBLSGOVpg = "http://data.bls.gov/timeseries/LNS14000000"
'form's action target
Public Const csXLSDLpg = "http://data.bls.gov/pdq/SurveyOutputServlet"
Sub mcr_Stream_Buyer_Documents()
Dim xmlDL As New MSXML2.ServerXMLHTTP60, xmlBDY As New HTMLDocument, adoFILE As Object
Dim xmlSend As String, strFN As String, f As Long, i As Long
With xmlDL
.SetTimeouts 5000, 5000, 15000, 25000
'start by going to the base web page
.Open "GET", csBLSGOVpg, False
.setRequestHeader "Content-Type", "text/javascript"
.send
If .Status <> "200" Then GoTo bm_Exit
'get the source HTML for examination; zero the post string var
xmlBDY.body.innerHTML = .responseText
xmlSend = vbNullString
'loop through the forms until you find the right one
'then loop through the input elements and construct a post string
For f = 0 To xmlBDY.getElementsByTagName("form").Length - 1
If xmlBDY.getElementsByTagName("form")(f).Name = "excel" Then
With xmlBDY.getElementsByTagName("form")(f)
For i = 0 To .getElementsByTagName("input").Length - 1
xmlSend = xmlSend & Chr(38) & _
.getElementsByTagName("input")(i).Name & Chr(61) & _
.getElementsByTagName("input")(i).Value
Next i
xmlSend = "?.x=5&.y=5" & xmlSend
End With
Exit For
End If
Next f
'Debug.Print xmlSend 'check the POST string
'send the POST string back to the form's action target
.Open "POST", csXLSDLpg, False
xmlDL.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
xmlDL.send xmlSend
If xmlDL.Status <> "200" Then GoTo bm_Exit
'pick up the response as a stream and save it as a .XLSX
strFN = Environ("USERPROFILE") & "\Documents\LNS14000000" & Format(Date, "yyyymmdd") & ".xlsx"
On Error Resume Next
Kill strFN
On Error GoTo 0
Set adoFILE = CreateObject("ADODB.Stream")
adoFILE.Type = 1
adoFILE.Open
adoFILE.Write .responseBody
adoFILE.SaveToFile strFN, 2
Set adoFILE = Nothing
End With
Set xmlBDY = Nothing
Set xmlDL = Nothing
Exit Sub
bm_Exit:
Debug.Print Err.Number & ":" & Err.Description
End Sub
This is pretty minimalist but it is all that you need. There is at least one non-standard input element that does not have a name but I elected to send its value back anyway. I did not sequentially remove things until it broke; I just built the POST string given what I retrieved and sent it back.
LNS1400000020150916.xlsx
You will probably be moving this code to some sort of loop. Adjust the receiving file name accordingly. Each new page should adjust its own form input elements accordingly.
Once response is stored in an HTMLDocument object you can use a CSS selector of
#download_xlsx
The "#" means id.
You can then click on this element
htmlDocument.querySelector("#download_xlsx").Click
VBA:
Option Explicit
Public Sub DownloadFile()
Dim ie As New InternetExplorer
With ie
.Visible = True
.navigate "https://data.bls.gov/timeseries/LNS14000000"
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("#download_xlsx").Click
.Quit
End With
End Sub
Other:
You could even target the form and submit:
.document.forms("excel").submit
This triggers the POST request mentioned in the other answer (which is an awesome answer btw).
Related
I'm trying to optimize a VBA application that generates a report.
This report requires the application to download and embed multiple images.
I've identified this as the biggest bottle neck in the application.
My first attempt was to get VBA to execute a Powershell command that would download the images early on in the generation of the report and then the application would embed them from the HD after it was done crunching data.
Do to obvious security concerns my work environment prevents VBA from executing shell scripts.
After a few unimportant failed attempts (trying to open another/this xlsm workbook with a new excel application independently of my vba thread with an on open execution and variations of this) I've come here asking for suggestions.
How would you use vba to start downloading image (using any native windows 10 application/command/process/...) and not wait for the download to complete before moving onto the next line of code?
Later on in the application I'll have code to scan the destination directory to determine if the files are done being downloaded if not it'll sleep and repeat x times before fail.
Update: Based on the comments I think I'm very close to a solution. I've included the code I'm currently working with at the bottom of this update. The problem now is that it quickly downloads the file as long as I have made a request to the same url at least once before.
On the first request it hangs on 'oXMLHTTP.send' for a period of time a little greater than what I would expect it would take to download the file through a browser and then for some reason resizes itself.
Could anyone help me with this hanging issue and / or explain why this code calls 'Workbook_WindowResize'?
This happens on and off my works VPN. Looking at Fiddler I can tell that only two requests get sent out.
Result 200: http://ipv4.download.thinkbroadband.com/50MB.zip?randomizer=ff%2014
Result 200: http://ipv4.download.thinkbroadband.com/50MB.zip?randomizer=ee%20761
Result and Code
In a brand new workbook I pasted the code found at the end of this update.
This is what it I got in the immediate window.
A took: 33375milliseconds
Pre DoEvents
Workbook_WindowResized
Post DoEvents
B took: 593milliseconds
Pre DoEvents
Post DoEvents
C took: 33797milliseconds
Pre DoEvents
Workbook_WindowResized
Post DoEvents
Do work
Pre DoEvents
Post DoEvents
a done
b done
c done
ThisWorkbook Code
Private mlngStart As Long
Private Declare Function GetTickCount Lib "kernel32" () As Long
Public Sub StartTimer()
mlngStart = GetTickCount
End Sub
Public Function EndTimer() As Long
EndTimer = (GetTickCount - mlngStart)
End Function
Function StartDownload(ByVal vWebFile As String, sPath As String) As Object
Dim oXHTTP As Object
Dim oStream As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP.3.0")
Set oStream = CreateObject("ADODB.Stream")
Application.StatusBar = "Fetching " & vWebFile & " as " & sPath
oXHTTP.Open "GET", vWebFile, False
oXHTTP.send
With oStream
.Type = 1 'adTypeBinary
.Open
.Write oXHTTP.responseBody
.SaveToFile sPath, 2 'adSaveCreateOverWrite
.Close
End With
Set StartDownload = oXHTTP
Set oStream = Nothing
Application.StatusBar = False
End Function
Sub FinishDownload(ByRef oXMLHTTP, ByVal vLocalFile As String)
'Wait for request to finish
Do While oXMLHTTP.readyState <> 4
DoEvents
Loop
End Sub
Function foo()
Dim dest As String
dest = "C:\sandbox\"
Dim a, b, c As Object
DoEvents
Url = "http://ipv4.download.thinkbroadband.com/50MB.zip?randomizer=ff" & Str(Math.Round(Math.Rnd(12) * 1000, 0))
Call StartTimer
Set a = StartDownload(Url, dest & "a.zip")
Debug.Print "A took: " & EndTimer & "milliseconds"
Debug.Print "Pre DoEvents"
DoEvents
Debug.Print "Post DoEvents"
Call StartTimer
Set b = StartDownload(Url, dest & "b.zip")
Debug.Print "B took: " & EndTimer & "milliseconds"
Debug.Print "Pre DoEvents"
DoEvents
Debug.Print "Post DoEvents"
Url = "http://ipv4.download.thinkbroadband.com/50MB.zip?randomizer=ee" & Str(Math.Round(Math.Rnd(12) * 1000, 0))
Call StartTimer
Set c = StartDownload(Url, dest & "c.zip")
Debug.Print "C took: " & EndTimer & "milliseconds"
Debug.Print "Pre DoEvents"
DoEvents
Debug.Print "Post DoEvents"
Debug.Print ("Do work")
Call bar
Debug.Print "Pre DoEvents"
DoEvents
Debug.Print "Post DoEvents"
Call FinishDownload(a, dest & "a.zip")
Debug.Print ("a done")
Call FinishDownload(b, dest & "b.zip")
Debug.Print ("b done")
Call FinishDownload(c, dest & "c.zip")
Debug.Print ("c done")
End Function
Function Download_File(ByVal vWebFile As String, ByVal vLocalFile As String) As Boolean
Dim oXMLHTTP As Object, i As Long, vFF As Long, oResp() As Byte
'You can also set a ref. to Microsoft XML, and Dim oXMLHTTP as MSXML2.XMLHTTP
Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
oXMLHTTP.Open "GET", vWebFile, False 'Open socket to get the website
oXMLHTTP.send 'send request
'Wait for request to finish
Do While oXMLHTTP.readyState <> 4
DoEvents
Loop
oResp = oXMLHTTP.responseBody 'Returns the results as a byte array
'Create local file and save results to it
vFF = FreeFile
If Dir(vLocalFile) <> "" Then Kill vLocalFile
Open vLocalFile For Binary As #vFF
Put #vFF, , oResp
Close #vFF
'Clear memory
Set oXMLHTTP = Nothing
End Function
Sub bar()
Dim F As Integer
F = FreeFile
Open "C:\sandbox\" & "\example.txt" For Output As F
Close #F
End Sub
Private Sub Workbook_WindowResize(ByVal Wn As Window)
Debug.Print "Workbook_WindowResized"
End Sub
From the links provided by #Tim Williams in the comments I created this and it works.
Function StartDownload(ByVal vWebFile As String) As Object
Dim oXMLHTTP As Object, i As Long, vFF As Long, oResp() As Byte
'You can also set a ref. to Microsoft XML, and Dim oXMLHTTP as MSXML2.XMLHTTP
Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
oXMLHTTP.Open "GET", vWebFile, True'Open socket to get the website
oXMLHTTP.Send 'send request
Set StartDownload = oXMLHTTP
End Function
Sub FinishDownload(ByRef oXMLHTTP, ByVal vLocalFile As String)
'Wait for request to finish
Do While oXMLHTTP.readyState <> 4
DoEvents
Loop
oResp = oXMLHTTP.responseBody 'Returns the results as a byte array
'Create local file and save results to it
vFF = FreeFile
If Dir(vLocalFile) <> "" Then Kill vLocalFile
Open vLocalFile For Binary As #vFF
Put #vFF, , oResp
Close #vFF
'Clear memory
Set oXMLHTTP = Nothing
End Sub
Function foo()
Dim dest As String
dest = "C:\sandbox\"
url = "http://ipv4.download.thinkbroadband.com/200MB.zip"
Dim a, b, c As Object
DoEvents
Set a = DownloadManager.StartDownload(url)
DoEvents
Set b = DownloadManager.StartDownload(url)
DoEvents
Set c = DownloadManager.StartDownload(url)
DoEvents
Debug.Print ("Do Something")
Call FinishDownload(a, dest & "a.zip")
Debug.Print ("a done")
Call FinishDownload(b, dest & "b.zip")
Debug.Print ("b done")
Call FinishDownload(c, dest & "c.zip")
Debug.Print ("c done")
End Function
I'm trying to build a VBA code which has as input this calendar:
https://www.fxstreet.com/economic-calendar#
In this link there exists the option to download it in format .csv. For example this was the link of the download. https://calendar.fxstreet.com/eventdate/?f=csv&v=2&timezone=Central+Standard+Time&rows=&view=range&start=20180909&end=20180915&countrycode=US&volatility=0&culture=en&columns=CountryCurrency%2CCountdown
I want to define a code in VBA based on it, changing that start date and end date according to my input in cell "A1" and "A2", but it's impossible due to the structure of the link (it doesn't finish in .csv). If you go to section of downloads in your browser, and press the link, it won't download again, instead a message of error will appear. It just works when opening the first link and selecting the option to download- so, I can´t build a structure in VBA based on it.
Does there exist a way that VBA can open the link and then "select" the option to download, or do you have another idea to download it using VBA?
I don't see any kind of CSV file in the link you posted, but this is one way you could do it with VBA.
Sub Download()
Dim myURL As String
myURL = "http://www.asx.com.au/data/options_code_list.csv"
Dim WinHttpReq As Object
Dim ostream as Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", myURL, False
WinHttpReq.Send
myURL = WinHttpReq.ResponseBody
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.ResponseBody
oStream.SaveToFile ("C:\your_path_here\file.csv")
oStream.Close
End If
End Sub
Not great due to sendkeys but does download the CSV for the current period. Setting dates seems to be a lot harder. Whilst entering custom dates ranges and clicking apply is easy, the values don't appear to be retained (manually or through code!). The only way values seem to be retained is if you actually make selections on the calendar itself. That then becomes a lot more finicky. I could address that in a new question if required.
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, calendar As Object, t As Date
Const WAIT_TIME_SECS As Long = 10
With IE
.Visible = True
.navigate "https://www.fxstreet.com/economic-calendar#"
While .Busy Or .readyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
If Timer - t > WAIT_TIME_SECS Then Exit Do
On Error Resume Next
Set calendar = .document.querySelector(".fa.fa-calendar")
On Error GoTo 0
Loop While calendar Is Nothing
If calendar Is Nothing Then Exit Sub
.document.querySelector("[fxs_csv]").Click
With Application
.Wait Now + TimeSerial(0, 0, 2)
.SendKeys "%{S}"
.Wait Now + TimeSerial(0, 0, 5)
End With
.Quit
End With
End Sub
References:
VBE > Tools > References and add a reference to Microsoft Internet Controls
Adjust the 'iTable' variable to the table number that you want to import (i.e., 1, 2, 3, etc)
Sub HTML_Table_To_Excel()
Dim htm As Object
Dim Tr As Object
Dim Td As Object
Dim Tab1 As Object
'Replace the URL of the webpage that you want to download
'Web_URL = "https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
Web_URL = "https://www.fxstreet.com/economic-calendar"
'Create HTMLFile Object
Set HTML_Content = CreateObject("htmlfile")
'Get the WebPage Content to HTMLFile Object
With CreateObject("msxml2.xmlhttp")
.Open "GET", Web_URL, False
.send
HTML_Content.body.innerHTML = .responseText 'this is the highlighted part for the error
End With
Column_Num_To_Start = 1
iRow = 1
iCol = 1
iTable = 1
'Loop Through Each Table and Download it to Excel in Proper Format
For Each Tab1 In HTML_Content.getElementsByTagName("table")
With HTML_Content.getElementsByTagName("table")(iTable)
For Each Tr In .Rows
For Each Td In Tr.Cells
Worksheets("Sheet1").Cells(iRow, iCol).Select
Worksheets("Sheet1").Cells(iRow, iCol) = Td.innerText
iCol = iCol + 1
Next Td
iCol = Column_Num_To_Start
iRow = iRow + 1
Next Tr
End With
Next Tab1
MsgBox "Process Completed"
End Sub
My goal is to strip all Google search results from page 1 with VBA to Excel. Until so far I managed to strip the first result. The head, link and date are stored in cells 4, 5, 6. I now have to make a loop for the other li's, but I can't get it straight. Also the function that stores the date isn't very optimal coded I think. Anyone who knows the answer?
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object, Objdatum As Object, Ddatum As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 3) & "Skipr" & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
Set Objdatum = objResultDiv.getelementsbytagname("span")(2)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
dat_text = Objdatum.innerHTML
Cells(i, 4) = str_text
Cells(i, 5) = link.href
Cells(i, 6) = dat_text
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
You need to iterate the collections returned to you by your getelementsbytagname call instead of returning only the first element with the array index (0)
I had a similar project, below are some tips & my approach for your reference, it might assist you in working & maintaining the code in the future:
First, Instead of using CreateObject I prefer to reference the Object Libraries that expose the COM objects, this gives me the ability to browse the functions and properties of each object F2 and gives me code completion (speed & less bugs) within VBA editor (F7 takes you back to code view).
Giving me documentation and code completion:
Also, use these const for clarity
'see ready state : https://msdn.microsoft.com/en-us/library/ie/ms534361(v=vs.85).aspx
Const READYSTATE_UNINITIALIZED = 0
Const READYSTATE_LOADING = 1
Const READYSTATE_LOADED = 2
Const READYSTATE_INTERACTIVE = 3
Const READYSTATE_COMPLETE = 4
Finally, using DOMDocument60 to parse the XML into a Document Object Model in memory.
and MSHTML.HTMLDocument to parse the HTML document and iterate the table rows.
Below is code where I iterate all returned rows from a table within a html document embedded in the initial XML document returned by the webserver:
Dim xmlDoc As DOMDocument60
Set xmlDoc = GetXMLDocument("http://www.nbg.ge/rss.php")
'extract publication date
Debug.Print xmlDoc.getElementsByTagName("pubDate")(0).Text
'unwrap html document from CDATA in "//item/description" element
Dim htmlDoc As New MSHTML.HTMLDocument
htmlDoc.body.innerHTML = xmlDoc.SelectNodes("//item/description")(0).Text
'extract table data from html document
Dim tr As IHTMLElement, td As IHTMLElement
For Each tr In htmlDoc.getElementsByTagName("tr")
For Each td In tr.Children
'each cell in current row
Debug.Print " " & td.innerHTML
Next td
'next row
Debug.Print "-----"
Next tr
Sample Data returned by webservice I was calling:
<rss version="2.0">
<channel>
<title>RSS NBG Currency Rates</title>
<link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
<description>Currency Rates</description>
<language>geo</language>
<copyright>Copyright 2015, NBG</copyright>
<pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
<lastBuildDate>Wed, 29 Apr 2015 12:39:50 +0400</lastBuildDate>
<managingEditor>alex#proservice.ge</managingEditor>
<webMaster>alex#proservice.ge</webMaster>
<item>
<title>Currency Rates 2015-04-29</title>
<link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link>
<description>
<![CDATA[
<table border="0">
<tr>
<td>AED</td>
<td>10 არაბეთის გაერთიანებული საამიროების დირჰამი</td>
<td>6.2858</td>
<td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
<td>0.0640</td> </tr><tr> <td>AMD</td> <td>1000 სომხური დრამი</td>
<td>4.8676</td>
<td><img src="https://www.nbg.gov.ge/images/green.gif"></td>
<td>0.0414</td>
</tr>
</table>
]]>
</description>
<pubDate>Wed, 29 Apr 2015 12:39:50 +0400</pubDate>
<guid>
https://www.nbg.gov.ge/index.php?m=236&lang=geo&date=2015-04-29
</guid>
</item>
</channel>
</rss>
and the function that actually gets the document from the webserver (only works if you added the references as shown in above pictures)
Function GetXMLDocument(url As String) As MSXML2.DOMDocument60
Dim xhr As New XMLHTTP60
Dim doc As New DOMDocument60
Dim msg As String
With xhr
.Open bstrMethod:="GET", bstrUrl:=url, varAsync:=False
On Error GoTo SendError
.send
On Error GoTo 0
'http status codes - http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
'200 = SUCCESS - OK
If .readyState = READYSTATE_COMPLETE And .Status = 200 Then
'Debug.Print .responseText
doc.LoadXML (.responseText)
Else
msg = "Error" & vbNewLine & "Ready state: " & .readyState & _
vbNewLine & "HTTP request status: " & .Status
GoTo Error
End If
End With
Set GetXMLDocument = doc
Exit Function
SendError:
'by default access to data source accross internet dissabled
'go to internet options & under security>custom level>Misc>access data sources accross domains> enable
'see: http://stackoverflow.com/a/17402920
MsgBox "Make sure access data sources accross domains is enabled under internet options>security>custom", vbOKOnly, "Could not send request to server"
Error:
MsgBox msg, vbOKOnly, "Unexpected Error"
End Function
Thanks in advance for the help. I'm running Windows 8.1, I have the latest IE / Chrome browsers, and the latest Excel. I'm trying to write an Excel Macro that pulls data from StackOverflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date (that the macro is run), the tag names, the # of tags, and the brief description of what the tag is. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now, it overwrites the data everytime I run the macro, and I'm not sure how to make it look for the next empty cell before running.. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. Problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub
There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The permalink to that query is here and the "Download CSV" option which appears after the query runs is probably the easiest way to get the data into Excel. If you wanted to automate that part of things, the direct link to the CSV download of results is here
You can improve this to parse out exact elements but it loops all the pages and grabs all the tag info (everything next to a tag)
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
DoEvents
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
html.querySelector(".page-numbers.next").Click
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub
I'm not making use of the DOM, but I find it very easy to get around just searching between known tags. If ever the expressions you are looking for are too common just tweak the code a bit so that it looks for a string after a string).
An example:
Public Sub ZipLookUp()
Dim URL As String, xmlHTTP As Object, html As Object, htmlResponse As String
Dim SStr As String, EStr As String, EndS As Integer, StartS As Integer
Dim Zip4Digit As String
URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
xmlHTTP.Open "GET", URL, False
On Error GoTo NoConnect
xmlHTTP.send
On Error GoTo 0
Set html = CreateObject("htmlfile")
htmlResponse = xmlHTTP.ResponseText
If htmlResponse = Null Then
MsgBox ("Aborted Run - HTML response was null")
Application.ScreenUpdating = True
GoTo End_Prog
End If
'Searching for a string within 2 strings
SStr = "<span class=""address1 range"">" ' first string
EStr = "</span><br />" ' second string
StartS = InStr(1, htmlResponse, SStr, vbTextCompare) + Len(SStr)
EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
MsgBox Zip4Digit
GoTo End_Prog
NoConnect:
If Err = -2147467259 Or Err = -2146697211 Then MsgBox "Error - No Connection": GoTo End_Prog 'MsgBox Err & ": " & Error(Err)
End_Prog:
End Sub
I have the following VBA code that it is intended to download a file from the web, give me a message "Downloading Data from ..." and as soon as downloaded give me a message "Downloaded to ...". Here is my code:
Sub DownloadFileFromWeb()
Dim IE As Object
Dim links As Variant, lnk As Variant
Dim download_path As String
download_path = "\\xxxxx\Save Raw File here.xls"
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate "http://www.hkma.gov.hk/eng/market-data-and-statistics/monthly-statistical-bulletin/table.shtml#section2" 'load web page
While IE.Busy
DoEvents 'wait until IE is done loading page.
Wend
Set links = IE.document.getElementsByTagName("a")
For Each lnk In links
If Len(lnk.href) > 4 And Right(lnk.href, 4) = ".xls" And InStr(1, lnk.href, "http://www.hkma.gov.hk/media/eng/doc/market-data-and-statistics/monthly-statistical-bulletin/T080102.xls") <> 0 Then
MsgBox "Downloading Data from " & lnk.href
Download_File lnk.href, download_path
MsgBox "Downloaded to - " & download_path
Exit For
End If
Next
End Sub
Function Download_File(ByVal vWebFile As String, ByVal vLocalFile As String) As Boolean
Dim oXMLHTTP As Object, i As Long, vFF As Long, oResp() As Byte
Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
oXMLHTTP.Open "GET", vWebFile, False 'Open socket to get the website
oXMLHTTP.Send 'send request
'Wait for request to finish
Do While oXMLHTTP.readyState <> 4
DoEvents
Loop
oResp = oXMLHTTP.responseBody 'Returns the results as a byte array
'Create local file and save results to it
vFF = FreeFile
If Dir(vLocalFile) <> "" Then Kill vLocalFile
Open vLocalFile For Binary As #vFF
Put #vFF, , oResp
Close #vFF
'Clear memory
Set oXMLHTTP = Nothing
End Function
The problem i have with this one is that most of the times i will not get any message box appearing and nothing gets downloaded in the meantime. Can you please help me in order to get the message box all of the time?
Thank you very much!
Tested your code on my end and I can see no errors. I've downloaded it like a hundred times already and it doesn't break. However, I made some minor modifications.
Change your main subroutine to the following:
Sub DownloadFileFromWeb()
Dim IE As Object
Dim links As Variant, lnk As Variant
Dim download_path As String
download_path = "C:\...\SavedFile.xls" 'Modify.
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate "http://www.hkma.gov.hk/eng/market-data-and-statistics/monthly-statistical-bulletin/table.shtml#section8" 'load web page
While IE.Busy
DoEvents 'wait until IE is done loading page.
Wend
Set links = IE.document.getElementsByTagName("a")
For Each lnk In links
If Len(lnk.href) > 4 And Right(lnk.href, 4) = ".xls" And InStr(1, lnk.href, "T080102.xls") <> 0 Then
If MsgBox("Downloading Data from " & lnk.href, vbOKOnly) = vbOK Then
Download_File lnk.href, download_path
MsgBox "Downloaded to - " & download_path
Exit For
End If
End If
Next
End Sub
Basically, I just changed one thing: the message box will wait for your input before it downloads the file. Notice how I did If MsgBox(...) = vbOKOnly. This way, it will wait for your input and not break.
Minor change as well to URL. Changed section2 to section8, since that's the table you want (not going to affect anything, IMHO).
Let us know if this helps.