Type Mismatch Error in Excel VBA Code - vba

I am working on a website data extractor. I have two worksheets, one for input and one for output.
In the first sheet, a cell contains the URL to extract data from. I am trying this URL:
https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun
I have written this macro:
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim str, e As String
Dim pgf, pgt, pg As Integer
Dim ele, Results As Object
Dim add, size, cno, price, inurl, sp, sp1 As String
Dim isheet, rts As Worksheet
Dim LastRow As Long
Dim pgno As Variant
Set IE = CreateObject("InternetExplorer.Application")
Set isheet = Worksheets("InputSheet")
Set rts = Worksheets("Results")
URL = isheet.Cells(3, 2)
RowCount = 1
rts.Range("A" & RowCount) = "Address"
rts.Range("B" & RowCount) = "Size"
rts.Range("C" & RowCount) = "Contact Number"
rts.Range("D" & RowCount) = "Price"
rts.Range("E" & RowCount) = "Url"
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
'RowCount = LastRow
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:05 AM#)
For Each Results In .document.all
Select Case Results.className
Case "title search-title"
str = Results.innerText
str1 = Split(str, " ")
str = CInt(str1(0))
End Select
If Results.className = "btn btn-main-inverted dropdown-toggle" And InStr(1, Results.Title, " page") > 2 Then
str2 = Results.Title
str1 = Split(str2, " ")
str2 = CInt(str1(0))
End If
Next
If str2 = 0 Then
pgno = CVErr(xlErrDiv0)
Else
pgno = WorksheetFunction.RoundUp(str / str2, 0)
End If
End With
IE.Quit
Set IE = Nothing
UrlS = Split(URL, "?")
Url1 = UrlS(0)
Url2 = "?" & UrlS(1)
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application")
URL = Url1 & "/" & i & Url2
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:08 AM#)
For Each ele In .document.all
Select Case ele.className
Case "listing-img-a"
inurl = ele.href
rts.Cells(LastRow + 1, 5) = inurl
Case "listing-location"
LastRow = LastRow + 1
add = ele.innerText
rts.Cells(LastRow, 1) = add
Case "lst-sizes"
sp = Split(ele.innerText, " ·")
size = sp(0)
rts.Cells(LastRow, 2) = size
Case "pgicon pgicon-phone js-agent-phone-number" ' btn-tools" 'pgicon pgicon-phone js-agent-phone-number" 'agent-phone-number"
rts.Cells(LastRow, 3) = ele.innerText
Case "listing-price"
price = ele.innerText
rts.Cells(LastRow, 4) = price
End Select
Next
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
rts.Activate
rts.Range("A" & LastRow).Select
End With
IE.Quit
Set IE = Nothing
Application.Wait (Now + #12:00:04 AM#)
Next i
MsgBox "Success"
End Sub
When I run this macro, I get the error
Type mismatch
When I debug, it highlights this code:
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application")
URL = Url1 & "/" & i & Url2
With IE
.Visible = True
.navigate (URL)
I have tried my best to figure it out but could not work out where the problem is. Please help me correct it.
It also does not retrieve all the records from the link. The link contains more than 200 records, at 30 records per page.

You can rely on implicit conversion and use the following, assuming all pages have numbering. You might want to improve the error handling. I default to a page count of 1 if the penultimate li CSS selector fails; otherwise it reads the last page number, which sits just before the ">" link.
Refer to my prior answer to your related question, which shows how to scrape the info off the page more efficiently.
Sample code to show the function being used:
Option Explicit
Public Sub GetListings()
Dim IE As New InternetExplorer, pgno As Long
With IE
.Visible = True
.navigate "https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun", False
While .Busy Or .readyState < 4: DoEvents: Wend
pgno = GetNumberOfPages(.document)
End With
End Sub
Public Function GetNumberOfPages(ByVal doc As HTMLDocument) As Long
On Error GoTo errhand:
GetNumberOfPages = doc.querySelector(".listing-pagination li:nth-last-child(2)").innerText
Exit Function
errhand:
If Err.Number <> 0 Then GetNumberOfPages = 1
End Function
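For context on the error itself: in the question's macro, pgno is a Variant that is set to CVErr(xlErrDiv0) when no page size is found (str2 = 0). The line For i = 1 To pgno then fails with run-time error 13 (Type mismatch), because an Error value cannot serve as a loop bound. A minimal sketch of a guard that always returns a usable number is below; SafePageCount is a hypothetical helper name, not part of the answer above.

```vba
Option Explicit

' Hypothetical helper: returns a usable page count and never an Error value.
Public Function SafePageCount(ByVal totalItems As Long, ByVal perPage As Long) As Long
    If totalItems <= 0 Or perPage <= 0 Then
        ' Fall back to a single page when either figure is missing
        SafePageCount = 1
    Else
        ' Ceiling division: \ is integer division in VBA
        SafePageCount = (totalItems + perPage - 1) \ perPage
    End If
End Function
```

For example, 215 listings at 30 per page give 8 pages, and a page size of 0 gives 1 instead of an error.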

Related

Extract data from yahoo finance using Excel VBA

Hi, I need help with this code. I'm trying to extract data from this page: https://finance.yahoo.com/quote/ADM.L/balance-sheet?p=ADM.L
The problem is that the page defaults to annual figures, but I need the quarterly values of total assets and total liabilities.
The code runs, but most of the time it picks up the annual values. Please suggest what I can do.
Private Sub CommandButton3_Click()
'
Dim ie As Object
Set Rng = Range("A2:A50")
Set Row = Range(Rng.Offset(1, 0), Rng.Offset(1, 0).End(xlDown))
Set ie = CreateObject("InternetExplorer.Application")
With ie
'.Visible = False
For Each Row In Rng
.navigate "https://finance.yahoo.com/quote/" & Range("A" & Row.Row).Value & "/balance-sheet?p=" & Range("A" & Row.Row).Value
'Application.Wait (Now + TimeValue("0:00:02"))
While ie.readyState <> 4
Wend
Do While ie.Busy: DoEvents: Loop
Dim doc As HTMLDocument
Set doc = ie.document
doc.getElementsByClassName("P(0px) M(0px) C($actionBlue) Bd(0px) O(n)")(2).Click
Do While ie.Busy: DoEvents: Loop
Application.Wait (Now + TimeValue("0:00:05"))
Range("D" & Row.Row).Value = doc.getElementsByClassName("Fw(b) Fz(s) Ta(end)")(4).innerText
Range("E" & Row.Row).Value = doc.getElementsByClassName("Fw(b) Fz(s) Ta(end)")(12).innerText
Range("F" & Row.Row).Value = doc.getElementsByClassName("C($gray) Ta(end)")(0).innerText
Next Row
End With
ie.Quit
'
End Sub
This should be a good start for you to get going.
Sub DownloadData()
Set ie = CreateObject("InternetExplorer.application")
With ie
.Visible = True
.navigate "https://finance.yahoo.com/quote/ADM.L/balance-sheet?p=ADM.L"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Set e = ie.Document.GetElementsByClassName("Fz(s) Fw(500) D(ib) Pend(15px) H(18px) C($finDarkLink):h Mend(15px)")(1)
e.Click
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
End With
End Sub
Basically, 'Annual' is the default, and you have to click the 'Quarterly' link to get the quarterly data displayed. I believe Yahoo used to have two different URLs for Annual and Quarterly. Now, apparently, they give you two links to click to toggle between the two frequencies of financial statements.
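Because class names on the page change over time, one option is to locate the toggle by its visible caption instead. The sketch below assumes the toggle's text is exactly "Quarterly" and that you know which tag it is rendered as; neither is confirmed here. FindByCaption is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: first element with the given tag whose trimmed
' visible text equals caption (case-insensitive), or Nothing if none match.
Public Function FindByCaption(ByVal doc As Object, ByVal tagName As String, _
                              ByVal caption As String) As Object
    Dim el As Object
    For Each el In doc.getElementsByTagName(tagName)
        ' Appending an empty string guards against a missing innerText
        If StrComp(Trim$(el.innerText & vbNullString), caption, vbTextCompare) = 0 Then
            Set FindByCaption = el
            Exit Function
        End If
    Next el
End Function
```

Called as Set e = FindByCaption(ie.document, "button", "Quarterly"), it returns Nothing when there is no match, so test the result with Is Nothing before calling Click.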
The code fix is below. Note that Yahoo has updated the class name from "P(0px) M(0px) C($actionBlue) Bd(0px) O(n)" to "P(0px) M(0px) C($c-fuji-blue-1-b) Bd(0px) O(n)".
Private Sub CommandButton3_Click()
Dim ie As Object
Set Rng = Range("A2:A50")
Set row = Range(Rng.Offset(1, 0), Rng.Offset(1, 0).End(xlDown))
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
For Each row In Rng
.navigate "https://finance.yahoo.com/quote/" & Range("A" & row.row).Value & "/balance-sheet?p=" & Range("A" & row.row).Value
While ie.readyState <> 4
Wend
Do While ie.Busy: DoEvents: Loop
Dim doc As HTMLDocument
Set doc = ie.document
Set element = doc.getElementsByClassName("P(0px) M(0px) C($c-fuji-blue-1-b) Bd(0px) O(n)")(2)
element.Click
Do While ie.Busy: DoEvents: Loop
Range("D" & row.row).Value = doc.getElementsByClassName("Fw(b) Fz(s) Ta(end)")(4).innerText
Range("E" & row.row).Value = doc.getElementsByClassName("Fw(b) Fz(s) Ta(end)")(12).innerText
Range("F" & row.row).Value = doc.getElementsByClassName("C($gray) Ta(end)")(0).innerText
Next row
End With
ie.Quit
End Sub
It is not possible to extract Quarterly data from Yahoo using VBA.
Annual data can be extracted, but not quarterly.

Creating a facebook scraper

I tried to make a Facebook bot to parse the profile links. However, it signs in and parses the content of the left sidebar inconsistently, and I can't get any further. Could anyone point me in the right direction so that I can fix the mistakes in my code and parse the profile links? Here is the code:
strdata = "email=sth.com&pass=xxx"
http.Open "POST", "https://www.facebook.com/login.php?login_attempt=1&lwv=110", False
http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
http.send strdata
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("_li")(0).getElementsByTagName("a")
For Each topic In topics
Cells(x, 1) = topic.innerText
x = x + 1
Next topic
Does this help?
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "https://www.facebook.com/login.php?login_attempt=1&lwv=110"
IE.Navigate2 URL
'Wait for the site to fully load
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub
To get content from Facebook, you either have to use the API or use Selenium, which is definitely the worst way. However, my script below can log in to the page and parse some titles:
Sub Grabbing_item()
Dim driver As New ChromeDriver, html As New HTMLDocument
Dim post As Object
With driver
.get "https://www.facebook.com/"
.FindElementById("email").SendKeys ("email_id")
.FindElementById("pass").SendKeys ("Pass_word")
.FindElementById("u_0_2").Click
.Wait 5
html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
.Quit
End With
For Each post In html.getElementsByClassName("profileLink")
x = x + 1: Cells(x, 1) = post.innerText
Next post
End Sub

What is intermittently breaking my short block of VBA?

As per this post I have fixed the object checker. Sometimes the code will run fine for 10 entries, getting them all correct, sometimes it'll run for five. Sometimes it'll get the entries wrong.
It always tends to fail on the getting innertext of the element. When it gets the Y/N result wrong, I don't know at all what's causing that.
Please help! It's driving me mad. I've error checked at every stage over and again.
Sub LetsAutomateIE()
Dim barcode As String
Dim rowe As Integer
Dim document As HTMLDocument
Dim Element As HTMLDivElement
Dim text As String
Dim pos As Integer
Set ie = CreateObject("InternetExplorer.Application")
rowe = 2
While Not IsEmpty(Cells(rowe, 2))
barcode = Cells(rowe, "B").Value
pos = 0
text = ""
Set document = Nothing
With ie
.Visible = False
.navigate2 "https://www.amazon.co.uk/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=" & barcode
Do Until ie.readyState = 4
Loop
End With
Set document = ie.document
If IsObject(document.getElementById("result_0")) = False Then GoTo Here
text = document.getElementById("result_0").innerText
If InStr(text, "STEELBOOK") Or InStr(text, "Steelbook") Or InStr(text, "Steel book") <> 0 Then pos = 1
If pos <> 0 Then Cells(rowe, 4) = "Y" Else Cells(rowe, 4) = "N"
Here:
rowe = rowe + 1
Wend
Set ie = Nothing
End Sub
Here is a selection of sample barcodes I was working with. I've never managed to get through these successfully.
5030305517076
5030305517816
5060223767925
5060223767949
5060223767956
5060223767970
5060223767994
8717418358563
8717418365851
Thank you so much,
Sam
One problem is the fact that for some barcodes no results are found.
If you would test your code with IE.Visible = true then you will see text like this:
Your search "5060223767949" did not match any products.
Another problem is the condition IsObject(document.getElementById("result_0")) = False. This doesn't work well, because IsObject(Nothing) returns true. Better would be to use If <variable-name> Is Nothing Then ....
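The difference can be seen in a small sketch. IsObject reports whether a variable is of an object type, not whether it currently refers to an object, so it returns True even when the variable holds Nothing. ElementExists is a hypothetical helper name used only for illustration.

```vba
Option Explicit

' Hypothetical helper: True only when the lookup actually returned an element.
Public Function ElementExists(ByVal el As Object) As Boolean
    ElementExists = Not el Is Nothing
End Function

Public Sub DemoIsObject()
    Dim el As Object               ' declared but never Set, so it is Nothing
    Debug.Print IsObject(el)       ' True: el is an object variable
    Debug.Print ElementExists(el)  ' False: it refers to no object
End Sub
```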
The complete code. HTH
' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library
Sub LetsAutomateIE()
Dim IE As SHDocVw.InternetExplorer
Dim barcode As String
Dim rowe As Integer
Dim document As HTMLDocument
Dim Element As HTMLDivElement
Dim result01 As HTMLListElement
Dim noResults As HTMLHeaderElement
Dim text As String
Dim pos As Integer
Dim url As String
rowe = 2
url = "https://www.amazon.co.uk/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords="
Set IE = New SHDocVw.InternetExplorer
While Not IsEmpty(Cells(rowe, 2))
barcode = Cells(rowe, "B").Value
pos = 0
text = ""
IE.Navigate url & barcode
While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set document = IE.document
Set result01 = document.getElementById("result_0")
If result01 Is Nothing Then
Set noResults = document.getElementById("noResultsTitle")
If Not noResults Is Nothing Then MsgBox noResults.outerText
GoTo Here
End If
text = document.getElementById("result_0").innerText
If InStr(text, "STEELBOOK") Or InStr(text, "Steelbook") Or InStr(text, "Steel book") <> 0 Then pos = 1
If pos <> 0 Then Cells(rowe, 4) = "Y" Else Cells(rowe, 4) = "N"
Here:
rowe = rowe + 1
Wend
IE.Quit
Set IE = Nothing
End Sub
I'm actually looking just to check the title of the first returned
product on the page...
The title is displayed with h2 element within li with id result_0. So it is possible to limit the search just to this li element and search for first h2 element.
' text = document.getElementById("result_0").innerText
Dim h2Elements As IHTMLElementCollection
Dim h2 As IHTMLElement
Set h2Elements = result01.getElementsByTagName("h2")
If h2Elements.Length > 0 Then
Set h2 = h2Elements.Item(0)
text = h2.innerText
Debug.Print text
Else
MsgBox "Text not found"
End If
Output:
RED 2 Blu-ray Steelbook UK Exclusive
The Hunger Games With Mockingjay Pendant
The Hunger Games
The Hunger Games
Avengers Assemble BD Steelbook
Avengers Assemble Bonus Disc BD Retail
I had an issue with document.getElementById("result_0") throwing an error. My workaround was to test if the element was in the Document.Body.InnerHTML.
If you set DebugMode to True then the webpage with bad results is left open for further inspection.
The barcode will be marked NA if not found.
Option Explicit
Sub LetsAutomateIE()
Const DebugMode As Boolean = True
Dim barcode As String, text As String
Dim rowe As Integer
Dim doc As HTMLDocument, liResults As HTMLLIElement
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
rowe = 2
While Not IsEmpty(Cells(rowe, 2))
barcode = Cells(rowe, "B").Value
With ie
.Visible = False
.navigate2 "https://www.amazon.co.uk/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=" & barcode
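' Also check .Busy: right after navigate2, readyState can still report 4 from the previous page
Do While .Busy Or .readyState <> 4
DoEvents
Loop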
End With
Set doc = ie.document
If InStr(doc.body.innerHTML, "li id=""result_0""") Then
Set liResults = doc.getElementById("result_0")
text = liResults.innerText
Cells(rowe, 4) = IIf(InStr(text, "STEELBOOK") Or InStr(text, "Steelbook") Or InStr(text, "Steel book"), "Y", "N")
Else
Cells(rowe, 4) = "NA"
If DebugMode Then
ie.Visible = True
Set ie = CreateObject("InternetExplorer.Application")
End If
End If
rowe = rowe + 1
Wend
ie.Quit
Set ie = Nothing
End Sub
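The chained InStr calls in the code above match only three specific capitalisations. A minimal alternative sketch, assuming any capitalisation of "steelbook" or "steel book" should count as a match, uses a case-insensitive comparison. IsSteelbook is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: True when title mentions a steelbook in any letter case.
Public Function IsSteelbook(ByVal title As String) As Boolean
    ' vbTextCompare makes InStr ignore case; > 0 turns each position into a Boolean
    IsSteelbook = InStr(1, title, "steelbook", vbTextCompare) > 0 _
               Or InStr(1, title, "steel book", vbTextCompare) > 0
End Function
```

It could then be used as Cells(rowe, 4) = IIf(IsSteelbook(text), "Y", "N").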

Excel Amazon Seller Web Scraper Issue

I have been trying to get this code to work for workflow efficiency purposes, but I cannot seem to make it function correctly.
Steps:
1. Log in to Amazon Seller
2. Use order numbers in column A and place them in the search box to search
3. Search for the element with innerText "Estimated Delivery:" and scrape the information into column B, adjacent to the order number
4. Move on to the next order number and repeat the process until the order number column is empty
The webpage code (what I'm trying to obtain is highlighted):
Option Explicit
Dim HTMLDoc As HTMLDocument
Dim MyBrowser As InternetExplorer
Sub MyAmazonSellereEDD()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSignInButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
Dim AAws2 As Worksheet
Dim R As Range
Dim x As Integer
Dim i As Long
Dim ar As Variant
Dim elems As IHTMLElementCollection
Dim TDelement As HTMLTableCell
Dim ExcludWords() As Variant, a As Range, b As Long, LR As Long
ExcludWords = Array("Estimated Delivery:")
MyURL = "https://sellercentral.amazon.com/gp/homepage.html"
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .readyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.document
With HTMLDoc
.all.Item("username").Value = "blankityblank#blank.net"
.all.Item("password").Value = "*********"
.all.Item("sign-in-button").Click
End With
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:08"))
'Set AAOrder = Application.Workbooks.Open("Z:\Automation Anywhere\5 Automated Tracking Imports\Amazon Prime\PrimeOrdersWithNoFulfillment.csv")
'Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillment")
x = 2
'Do Until Range("A" & x) = ""
If Range("B" & x).Value = "" Then
'If AAws.Range("B" & x).Value = "" Then
'x = x + 1
Do Until Range("A" & x) = ""
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
InputSearchOrder.Value = Range("A" & x)
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
InputSearchButton.Click
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:05"))
Set elems = HTMLDoc.getElementsByTagName("td")
'ExcludWords = Array("Package Weight:", "Tracking ID:", "Ship Date:", "Carrier:", "Shipping Service:")
i = 2
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
LR = Range("B" & Rows.Count).End(xlUp).Row
For i = 1 To LR
Set a = Cells(i, "B")
For b = 0 To UBound(ExcludWords)
a.Formula = Replace((a.Formula), ExcludWords(b), "")
Next b
Next i
'End If
x = x + 1
Loop
'Loop
End If
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
MsgBox ("Process is done! :)")
End Sub
My issue is that when it scrapes the data, the words "Estimated Delivery:" and the actual estimated delivery date it is supposed to scrape are separate, but still should be included in the output data in column B. What it's doing is finding and inserting only "Estimated Delivery:" and then using the code, it trims those characters as instructed. The space is still blank after that. I'm not sure what the issue is.
The TDelement you pick up in the following part of code only includes "Estimated Delivery:" in its innerText, the part with the date is actually a separate TDelement:
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
As there is no unique information in the HTML (e.g. id, name) with which to reference the TDelement that contains the date, you could use the reference you already have together with NextSibling to get the element after the one containing the text "Estimated Delivery:". Perhaps try this (I am unable to test anything at the moment, but it should work):
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).value = TDelement.NextSibling.innerText
i = i + 1
End If
Next
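One caveat, stated as general DOM behaviour rather than something verified against this page: depending on the document mode, nextSibling can return a whitespace text node instead of the next cell, and a text node has no innerText. A minimal sketch that skips forward to the next element node (nodeType 1) follows. NextElementNode is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: walks forward from node until it reaches an element
' node (nodeType = 1); returns Nothing if there is none.
Public Function NextElementNode(ByVal node As Object) As Object
    Dim candidate As Object
    Set candidate = node.nextSibling
    Do While Not candidate Is Nothing
        If candidate.nodeType = 1 Then Exit Do   ' found an element node
        Set candidate = candidate.nextSibling    ' skip text and comment nodes
    Loop
    Set NextElementNode = candidate
End Function
```

In the loop above this would read Range("B" & x).Value = NextElementNode(TDelement).innerText, ideally after checking that the result is not Nothing.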

Excel VBA Macro: Scraping data from site table that spans multiple pages

Thanks in advance for the help. I'm running Windows 8.1, and I have the latest IE and Chrome browsers and the latest Excel. I'm trying to write an Excel macro that pulls data from Stack Overflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date on which the macro is run, the tag names, the number of tags, and the brief description of each tag. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now, it overwrites the data every time I run the macro, and I'm not sure how to make it look for the next empty cell before running. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. Problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub
There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The permalink to that query is here and the "Download CSV" option which appears after the query runs is probably the easiest way to get the data into Excel. If you wanted to automate that part of things, the direct link to the CSV download of results is here
You can improve this to parse out exact elements, but as written it loops over all the pages and grabs all the tag info (everything next to a tag).
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
DoEvents
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
html.querySelector(".page-numbers.next").Click
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub
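The code above does not cover the question's second point, writing below existing data instead of overwriting it. A minimal sketch, assuming the chosen column has a value in every stored row, is below. NextEmptyRow is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: first unused row in the given column of ws.
Public Function NextEmptyRow(ByVal ws As Worksheet, ByVal col As Long) As Long
    Dim lastUsed As Long
    ' Jump up from the bottom of the sheet to the last filled cell
    lastUsed = ws.Cells(ws.Rows.Count, col).End(xlUp).Row
    If lastUsed = 1 And IsEmpty(ws.Cells(1, col).Value) Then
        NextEmptyRow = 1            ' the column is completely empty
    Else
        NextEmptyRow = lastUsed + 1
    End If
End Function
```

In the question's macro, RowNumber = NextEmptyRow(ActiveSheet, 2) in place of RowNumber = 4 would start writing below the last stored keyword; on a sheet that holds only the row 3 headings it still returns 4.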
I'm not making use of the DOM, but I find it very easy to get around by just searching between known tags. If the expressions you are looking for are too common, tweak the code a bit so that it looks for a string after another string.
An example:
Public Sub ZipLookUp()
Dim URL As String, xmlHTTP As Object, html As Object, htmlResponse As String
' Long rather than Integer: positions in a large response can exceed 32,767
Dim SStr As String, EStr As String, EndS As Long, StartS As Long
Dim Zip4Digit As String
URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
xmlHTTP.Open "GET", URL, False
On Error GoTo NoConnect
xmlHTTP.send
On Error GoTo 0
Set html = CreateObject("htmlfile")
htmlResponse = xmlHTTP.ResponseText
' A String is never Null, and "= Null" never evaluates to True, so test for an empty response
If Len(htmlResponse) = 0 Then
MsgBox ("Aborted Run - HTML response was empty")
Application.ScreenUpdating = True
GoTo End_Prog
End If
'Searching for a string within 2 strings
SStr = "<span class=""address1 range"">" ' first string
EStr = "</span><br />" ' second string
StartS = InStr(1, htmlResponse, SStr, vbTextCompare) + Len(SStr)
EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
MsgBox Zip4Digit
GoTo End_Prog
NoConnect:
If Err = -2147467259 Or Err = -2146697211 Then MsgBox "Error - No Connection": GoTo End_Prog 'MsgBox Err & ": " & Error(Err)
End_Prog:
End Sub
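The same search can be wrapped in a reusable function that also handles a missing marker. In the example above, if SStr is not found, InStr returns 0 and StartS ends up equal to Len(SStr), so unrelated text is extracted without any warning. A minimal sketch follows; TextBetween is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: text between the first startTag and the next endTag,
' or an empty string when either marker is missing.
Public Function TextBetween(ByVal source As String, ByVal startTag As String, _
                            ByVal endTag As String) As String
    Dim startPos As Long, endPos As Long
    startPos = InStr(1, source, startTag, vbTextCompare)
    If startPos = 0 Then Exit Function          ' start marker not found
    startPos = startPos + Len(startTag)         ' move past the start marker
    endPos = InStr(startPos, source, endTag, vbTextCompare)
    If endPos = 0 Then Exit Function            ' end marker not found
    TextBetween = Mid$(source, startPos, endPos - startPos)
End Function
```

With the variables from the example, Zip4Digit = Left$(TextBetween(htmlResponse, SStr, EStr), 4) gives the same result when both markers are present and an empty string otherwise.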