Creating a facebook scraper

Creating a facebook scraper - vba

I tried to make a facebook bot to parse the profile links. However, it signs in and parses the content of left-sided bar inconsistently. I can't go further. Could anyone point me in the right direction so that i can rectify my mistakes I've made already in my code and parse the profile links. Here is the code:
strdata = "email=sth.com&pass=xxx"
http.Open "POST", "https://www.facebook.com/login.php?login_attempt=1&lwv=110", False
http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
http.send strdata
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("_li")(0).getElementsByTagName("a")
For Each topic In topics
Cells(x, 1) = topic.innerText
x = x + 1
Next topic

Does this help?
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "https://www.facebook.com/login.php?login_attempt=1&lwv=110"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub

To get the content of facebook, either one has to deal with api or to use selenium which is definitely the worst way. However, my below script can log in to the page and parse some titles:
Sub Grabbing_item()
Dim driver As New ChromeDriver, html As New HTMLDocument
Dim post As Object
With driver
.get "https://www.facebook.com/"
.FindElementById("email").SendKeys ("email_id")
.FindElementById("pass").SendKeys ("Pass_word")
.FindElementById("u_0_2").Click
.Wait 5
html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
.Quit
End With
For Each post In html.getElementsByClassName("profileLink")
x = x + 1: Cells(x, 1) = post.innerText
Next post
End Sub

Related

How can i web scrape data outside of div tags

Im having trouble webscraping a value from https://next-episode.net/star-trek-picard. Im looking to pull the season number from the "Previous__episode" tag ID. The number seems to be in between 2 tags and I cant seem to find a way to reference it. Ive previously been pulling all of the inner text of the previous_episode tag and using regex functions to isolate the values that I want giving me the following
Previous Episode
Name:Broken Pieces
Date:
Thu Mar 12, 2020
Season:
1
Episode:8
Summary:Episode Summary
(the above has no lines in between them in excel. The formatting in the question box is placing them all on one line for some reason)
Before tonight the number 1 was in the same line as "Season:" allowing my regex function to work.
Or possibly a regex patter to allow me to get the 1 on the next line after "season:"
Dim XML_05 As New MSXML2.XMLHTTP60
Dim HTML_05 As New MSHTML.HTMLDocument
XML_05.Open "GET", Cells(Row, NextEpisodeURL).Value, False
XML_05.send
HTML_05.body.innerHTML = XML_05.responseText
Dim NETC_05 As MSHTML.IHTMLElementCollection
Dim NET_05 As MSHTML.IHTMLElement
Dim REC_05 As MSHTML.IHTMLElement
Dim CEC_05 As MSHTML.IHTMLElementCollection
Dim CE_05 As MSHTML.IHTMLElement
Dim REO_05 As VBScript_RegExp_55.RegExp
Dim MO_05 As Object
Dim SN_05() As String
Dim ENA_05() As String
Dim EN_05() As String
Dim LatestEpisodeName As String
Set NET_05 = HTML_05.getElementById("previous_episode")
Set REO_05 = New VBScript_RegExp_55.RegExp
REO_05.Global = True
REO_05.IgnoreCase = True
REO_05.Pattern = "(Name:(.*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(0).Value
ENA_05 = Split(MO_05(0), ":")
Debug.Print ENA_05(1)
LatestEpisodeName = ENA_05(1)
REO_05.Pattern = "(Episode:([0-9]*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(0).Value
EN_05 = Split(MO_05(0), ":")
Debug.Print EN_05(1)
Cells(Row, EpisodeNet).Value = EN_05(1)
REO_05.Pattern = "(Season:\s+([0-9]*))"
Set MO_05 = REO_05.Execute(NET_05.innerText)
Debug.Print MO_05.Count
Debug.Print MO_05(5).Value
SN_05 = Split(MO_05(0), ":")
Debug.Print SN_05(1)
Trim (SN_05(1))
Cells(Row, SeasonNet).Value = SN_05(1)
Set NETC_05 = HTML_05.getElementById("next_episode").Children
Cells(Row, CountDown).Value = NETC_05(5).innerText
Debug.Print NETC_05(5).innerText

I suppose this is something that might help you get the required fields you wish to grab:
Sub FetchData()
Const Url$ = "https://next-episode.net/star-trek-picard"
Dim HTML As New HTMLDocument, post As Object
Dim rxp As New RegExp, R&
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.send
HTML.body.innerHTML = .responseText
End With
Set post = HTML.getElementById("previous_episode")
With rxp
.pattern = "(Name:(.*))"
If .Execute(post.innerText).Count > 0 Then
R = R + 1: Cells(R, 1) = .Execute(post.innerText)(0).SubMatches(0)
End If
.pattern = "(Season:\s*([0-9]*))"
If .Execute(post.innerText).Count > 0 Then
Cells(R, 2) = .Execute(post.innerText)(0).SubMatches(0)
End If
.pattern = "(Episode:([0-9]*))"
If .Execute(post.innerText).Count > 0 Then
Cells(R, 3) = .Execute(post.innerText)(0).SubMatches(0)
End If
End With
End Sub

Const Url$ = "https://next-episode.net/star-trek-picard"
Dim HTML As New HTMLDocument, post As Object
Dim rxp As New RegExp, R&
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.send
HTML.body.innerHTML = .responseText
End With
Set post = HTML.getElementById("previous_episode")
With rxp
.pattern = "(Name:(.*))"
If .Execute(post.innerText).Count > 0 Then
R = R + 1: Cells(R, 1) = .Execute(post.innerText)(0).SubMatches(0)
End If
.pattern = "(Season:\s+([0-9]*))"
If .Execute(post.innerText).Count > 0 Then
CleanString = Application.WorksheetFunction.Clean(.Execute(post.innerText)(0).SubMatches(0))
SeasonNumber = Split(CleanString, ":")
Cells(R, 2) = SeasonNumber(1)
End If
.pattern = "(Episode:([0-9]*))"
If .Execute(post.innerText).Count > 0 Then
Cells(R, 3) = .Execute(post.innerText)(0).SubMatches(0)
End If
End With
End Sub

It is cleaner and faster to use the DOM to your advantage. The nodes you want are NextSiblings to the items matched by the nice and fast css selector #previous_episode .subheadline. You get a returned nodeList which are the left hand side elements e.g. "Name", "Date" etc... Simply, navigate with NextSibling to jump to the right hand side elements e.g. "Et in Arcadia Ego, Part 1".
The returned nodeList starts at 0 and you can index in to get individual items. Note that the second item you want is a textNode and therefore you must use a NodeValue property rather than innerText.
This is faster and mpre correct in terms of dealing with HTML parsing.
Option Explicit
Public Sub GetPriorEpisodeInfp()
Const URL = "https://next-episode.net/star-trek-picard"
Dim html As New mshtml.HTMLDocument, previousEpisodeItems As Object
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
html.body.innerHTML = .responseText
End With
Set previousEpisodeItems = html.querySelectorAll("#previous_episode .subheadline")
With ActiveSheet
.Cells(1, 1) = previousEpisodeItems.item(0).NextSibling.innerText
.Cells(1, 2) = previousEpisodeItems.item(2).NextSibling.NodeValue 'textNode
.Cells(1, 3) = previousEpisodeItems.item(3).NextSibling.innerText
End With
End Sub

Type Miss Match Error in Excel VBA Code

I am working on a website data extractor. I have two worksheets one for input and other for output, which looks like this..
In the first sheet the cell contains the URL needed to extract data. I am trying this URL
https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun
I have written this macro..
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim str, e As String
Dim pgf, pgt, pg As Integer
Dim ele, Results As Object
Dim add, size, cno, price, inurl, sp, sp1 As String
Dim isheet, rts As Worksheet
Dim LastRow As Long
Dim pgno As Variant
Set IE = CreateObject("InternetExplorer.Application")
Set isheet = Worksheets("InputSheet")
Set rts = Worksheets("Results")
URL = isheet.Cells(3, 2)
RowCount = 1
rts.Range("A" & RowCount) = "Address"
rts.Range("B" & RowCount) = "Size"
rts.Range("C" & RowCount) = "Contact Number"
rts.Range("D" & RowCount) = "Price"
rts.Range("E" & RowCount) = "Url"
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
'RowCount = LastRow
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:05 AM#)
For Each Results In .document.all
Select Case Results.className
Case "title search-title"
str = Results.innerText
str1 = Split(str, " ")
str = CInt(str1(0))
End Select
If Results.className = "btn btn-main-inverted dropdown-toggle" And InStr(1, Results.Title, " page") > 2 Then
str2 = Results.Title
str1 = Split(str2, " ")
str2 = CInt(str1(0))
End If
Next
If str2 = 0 Then
pgno = CVErr(xlErrDiv0)
Else
pgno = WorksheetFunction.RoundUp(str / str2, 0)
End If
End With
IE.Quit
Set IE = Nothing
UrlS = Split(URL, "?")
Url1 = UrlS(0)
Url2 = "?" & UrlS(1)
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application")
URL = Url1 & "/" & i & Url2
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:08 AM#)
For Each ele In .document.all
Select Case ele.className
Case "listing-img-a"
inurl = ele.href
rts.Cells(LastRow + 1, 5) = inurl
Case "listing-location"
LastRow = LastRow + 1
add = ele.innerText
rts.Cells(LastRow, 1) = add
Case "lst-sizes"
sp = Split(ele.innerText, " ·")
size = sp(0)
rts.Cells(LastRow, 2) = size
Case "pgicon pgicon-phone js-agent-phone-number" ' btn-tools" 'pgicon pgicon-phone js-agent-phone-number" 'agent-phone-number"
rts.Cells(LastRow, 3) = ele.innerText
Case "listing-price"
price = ele.innerText
rts.Cells(LastRow, 4) = price
End Select
Next
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
rts.Activate
rts.Range("A" & LastRow).Select
End With
IE.Quit
Set IE = Nothing
Application.Wait (Now + #12:00:04 AM#)
Next i
MsgBox "Success"
End Sub
When I run this macro I am getting the error
Type Miss Match
When I debug it highlights the code
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application") URL = Url1 & "/" & i & Url2
With IE .Visible = True .navigate (URL)
I have tried my best to figure it out but could not understand where the problem is. Please help me to make correction..
It is also not getting the whole records on the link. This link contains more than 200 Records as per page is 30 records.

You can rely on implicit conversion and use the following. Assuming all pages do have numbering. You might want to improve error handling. I default to page numbers = 1 if the penultimate li CSS selector fails, otherwise it attempts to get the last page number before the ">"
Refer to my prior answer to your related question which shows you how to more effiently scrape the info off the page.
Sample code to show function being used:
Option Explicit
Public Sub GetListings()
Dim IE As New InternetExplorer, pgno As Long
With IE
.Visible = True
.navigate "https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun", False
While .Busy Or .readyState < 4: DoEvents: Wend
pgno = GetNumberOfPages(.document)
End With
End Sub
Public Function GetNumberOfPages(ByVal doc As HTMLDocument) As Long
On Error GoTo errhand:
GetNumberOfPages = doc.querySelector(".listing-pagination li:nth-last-child(2)").innerText
Exit Function
errhand:
If Err.Number <> 0 Then GetNumberOfPages = 1
End Function

How To Scrape data from web when source file doesn't have the content of web page

I tried to scrape data from here, but problem I am facing is that its source code doesn't contain the content which are available on web page. I believe its scripted.
How do I get it? I got the suggestion that to use selenium? Any other suggestion I could get from you guys it would be very helpful. Thanks.
With xhr
.Open "GET", URL, False
.send
If .readyState = 4 And .Status = 200 Then
Set internetdata = New MSHTML.HTMLDocument
internetdata.body.innerHTML = .responseText
htmlT = internetdata.body.outerHTML
Else
MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
vbNewLine & "HTTP request status: " & .Status
End If
End With
from this code (string) I am trying to get all the text available on web page. But not getting all the content here.

Try this. It should fetch you all the description of each product:
Sub Web_Data()
Dim IE As New InternetExplorer, html As HTMLDocument
Dim topic As Object
With IE
.Visible = True
.navigate "http://www.webcollage.net/MainApp/preview-ppp?module=dellbtoc&site=epartner&wcpc=1512144817149&view=live&rcpName=Webcollage"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set html = .document
End With
Application.Wait Now + TimeValue("00:00:05") ''if you haven't found your data already, just increase the time
For Each topic In html.getElementsByClassName("wc-rich-content-description")
r = r + 1: Cells(r, 1) = topic.innerText
Next topic
IE.Quit
End Sub

Extracting file URL from a Hyperlinked Image

Sub DownloadFile()
Dim myURL As String
myURL = "http://data.bls.gov/timeseries/LNS14000000"
Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", myURL, False, "username", "password"
WinHttpReq.send
myURL = WinHttpReq.responseBody
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.responseBody
oStream.SaveToFile "C:\Downloads\abc.xlsx", 2
oStream.Close
End If
End Sub
I am trying to download data using VBA and found this code running pretty well. The webpage URL from which I am trying to download data is the one I have used in the code. Please take a moment and open the webpage as the Excel file I am trying to download is linked in an image and so I am not able to find the URL to download the file from that image. Please advice. Thanks.

You might be able to hit the form target directly with a POST (action="/pdq/SurveyOutputServlet") but it is expecting a post string of the <input> elements together with their values. Most if not all of these input elements have been filled out for you simply by going to that page. All you need to do is collect and concatenate them into a post string to be shoved back at the form.
Option Explicit
'base web page
Public Const csBLSGOVpg = "http://data.bls.gov/timeseries/LNS14000000"
'form's action target
Public Const csXLSDLpg = "http://data.bls.gov/pdq/SurveyOutputServlet"
Sub mcr_Stream_Buyer_Documents()
Dim xmlDL As New MSXML2.ServerXMLHTTP60, xmlBDY As New HTMLDocument, adoFILE As Object
Dim xmlSend As String, strFN As String, f As Long, i As Long
With xmlDL
.SetTimeouts 5000, 5000, 15000, 25000
'start by going to the base web page
.Open "GET", csBLSGOVpg, False
.setRequestHeader "Content-Type", "text/javascript"
.send
If .Status <> "200" Then GoTo bm_Exit
'get the source HTML for examination; zero the post string var
xmlBDY.body.innerHTML = .responseText
xmlSend = vbNullString
'loop through the forms until you find the right one
'then loop through the input elements and construct a post string
For f = 0 To xmlBDY.getElementsByTagName("form").Length - 1
If xmlBDY.getElementsByTagName("form")(f).Name = "excel" Then
With xmlBDY.getElementsByTagName("form")(f)
For i = 0 To .getElementsByTagName("input").Length - 1
xmlSend = xmlSend & Chr(38) & _
.getElementsByTagName("input")(i).Name & Chr(61) & _
.getElementsByTagName("input")(i).Value
Next i
xmlSend = "?.x=5&.y=5" & xmlSend
End With
Exit For
End If
Next f
'Debug.Print xmlSend 'check the POST string
'send the POST string back to the form's action target
.Open "POST", csXLSDLpg, False
xmlDL.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
xmlDL.send xmlSend
If xmlDL.Status <> "200" Then GoTo bm_Exit
'pick up the response as a stream and save it as a .XLSX
strFN = Environ("USERPROFILE") & "\Documents\LNS14000000" & Format(Date, "yyyymmdd") & ".xlsx"
On Error Resume Next
Kill strFN
On Error GoTo 0
Set adoFILE = CreateObject("ADODB.Stream")
adoFILE.Type = 1
adoFILE.Open
adoFILE.Write .responseBody
adoFILE.SaveToFile strFN, 2
Set adoFILE = Nothing
End With
Set xmlBDY = Nothing
Set xmlDL = Nothing
Exit Sub
bm_Exit:
Debug.Print Err.Number & ":" & Err.Description
End Sub
This is pretty minimalist but it is all that you need. There is at least one non-standard input element that does not have a name but I elected to send its value back anyway. I did not sequentially remove things until it broke; I just built the POST string given what I retrieved and sent it back.
     
               LNS1400000020150916.xlsx
You will probably be moving this code to some sort of loop. Adjust the receiving file name accordingly. Each new page should adjust its own form input elements accordingly.

Once response is stored in an HTMLDocument object you can use a CSS selector of
#download_xlsx
The "#" means id.
You can then click on this element
htmlDocument.querySelector("#download_xlsx").Click
VBA:
Option Explicit
Public Sub DownloadFile()
Dim ie As New InternetExplorer
With ie
.Visible = True
.navigate "https://data.bls.gov/timeseries/LNS14000000"
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("#download_xlsx").Click
.Quit
End With
End Sub
Other:
You could even target the form and submit:
.document.forms("excel").submit
This triggers the POST request mentioned in the other answer (which is an awesome answer btw).

Excel VBA Macro: Scraping data from site table that spans multiple pages

Thanks in advance for the help. I'm running Windows 8.1, I have the latest IE / Chrome browsers, and the latest Excel. I'm trying to write an Excel Macro that pulls data from StackOverflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date (that the macro is run), the tag names, the # of tags, and the brief description of what the tag is. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now, it overwrites the data everytime I run the macro, and I'm not sure how to make it look for the next empty cell before running.. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. Problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub

There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The permalink to that query is here and the "Download CSV" option which appears after the query runs is probably the easiest way to get the data into Excel. If you wanted to automate that part of things, the direct link to the CSV download of results is here

You can improve this to parse out exact elements but it loops all the pages and grabs all the tag info (everything next to a tag)
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
DoEvents
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
html.querySelector(".page-numbers.next").Click
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub

I'm not making use of the DOM, but I find it very easy to get around just searching between known tags. If ever the expressions you are looking for are too common just tweak the code a bit so that it looks for a string after a string).
An example:
Public Sub ZipLookUp()
Dim URL As String, xmlHTTP As Object, html As Object, htmlResponse As String
Dim SStr As String, EStr As String, EndS As Integer, StartS As Integer
Dim Zip4Digit As String
URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
xmlHTTP.Open "GET", URL, False
On Error GoTo NoConnect
xmlHTTP.send
On Error GoTo 0
Set html = CreateObject("htmlfile")
htmlResponse = xmlHTTP.ResponseText
If htmlResponse = Null Then
MsgBox ("Aborted Run - HTML response was null")
Application.ScreenUpdating = True
GoTo End_Prog
End If
'Searching for a string within 2 strings
SStr = "<span class=""address1 range"">" ' first string
EStr = "</span><br />" ' second string
StartS = InStr(1, htmlResponse, SStr, vbTextCompare) + Len(SStr)
EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
MsgBox Zip4Digit
GoTo End_Prog
NoConnect:
If Err = -2147467259 Or Err = -2146697211 Then MsgBox "Error - No Connection": GoTo End_Prog 'MsgBox Err & ": " & Error(Err)
End_Prog:
End Sub

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Creating a facebook scraper - vba

Related

How can i web scrape data outside of div tags

Type Miss Match Error in Excel VBA Code

How To Scrape data from web when source file doesn't have the content of web page

Extracting file URL from a Hyperlinked Image

Excel VBA Macro: Scraping data from site table that spans multiple pages

Categories

Resources