How do I scrape data from property tax website - vba

I want to scrape a few things from this page: http://bexar.trueautomation.com/clientdb/Property.aspx?cid=110&prop_id=1229623
1. The Property ID under the Property table
2. Sqft and Market Value under the Land table
3. Improvement #1 under Improvement / Building
I'm running into a few issues, one for each item above:
1. I can't find any combination of element ID or tag/class name to pull this info.
2. Same as above.
3. I only want to pull the first item in the list, and adding (0) in multiple spots in my code below doesn't make this happen.
I was thinking the best way to do this is to write subs ProcessHTMLPage, ProcessHTMLPage2 and ProcessHTMLPage3, one per item, and then work on the formatting to get the results into the appropriate columns as needed.
Sub GetHTMLDocumentXML()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLPage As MSHTML.HTMLDocument
Dim URL As String
Dim HTMLDiv As MSHTML.IHTMLElement
Dim HTMLTable As MSHTML.IHTMLElement
XMLPage.Open "GET", "http://bexar.trueautomation.com/clientdb/Property.aspx?cid=110&prop_id=1229623", False
XMLPage.send
If XMLPage.Status <> 200 Then
MsgBox XMLPage.Status & " - " & XMLPage.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLPage.responseText
ProcessHTMLPage2 HTMLDoc
End Sub
Sub ProcessHTMLPage2(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim RowNum As Long, ColNum As Integer
Set HTMLTables = HTMLPage.getElementsByClassName("improvements")
Cells.Clear
For Each HTMLTable In HTMLTables
Debug.Print HTMLTable.className
RowNum = RowNum + 1
For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
Debug.Print vbTab & HTMLRow.innerText
ColNum = 1
For Each HTMLCell In HTMLRow.Children
Debug.Print vbTab & HTMLCell.innerText
Cells(RowNum, ColNum) = HTMLCell.innerText
ColNum = ColNum + 1
Next HTMLCell
Next HTMLRow
Next HTMLTable
Range("A1").Select
ActiveCell.CurrentRegion.EntireColumn.AutoFit
End Sub

Try the following to get the Property ID, Sqft and Market Value from that webpage. I had to use hardcoded indexes to locate the last two elements, as I could not find any specific marker for them.
Public Sub FetchInfo()
Const Url$ = "http://bexar.trueautomation.com/clientdb/Property.aspx?cid=110&prop_id=1229623"
Dim S$, oItem As Object
Dim propertyId$, Sqft$, marketValue$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
.send
S = .responseText
End With
With CreateObject("HTMLFile")
.write S
For Each oItem In .getElementsByTagName("td")
If InStr(oItem.innerText, "Property ID:") > 0 Then
propertyId = oItem.NextSibling.innerText
Exit For
End If
Next oItem
Sqft = .getElementById("landDetails").getElementsByTagName("td")(4).innerText
marketValue = .getElementById("landDetails").getElementsByTagName("td")(7).innerText
Debug.Print propertyId, Sqft, marketValue
End With
End Sub
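The label lookup in the loop above can be wrapped in a small helper so the same pattern works for other "label / value" rows. This is only a sketch: `ValueNextToLabel` is a hypothetical name, and it assumes the label sits alone in its own cell with the value in the next cell of the same row, which may not hold for every table on the page.

```vba
' Hypothetical helper: returns the text of the cell to the right of a label cell.
' Assumes the label is the entire text of its <td> (exact match, case-insensitive).
Public Function ValueNextToLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim td As Object, rowCells As Object
    For Each td In doc.getElementsByTagName("td")
        ' An exact match avoids hitting an outer cell that merely contains the label.
        If StrComp(Trim$(td.innerText), labelText, vbTextCompare) = 0 Then
            Set rowCells = td.parentElement.Cells
            ' Only read the neighbour if the row actually has one.
            If td.cellIndex + 1 < rowCells.Length Then
                ValueNextToLabel = Trim$(rowCells.Item(td.cellIndex + 1).innerText)
            End If
            Exit Function
        End If
    Next td
End Function
```

Inside the second With block it could be called as propertyId = ValueNextToLabel(.Document, "Property ID:") or with any other reference to the parsed document. It returns an empty string when the label is not found.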

Related

Vba getting rid of table while web scraping

Can anyone help me take this a step further?
Here I locate the table on the website given in the code block using getElementById and the tag name, although the only hook available is the class "data". I then want to write only the currency rates and dates to cells on an Excel worksheet. However, the output also includes the calendar days, as the Debug.Print output shows, and I cannot find the right tag name for the calendar to exclude it in the code. For now, I only need help getting rid of the calendar days. The code is below:
Sub gettingTablesfromCBR()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDOC As MSHTML.HTMLDocument
Dim HTMLTABLES As MSHTML.IHTMLElementCollection
Dim HTMLTABLE As MSHTML.IHTMLElement
Dim HTMLDIV As MSHTML.IHTMLElement
Dim TableSection As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
Dim TableCell As MSHTML.IHTMLElement
IE.Visible = True
IE.navigate "https://www.cbr.ru/eng/currency_base/dynamics/?UniDbQuery.Posted=True&UniDbQuery.so=1&UniDbQuery.mode=1&UniDbQuery.date_req1=&UniDbQuery.date_req2=&UniDbQuery.VAL_NM_RQ=R01100&UniDbQuery.From=07.08.2021&UniDbQuery.To=05.11.2021"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDOC = IE.Document
Set HTMLDIV = HTMLDOC.getElementById("data")
Set HTMLTABLES = HTMLDOC.getElementsByTagName("table")
Dim RowText As String
For Each HTMLTABLE In HTMLTABLES
For Each TableSection In HTMLTABLE.Children
For Each TableRow In TableSection.Children
RowText = ""
For Each TableCell In TableRow.Children
RowText = RowText & vbTab & TableCell.innerText
Next TableCell
Debug.Print , , RowText
Next TableRow
Next TableSection
Next HTMLTABLE
End Sub
Updated:
You can add a counter to your innermost For loop that is reset to 1 before that loop starts. Then you can test this counter to see if it's the first cell of the row and skip it.
Example of this with some small tweaks to your code to just target the table (that has a classname of data):
Sub gettingTablesfromCBR()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDOC As MSHTML.HTMLDocument
Dim HTMLTABLES As MSHTML.IHTMLElementCollection
Dim HTMLTABLE As MSHTML.IHTMLElement
Dim HTMLDIV As MSHTML.IHTMLElement
Dim TableSection As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
Dim TableCell As MSHTML.IHTMLElement
Dim Unit As String
Dim Rate As String
Dim TableCellCount As Integer
IE.Visible = True
IE.navigate "https://www.cbr.ru/eng/currency_base/dynamics/?UniDbQuery.Posted=True&UniDbQuery.so=1&UniDbQuery.mode=1&UniDbQuery.date_req1=&UniDbQuery.date_req2=&UniDbQuery.VAL_NM_RQ=R01100&UniDbQuery.From=07.08.2021&UniDbQuery.To=05.11.2021"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDOC = IE.Document
Set HTMLTABLES = HTMLDOC.getElementsByClassName("data")
For Each HTMLTABLE In HTMLTABLES 'Table
For Each TableSection In HTMLTABLE.Children 'Body
For Each TableRow In TableSection.Children
RowText = ""
TableCellCount = 1
For Each TableCell In TableRow.Children
If TableCellCount = 2 Then Unit = TableCell.innerText
If TableCellCount = 3 Then Rate = TableCell.innerText
TableCellCount = TableCellCount + 1
Next TableCell
Debug.Print Unit, Rate
Next TableRow
Next TableSection
Next
End Sub
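As an alternative to counting cells in the innermost loop, a row's cells can be read by position. The helper below is a sketch under assumptions: `CellTextAt` is a hypothetical name, and it assumes each element passed in is a table row exposing the standard `Cells` collection (zero-based).

```vba
' Hypothetical helper: returns the text of the cell at a zero-based position in a table row.
' Returns "" when the row has fewer cells than requested (for example, header or short rows).
Public Function CellTextAt(ByVal tableRow As Object, ByVal cellIndex As Long) As String
    If cellIndex >= 0 And cellIndex < tableRow.Cells.Length Then
        CellTextAt = Trim$(tableRow.Cells.Item(cellIndex).innerText)
    End If
End Function
```

With this, the inner For Each over TableRow.Children could be replaced by Unit = CellTextAt(TableRow, 1) and Rate = CellTextAt(TableRow, 2), since the second and third cells have zero-based positions 1 and 2.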
With the help of @JNevill, I solved and improved the code. Now anyone interested can enter the dates on the worksheet, and the resulting rates are written to an Excel sheet in table form.
Sub gettingTablesfromCBR2()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDOC As MSHTML.HTMLDocument
Dim HTMLTABLES As MSHTML.IHTMLElementCollection
Dim HTMLTABLE As MSHTML.IHTMLElement
Dim HTMLDIV As MSHTML.IHTMLElement
Dim TableSection As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
Dim TableCell As MSHTML.IHTMLElement
Dim Unit As String
Dim Rate As String, DateR As String 'each variable needs its own As clause, otherwise it becomes a Variant
Dim TableCellCount As Integer, RowNum As Integer
Dim inputdate_from, inputdate_to, inputcurr_val As Variant
inputdate_from = ThisWorkbook.Worksheets("PrimoPagina").Range("A3").Value
inputdate_to = ThisWorkbook.Worksheets("PrimoPagina").Range("E3").Value
inputcurr_val = ThisWorkbook.Worksheets("PrimoPagina").Range("B1").Value
If inputcurr_val = "BLG" Then
inputcurr_val = "R01100"
Else
inputcurr_val = "R01239" 'EUR
End If
IE.Visible = True
IE.navigate "https://www.cbr.ru/eng/currency_base/dynamics/?UniDbQuery.Posted=True&UniDbQuery.so=1&UniDbQuery.mode=1&UniDbQuery.date_req1=&UniDbQuery.date_req2=&UniDbQuery.VAL_NM_RQ=" & inputcurr_val & "&UniDbQuery.From=" & inputdate_from & "&UniDbQuery.To=" & inputdate_to
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDOC = IE.Document
Set HTMLTABLES = HTMLDOC.getElementsByClassName("data")
For Each HTMLTABLE In HTMLTABLES 'Table
For Each TableSection In HTMLTABLE.Children 'Body
For Each TableRow In TableSection.Children
RowText = ""
TableCellCount = 1
For Each TableCell In TableRow.Children
If TableCellCount = 1 Then DateR = TableCell.innerText
If TableCellCount = 2 Then Unit = TableCell.innerText
If TableCellCount = 3 Then Rate = TableCell.innerText
TableCellCount = TableCellCount + 1
Next TableCell
Debug.Print Unit, Rate
RowNum = RowNum + 1
ThisWorkbook.Worksheets("Results").Cells(RowNum, 3).Value = Unit
ThisWorkbook.Worksheets("Results").Cells(RowNum, 2).Value = Rate
ThisWorkbook.Worksheets("Results").Cells(RowNum, 1).Value = DateR
Next TableRow
Next TableSection
Next
End Sub
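Because the URL expects dates in dd.mm.yyyy form, it may be safer to format real Date values than to concatenate whatever the cells contain. The function below is a sketch: `BuildCbrUrl` is a hypothetical name, and the query string is copied from the URL used above.

```vba
' Hypothetical helper: builds the request URL from a currency code and two dates.
' The backslash in the format string makes the dot a literal character.
Public Function BuildCbrUrl(ByVal currencyCode As String, ByVal dateFrom As Date, ByVal dateTo As Date) As String
    Const BASE_URL As String = "https://www.cbr.ru/eng/currency_base/dynamics/"
    BuildCbrUrl = BASE_URL & "?UniDbQuery.Posted=True&UniDbQuery.so=1&UniDbQuery.mode=1" & _
        "&UniDbQuery.date_req1=&UniDbQuery.date_req2=" & _
        "&UniDbQuery.VAL_NM_RQ=" & currencyCode & _
        "&UniDbQuery.From=" & Format$(dateFrom, "dd\.mm\.yyyy") & _
        "&UniDbQuery.To=" & Format$(dateTo, "dd\.mm\.yyyy")
End Function
```

For example, IE.navigate BuildCbrUrl("R01100", DateSerial(2021, 8, 7), DateSerial(2021, 11, 5)) requests the same date range as the hardcoded URL in the earlier code.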

Web-scrape data without class name or ID

I'm trying to log in to a site and get some data from it.
This is my code:
Private Sub Command4_Click()
Dim i As SHDocVw.InternetExplorer
Dim ht As HTMLDocument
Set i = New InternetExplorer
i.Visible = True
i.navigate ("https://billing.te.eg/en-US")
Do While i.ReadyState <> READYSTATE_COMPLETE
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = i.Document
idoc.all.TxtAreaCode.Value = "45"
idoc.all.TxtPhoneNumber.Value = "03824149"
Dim ele As MSHTML.IHTMLElement
Dim eles As MSHTML.IHTMLElementCollection
Set eles = idoc.getElementsByClassName("btn")
For Each ele In eles
If ele.Type = "button" Then
ele.Click
Else
End If
Next ele
Do While i.ReadyState <> READYSTATE_COMPLETE
Loop
If i.ReadyState = READYSTATE_COMPLETE Then
Dim Doc As HTMLDocument
Set Doc = i.Document
Dim sdd As String
sdd = Trim(Doc.getElementsByClassName("col-md-12").innerText)
MsgBox sdd
Else: End If
End Sub
This is the part I need to get data from. I'd like to understand how to get data from elements like these, which have no class name or ID.
Try the following approach. It is much faster than IE.
Sub FetchData()
Const Url$ = "https://billing.te.eg/api/Account/Inquiry"
Dim S$, elem As Object, payload As Variant
Dim phone$, areaCode$, counter&
counter = 1
areaCode = "45" 'put areacode here
phone = "03824149" 'put phone number here
payload = "AreaCode=" & areaCode & "&PhoneNumber=" & phone & "&PinCode=&InquiryBy=telephone&AccountNo="
Do
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"
.setRequestHeader "Referer", "https://billing.te.eg/en-US"
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.send payload
S = .responseText
End With
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = "TotalAmount"":(.*?),"
Set elem = .Execute(S)
If elem.Count > 0 Then
MsgBox elem(0).SubMatches(0)
Exit Do
End If
End With
counter = counter + 1
If counter = 3 Then Exit Do
Loop
End Sub
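The regular expression step above can be factored into a reusable function that also copes with a field that comes last in the JSON (followed by } rather than a comma). This is a sketch, not a JSON parser: `ExtractJsonValue` is a hypothetical name, and it assumes simple scalar values and a key without regex metacharacters.

```vba
' Hypothetical helper: pulls the value of a simple scalar field out of a JSON string.
' Returns "" when the key is not present. Not suitable for nested objects or arrays.
Public Function ExtractJsonValue(ByVal json As String, ByVal key As String) As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Global = False
        ' "key" : optional opening quote, then everything up to a quote, comma or closing brace
        .Pattern = """" & key & """\s*:\s*""?([^"",}]*)"
        Set matches = .Execute(json)
    End With
    If matches.Count > 0 Then ExtractJsonValue = matches(0).SubMatches(0)
End Function
```

In the code above, MsgBox ExtractJsonValue(S, "TotalAmount") would then replace the inline pattern matching.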

VBA: Subscript out of range or Type Mismatch

I'm very new to VBA and am really stuck. Below is my code; near the end you'll see my For loop for Des and DesArr. All I'm trying to do with that loop is pull a column of cells from the worksheet "SIC", which is Sheet2 in my workbook. I get either the error "Subscript out of range" or "Type mismatch", and whenever I try to google and correct one, the other error takes its place. If anyone can help me work through this, I'd greatly appreciate it!
Public Sub getGoogleDescriptions(strSearch As String)
Dim URL As String, strResponse As String
Dim objHTTP As Object
Dim htmlDoc As HTMLDocument
Dim result As String
Dim i As Integer
Dim u As Integer
Dim resultArr As Variant
Dim Des As String
Dim DesArr(2 To 48) As Long
Set htmlDoc = CreateObject("htmlfile")
'Set htmlDoc = New HTMLDocument
Dim objResults As Object
Dim objResult As Object
strSearch = Replace(strSearch, " ", "+")
URL = "https://www.google.com/search?q=" & strSearch
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
With objHTTP
.Open "GET", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send
htmlDoc.body.innerHTML = .responseText
End With
Set objResults = htmlDoc.getElementsByClassName("st")
Debug.Print objResults(0).innerText
result = CStr(objResults(0).innerText)
resultArr = Split(result, " ", -1, 0)
For i = LBound(resultArr) To UBound(resultArr) 'Define i to be the length of the List'
Debug.Print i, resultArr(i) 'Prints the corresponding index value and array element'
Next i 'repeat
Set htmlDoc = Nothing
Set objResults = Nothing
Set objHTTP = Nothing
Set wk = ActiveWorkbook
For u = 2 To 48
Des = Sheets("SIC").Range("C" & u).Value
DesArr(u) = Des
Next u
Debug.Print DesArr(2)
End Sub
You're getting the type mismatch because DesArr is declared as Long, which is a whole number between -2,147,483,648 and 2,147,483,647, while the values you assign to it are strings.
Within the subroutine it is used to hold text, so there are two corrections. First, change it to a Variant as shown below.
Then adjust the 2 To 48 index inside your loop. In this case it's a simple offset of 2, so just use (u - 2), and your Variant array has 47 elements starting at index 0 instead of 2.
Public Sub getGoogleDescriptions(strSearch As String)
Dim URL As String, strResponse As String
Dim objHTTP As Object
Dim htmlDoc As HTMLDocument
Dim result As String
Dim i As Integer
Dim u As Integer
Dim resultArr As Variant
Dim Des As String
Dim DesArr() As Variant: ReDim DesArr(0 To 46) 'declare as dynamic first; ReDim on a fixed-size array does not compile
Set htmlDoc = CreateObject("htmlfile")
'Set htmlDoc = New HTMLDocument
Dim objResults As Object
Dim objResult As Object
strSearch = Replace(strSearch, " ", "+")
URL = "https://www.google.com/search?q=" & strSearch
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
With objHTTP
.Open "GET", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send
htmlDoc.body.innerHTML = .responseText
End With
Set objResults = htmlDoc.getElementsByClassName("st")
Debug.Print objResults(0).innerText
result = CStr(objResults(0).innerText)
resultArr = Split(result, " ", -1, 0)
For i = LBound(resultArr) To UBound(resultArr) 'Define i to be the length of the List'
Debug.Print i, resultArr(i) 'Prints the corresponding index value and array element'
Next i 'repeat
Set htmlDoc = Nothing
Set objResults = Nothing
Set objHTTP = Nothing
Set wk = ActiveWorkbook
For u = 2 To 48
Des = Sheets("SIC").Range("C" & u).Value
DesArr(u - 2) = Des
Next u
Debug.Print DesArr(0)
End Sub
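The cell-by-cell loop can also be avoided by reading the whole range at once. Range.Value on a multi-cell range returns a 1-based two-dimensional Variant array, so a small conversion gives the 0-based list used above. The sketch below uses a hypothetical helper name, `ColumnToArray`, and assumes a single-column input.

```vba
' Hypothetical helper: copies the first column of a 2D array (as returned by Range.Value)
' into a 0-based one-dimensional Variant array.
Public Function ColumnToArray(ByVal values As Variant) As Variant
    Dim result() As Variant, r As Long
    ReDim result(0 To UBound(values, 1) - LBound(values, 1))
    For r = LBound(values, 1) To UBound(values, 1)
        result(r - LBound(values, 1)) = values(r, LBound(values, 2))
    Next r
    ColumnToArray = result
End Function
```

With DesArr declared as a plain Variant, DesArr = ColumnToArray(Sheets("SIC").Range("C2:C48").Value) would fill indexes 0 to 46 in one step.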

Open website, find specific value and return value to Excel in VBA

I would like to use VBA to open a website, look for a certain paragraph in the HTML code of this website (<p class="myClass">XYZ</p>) and return this value to Excel, in my example "XYZ".
The website has only one paragraph (p element) with the above class.
I know that this is possible but don't know where to start here.
My code:
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Navigate varUrl
Do While .Busy
Application.Wait Now + TimeValue("0:00:01")
Loop
.Visible = True
End With
Instead of opening IE, use a web request:
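Dim oRequest As Object, sHtml As String, lngStart As Long
Const sTag As String = "<p class=""myClass"">"
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With oRequest
.Open "GET", sUrl, True
.Send 'a GET request needs no body
.WaitForResponse
sHtml = .ResponseText
End With
'VBA strings have no IndexOf/Substring methods; use InStr and Mid$ instead
lngStart = InStr(1, sHtml, sTag)
If lngStart > 0 Then
'skip past the opening tag and take the 3 characters of "XYZ"
Cells(row, col).Value = Mid$(sHtml, lngStart + Len(sTag), 3)
End If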
If you don't know the length of the string you are looking for, you could also do a loop after index until you hit a "<" character.
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Navigate varUrl
Do While .Busy
Application.Wait Now + TimeValue("0:00:01")
Loop
.Visible = True
End With
'HTML document
Dim doc As Object
Set doc = objIE.document
Dim el As Object
Dim myText As String
For Each el In doc.GetElementsByClassName("myClass")
'put paragraph text in cell A1
Cells(1, 1).Value = el.innerText
'put your paragraph text in a variable string
myText = el.innerText
Next el
That is a tricky and interesting question. Let's say that you want to obtain the title of this current page, which is in the class question-hyperlink on StackOverflow. Using the idea from @Matt Spinks's solution, you may come up with something like this:
Option Explicit
Public Sub TestMe()
Dim oRequest As Object
Dim strOb As String
Dim strInfo As String: strInfo = "class=""question-hyperlink"">"
Dim lngStart As Long
Dim lngEnd As Long
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With oRequest
.Open "GET", "http://stackoverflow.com/questions/42254051/vba-open-website-find-specific-value-and-return-value-to-excel#42254254", True
.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.Send "{range:9129370}"
.WaitForResponse
strOb = .ResponseText
End With
lngStart = InStr(1, strOb, strInfo)
lngEnd = InStr(lngStart, strOb, "<")
Debug.Print Mid(strOb, lngStart + Len(strInfo), lngEnd - lngStart - Len(strInfo))
End Sub
Instead of using Debug.Print, you can store the title in a string and work with it further.
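To return the text as a string, the InStr/Mid logic can be moved into a function that also handles a missing marker, since InStr raises an error when given a start position of 0. This is a sketch: `TextAfterMarker` is a hypothetical name, and it assumes the wanted text runs from the end of the marker up to the next "<".

```vba
' Hypothetical helper: returns the text between a marker and the next "<" character.
' Returns "" when the marker is not found in the HTML.
Public Function TextAfterMarker(ByVal html As String, ByVal marker As String) As String
    Dim startPos As Long, endPos As Long
    startPos = InStr(1, html, marker, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(marker)
    endPos = InStr(startPos, html, "<")
    ' If no closing tag follows, take everything to the end of the string.
    If endPos = 0 Then endPos = Len(html) + 1
    TextAfterMarker = Trim$(Mid$(html, startPos, endPos - startPos))
End Function
```

For the example above, strTitle = TextAfterMarker(strOb, strInfo) would hold the title, where strTitle is an assumed String variable.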

Run time error 91, object variable or with block variable not set

My intention is to scrape all the app names from that page, along with the app links leading to the next page. However, when I run it, I see that after looping once it produces the error "Run time error 91, object variable or with block variable not set". Here is the full code. Any help would really be appreciated. Thanks in advance.
Sub app_crawler()
Dim xmlpage As New XMLHTTP60, htmldoc As New HTMLDocument
Dim htmlas As Object, htmla As Object, sstr As String
xmlpage.Open "GET", "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8", False
xmlpage.send
htmldoc.body.innerHTML = xmlpage.responseText
For Each htmlas In htmldoc.getElementsByClassName("lockup-info")(0).getElementsByTagName("a")
sstr = htmlas.href
xmlpage.Open "GET", sstr, False
xmlpage.send
htmldoc.body.innerHTML = xmlpage.responseText
For Each htmla In htmldoc.getElementsByClassName("intro")(1).getElementsByTagName("h1")
x = x + 1: Cells(x, 1) = htmla.innerText
Next htmla
Next htmlas
End Sub
As Mat's Mug commented, you'll have to test whether htmlas(x) returns Nothing before getting elements from it, and the same applies to getElementsByTagName and the other lookups:
Sub TestSth()
Dim xmlpage As New MSXML2.XMLHTTP60
Dim htmldoc As New MSHTML.HTMLDocument
Dim htmlas As Object, gist As Object
Dim htmla As Object, main As Object, lux As String
Dim x As Long, link As Object, thank As Object
Range("A1").Select
xmlpage.Open "GET", "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8", False
xmlpage.send
htmldoc.body.innerHTML = xmlpage.responseText
Set xmlpage = Nothing
Set htmlas = htmldoc.getElementsByClassName("lockup-info")
For x = 0 To htmlas.Length - IIf(htmlas.Length > 0, 1, 0)
If Not htmlas(x) Is Nothing Then
If Not htmlas(x).getElementsByTagName("a") Is Nothing Then
If Not htmlas(x).getElementsByTagName("a")(0) Is Nothing Then
lux = htmlas(x).getElementsByTagName("a")(0).getAttribute("href")
If lux <> "" Then
xmlpage.Open "GET", lux, False
xmlpage.send
htmldoc.body.innerHTML = xmlpage.responseText
Set main = htmldoc.getElementsByClassName("intro")(1)
Set thank = main.getElementsByTagName("div")
For Each link In thank
ActiveCell.Value = link.innertext
ActiveCell.Offset(1, 0).Select
Next link
End If
End If
End If
End If
Next x
End Sub
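The nested Nothing checks can be shortened with a helper that returns Nothing instead of failing when an index is out of range. This is a sketch with a hypothetical name, `ItemOrNothing`, and it assumes the collection exposes the usual `Length` and `Item` members of an HTML element collection.

```vba
' Hypothetical helper: returns the element at a zero-based index, or Nothing if it does not exist.
Public Function ItemOrNothing(ByVal coll As Object, ByVal index As Long) As Object
    If coll Is Nothing Then Exit Function
    If index >= 0 And index < coll.Length Then
        Set ItemOrNothing = coll.Item(index)
    End If
End Function
```

For instance, Set main = ItemOrNothing(htmldoc.getElementsByClassName("intro"), 1) followed by If Not main Is Nothing Then ... avoids error 91 on pages that have fewer than two "intro" elements.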
This is the answer which fixes all the problems I was having:
Sub app_crawler()
Dim http As New XMLHTTP60, hdoc As New HTMLDocument, hdoc_one As New HTMLDocument
Dim elem As Object, post As Object, sstr As String
With http
.Open "GET", "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8", False
.send
hdoc.body.innerHTML = .responseText
End With
For Each elem In hdoc.getElementsByClassName("lockup-info")
With elem.getElementsByTagName("li")(0).getElementsByTagName("a")
If .Length Then sstr = .Item(0).href
End With
With http
.Open "GET", sstr, False
.send
hdoc_one.body.innerHTML = .responseText
End With
For Each post In hdoc_one.getElementsByClassName("intro")
With post.getElementsByTagName("h1")
If .Length Then i = i + 1: Cells(i, 1) = .Item(0).innerText
End With
Next post
Next elem
End Sub