Excel 2010 VBA - Pull Table from website

I'm very new to Excel VBA and was given some code to play with. I successfully modified it once, but when I tried to modify it again it stopped pulling the right info. Maybe it's because I'm not sure of the table id? I also don't understand all of the code in this example, which is probably the other issue. Anyway, I'm trying to pull the Historical Prices table from this page (the Yahoo Finance URL in the code below). It pulls some data, but not the correct data. Any help would be appreciated. Thanks!
Here is my current code:
Sub GrabHistData()
    Dim Ptrtbl As Long, r As Long, c As Long
    Dim htm As Object
    Dim elemCollection As Object
    Set htm = CreateObject("htmlFile")
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://finance.yahoo.com/q/hp?s=TWTR&a=04&b=30&c=2012&d=01&e=7&f=2014&g=d", False
        .send
        htm.body.innerhtml = .responsetext
    End With
    Set elemCollection = htm.getElementsByTagName("TABLE")
    Ptrtbl = 1
    For Each elem In elemCollection
        Ptrtbl = Ptrtbl + 1
        If elem.ID <> "yfncsumtab" Then GoTo Nxtelem
        With elemCollection(Ptrtbl)
            For c = 0 To (.Rows(r).Cells.Length - 1)
                Cells(r + 1, c + 1) = .Rows(r).Cells(c).innertext
            Next c
        End With
        Exit For
Nxtelem:
    Next elem
End Sub
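Two details in the loop above explain the wrong output: r is never assigned, so only row 0 of the table is copied, and Ptrtbl is incremented before it is used, so elemCollection(Ptrtbl) refers to the table two positions after elem (the collection is zero-based). Below is a minimal sketch that keeps the table-by-id idea but reads every row of the matched element directly. It assumes the page still serves a table with id yfncsumtab (the id the question uses; the 2014-era Yahoo URL may no longer return this page). TableToArray and GrabHistDataById are hypothetical names.

```vba
' Hypothetical helper: copies every row of an HTML <table> element into a
' 1-based 2D array (row, column), sized to the widest row.
Public Function TableToArray(ByVal tbl As Object) As Variant
    Dim r As Long, c As Long, maxCols As Long
    Dim result() As Variant

    ' Find the widest row so ragged tables still fit in one rectangular array
    For r = 0 To tbl.Rows.Length - 1
        If tbl.Rows(r).Cells.Length > maxCols Then maxCols = tbl.Rows(r).Cells.Length
    Next r
    If maxCols = 0 Then
        TableToArray = Empty                     ' no rows or no cells
        Exit Function
    End If

    ReDim result(1 To tbl.Rows.Length, 1 To maxCols)
    For r = 0 To tbl.Rows.Length - 1             ' .Rows and .Cells are zero-based
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            result(r + 1, c + 1) = tbl.Rows(r).Cells(c).innerText
        Next c
    Next r
    TableToArray = result
End Function

' Hypothetical replacement for GrabHistData: finds the table by id and writes
' all of its rows, starting at A1 of the active sheet.
Public Sub GrabHistDataById()
    Dim htm As Object, tbl As Object, data As Variant
    Set htm = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://finance.yahoo.com/q/hp?s=TWTR&a=04&b=30&c=2012&d=01&e=7&f=2014&g=d", False
        .send
        htm.body.innerHTML = .responseText
    End With
    Set tbl = htm.getElementById("yfncsumtab")   ' assumption: same id as in the question
    If tbl Is Nothing Then Exit Sub
    data = TableToArray(tbl)
    If IsEmpty(data) Then Exit Sub
    ActiveSheet.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

Writing the whole array with one Resize(...).Value assignment is also much faster than writing cell by cell.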

If you want to stick with your current approach, this works for me:
Sub GrabHistData()
    Dim htm As Object
    Dim elemCollection As Object
    Dim itm As Object
    Set htm = CreateObject("htmlFile")
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://finance.yahoo.com/q/hp?s=TWTR&a=04&b=30&c=2012&d=01&e=7&f=2014&g=d", False
        .send
        htm.body.innerhtml = .responseText
    End With
    ' Keep only the data cells of the price table (class yfnc_tabledata1)
    Set elemCollection = htm.getElementsByTagName("td")
    ' Output starts at the selected cell (select A1 first) and wraps after column G
    For Each itm In elemCollection
        If itm.classname = "yfnc_tabledata1" Then
            ActiveCell = itm.innertext
            If ActiveCell.Column = 7 Then
                ActiveCell.Offset(1, -6).Select   ' back six columns, down one row
            Else
                ActiveCell.Offset(0, 1).Select    ' next column
            End If
        End If
    Next
End Sub
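The answer above moves the selection with ActiveCell.Select, so the output depends on which cell is selected when the macro starts. A sketch of the same class-filter idea that builds an array instead and writes it in one go. Like the answer, it assumes each price row has exactly 7 cells with class yfnc_tabledata1; rows with fewer such cells (for example dividend notices) would shift the layout in both versions. ClassTextToGrid and GrabHistDataNoSelect are hypothetical names.

```vba
' Hypothetical helper: gathers the text of every element whose className equals cls,
' nCols values per row, into a 1-based 2D array.
Public Function ClassTextToGrid(ByVal elems As Object, ByVal cls As String, ByVal nCols As Long) As Variant
    Dim itm As Object, n As Long, i As Long
    Dim grid() As Variant

    For Each itm In elems                         ' first pass: count matching elements
        If itm.className = cls Then n = n + 1
    Next itm
    If n = 0 Then ClassTextToGrid = Empty: Exit Function

    ReDim grid(1 To (n + nCols - 1) \ nCols, 1 To nCols)   ' round the row count up
    For Each itm In elems                         ' second pass: fill left to right, top to bottom
        If itm.className = cls Then
            grid(i \ nCols + 1, i Mod nCols + 1) = itm.innerText
            i = i + 1
        End If
    Next itm
    ClassTextToGrid = grid
End Function

' Same request as the answer, but the data goes to A1 without touching the selection
Public Sub GrabHistDataNoSelect()
    Dim htm As Object, grid As Variant
    Set htm = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://finance.yahoo.com/q/hp?s=TWTR&a=04&b=30&c=2012&d=01&e=7&f=2014&g=d", False
        .send
        htm.body.innerHTML = .responseText
    End With
    grid = ClassTextToGrid(htm.getElementsByTagName("td"), "yfnc_tabledata1", 7)
    If IsEmpty(grid) Then Exit Sub
    ActiveSheet.Range("A1").Resize(UBound(grid, 1), UBound(grid, 2)).Value = grid
End Sub
```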

Related

Scraping Historical Data Table using VBA

I'm trying to scrape the Historical Data table from Yahoo Finance using VBA. However, I only manage to get one value instead of the whole table. Is there an easy way to do it? Please help me.
Public Sub History()
    Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://finance.yahoo.com/quote/0166.KL/history?p=0166.KL", False
        .send
        html.body.innerHTML = .responseText
    End With
    Application.ScreenUpdating = False
    ActiveSheet.Cells(2, 2).Value = html.getElementsByClassName("Py(10px) Pstart(10px)")(0).innerText
    Application.ScreenUpdating = True
End Sub
Your code returns only the first element of the collection returned by getElementsByClassName; that is what ...(0).innerText does.
Please try the following code:
Public Sub History()
    Dim html As HTMLDocument '<== Tools > References > Microsoft HTML Object Library
    Dim hist As Object, el As Variant, i As Long, j As Long, boolDiv As Boolean
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://finance.yahoo.com/quote/0166.KL/history?p=0166.KL", False
        .send
        html.Body.innerHTML = .responseText
    End With
    Application.ScreenUpdating = False
    ' getElementsByClassName returns every matching cell: loop over all of them instead of taking (0)
    Set hist = html.getElementsByClassName("Py(10px) Pstart(10px)")
    i = 2: j = 2                               ' start writing at B2
    Application.Calculation = xlCalculationManual
    For Each el In hist
        If j <= 7 Then                         ' fill columns B:G of the current row
            ActiveSheet.Cells(i, j).Value = el.innerText
            j = j + 1
        Else                                   ' row full: wrap to column B of the next row
            If InStr(el.innerText, "Dividend") > 0 Then boolDiv = True
            j = 2: i = i + 1
            ActiveSheet.Cells(i, j).Value = el.innerText
            If boolDiv Then                    ' a dividend note occupies a row of its own
                boolDiv = False: j = 2: i = i + 1
            Else
                j = j + 1
            End If
        End If
    Next
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub
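The wrap-around bookkeeping above (i, j and boolDiv) can be pulled out into a pure function that is easier to test. This is a sketch of that idea rather than the answer's exact logic: it assumes six value cells per row (columns B to G, as the answer's j <= 7 test implies) and gives any text containing "Dividend" a row of its own. FlatToGrid and WriteHistory are hypothetical names.

```vba
' Hypothetical helper: lays out a flat list of cell texts as rows of nCols (nCols >= 1).
' Any text containing "Dividend" is given a row of its own.
' The array has one row per input text (the worst case), so trailing rows may be empty.
Public Function FlatToGrid(ByVal texts As Variant, ByVal nCols As Long) As Variant
    Dim grid() As Variant, k As Long, r As Long, c As Long
    ReDim grid(1 To UBound(texts) - LBound(texts) + 1, 1 To nCols)
    r = 1
    For k = LBound(texts) To UBound(texts)
        If InStr(1, texts(k), "Dividend", vbTextCompare) > 0 Then
            If c > 0 Then r = r + 1            ' finish a partly filled row first
            grid(r, 1) = texts(k)
            r = r + 1: c = 0                   ' the next value starts a fresh row
        Else
            If c = nCols Then r = r + 1: c = 0 ' current row is full
            c = c + 1
            grid(r, c) = texts(k)
        End If
    Next k
    FlatToGrid = grid
End Function

' Usage sketch, assuming hist is the collection from getElementsByClassName above
Public Sub WriteHistory(ByVal hist As Object)
    Dim texts() As Variant, k As Long, grid As Variant
    If hist.Length = 0 Then Exit Sub
    ReDim texts(0 To hist.Length - 1)
    For k = 0 To hist.Length - 1
        texts(k) = hist(k).innerText
    Next k
    grid = FlatToGrid(texts, 6)
    ActiveSheet.Range("B2").Resize(UBound(grid, 1), 6).Value = grid
End Sub
```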

Data Scraping from Website - receiving error on different player pages

I'm trying to scrape game logs from basketball-reference.com. It worked perfectly for the two players I chose (DeMar DeRozan and LaMarcus Aldridge), but when I moved on to other players it just wouldn't scrape the data for many of them (e.g. Kevin Durant).
I have no idea why it doesn't work. For example, I tried Stephen Curry and it worked fine, but for players like Draymond Green and Kevin Durant the code would not scrape any data at all. For some reason, everything after the Date column stopped working.
Sub Data()
    Dim ieObj As InternetExplorer
    Dim htmlEle As IHTMLElement
    Dim i As Integer
    i = 1
    Set ieObj = New InternetExplorer
    ieObj.Visible = True
    ieObj.navigate "https://www.basketball-reference.com/players/d/duranke01/gamelog/2019"
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = True
    Application.Wait Now + TimeValue("00:00:05")
    For Each htmlEle In ieObj.document.getElementsByClassName("stats_table")(0).getElementsByTagName("tr")
        With ActiveSheet
            .Range("A" & i).Value = htmlEle.Children(0).textContent
            .Range("B" & i).Value = htmlEle.Children(1).textContent
            .Range("C" & i).Value = htmlEle.Children(2).textContent
            .Range("D" & i).Value = htmlEle.Children(3).textContent
        End With
        i = i + 1
        On Error Resume Next
    Next htmlEle
End Sub
The error always happens on this line:
.Range("D" & i).Value = htmlEle.Children(3).textContent
I tried skipping columns too, but it still wouldn't work.
I found no problem selecting the table by its id, which is present in the .responseText, or with using XMLHTTP instead of a browser.
I only tested with three URLs, placed in Sheet1 A1:A3:
https://www.basketball-reference.com/players/d/duranke01/gamelog/2019
https://www.basketball-reference.com/players/c/curryst01/gamelog/2019
https://www.basketball-reference.com/players/g/greendr01/gamelog/2019
With this site you can sometimes encounter tables inside HTML comments, so I strip out the comment tags before processing. This was not necessary for the links I tried.
I use the clipboard to copy and paste, but you could just use
Set hTable = html.getElementById("pgl_basic")
and then loop over the tr and td elements as you wish using getElementsByTagName.
References (VBE>Tools>References):
Microsoft HTML Object Library
Option Explicit
Public Sub GetPlayerInfo()
    Dim urls(), i As Long, html As HTMLDocument, hTable As Object
    Dim ws As Worksheet, wsCurrent As Object, clipboard As Object
    Dim lastRow As Long, playerIdentifier As String, arr() As String
    ' Late-bound MSForms DataObject, used to place the table's HTML on the clipboard
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    urls = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            .Open "GET", urls(i), False
            .send
            html.body.innerHTML = Replace$(Replace$(.responseText, "-->", vbNullString), "<!--", vbNullString) 'remove comments
            arr = Split(urls(i), "/")
            playerIdentifier = arr(5)                 ' e.g. "duranke01"
            If SheetExists(playerIdentifier) Then
                With ThisWorkbook.Worksheets(playerIdentifier).Cells
                    .ClearContents
                    .ClearFormats
                    Set wsCurrent = ThisWorkbook.Worksheets(playerIdentifier)
                End With
            Else
                Set wsCurrent = ThisWorkbook.Worksheets.Add
                wsCurrent.Name = playerIdentifier
            End If
            Set hTable = html.querySelector("#pgl_basic")
            If Not hTable Is Nothing Then             ' skip pages without the game-log table
                clipboard.SetText hTable.outerHTML
                clipboard.PutInClipboard
                wsCurrent.Range("A1").PasteSpecial
            End If
            Set wsCurrent = Nothing: Set hTable = Nothing: Erase arr: playerIdentifier = vbNullString
            Application.CutCopyMode = False
        Next
    End With
End Sub
Public Function SheetExists(ByVal sheetName As String) As Boolean '<== function by #Rory
    SheetExists = Evaluate("ISREF('" & sheetName & "'!A1)")
End Function
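The answer notes that the clipboard is optional: you can take html.getElementById("pgl_basic") and loop over its tr and td elements yourself. A sketch of that route follows. It assumes the repeated header rows inside the table carry class="thead" (worth checking in the page source) and that each row's th and td children map one-to-one onto worksheet columns. RowsToArray is a hypothetical name.

```vba
' Hypothetical helper: reads every <tr> under hTable (th and td children alike)
' into a 1-based 2D array, skipping rows whose class equals skipClass.
' Assumption: the repeated header rows inside this site's tables use class="thead".
Public Function RowsToArray(ByVal hTable As Object, Optional ByVal skipClass As String = "thead") As Variant
    Dim trs As Object, tRow As Object, result() As Variant
    Dim r As Long, c As Long, outRow As Long, maxCols As Long

    Set trs = hTable.getElementsByTagName("tr")
    For r = 0 To trs.Length - 1                   ' the widest row decides the column count
        If trs(r).Children.Length > maxCols Then maxCols = trs(r).Children.Length
    Next r
    If maxCols = 0 Then RowsToArray = Empty: Exit Function

    ReDim result(1 To trs.Length, 1 To maxCols)   ' skipped rows leave empty rows at the end
    For r = 0 To trs.Length - 1
        Set tRow = trs(r)
        If tRow.className <> skipClass Then
            outRow = outRow + 1
            For c = 0 To tRow.Children.Length - 1
                result(outRow, c + 1) = tRow.Children(c).innerText
            Next c
        End If
    Next r
    RowsToArray = result
End Function

' Usage in place of the clipboard lines of GetPlayerInfo:
'   Set hTable = html.getElementById("pgl_basic")
'   If Not hTable Is Nothing Then
'       data = RowsToArray(hTable)
'       If Not IsEmpty(data) Then wsCurrent.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
'   End If
```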
IE
Note that a proper page-load wait is used.
Option Explicit
Public Sub GetPlayerInfo()
    Dim ieObj As InternetExplorer, htmlEle As IHTMLElement
    Dim urls(), i As Long, j As Long, hTable As Object
    Dim ws As Worksheet, wsCurrent As Object
    Dim lastRow As Long, playerIdentifier As String, arr() As String
    Application.ScreenUpdating = False
    On Error GoTo errHand
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    urls = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
    Set ieObj = New InternetExplorer
    With ieObj
        .Visible = True
        For j = LBound(urls) To UBound(urls)
            .navigate urls(j)
            While .Busy Or .readyState <> 4: DoEvents: Wend   ' wait until the page has finished loading
            arr = Split(urls(j), "/")
            playerIdentifier = arr(5)
            If SheetExists(playerIdentifier) Then
                With ThisWorkbook.Worksheets(playerIdentifier).Cells
                    .ClearContents
                    .ClearFormats
                    Set wsCurrent = ThisWorkbook.Worksheets(playerIdentifier)
                End With
            Else
                Set wsCurrent = ThisWorkbook.Worksheets.Add
                wsCurrent.Name = playerIdentifier
            End If
            i = 1
            Set hTable = .document.getElementById("pgl_basic")
            If Not hTable Is Nothing Then
                For Each htmlEle In hTable.getElementsByTagName("tr")
                    ' Skip rows with fewer than four cells, where Children(3) would fail
                    If htmlEle.Children.Length >= 4 Then
                        With wsCurrent
                            .Range("A" & i).Value = htmlEle.Children(0).textContent
                            .Range("B" & i).Value = htmlEle.Children(1).textContent
                            .Range("C" & i).Value = htmlEle.Children(2).textContent
                            .Range("D" & i).Value = htmlEle.Children(3).textContent
                        End With
                        i = i + 1
                    End If
                Next htmlEle
            End If
            Set wsCurrent = Nothing: Set hTable = Nothing: Erase arr: playerIdentifier = vbNullString
        Next
    End With
errHand:
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then Debug.Print Err.Number, Err.Description
    If Not ieObj Is Nothing Then ieObj.Quit          ' close the browser if it was created
End Sub
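The IE version waits with While .Busy Or .readyState <> 4, which can spin forever if a page never finishes loading. A sketch of the same wait with a timeout; the elapsed-time helper also handles Timer resetting to zero at midnight. WaitForPage, ElapsedSeconds and the 30-second default are hypothetical choices, not part of the answer.

```vba
' Seconds between two Timer readings. Timer restarts at 0 at midnight,
' so a reading that went backwards is treated as having crossed midnight.
Public Function ElapsedSeconds(ByVal startTimer As Single, ByVal nowTimer As Single) As Single
    If nowTimer < startTimer Then nowTimer = nowTimer + 86400!
    ElapsedSeconds = nowTimer - startTimer
End Function

' Waits until the browser is idle and fully loaded (readyState 4),
' or until maxSec seconds pass. Returns True only if the page finished loading.
Public Function WaitForPage(ByVal ie As Object, Optional ByVal maxSec As Single = 30) As Boolean
    Dim t As Single
    t = Timer
    Do While ie.Busy Or ie.readyState <> 4
        If ElapsedSeconds(t, Timer) > maxSec Then Exit Function   ' timed out: returns False
        DoEvents
    Loop
    WaitForPage = True
End Function

' Usage inside the loop above, in place of the While ... Wend line:
'   If Not WaitForPage(ieObj) Then Debug.Print "Timed out: " & urls(j)
```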

Can't get Web Scraped Table to populate multiple cells VBA

I am trying to scrape a table from a website via VBA. I am able to gather all the data from the table, but I can only get it to populate a single cell rather than break the information out as formatted on the website. Essentially, the data should be in four columns and go down as many rows as there are in the table. I know I'm missing something simple here, but I just can't put my finger on it. Help please! My code is below:
Sub WebScrape()
    Dim objIE As InternetExplorer
    Dim ele As Object
    Dim y As Integer
    Set objIE = New InternetExplorer
    objIE.navigate "http://www.uscfinvestments.com/holdings/usci"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    y = 1
    For Each ele In objIE.document.getElementById("holdingsTableDiv").getElementsByTagName("table")
        Debug.Print ele.textContent
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
    Next
    ActiveWorkbook.Save
End Sub
I show a number of methods below. Personally, I prefer the last one, which uses an API.
Using clipboard:
A nice, easy way, if you want the data to appear as it does on the page, is simply to copy the table to the clipboard and paste it.
Option Explicit
Public Sub GetInfo()
    Dim ie As InternetExplorer, hTable As Object, clipboard As Object, ws As Worksheet, t As Single
    Const MAX_WAIT_SEC As Long = 10
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    Set ie = New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://www.uscfinvestments.com/holdings/usci"
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer                                  ' seconds since midnight, as a Single
        Do                                         ' poll until the table exists or MAX_WAIT_SEC passes
            DoEvents
            On Error Resume Next
            Set hTable = .document.getElementById("holdingsTableID")
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While hTable Is Nothing
        If Not hTable Is Nothing Then
            clipboard.SetText hTable.outerHTML
            clipboard.PutInClipboard
            ws.Cells(1, 1).PasteSpecial
        End If
        .Quit
    End With
End Sub
Looping table:
If you want to loop over the rows and columns of a table, you can use the row's class name and the row number to decide how to write each row out (th cells for header rows, td cells otherwise).
Option Explicit
Public Sub GetInfo()
    Dim ie As InternetExplorer, hTable As Object, ws As Worksheet, t As Single
    Const MAX_WAIT_SEC As Long = 10
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set ie = New InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "http://www.uscfinvestments.com/holdings/usci"
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer                                  ' seconds since midnight, as a Single
        Do                                         ' poll until the table exists or MAX_WAIT_SEC passes
            DoEvents
            On Error Resume Next
            Set hTable = .document.getElementById("holdingsTableID")
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While hTable Is Nothing
        If Not hTable Is Nothing Then
            WriteTable hTable, 1, ws
        End If
        .Quit
    End With
End Sub
Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
    If ws Is Nothing Then Set ws = ActiveSheet
    Dim tRow As Object, td As Object, r As Long, c As Long, th As Object
    r = startRow
    With ws
        For Each tRow In hTable.getElementsByTagName("tr")
            c = 1
            ' First row and "subHeader" rows hold th cells; all other rows hold td cells
            If r = startRow Or tRow.className = "subHeader" Then
                For Each th In tRow.getElementsByTagName("th")
                    .Cells(r, c) = th.innerText
                    c = c + 1
                Next th
            Else
                For Each td In tRow.getElementsByTagName("td")
                    .Cells(r, c).Value = td.innerText
                    c = c + 1
                Next td
            End If
            r = r + 1
        Next tRow
    End With
End Sub
API:
There is an API that serves the data in JSON format:
https://cssecure.alpsinc.com/api/v1//holding/usci
It requires authentication. I am using JsonConverter.bas to parse the returned JSON. After downloading and importing the .bas file, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Option Explicit
Public Sub GetValues()
    Dim json As Object, authorization As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.uscfinvestments.com/uscfinvestments-template/assets/javascript/api_key.php", False
        .send
        authorization = Split(.responseText, "'")(1)   ' the key is the first single-quoted string in the response
        .Open "GET", "https://cssecure.alpsinc.com/api/v1//holding/usci", False
        .setRequestHeader "Authorization", authorization
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With
    Dim arr(), headers(), item As Object, r As Long
    headers = Array("Security", "Quantity", "Price", "Market Value")
    If json.Count = 0 Then Exit Sub                     ' nothing to write
    r = 1
    ReDim arr(1 To json.Count, 1 To 4)
    For Each item In json
        arr(r, 1) = item("name")
        arr(r, 2) = item("shares")
        ' Use the contract price when present, otherwise fall back to the settlement price
        If IsNull(item("contractprice")) Then
            arr(r, 3) = item("settlementprice")
        Else
            arr(r, 3) = item("contractprice")
        End If
        arr(r, 4) = item("marketvalue")
        r = r + 1
    Next
    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(arr, 1), UBound(arr, 2)) = arr
    End With
End Sub
There is not much to add here; QHarr has already covered the main approaches. I used a hard-coded delay instead of an explicit wait to make the script less verbose.
Sub GetContent()
    Dim Html As HTMLDocument, tRow As Object, tCell As Object, C&, R&
    With New InternetExplorer
        .Visible = False
        .navigate "http://www.uscfinvestments.com/holdings/usci"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
        Application.Wait Now + TimeValue("00:00:03") 'make it 05 if it fails somehow
        For Each tRow In Html.getElementById("holdingsTableID").Rows
            For Each tCell In tRow.Cells
                C = C + 1: ThisWorkbook.Worksheets("Sheet1").Cells(R + 1, C) = tCell.innerText
            Next tCell
            C = 0: R = R + 1
        Next tRow
        .Quit                                   ' close the hidden browser once the table has been read
    End With
End Sub

Unable to convert my existing scraper to xmlhttp request

I have written a script to get the title from arbitrary websites. It works flawlessly, but it uses Internet Explorer. Since performance is a big concern, I want to do the same with an XMLHTTP request, but despite many attempts I can't get it to work. Here is what I have so far:
Sub Title_scraping()
    Dim IE As Object
    Dim doc As Object, cel As Range
    For Each cel In Range("A1:A5")
        Set IE = CreateObject("InternetExplorer.Application")
        IE.navigate cel.Value
        While IE.Busy
            DoEvents
        Wend
        Set doc = IE.document
        x = x + 1
        Cells(x, 2) = doc.title
    Next cel
End Sub
Sites I've tried it with successfully:
https://stackoverflow.com/documentation/
https://codereview.stackexchange.com/
https://yts.ag/browse-movies
Combining your code with the code posted here, this is the final code:
Sub GetData()
    Dim title As String
    Dim objHttp As Object, cel As Range, x As Long
    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    For Each cel In Range("A1:A5")
        title = ""                                   ' reset so a failed request doesn't reuse the previous page
        On Error Resume Next                         ' skip unreachable or invalid URLs
        objHttp.Open "GET", cel.Value, False
        objHttp.send ""
        If Err.Number = 0 Then title = objHttp.responseText
        On Error GoTo 0
        If InStr(1, UCase(title), "<TITLE>") > 0 Then
            title = Mid(title, InStr(1, UCase(title), "<TITLE>") + Len("<TITLE>"))
            If InStr(1, UCase(title), "</TITLE>") > 0 Then
                title = Left(title, InStr(1, UCase(title), "</TITLE>") - 1)
            Else
                title = ""                           ' no closing tag found
            End If
        Else
            title = ""
        End If
        x = x + 1
        Cells(x, 2) = title
    Next cel
End Sub
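The string search above only matches a bare <title> tag, so pages whose opening tag carries attributes (for example <title lang="en">) come back empty. A sketch of a slightly more tolerant extractor; it does not decode HTML entities such as &amp;. ExtractTitle is a hypothetical name, and you could call it as Cells(x, 2) = ExtractTitle(objHttp.responseText).

```vba
' Hypothetical helper: returns the text between the first <title ...> tag and </title>,
' tolerating attributes on the opening tag and any letter case. Returns "" if absent.
Public Function ExtractTitle(ByVal html As String) As String
    Dim startTag As Long, openEnd As Long, closeTag As Long, u As String
    u = UCase$(html)
    startTag = InStr(1, u, "<TITLE")
    Do While startTag > 0
        ' Accept <title> or <title ...>, but not tags such as <titlebar>
        Select Case Mid$(u, startTag + 6, 1)
            Case ">", " ", vbTab, vbCr, vbLf: Exit Do
        End Select
        startTag = InStr(startTag + 1, u, "<TITLE")
    Loop
    If startTag = 0 Then Exit Function
    openEnd = InStr(startTag, u, ">")               ' end of the opening tag
    If openEnd = 0 Then Exit Function
    closeTag = InStr(openEnd, u, "</TITLE>")
    If closeTag = 0 Then Exit Function
    ExtractTitle = Trim$(Mid$(html, openEnd + 1, closeTag - openEnd - 1))
End Function
```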

VBA HTML Scraping - '.innertext' from complex table

All,
I've created the following module to scrape a single value (the one-month % change in London house prices) from the address below:
https://www.hometrack.com/uk/insight/uk-cities-house-price-index/
The specific value is nested within the following code:
The VBA code below is my attempt. I feel, perhaps wrongly, that I am very close to capturing the value, but the code does not work.
Does anyone know where I am going wrong? It doesn't show an error message, but it doesn't output any value either.
Sub HousePriceData()
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim TxtRng As Range
    Dim ie As Object
    Dim V As Variant
    Dim myValue As Variant
    Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
    ie.NAVIGATE "https://www.hometrack.com/uk/insight/uk-cities-house-price-index/"
    ie.Visible = False
    While ie.ReadyState <> 4
        DoEvents
    Wend
    Set wb = ActiveWorkbook
    Set ws = wb.Sheets("Input")
    Set TxtRng = ws.Range("C15")
    Set myValue = ie.document.getElementById("cities-index-table").getElementsByTagName("tr")(7).getElementsByTagName("td")(5)
    TxtRng = myValue.innerText
End Sub
Try using XHR and primitive string parsing instead of the awkward IE automation:
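Sub Test()
    Dim strUrl As String
    Dim strTmp As String
    Dim arrTmp As Variant
    strUrl = "https://www.hometrack.com/uk/insight/uk-cities-house-price-index/"
    With CreateObject("MSXML2.XMLHttp")
        .Open "GET", strUrl, False
        .Send ""
        strTmp = .ResponseText
    End With
    ' Keep everything after the London name cell
    arrTmp = Split(strTmp, ">London</a></td>", 2)
    If UBound(arrTmp) < 1 Then Exit Sub          ' London row not found
    strTmp = arrTmp(1)
    ' Keep the text after the 6th "<td>" that follows the London cell
    arrTmp = Split(strTmp, "<td>", 7)
    If UBound(arrTmp) < 6 Then Exit Sub          ' fewer cells than expected
    strTmp = arrTmp(6)
    ' Cut it off at the cell's closing tag
    arrTmp = Split(strTmp, "</td>", 2)
    strTmp = arrTmp(0)
    ThisWorkbook.Sheets("Input").Range("C15").Value = strTmp
End Sub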
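The Split chain above can be wrapped in a small function so the anchor text and the cell offset become parameters. A sketch under the same assumption as the answer, namely that the cells are written as bare <td> tags without attributes; NthCellAfter is a hypothetical name. The answer's lookup corresponds to NthCellAfter(strTmp, ">London</a></td>", 6).

```vba
' Hypothetical generalisation of the Split chain above: returns the text of the n-th
' "<td>" (n >= 1) that follows the first occurrence of anchorText, or "" if missing.
' Like the answer, it assumes cells are written as bare "<td>" tags with no attributes.
Public Function NthCellAfter(ByVal html As String, ByVal anchorText As String, ByVal n As Long) As String
    Dim parts As Variant
    parts = Split(html, anchorText, 2)
    If UBound(parts) < 1 Then Exit Function           ' anchor not found
    parts = Split(parts(1), "<td>", n + 1)
    If UBound(parts) < n Then Exit Function           ' fewer than n cells after the anchor
    If Len(parts(n)) = 0 Then Exit Function           ' "<td>" was the last thing in the text
    NthCellAfter = Split(parts(n), "</td>", 2)(0)     ' cut at the cell's closing tag
End Function
```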
Try this in place of the Set myValue and TxtRng = myValue.innerText lines of your code (it reuses your ie and TxtRng variables):
Dim Engmt As String
Engmt = "ERRORHERE"
On Error Resume Next
Engmt = Trim(ie.document.getElementById("cities-index-table").getElementsByTagName("tr")(12).getElementsByTagName("td")(4).innerText)
On Error GoTo 0
If Engmt = "ERRORHERE" Then
    TxtRng.Value = "ERROR"
Else
    TxtRng.Value = Engmt
End If