Scraping a website using Excel VBA

I'm new to VBA and I am trying to get a price (I have tried everything I know). The macro is:
Sub Deneme()
Dim objIE As InternetExplorer
Dim Prc1 As String
Set objIE = New InternetExplorer
Dim Search_Terms() As Variant
Dim CopiedData() As Variant
Dim y As Integer
objIE.Visible = False
Search_Terms = Application.Transpose(ActiveSheet.Range("A2:A169").Value)
ReDim CopiedData(LBound(Search_Terms) To UBound(Search_Terms))
y = 2
For a = LBound(Search_Terms) To UBound(Search_Terms)
objIE.navigate "https://steamcommunity.com/market/listings/578080/" & Search_Terms(a)
Do: DoEvents: Loop Until objIE.readyState = 4
Prc1 = objIE.document.getElementsByClassName("market_commodity_orders_table")(4).getElementsByTagName("tr")(1).textContent '<----- the problem is here
ActiveSheet.Range("D" & y).Value = Prc1
y = y + 1
Next
objIE.Quit
End Sub
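The website is the Steam Community Market listing page (https://steamcommunity.com/market/listings/578080/ followed by the item name), and I am trying to get the price shown on it.
The error I usually get is: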
Run-time error '91':
Object variable or With block variable not set.
When I click Debug, this expression is highlighted:
objIE.document.getElementsByClassName("market_commodity_orders_table")(4).getElementsByTagName("tr")(1).textContent

While testing my new code, I realized that you have other issues besides the class name you were using, which does not exist on the page.
The other issue is that the document finishes loading before some of the other resources. This is likely because the site updates the price every second, so the price is not yet present in the objIE.Document object when readyState first reaches 4.
To get around this, I've added a couple of loops to wait for your object to become available. This should work for you.
Sub Deneme()
Dim objIE As InternetExplorer
Dim Prc1 As String
Set objIE = New InternetExplorer
Dim Search_Terms() As Variant
Dim CopiedData() As Variant
Dim y As Integer
Dim elemObj As Object
Dim a As Long
objIE.Visible = False
Search_Terms = Application.Transpose(ActiveSheet.Range("A2:A169").Value)
ReDim CopiedData(LBound(Search_Terms) To UBound(Search_Terms))
y = 2
For a = LBound(Search_Terms) To UBound(Search_Terms)
objIE.navigate "https://steamcommunity.com/market/listings/578080/" & Search_Terms(a)
Do: DoEvents: Loop Until objIE.readyState = 4
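Do While Prc1 = ""
DoEvents
Do While elemObj Is Nothing
DoEvents
Set elemObj = objIE.document.getElementById("market_commodity_buyrequests")
'getElementById returns Nothing until the element exists, so only drill down once it has been found
If Not elemObj Is Nothing Then Set elemObj = elemObj.getElementsByClassName("market_commodity_orders_header_promote")(1)
Loop
Prc1 = elemObj.innerText
Loop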
ActiveSheet.Range("D" & y).Value = Prc1
Set elemObj = Nothing
Prc1 = vbNullString
y = y + 1
Next
objIE.Quit
End Sub
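The two Do While loops above keep polling until the element appears, but they never give up if it never does. The following is a minimal sketch of a bounded wait. WaitForNthMatch is a hypothetical helper, not part of the answer. It assumes the page supports querySelectorAll (IE9 or later) and uses late binding, so no extra reference is needed for the function itself.

```vba
' Hypothetical helper (not part of the answer above): polls a document until
' the CSS selector has at least index + 1 matches, or until the timeout passes.
' doc can be objIE.document or any other MSHTML document object.
Public Function WaitForNthMatch(ByVal doc As Object, ByVal selector As String, _
                                ByVal index As Long, ByVal timeoutSeconds As Long) As Object
    Dim deadline As Date
    Dim matches As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        Set matches = doc.querySelectorAll(selector)
        If matches.Length > index Then
            Set WaitForNthMatch = matches.Item(index)   ' zero-based, like the (1) above
            Exit Function
        End If
        DoEvents                                      ' let the page's scripts keep running
    Loop While Now < deadline
    ' Reaching this point means the timeout expired; the function returns Nothing.
End Function
```

Inside the For loop, Set elemObj = WaitForNthMatch(objIE.document, "#market_commodity_buyrequests .market_commodity_orders_header_promote", 1, 10), followed by an If elemObj Is Nothing check, could take the place of the two Do While loops. The 10-second timeout is an arbitrary choice.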

There are two issues in your code:
There is no element with the class market_commodity_orders_table on the page.
Item indexing starts at 0, so the 4th item is item no. 3.
You can use this:
Prc1 = objIE.document.getElementsByClassName("market_commodity_orders_header_promote").Item(3).innerText
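Error 91 in the question is consistent with indexing into a collection that has no such item and then calling a method on the resulting Nothing. The following is a minimal sketch that checks the zero-based index against .Length before using it. NthByClass is a hypothetical helper name, and it assumes getElementsByClassName is available (IE9 or later).

```vba
' Hypothetical helper: zero-based access that returns Nothing instead of failing.
Public Function NthByClass(ByVal doc As Object, ByVal className As String, _
                           ByVal index As Long) As Object
    Dim matches As Object
    Set matches = doc.getElementsByClassName(className)
    ' Valid indexes run from 0 to .Length - 1, so the 4th match is index 3.
    If index >= 0 And index < matches.Length Then
        Set NthByClass = matches.Item(index)
    End If
    ' Otherwise the function returns its default value, Nothing.
End Function
```

The caller can then test the result with If Not el Is Nothing Then Prc1 = el.innerText instead of letting the macro stop on error 91.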

Let's try a slightly different approach. If you have IE9 or later, the following code should work for you. It uses .querySelector(). Give it a shot and check that it returns the price you are after.
Sub GetPrice()
Const URL As String = "https://steamcommunity.com/market/listings/578080/PLAYERUNKNOWN's%20Bandana"
Dim HTML As HTMLDocument, post As Object
With New InternetExplorer
.Visible = True
.navigate URL
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
Do: Set post = HTML.querySelector("#market_commodity_buyrequests .market_commodity_orders_header_promote:nth-of-type(2)"): DoEvents: Loop While post Is Nothing
[A1] = post.innerText
.Quit
End With
End Sub
References to add (VBE > Tools > References):
Microsoft Internet Controls
Microsoft HTML Object Library

Related

Extracting data from website to excel error

I am having difficulty setting up data extraction from a website to Excel.
I want to extract the exact price of a product into Excel.
So far I have this code:
Sub GetData()
Dim objIE As InternetExplorer 'Microsoft Internet Controls library added
Dim itemEle As Object
Dim data As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("price")
data = itemEle.getElementsByClassName("price")(0).innerText
y = y + 1
Next
data = Range("A1").Value
End Sub
What would you suggest?
Do you want every price?
For example, you can list the first two this way:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, i As Long, html As New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nay.sk/samsung-ue55nu7172", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
Dim titles As Object, prices As Object
With html
.body.innerHTML = sResponse
Set titles = .querySelectorAll(".title")
Set prices = .querySelectorAll(".price")
End With
For i = 0 To 1
Debug.Print titles(i).innerText & prices(i).innerText
Next i
End Sub
That loop prints the first two titles and prices to the Immediate window.
In fact, every element on the page with the price class is stored in the prices object.
You can view all the prices by looping the length of that object/nodeList with:
For i = 0 To prices.Length - 1
Debug.Print prices.Item(i).innerText
Next i
Likewise, you can loop over the .Length of titles, but note that it differs from the length of prices. There are more prices on the page (or rather, more elements with a price class than elements with a title class).
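Because the two NodeLists have different lengths, a loop that reads both must stop at the shorter one. The following is a minimal sketch of that idea. PairTitlesAndPrices is a hypothetical name. Pairing the i-th title with the i-th price is an assumption: as noted above, the page has more price elements than titles, so the pairs may not line up on the real page.

```vba
' Hypothetical helper: pairs the i-th .title with the i-th .price and stops at
' the shorter NodeList, so it never indexes past the end of either list.
' Pairing by position is an assumption; verify it against the real page.
Public Function PairTitlesAndPrices(ByVal html As Object) As Collection
    Dim titles As Object, prices As Object
    Dim result As New Collection
    Dim i As Long, n As Long

    Set titles = html.querySelectorAll(".title")
    Set prices = html.querySelectorAll(".price")

    n = titles.Length
    If prices.Length < n Then n = prices.Length      ' shorter of the two lists

    For i = 0 To n - 1
        result.Add titles.Item(i).innerText & " | " & prices.Item(i).innerText
    Next i

    Set PairTitlesAndPrices = result
End Function
```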
References (VBE>Tools>References):
Microsoft HTML Object Library
Try this:
Sub GetData()
Dim objIE As New InternetExplorer 'Microsoft Internet Controls library added
Dim itemEle As Object
Dim data As String
Dim y As Integer
objIE.Visible = True
objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("price")
Cells(y, 1) = itemEle.outertext
y = y + 1
Next
End Sub
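This writes the text of every element with the price class to column A, one per row.
To see which properties itemEle exposes:
set a breakpoint (F9) on the Cells(y, 1) = itemEle.outertext line
when execution stops there, select itemEle with your mouse
press Shift+F9 to open Quick Watch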

GetElementsByTagName Returning [object HTMLParagraphElement]

I have the code below, in which I'm trying to open a series of URLs and pull in the data from each one (example: http://apps.mohltc.ca/ltchomes/detail.php?id=2588&lang=en). The fields of most interest to me are those labeled "Local Health Integration Network", "Licensee" and "Licensed Beds".
For now, I'm just trying to pull in all elements with the tag name "p" and deal with the data scrubbing later. My code currently writes "[object HTMLParagraphElement]" to the cell instead of the array I'm hoping for. Can someone explain why?
Sub ImportLicenseeData()
Dim ie As Object
Dim LH As Object
Dim r As Integer
Set ie = CreateObject("InternetExplorer.Application")
For r = 4 To 10
With ie
ie.Visible = False
ie.Navigate Cells(r, "H").Value
Do While (ie.Busy Or ie.ReadyState <> 4): DoEvents: Loop
Set Doc = ie.Document
Set LH = Doc.getElementsByTagName("p")
End With
Worksheets("Sheet1").Range("J" & r).Value = LH
Next r
End Sub
Any help is appreciated.
Declare LH and the loop variable with explicit types:
Dim LH As IHTMLElementCollection
Dim htmlEle1 As IHTMLElement
This requires a reference to the Microsoft HTML Object Library. LH is a collection, not an array, so you can iterate over its elements like this:
Set LH = Doc.getElementsByTagName("p")
For Each htmlEle1 in LH
Debug.Print htmlEle1.innerText
Next htmlEle1
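To put the result into a single cell, as the question attempts with Range("J" & r).Value = LH, the text of each element has to be combined into one string first. The following is a minimal sketch. JoinTagText is a hypothetical helper name, and it uses late binding, so it does not need the reference on its own.

```vba
' Hypothetical helper: joins the innerText of every element with the given tag.
Public Function JoinTagText(ByVal doc As Object, ByVal tagName As String, _
                            ByVal delimiter As String) As String
    Dim elems As Object
    Dim parts() As String
    Dim i As Long

    Set elems = doc.getElementsByTagName(tagName)
    If elems.Length = 0 Then Exit Function          ' returns "" when nothing matches

    ReDim parts(0 To elems.Length - 1)
    For i = 0 To elems.Length - 1
        parts(i) = "" & elems.Item(i).innerText     ' "" & guards against a Null innerText
    Next i

    JoinTagText = Join(parts, delimiter)
End Function
```

In the question's loop this could be used as Worksheets("Sheet1").Range("J" & r).Value = JoinTagText(ie.Document, "p", vbLf). Keep in mind that an Excel cell holds at most 32,767 characters.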
Thanks for the help everyone. I wasn't too familiar with handling the HTML Elements, so I ended up going with a different approach. Appreciate the feedback regardless.
Adapted from http://www.ozgrid.com/forum/showthread.php?t=178150:
Sub RetrieveHTML()
Dim rngSelect As Range
Dim sURL As String
Set rngSelect = Range("H8", Range("H8").End(xlDown))
Debug.Print rngSelect.Address
Set ie = CreateObject("InternetExplorer.Application")
For Each c In rngSelect
sURL = c.Value
With ie
.Visible = False
.Navigate sURL
Do Until .ReadyState = 4
DoEvents
Loop
Do While .Busy: DoEvents: Loop
Range(c.Address).Offset(0, 1).Value = ie.Document.DocumentElement.outerHTML
End With
Next c
ie.Quit 'close the hidden Internet Explorer instance when done
End Sub

Scraping a single value from an HTML table and inserting into an Excel cell with VBA

Please see the code below. I am compiling a list of unusual currency pairings in Excel and I want to scrape this data with VBA. I only want to insert the value itself into the cell. Does anyone know where I am going wrong? I am getting "Run-time error '91': Object variable or With block variable not set". I'm relatively new to VBA and I've put a great deal of thought into this.
Sub ie_open()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim ie As Object
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE "http://www.barchart.com/quotes/forex/British_Pound/Costa_Rican_Colon/%5EGBPCRC"
ie.Visible = True
While ie.ReadyState <> 4
DoEvents
Wend
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Test Sheet")
Set TxtRng = ws.Range("A1")
TxtRng.Value = ie.document.getelementsbyname("divQuotePage").Item.innertext
End Sub
The data I am trying to scrape is the last price shown on that quote page.
Thanks.
I'm not that accomplished at web scraping, but that kind of error often means that what you are looking for isn't there. In particular, I don't see divQuotePage in the screen shot you provided.
But if you want the quote (793.19) you could do something like:
Dim V As Variant
Set V = ie.document.getelementbyid("dtaLast")
TxtRng = V.innertext
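Because error 91 here usually means the element was not found, one option is to check for Nothing before reading innerText. The following is a minimal sketch. TextByIdOrDefault is a hypothetical helper name, and the ids dtaLast and divQuotePage are taken from this thread. They may no longer exist on the live site.

```vba
' Hypothetical helper: returns the element's text, or defaultValue when
' getElementById finds nothing (instead of raising error 91).
Public Function TextByIdOrDefault(ByVal doc As Object, ByVal id As String, _
                                  Optional ByVal defaultValue As String = "") As String
    Dim el As Object
    Set el = doc.getElementById(id)
    If el Is Nothing Then
        TextByIdOrDefault = defaultValue
    Else
        TextByIdOrDefault = el.innerText
    End If
End Function
```

For example, TxtRng.Value = TextByIdOrDefault(ie.document, "dtaLast", "not found").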
This approach, which searches the page text for the "Last Price" label, should work:
Sub Test()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.Navigate "http://www.barchart.com/quotes/forex/British_Pound/Costa_Rican_Colon/%5EGBPCRC" ' should work for any URL
Do Until .ReadyState = 4: DoEvents: Loop
x = .document.body.innertext
y = InStr(1, x, "Last Price")
Z = Mid(x, y, 19)
Range("A1").Value = Trim(Z)
.Quit
End With
End Sub
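The Test macro above raises error 5 from Mid if "Last Price" is not on the page, because InStr then returns 0. The following is a minimal sketch of the same text search with that case handled. TextAfterLabel is a hypothetical name. Unlike Mid(x, y, 19), it skips the label itself, and the length to read is still a guess that depends on the page layout.

```vba
' Hypothetical helper: returns up to `length` characters that follow `label`
' in pageText, trimmed, or "" when the label does not occur at all.
Public Function TextAfterLabel(ByVal pageText As String, ByVal label As String, _
                               ByVal length As Long) As String
    Dim pos As Long
    pos = InStr(1, pageText, label, vbTextCompare)
    If pos = 0 Then Exit Function               ' label not found: return ""
    TextAfterLabel = Trim$(Mid$(pageText, pos + Len(label), length))
End Function
```

With the Test macro, this would be Range("A1").Value = TextAfterLabel(.document.body.innertext, "Last Price", 10).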
You can target that element with the CSS selector div.pricechangerow > span.last-change, which can be simplified to .last-change.
The "." means class, and you can retrieve this specific item with:
Debug.Print ie.document.querySelector(".last-change").innerText
This applies to the website as it was on 2018-06-30.

Get website data from Urls using VBA

I have multiple URLs stored in an Excel sheet. I want to get the data inside a particular div tag. For one website it works fine:
Sub Cityline()
Dim IE As Object
Set IE = CreateObject("Internetexplorer.application")
IE.Visible = True
IE.navigate "http://Someurl.com/bla/bla/bla"
Do While IE.busy
DoEvents
Loop
Do
DoEvents
Dim Doc As Object
Set Doc = IE.Document
Dim workout As String
workout = Doc.getElementsByClassName("CLASS_NAME_OF_DATA")(0).innertext
Range("A2") = workout
Loop
End Sub
I used the code below to loop through all the URLs, but it's not working:
Sub GetData()
Dim oHtm As Object: Set oHtm = CreateObject("HTMLFile")
Dim req As Object: Set req = CreateObject("msxml2.xmlhttp")
Dim oRow As Object
Dim oCell As Range
Dim url As String
Dim y As Long, x As Long
x = 1
For Each oCell In Sheets("sheet1").Range("A2:A340")
req.Open "GET", oCell.Offset(, 1).Value, False
req.send
With oHtm
.body.innerhtml = req.responsetext
With .getelementsbytagname("table")(1)
With Sheets(1)
.Cells(x, 1).Value = oCell.Offset(, -1).Value
.Cells(x, 2).Value = oCell.Value
End With
y = 3
For Each oRow In .Rows
Sheets(1).Cells(x, y).Value = oRow.Cells(1).innertext
y = y + 1
Next oRow
End With
End With
x = x + 1
Next oCell
End Sub
It's not working.
Can anyone suggest where I went wrong?
I tried the approach from "Fetching Data from multiple URLs", but it doesn't work for me.
Please guide me on how to get data from all the URLs in one run.
I'm new to SO, so apologies to the mods if this should be in comments (I couldn't get it to fit).
I agree with Silver's comments, but I thought I'd suggest a different approach that might help. If you have URLs in a column of cells, you could create a custom VBA function that will extract the relevant data out of the HTML. Just use this function in the cells to the right of your URL to return the relevant data from the HTML. An example is this:
Public Function GetHTMLData(SiteURL As String, FieldSearch As String) As String
Dim IE As Object
Dim BodyHTML As String
'Long, not Integer: InStr positions overflow an Integer once the HTML exceeds 32,767 characters
Dim FieldStart As Long
Dim FieldEnd As Long
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Navigate SiteURL
Do While .Busy Or .ReadyState <> 4
DoEvents
Loop
BodyHTML = IIf(StrComp(.Document.Title, "Cannot find server", vbTextCompare) = 0, _
vbNullString, .Document.body.innerhtml)
FieldStart = InStr(1, BodyHTML, FieldSearch) + Len(FieldSearch) + 12
FieldEnd = InStr(FieldStart, BodyHTML, "<")
GetHTMLData = Mid(BodyHTML, FieldStart, FieldEnd - FieldStart)
.Quit
End With
Set IE = Nothing
End Function
The function above takes two parameters: the URL and a string to search for within the HTML. It returns the text that starts 12 characters after the end of the search string and ends at the next '<' in the HTML.
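The string slicing in GetHTMLData raises error 5 when FieldSearch is missing or no '<' follows it. The following is a minimal sketch of the same slicing with those cases handled. ExtractAfter is a hypothetical name, and skipChars plays the role of the hard-coded 12. Keeping it separate from the download step also lets you test it on a saved HTML string.

```vba
' Hypothetical helper: same slicing rule as GetHTMLData, but returns ""
' instead of failing when the search text or the closing "<" is missing.
Public Function ExtractAfter(ByVal html As String, ByVal fieldSearch As String, _
                             ByVal skipChars As Long) As String
    Dim fieldStart As Long, fieldEnd As Long

    fieldStart = InStr(1, html, fieldSearch)
    If fieldStart = 0 Then Exit Function                  ' search text not found
    fieldStart = fieldStart + Len(fieldSearch) + skipChars
    If fieldStart > Len(html) Then Exit Function          ' nothing left to read

    fieldEnd = InStr(fieldStart, html, "<")
    If fieldEnd = 0 Then fieldEnd = Len(html) + 1         ' no "<": take the rest
    ExtractAfter = Mid$(html, fieldStart, fieldEnd - fieldStart)
End Function
```

Inside GetHTMLData, the last three assignment lines could then become GetHTMLData = ExtractAfter(BodyHTML, FieldSearch, 12).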
Hope that helps.

VBA hanging on ie.busy and readystate check

I am trying to grab some football player data from a website to fill a privately used database. I've included the entire code below. The first section is a loop that calls the second function to fill the database. I ran this code in MS Access to fill a database last summer and it worked great.
Now only a few teams get filled before the program hangs at:
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
I've searched countless websites regarding this error and tried changing the code, for example by adding a subroutine that waits a number of seconds, among other workarounds. None of them solve the issue. I've also tried running this on multiple computers.
The first computer made it through 3 teams (or three calls of the 2nd function). The second slower computer makes it through 5 teams. Both eventually hang. The 1st computer has Internet Explorer 10 and the second has IE8.
Sub Parse_NFL_RawSalaries()
Status ("Importing NFL Salary Information.")
Dim mydb As Database
Dim teamdata As DAO.Recordset
Dim i As Integer
Dim j As Double
Set mydb = CurrentDb()
Set teamdata = mydb.OpenRecordset("TEAM")
i = 1
With teamdata
Do Until .EOF
Call Parse_Team_RawSalaries(teamdata![RotoworldTeam])
.MoveNext
i = i + 1
j = i / 32
Status("Importing NFL Salary Information. " & Str(Round(j * 100, 0)) & "% done")
Loop
End With
teamdata.Close ' reset variables
Set teamdata = Nothing
Set mydb = Nothing
Status ("") 'resets the status bar
End Sub
Second function:
Function Parse_Team_RawSalaries(Team As String)
Dim mydb As Database
Dim rst As DAO.Recordset
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim TABLEelements As IHTMLElementCollection
Dim TRelements As IHTMLElementCollection
Dim TDelements As IHTMLElementCollection
Dim TABLEelement As Object
Dim TRelement As Object
Dim TDelement As HTMLTableCell
Dim c As Long
' open the table
Set mydb = CurrentDb()
Set rst = mydb.OpenRecordset("TempSalary")
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "http://www.rotoworld.com/teams/contracts/nfl/" & Team
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = IE.Document
Set TABLEelements = HTMLdoc.getElementsByTagName("Table")
For Each TABLEelement In TABLEelements
If TABLEelement.id = "cp1_tblContracts" Then
Set TRelements = TABLEelement.getElementsByTagName("TR")
For Each TRelement In TRelements
If TRelement.className <> "columnnames" Then
rst.AddNew
rst![Team] = Team
c = 0
Set TDelements = TRelement.getElementsByTagName("TD")
For Each TDelement In TDelements
Select Case c
Case 0
rst![Player] = Trim(TDelement.innerText)
Case 1
rst![position] = Trim(TDelement.innerText)
Case 2
rst![ContractTerms] = Trim(TDelement.innerText)
End Select
c = c + 1
Next TDelement
rst.Update
End If
Next TRelement
End If
Next TABLEelement
' reset variables
rst.Close
Set rst = Nothing
Set mydb = Nothing
IE.Quit
End Function
In Parse_Team_RawSalaries, instead of using the InternetExplorer.Application object, how about using MSXML2.XMLHTTP60?
So, instead of this:
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "http://www.rotoworld.com/teams/contracts/nfl/" & Team
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = IE.Document
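Maybe try using this (add references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library" in the VBA Editor first):
'These declarations replace the existing Dim IE and Dim HTMLdoc lines at the top of the function;
'keeping both would cause a "Duplicate declaration in current scope" compile error
Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
IE.Open "GET", "http://www.rotoworld.com/teams/contracts/nfl/" & Team, False
IE.send
While IE.ReadyState <> 4
DoEvents
Wend
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.htmlBody
Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
HTMLBody.innerHTML = IE.responseText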
In my experience, MSXML2.XMLHTTP60 (and WinHttp.WinHttpRequest, for that matter) performs better (faster and more reliably) than InternetExplorer.Application.
I found this post very helpful when I encountered a similar problem. Here is my solution:
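With XMLHTTP, the page is just a string, so the table parsing can be written as a separate function and checked without a browser. The following is a minimal sketch that keeps the question's rules: the table id cp1_tblContracts, the skipped header class columnnames, and the first three cells per row. ReadContractRows is a hypothetical name, and each row is returned as a 0-based array instead of being written to the recordset directly.

```vba
' Hypothetical helper: reads the first three cells of every row of the table
' with the given id, skipping header rows whose class is "columnnames".
' Each row becomes Array(player, position, contractTerms).
Public Function ReadContractRows(ByVal doc As Object, ByVal tableId As String) As Collection
    Dim result As New Collection
    Dim tbl As Object, tr As Object, tds As Object
    Dim i As Long

    Set ReadContractRows = result
    Set tbl = doc.getElementById(tableId)
    If tbl Is Nothing Then Exit Function              ' table not found: empty Collection

    For i = 0 To tbl.Rows.Length - 1
        Set tr = tbl.Rows.Item(i)
        If "" & tr.className <> "columnnames" Then
            Set tds = tr.Cells
            If tds.Length >= 3 Then
                result.Add Array(Trim$("" & tds.Item(0).innerText), _
                                 Trim$("" & tds.Item(1).innerText), _
                                 Trim$("" & tds.Item(2).innerText))
            End If
        End If
    Next i
End Function
```

After HTMLBody.innerHTML = IE.responseText, a For Each loop over ReadContractRows(HTMLDoc, "cp1_tblContracts") could fill rst![Player], rst![position] and rst![ContractTerms] from elements 0, 1 and 2 of each array.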
I used
Dim browser As SHDocVw.InternetExplorer
Set browser = New SHDocVw.InternetExplorer
and
Dim cTime As Date
cTime = Now + TimeValue("00:01:00")
Do Until (browser.readyState = 4 And Not browser.Busy)
If Now < cTime Then
DoEvents
Else
browser.Quit
Set browser = Nothing
MsgBox "Error"
Exit Sub
End If
Loop
Sometimes the page is loaded but the code keeps looping on DoEvents indefinitely. With this code, it waits for at most one minute; if the browser is still not ready, it quits the browser and exits the sub.
I know this is an old post, but I had the same problem with my code for downloading website pictures using Excel VBA automation. Some sites won't let you download an image file through a link without first opening the link in a browser. However, my code sometimes hung when objBrowser.Visible was set to False, with the following code:
Do Until (objBrowser.busy = False And objBrowser.readyState = 4)
Application.Wait (Now + TimeValue("0:00:01"))
DoEvents 'browser.readyState = 4
Loop
The simple fix was to make objBrowser visible briefly.
I fixed it with:
Dim Passes As Integer: Passes = 0
Do Until (objBrowser.busy = False And objBrowser.readyState = 4)
Passes = Passes + 1 'count loops
Application.Wait (Now + TimeValue("0:00:01"))
DoEvents
If Passes > 5 Then
'set size browser cannot set it smaller than 400
objBrowser.Width = 400 'set size
objBrowser.Height = 400
Label8.Caption = Passes 'display loop count
' position browser "you cannot move it off the screen" ready state wont change
objBrowser.Left = UserForm2.Left + UserForm2.Width
objBrowser.Top = UserForm2.Top + UserForm2.Height
objBrowser.Visible = True
DoEvents
objBrowser.Visible = False
End If
Loop
objBrowser only flashes on screen for less than a second, but it gets the job done!