Web Scraping with VBA: collection of strong elements

I would like to retrieve the following data for each player:
Purchase price (this value does not exist for every player!),
lowest price,
maximum price.
With my current code I get all three values as a single innerText string, but they are not separated. How can I read these values individually?
If the purchase price does not exist, the placeholder "not found" should be output. How can I do this without an individual id or class name?
The HTML structure is as follows:
<div class="historyPrice">
<div>
<strong>6,2 Mio</strong>
<span>Gekauft</span>
</div>
<div>
<strong>0,5 Mio</strong>
<span>Tiefstwert</span>
</div>
<div>
<strong>9,4 Mio</strong>
<span>Höchstwert</span>
</div>
</div>
It's the same structure without the purchase price (in German "Gekauft"); only the first "div" is missing.
<div class="historyPrice">
<div>
<strong>0,5 Mio</strong>
<span>Tiefstwert</span>
</div>
<div>
<strong>9,4 Mio</strong>
<span>Höchstwert</span>
</div>
</div>
My code is as follows:
Sub PlayerValues()
Dim ie As New SHDocVw.InternetExplorer
Dim HTMLdoc As MSHTML.HTMLDocument
Dim HTMLPlayerValues As MSHTML.IHTMLElementCollection
Dim HTMLPlayerValue As Object
Dim ws As Worksheet
Dim URL1 As String
Dim currentRow As Long
Dim lastRow As Long
' Assumption: the player URLs are on the active sheet
Set ws = ActiveSheet
ie.Visible = False
lastRow = ws.Cells(ws.Rows.Count, 10).End(xlUp).Row
For currentRow = 7 To lastRow
' Load the URL stored in column K of the current row
URL1 = ws.Cells(currentRow, 11).Value
ie.Navigate URL1
Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03"))
Set HTMLdoc = ie.Document
Set HTMLPlayerValues = HTMLdoc.getElementsByClassName("historyPrice")
For Each HTMLPlayerValue In HTMLPlayerValues
Debug.Print HTMLPlayerValue.innerText
Debug.Print "----------"
Next HTMLPlayerValue
Next currentRow
ie.Quit
End Sub
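One possible way to read the three values separately, sketched under assumptions: each inner div is assumed to hold exactly one strong (the value) and one span (the label), so the label text can be used to look up the matching value even though there is no id or class name. The function name PriceByLabel is hypothetical and not part of the code above, and the sketch is untested against the real page.

```vba
' Hypothetical helper: returns the <strong> text of the inner <div> whose
' <span> label equals labelText, or "not found" when no such label exists.
Function PriceByLabel(ByVal priceBlock As Object, ByVal labelText As String) As String
    Dim entry As Object

    PriceByLabel = "not found"
    For Each entry In priceBlock.getElementsByTagName("div")
        ' Skip any div that lacks either the label or the value element
        If entry.getElementsByTagName("span").Length > 0 _
           And entry.getElementsByTagName("strong").Length > 0 Then
            If Trim$(entry.getElementsByTagName("span")(0).innerText) = labelText Then
                PriceByLabel = Trim$(entry.getElementsByTagName("strong")(0).innerText)
                Exit Function
            End If
        End If
    Next entry
End Function
```

Inside the For Each loop over HTMLPlayerValues, a call such as Debug.Print PriceByLabel(HTMLPlayerValue, "Gekauft") would then give either the purchase price or the placeholder; "Tiefstwert" and "Höchstwert" can be looked up the same way.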

Related

How to update a field of dynamic Elements (Input ID & Name)

I am trying to write an amount into a field that has the elements below. I am not sure which code to use, because the input id and name are not fixed and change every time I access the page. I get no error message, but the field is not updated after I run the code.
`<div class="field-item dataValueWrite">
<table>
<tbody>
<tr>
<td nowrap="" align="left">
<div id="93497ErrorCUSTOM" class="" style="display:none;"><span class="" errid="" title="Please enter a valid amount" id="93497PegaRULESErrorFlag"></span></div>
<input id="93497" name="93497" onchange="changedInputAmountInCurrencyDisplay(this , '2' , '.' , '.' , ',' , 'false' ); " class="rightJustifyStyle" type="text" size="10" value="" data-changed="false"><script> formatCurrencyBeforeLoad(document.getElementById('93497'),'2','.', 'false');
</script><input id="93497HIDDEN" name="$PAcqCaseCreation$pMessageAmountUSD" type="hidden" value="0.00">
<script>
formatCurrencyBeforeLoad(document.getElementById('93497HIDDEN'),'2', '.' , 'false');
</script>
USD
</td>
</tr>
</tbody>
</table>
</div>`
Below is the code that I created so far:
Sub Test()
Dim IE As Object
Set IE = New InternetExplorerMedium
Dim doc As HTMLDocument
IE.navigate "intranet"
IE.Visible = True
While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set doc = IE.document
Dim i As Integer
Dim lastRow As Integer
Dim fileLink26 As String
Dim fileLink19 As String
Sheets("Case").Select
lastRow = Sheets("Case").Cells(Rows.Count, "A").End(xlUp).Row
For i = 2 To lastRow
doc.contentDocument.querySelector("[onclick='CreateAcqCase();']").Click
Application.Wait (Now + TimeValue("00:00:05"))
fileLink26 = Cells(i, 26)
fileLink19 = Cells(i, 19)
If fileLink26 = "" And fileLink19 = "CB" And Cells(i, 10).Value = "USD" Then
On Error Resume Next
Dim DispStg2 As Object
Do While DispStg2 Is Nothing
Set DispStg2 = doc.getElementsByTagName("iframe")(0).contentDocument.querySelector("input[value='FirstCB']")
DoEvents
Loop
DispStg2.Click
Application.Wait (Now + TimeValue("00:00:03"))
doc.getElementsByTagName("iframe")(0).contentDocument.querySelector("input.rightJustifyStyle").Item(2).Value = Sheets("Case").Range("k2") '>> This is where I stuck cause nothing happen after this code ran
End If
Next i
End Sub
doc.getElementsByTagName("iframe")(0).contentDocument.querySelector("input.rightJustifyStyle").Item(2).Value = Sheets("Case").Range("k2")
From your sample code above, it looks like you are trying to set the value of the second input element with the class name rightJustifyStyle inside an iframe.
You can refer to the example below.
Public Sub demo()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate2 "D:\Backup20190913\tests\354.html"
' Wait until the page has finished loading
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
' Navigate directly to the document shown inside the iframe
ie.Navigate2 ie.document.querySelector("iframe").src
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
ie.document.querySelector("input.rightJustifyStyle:nth-of-type(2)").Value = "Test value..."
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
Stop
ie.Quit
End Sub
Let me know if I misunderstood anything; I will try to correct it.
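As a side note, querySelector returns a single element rather than a collection, so .Item(2) in the question's line has nothing to index; with On Error Resume Next active, that failure is most likely skipped silently. A minimal sketch of an alternative that uses querySelectorAll with a zero-based index follows. The function name SetNthInputValue is hypothetical, a reference to the Microsoft HTML Object Library is assumed, and the code is untested against the real intranet page.

```vba
' Hypothetical helper: writes newValue into the element at the given
' zero-based index among all matches of the CSS selector.
' Returns True only if an element exists at that index.
Function SetNthInputValue(ByVal doc As MSHTML.HTMLDocument, ByVal selector As String, _
                          ByVal index As Long, ByVal newValue As String) As Boolean
    Dim matches As MSHTML.IHTMLDOMChildrenCollection

    Set matches = doc.querySelectorAll(selector)
    If index >= 0 And index < matches.Length Then
        matches.Item(index).Value = newValue
        SetNthInputValue = True
    End If
End Function
```

For the second matching input inside the iframe, the call could look like SetNthInputValue(doc.getElementsByTagName("iframe")(0).contentDocument, "input.rightJustifyStyle", 1, "12.50"). Setting .Value from code does not run the page's onchange handler by itself, so the page may still need that event to be triggered separately.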

using getElementByClassName in VBA

I am using this code to get the product name from a page.
The code of the page is
<div class="product-shop col-sm-7">
<div class="product-name">
<h1 >Claro Glass 1.5 L Rectangular Air Tight Food Container with Lid- Clear GMA0215A</h1>
</div>
My VBA code is
Public Sub GetValueFromBrowser()
Dim ie As Object
Dim name As String
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
URL = Selection.Value
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = 0
.navigate URL
While .Busy Or .readyState <> 4
DoEvents
Wend
End With
Dim Doc As HTMLDocument
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementByClassName("product-name").innerText)
ActiveCell.Offset(0, 1).Value = name
ie.Quit
Loop
End Sub
The error I am getting is
run-time error '438':
Object doesn't support this property or method
GetElementsByClassName method
You are missing an s in the name of the method getElementsByClassName.
Change this: name = Trim(Doc.getElementByClassName("product-name").innerText)
To this: name = Trim(Doc.getElementsByClassName("product-name")(0).innerText). Substitute the (0) with the index of the item you are targeting.
It is still possible to define your own function getElementByClassName.
This function returns the very first element with the given class name in the DOM document, and Nothing when no element with this class name exists in the DOM document.
Public Function getElementByClassName(doc As MSHTML.HTMLDocument, className As String) As IHTMLElement
' [class~='x'] also matches when x is one of several space-separated class names
Set getElementByClassName = doc.querySelector("[class~='" & className & "']")
End Function
Usage:
Dim elm As IHTMLElement
Set elm = getElementByClassName(doc, "product-name")
If Not elm Is Nothing Then
Debug.Print elm.innerText
End If

Get image src by class name in VBA

I am trying to get the URL of the large image from a page:
<ul id="etalage">
<li class=" product-image-thumbs" >
<img class="etalage_source_image_large" src="http://website.com/media/1200x1200/16235_1.jpg" title="" />
<img class="etalage_source_image_small" src="http://website.com/media/450x450/16235_1.jpg" title="" />
</li>
</ul>
My VBA code is
Public Sub macro1()
Dim ie As Object
Dim name As String
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
URL = Selection.Value
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = 1
.navigate URL
While .Busy Or .readyState <> 4
DoEvents
Wend
End With
Dim Doc As HTMLDocument
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementsByClassName("product-image-thumbs")(0).innerText)
ActiveCell.Offset(0, 2).Value = name
ActiveCell.Offset(0, 1).Value = "successful"
ActiveCell.Offset(1, 0).Select
ie.Quit
Loop
End Sub
My code gives a blank cell.
Also, please suggest how to run this macro faster; I have 3000 URLs to work on.
Thanks in advance
According to the comments, try to speed the code up this way (untested code). The innerText of the li element is an empty string because there is no text inside it; it contains an image element but no text. HTH
Public Sub macro1()
Dim ie As Object
Dim name As String
Dim Doc As HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 1
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
url = Selection.Value
ie.navigate url
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementsByClassName("product-image-thumbs")(0).innerText)
ActiveCell.Offset(0, 2).Value = name
ActiveCell.Offset(0, 1).Value = "successful"
ActiveCell.Offset(1, 0).Select
Loop
ie.Quit
End Sub
To get the src of all the images, try using the querySelectorAll method.
Dim img, imgs As IHTMLDOMChildrenCollection, i
Set imgs = Doc.querySelectorAll("li[class~='product-image-thumbs']>img")
For i = 0 To imgs.Length - 1
Set img = imgs.item(i)
Debug.Print img.getAttribute("src")
Next
See CSS attribute selectors.
EDIT:
If there are more img elements inside the li.product-image-thumbs element, there are several ways to get the right img.
Get the img which is the first child of the li:
"li[class~='product-image-thumbs']>img:first-child"
Get img inside of li by class name :
"li[class~='product-image-thumbs'] img[class~='etalage_source_image_small']"

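Since the question asks for the large image specifically, the same idea can be narrowed to a single element. This is a minimal sketch, assuming a reference to the Microsoft HTML Object Library and that the class name etalage_source_image_large from the posted markup is stable; the function name ImageSrcByClass is hypothetical and the code is untested against the real site.

```vba
' Hypothetical helper: returns the src attribute of the first <img> that
' carries the given class name, or an empty string when none is found.
Function ImageSrcByClass(ByVal doc As MSHTML.HTMLDocument, ByVal className As String) As String
    Dim img As MSHTML.IHTMLElement

    Set img = doc.querySelector("img[class~='" & className & "']")
    If Not img Is Nothing Then
        ImageSrcByClass = img.getAttribute("src")
    End If
End Function
```

In the macro, the innerText line could then be swapped for name = ImageSrcByClass(Doc, "etalage_source_image_large").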
VBA : Retrieve label value from HTML to MsgBox

I have following HTML code, that I want to retrieve data from:
<div class="span4">
<div>
<label for="Game_type">Portal Games</label>
XXX
</div>
<div>
<label for="Game_Reference">Game reference</label>
22130903
</div>
<div>
<label for="Release_Date">Release Date</label>
2015-07-13
</div>
<div>
<label for="Prise">Prise</label>
USD 90,00
</div>
<div>
<label for="Game_Rank">Game Rank</label>
4
</div>
</div>
How can I get all those label values, or at least one value, into a MsgBox? (Later I will put them into Excel myself.)
I have tried using following code to get first value:
Dim IE As Object
Set IE = CreateObject("INTERNETEXPLORER.APPLICATION")
'page address is stated in code
IE.navigate "page name"
IE.Visible = True
While IE.Busy
'Wait until IE is busy and loading page
DoEvents
Wend
Set gtype = IE.Document.getElementsByClassName("span4")(0).getElementsById("Game_type")
GtypeValue = gtype.Value
MsgBox (GtypeValue)
End Sub
I received run-time error '91':
Object variable or With block variable not set.
150904
Hopefully one last question regarding this topic. The default code looks like
strCont = objIE.Document.getElementsByClassName("span4")(0).innerHTML
But I want to use a variable instead of "span4", for example Dim1 = "span4".
I wrote the following:
strCont = "objIE.Document.getElementsByClassName(" & Chr(34) & Dim1 & Chr(34) & ")(0).innerHTML"
It does not work; the MsgBox shows an empty value. How can I make sure that this string is treated as code, so that it is evaluated before the later step:
Set objMatches = .Execute(strCont)
Why not try a regex for parsing?
Sub MsgGameType()
Dim objIE As Object
Dim strCont As String
Dim objMatches As Object
Dim objMatch As Object
Set objIE = CreateObject("InternetExplorer.Application")
'page address is stated in code
objIE.Navigate "page name"
objIE.Visible = True
Do While objIE.Busy Or Not objIE.readyState = 4
DoEvents
Loop
Do Until objIE.document.readyState = "complete"
DoEvents
Loop
strCont = objIE.document.getElementsByClassName("span4")(0).innerHtml
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "<div>\s*<label for="".*?"">(.*?)</label>\s*(.+?)\s*?</div>"
Set objMatches = .Execute(strCont)
For Each objMatch In objMatches
MsgBox objMatch.SubMatches(0) & " = " & objMatch.SubMatches(1)
Next
End With
End Sub
See XHTML parsing with RegExp disclaimer.
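Regarding the follow-up about using a variable instead of "span4": a string that merely contains code is not executed by VBA, so the variable can simply be passed to getElementsByClassName directly. A minimal sketch, where the function name InnerHtmlByClass is hypothetical:

```vba
' Hypothetical helper: returns the innerHTML of the first element with the
' given class name, or an empty string when no element matches.
Function InnerHtmlByClass(ByVal doc As Object, ByVal className As String) As String
    Dim matches As Object

    Set matches = doc.getElementsByClassName(className)
    If matches.Length > 0 Then
        InnerHtmlByClass = matches(0).innerHTML
    End If
End Function
```

With Dim1 = "span4", the assignment would become strCont = InnerHtmlByClass(objIE.document, Dim1), and strCont can then be handed to .Execute as before.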

Extracting a value in a class from Internet Explorer using Visual Basic

With this code, I want to retrieve the value of a specific tag from a website and put it on my spreadsheet:
Sub get_tit()
Dim wb As Object
Dim doc As Object
Dim sURL As String
Dim lastrow As Long
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
Set wb = CreateObject("internetExplorer.Application")
wb.navigate "https://www.tyre-shopper.co.uk/search/205-55-16/V/91"
Do While wb.Busy
Loop
Set doc = wb.document
Price = SelectNodes("#more-tyres > li:nth-child(4) > div > div.result-buy > form > span.tyre-price > div.tyre-price-cost.tyres-1 > strong")
Range("A5").Value = Price
End Sub
I tried to use the CSS path to select it as a node but couldn't.
I have also tried to select it by its class, but again it did not work.
This is the code from the website, from which I want to retrieve the value 57.50:
<span class="tyre-price">
Fully Fitted Price
<div class="tyre-price-cost tyres-1">
<strong>£57.50</strong>
</div><div class="tyre-price-cost tyres-2" style="display:none">
<strong>£115.00</strong>
</div><div class="tyre-price-cost tyres-3" style="display:none">
<strong>£172.50</strong>
</div><div class="tyre-price-cost tyres-4" style="display:none">
<strong>£230.00</strong>
</div><div class="tyre-price-cost tyres-5" style="display:none">
<strong>£287.50</strong>
</div>
</span>
I've never had much success with directly extracting elements via the getElementsByClassName method, but cycling through the collection returned by the getElementsByTagName method and comparing the class property seems to work at least as well.
Sub get_tit()
Dim wb As Object
Dim doc As Object
Dim sURL As String
Dim lastrow As Long
Dim iDIV As Long, sPrice As Variant, sib As Long
Dim eSIB As IHTMLElement
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
Set wb = CreateObject("internetExplorer.Application")
wb.Visible = True
wb.navigate "https://www.tyre-shopper.co.uk/search/205-55-16/V/91"
Do While wb.Busy Or wb.readyState <> 4
DoEvents
Loop
Set doc = wb.document
With doc.body
sPrice = Array(0, 0, 0, 0)
For iDIV = 0 To .getElementsByTagName("div").Length - 1
With .getElementsByTagName("div")(iDIV)
Select Case .className
Case "tyre-price-cost tyres-1"
sPrice(0) = .innerText
Case "tyre-price-cost tyres-2"
sPrice(1) = .innerText
Case "tyre-price-cost tyres-3"
sPrice(2) = .innerText
Case "tyre-price-cost tyres-4"
sPrice(3) = .innerText
With Sheet1
.Cells(Rows.Count, "A").End(xlUp).Offset(1, 0) = sPrice(0)
.Cells(Rows.Count, "A").End(xlUp).Offset(0, 1) = sPrice(1)
.Cells(Rows.Count, "A").End(xlUp).Offset(0, 2) = sPrice(2)
.Cells(Rows.Count, "A").End(xlUp).Offset(0, 3) = sPrice(3)
End With
sPrice = Array(0, 0, 0, 0)
Case Else
End Select
End With
Next iDIV
End With
End Sub
        
FWIW, I believe you would be better served by an IXMLHTTPRequest-based scrape than by one using the InternetExplorer object.
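A rough sketch of what such a request-based scrape might look like is given below. It assumes a reference to the Microsoft HTML Object Library, that the price is present in the static HTML returned by the server (not injected later by script), and that the class names from the posted markup are unchanged. The function name FetchFirstPrice is hypothetical and the code is untested against the live site.

```vba
' Hypothetical helper: downloads the page without a browser window and
' returns the text of the first "tyres-1" price, or "" if none is found.
Function FetchFirstPrice(ByVal url As String) As String
    Dim http As Object
    Dim html As New MSHTML.HTMLDocument
    Dim prices As MSHTML.IHTMLDOMChildrenCollection

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False          ' False = synchronous request
    http.send
    If http.Status <> 200 Then Exit Function

    ' Parse the response text into an in-memory document
    html.body.innerHTML = http.responseText
    Set prices = html.querySelectorAll("div.tyre-price-cost.tyres-1 strong")
    If prices.Length > 0 Then
        FetchFirstPrice = prices.Item(0).innerText
    End If
End Function
```

A call such as Range("A5").Value = FetchFirstPrice("https://www.tyre-shopper.co.uk/search/205-55-16/V/91") would replace the browser automation entirely, which tends to matter when many URLs have to be processed.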