using getElementByClassName in VBA

I am using this code to get the product name from a page.
The HTML of the page is:
<div class="product-shop col-sm-7">
<div class="product-name">
<h1 >Claro Glass 1.5 L Rectangular Air Tight Food Container with Lid- Clear GMA0215A</h1>
</div>
My VBA code is:
Public Sub GetValueFromBrowser()
Dim ie As Object
Dim name As String
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
URL = Selection.Value
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = 0
.navigate URL
While .Busy Or .readyState <> 4
DoEvents
Wend
End With
Dim Doc As HTMLDocument
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementByClassName("product-name").innerText)
ActiveCell.Offset(0, 1).Value = name
ie.Quit
Loop
End Sub
The error I am getting is:
run-time error '438':
Object doesn't support this property or method

GetElementsByClassName method
You are missing an s in the method name: it is getElementsByClassName, not getElementByClassName.
Change this: name = Trim(Doc.getElementByClassName("product-name").innerText)
To this: name = Trim(Doc.getElementsByClassName("product-name")(0).innerText). The method returns a collection, so substitute the index of the item you are targeting for (0).
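Because getElementsByClassName returns a collection, indexing (0) into an empty result leads to a run-time error when .innerText is read. A minimal sketch of a guard, assuming Doc is the loaded HTMLDocument from the question and that the project references the Microsoft HTML Object Library; FirstTextByClass is a hypothetical helper name, not part of any library:

```vba
' Hypothetical helper: returns the trimmed text of the first element with the
' given class, or an empty string when no element matches.
' Assumes a reference to the Microsoft HTML Object Library.
Public Function FirstTextByClass(ByVal Doc As HTMLDocument, ByVal className As String) As String
    Dim matches As IHTMLElementCollection
    Set matches = Doc.getElementsByClassName(className)
    ' Only index into the collection when it actually contains something
    If matches.Length > 0 Then
        FirstTextByClass = Trim$(matches(0).innerText)
    End If
End Function
```

In the question's loop this could be called as name = FirstTextByClass(Doc, "product-name"), leaving name empty for pages that lack the element.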

It is still possible to define your own getElementByClassName function.
This function returns the first element in the DOM document with the given class name, and Nothing when no such element exists. Note that the attribute selector used here matches only elements whose class attribute is exactly the given name.
Public Function getElementByClassName(doc As MSHTML.HTMLDocument, className As String) As IHTMLElement
Set getElementByClassName = doc.querySelector("[class='" & className & "']")
End Function
Usage:
Dim elm As IHTMLElement
Set elm = getElementByClassName(doc, "product-name")
If Not elm Is Nothing Then
Debug.Print elm.innerText
End If

Related

I'm having trouble scraping this

I'm trying to understand why my references aren't working well to scrape this data.
Here is the site as an example:
http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX&region=USA&culture=en-CA&ops=clear
And as a target:
<div id="data_i6" class="rf_crow"><div id="Y_1" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="741131269">741</div><div id="Y_2" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="836611464">837</div><div id="Y_3" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="939841654">940</div><div id="Y_4" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1010472512">1,010</div><div id="Y_5" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1100344312">1,100</div><div id="Y_6" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1115401551">1,115</div></div>
What I need to extract is the actual value in rawvalue="741131269". The following is what I've gotten to work so far:
'Cells(1, 1) = Document.getElementsByClassName("rf_crow") 'returns the rows of data into one cell
'Cells(1, 1) = Document.getElementById("Y_1").innerText 'returns the text for the year
'Cells(1, 1) = Document.getElementById("data_i1").innerText 'returns the first row of data
I know the above doesn't return what I want; the comments say what each line extracts into Excel. Accessing the sub-element doesn't seem to work as it does in other macros I've built. I thought something like this would work:
Cells(1, 1) = Document.getElementById("Y_1").getAttribute("rawvalue")
but that doesn't work. I also tried:
Cells(1, 1) = Document.getElementById("data_i6").getElementById("Y_1").innertext
and that doesn't work either.

The solution is straightforward: read the element's rawvalue attribute.
Here are two ways to do it.
Using a hardcoded delay and a For Each loop to find the desired element:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object, elem As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX&region=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
''using hardcoded delay
Application.Wait Now + TimeValue("00:00:05")
For Each elem In HTML.getElementsByTagName("div")
If elem.innerText = "741" Then MsgBox elem.getAttribute("rawvalue"): Exit For
Next elem
End Sub
Using an explicit wait:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX&region=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
Do: Set post = HTML.querySelector("#data_i6 #Y_1"): DoEvents: Loop While post Is Nothing
MsgBox post.getAttribute("rawvalue")
End Sub
Output at this moment:
741131269
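The same getAttribute call can be applied to every year cell in the row rather than only Y_1. A minimal sketch, assuming HTML is the loaded HTMLDocument from the answer above and that the document mode in use supports attribute selectors; PrintRowRawValues is a hypothetical name:

```vba
' Hypothetical helper: print the rawvalue attribute of every cell in the data_i6 row.
' Assumes HTML is an already loaded HTMLDocument (Microsoft HTML Object Library).
Public Sub PrintRowRawValues(ByVal HTML As HTMLDocument)
    Dim nodes As Object, i As Long
    ' Select only the div elements inside #data_i6 that carry a rawvalue attribute
    Set nodes = HTML.querySelectorAll("#data_i6 div[rawvalue]")
    ' Index-based loop; the node list is zero-based
    For i = 0 To nodes.Length - 1
        Debug.Print nodes.item(i).getAttribute("rawvalue")
    Next i
End Sub
```

An index-based loop is used here instead of For Each as a cautious choice for node lists returned by querySelectorAll.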

The following should illuminate some of the problems you were having.
.querySelectorAll
The exact element you mention is at index 2 of the node list returned by the .querySelectorAll method of .document with the CSS selector #Y_1. The # means Id.
In the results returned from that webpage, the string you want is at index 2.
querySelectorAll with Id? Isn't Id a unique identifier for a single element?
This Id, unexpectedly, is not unique to a single element on the page. It occurs a whopping 27 times.
This means you can use the .querySelectorAll method to return a nodeList of all matching items and take the item at index 2 to get your result.
Note:
If you want the long number next to rawvalue, 741131269, then parse the outerHTML of the returned element.
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.querySelector
Alternatively, you can target the row's Id, data_i6, which is specific to that row, with
.document.querySelector("#data_i6")
This CSS selector (#data_i6) returns the entire row as it has each year within. If using .querySelector you will only get the first item anyway which is year 1.
You can be more specific with the CSS selector and add the additional year Id to get just the year of interest:
#data_i6 #Y_1
Code: (querySelector method commented out next to querySelectorAll)
Option Explicit
Public Sub Get_Information()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX&region=USA&culture=en-CA&ops=clear"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date
exitTime = Now + TimeSerial(0, 0, 2)
Do
DoEvents
On Error Resume Next
Set a = .document.querySelectorAll("#Y_1") ' .document.querySelector("#data_i6 #Y_1")
On Error GoTo 0
If Now > exitTime Then Exit Do
Loop While a Is Nothing
If a Is Nothing Then Exit Sub
Debug.Print Split(Split(a.item(2).innerText, "rawvalue=")(0), ">")(0) 'Split(Split(a.innerText, "rawvalue=")(0), ">")(0)
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString) 'Replace(Split(Split(a.outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.Quit
End With
End Sub

Try declaring objCollection as Object and strValue as String. In the first line of the code below, replace YourHTTPRequest with the name of the HTTP request object you declared:
Document.body.innerHTML = YourHTTPRequest.responseText
Set objCollection = Document.getElementsByClassName("rf_crow")
For Each objElement In objCollection
If objElement.ID = "Y_1" Then
strValue = objElement.getAttribute("rawvalue")
Exit For
End If
Next
Cells(1, 1) = strValue

Does this work for you?
Sub web_table_option_two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX&region=USA&culture=en-CA&ops=clear"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for JavaScript to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub

GetElementsByTagName Returning [object HTMLParagraphElement]

I have the below code, wherein I'm trying to open a series of urls and pull in the data from each url (example: http://apps.mohltc.ca/ltchomes/detail.php?id=2588&lang=en). Of most interest to me would be those labeled as "Local Health Integration Network", "Licensee" and "Licensed Beds".
As it stands, I'm trying to just pull in all elements with tag name "p" and deal with the data scrubbing later on. My code currently pulls in "[object HTML Paragraph Element]" instead of the array that I'm hoping for. Can someone explain why this is?
Sub ImportLicenseeData()
Dim ie As Object
Dim LH As Object
Dim r As Integer
Set ie = CreateObject("InternetExplorer.Application")
For r = 4 To 10
With ie
ie.Visible = False
ie.Navigate Cells(r, "H").Value
Do While (ie.Busy Or ie.ReadyState <> 4): DoEvents: Loop
Set Doc = ie.Document
Set LH = Doc.getElementsByTagName("p")
End With
Worksheets("Sheet1").Range("J" & r).Value = LH
Next r
End Sub
Any help is appreciated.

Dim LH As IHTMLElementCollection
Dim htmlEle1 as IHTMLElement
These declarations require a reference to the Microsoft HTML Object Library. You can then work with the elements of the LH collection (it is a collection, not an array) like this:
Set LH = Doc.getElementsByTagName("p")
For Each htmlEle1 in LH
Debug.Print htmlEle1.innerText
Next htmlEle1
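To put the paragraph text into the worksheet instead of the Immediate window, the loop can build one string per page. A minimal sketch, assuming Doc is the loaded document and target is the cell to fill (for example Worksheets("Sheet1").Range("J" & r) from the question); WriteParagraphs is a hypothetical name:

```vba
' Hypothetical helper: join the text of every <p> element and write it to one cell.
' Assumes a reference to the Microsoft HTML Object Library.
Public Sub WriteParagraphs(ByVal Doc As HTMLDocument, ByVal target As Range)
    Dim LH As IHTMLElementCollection
    Dim htmlEle1 As IHTMLElement
    Dim allText As String
    Set LH = Doc.getElementsByTagName("p")
    For Each htmlEle1 In LH
        ' One paragraph per line inside the cell
        allText = allText & htmlEle1.innerText & vbLf
    Next htmlEle1
    target.Value = allText
End Sub
```

Keep in mind that a single Excel cell holds at most 32,767 characters, so very long pages may need one cell per paragraph instead.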

Thanks for the help everyone. I wasn't too familiar with handling the HTML Elements, so I ended up going with a different approach. Appreciate the feedback regardless.
via http://www.ozgrid.com/forum/showthread.php?t=178150
Sub RetrieveHTML()
Dim rngSelect As Range
Dim sURL As String
Set rngSelect = Range("H8", Range("H8").End(xlDown))
Debug.Print rngSelect.Address
Set ie = CreateObject("InternetExplorer.Application")
For Each c In rngSelect
sURL = c.Value
With ie
.Visible = False
.Navigate sURL
Do Until .ReadyState = 4
DoEvents
Loop
Do While .Busy: DoEvents: Loop
Range(c.Address).Offset(0, 1).Value = ie.Document.DocumentElement.outerHTML
End With
Next c
End Sub

Use VBA to list all URL address of a web page

I used the code below to load the website http://www.flashscore.com/soccer/england/premier-league/results/.
After the code finds and clicks the "Show more matches" link, all the football matches are loaded in the browser.
However, the code returns only the first half of the matches: the events shown before the "Show more matches" link was clicked.
My question is: how can I list the URL addresses of all the events?
Sub Test_Flashscore()
Dim URL As String
Dim ie As New InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim dictObj As Object: Set dictObj = CreateObject("Scripting.Dictionary")
Dim tRowID As String
URL = "http://www.flashscore.com/soccer/england/premier-league/results/"
With ie
.navigate URL
.Visible = True
Do Until .readyState = READYSTATE_COMPLETE: DoEvents: Loop
Set HTMLdoc = .document
End With
For Each objLink In ie.document.getElementsByTagName("a")
If Left(objLink.innerText, 4) = "Show" Or Left(objLink.innerText, 4) = "Arat" Then
MsgBox "The link was founded!"
objLink.Click
Exit For
End If
Next objLink
With HTMLdoc
Set tblSet = .getElementById("fs-results")
Set mTbl = tblSet.getElementsByTagName("tbody")(0)
Set tRows = mTbl.getElementsByTagName("tr")
With dictObj
'If if value is not yet in dictionary, store it.
For Each tRow In tRows
'Remove the first four (4) characters.
tRowID = Mid(tRow.ID, 5)
If Not .Exists(tRowID) Then
.add tRowID, Empty
End If
Next tRow
End With
End With
i = 14
For Each Key In dictObj
ActiveSheet.Cells(i, 2) = "http://www.flashscore.com/" & Key & "/#match-summary"
i = i + 1
Next Key
Set ie = Nothing
MsgBox "Process Completed"
End Sub

You need to wait a little while for the rest of the content to load - clicking the link fires off a GET request to the server, so that needs to return content and the content needs to be rendered on the page before you can grab it.
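One way to wait without a fixed delay is to poll the document until more rows appear, with a timeout as a safety net. A minimal sketch, assuming the extra matches arrive as additional tr elements (as the question's own code suggests); WaitForMoreRows and its parameters are hypothetical names:

```vba
' Hypothetical helper: after clicking "Show more matches", poll until the number
' of <tr> elements exceeds rowsBefore or the timeout passes.
' Returns True when more rows appeared, False on timeout.
Public Function WaitForMoreRows(ByVal doc As Object, ByVal rowsBefore As Long, _
                                Optional ByVal timeoutSeconds As Long = 10) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While Now < deadline
        DoEvents ' let the browser process the response
        If doc.getElementsByTagName("tr").Length > rowsBefore Then
            WaitForMoreRows = True
            Exit Function
        End If
    Loop
End Function
```

In the question's code, this would mean storing HTMLdoc.getElementsByTagName("tr").Length before objLink.Click and calling WaitForMoreRows(HTMLdoc, thatCount) right after the click, before reading the table.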

Clicking on that link takes you to the fixtures page. You can replace everything before the dictionary code with
.navigate "https://www.flashscore.com/football/england/premier-league/fixtures/"
That is:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.flashscore.com/football/england/premier-league/fixtures/"
While .Busy Or .readyState < 4: DoEvents: Wend
'other code...using dictionary
'.Quit
End With
End Sub

Get image src by class name in VBA

I am trying to get the URL of the large image from a page:
<ul id="etalage">
<li class=" product-image-thumbs" >
<img class="etalage_source_image_large" src="http://website.com/media/1200x1200/16235_1.jpg" title="" />
<img class="etalage_source_image_small" src="http://website.com/media/450x450/16235_1.jpg" title="" />
</li>
</ul>
My VBA code is:
Public Sub macro1()
Dim ie As Object
Dim name As String
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
URL = Selection.Value
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = 1
.navigate URL
While .Busy Or .readyState <> 4
DoEvents
Wend
End With
Dim Doc As HTMLDocument
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementsByClassName("product-image-thumbs")(0).innerText)
ActiveCell.Offset(0, 2).Value = name
ActiveCell.Offset(0, 1).Value = "successful"
ActiveCell.Offset(1, 0).Select
ie.Quit
Loop
End Sub
My code gives a blank cell.
Also, please suggest how to make this macro run faster; I have 3000 URLs to work on.
Thanks in advance.

According to the comments, try to speed the code up this way (untested code): create the Internet Explorer instance once, outside the loop, instead of once per URL. The innerText of the li element is an empty string because the element contains an image but no text. HTH
Public Sub macro1()
Dim ie As Object
Dim name As String
Dim Doc As HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 1
Do Until IsEmpty(ActiveCell)
ActiveCell.Offset(0, 1).Value = "RUNNING"
url = Selection.Value
ie.navigate url
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set Doc = ie.document
ActiveCell.Offset(0, 1).Value = "ERROR"
name = Trim(Doc.getElementsByClassName("product-image-thumbs")(0).innerText)
ActiveCell.Offset(0, 2).Value = name
ActiveCell.Offset(0, 1).Value = "successful"
ActiveCell.Offset(1, 0).Select
Loop
ie.Quit
End Sub
To get the src of all the images, try the querySelectorAll method.
Dim img, imgs As IHTMLDOMChildrenCollection, i
Set imgs = Doc.querySelectorAll("li[class~='product-image-thumbs']>img")
For i = 0 To imgs.Length - 1
Set img = imgs.item(i)
Debug.Print img.getAttribute("src")
Next
See CSS attribute selectors.
EDIT:
If there are more img elements inside the li.product-image-thumbs element, there are further ways to select the right img.
Get the img that is placed immediately after the li (the + combinator matches an adjacent sibling, not a nested element):
"li[class~='product-image-thumbs']+img"
Get the img inside the li by class name:
"li[class~='product-image-thumbs'] img[class~='etalage_source_image_small']"
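For the large image the question asks about, a class selector on the img element is enough. A minimal sketch, assuming Doc is the loaded HTMLDocument and the page uses the class name shown in the question's HTML; LargeImageSrc is a hypothetical name:

```vba
' Hypothetical helper: return the src of the first img with class
' etalage_source_image_large, or an empty string when none is found.
' Assumes a reference to the Microsoft HTML Object Library.
Public Function LargeImageSrc(ByVal Doc As HTMLDocument) As String
    Dim img As Object
    Set img = Doc.querySelector("img.etalage_source_image_large")
    ' querySelector returns Nothing when there is no match
    If Not img Is Nothing Then
        LargeImageSrc = img.getAttribute("src")
    End If
End Function
```

In the question's loop, ActiveCell.Offset(0, 2).Value = LargeImageSrc(Doc) would then replace the innerText line.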

Excel VBA Macro: Scraping data from site table that spans multiple pages

Thanks in advance for the help. I'm running Windows 8.1, I have the latest IE / Chrome browsers, and the latest Excel. I'm trying to write an Excel macro that pulls data from StackOverflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date (that the macro is run), the tag names, the # of tags, and the brief description of what the tag is. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now, it overwrites the data every time I run the macro, and I'm not sure how to make it look for the next empty cell before running. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. Problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub

There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The original answer linked to a permalink for that query. The "Download CSV" option, which appears after the query runs, is probably the easiest way to get the data into Excel. If you want to automate that part, the original answer also linked directly to the CSV download of the results.

You can improve this to parse out exact elements, but as written it loops over all the pages and grabs all the tag info (everything next to a tag):
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
DoEvents
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
html.querySelector(".page-numbers.next").Click
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub
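To append on each run instead of overwriting, the starting row can be taken from the last used cell in column A. A minimal sketch using only standard Excel object model calls; NextEmptyRow is a hypothetical helper name:

```vba
' Hypothetical helper: first row below the last used cell in column A of ws.
Public Function NextEmptyRow(ByVal ws As Worksheet) As Long
    Dim lastRow As Long
    ' Jump up from the bottom of column A to the last non-empty cell
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow = 1 And IsEmpty(ws.Cells(1, 1).Value) Then
        NextEmptyRow = 1 ' column A is completely empty
    Else
        NextEmptyRow = lastRow + 1
    End If
End Function
```

In the code above, setting counter = NextEmptyRow(ActiveSheet) - 1 before the page loop would make each run continue below the existing data.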

I'm not making use of the DOM here; I find it very easy to get by just searching between known tags. If the expressions you are looking for are too common, tweak the code a bit so that it looks for one string after another.
An example:
Public Sub ZipLookUp()
Dim URL As String, xmlHTTP As Object, html As Object, htmlResponse As String
Dim SStr As String, EStr As String, EndS As Long, StartS As Long 'Long, since positions in a long response can exceed the Integer range
Dim Zip4Digit As String
URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
xmlHTTP.Open "GET", URL, False
On Error GoTo NoConnect
xmlHTTP.send
On Error GoTo 0
Set html = CreateObject("htmlfile")
htmlResponse = xmlHTTP.ResponseText
If Len(htmlResponse) = 0 Then 'a String is never Null, so test for an empty response instead
MsgBox ("Aborted Run - HTML response was empty")
Application.ScreenUpdating = True
GoTo End_Prog
End If
'Searching for a string within 2 strings
SStr = "<span class=""address1 range"">" ' first string
EStr = "</span><br />" ' second string
StartS = InStr(1, htmlResponse, SStr, vbTextCompare) + Len(SStr)
EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
MsgBox Zip4Digit
GoTo End_Prog
NoConnect:
If Err = -2147467259 Or Err = -2146697211 Then MsgBox "Error - No Connection": GoTo End_Prog 'MsgBox Err & ": " & Error(Err)
End_Prog:
End Sub
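The search-between-two-strings step can be pulled out into a small reusable function. A minimal sketch of that idea; TextBetween is a hypothetical name, and unlike the code above it returns an empty string when either marker is missing:

```vba
' Hypothetical helper: text between the first startMarker and the next endMarker.
' Returns an empty string when either marker is not found.
Public Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                            ByVal endMarker As String) As String
    Dim startPos As Long, endPos As Long
    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function
    ' Move past the start marker itself
    startPos = startPos + Len(startMarker)
    endPos = InStr(startPos, source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function
    TextBetween = Mid$(source, startPos, endPos - startPos)
End Function
```

With this helper, the lookup above could be written as Zip4Digit = Left(TextBetween(htmlResponse, SStr, EStr), 4).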
