VBA: Finding nested elements in HTML Document

I am trying to scrape all the href links of products from this link.
I am using the following code to get the product links on the page:
Sub urlCatch()
Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link, itm As Object
Dim url As String
Dim X As Variant
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do Until internet.ReadyState >= 4
DoEvents
Loop
Set internetdata = internet.document
Set div_result = internetdata.getelementsbyclassname("categorySubNavigation").getelementsbyclassname("c4 seriesOptions")
Set header_links = div_result.getelementsbytagname("a")
For Each itm In header_links
Set link = itm.ChildNodes.Item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next
End Sub
I am getting an error at div_result: "object doesn't support this property or method".
This is the first time I am using an element to retrieve something, so I may be making mistakes that I am unaware of.
Please look into my code, and let me know what blunder I am making here.
I am told by one of my friends that I might need to use a regex to get all these links, but I wanted to get a hold of this method first.
Please give me some guidance. Thanks.

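getElementsByTagName and the other getElementsBy... methods work on a single element (or on the document), not on a collection. getElementsByClassName itself returns a collection, so you cannot chain another getElementsBy... call directly onto its result. You need another loop.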
Sub urlCatch()
Dim url As String
Dim internet As Object
Dim internetdata
Dim div_result
Dim links
Dim itm
Dim itm2
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do
DoEvents
Loop Until internet.ReadyState >= 4 And Not internet.busy
Set internetdata = internet.document.body
Set div_result = internetdata.getelementsbyclassname("c4 seriesOptions")
For Each itm In div_result
Set links = itm.getElementsByTagName("A")
For Each itm2 In links
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = itm2.href
Next
Next
End Sub
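If only one block is of interest, indexing a single element out of the collection is an alternative to the outer loop. The following is a minimal sketch of that idea, not a tested drop-in: LinksInFirstElement is a hypothetical helper name, and it assumes you pass it a collection such as the div_result variable from the code above.

```vba
' Hypothetical helper: returns the hrefs of all <a> elements nested inside
' the FIRST element of an element collection (e.g. div_result above).
Function LinksInFirstElement(ByVal elements As Object) As Collection
    Dim result As New Collection
    Dim anchors As Object
    Dim i As Long

    If elements.Length > 0 Then
        ' elements(0) is a single element, so it supports getElementsByTagName
        Set anchors = elements(0).getElementsByTagName("a")
        For i = 0 To anchors.Length - 1
            result.Add anchors(i).href
        Next i
    End If

    Set LinksInFirstElement = result
End Function
```

Called as LinksInFirstElement(div_result), it returns an empty Collection when nothing matched, so the caller can check .Count before writing to the sheet.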

Related

Add not replace content in Word Content Control using VBA

I am trying to generate multiple Word documents which have content controls that are populated from an Excel file. The second content control needs to be populated with a list which varies in length.
How do I add each value to the content control instead of replacing the current value? I am currently using Rich Text Content Controls.
Here is what I have so far:
Sub CreateCoverLetters()
Dim objWord As Word.Application
Dim wDoc As Word.Document
Dim Rows As Integer
Set objWord = CreateObject(Class:="Word.Application")
objWord.Visible = True
Set wDoc = objWord.Documents.Open(*insert filepath*)
objWord.Activate
wDoc.ContentControls(1).Range.Text = Worksheets("Lists").Range("A2").Value
Rows = Worksheets("Lists").Range("A3", Range("A3").End(xlDown)).Rows.Count
r = 3
For i = 1 To Rows
wDoc.ContentControls(2).Range.Text = Worksheets("Lists").Cells(r, 1).Value
r = r + 1
Next
wDoc.SaveAs (*insert filepath*)
End Sub
Any help much appreciated!
Solved it as follows:
Sub CreateCoverLetters()
Dim objWord As Word.Application
Dim wDoc As Word.Document
Dim Rows As Integer
Dim Content As String
Set objWord = CreateObject(Class:="Word.Application")
objWord.Visible = True
Set wDoc = objWord.Documents.Open(*insert filepath*)
objWord.Activate
wDoc.ContentControls(1).Range.Text = Worksheets("Lists").Range("A2").Value
Rows = Worksheets("Lists").Range("A3", Range("A3").End(xlDown)).Rows.Count
r = 3
For i = 1 To Rows
Content = Content & "- " & Worksheets("Lists").Cells(r, 1).Value & vbNewLine
r = r + 1
Next
wDoc.ContentControls(2).Range.Text = Content
wDoc.SaveAs (*insert filepath*)
End Sub
The approach in the user's answer works if 1) the content can be concatenated into a single string and 2) none of the elements requires special formatting. This would also be the fastest approach.
If for any reason this process is not possible, then the way to "append" content without replacing goes something like in the code snippet that follows.
Notice how Range and ContentControl objects are declared and instantiated, especially the Range object. This makes it much easier to pick up the "target" at a later point in the code. Also, a Range object can be collapsed (think of it like pressing the right-arrow to make a selection a blinking cursor): this makes it possible to append content and work with that new content (format it, for example). Word also has a Range.InsertAfter method which can be used if the new content does not have to be manipulated in any special way.
Dim cc As Object ' Word.ContentControl
Dim rngCC As Object ' Word.Range
Set cc = wDoc.ContentControls(1) 'the control itself, not its Range
Set rngCC = cc.Range
rngCC.Text = Worksheets("Lists").Range("A2").Value
'Add something at a later point
rngCC.Collapse wdCollapseEnd
rngCC.Text = " New material at the end of the content control."

Excel VBA Web Scraping

I'm new to Excel VBA programming and need some help to debug my code for web scraping some data off of Amazon.
On my Sheet1, I am listing the ASINs of products and trying to use this code to complete the URL to land on the product page, then get the sales rank of the product via class, ID or another HTML tag.
However, I keep getting the error:
User-Defined Type not Defined
I can't figure out where the bug is.
The code is listed below:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
Dim IE As New InternetExplorer
IE.Visible = True
IE.Navigate "https://www.amazon.com/dp/" & Range("azAsin").Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim pap As HTMLDocument
Set pap = IE.document
Dim sDoc As String
sDoc = Doc.getElementsById("SalesRank")
MsgBox sDoc
End If
End Sub
Try this:
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
'...
Dim objRank as Object
Set objRank = ie.Document.getElementById("SalesRank")
Dim rank As String
If Not objRank Is Nothing Then rank = objRank.innerText
Here is a fully latebound version that also removes the need for a browser.
Observations on your code:
The other answer has corrected, but not mentioned, that in your existing script you attempt:
Dim sDoc As String
sDoc = Doc.getElementsById("SalesRank")
getElementsById is invalid: Id should be singular, and the syntax is getElementById. Note also that Doc is never declared; the document variable in your code is pap. The method returns an object, not a string, so you would need ie.Document.getElementById("SalesRank").innerText to get a string; though it is better to set it to an object first and test If Not obj Is Nothing Then, as indeed the former answerer has done.
Code:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
Dim html As Object, i As Long, URL As String
URL = "https://www.amazon.com/dp/" & Range("azAsin").Value
If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
With CreateObject("MSXML2.serverXMLHTTP")
.Open "GET", URL, False
.send
Set html = CreateObject("HTMLFile")
html.body.innerHTML = .responseText
End With
For i = 0 To html.all.Length - 1
If html.all(i).ID = "SalesRank" Then MsgBox html.all(i).innerText
Next i
End If
End Sub
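As a variation on the loop over html.all, the lookup can also be wrapped in a small function that calls getElementById on the late-bound document. This is only a sketch: InnerTextById is a hypothetical helper name, and it assumes the HTMLFile object exposes getElementById on your machine.

```vba
' Hypothetical helper: parses markup with a late-bound HTMLFile document and
' returns the innerText of the element with the given id ("" if not found).
Function InnerTextById(ByVal markup As String, ByVal elementId As String) As String
    Dim html As Object
    Dim node As Object

    Set html = CreateObject("HTMLFile")
    html.body.innerHTML = markup

    Set node = html.getElementById(elementId)
    ' Test for Nothing before reading innerText to avoid a run-time error
    If Not node Is Nothing Then InnerTextById = node.innerText
End Function
```

In the code above it would be called as MsgBox InnerTextById(.responseText, "SalesRank") inside the With block.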

Collect images url and data from webpage table to Excel table

I need to collect data from a table on a webpage; some table cells contain images.
The code should copy the data to Excel and, if a cell contains an image, write the image's src link instead. The code is below, but it is not working: I don't know how to detect whether a cell contains an image and how to add its src link to the Excel cell.
Sub extractData()
Dim IE As Object, obj As Object
Dim myYear As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myYear = InputBox("Enter year")
With IE
.Visible = True
.navigate ("url")
While IE.ReadyState <> 4
DoEvents
Wend
For Each obj In IE.Document.All.Item("Year").Options
If obj.innerText = myYear Then
obj.Selected = True
End If
Next obj
IE.Document.getElementsByName("btn_search").Item.Click
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K2000").ClearContents
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 9)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
If elemCollection(t).Rows(r).Cells(c).innerText = "" Then
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).getAttribute("src")
Exit For
End If
Next
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
First, brush up on HTML Document Object Model. There are tons of tutorials on how to use JavaScript to work with the DOM, and VBA is real similar (because the DOM doesn't change based on language and VBA is very closely related to VBScript which is similar to JavaScript). Second, if you get an error but no line is highlighted when you click the Debug button, step through your code line by line with the F8 key. This will let you figure out where the error is occurring. Third, add a reference to the Microsoft HTML Object Library so you can use Intellisense for code hints.
It's tough to give an actual solution without seeing the HTML source so instead I'll give some pointers:
Use IE.Document.Body.getElementsByTagName("TABLE") (note the addition of BODY to narrow the scope) to get a collection of every table on the page. With a reference to the HTML Objects Lib you can do something like this:
Dim oTable As HTMLTable
Dim oCell As HTMLTableCell
Dim oImg As HTMLImg
Dim strSrc As String
'Assumes the table you want is the first one on the page; adjust the index as needed
Set oTable = IE.Document.Body.getElementsByTagName("TABLE")(0)
For Each oCell In oTable.Cells
strSrc = ""
Set oImg = Nothing
On Error Resume Next
'getElementsByTagName returns a collection, so take its first item
Set oImg = oCell.getElementsByTagName("img")(0)
If Not oImg Is Nothing Then strSrc = oImg.src
On Error GoTo 0
If strSrc <> "" Then Debug.Print strSrc
Next
This should (I did not test it) loop through every cell in a table and attempt to get an img element. If it fails, no biggie, just continue to the next cell. If you want to use late binding after you get it working, remove the HTML Obj Lib reference then simply dim everything as an object. E.g.:
Dim oTable As Object 'HTMLTable
Dim oCell As Object 'HTMLTableCell
Dim oImg As Object 'HTMLImg
Dim strSrc As String

How to scrape data from the following table format VBA

I am trying to scrape the table contents from the first page to the last page of this webpage.
Using the code below I can scrape the table contents of page 1 but I don't know how can I modify the code to get the data from start page to end.
Option Explicit
Sub NBAStats()
Dim IE As Object, obj As Object
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
While IE.ReadyState <> 4
DoEvents
Wend
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheet1.Clear
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Try to find the sitemap.xml of the website you are scraping. The sitemap.xml will have all the links present on the website.
Import that XML file into your Excel sheet, read each link, and fetch each table from it.
First, in my opinion VBA automation of Internet Explorer is highly unstable and not really practical for production use. This also means that scraping data from web sites that provide it only for viewing in a browser is not really practical for production use. If you are entitled to use the data, you should ask for another data source (XML or JSON, for example). If you are not entitled, you should not do it; the provider of the web site may not agree with it.
To be clear, I'm talking about web sites like this one, which provides its data with JavaScript only. If the data were in the HTML, you could get it via XMLHTTP. That is a different matter.
I will nevertheless provide a "solution", so that you can't simply think "He is unable to do this, so he is saying you should not do it."
So you must analyze the site and pick out the elements you can click for navigation.
Option Explicit
Sub NBAStats()
Dim IE As Object
Dim r As Long, c As Long, t As Long, rSheet As Long, rStart As Long
Dim bReady As Boolean
Dim elementsTable As Object
Dim elementsPageNavRigth As Object
Dim elemPageNavRigth As Object
Dim elementsTableDiv As Object
ThisWorkbook.Worksheets(1).Cells.Clear
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
Do While IE.busy
DoEvents
Loop
rSheet = 0
Do
Do While elementsTableDiv Is Nothing
Set elementsTableDiv = IE.Document.getElementsByClassName("table-responsive")
DoEvents
Loop
Do While elementsTableDiv(0) Is Nothing
DoEvents
Loop
Set elementsPageNavRigth = IE.Document.getElementsByClassName("page-nav right")
Set elemPageNavRigth = elementsPageNavRigth(0)
If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True
'If rSheet = 0 Then rStart = 0 Else rStart = 1
Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
For r = rStart To (elementsTable(0).Rows.Length - 1)
For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(0).Rows(r).Cells(c).innerText
Next c
Next r
rSheet = rSheet + r
If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click
Set elementsTableDiv = Nothing
Loop Until bReady Or elemPageNavRigth Is Nothing
End With
Set IE = Nothing
End Sub

Copying from Internet Explorer text area (box) but into more than a single cell

I'm currently trying to control/automate a postcode lookup website from postcodes stored and updated in Excel, and my code works perfectly up to the point where it has to copy the data once it's finished. For the life of me I can't figure out how to copy the data from the text box / area into Excel without it just putting it ALL into one cell (Text to Columns doesn't really work either).
The website is : http://www.doogal.co.uk/DrivingDistances.php
Sub Geo2()
Dim sht As Worksheet
Dim IE As Object
'Dim ieDoc As HTMLDocument
Dim Item As Variant
Dim objElement As Object
Dim startLoc As String
Dim endLoc As String
Dim x As Integer
Dim objNotes As Object
Dim strNotes As String
Dim str As String
'Dim SignInButton As HTMLInputButtonElement
Set sht = ThisWorkbook.Sheets("Postcode")
Set IE = CreateObject("InternetExplorer.Application")
'Open IE
IE.Visible = True
IE.Navigate "http://www.doogal.co.uk/DrivingDistances.php"
'Wait until site is loaded
Do While IE.ReadyState <> 4 'READYSTATE_COMPLETE
DoEvents
Loop
IE.Document.getElementbyID("startLocs").Value = "dn1 5pq" 'random postcode
IE.Document.getElementbyID("endLocs").Value = "wf12 2fd" 'random postcode
IE.Document.getElementsByName("calculateFor").Item(1).Checked = True
IE.Document.getElementsByName("units").Item(1).Checked = True
IE.Document.getElementsByClassName("btn btn-primary").Item(0).Click
------
'Ive tried without having it as a object and using .value but it either comes with only the first line or the entire thing rammed into a string and is unusable
----Code here is the problem-----
***Set objNotes = IE.Document.getElementbyID("distances")
str = objNotes.Value***
---------
Do While IE.ReadyState <> 4 'READYSTATE_COMPLETE
DoEvents
Loop
End Sub
The following VBA function uses the Google Maps Directions API to calculate the driving distance in meters between two locations. The code is modified from a version submitted by barrowc on this similar question.
Make sure to add a reference in Excel to Microsoft XML, v6.0.
Function getDistance(origin As String, destination As String) As String
Dim xhrRequest As XMLHTTP60
Dim domDoc As DOMDocument60
Dim ixnlDistanceNode As IXMLDOMNode
Dim RequestString As String
Dim API_Key As String
' Insert your own Google Maps Directions API key here
API_Key = "XXXXXX"
' Read the data from the website
Set xhrRequest = New XMLHTTP60
RequestString = "https://maps.googleapis.com/maps/api/directions/xml?origin=" _
& origin & "&destination=" & destination & "&sensor=false&key=" & API_Key
xhrRequest.Open "GET", RequestString, False
xhrRequest.send
' Copy the results into a format we can manipulate with XPath
Set domDoc = New DOMDocument60
domDoc.LoadXML xhrRequest.responseText
' Select the node called value underneath the leg and distance parents.
' The distance returned is the driving distance in meters.
Set ixnlDistanceNode = domDoc.SelectSingleNode("//leg/distance/value")
getDistance = ixnlDistanceNode.Text
Set ixnlDistanceNode = Nothing
Set domDoc = Nothing
Set xhrRequest = Nothing
End Function
Please note that this code by itself violates the Terms of Use of Google's API. "The Google Maps Directions API may only be used in conjunction with displaying results on a Google map; using Directions data without displaying a map for which directions data was requested is prohibited."
Instead of putting all the data in one string, Split the string into an array, then loop through the array like this:
Set objNotes = IE.Document.getElementbyID("distances")
Dim x As Integer
Dim aDist() As String 'Split returns a String array
aDist = Split(objNotes.Value, vbNewLine) 'May need to be vbCr or vbLf or vbCrLf
For x = 0 To UBound(aDist) 'include the last element
Debug.Print aDist(x)
Next x
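Since the line separator used by the text area is not known in advance, one option is to normalize the line endings before splitting and then write one line per cell. The following is a minimal sketch under that assumption; SplitLines and WriteLinesDown are hypothetical helper names, not part of the original code.

```vba
' Hypothetical helper: splits text into lines whatever the separator
' (vbCrLf, vbCr or vbLf) by normalizing everything to vbLf first.
Function SplitLines(ByVal text As String) As String()
    text = Replace(text, vbCrLf, vbLf)
    text = Replace(text, vbCr, vbLf)
    SplitLines = Split(text, vbLf)
End Function

' Hypothetical helper: writes each line into its own cell,
' starting at firstCell and moving down one row per line.
Sub WriteLinesDown(ByVal text As String, ByVal firstCell As Range)
    Dim lines() As String
    Dim i As Long

    lines = SplitLines(text)
    For i = 0 To UBound(lines)
        firstCell.Offset(i, 0).Value = lines(i)
    Next i
End Sub
```

With the question's variables this could be called as WriteLinesDown objNotes.Value, sht.Range("A1"). Each line can then be split again (for example on a comma or tab, depending on what the site returns) to spread its fields across columns.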