Collect image URLs and data from a webpage table into Excel - VBA

I need to collect data from a table on a webpage; some of the table cells contain images.
The code should copy the data to Excel and, if a cell has an image, get its src link instead of the image. My code is below, but it is not working: I don't know how to detect whether a cell contains an image, or how to put its src link into the Excel cell.
Sub extractData()
Dim IE As Object, obj As Object
Dim myYear As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myYear = InputBox("Enter year")
With IE
.Visible = True
.navigate ("url")
While IE.ReadyState <> 4
DoEvents
Wend
For Each obj In IE.Document.All.Item("Year").Options
If obj.innerText = myYear Then
obj.Selected = True
End If
Next obj
IE.Document.getElementsByName("btn_search").Item.Click
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K2000").ClearContents
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 9)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
If elemCollection(t).Rows(r).Cells(c).innerText = "" Then
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).getAttribute("src")
Exit For
End If
Next
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub

First, brush up on the HTML Document Object Model (DOM). There are tons of tutorials on using JavaScript to work with the DOM, and the equivalent VBA is very similar, because the DOM doesn't change based on the language. Second, if you get an error but no line is highlighted when you click the Debug button, step through your code line by line with the F8 key; this will let you figure out where the error is occurring. Third, add a reference to the Microsoft HTML Object Library so you can use IntelliSense for code hints.
It's tough to give an actual solution without seeing the HTML source so instead I'll give some pointers:
Use IE.Document.Body.getElementsByTagName("TABLE") (note the addition of BODY to narrow the scope) to get a collection of every table on the page. With a reference to the HTML Objects Lib you can do something like this:
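Dim oTable As HTMLTable
Dim oCell As HTMLTableCell
Dim oImg As HTMLImg
Dim strSrc As String
'oTable must first be Set to a table, e.g. IE.Document.Body.getElementsByTagName("TABLE")(0)
For Each oCell In oTable.Cells
    strSrc = ""
    Set oImg = Nothing
    On Error Resume Next
    'getElementsByTagName returns a collection; (0) is the first img in the cell, if any
    Set oImg = oCell.getElementsByTagName("img")(0)
    On Error GoTo 0
    If Not oImg Is Nothing Then strSrc = oImg.src
    If strSrc <> "" Then Debug.Print strSrc
Next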
This should (I did not test it) loop through every cell in the table and attempt to get an img element. If there isn't one, no biggie: it just continues to the next cell. If you want to use late binding once it works, remove the HTML Object Library reference and simply declare everything As Object, e.g.:
Dim oTable As Object 'HTMLTable
Dim oCell As Object 'HTMLTableCell
Dim oImg As Object 'HTMLImg
Dim strSrc As String
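Putting these pieces together, here is a minimal late-bound sketch of a helper that returns a cell's first image URL when it has one and its text otherwise. The function name `CellValueOrImageSrc` is hypothetical, and it assumes `cell` is a table cell element (`TD`/`TH`) from an MSHTML document such as `IE.Document`.

```vba
' Hypothetical helper: returns the src of the first <img> inside a table
' cell, or the cell's text when the cell contains no image.
' Late bound, so no reference to the HTML Object Library is required.
Function CellValueOrImageSrc(ByVal cell As Object) As String
    Dim imgs As Object
    Set imgs = cell.getElementsByTagName("img")   ' a collection, possibly empty
    If imgs.Length > 0 Then
        CellValueOrImageSrc = imgs(0).src          ' URL of the first image
    Else
        CellValueOrImageSrc = cell.innerText       ' plain cell text
    End If
End Function
```

In the question's loop, the inner `For`/`Exit For` block could then be replaced by a single line: `ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = CellValueOrImageSrc(elemCollection(t).Rows(r).Cells(c))`.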

Related

VBA objShell.windows.title not finding title

I run this function with two IE browsers open. IE_count finds six objects, but it does not find any titles (my_title) within the for loop for objShell. They all return an empty string.
Any idea as to why this could be? Relevant code below:
' code below adapted from ron's answer here: https://stackoverflow.com/questions/21407340/how-to-read-text-from-an-already-open-webpage-using-vba
Function SecondBrowserSearchForAndClick(ElementID As String, searchFor As String)
Dim objShell
Set objShell = CreateObject("Shell.Application")
Dim IE_count As Integer
IE_count = objShell.Windows.Count
Dim x As Integer
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
Dim my_url As String
my_url = objShell.Windows(x).document.Location
Dim my_title As String
my_title = objShell.Windows(x).document.Title
If my_title Like "*Select Process*" Then 'compare to find if the desired web page is already open
Dim tagColl_TR As Object
Set tagColl_TR = objShell.Windows(x).document.getElementById(ElementID).contentDocument.getElementsByTagName("tr")
Dim f
While f < tagColl_TR.Length
If tagColl_TR(f).Children.Length = 5 Then
If tagColl_TR(f).Children(3).Children(0).innerText Like "*" & searchFor & "*" Then
tagColl_TR(f).Children(1).Children(0).Children(1).Focus
tagColl_TR(f).Children(1).Children(0).Children(1).Click
Exit Function
End If
End If
f = f + 1
Wend
End If
Next
End Function
Any help would be appreciated.
It's easier to put the "find document by title" functionality in its own function:
Sub Tester()
Dim doc As Object
Set doc = IEDocumentByTitle("Google")
If Not doc Is Nothing Then
Debug.Print "Found window at: " & doc.Location
'work on doc here
End If
End Sub
'Return an open IE document based on its Title property
Function IEDocumentByTitle(title As String)
Dim w As Object, ttl As String
For Each w In CreateObject("Shell.Application").Windows
If w.Application.Name = "Internet Explorer" Then 'filter out Windows Explorer
ttl = ""
On Error Resume Next
ttl = w.document.title
On Error GoTo 0
If ttl Like title Then
Set IEDocumentByTitle = w.document
Exit Function
End If
End If
Next w
End Function
This works fine for me.
BTW the shell Windows collection also includes Windows Explorer instances in addition to IE windows/tabs.
Also you should really cancel On Error Resume Next as soon as possible or it will silently swallow all errors in your code, possibly leading to unexpected results.
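One way to keep `On Error Resume Next` contained is to move the risky property read into a tiny function of its own, so the error suppression cannot leak into the rest of the search loop. A minimal sketch; the name `SafeDocumentTitle` is hypothetical:

```vba
' Hypothetical helper: returns the document title of a Shell window, or ""
' if the window has no HTML document (e.g. a Windows Explorer window) or
' the property cannot be read for any other reason.
Function SafeDocumentTitle(ByVal w As Object) As String
    On Error Resume Next              ' only affects this one statement
    SafeDocumentTitle = w.document.Title
    On Error GoTo 0
End Function
```

Inside `IEDocumentByTitle`, the four lines that reset, read, and restore `ttl` could then shrink to `ttl = SafeDocumentTitle(w)`.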

VBA: Finding nested elements in HTML Document

I am trying to scrape all the href links of products from this link.
I am using the following code to get the product links on the page:
Sub urlCatch()
Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link, itm As Object
Dim url As String
Dim X As Variant
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do Until internet.ReadyState >= 4
DoEvents
Loop
Set internetdata = internet.document
Set div_result = internetdata.getelementsbyclassname("categorySubNavigation").getelementsbyclassname("c4 seriesOptions")
Set header_links = div_result.getelementsbytagname("a")
For Each itm In header_links
Set link = itm.ChildNodes.Item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next
End Sub
I am getting an error at div_result: "object doesn't support this property or method".
This is the first time I am using elements to retrieve something, so I might be making mistakes I am currently unaware of.
Please look at my code and let me know what blunder I am making here.
One of my friends told me I might need to use a regex to get all these links, but I wanted to get a handle on this method first.
Please give me some guidance. Thanks.
getElementsByTagName and the other lookup methods work on a single element, not on a collection. getElementsByClassName returns a collection, so you need another loop over its items.
Sub urlCatch()
Dim url As String
Dim internet As Object
Dim internetdata
Dim div_result
Dim links
Dim itm
Dim itm2
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do
DoEvents
Loop Until internet.ReadyState >= 4 And Not internet.busy
Set internetdata = internet.document.body
Set div_result = internetdata.getelementsbyclassname("c4 seriesOptions")
For Each itm In div_result
Set links = itm.getElementsByTagName("A")
For Each itm2 In links
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = itm2.href
Next
Next
End Sub
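The same nested loop can be sketched as a reusable function that takes any MSHTML document. This is a sketch under assumptions: the name `LinksInClass` is hypothetical, and it matches the class by checking each element's `className` instead of calling `getElementsByClassName`, because that method may not be available in every document mode (a late-bound `htmlfile` document, for example, typically runs in an older mode).

```vba
' Hypothetical helper: collects the href of every <a> nested inside any
' element whose class list contains wantedClass (a single class name).
Function LinksInClass(ByVal doc As Object, ByVal wantedClass As String) As Collection
    Dim result As New Collection
    Dim el As Object, lnk As Object
    For Each el In doc.getElementsByTagName("*")         ' first loop: every element
        If el.nodeType = 1 Then                           ' skip comment nodes
            If InStr(" " & el.className & " ", " " & wantedClass & " ") > 0 Then
                For Each lnk In el.getElementsByTagName("a")   ' second, inner loop
                    result.Add lnk.href
                Next lnk
            End If
        End If
    Next el
    Set LinksInClass = result
End Function
```

Pass one class name, such as `"seriesOptions"`, rather than the space-separated `"c4 seriesOptions"`. If matching elements are nested inside each other, their links will be collected more than once.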

How to scrape data from the following table format VBA

I am trying to scrape the table contents from every page, first to last, of this webpage.
Using the code below I can scrape the table contents of page 1, but I don't know how to modify the code to get the data from every page.
Option Explicit
Sub NBAStats()
Dim IE As Object, obj As Object
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
While IE.ReadyState <> 4
DoEvents
Wend
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheet1.Clear
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Try to find the sitemap.xml of the website you are scraping. The sitemap.xml should list the links present on the site.
Import that XML file into your Excel sheet, then read each link and fetch the table on each page.
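If the site does publish a sitemap, reading the page URLs out of it takes only a few lines with MSXML. A minimal sketch, assuming the standard sitemap format in which each URL sits in a `<loc>` element; `SitemapUrls` is a hypothetical name, and the XML text would come from downloading the sitemap (for instance with `MSXML2.XMLHTTP`).

```vba
' Hypothetical helper: returns every <loc> URL listed in sitemap XML text.
' Late bound to MSXML 6.0, so no library reference is required.
Function SitemapUrls(ByVal xmlText As String) As Collection
    Dim result As New Collection
    Dim xmlDoc As Object, node As Object
    Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xmlDoc.async = False
    If Not xmlDoc.LoadXML(xmlText) Then
        Err.Raise vbObjectError + 1, "SitemapUrls", "Sitemap is not well-formed XML"
    End If
    ' getElementsByTagName matches on the element name, so the sitemap's
    ' default namespace does not need to be declared for this lookup.
    For Each node In xmlDoc.getElementsByTagName("loc")
        result.Add node.Text
    Next node
    Set SitemapUrls = result
End Function
```

Note that a sitemap index file also uses `<loc>`; in that case the URLs point to further sitemaps rather than to pages.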
First, in my opinion, VBA automation of Internet Explorer is highly unstable and not really practical for production use. This also means that scraping data from websites that are only meant to be viewed in a browser is not really practical for production use. If you are entitled to use the data, you should ask for another data source (XML or JSON, for example). If you are not entitled, you should not do this; the website's provider may well not agree with it.
To be clear, I'm talking about websites like this one, which provide their data through JavaScript only. If the data were in the HTML itself, you could get it via XMLHTTP, which is a different matter.
I will nevertheless provide a "solution", so that you can't simply think "He is just unable to do this, so he says you shouldn't do it."
You have to analyze the site and pick out the elements you can click for navigation.
Option Explicit
Sub NBAStats()
Dim IE As Object
Dim r As Long, c As Long, t As Long, rSheet As Long, rStart As Long
Dim bReady As Boolean
Dim elementsTable As Object
Dim elementsPageNavRigth As Object
Dim elemPageNavRigth As Object
Dim elementsTableDiv As Object
ThisWorkbook.Worksheets(1).Cells.Clear
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
Do While IE.busy
DoEvents
Loop
rSheet = 0
Do
Do While elementsTableDiv Is Nothing
Set elementsTableDiv = IE.Document.getElementsByClassName("table-responsive")
DoEvents
Loop
Do While elementsTableDiv(0) Is Nothing
DoEvents
Loop
Set elementsPageNavRigth = IE.Document.getElementsByClassName("page-nav right")
Set elemPageNavRigth = elementsPageNavRigth(0)
If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True
'If rSheet = 0 Then rStart = 0 Else rStart = 1
Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
For r = rStart To (elementsTable(0).Rows.Length - 1)
For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(0).Rows(r).Cells(c).innerText
Next c
Next r
rSheet = rSheet + r
If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click
Set elementsTableDiv = Nothing
Loop Until bReady Or elemPageNavRigth Is Nothing
End With
Set IE = Nothing
End Sub
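One weak spot in the loop above is that, right after `elemPageNavRigth.Click`, the old table may still be in the document, so the next pass could read the same page again before the site's JavaScript has replaced it. A hedged sketch of a guard: record the table text before clicking, then poll until it changes or a timeout passes. The function name is hypothetical, and it assumes the table of interest is the first `<table>` in the document.

```vba
' Hypothetical helper: waits until the text of the first <table> in doc is
' non-empty and differs from oldText. Returns True if it changed, False on timeout.
Function WaitForTableChange(ByVal doc As Object, ByVal oldText As String, _
                            ByVal timeoutSec As Single) As Boolean
    Dim startTime As Single, tables As Object, nowText As String
    startTime = Timer
    Do
        nowText = ""
        Set tables = doc.getElementsByTagName("table")
        If tables.Length > 0 Then nowText = tables(0).innerText
        If nowText <> "" And nowText <> oldText Then
            WaitForTableChange = True
            Exit Function
        End If
        DoEvents                                                  ' let the page keep working
        If Timer < startTime Then startTime = startTime - 86400   ' Timer wrapped at midnight
    Loop While Timer - startTime < timeoutSec
End Function
```

In `NBAStats` it could be used by reading `before = elementsTable(0).innerText` just before the click and calling `WaitForTableChange IE.Document, before, 10` right after it.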

Copying from Internet Explorer text area (box) but into more than a single cell

I'm currently trying to automate a postcode lookup website using postcodes stored and updated in Excel, and my code works perfectly up to the point where it has to copy the data once the site has finished. For the life of me I can't figure out how to copy the data from the text box / text area into Excel without it putting it ALL into one cell (Text to Columns doesn't really work either).
The website is : http://www.doogal.co.uk/DrivingDistances.php
Sub Geo2()
Dim sht As Worksheet
Dim IE As Object
'Dim ieDoc As HTMLDocument
Dim Item As Variant
Dim objElement As Object
Dim startLoc As String
Dim endLoc As String
Dim x As Integer
Dim objNotes As Object
Dim strNotes As String
Dim str As String
'Dim SignInButton As HTMLInputButtonElement
Set sht = ThisWorkbook.Sheets("Postcode")
Set IE = CreateObject("InternetExplorer.Application")
'Open IE
IE.Visible = True
IE.Navigate "http://www.doogal.co.uk/DrivingDistances.php"
'Wait until site is loaded
Do While IE.ReadyState <> 4 'READYSTATE_COMPLETE
DoEvents
Loop
IE.Document.getElementbyID("startLocs").Value = "dn1 5pq" 'random postcode
IE.Document.getElementbyID("endLocs").Value = "wf12 2fd" 'random postcode
IE.Document.getElementsByName("calculateFor").Item(1).Checked = True
IE.Document.getElementsByName("units").Item(1).Checked = True
IE.Document.getElementsByClassName("btn btn-primary").Item(0).Click
'------
'I've tried without having it as an object and using .Value, but it either returns only the first line or the entire thing rammed into one string, which is unusable
'---- Code here is the problem ----
Set objNotes = IE.Document.getElementbyID("distances")
str = objNotes.Value
'---------
Do While IE.ReadyState <> 4 'READYSTATE_COMPLETE
DoEvents
Loop
End Sub
The following VBA function uses the Google Maps Directions API to calculate the driving distance in meters between two locations. The code is modified from a version submitted by barrowc on this similar question.
Make sure to add a reference in Excel to Microsoft XML, v6.0.
Function getDistance(origin As String, destination As String) As String
    Dim xhrRequest As XMLHTTP60
    Dim domDoc As DOMDocument60
    Dim ixnlDistanceNode As IXMLDOMNode
    Dim RequestString As String
    Dim API_Key As String
    ' Insert your own Google Maps Directions API key here
    API_Key = "XXXXXX"
    ' Read the data from the website; the locations are URL-encoded so that
    ' spaces and commas survive (WorksheetFunction.EncodeURL needs Excel 2013+)
    Set xhrRequest = New XMLHTTP60
    RequestString = "https://maps.googleapis.com/maps/api/directions/xml?origin=" _
        & WorksheetFunction.EncodeURL(origin) & "&destination=" _
        & WorksheetFunction.EncodeURL(destination) & "&sensor=false&key=" & API_Key
    xhrRequest.Open "GET", RequestString, False
    xhrRequest.send
    ' Copy the results into a format we can manipulate with XPath
    Set domDoc = New DOMDocument60
    domDoc.LoadXML xhrRequest.responseText
    ' Select the node called value underneath the leg and distance parents.
    ' The distance returned is the driving distance in meters.
    Set ixnlDistanceNode = domDoc.SelectSingleNode("//leg/distance/value")
    ' No node means no route came back (e.g. invalid key or unknown location)
    If Not ixnlDistanceNode Is Nothing Then getDistance = ixnlDistanceNode.Text
    Set ixnlDistanceNode = Nothing
    Set domDoc = Nothing
    Set xhrRequest = Nothing
End Function
Please note that this code by itself violates the Terms of Use of Google's API: "The Google Maps Directions API may only be used in conjunction with displaying results on a Google map; using Directions data without displaying a map for which directions data was requested is prohibited."
Instead of putting all the data into one string, split the string into an array of lines, then loop through the array like this:
Set objNotes = IE.Document.getElementbyID("distances")
Dim i As Long 'x is already declared in Geo2
Dim aDist() As String 'Split returns a String array
aDist = Split(objNotes.Value, vbNewLine) 'May need to be vbCr, vbLf or vbCrLf
For i = 0 To UBound(aDist)
    Debug.Print aDist(i)
Next i
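To get the text area contents into separate cells rather than the Immediate window, each line can be split again on a column delimiter and stored in a 2-D array, which Excel can assign to a range in one step. A sketch under assumptions: the name `TextToGrid` is hypothetical, and the default tab delimiter is a guess about how the results box separates its columns, so check the actual text first.

```vba
' Hypothetical helper: turns multi-line text into a 1-based 2-D array,
' one row per line and one column per delimited field.
' Returns Empty for empty input.
Function TextToGrid(ByVal txt As String, Optional ByVal delim As String = vbTab) As Variant
    Dim lines() As String, fields() As String, grid() As Variant
    Dim i As Long, j As Long, maxCols As Long
    If Len(txt) = 0 Then Exit Function
    ' Normalise CRLF and CR to LF so one Split handles every line ending
    txt = Replace(Replace(txt, vbCrLf, vbLf), vbCr, vbLf)
    lines = Split(txt, vbLf)
    ' Size the grid to the widest line
    For i = 0 To UBound(lines)
        fields = Split(lines(i), delim)
        If UBound(fields) + 1 > maxCols Then maxCols = UBound(fields) + 1
    Next i
    If maxCols = 0 Then maxCols = 1
    ReDim grid(1 To UBound(lines) + 1, 1 To maxCols)
    For i = 0 To UBound(lines)
        fields = Split(lines(i), delim)
        For j = 0 To UBound(fields)
            grid(i + 1, j + 1) = fields(j)
        Next j
    Next i
    TextToGrid = grid
End Function
```

With the `sht` and `objNotes` variables from `Geo2`, writing it out could look like `g = TextToGrid(objNotes.Value)` followed by `sht.Range("A1").Resize(UBound(g, 1), UBound(g, 2)).Value = g`.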

Excel vba macro to connect to a particular webpage - search and retrieve data

I have a column, say column A, containing 1500 rows, each holding a string (hexadecimal encoded). What I need is to connect to a particular website, paste the string into its search box, press Decode, copy the result and paste it back into column B.
Any help would be great. I am new here.
Example:
String in Column A: 5468616e6b732061206c6f7420696e20616476616e6365
Website to search in: http://encodertool.com/hexadecimal
Copy from the Excel cell and paste into the box under the heading: ENTER AN Hexadecimal CONTENT TO DECODE
Then hit DECODE
Then copy from DECODING RESULT
Finally paste back into column B of my Excel sheet.
Looking forward to an answer.
Thanks a million in advance.
Are you doing this as an exercise in automating the browser? It seems like you could do it more easily directly in VBA.
From: http://bytes.com/topic/access/answers/874752-convert-hex-string
Sub tester()
Debug.Print fConvertHexToString( _
"5468616e6b732061206c6f7420696e20616476616e6365")
End Sub
Public Function fConvertHexToString(strHexString As String) As String
    Dim lngCounter As Long
    Dim strBuild As String
    'Hex String must have a valid length, and it must be an even length
    If Len(strHexString) = 0 Or Len(strHexString) Mod 2 <> 0 Then Exit Function
    'Step through the string one hex pair (two characters) at a time
    For lngCounter = 1 To Len(strHexString) Step 2
        'Retrieve the value of the hex pair, convert it to a character,
        'then append it to the result string
        strBuild = strBuild & Chr$(Val("&H" & Mid$(strHexString, lngCounter, 2)))
    Next
    fConvertHexToString = strBuild
End Function
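To cover all of column A (the question mentions about 1,500 rows) without a browser, the conversion can be run over the column in one pass. This sketch keeps its own copy of the hex conversion so it stands alone; the sheet name `Sheet1` and the names `HexToText` and `DecodeColumnAToB` are assumptions. Reading and writing the column as arrays avoids touching the sheet 1,500 times.

```vba
' Hypothetical helper: decodes a string of hex pairs into text.
' Returns "" for odd-length input; CLng raises error 13 on a non-hex pair.
' Bytes above 127 are mapped through the ANSI code page, so UTF-8 encoded
' accented text will not come out correctly.
Function HexToText(ByVal hexStr As String) As String
    Dim i As Long, result As String
    If Len(hexStr) Mod 2 <> 0 Then Exit Function
    For i = 1 To Len(hexStr) Step 2
        result = result & Chr$(CLng("&H" & Mid$(hexStr, i, 2)))
    Next i
    HexToText = result
End Function

' Decodes every value in column A and writes the text to column B.
Sub DecodeColumnAToB()
    Dim ws As Worksheet, lastRow As Long, r As Long
    Dim src As Variant, out() As Variant
    Set ws = ThisWorkbook.Worksheets("Sheet1")        ' assumption: data sheet name
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow = 1 Then                               ' one cell is read as a scalar, not an array
        ws.Range("B1").Value = HexToText(CStr(ws.Range("A1").Value))
        Exit Sub
    End If
    src = ws.Range("A1:A" & lastRow).Value            ' 1-based 2-D array
    ReDim out(1 To lastRow, 1 To 1)
    For r = 1 To lastRow
        out(r, 1) = HexToText(CStr(src(r, 1)))
    Next r
    ws.Range("B1:B" & lastRow).Value = out            ' write all results at once
End Sub
```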
Something like this. I have just run a mock test and it works, so give it a try and modify the code to your needs. This is plain code and it can be enhanced, but it does what you asked for.
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Sub start()
    Dim ran As Range
    Dim cel As Excel.Range
    Set ran = Worksheets("Sheet1").Range("A1:A4") 'Change your input range here
    For Each cel In ran
        If cel.Value <> Empty Then
            Set ie = New InternetExplorerMedium 'Open IE
            ie.navigate ("http://encodertool.com/hexadecimal") 'Navigate to the decoder page
            ie.Visible = True
            'Wait until IE has finished loading the page
            Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
                DoEvents
            Loop
            Set doc = ie.document
            doc.getElementById("input_4").innerText = cel.Value 'Enter the input value
            test 'Click the Decode button
            'If the result is filled in asynchronously, it may still be empty at this point
            cel.Offset(0, 1).Value = doc.getElementById("output_4").innerText 'Save the output value
            ie.Quit 'Close this IE instance before moving on to the next cell
        End If
    Next cel
End Sub
'Click the Decode button
Sub test()
    Dim cl_button As Object, one As Object
    Set cl_button = doc.getElementsByTagName("a")
    For Each one In cl_button
        If one.getAttribute("onclick") = "ajaxfct('fcts.php','4')" Then
            one.Click
            Exit For
        End If
    Next one
End Sub
Before running the code, add references to the Microsoft HTML Object Library and Microsoft Internet Controls. Also change the input range: I have set it to A1:A4, so change it to whatever you need. Make sure there are no blank cells in the range. Also, if you don't want the browser to be displayed, set
ie.Visible = False
This is one way of doing it. There are many simpler and more effective ways of doing it.
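Because the Decode button calls `ajaxfct(...)`, the result box may still be empty when the line after `test` reads it. A hedged sketch of a polling helper with a timeout that could be called instead of reading `output_4` directly; the function name is hypothetical, and `output_4` is the element ID used in the answer above, which may have changed on the live site.

```vba
' Hypothetical helper: polls an element's innerText until it is non-empty
' or timeoutSec seconds pass. Returns the text, or "" on timeout.
Function WaitForText(ByVal doc As Object, ByVal elementId As String, _
                     ByVal timeoutSec As Single) As String
    Dim startTime As Single, el As Object
    startTime = Timer
    Do
        Set el = doc.getElementById(elementId)
        If Not el Is Nothing Then
            If Len(el.innerText) > 0 Then
                WaitForText = el.innerText
                Exit Function
            End If
        End If
        DoEvents                                                  ' let IE process the AJAX reply
        If Timer < startTime Then startTime = startTime - 86400   ' Timer wrapped at midnight
    Loop While Timer - startTime < timeoutSec
End Function
```

In `start`, the output line would then become `cel.Offset(0, 1).Value = WaitForText(doc, "output_4", 10)`.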