How to automate a dynamically changing web page using Excel VBA?

I have been trying to automate a web page for two weeks, but I cannot get past the third page.
First I log in on the login page with my credentials, and then I click a link on the second page. Up to this point everything works. After that I need to click another link on the third page, and that is where I am stuck: I cannot even read the correct innerHTML of that page. The innerHTML differs from the page's source code. I took the id/name from the source code to get the element, but it did not work. The problem, as far as I can see, is that the document object does not contain the inner details of the third page. When I print the links of that page, I get only the common links that appear on every page instead of all the links on that particular page. I guess this happens because the page frame changes with the FromDate and ToDate values; pardon me if I'm wrong. Do we need to re-assign the "ie.document" object every time the browser navigates to a new page? I suspect it keeps pointing to the document that was loaded the first time.
Below is my code:
Public Sub Test ()
Dim shellWins As ShellWindows
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Dim frm As HTMLFrameElement
Dim frms As HTMLElementCollection
Dim strSQL As String
Dim Login As Boolean
strSQL = "https://website.com"
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate strSQL
Do Until .ReadyState = 4: DoEvents: Loop
Set doc = ie.document
Dim link As Object
For Each link In doc.Links
'Debug.Print link.innerText
If link.innerText = "Click Here" Then
link.Click
Exit For
End If
Next link
Do While ie.Busy: DoEvents: Loop
Login_Pane:
For Each link In doc.Links
If link.innerText = "Leave & Attendance" Then
'Debug.Print doc.body.innerHTML
link.Click
Login = True
Exit For
End If
Next link
If Login <> True Then
Application.Wait (Now + TimeValue("00:00:02"))
Application.SendKeys "<USERNAME>", True
Application.SendKeys "{TAB}"
Application.Wait (Now + TimeValue("00:00:02"))
Application.SendKeys "<PASSWORD>", True
Application.SendKeys "{ENTER}"
GoTo Login_Pane
End If
Do While ie.Busy: DoEvents: Loop
For Each link In doc.Links
Debug.Print link.innerText
' The line above should print all the links on that page,
' but unfortunately it does not show them as they appear in the source code.
' Instead it prints only about half of the links: the ones that are common to all pages.
' This page has three frames.
Next link
End With
'IE.Quit
End Sub
I'm unable to post an image of that page to help you understand, but I'll try my best to explain.
When I use the code below, I can only get the links from the upper portion of the page.
Set doc = ie.document
Dim text As Object
For Each text In doc.Links
Debug.Print text.innerText
Next text
Below that portion of the page there is an option to enter FromDate and ToDate. By entering dates in these text boxes I can see the details for that period (by default the page displays the details from the 1st of the current month to the current date).
This is the section where I'm not getting the links or any other details. I think the details of this section are not stored in the ie.document object.
This particular section also has a different URL from the main page.
Thanks.

A couple of thoughts:
For a page that loads dynamically, use Application.Wait (5 seconds or so) instead of relying on Do Until .ReadyState = 4: DoEvents: Loop. The ReadyState check is not reliable while JavaScript is still running, because the page can report complete before the script-generated content exists.
Using SendKeys should always be avoided, as it is not robust. Inspect the element with a DOM explorer to get its ID or name.
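Since the question mentions that the page has three frames and that the date section has its own URL, it may also be worth looking inside the frames. The following is only a sketch under assumptions: WaitForPage and PrintLinksInAllFrames are hypothetical helper names, the 30-second timeout is arbitrary, and frames served from a different origin will refuse access, so they are skipped.

```vba
' Hypothetical helper: wait (with a timeout) until IE reports the page complete.
Private Function WaitForPage(ByVal ie As Object, _
                             Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#
    Do While ie.Busy Or ie.readyState <> 4
        If Now > deadline Then Exit Function   ' returns False on timeout
        DoEvents
    Loop
    WaitForPage = True
End Function

' Print link text from the top-level document and from each accessible frame.
Public Sub PrintLinksInAllFrames(ByVal ie As Object)
    Dim doc As Object, frameDoc As Object, lnk As Object
    Dim i As Long

    If Not WaitForPage(ie) Then Exit Sub
    Set doc = ie.document              ' re-read the document after every navigation

    For Each lnk In doc.Links
        Debug.Print "[top] " & lnk.innerText
    Next lnk

    For i = 0 To doc.frames.Length - 1
        Set frameDoc = Nothing
        On Error Resume Next           ' cross-origin frames raise "Access is denied"
        Set frameDoc = doc.frames(i).document
        On Error GoTo 0
        If Not frameDoc Is Nothing Then
            For Each lnk In frameDoc.Links
                Debug.Print "[frame " & i & "] " & lnk.innerText
            Next lnk
        End If
    Next i
End Sub
```

Calling PrintLinksInAllFrames ie after each click shows which document, top-level or frame, actually holds the links you are looking for.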

Related

How to access HTML elements in ASPX menu page?

I am trying to submit form details. I am unable to share HTML details so tried to explain below.
I have a menu control page https://www.abcmenu.aspx.
This page calls the URL https://www.abc-employee.aspx using javascript:void(0), which brings up a form in the same page when I click on the Employee menu item.
However, the page does not refresh nor does another page load, and the URL in the address bar remains unchanged.
Here is a sample view of the website:
I need to fill the form details and hit submit button.
The code below gives a run-time error stating "Object required".
Set htmldoc = ie.document
Dim emp As MSHTML.IHTMLInputElement
Set emp = htmldoc.getElementById("fld_emp")
emp.Value = 357690
Dim subm As MSHTML.IHTMLElement
Set subm = htmldoc.getElementById("btnk_sub")
subm.Click
I tried to debug.print all elements under form tag, but it does not return the elements in the form.
When I execute the code, it returns only the main menu page details and not form elements.
Here is the code I tried to print the HTML elements:
Dim htmla As MSHTML.IHTMLElement
Dim htmlas As MSHTML.IHTMLElementCollection
For Each htmla In htmlas
Debug.Print htmla.innerText
Next htmla
Why am I not able to access HTML elements inside form that was opened in the main menu page?
If you are trying to access iframe elements on an ASP.NET web page using VBA, the example below may help you solve your issue.
Sub demo()
    Dim URL As String
    Dim IE As Object

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    URL = "https://example.com"
    IE.navigate URL

    ' Wait until the page has finished loading
    Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop

    ' Read the value of the "fname" field inside the first iframe
    Debug.Print IE.document.getElementsByTagName("iframe")(0).contentDocument.getElementsByName("fname")(0).Value

    Set IE = Nothing
End Sub
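Applied to the form from the question, the same idea might look like the sketch below. It assumes that the form is rendered in the first iframe on the menu page, that the iframe comes from the same origin as the menu page, and that the field and button keep the ids fld_emp and btnk_sub from the question. FillEmployeeForm is a hypothetical name.

```vba
Sub FillEmployeeForm(ByVal ie As Object)
    Dim iframes As Object
    Dim frameDoc As Object
    Dim emp As Object, subm As Object

    Set iframes = ie.document.getElementsByTagName("iframe")
    If iframes.Length = 0 Then
        Debug.Print "No iframe found; the form may be injected some other way."
        Exit Sub
    End If

    ' Assumption: the form lives in the first, same-origin iframe
    Set frameDoc = iframes.Item(0).contentDocument

    Set emp = frameDoc.getElementById("fld_emp")
    Set subm = frameDoc.getElementById("btnk_sub")
    If emp Is Nothing Or subm Is Nothing Then
        Debug.Print "Form elements not found in the iframe document."
        Exit Sub
    End If

    emp.Value = "357690"
    subm.Click
End Sub
```

Checking for Nothing before using each element avoids the "Object required" error and tells you which lookup failed.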

Can't find any logic how the faulty script works flawlessly?

I've written a script in VBA using IE to get the names of different hotels from a webpage. The hotel names span multiple pages through pagination.
My scraper keeps clicking the next button successfully, parsing the titles from each page, until there is nothing left to click. The parser does its job perfectly. All I wish to know is the simple logic behind what I ask below.
My question: how does the content of each page come through correctly even though I didn't use a Set Htmldoc = IE.document line right after the .click? When a click is performed, the scraper moves to a new page with new content. How does Htmldoc get updated with the new content of each page when my Do loop comes after the With IE block?
This is the script:
Sub GetTitles()
Const Url As String = "https://www.tripadvisor.com/Hotels-g147237-Caribbean-Hotels.html"
Dim IE As New InternetExplorer, Htmldoc As HTMLDocument, post As Object, R&
With IE
.Visible = True
.navigate Url
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set Htmldoc = .document
End With
Do
For Each post In Htmldoc.getElementsByClassName("listing") ''how this "Htmldoc" gets updated
With post.getElementsByClassName("property_title")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
End With
Next post
If Not Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']") Is Nothing Then
Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click
Application.Wait Now + TimeValue("00:00:05")
''I didn't use anything like "Set Htmldoc = IE.document" but it still works flawlessly
Else:
Exit Do
End If
Loop
IE.Quit
End Sub
The script is not faulty. However, using it without fully understanding it is certainly troublesome.
When you run Set Htmldoc = .document, you store a reference to IE's document for later use.
When you run Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click, JavaScript comes into play and updates the content of the page (i.e. the document).
You may believe that the document has changed, but it is only being updated. In reality, no navigation happens at all.
Add the following to see that the page/document stays the same and only its content changes.
'/ Url before Next button click
Debug.Print "Before Click " & Htmldoc.Url
Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click
'/ Url after Next button click
Debug.Print "After Click " & Htmldoc.Url
Since the document, once set, remains the same and the updated content has the same layout/DOM (which is how most sites are built; most likely all the pages are rendered from one template), your code works perfectly fine. The net result is that, for your Do loop, nothing changed.
Set Htmldoc = .document
gets a pointer to the DOM. When the DOM changes, Htmldoc still points at it and sees the new content, so there is no need for another Set Htmldoc.
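Because the same document is updated in place, the fixed five-second pause could in principle be replaced by polling that document until its content changes. The sketch below is one way to do that, under assumptions: FirstTitle and WaitForNewContent are hypothetical helpers, the property_title class comes from the question's script, and the 15-second timeout is arbitrary.

```vba
' Hypothetical helper: text of the first hotel title currently in the document.
Private Function FirstTitle(ByVal doc As Object) As String
    Dim nodes As Object
    Set nodes = doc.getElementsByClassName("property_title")
    If nodes.Length > 0 Then FirstTitle = nodes.Item(0).innerText
End Function

' Hypothetical helper: poll until the first title differs from oldTitle.
' Returns False if nothing changed within timeoutSeconds.
Private Function WaitForNewContent(ByVal doc As Object, ByVal oldTitle As String, _
                                   Optional ByVal timeoutSeconds As Double = 15) As Boolean
    Dim deadline As Date, current As String
    deadline = Now + timeoutSeconds / 86400#
    Do
        DoEvents
        current = FirstTitle(doc)
        If Len(current) > 0 And current <> oldTitle Then
            WaitForNewContent = True
            Exit Function
        End If
    Loop While Now < deadline
End Function

' Possible use inside the Do loop of GetTitles, in place of Application.Wait:
'     oldTitle = FirstTitle(Htmldoc)
'     Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click
'     If Not WaitForNewContent(Htmldoc, oldTitle) Then Exit Do
```

Note that the helpers receive the same Htmldoc reference every time; no new Set is needed, which is exactly the behavior described above.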

Can't click on some dots to scrape information

I've written a script in vba in combination with IE to click on some dots available on a map in a web page. When a dot is clicked, a small box containing relevant information pops up.
I would like to parse the content of each box. The content of that box can be found using the class name contentPane. However, the main concern here is to make each box appear by clicking on those dots. When a box shows up, it looks like the one in the image below.
This is the script I've tried so far:
Sub HitDotOnAMap()
Const Url As String = "https://www.arcgis.com/apps/Embed/index.html?webmap=4712740e6d6747d18cffc6a5fa5988f8&extent=-141.1354,10.7295,-49.7292,57.6712&zoom=true&scale=true&search=true&searchextent=true&details=true&legend=true&active_panel=details&basemap_gallery=true&disable_scroll=true&theme=light"
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim post As Object, I&
With IE
.Visible = True
.navigate Url
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
Application.Wait Now + TimeValue("00:0:07") ''the following line zooms in the slider
HTML.querySelector("#mapDiv_zoom_slider .esriSimpleSliderIncrementButton").Click
Application.Wait Now + TimeValue("00:0:04")
With HTML.querySelectorAll("[id^='NWQMC_VM_directory_'] circle")
For I = 0 To .Length - 1
.item(I).Focus
.item(I).Click
Application.Wait Now + TimeValue("00:0:03")
Set post = HTML.querySelector(".contentPane")
Debug.Print post.innerText
HTML.querySelector("[class$='close']").Click
Next I
End With
End Sub
When I execute the above script, it appears to run smoothly, but nothing happens (I mean, no clicking) and it doesn't throw any error either. Finally it quits the browser gracefully.
This is what a box with information looks like when a dot gets clicked.
Although I've used hardcoded delays in my script, they can be fixed later, as soon as the macro starts working.
Question: How can I click each of the dots on that map and collect the relevant information from the popped-up box? I only expect a solution that uses Internet Explorer.
The data are not the main concern here. I would like to know how IE works in such cases so that I can deal with them in the future. I'm not looking for any solution other than IE.
There is no need to click on each dot. A JSON file has all the details, and you can extract whatever you require from it.
Installation of JsonConverter
Download the latest release
Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
Add Dictionary reference/class
For Windows-only, include a reference to "Microsoft Scripting Runtime"
For Windows and Mac, include VBA-Dictionary
Code:
Sub HitDotOnAMap()
    Const Url As String = "https://www.arcgis.com/sharing/rest/content/items/4712740e6d6747d18cffc6a5fa5988f8/data?f=json"
    Dim IE As New InternetExplorer
    Dim data As String, colObj As Object
    Dim JSON As Object, Item As Object, j As Long

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        ' Take the response text and strip the <pre> wrapper around it
        data = .document.body.innerHTML
        data = Replace(Replace(data, "<pre>", ""), "</pre>", "")
    End With

    ' Parse the JSON text (requires the JsonConverter module)
    Set JSON = JsonConverter.ParseJson(data)
    Set colObj = JSON("operationalLayers")(1)("featureCollection")("layers")(1)("featureSet")

    ' Print attribute name/value pairs of every feature, starting from the second attribute
    For Each Item In colObj("features")
        For j = 1 To Item("attributes").Count - 1
            Debug.Print Item("attributes").Keys()(j), Item("attributes").Items()(j)
        Next j
    Next Item
End Sub

VBA: New (or Redefined?) Internet Explorer Object In Same Window

I'm creating a macro that will navigate to a login page, log in, navigate to another page and scrape data, and then loop through 100-200 more pages scraping data from each.
So far I've gotten it to the point of logging in, navigating to the second page, and scraping the first bit of data. But so far the only way I can get it to work is if the second page opens in a new window. Since I ultimately have to go through 100-200 pages, I'd rather not use a new window for each one.
For this example let's just say that the only data I'm trying to scrape is the page title.
Option Explicit
Sub admin_scraper()
Dim ie As Object
Dim doc As Object
' Get through log in page
Set ie = CreateObject("internetexplorer.application")
With ie
.navigate "http://example.com/login" 'Page title is "Page 1"
.Visible = True
End With
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
ie.document.forms(0).all("Username").Value = "user"
ie.document.forms(0).all("Password").Value = "abc123"
ie.document.forms(0).submit
'Navigate to second page and pull page title
Set ie = CreateObject("internetexplorer.application") '***Line in question
With ie
.navigate "http://example.com/Products" 'Page title is "Page 2"
.Visible = True
End With
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set doc = ie.document
Debug.Print doc.Title
End Sub
*** If I include this line, the code works as expected (the console prints "Page 2"), but it opens the second page in a new window. If I don't include this line, the second page opens smoothly in the same window, but the console prints "Page 1."
Is there any way to open each new page in the same window while making sure the data is pulled from the new page? Or, if it has to be a new window, is there any way to close the old window automatically each time?
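One possible explanation, not confirmed anywhere in this thread, is that the second wait loop finishes before the new navigation has really started, so ie.document still refers to the login page. A minimal sketch of reusing one window under that assumption follows. WaitForUrl is a hypothetical helper that also checks the address IE reports, the two-second pause after the submit is a guess, and the URLs are the placeholders from the question.

```vba
' Hypothetical helper: wait until IE is idle and its address contains urlPart.
Private Function WaitForUrl(ByVal ie As Object, ByVal urlPart As String, _
                            Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#
    Do
        DoEvents
        If Not ie.Busy And ie.readyState = 4 Then
            If InStr(1, ie.LocationURL, urlPart, vbTextCompare) > 0 Then
                WaitForUrl = True
                Exit Function
            End If
        End If
    Loop While Now < deadline
End Function

Sub admin_scraper_same_window()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    ie.navigate "http://example.com/login"
    If Not WaitForUrl(ie, "/login") Then Exit Sub

    ie.document.forms(0).all("Username").Value = "user"
    ie.document.forms(0).all("Password").Value = "abc123"
    ie.document.forms(0).submit

    ' Assumption: give the login request time to start, then let it finish
    Application.Wait Now + TimeValue("00:00:02")
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop

    ' Reuse the same window for the next page
    ie.navigate "http://example.com/Products"
    If Not WaitForUrl(ie, "/Products") Then Exit Sub

    Debug.Print ie.document.Title
End Sub
```

For 100-200 pages, the navigate / WaitForUrl / read pair could sit inside a loop over the page URLs, with a single ie.Quit at the end.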

VBA skipping code directly after submitting form in IE

Currently I have 2 pieces of code that work separately, but when used together they don't work properly.
The first code asks the user to input information which is stored. It then navigates to the correct webpage where it uses the stored user input information to navigate via filling and submitting a form. It arrives at the correct place.
The second code uses a specific URL via ie.navigate "insert url here" to navigate to the same place as the first code. It then scrapes URL data and stores it in a newly created sheet. It does this correctly.
When merging them I replace the navigation segment of the second code with the first code, but then it only stores the first 5 of 60 URLs, as if the page had not fully loaded before the scraping started. It seems to skip the code directly after ie.document.forms(0).submit, which is supposed to wait for the page to load before moving on to the scraping.
Extra info: the button wasn't defined, so I cannot just click it; I had to use ie.document.forms(0).submit instead.
Summary of what I want the code to do:
request user input
store user input
open ie
navigate to page
enter user input into search field
select correct search category from listbox
submit form
'problem happens here
scrape url data
store url data in specific excel worksheet
The merged code:
Sub extractTablesData()
Dim ie As Object, obj As Object
Dim Var_input As String
Dim elemCollection As Object
Dim html As HTMLDocument
Dim Link As Object
Dim erow As Long
' create new sheet to store info
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
Set ie = CreateObject("InternetExplorer.Application")
Var_input = InputBox("Enter info")
With ie
.Visible = True
.navigate ("URL to the webpage")
While ie.readyState <> 4
DoEvents
Wend
'Input Term 1 into input box
ie.document.getElementById("trm1").Value = Var_input
'accessing the Field 1 ListBox
For Each obj In ie.document.all.Item("FIELD1").Options
If obj.Value = "value in listbox" Then
obj.Selected = True
End If
Next obj
' button undefined - using this to submit form
ie.document.forms(0).submit
'----------------------------------------------------------------
'seems to skip this part all together when merged
'Wait until IE is done loading page
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
'----------------------------------------------------------------
Set html = ie.document
Set elemCollection = html.getElementsByTagName("a")
For Each Link In elemCollection
erow = Worksheets("HL").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Application.StatusBar = ""
Application.ScreenUpdating = True
End With
End Sub
I've been stuck for quite some time on this and haven't found any solutions on my own so I'm reaching out. Any help will be greatly appreciated!
You mentioned you think the website might not be fully loaded. This is a common problem, because many elements on a webpage are loaded dynamically. The easiest way to handle this is to insert the line:
Application.Wait Now + TimeValue("00:00:02")
This forces the code to pause for an additional 2 seconds. Insert this line below the code that waits for the page to load, and it will give Internet Explorer a chance to catch up. Depending on the website and the reliability of your connection to it, I recommend adjusting this value up to about 5 seconds.
Most websites seem to require additional waiting like this, so it is a handy line to remember when things don't work as expected. Hope this helps.
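A fixed pause can be too short on a slow day and wastefully long on a fast one. As an alternative sketch, the code could wait until the number of links on the page stops changing. WaitForStableLinks is a hypothetical helper, and the one-second quiet period and 20-second timeout are assumptions to tune for the site in question.

```vba
' Hypothetical helper: wait until the count of <a> elements has stayed the
' same for quietSeconds. Returns False if that never happens within the timeout.
Private Function WaitForStableLinks(ByVal ie As Object, _
                                    Optional ByVal quietSeconds As Double = 1, _
                                    Optional ByVal timeoutSeconds As Double = 20) As Boolean
    Dim deadline As Date, lastChange As Date
    Dim lastCount As Long, currentCount As Long

    deadline = Now + timeoutSeconds / 86400#
    lastCount = -1
    lastChange = Now
    Do
        DoEvents
        currentCount = -1
        On Error Resume Next           ' the document may be unavailable mid-navigation
        currentCount = ie.document.getElementsByTagName("a").Length
        On Error GoTo 0

        If currentCount <> lastCount Then
            lastCount = currentCount
            lastChange = Now
        ElseIf currentCount > 0 And (Now - lastChange) * 86400# >= quietSeconds Then
            WaitForStableLinks = True
            Exit Function
        End If
    Loop While Now < deadline
End Function
```

In the merged macro this could be called right after the ReadyState loop, for example If WaitForStableLinks(ie) Then Set html = ie.document, before the links are written to the sheet.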
I solved this by using a completely different method. I used a query table with strings to go where I wanted.
Sub ExtractTableData()
Dim This_input As String
Const prefix As String = "Beginning of url"
Const postfix As String = "end of url"
Dim qt As QueryTable
Dim ws As Worksheet
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
This_input = InputBox("enter key info to go to specific url")
Set ws = ActiveSheet
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & prefix & This_input & postfix, _
Destination:=Worksheets("HL").Range("A1"))
qt.RefreshOnFileOpen = True
qt.WebSelectionType = xlSpecifiedTables
'qt.webtables is key to getting the specific table on the page
qt.WebTables = 2
qt.Refresh BackgroundQuery:=False
End Sub