VBA: New (or Redefined?) Internet Explorer Object In Same Window

I'm creating a macro that will navigate to a login page, log in, navigate to another page and scrape data, and then loop through 100-200 more pages scraping data from each.
So far I've gotten it to the point of logging in, navigating to the second page, and scraping the first bit of data. But so far the only way I can get it to work is if the second page opens in a new window. Since I ultimately have to go through 100-200 pages, I'd rather not use a new window for each one.
For this example let's just say that the only data I'm trying to scrape is the page title.
Option Explicit
Sub admin_scraper()
Dim ie As Object
Dim doc As Object
' Get through log in page
Set ie = CreateObject("internetexplorer.application")
With ie
.navigate "http://example.com/login" 'Page title is "Page 1"
.Visible = True
End With
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
ie.document.forms(0).all("Username").Value = "user"
ie.document.forms(0).all("Password").Value = "abc123"
ie.document.forms(0).submit
'Navigate to second page and pull page title
Set ie = CreateObject("internetexplorer.application") '***Line in question
With ie
.navigate "http://example.com/Products" 'Page title is "Page 2"
.Visible = True
End With
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set doc = ie.document
Debug.Print doc.Title
End Sub
*** If I include this line the code works as expected (console prints "Page 2"), but it opens the second page in a new window. If I don't include this line, the second page opens smoothly in the same window, but the console prints "Page 1."
Any way I can get it to open each new page in the same window while making sure it pulls data from the new page? Or if it has to be in a new window, any way to automatically close the old window each time?
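One possible approach, sketched here under assumptions rather than offered as a confirmed fix: keep a single IE object for the whole run and wait for the login submit to finish before navigating again. Without that wait, the next navigate can start while the login request is still in flight. WaitForLoad and admin_scraper_same_window are hypothetical names, and the urls array is a placeholder for the real list of 100-200 pages.

```vba
Option Explicit

' Hypothetical helper: loop until the browser reports that loading has finished.
Private Sub WaitForLoad(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

Sub admin_scraper_same_window()
    Dim ie As Object
    Dim urls As Variant
    Dim i As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    ' Log in once, in the only window this macro creates.
    ie.navigate "http://example.com/login"
    WaitForLoad ie
    With ie.document.forms(0)
        .all("Username").Value = "user"
        .all("Password").Value = "abc123"
        .submit
    End With
    WaitForLoad ie   ' let the login request finish before navigating away

    ' Placeholder list; the real macro would hold every page to scrape.
    urls = Array("http://example.com/Products")
    For i = LBound(urls) To UBound(urls)
        ie.navigate urls(i)
        WaitForLoad ie
        Debug.Print ie.document.Title   ' read the document only after the wait
    Next i

    ie.Quit   ' close the single window when done
End Sub
```

If the wait after submit returns too early (Busy may not be set the instant submit is called), a short Application.Wait before it may help. If separate windows turn out to be unavoidable, calling ie.Quit on the old object before creating the new one closes the previous window.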

Related

Can't find any logic how the faulty script works flawlessly?

I've written a script in vba using IE to get the titles of different hotel names from a webpage. The hotel names traverse multiple pages through pagination.
My scraper keeps clicking the next button successfully, parsing the titles from each page until there is nothing left to click. The parser does its job perfectly. All I want to understand is the logic behind the behavior described below.
My question: how does the content of each page come through correctly even though I didn't use a Set Htmldoc = IE.document line just after the .click? When the click fires, the scraper goes to a new page with new content. How does Htmldoc get updated with the new content of each page when my Do loop comes after the With IE block?
This is the script:
Sub GetTitles()
Const Url As String = "https://www.tripadvisor.com/Hotels-g147237-Caribbean-Hotels.html"
Dim IE As New InternetExplorer, Htmldoc As HTMLDocument, post As Object, R&
With IE
.Visible = True
.navigate Url
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set Htmldoc = .document
End With
Do
For Each post In Htmldoc.getElementsByClassName("listing") ''how this "Htmldoc" gets updated
With post.getElementsByClassName("property_title")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).innerText
End With
Next post
If Not Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']") Is Nothing Then
Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click
Application.Wait Now + TimeValue("00:00:05")
''I didn't use anything like "Set Htmldoc = IE.document" but it still works flawlessly
Else:
Exit Do
End If
Loop
IE.Quit
End Sub
The script is not faulty. Using it without fully understanding it, though, is bound to cause trouble.
When you do Set Htmldoc = .document, you are storing a reference to IE's document for later use.
When you do Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click, JavaScript comes into play and updates the content of the page (i.e. the document).
You may believe that the document has changed, but it is only being updated. In reality, no navigation happens at all.
Add the following and see how the page/document remains the same while only the content changes.
'/ Url before Next button click
Debug.Print "Before Click " & Htmldoc.Url
Htmldoc.querySelector(".standard_pagination span[onclick*='pagination_next']").Click
'/ Url after Next button click
Debug.Print "After Click " & Htmldoc.Url
Because the document reference, once set, stays the same and the updated content has the same layout/DOM (most likely every page is rendered from one template), your code works perfectly fine. As far as your Do loop is concerned, nothing has changed.
Set Htmldoc = .document
gets a pointer to the DOM. When the DOM changes, Htmldoc points at the new content, so there is no need for another Set Htmldoc.
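The same reference behavior can be seen without a browser. This is only an analogy, with a Collection standing in for the document; ReferenceDemo and the variable names are made up for illustration.

```vba
Option Explicit

Sub ReferenceDemo()
    Dim original As Collection
    Dim held As Collection

    Set original = New Collection
    Set held = original              ' copies the reference, not the contents

    ' Change the object through the first variable only.
    original.Add "page 1 listing"
    original.Add "page 2 listing"

    Debug.Print held.Count           ' prints 2: held sees the later additions
    Debug.Print held Is original     ' prints True: both names refer to one object
End Sub
```

Set never takes a snapshot; it makes a second name for the same object, which is why Htmldoc keeps reflecting whatever the page script does to the DOM.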

How to pause Excel VBA and wait for user interaction before loop

I have a VERY basic script I am using with an excel spreadsheet to populate a form. It inserts data into the fields then waits for me to click the submit button on each page. Then it waits for the page to load and fills in the next set of fields. I click submit, next page loads, etc. Finally, on the last two pages I have to do some manual input that I can't do with the spreadsheet. Code thus far is shown below.
My question is, at the end of what I have there now, how can I make the system wait for me to fill out that last page, then once I submit it realize that it has been submitted and loop back to the beginning, incrementing the row on the spreadsheet so that we can start over and do the whole thing again for the next student?
As you will probably be able to tell I am not a programmer, just a music teacher who does not savor the idea of filling out these forms manually for all 200 of my students and got the majority of the code you see from a tutorial.
Function FillInternetForm()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
'create new instance of IE. use reference to return current open IE if
'you want to use open IE window. Easiest way I know of is via title bar.
IE.Navigate "https://account.makemusic.com/Account/Create/?ReturnUrl=/OpenId/VerifyGradebookRequest"
'go to web page listed inside quotes
IE.Visible = True
While IE.busy
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.All("BirthMonth").Value = "1"
IE.Document.All("BirthYear").Value = "2000"
IE.Document.All("Email").Value = ThisWorkbook.Sheets("queryRNstudents").Range("f2")
IE.Document.All("Password").Value = ThisWorkbook.Sheets("queryRNstudents").Range("e2")
IE.Document.All("PasswordConfirm").Value = ThisWorkbook.Sheets("queryRNstudents").Range("e2")
IE.Document.All("Country").Value = "USA"
IE.Document.All("responseButtonsDiv").Click
newHour = Hour(Now())
newMinute = Minute(Now())
newSecond = Second(Now()) + 3
waitTime = TimeSerial(newHour, newMinute, newSecond)
Application.Wait waitTime
IE.Document.All("FirstName").Value = ThisWorkbook.Sheets("queryRNstudents").Range("a2")
IE.Document.All("LastName").Value = ThisWorkbook.Sheets("queryRNstudents").Range("b2")
IE.Document.All("Address1").Value = "123 Nowhere St"
IE.Document.All("City").Value = "Des Moines"
IE.Document.All("StateProvince").Value = "IA"
IE.Document.All("ZipPostalCode").Value = "50318"
End Function
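I would use IE's events, more specifically the form's events, along these lines. This needs references to Microsoft Internet Controls (SHDocVw) and the Microsoft HTML Object Library (MSHTML), and because of the WithEvents declaration the code has to live in a class or other object module (a sheet module, ThisWorkbook, or a UserForm) rather than a standard module.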
Private WithEvents IEForm As MSHTML.HTMLFormElement

Public Sub InternetExplorerTest()
    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = 1
    ie.navigate "http://stackoverflow.com/questions/tagged/vba"
    While ie.readyState <> READYSTATE_COMPLETE Or ie.Busy
        DoEvents
    Wend
    Set doc = ie.document
    Set IEForm = doc.forms(0)
End Sub

Private Function IEForm_onsubmit() As Boolean
    MsgBox "Form Submitted"
End Function

Interacting with IE tabs once opened

What I'm trying to do here is automate the opening of several helpdesk tickets simultaneously in separate IE tabs. I create a list of ticket numbers in Excel and then loop through the ticket numbers, opening each one.
My code seems to work fine when I open each one in a separate IE instance, but since I've tried to open them in separate tabs of one IE instance, I get an error on the second loop. Here is what I have so far:
Set Tickets = Sheet5.Range("a1", Range("a1").End(xlDown))
Set ie = New InternetExplorerMedium
ie.Visible = 1
apiShowWindow ie.hwnd, SW_MAXIMIZE
For Each Ticket In Tickets
If Ticket <> "" And Not Ticket Like "IM*" And Not Ticket Like "ARS*" And Not Ticket Like "C*" Then
'Load Mantis Page
If Tabbed = False Then
ie.Navigate "http://URL"
Else:
ie.Navigate "http://URL", CLng(2048)
End If
Do
DoEvents
Loop Until ie.ReadyState = 4
'LoginCheck
Set LoginExists = ie.document.getElementById("username")
If LoginExists Is Nothing Then
GoTo SearchForTicket
Else: GoTo Login
End If
Login:
Call ie.document.getElementById("username").SetAttribute("value", "xx")
Call ie.document.getElementById("password").SetAttribute("value", "xx")
ie.document.getElementById("login_form").Submit
Do
DoEvents
Loop Until ie.ReadyState = 3
GoTo SearchForTicket
'Search for Mantis ticket
SearchForTicket:
Application.Wait (Now + TimeValue("0:00:03"))
ie.document.All("bug_id").Value = Ticket
Set AllButtons = ie.document.getElementsByTagName("input")
For Each Button In AllButtons
If Button.Value = "Jump" Then
Button.Click
Exit For
End If
Next
End If
Tabbed = True
Next
It works the first time around, and opens IE, navigates to the page and searches for the ticket. The second time around, it opens the new tab and navigates to the page, but when it tries to search for the second ticket, I get an error saying:
"Object doesn't support this property or method"
On line:
ie.document.All("bug_id").Value = Ticket
I've been searching for an answer with no luck so far. Any help would be appreciated.
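A likely cause, stated as an assumption: after Navigate with the new-tab flag, the ie variable still refers to the first tab, so ie.document is not the page that just opened. A common workaround is to look the new tab up through the Shell.Application window list and re-point ie at it. FindIETab is a hypothetical helper, and matching on the URL prefix is an assumption about how the tabs can be told apart.

```vba
Option Explicit

' Hypothetical helper: return the first IE window or tab whose URL starts
' with urlPrefix, or Nothing when no such tab is open.
Function FindIETab(ByVal urlPrefix As String) As Object
    Dim wnd As Object

    For Each wnd In CreateObject("Shell.Application").Windows
        If Not wnd Is Nothing Then
            ' The list also contains File Explorer windows, so keep only IE.
            If InStr(1, wnd.FullName, "iexplore.exe", vbTextCompare) > 0 Then
                If InStr(1, wnd.LocationURL, urlPrefix, vbTextCompare) = 1 Then
                    Set FindIETab = wnd
                    Exit Function
                End If
            End If
        End If
    Next wnd
End Function
```

After the tabbed Navigate call and a short wait, something like Set ie = FindIETab("http://URL") would point ie at the new tab before the login check runs. If several tabs share the same URL, this returns whichever one the list yields first, so another criterion (the page title, for example) may be needed.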

How to automate a dynamically changing web page using Excel VBA?

I have been trying to automate a web page for two weeks, but I cannot get past the third page.
First I log in on the login page with my credentials, and then I click a link on the second page. Up to this point everything works. After that I need to click another link on the third page, which I am not able to do; I cannot even read the proper innerHTML of that page. The innerHTML differs from the page's source code. I took the id/name from the source code to get the element, but with no success. The problem I see is that the document object does not contain the inner details of the third page. When I tried to print the links on that page, it printed only the common links that appear on every page instead of all the links on that particular page. I guess this might happen because the page frame varies with the FromDate and ToDate. Pardon me if I'm wrong. Do we need to re-assign the "ie.document" object every time the web page navigates? I think it sticks with the document from when the page first loaded.
Below is my code:
Public Sub Test ()
Dim shellWins As ShellWindows
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Dim frm As HTMLFrameElement
Dim frms As HTMLElementCollection
Dim strSQL As String
Dim Login As Boolean
strSQL = "https://website.com"
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate strSQL
Do Until .ReadyState = 4: DoEvents: Loop
Set doc = ie.document
Dim link As Object
For Each link In doc.Links
'Debug.Print link.innerText
If link.innerText = "Click Here" Then
link.Click
Exit For
End If
Next link
Do While ie.Busy: DoEvents: Loop
Login_Pane:
For Each link In doc.Links
If link.innerText = "Leave & Attendance" Then
'Debug.Print doc.body.innerHTML
link.Click
Login = True
Exit For
End If
Next link
If Login <> True Then
Application.Wait (Now + TimeValue("00:00:02"))
Application.SendKeys "<USERNAME>", True
Application.SendKeys "{TAB}"
Application.Wait (Now + TimeValue("00:00:02"))
Application.SendKeys "<PASSWORD>", True
Application.SendKeys "{ENTER}"
GoTo Login_Pane
End If
Do While ie.Busy: DoEvents: Loop
For Each link In doc.Links
Debug.Print link.innerText
' The line above should print all the links on that page,
' but unfortunately the output does not match the source code.
' Instead it prints only the links that are common to all pages.
' This page has three frames.
Next link
End With
'IE.Quit
End Sub
I'm unable to post an image of that page to explain it better, but I'll try my best.
When I use the code below, I can only get the links from the upper portion of the page.
Set doc = ie.document
Dim text As Object
For Each text In doc.Links
Debug.Print text.innerText
Next text
Below that portion of the page there are FromDate and ToDate text boxes; entering dates there shows the details for that range (by default the page displays the details from the 1st of the current month to the current date).
This is the section where I'm not getting the links or other details. I think the details of this section are not stored in the ie.document object.
This particular section alone has a different URL from the main page.
Thanks.
A couple of thoughts:
For a page that loads dynamically, use Application.Wait (5 seconds or so) instead of Do Until .ReadyState = 4: DoEvents: Loop. The readyState loop only covers the initial page load; it does not wait for content that JavaScript adds afterwards.
Using SendKeys should always be avoided as it is not robust. Inspect the element with a DOM explorer to get its ID or name, and set the value through the DOM instead.
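A fixed wait can be too short on a slow day and needlessly long otherwise. As an alternative sketch, not part of the suggestion above, the macro can poll until a specific element shows up and give up after a timeout. WaitForElement is a hypothetical helper, and the element id passed to it has to come from inspecting the real page.

```vba
Option Explicit

' Hypothetical helper: poll the current document until an element with the
' given id exists, or until timeoutSeconds have passed.
' Returns the element, or Nothing on timeout.
Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Double
    Dim el As Object

    ' Timer counts seconds since midnight, so this breaks across midnight.
    deadline = Timer + timeoutSeconds
    Do
        DoEvents
        Set el = Nothing
        On Error Resume Next          ' the document may be unavailable mid-load
        Set el = ie.document.getElementById(elementId)
        On Error GoTo 0
        If Not el Is Nothing Then Exit Do
    Loop While Timer < deadline

    Set WaitForElement = el
End Function
```

A caller would test the result with Is Nothing before using it, for example to decide whether the login fields have appeared yet.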

VBA to open URL, wait for 5 seconds, then open another URL

I have a webpage I want to open (URL is generated out of cell values in Excel), but going to that URL directly requires logging in to open. But if I first open the mainpage of the server, I have automatic login, and then I'm able to open the first URL without the need for user/pass.
So far I have this code, which opens IE with both URLs at the same time, in the same window, but different tabs, exactly as I want it to do, except URL2 requires the login.
To get around the login, I would like to add a pause between the navigate and navigate2, so the first page can complete loading before the second URL opens.
Can anyone help me with this?
Edit:
I have tried the suggestions below, but it still requires the login. I have tried another solution, which is not optimal, but it works. It consists of two buttons running different macros: the first opens the main page to get the login, and the second opens the next URL.
I have written them as follows:
First one:
Sub login()
Dim IE As Object
Const navOpenInNewTab = &H800
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "http://www.example.mainpage.com"
End Sub
Second one:
Sub search()
Dim IE As Object
Const navOpenInNewTab = &H800
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate2 "http://www.example.mainpage.com" & Range("w1").Cells.Value & Range("W2").Cells.Value, CLng(navOpenInNewTab)
End Sub
Is it possible to have a third macro running the other two with a delay between them?
Original code:
Sub open_url()
Dim IE As Object
Const navOpenInNewTab = &H800
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate "http://www.example.mainpage.com"
'here I would like to add a pause for 5 seconds
IE.Navigate2 "http://www.example.mainpage.com" & Range("w1").Cells.Value & Range("W2").Cells.Value, CLng(navOpenInNewTab)
End Sub
Maybe it would be better to wait until the first page is fully loaded:
IE.Navigate "http://www.example.mainpage.com"
Do While IE.Busy Or IE.readyState <> 4: DoEvents: Loop ' 4 = READYSTATE_COMPLETE
IE.Navigate2 "http://www.example.mainpage.com" & Range("w1").Cells.Value & Range("W2").Cells.Value, CLng(navOpenInNewTab)
Note that the ReadyState enum READYSTATE_COMPLETE has a numerical value of 4. This is what you should use in the case of late binding (always the case in VBScript).
Do you mean something like:
Application.Wait(Now + TimeValue("00:00:05"))
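On the edit's question about a third macro: a minimal sketch, assuming the login and search subs from the question sit in the same module. login_then_search is a made-up name, and the 5 seconds are the figure from the question, not a tested value.

```vba
' Runs the two macros from the question back to back with a pause between.
Sub login_then_search()
    login                                          ' opens the main page
    Application.Wait Now + TimeValue("00:00:05")   ' give the auto-login time to finish
    search                                         ' opens the generated URL
End Sub
```

Each of the two subs still creates its own IE instance, exactly as the two-button version does, so this changes only the timing and not how the windows are opened.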