I'm using MS Access VBA with Selenium VBA and ChromeDriver.
I am able to get to a website, login, find the button to click to start download, and get that download to save to the location I want.
I want to track the progress.
I have opened a new Chrome window (tab), navigated to chrome://downloads/, and switched my driver to that window.
I've used the following code, which I found on Stack Overflow, in a loop to monitor the progress:
downloadPercentage = wd.ExecuteScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
It returns this error: ...Cannot read properties of null...
My download is present on the page. I can get the file name.
When I open the developer tools on the downloads page and enter the same query selector path into the console, I get the same thing: it returns null.
If I click the button for a second download (manually; I haven't tried doing it from VBA), that same code suddenly returns a value of 100. (It may catch a lower percentage, but the download is too fast for me to see that in debug mode.)
What would cause the selector to be missing for one download, but present for the next?
Here's the code in question:
Function getDownloadedFileName(wd As ChromeDriver) As String
    Dim startTime As Date
    'I'm using this method because opening a second ChromeDriver instance and going to the chrome://downloads/ page returns a clean slate (no downloads), and this method works for me.
    wd.ExecuteScript ("window.open()")
    wd.SwitchToNextWindow
    wd.Get "chrome://downloads/"
    startTime = Now()
    Dim downloadPercentage As Integer
    Do While DateDiff("s", startTime, Now()) < 120 And downloadPercentage < 100
        'This is the line that returns the JavaScript error ... Cannot read properties of null ...
        downloadPercentage = wd.ExecuteScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
        If (downloadPercentage = 100) Then
            getDownloadedFileName = wd.ExecuteScript("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
            Exit Do
        End If
    Loop
    wd.SwitchToPreviousWindow
End Function
I'll appreciate any help on this. Thanks!
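The JavaScript in the loop chains several lookups. The error means one link in that chain returned null when the script ran. The same path up to downloads-item still returns the file name, so the null is most likely the #progress lookup.

The sketch below does not explain why Chrome omits that element for the first download. It only makes the lookup null-safe, so the loop keeps polling instead of raising an error. It makes these assumptions:
- SeleniumBasic's ChromeDriver (ExecuteScript, Wait).
- A Chrome version that supports optional chaining (?.).
- The internal element names of chrome://downloads (downloads-manager, downloads-item, #progress), which Chrome can change between releases.

ToPercent and PollDownloadPercent are hypothetical helper names.

```vba
' Sketch (untested): null-safe polling of the chrome://downloads progress bar.
' Converts whatever ExecuteScript returns into a percentage; -1 means "not found".
Function ToPercent(ByVal v As Variant) As Integer
    ' A JavaScript null may come back as Null or Empty, so both are handled
    If IsNull(v) Or IsEmpty(v) Then
        ToPercent = -1
    ElseIf IsNumeric(v) Then
        ToPercent = CInt(v)
    Else
        ToPercent = -1
    End If
End Function

Function PollDownloadPercent(wd As ChromeDriver, Optional ByVal timeoutSec As Long = 120) As Integer
    Dim js As String, startTime As Date, pct As Integer
    ' Optional chaining (?.) yields undefined instead of throwing when a link is missing
    js = "const item = document.querySelector('downloads-manager')?.shadowRoot" & _
         "?.querySelector('#downloadsList downloads-item');" & _
         "const bar = item?.shadowRoot?.querySelector('#progress');" & _
         "return bar ? bar.value : null;"
    startTime = Now()
    pct = -1
    Do While DateDiff("s", startTime, Now()) < timeoutSec
        pct = ToPercent(wd.ExecuteScript(js))
        If pct >= 100 Then Exit Do
        wd.Wait 500   ' pause between polls instead of hammering the driver
    Loop
    PollDownloadPercent = pct
End Function
```

A result of -1 only says the progress element was not found. If the download finished before the first poll, the function will keep returning -1 until the timeout, so check the file link as well before concluding anything.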
I've created a script in VBA using IE to keep clicking the Load more hits button at the bottom of a webpage until no such button is left.
Here is how my script gets that button to appear. On the site's landing page there is a dropdown named Type. The script clicks Type to unfold the dropdown, then ticks the Corporate Bond checkbox among the options. Finally, it clicks the Apply button to populate the data. At that point, the Load more hits button becomes visible at the bottom.
My script follows almost all of the steps exactly as described above. The only thing I am struggling to solve is that the script seems to get stuck after clicking that button 3 or 4 times.
How can I fix my script so that it keeps clicking the Load more hits button until no such button is left?
Website: https://www.boerse-stuttgart.de/en/tools/product-search/bonds
What I've tried so far:
Sub ExhaustLoadMore()
    Dim IE As New InternetExplorer, I As Long
    Dim Html As HTMLDocument, post As Object, elem As Object
    Dim CheckBox As Object, btnSelect As Object
    With IE
        .Visible = True
        .navigate "https://www.boerse-stuttgart.de/en/tools/product-search/bonds"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set Html = .document
        'Wait for the loader to disappear (DoEvents keeps the host application responsive)
        Do: DoEvents: Loop Until Html.querySelectorAll(".bsg-loader-ring__item").Length = 0
        Html.querySelector("#bsg-filters-btn-bgs-filter-3").Click
        Do: Set CheckBox = Html.querySelector("#bsg-checkbox-3053"): DoEvents: Loop While CheckBox Is Nothing
        CheckBox.Click
        Set btnSelect = Html.querySelector("#bsg-filters-menu-bgs-filter-3 .bsg-btn__label")
        Do: DoEvents: Loop While btnSelect.innerText = "Close"
        btnSelect.Click
        Do: DoEvents: Loop Until Html.querySelectorAll(".bsg-loader-ring__item").Length = 0
        Do: Set elem = Html.querySelector(".bsg-table__tr td"): DoEvents: Loop While elem Is Nothing
        Do
            Set post = Html.querySelector(".bsg-searchlist__load-more button.bsg-btn--juna")
            If Not post Is Nothing Then
                post.ScrollIntoView
                post.Click
                Application.Wait Now + TimeValue("00:00:05")
            Else: Exit Do
            End If
        Loop
    End With
End Sub
I've also tried Selenium, but it seems to be much slower. It does keep clicking the load more button, but with a long wait between clicks, even though there is no hardcoded wait in it. For Selenium, I'd welcome any solution that might help reduce its execution time.
Sub ExhaustLoadMore()
    Const Url$ = "https://www.boerse-stuttgart.de/en/tools/product-search/bonds"
    Dim driver As New ChromeDriver, elem As Object, post As Object
    With driver
        .get Url
        Do: Loop Until .FindElementsByCss(".bsg-loader-ring__item").count = 0
        .FindElementByCss("#bsg-filters-btn-bgs-filter-3", timeOut:=10000).Click
        .FindElementByXPath("//label[contains(.,'Corporate Bond')]", timeOut:=10000).Click
        .FindElementByXPath("//*[@id='bsg-filters-menu-bgs-filter-3']//button", timeOut:=10000).Click
        Do: Loop Until .FindElementsByCss(".bsg-loader-ring__item").count = 0
        Set elem = .FindElementByCss(".bsg-table__tr td", timeOut:=10000)
        Do
            Set post = .FindElementByCss(".bsg-searchlist__load-more button.bsg-btn--juna", timeOut:=10000)
            If Not post Is Nothing Then
                post.ScrollIntoView
                .ExecuteScript "arguments[0].click();", post
                Do: Loop Until .FindElementsByCss("p.bsg-searchlist__info--load-more").count = 0
            Else: Exit Do
            End If
        Loop
        Stop
    End With
End Sub
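On the speed question, two things in the Selenium version above look worth changing.
- With SeleniumBasic, FindElementByCss(..., timeOut:=10000) raises an error when nothing matches, because its Raise argument defaults to True. The If Not post Is Nothing branch therefore never actually sees Nothing.
- The empty Do: Loop Until lines poll the driver as fast as they can.

The following is a minimal sketch, untested against this site. It passes Raise:=False and pauses between checks with driver.Wait. It also gives up if the loading hint lingers for 30 seconds, an arbitrary limit that may indicate the page itself has stalled. ClickLoadMoreUntilGone and HasTimedOut are hypothetical names; HasTimedOut also copes with VBA's Timer resetting to 0 at midnight.

```vba
' Sketch (untested): assumes SeleniumBasic; selectors are taken from the code above.
' Returns True when more than limitSec seconds have passed between two Timer readings.
Function HasTimedOut(ByVal startT As Single, ByVal nowT As Single, ByVal limitSec As Single) As Boolean
    Dim elapsed As Single
    elapsed = nowT - startT
    If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped past midnight
    HasTimedOut = elapsed > limitSec
End Function

Sub ClickLoadMoreUntilGone(driver As ChromeDriver)
    Const BTN_CSS As String = ".bsg-searchlist__load-more button.bsg-btn--juna"
    Const LOADING_CSS As String = "p.bsg-searchlist__info--load-more"
    Dim post As WebElement, t0 As Single
    Do
        ' Raise:=False returns Nothing instead of raising an error once the button is gone
        Set post = driver.FindElementByCss(BTN_CSS, timeout:=3000, Raise:=False)
        If post Is Nothing Then Exit Do
        driver.ExecuteScript "arguments[0].scrollIntoView(); arguments[0].click();", post
        ' Poll the "loading" hint with a short pause instead of a tight loop
        t0 = Timer
        Do While driver.FindElementsByCss(LOADING_CSS).Count > 0
            If HasTimedOut(t0, Timer, 30) Then Exit Sub   ' assume the page has stalled
            driver.Wait 250
        Loop
    Loop
End Sub
```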
I have studied your website a bit, and since I couldn't fit all of this into a single comment, I decided to post an answer (even though it doesn't provide a concrete solution, just an explanation and some tips).
The answer to your question
How can I fix my script so that it keeps clicking the Load more hits button until no such button is left?
Unfortunately, it's just not your fault. The website you are targeting works through WebSocket communication between the web client (your browser) and the web server that provides the prices you are trying to scrape.
Imagine it like this:
When you first load your webpage, the web socket is initialized and the first request is sent (Web client: "Hey server, give me the first X results", Web server: "Sure, here you go").
Every time you click the "Load more results" button, the web client (important: re-using the same WS connection) asks the web server for X new results.
So the communication keeps going for some time. At some point, out of your control, the web socket just dies. Just look at the JavaScript console while clicking the "Load more results" button: you will see the requests going through until, at some point, a null-reference error is raised.
If you click the last line of the stack before the exception, you will see that it's caused by the web socket.
The error speaks for itself: it cannot call .send() on null, meaning that _ws (the web socket) is gone.
From then on, you can forget about your website. When you click the "Load more results" button, the web client asks the web socket to deliver the new request to the web server. But the web socket is gone, so goodbye to the communication between the two and (unfortunately) goodbye to the rest of your data.
You can verify this by going a bit further up the stack, where you will find:
A message logged in the console saying "performSearch params ..." just before posting the new data request
The post of the new data request
A message logged in the console saying "performed search with result ..." just after posting the new data request
While the web socket is still alive, every time you click "Load more results" you will see both messages in the console (with other messages from the rest of their code printed in between).
However, after the web socket crashes for the first time, no matter how many times you click the button, you will only get the first message (the web client sends the request) and never the second (the request gets lost in the void).
Please note that this matches the behavior you observed in VBA:
the script seems to get stuck after clicking on that button 3/4 times.
It doesn't get stuck; your script actually keeps executing correctly. It's the website that times out.
I have tried to figure out why the web socket crashes, but with no luck. It seems to be a timeout (it happened much more often while I was debugging their JavaScript, so my breakpoints were causing the timeout), but I can't be sure it's the only cause. Since you don't control the process between the web client and the web server, all you can do is hope that it doesn't time out.
Also, I believe Selenium automatically sets longer timeouts (hence the long execution time), and this somehow makes the web socket more tolerant of timeouts.
The only way I found to restore the connection after the web socket crashes is to completely reload the web page and restart the process from scratch.
My suggestions
I think you could build an XHR request and send it through JavaScript, because their API (through which the web client/web socket delivers the request to the web server) is quite exposed in their front-end code.
If you open their file FinderAPI.js, you will see they've left the endpoints and API configuration hardcoded:
var FinderAPI = {
    store: null,
    state: null,
    finderEndpoint: '/api/v1/bsg/etp/finder/list',
    bidAskEndpoint: '/api/v1/prices/bidAsk/get',
    instrumentNameEndpoint: '/api/products/ProductTypeMapping/InstrumentNames',
    nameMappingEndpoint: '/api/v1/bsg/general/namemapping/list',
    apiConfig: false,
    initialize: function initialize(store, finderEndpoint) {
        var apiConfig = arguments.length > 2 && arguments[2] !== undefined ? arguments[2] : false;
        this.store = store;
        this.state = store.getState();
        this.apiConfig = apiConfig;
        this.finderEndpoint = finderEndpoint;
    },
    // ... (rest of the object omitted)
};
This means you know the URL to which you should send your POST request.
A request also requires a Bearer token to be validated by the server. Lucky you: they also forgot to protect their tokens, providing (GORSH) a GET endpoint that hands one out:
Endpoint: https://www.boerse-stuttgart.de/api/products
Response:
{"AuthenticationToken":"JgACxn2DfHceHL33uJhNj34qSnlTZu4+hAUACGc49UcjUhmLutN6sqcktr/T634vaPVcNzJ8sHBvKvWz","Host":"frontgate.mdgms.com"}
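Fetching that token from VBA is a plain GET. Below is a minimal sketch using MSXML2.XMLHTTP.

JsonStringField and GetBsgToken are hypothetical helpers. JsonStringField does naive string matching and only handles a flat response like the one above: no whitespace after the colon, and no escaped quotes or slashes in values. Swap in a real JSON parser if the format differs. The endpoint may also have been locked down since this was written.

```vba
' Sketch (untested against the live site).
' Naive extraction of "key":"value" from a flat JSON object; returns "" if not found.
Function JsonStringField(ByVal json As String, ByVal key As String) As String
    Dim marker As String, p As Long, q As Long
    marker = """" & key & """:"""              ' builds "key":"
    p = InStr(1, json, marker, vbBinaryCompare)
    If p = 0 Then Exit Function
    p = p + Len(marker)
    q = InStr(p, json, """", vbBinaryCompare)  ' closing quote of the value
    If q = 0 Then Exit Function
    JsonStringField = Mid$(json, p, q - p)
End Function

' Returns the AuthenticationToken; the Host value comes back through the ByRef argument.
Function GetBsgToken(ByRef host As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", "https://www.boerse-stuttgart.de/api/products", False
    http.send
    If http.Status <> 200 Then Err.Raise vbObjectError + 513, , "HTTP " & http.Status
    GetBsgToken = JsonStringField(http.responseText, "AuthenticationToken")
    host = JsonStringField(http.responseText, "Host")
End Function
```

Usage: Dim host As String, token As String, then token = GetBsgToken(host).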
You'll just have to play around with the website a bit to figure out what the body of your POST request is. Then create a new XmlHttpRequest and send those values in it to retrieve the prices directly in your VBA, without opening the webpage and scraping it robotically.
I suggest you start with a breakpoint in the file FinderAPI.js, line 66 (the line of code is this.post(this.finderEndpoint, params)); params should lead you to the body of the request. I remember you can print the object as a string with JSON.stringify(params).
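Once you have the token and have captured params at that breakpoint, the request itself could be sent from VBA along these lines. This is a hedged sketch. Only the endpoint path comes from FinderAPI.js; the rest are assumptions you need to confirm in the browser's Network tab:
- The base URL. It may be the site itself or the Host value from the token response; it is not clear which.
- The use of the standard Authorization: Bearer header.
- The JSON body.

JoinUrl and PostFinderRequest are hypothetical names.

```vba
' Sketch (untested). Assumptions: baseUrl, the "Authorization: Bearer" header and jsonBody
' must be confirmed in DevTools; only the endpoint path comes from FinderAPI.js.
' Joins a base URL and a path with exactly one slash between them.
Function JoinUrl(ByVal baseUrl As String, ByVal path As String) As String
    If Right$(baseUrl, 1) = "/" Then baseUrl = Left$(baseUrl, Len(baseUrl) - 1)
    If Left$(path, 1) <> "/" Then path = "/" & path
    JoinUrl = baseUrl & path
End Function

Function PostFinderRequest(ByVal baseUrl As String, ByVal token As String, _
                           ByVal jsonBody As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "POST", JoinUrl(baseUrl, "/api/v1/bsg/etp/finder/list"), False
    http.setRequestHeader "Content-Type", "application/json"
    http.setRequestHeader "Authorization", "Bearer " & token
    http.send jsonBody   ' body as printed by JSON.stringify(params)
    If http.Status <> 200 Then Err.Raise vbObjectError + 513, , "HTTP " & http.Status
    PostFinderRequest = http.responseText
End Function
```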
Also, please note that they use a pagination of 50 results at a time, even though their API supports up to 500. In other words, if you manage to swap the value 500 (instead of 50) into the pagination property sent to the API with the request, you will get 500 results at a time instead of 50. That reduces the time your code spends scraping the webpage by a factor of 10, in case you decide not to go deeper into the XHR solution.
Could you try to change
Do
Set post = Html.querySelector(".bsg-searchlist__load-more button.bsg-btn--juna")
If Not post Is Nothing Then
post.ScrollIntoView
post.Click
Application.Wait Now + TimeValue("00:00:05")
Else: Exit Do
End If
Loop
to:
Set post = Html.querySelector(".bsg-searchlist__load-more button.bsg-btn--juna")
While Not post Is Nothing
    Debug.Print "Clicking"
    post.ScrollIntoView
    post.Click
    Application.Wait Now + TimeValue("00:00:05")
    'Re-query the button; without this, the loop never ends once the button is gone
    Set post = Html.querySelector(".bsg-searchlist__load-more button.bsg-btn--juna")
Wend
Debug.Print "Exited Click"
(untested)
I am using Selenium WebDriver to load a specific feature of an application, a rich text editor (actually a custom release of CKEditor), and the code below works perfectly for that... except that I would like to release the Selenium objects (and the geckodriver.exe/marionette black cmd window) once the desired page has loaded. But any of the .Close(), .Quit() or .Dispose() methods wipes out the Firefox window as well...
Is there a way to dismiss Selenium WebDriver and keep Firefox running on its own?
Thank you very much
Private Sub LoadResource()
Dim FFD As New OpenQA.Selenium.Firefox.FirefoxDriver()
'Set timeout of 60 seconds for steps to complete successfully
Dim WDW As New OpenQA.Selenium.Support.UI.WebDriverWait(FFD, TimeSpan.FromSeconds(60))
'navigate to login page
FFD.Navigate.GoToUrl("https://www.myapplication.com/login")
'Wait until application loads main page (this means login was successful)
WDW.Until(Function() FFD.Url = "https://www.myapplication.com/")
'Load built-in rich text editor Rich text
FFD.Navigate.GoToUrl("https://www.myapplication.com/editor?document=1080199")
'Wait for successful loading of the editor page
WDW.Until(Function() FFD.Url = "https://www.myapplication.com/editor?document=1080199")
'That's all.
'here I'd like to release Firefox to keep running and get rid of WebDriver's objects and resources, if possible.
End Sub
This is based on Kirhgoph's comment and seems to work well:
Private Sub LoadResource()
    Dim FFD As New OpenQA.Selenium.Firefox.FirefoxDriver()
    'Grab the geckodriver process (ordering is not guaranteed; this assumes only one is running)
    Dim GDP As Process = Process.GetProcessesByName("geckodriver").Last
    Dim WDW As New OpenQA.Selenium.Support.UI.WebDriverWait(FFD, TimeSpan.FromSeconds(60))
    FFD.Navigate.GoToUrl("https://www.myapplication.com/login")
    WDW.Until(Function() FFD.Url = "https://www.myapplication.com/")
    FFD.Navigate.GoToUrl("https://www.myapplication.com/editor?document=1080199")
    WDW.Until(Function() FFD.Url = "https://www.myapplication.com/editor?document=1080199")
    'Close geckodriver's console window first; Firefox itself keeps running
    GDP.CloseMainWindow()
    GDP.WaitForExit()
    'geckodriver is already gone, so Quit() only releases the .NET-side objects
    FFD.Quit()
End Sub
I have VBA code in Excel that is supposed to log in to a website and download some files using Selenium. My code works with ChromeDriver, and I am trying to modify it to work with PhantomJSDriver so I can do something else while the program runs (it runs for ~45 minutes). The issue is that when I try to have Selenium click the login button, I get a timeout error:
Run-time error '101':
WebRequestTimeout:
No response from the server within 30000 seconds
The interesting thing is that after it times out, I can use the immediate window to take a screenshot and it's clear that the button was clicked and the browser has advanced to the next page.
Dim D As New PhantomJSDriver
Dim MyKeys As New Selenium.Keys
With D
    .ExecuteScript ("window.resizeTo(1920,1080)")
    .SendKeys MyKeys.Control, "0" 'Set zoom to 100% (causes errors if not 100%)
    .Get "LoginPage.com"
    .FindElementByName("username").SendKeys "UserName"
    .FindElementByXPath("/html/body/div[@class='centreContent']/form[@id='loginForm']/input[@id='passwordDummy']").Click
    .FindElementByXPath("/html/body/div[@class='centreContent']/form[@id='loginForm']/input[@id='password']").SendKeys "Password"
    .TakeScreenShot.SaveAs "C:\Users\110SidedHexagon\Downloads\Capture.png" '<---Takes screenshot of login screen with username and password filled in
    .FindElementByName("loginSubmitButton", 0.1).Click '<---Error occurs here
    '<-- Taking a screenshot from the Immediate window after the error breaks execution shows the login was successful
End With
It means that after the button is clicked, the newly loaded page doesn't reach a completed state within 30 seconds.
This could be due to a resource within the page that never finishes loading.
You could try to increase the server timeout:
Dim driver As New PhantomJSDriver
driver.Timeouts.Server = 60000 ' 60 seconds
driver.Get "https://..."
driver.FindElementByName("loginSubmitButton").Click
Or you could define a page-load timeout and skip the error:
Dim driver As New PhantomJSDriver
driver.Timeouts.PageLoad = 20000 ' 20 seconds
driver.Get "https://..."
On Error Resume Next
driver.FindElementByName("loginSubmitButton").Click
On Error Goto 0
To get the latest SeleniumBasic release, which works with the examples above:
https://github.com/florentbr/SeleniumBasic/releases/latest
We have an Excel list of URLs with a lot of parameters.
The problem is that the first time you follow a link, you get redirected to an ADFS login, which cuts off some of the parameters because of a maximum URL length.
My question: is there a way to tell Excel (via VBA or by default) to use an existing session?
I tried some shenanigans, for example via Chrome ("Find the Window handle for a Chrome Browser") or by taking over an existing IE window: http://www.mrexcel.com/forum/excel-questions/553580-visual-basic-applications-macro-already-open-ie-window.html While I do get an existing window, it always seems to get redirected and the URL cut. Is there any way to make this work?
Please try this and post feedback
Open Sheet1
In Column A, from row 2 create your list of URLS
Insert the ActiveX control Microsoft Web Browser (WebBrowser1)
Size the control to your needs
Insert a command button outside the bounds of the browser
Change the button's name to NextButton
Open Code Editor (Alt+F11)
In Sheet1 place the below code
Dim currentURLRow As Integer ''Sheet level variable
Sub NextButton_Click()
On Error Resume Next
Dim url As String
''VBA evaluates second expression even when the first of OR is true. So on error resume next helps here
If currentURLRow = 0 Or Trim(Cells(currentURLRow, 1)) = "" Then
''First time or loop back
currentURLRow = 2
Else
currentURLRow = currentURLRow + 1
End If
On Error GoTo 0 ''reset error so you know of any (good) errors
url = Cells(currentURLRow, 1)
''Sheet1.WebBrowser1.Silent = True ''Uncomment this if you are seeing a lot of script errors that you don't want to see
WebBrowser1.Navigate url
Debug.Print WebBrowser1.Document.body.InnerHTML ''' Here you can do magic if the urls you are navigating are serialisable to objects :)
End Sub
Now, the first time you navigate to the site, you should be prompted for your user name and password; when you click Next after that, your session should be saved.
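The On Error Resume Next in NextButton_Click is needed because VBA's Or does not short-circuit. When currentURLRow is 0, Cells(0, 1) is still evaluated and raises an error.

As far as I can tell, the wrap-around test also checks the current row rather than the next one. As a result, the code navigates to the first blank row once before looping back to row 2.

Below is a hedged alternative that avoids both. It computes the last filled row in column A and wraps explicitly. NextRow and NavigateNextUrl are hypothetical names; NextButton_Click could simply call NavigateNextUrl.

```vba
' Sketch: returns the next URL row, wrapping back to firstRow after lastRow.
Function NextRow(ByVal currentRow As Long, ByVal lastRow As Long, _
                 Optional ByVal firstRow As Long = 2) As Long
    If currentRow < firstRow Or currentRow >= lastRow Then
        NextRow = firstRow
    Else
        NextRow = currentRow + 1
    End If
End Function

Sub NavigateNextUrl()
    Dim lastRow As Long
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row   ' last non-empty cell in column A
    currentURLRow = NextRow(currentURLRow, lastRow)
    WebBrowser1.Navigate Cells(currentURLRow, 1).Value
End Sub
```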