rspec/capybara/selenium testing - visiting external URLs and checking the content type

I understand that in order for Capybara to visit external URLs, the driver has to be one that supports that (such as Selenium). But when Selenium is used, I can no longer check the content type via page.response_headers, since that is not supported by the Selenium driver. Is there any alternative to response_headers, or am I simply looking at the wrong set of tools?

I use the capybara-webkit driver and I can confirm that page.response_headers works for external sites.
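If no driver-level alternative works out, one workaround is to issue a direct HTTP request for the same URL outside the browser and inspect the Content-Type header yourself. The idea is language-agnostic; below it is sketched with the JDK's HTTP client rather than Ruby, and the URL and expected type are illustrative:

```java
// Driver-independent workaround: instead of asking the browser driver for
// response headers, fetch the URL with a plain HTTP client and inspect the
// Content-Type header yourself. The URL and media type here are illustrative.
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

public class ContentTypeCheck {
    // Strip any charset parameter and normalize,
    // e.g. "text/HTML; charset=utf-8" -> "text/html".
    static String mediaType(String contentTypeHeader) {
        return contentTypeHeader.split(";")[0].trim().toLowerCase(Locale.ROOT);
    }

    // The request you would actually send (via HttpClient.send) to obtain
    // the real response headers without involving the browser at all.
    static HttpRequest headRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        System.out.println(mediaType("application/pdf; charset=binary")); // application/pdf
        System.out.println(headRequest("https://example.com/report.pdf").method()); // HEAD
    }
}
```

The same two steps (HEAD request, then normalize the Content-Type value) translate directly to Ruby's Net::HTTP if you want to stay in the RSpec suite.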

Related

What is the difference between Selenium server and WebDriver (not RC, not Remote, not 4.0)?

Please note that I am not talking about Selenium RC, RemoteWebDriver, the old Selenium server, or the standalone server. I am also not talking about Selenium 4.0, so let's not get into that, because it would simply dodge the question.
When we say Selenium WebDriver, as I understand it, it is an integration of the Selenium code and the WebDriver project (WebDriver is also the name of the W3C WebDriver specification).
Here is my understanding of the difference so far:
Selenium = the library, which supports multiple programming languages. In each language the user creates instances of ChromeDriver / FirefoxDriver etc., which are client libraries that allow the user to write high-level code. These client libraries convert that code into API requests with JSON bodies containing the commands, which are sent to the Selenium server.
The API requests sent to the Selenium server are either converted there into the W3C WebDriver format, or maybe the client libraries already do that; I'm not sure which it is.
The Selenium server either converts the request into a format understood by the drivers (the W3C specification) or simply forwards it to the driver application that is running (the Selenium server is supposed to act like a reverse proxy, so I think it simply forwards it).
The drivers then interact with the browser. How do they do that? Well, the browsers come with built-in JS libraries to help drive automation, and the drivers are external applications that take REST API requests and can call those JS libraries based on the requests.
When we say WebDriver, it consists of two parts. One, the W3C specification. Two, it is also the name Selenium uses for its client: you are actually creating an instance of WebDriver (or RemoteWebDriver, to be more specific) while writing code. So WebDriver is more than just a W3C specification here; it is also the name of the Selenium client implementation. (I have no idea why they decided to keep the same name, which makes it so confusing; it's like creating a microservice called "REST API microservice".)
Qs A: Is my understanding of the concepts correct? I cannot get a clear answer anywhere, because all answers/discussions on the web simply dodge the question by quoting the official docs (which are pretty vaguely worded).
Qs B: Does the Selenium server modify the request in any way before sending it to the driver? If yes, can we say that in this case the Selenium server has become the client and the driver has become the server?
Qs C: I had read somewhere that the browser drivers also act like a RESTful service. Is this true?
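Qs B and C are about the wire protocol itself. Below is a minimal, stdlib-only sketch of what a W3C "new session" command looks like on the wire, assuming a driver listening locally (chromedriver's default port is 9515). This illustrates the shape of the protocol, not Selenium's actual client code:

```java
// Sketch of the W3C WebDriver "new session" request that client libraries
// (or the Selenium server, acting as a proxy) POST to a browser driver's
// HTTP endpoint. The port is chromedriver's default; everything here is
// an illustration of the protocol shape, not Selenium source code.
import java.net.URI;
import java.net.http.HttpRequest;

public class WireProtocolSketch {
    // JSON body of a W3C "new session" command, per the specification.
    static String newSessionBody(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" + browserName + "\"}}}";
    }

    // The HTTP request a client sends to the driver's REST endpoint.
    static HttpRequest newSessionRequest(String driverUrl, String browserName) {
        return HttpRequest.newBuilder(URI.create(driverUrl + "/session"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(newSessionBody(browserName)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = newSessionRequest("http://localhost:9515", "chrome");
        System.out.println(req.method() + " " + req.uri()); // POST http://localhost:9515/session
        System.out.println(newSessionBody("chrome"));
    }
}
```

The fact that this is plain HTTP plus JSON is also the answer to Qs C: the drivers (chromedriver, geckodriver) really are small local HTTP servers speaking this REST-style protocol.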

How can Selenium identify whether the browser has been upgraded?

Since browser auto-update runs on the system unless we manually turn it off, how can we automate detecting in Selenium that the browser has been upgraded?
If the browser updates (e.g. Chrome), you will need a driver binary compatible with that browser version. There's a good answer and overview on this here.
To avoid this, I personally used ephemeral instances of Chrome, specifically on SauceLabs and in containerized tests using docker-selenium.
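Since ChromeDriver 73, chromedriver releases track Chrome's major version, so one lightweight guard against silent browser upgrades is to compare the two major versions before a test run. A stdlib-only sketch; the version strings are illustrative:

```java
// Chromedriver releases track Chrome's major version (e.g. chromedriver 120.x
// for Chrome 120.x), so a quick pre-flight check before a test run is to
// compare the two major versions. The version strings below are illustrative;
// in practice you would read them from the installed binaries.
public class DriverVersionCheck {
    // Extract the major version from a dotted version string like "120.0.6099.109".
    static int majorVersion(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // Chrome and chromedriver are compatible when their major versions match.
    static boolean compatible(String browserVersion, String driverVersion) {
        return majorVersion(browserVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        System.out.println(compatible("120.0.6099.109", "120.0.6099.71")); // true
        System.out.println(compatible("121.0.6167.85", "120.0.6099.71"));  // false
    }
}
```

Failing fast on a mismatch turns the cryptic "session not created" error into an actionable message before any test starts.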

Relaxing Microsoft Edge CSP while running tests (webdriver) (Content-Security-Policy)

I'm trying to relax Microsoft Edge's CSP while running a test using Protractor (webdriver, chromedriver).
So the solution could be either:
a flag like "--disable-csp", which does not exist according to my search results;
a webdriver/Protractor setting that does so; or
loading an extension that does that (like in Chrome: Relaxing Chrome's CSP while running tests (webdriver) (Content-Security-Policy)).
I could not find any solution other than setting up a proxy that filters the header.
Any ideas?
I figured the best approach would be to use the solution from the same question for Chrome and just convert the Chrome extension to a Microsoft Edge one.
Follow the linked question up to the point you got to.
Convert the Chrome extension to an Edge extension using this tool.
Use the extensionPath option to load the extension.

Does HtmlUnit create browser instances on the machine where it is running?

I am using HtmlUnit for web scraping: logging in to a website on behalf of the users, setting something in their profile, and then coming back.
I am using pure HtmlUnit, with no Selenium framework.
Now my question:
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_11);
Does this statement create a browser instance on the machine where I am executing the code, or what does it do?
I am using BrowserVersion.INTERNET_EXPLORER_11 as this is an accepted browser at that website.
How is Selenium different from HtmlUnit? I know we can use HtmlUnit as a WebDriver in Selenium. Does Selenium need a native browser instance on the machine where the code is executed? Does Selenium create browser instances?
My use case is that multiple users will be accessing this application. I know WebClient in HtmlUnit is not thread-safe (so I have to code it as a Spring prototype bean).
Are there any suggestions regarding this?
Any help is greatly appreciated.
HtmlUnit is a headless browser, so no window will be created even if it is used with Selenium. Setting the BrowserVersion just tells HtmlUnit to present itself to the server as if it were the given browser (AFAIK it will just change the User-Agent, but it might perform additional internal processing depending on the version). I guess this answers most of the questions but the last one.
Regarding suggestions on how to implement this: I would try to avoid logging in to a website that way. If the website does not provide an API for this, then doing it this way is likely against the Terms of Service. Assuming it is not, you will have to create a new WebClient instance for each user, each time the data needs to be extracted from the other site.
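As noted above, BrowserVersion mainly affects how the client presents itself to the server, chiefly via the User-Agent header. The same idea shown with the JDK's own HTTP client (no HtmlUnit involved); the URL is illustrative, and the User-Agent value is the commonly cited IE11 string:

```java
// HtmlUnit's BrowserVersion mainly controls how the client presents itself
// to the server, chiefly via the User-Agent header. The equivalent idea with
// the JDK's own HTTP client, no HtmlUnit involved: send the well-known IE11
// User-Agent string so the server treats the request as coming from IE11.
// The URL is illustrative.
import java.net.URI;
import java.net.http.HttpRequest;

public class UserAgentSketch {
    static final String IE11_UA =
            "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko";

    // Build a GET request that identifies itself as IE11.
    static HttpRequest asIe11(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", IE11_UA)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = asIe11("https://example.com/login");
        System.out.println(req.headers().firstValue("User-Agent").orElse(""));
    }
}
```

This is also why BrowserVersion alone cannot make HtmlUnit behave exactly like a real IE11: the header changes what the server sends back, not how the client renders it.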

Restricting Selenium/Webdriver/HtmlUnit to a certain domain

While using Selenium/WebDriver for web scraping, I realized the target site has a Google Analytics script running. Is there a way to restrict Selenium/WebDriver/HtmlUnit so that it avoids certain URLs/domains?
Thanks,
I think it is impossible, because Selenium is actually an adapter for several implementations, so it can't prevent Firefox or Chrome from loading certain scripts. Perhaps you can check the driver API (Firefox profile, HtmlUnit configuration) to accomplish this.
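On the HtmlUnit side specifically, requests can be intercepted before they are sent, for example by subclassing WebConnectionWrapper and overriding getResponse(). The allow/deny decision itself is plain host matching, sketched below with only the JDK; the blocked domains are illustrative:

```java
// HtmlUnit (unlike the generic WebDriver API) lets you intercept every request,
// e.g. by overriding getResponse() in a WebConnectionWrapper subclass and
// returning an empty response for unwanted hosts. The allow/deny decision is
// plain host matching, shown here stdlib-only; the domains are illustrative.
import java.net.URI;
import java.util.Set;

public class DomainFilter {
    // True if the URL's host is one of the blocked domains or a subdomain of one.
    static boolean isBlocked(String url, Set<String> blockedDomains) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false;
        }
        for (String domain : blockedDomains) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> blocked = Set.of("google-analytics.com", "googletagmanager.com");
        System.out.println(isBlocked("https://www.google-analytics.com/analytics.js", blocked)); // true
        System.out.println(isBlocked("https://example.com/index.html", blocked));                // false
    }
}
```

The subdomain check ("." + domain) matters here: analytics scripts are typically served from hosts like www.google-analytics.com, not the bare domain.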