Why does Selenium uses a lot of memory - selenium

I want to know selenium with firefox question,
I used Firefox version 56.0.2,selenium3.5.1,and geckodriver 0.19.1,server ubuntu(x64) os,firefox --headless mode
I find when I run my app for a long time, the firefox memory will increase a lot, such 400MB or more, and when I let firefox open about:blank, the memory don not decrease
I want to know how to decrease the firefox memory (do not kill the firefox process), only use the selenium to control firefox or start firefox with some config
I want to open "about:blank" or other URL to reduce memory,but I find it doesn't work;

No,Selenium itself doesn't uses any memory. It's the WebDriver and Web Browser processes which consumes the memory. For an example when you create a new instance of any WebDriver variant to start a relevant Web Browsing Session, both the process consumes memory.
Now the different Browser Client variants will follow different and distinct methods and style to initiate, manage and teardown the browser internal process. So memory consumption will be different for the different browsers.
Answering your questions :
When I run my app for a long time, the firefox memory will increase a lot : In-coarse of handling a active browsing session, the browser binary have to keep track a lot of memory (stack memory / heap memory) resources time to time. Hence memory consumption can go up/down depending on situation.
I want to know how to decrease the firefox memory : No, you have no control over the memory consumption of a browser.
Solution
Web Browsers have evolved a lot recently. Each of the Web Browser variants e.g. Mozilla, Chrome and Internet Explorer are continuously working on a more momery efficient browser process. You can take the following steps for your Automated Tests to consume the optimum memory :
Keep the JDK Version updated to the latest versions as each release aims towards optimum memory utilization.
Keep the Selenium Version updated to the latest versions as each release aims towards better memory management.
Keep the WebDriver Version updated to the latest versions as each release aims towards better memory management.
Keep the Web Browser Version updated to the latest versions as each release aims towards better memory utilization.
If the base version of your Web Browser is too older, you may consider to uninstall the Web Browser through Revo Uninstaller and install a recent stable and GA version of the Web Browser .
Use CCleaner tool before and after your Test Execution to wipe off the OS system chores.
Perform the Test Execution in an isolated system free from Manual Intervention
Keep the Test System within the Test Lab well equipped with the Hardware Requirements to execute the Test Suites.

Related

Security Considerations - ChromeDriver - Webdriver for Chrome

I was wondering if anyone had more information on what the specific risks for using chromedriver as was concerned by this statement.
"If possible, run ChromeDriver with a test account that has no access to sensitive local or network data. ChromeDriver should never be run with a privileged account."
Would like to know what the specific risks are when using a privileged account and what if any preventative measures can be taken to protect against them.
Thank you in advance!
How Google Chrome Browser Works
In the article Chrome Browser Security #STEPHANIE CRAWFORD mentioned, Google has leveraged its power as a search engine by creating its Safe Browsing technology which will automatically warn you if Chrome detects that a site you're visiting contains malware or phishing.
Chrome deploys this security measure through a unique security feature termed as Sandboxing. Sandboxing implies, separating each process out into independent spaces to see how they function individually. Chrome handles its workload as a series of multiple processes rather than as part of one large browser process. Each time you open a Web page, Chrome launches one or more new processes to run the scripts on that page. Also, each Chrome extension and app runs in its own process. Chrome implements sandboxing through its multi-process architecture. The security advantage in sandboxing comes with Chrome being able to control the access token for each process. These access token for a process allows that process access to important information about your system, like its files and registry keys. Chrome intercepts each access token from the processes launched from the browser, and it modifies that token to limit its access to that information. So, Chrome's sandboxing helps block web pages that try to install malware, capture your personal information or obtain data from your hard drive. The drawback of sandboxing is that, it can't catch everything. A sandboxed process might still be able to access less secure file systems. It's also likely to miss protecting registry keys and files managed by third party software, like a game or chat program that isn't native to the system.
WebDriver driven Chrome
While initiating a WebDriver controled Chrome Browsing Context using Selenium recently we had been advocating to use a certain command line argument:
--no-sandbox: Disables the sandbox for all process types that are normally sandboxed.
See:
WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser
How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
unknown error: session deleted because of page crash from unknown error: cannot determine loading status from tab crashed with ChromeDriver Selenium
No Sandbox
There are a couple of more Sandbox related flags available which enables the sandboxed processes to run without a job object assigned to them. This flag is required to allow Chrome to run in RemoteApps or Citrix. This flag can reduce the security of the sandboxed processes and allow them to do certain API calls like shut down Windows or access the clipboard. Also we lose the chance to kill some processes until the outer job that owns them finishes.
--allow-no-sandbox-job: Disables usage of sandbox job.
--allow-sandbox-debugging: Allows debugging of sandboxed processes.
--disable-gpu-sandbox: Disables the GPU process sandbox.
--disable-namespace-sandbox: Disables usage of the namespace sandbox.
--disable-seccomp-filter-sandbox: Disable the seccomp filter sandbox (seccomp-bpf) (Linux only).
--disable-setuid-sandbox: Disable the setuid sandbox (Linux only).
--disable-win32k-lockdown: Disables the Win32K process mitigation policy for child processes.
--enable-audio-service-sandbox: enable the audio service sandbox.
--gpu-sandbox-allow-sysv-shm: Allows shmat() system call in the GPU sandbox.
--gpu-sandbox-failures-fatal: Makes GPU sandbox failures fatal.
--no-sandbox-and-elevated: Disables the sandbox and gives the process elevated privileges (Windows only).
Sandbox
Sandbox leverages the OS-provided security to allow code execution that cannot make persistent changes to the computer or access information that is confidential. The architecture and exact assurances that the sandbox provides are dependent on the operating system.
windows implementation principles:
Do not re-invent the wheel: It is tempting to extend the os kernel with a better security model. Don't. Let the operating system apply its security to the objects it controls. On the other hand, it is just okay to create application-level objects (abstractions) that have a custom security model.
Principle of least privilege: This should be applied both to the sandboxed code and to the code that controls the sandbox. In other words, the sandbox should work even if the user cannot elevate to super-user.
Assume sandboxed code is malicious code: For threat-modeling purposes, we consider the sandbox compromised (that is, running malicious code) once the execution path reaches past a few early calls in the main() function. In practice, it could happen as soon as the first external input is accepted, or right before the main loop is entered.
Be nimble: Non-malicious code does not try to access resources it cannot obtain. In this case the sandbox should impose near-zero performance impact. It's ok to have performance penalties for exceptional cases when a sensitive resource needs to be touched once in a controlled manner. This is usually the case if the OS security is used properly.
Emulation is not security: Emulation and virtual machine solutions do not by themselves provide security. The sandbox should not rely on code emulation, code translation, or patching to provide security.
linux implementation
macos implementation

Many process of Google Chrome (32 bit)

When 2 tests are running in Chrome, i have observed that too many Google Chrome(32 Bit) processes are running in Task manager, Is this a correct behavior of Chome Driver
When multiple automated tests are getting executed through Google Chrome you must have observed that there are potentially dozens of Google Chrome processes running which can be observed through Windows Task Manager's Processes tab.
Snapshot:
As per the article SOLVED: Why Google Chrome Has So Many Processes for a better user experience Google Chrome initiates a lot of windows background processes for each tab that have been opened by your Automated Tests. Google tries to keep the browser stable by separating each web page into as many processes as it deems fit to ensure that if one process fails on a page, that particular process(es) can be terminated or refreshed without needing to kill or refresh the entire page.
However, from 2018 onwards Google Chrome was actually redesigned to create a new process for each of the following entities:
Tab
HTML/ASP text on the page
Plugin those are loaded
App those are loaded
Frames within the page
In a Chromium Blog Multi-process Architecture it is mentioned:
Google Chrome takes advantage of these properties and puts web apps and plug-ins in separate processes from the browser itself. This means that a rendering engine crash in one web app won't affect the browser or other web apps. It means the OS can run web apps in parallel to increase their responsiveness, and it means the browser itself won't lock up if a particular web app or plug-in stops responding. It also means we can run the rendering engine processes in a restrictive sandbox that helps limit the damage if an exploit does occur.
As a conclusion, the many processes you are seeing is pretty much in line with the current implementation of google-chrome
Outro
You can find a relevant discussion in How to quit all the Firefox processes which gets initiated through GeckoDriver and Selenium using Python

Can using Selenium WebDriver for automated web crawling be dangerous?

I'd like to crawl a set of random websites received from a URL generator, using Selenium's ChromeDriver with Crawljax to do static code analysis on the captured DOM states.
Is this potentially unsafe for the machine doing the crawling?
My concern is that one of the randomly generated sites is malicious and that execution of JavaScript from ChromeDriver (which is used to capture the new DOM states) infects the machine running the test somehow. Should I be running this in some kind of sandboxed environment?
--edit--
If it matters, the crawler is implemented entirely in Java.
Simple answer, no. Only if your afraid of cookies, and even if you are, your machine isn't.
It's hard to say it's very secure,you should aware of that there is no absolute secure in network.Recently,a chrome RCE has been put out,details:
SSD Advisory – Chrome Turbofan Remote Code Execution – SecuriTeam Blogs
Maybe this can effect on Selenium's ChromeDriver
But you can do some enforce on your system,such as change your firewall mode to white list,only allow your python script and selenium to access internet on port 80,443.
Even if your system pwned by RCE,the malicious code still can't access internet,unless it inject to you python process(I think it's very hard to do with js script in Browser RCE).
Another option:Install HIPS,if your python script want to do anything else but crawl web page(such as start an other process) or read/write some other files,you will know it and decide what to do.
In my oppion,do your crawl thing in a VM and do some enforce on firewall(Windows firewall or Linux iptables),shutdown useless services in windows.That's enough.
In a word,it's diffcult to find the balance between security and convenience and do not believe your system is unbreakable

selenium grid execution slows down with time

I am running multiple data validation test on selenium grid (only chrome browsers) on CentOS stack. I notice that initially the tests complete real quick. However, with time, the execution slows down considerably.
I am trying to validate data from a csv file with data on a web application. I have around 100K records in the csv file. For each record, below are the list of event:
launch remote driver(chrome) instance
Open the web application and login
search for the keywords in the csv file on the application and validate the results(output in csv VS output on the web application)
close the remote driver instance
I have configured 7 nodes using CentOS and each node has 10 browser instances.
Also, I am using ThreadPoolExecutor for submitted each thread. So at any given time, I will have 70 threads running where each thread is a webdriver instance.
I am not sure if this is a code level issue or infrastructure related issue. Can someone point me in the right direction of how I can find the root cause for this slowness and rectify it.
I have tried to monitor system resources for one of the nodes and see that the java process takes around 55% CPU and 10% memory. while each browser takes 10% CPU and 4% memory
Selenium grid will be slow when time increases as selenium grid is running on jvm and it will occupy more memory. There are many factors will affect the performance of the browser like no of browser in a node, node configuration, grid configuration and you web server performance. For better grid performance, you have to restart grid hub and nodes once in a while.

How to stress test simulating heavy load using Selenium

I have a system to test, which is a video ads distribution technology. I need to load every video like 1-2 mins to serve the ads. The videos are played in a Flash client and streamed as FLV streams like in YouTube.
The reason why I need to test it only via browsers -- and every other method won't work -- is to stress test both the video streaming servers and the ads servers simultaneously and displaying ads in real-time.
I have used Selenium, WatiN, Automation Anywhere and many other automation tools. However, when I am trying to start like 10000 browsers on my machine (32GB RAM, 16-core CPU), none of them are able to do the job.
With Selenium, I am able to start the maximum FireFox instances so far, but that's still too low: half of the instances don't run the test.
Any suggestions to do with Selenium?
You aren't going to run 10,000 browsers on your machine. That would give 3.2MB of physical memory per browser instance and I'm pretty sure FireFox just won't like that.
You could create a JMeter script that hits your server with many threads. It won't interact with the UI but would simulate the load of many clients hitting whatever URLs you tell it. I believe it also includes the ability to record a session and play it back for easy setup of your sessions.
Selenium isn't really optimized for load/stress testing, especially if you're running your browsers locally. Running 1000+ browsers is going to choke even the beefiest server. Though RAM is an obvious bottleneck, you also have limited CPU resources and bandwidth. The latter being a primary concern if you are loading videos.
Not to mention you'd be testing from a single IP with 10k browsers, so load balancing may not kick in properly, as well as the actual distribution of video ads to specific virtual users.
If you want to stick with existing Selenium tests, I've had good experiences with BrowserMob. They basically have a huge grid to do real browser load-testing, distributed across AWS.
Another recommendation would be an actual performance testing tool. I'd recommend Soasta CloudTest. They have a free version that runs 100 users so you can see if it will be a good fit for you. I have found that scripting for CloudTest is relatively simple.
Disclaimer: My experiences with both companies have been as a paying customer and I have never worked for either.
If you are using Windows machine then as per my experience there is a limit on number of browser window instances to be opened. As per my test last time, it does restrict between 100-150 browser windows.
I would recommend you using headless robot, which doesn't require opening browser window. I think latest version of Selenium has that capability. But it seems to be more like a load test as you are trying to simulate 10,000+ user instances, I would recommend you using load testing tool like JMeter or LoadRunner.
It looks to me that you are trying to verify what the client will see based on high traffic, no?
In that case, Joel is quite correct. If you absolutely have to see what the client sees, you could use threaded hits and just dump the results in a database. That'll show you anything the client would see anyway, and it's a lot easier to sort through than thousands of browser instances.
Either way, your client will not see errors if there are no errors present on the server side. If you're testing functionality in bandwidth restricted environments, CPU-intensive environments, or memory-intensive environments, those are much easier achieved than running thousands of browser instances.
Your post smells of some form of ad-based fraud to me, but either way: have you considered using different web browsers besides Firefox? PhantomJS is a headless webkit-based browser that is compatible with Selenium. It supports all the core browser features like DOM handling, CSS selectors, Javascript and Canvas. I do not know if it supports Flash.
This post has a decent list of other headless and automatable web-browsers that you might consider.
Also, if each browser instance is instantiating a Flash plugin, don't neglect the possibility that the issue could be with Flash and not Firefox. Alternatively, why instantiate several different Firefox processes? Can you accomplish what you want through the use of tabs instead?
The in-house way to this wiht selenium is using browsermob proxy and multiple broswser agents to recreate the experience of different users, changing the ip is more difficult because it requires changing your home network.
Here is a good example