LR 12.55/TruClient vusers are stuck in Init state not going to running - vugen

I created a TruClient Web (IE) protocol script in LR 12.55. When I run the script with 50 users, only some of them (between 25 and 37) go into the Running state; the rest stay stuck in Init forever.
I went to Controller -> Options -> Timeout and changed the Init timeout from the default 180 to 999, but that did not resolve the issue. Can anybody comment on how to resolve this?

TruClient runs a real browser for each vuser (virtual user), so system resource consumption is higher than in API-level testing.
It is possible that 50 vusers is too many for your load-generator machine.
I'd suggest checking CPU and memory levels during the run. If either is over 80% utilization, you should split your load between multiple load-generator machines.
If resources are not fully utilized, the failures should be analyzed to determine the root cause.
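The 80% rule above can be expressed as a small check over utilization samples collected during the run. This is only a minimal sketch, not part of LoadRunner: the function names, the sample values, and the choice to compare the peak (rather than the average) against the threshold are assumptions made for illustration.

```python
def overloaded_resources(samples, threshold=80.0):
    """Return the names of resources whose peak utilization exceeded threshold.

    samples maps a resource name (e.g. "cpu") to a list of utilization
    percentages sampled on the load generator during the run.
    """
    return sorted(
        name for name, values in samples.items()
        if values and max(values) > threshold
    )


def should_split_load(samples, threshold=80.0):
    """True if any monitored resource went over the threshold."""
    return bool(overloaded_resources(samples, threshold))


# Hypothetical samples from one load generator during a 50-vuser run.
run = {"cpu": [45.0, 72.5, 91.0], "memory": [60.0, 63.5, 66.0]}

for resource in overloaded_resources(run):
    print(f"{resource} exceeded 80%: consider adding load generators")
```

If nothing is reported, the machine is probably not the bottleneck and the Init failures themselves need to be analyzed.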

To further e-Dough's excellent response, you should expect not to execute these virtual users on the same hardware as the controller. You should expect at least three load generators to be involved, two as primary load and one as a control set. This is in addition to the controller.
Your issue does manifest as the classic "system out of resources" condition. Consider the same best practices for monitoring your load generator health as you would for monitoring the infrastructure of your application under test. You want monitors for the classic finite resource model components (CPU, disk, memory, and network), plus additional subcomponents, such as a breakout of System and Application under CPU, to understand where and how your system is performing. You want to be able to eliminate false negatives on scalability, where your load generators are so unhealthy that they distort your test results: virtual users show the application as slow when in fact the virtual users are slow because the machine in use is resource constrained.

How to delete an instance if cpu is low?

I am running managed instance groups whose overall CPU usage is always below 30%, but when I check the instances individually, I find that some are running above 70% and others as low as 15%.
Keep in mind that Managed Instance Groups don't take individual instances into account when deciding whether a machine should be removed from the pool. GCP's MIGs keep a running average of the last 10 minutes of activity of all instances in the group and use that metric to determine scaling decisions. You can find more details here.
Identifying instances with lower CPU usage than the group doesn't seem like the right goal here. Instead, I would suggest focusing on why some machines have 15% usage and others have 70%. How is work distributed to your instances? Are you using the correct load-balancing strategy for your workload?
Maybe your application has specific endpoints that cause large amounts of CPU usage while the majority are basic CRUD operations; in that case, one machine generating a report and displaying higher usage is fine. If all instances render HTML pages from templates and return the results, one machine performing much less work than the others is a distribution issue. Maybe you're using an RPS algorithm when you want a CPU utilization one.
In your use case, the best option is to create an Alert notification that will alert you when an instance goes over the desired CPU usage. Once you receive the notification, you will be able to delete the VM instance manually. As it is part of the Managed Instance Group, the VM instance will be recreated automatically.
I have attached an article on how to create an Alert notification here.
There is no metric within Stackdriver that will call the GCE API to delete a VM instance.
There is currently no such automation in place. It shouldn't be too difficult to implement it yourself, though. You can write a small script that runs on all your machines (started from cron or something similar) and monitors CPU usage. If it decides usage is too low, the instance can delete itself from the MIG (you can use e.g. gcloud compute instance-groups managed delete-instances --instances ).
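A rough sketch of such a script is shown here, assuming a Linux instance where CPU counters are available in /proc/stat. The group, instance, and zone names and the 20% threshold are hypothetical, the two counter lines are made-up samples, and the sketch only prints the gcloud command instead of running it. A real cron job would read the first line of /proc/stat twice, some seconds apart, and would probably want several consecutive low readings before acting.

```python
def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so only the first 8 count.
    values = [int(v) for v in line.split()[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    return idle, sum(values)


def cpu_percent(before, after):
    """CPU utilization in percent between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def delete_command(group, instance, zone):
    """Build the gcloud command that removes one instance from a MIG."""
    return [
        "gcloud", "compute", "instance-groups", "managed", "delete-instances",
        group, f"--instances={instance}", f"--zone={zone}",
    ]


# Made-up counter samples standing in for two reads of /proc/stat.
before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
after = parse_cpu_line("cpu  110 0 55 885 50 0 0 0 0 0")

usage = cpu_percent(before, after)
LOW_CPU_THRESHOLD = 20.0  # hypothetical cut-off

if usage < LOW_CPU_THRESHOLD:
    # Printed rather than executed; all three names are placeholders.
    print(f"CPU at {usage:.1f}%, would run:")
    print(" ".join(delete_command("my-mig", "my-instance", "us-central1-a")))
```

Note that a MIG with autoscaling enabled may simply create a replacement instance, so this only makes sense if the group's target size is meant to shrink.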

jmeter Load Test Server down issues

I ran a load of 100 users using an Ultimate Thread Group in non-GUI mode.
The execution ran for only about 5 minutes. After that my test environment shut down. I am not able to drill down into the issue. What could be the reason for the server going down? My environment supports 500 users.
How do you know your environment supports 500 users?
100 threads don't necessarily map to 100 real users. You need to consider a lot of things while designing your test, in particular:
Real users don't hammer the server non-stop, they need some time to "think" between operations. So make sure you add Timers between requests and configure them to represent reasonable think times.
Real users use real browsers. Real browsers download embedded resources (images, scripts, styles, fonts, etc.), but they do it only once; on subsequent requests the resources are returned from the cache and no actual request is made. Make sure to add an HTTP Cache Manager to your Test Plan.
You need to add the load gradually. This way you will be able to state the number of threads (virtual users) at which response times start exceeding acceptable values or errors start occurring. Generate an HTML Reporting Dashboard, look into the metrics, and correlate them with the increasing load.
Make sure that your application under test has enough headroom to operate in terms of CPU, RAM, Disk space, etc. You can monitor these counters using JMeter PerfMon Plugin.
Check your application logs; most probably they will have some clue to the root cause of the failure. If you're familiar with the programming language your application is written in, using a profiler tool during the load test can tell you the full story of what's going on, which functions and objects consume the most resources, etc.
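To see why 100 threads without timers can be far heavier than 100 real users, Little's Law (concurrency = throughput x (response time + think time)) gives a rough estimate. The sketch below is only an illustration: the 0.5 s response time and 30 s think time are hypothetical numbers, and it assumes the response time stays constant, which it will not once the server starts to saturate.

```python
def throughput(threads, response_time_s, think_time_s=0.0):
    """Requests per second generated by a closed pool of threads (Little's Law)."""
    return threads / (response_time_s + think_time_s)


def equivalent_users(threads, response_time_s, think_time_s):
    """How many users with think time produce the load of `threads` without it."""
    return threads * (response_time_s + think_time_s) / response_time_s


# Hypothetical figures: 0.5 s average response time, 30 s think time.
no_timers = throughput(100, 0.5)
with_timers = throughput(100, 0.5, think_time_s=30.0)
print(f"100 threads, no timers:   {no_timers:.1f} requests/s")
print(f"100 threads, 30 s timers: {with_timers:.1f} requests/s")
print(f"100 timer-less threads ~ {equivalent_users(100, 0.5, 30.0):.0f} real users")
```

Under these assumptions, the timer-less test hits the server roughly sixty times harder than 100 real users would, which could by itself explain a server that is sized for 500 users going down.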

Google Compute Engine VM constantly crashes

On a Compute Engine VM in us-west1-b, I run 16 vCPUs at nearly 99% usage. After a few hours, the VM crashes on its own. This is not a one-time incident, and I have to restart the VM manually each time.
There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.
There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?
How do I prevent VMs from crashing?
CPU usage graph
This could be your process manager saying that your processes are out of resources. You might wanna look into kernel tuning, where you can increase the limits on the number of active processes on your VM/OS and on their resources. Or you can try using a bigger machine with more physical resources. In short, your machine is falling short on resources, and in order to keep the OS up, the process manager shuts down processes. SSH is one of those processes. Once you reset the machine, everything comes back to normal.
How the process manager/kernel decides to kill a process varies in many ways. It could simply be that a process has stayed up for a very long time and consumed too many resources. Also, one thing to note is that the OS images you use to create a VM on GCP are custom hardened by Google to limit the malicious capabilities of processes running on such machines.
One of the best ways to tackle this is:
increase the resources of your VM
then go back to the code and find out whether something is leaking processes or memory
if all else fails, then you might wanna do some kernel tuning to make sure your processes have higher priority than other system processes. This is a bad idea, though, since you could end up creating a zombie VM.

Running load tests from home network

I need to perform a load test using LoadRunner to simulate load generated from an external network (my home network) against servers located at an organization in the same region.
The application to be tested is a web site (not a heavy one) that users can log into to get personal information.
I am very concerned that my home network bandwidth won't be enough to generate the following load:
I need to simulate 250 concurrent web users who will perform about 30,000 transactions in an hour.
My home network specs and statistics:
Download - 75M - 7.5 Megabyte/sec
Upload - 3.5 M - 350Kbyte / sec
From your experience, would this be enough to generate the desired load? If not, what can be done to simulate load from an external network?
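The figures in the question can be turned into a rough per-transaction bandwidth budget. This is back-of-the-envelope arithmetic only: it uses the byte rates exactly as quoted above, assumes the load is spread evenly over the hour, and says nothing about how many HTTP requests or bytes one transaction actually involves, which depends on the script.

```python
def bandwidth_budget(tx_per_hour, users, download_bytes_per_s, upload_bytes_per_s):
    """Average transaction rate and bytes available per transaction."""
    tx_per_second = tx_per_hour / 3600
    return {
        "tx_per_second": tx_per_second,
        # Average pacing of a single virtual user.
        "seconds_between_tx_per_user": 3600 * users / tx_per_hour,
        "download_bytes_per_tx": download_bytes_per_s / tx_per_second,
        "upload_bytes_per_tx": upload_bytes_per_s / tx_per_second,
    }


budget = bandwidth_budget(
    tx_per_hour=30_000,
    users=250,
    download_bytes_per_s=7.5e6,  # 7.5 Megabyte/sec, as quoted
    upload_bytes_per_s=350e3,    # 350 Kbyte/sec, as quoted
)

for name, value in budget.items():
    print(f"{name}: {value:,.1f}")
```

That works out to roughly 8.3 transactions per second, one transaction per user every 30 seconds, and on average about 0.9 MB of download and 42 KB of upload available per transaction. Whether that is enough depends on the size of the pages and requests in the recorded script.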
One load generator is never enough from a process perspective. Consider at least three: two for primary load and one for a control set. So, right off the bat, you are likely to have issues.
As mentioned previously, go to the cloud: Amazon, Azure, GoDaddy, Rackspace, 1&1, etc. all have virtual machines that you can use as performance testing hosts running load generator software. More locations are better, as this minimizes the influence of one host network over another if you are looking for representative experiences. Odds are your site will be on one backbone and some of your load generators may have to peer over from another backbone. This is not bad, as it provides a more realistic view of your end user experiences from different locations.
Check the end user agreement for your home connection. Unless you have a business-class agreement, such traffic may appear to be a DDoS event, setting off alarms at your service provider. Don't be surprised if you find yourself suddenly cut off from the internet without warning. I have seen this happen before with people attempting to generate load from their homes against a site.
As you can see in the comments, the amount of load you can generate is affected not only by the network bandwidth but also by the script itself and the LG machine specifications. What I mean is that there is no definitive answer to your question without taking all the parameters into account.
What you should do is create an account on one of the popular cloud providers (Amazon, Azure, HP) and create a machine with the exact specifications you need based on the parameters as you know them. Most of these services allow you to increase the machine size and the bandwidth if needed for some extra pay.
Good luck!

BIND9.7. When several named processes are running, how to judge which process is providing the service?

For example, I execute "sudo named" several times, so there are several named processes running. When I use "pidof named", I get several pids.
I want to calculate the CPU usage of the BIND process, so I need some parameters from "/proc/pid/stat", which means I need the pid of the named process that is actually providing the domain resolution service.
What's the difference between the named process which is providing the service and the others? Could you give me a detailed explanation?
thanks very much~
(It's my first time using Stack Overflow and asking questions in English, so please ignore any grammar errors.)
There should be just one named running; the scripts that manage the service ensure that. You shouldn't start it like that. Use whatever your distribution uses to start it, probably something along the lines of service bind start (that is probably a RedHat-ism), or /etc/rc.d/bind start (for bog-standard SysVinit).
I was responsible for DNS for quite some time here. Some tips:
DNS is a very critical service, configure and monitor with extreme care. Do read up on setting up and managing this, don't go ahead until you are absolutely clear.
Get somebody as a backup for the case that you aren't available, and make sure they understand the previous point.
DNS isn't CPU intensive (OK, with signed domains and that newfangled stuff that might have changed), it is memory intensive (and network intensive, or at least sensitive to delays). Our main DNS server was running for months at a time, and clocked up some half hour of CPU time during that kind of period IIRC.
Separate your master server (responsible for the domain(s)) from the servers queried by clients (caching servers). There have been vulnerabilities where malformed questions or "answers" to questions that hadn't been asked soiled the database.
The master server will have all the domain information in RAM, so make sure you have enough of it.
Make sure all machines under your jurisdiction use the same caching server. Having more than one makes no sense; it defeats the purpose of a cache.
The caching servers collect immense amounts of data over time. This data rarely is performance critical, so configure them with plenty of swap space to accommodate overflows.
BIND creates as many named worker threads as you have CPUs, which some tools list as separate processes:
man named:
-n #cpus
Create #cpus worker threads to take advantage of multiple CPUs. If not specified, named will try to determine the number of CPUs present and create one thread per CPU. If it is unable to determine the number of CPUs, a single worker thread will be created.
External source:
https://unix.stackexchange.com/questions/140986/multiple-named-processes-for-bind9-in-debian
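For the CPU usage calculation asked about in the question, the relevant values in /proc/<pid>/stat are utime and stime (fields 14 and 15 in the proc man page), counted in clock ticks and summed over all threads of the process; field 20 is the thread count. The sketch below is only an illustration: the two stat lines are made-up samples, and the tick rate of 100 per second is an assumption (on a real system it can be read with os.sysconf("SC_CLK_TCK")).

```python
def parse_stat(line):
    """Extract utime, stime and num_threads from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    rest = line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return {
        "utime": int(rest[11]),        # field 14, user-mode ticks
        "stime": int(rest[12]),        # field 15, kernel-mode ticks
        "num_threads": int(rest[17]),  # field 20
    }


def cpu_percent(before, after, elapsed_s, clk_tck=100):
    """CPU usage in percent of one core between two samples."""
    ticks = (after["utime"] + after["stime"]) - (before["utime"] + before["stime"])
    return 100.0 * ticks / clk_tck / elapsed_s


# Made-up samples of the same named process taken 2 seconds apart.
SAMPLE_T0 = "1234 (named) S 1 1234 1234 0 -1 4202816 500 0 0 0 150 50 0 0 20 0 5 0 100"
SAMPLE_T1 = "1234 (named) S 1 1234 1234 0 -1 4202816 500 0 0 0 240 60 0 0 20 0 5 0 100"

t0, t1 = parse_stat(SAMPLE_T0), parse_stat(SAMPLE_T1)
print(f"threads: {t1['num_threads']}")
print(f"cpu: {cpu_percent(t0, t1, elapsed_s=2.0):.1f}%")
```

Because the ticks are summed over all worker threads, the result can exceed 100% on a multi-CPU machine.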