I was trying to do a capacity test on an apache web server, but there are some result I can't understand: according to the theory part of capacity planning, I should see three different regions on the plot of throughput in/out.
In the first region the expected result is the line y=x, meaning that the web server can follow my requests and reply to all with the code 200-OK (Thus, the throughput I request is equal to the throughput I get).
In the second region the expected result is the line y=k, where k is that throughput that indicates the saturation of the web server (Thus, the throughput I get can't go further k).
In the third region the expected result is a curve that goes from k to zero, that shows the degradation of web server, which for memory or CPU leaks starts to reject requests.
I have tried to replicate the experiment with a Virtual Machine running an instance of Apache as a server and the Physical Machine running an instance of Apache JMeter as a client. The result that I get is only the first two points, but also if I request a very very huge number of samples/seconds as throughput, I always get the saturation value.
Why I can't get the server going down, even if the CPU is 0% idle and the remaining memory is about 10MB? Or maybe is this the correct behavior and my hypothesis was incorrect?
Thank you in advance.
Related
Currently I am working my way into the topic of load and performance testing. In our planning, however, the customer now wants to have indicators for the load and performance test named. Here I am personally however over-questioned. What exactly are the performance indicators within a load and performance test?
You can separate the Performance indicators based on Client Side and Server Side Indicators:
1. Client Side Indicators : JMeter Dashboard
Average Response Time
Minimum Response Time
Maximum Response Time
90th Percentile
95th Percentile
99th Percentile
Throughput
Network Byte Send
Network Byte Received
Error% and different types of Error received
Response Time Over Time
Active Threads Over Time
Latencies Over Time
Connect Time Over Time
Hits Per Second
Codes Per Second
Transactions Per Second
Total Transactions Per Second etc.
You can also obtain Composite Graphs for better understanding.
2. Server Side Indicators :
CPU Utilization
Memory Utilization
Disk Details
Filesystem Details
Network Trafic Details
Network Socket
Network Netstat
Network TCP
Network UDP
Network ICMP etc.
3. Component Level Monitoring :
Language Specific likes Java, .Net, Python etc.
Database Server
Web Server
Application Server
Broker Statistics
Load Balancers etc.
Just to name a few.
I'm trying to deploy a scalable web application on google cloud.
I have kubernetes deployment which creates multiple replicas of apache+php pods. These have cpu/memory resources/limits set.
Lets say that memory limit per replica is 2GB. How do I properly configure apache to respect this limit?
I can modify maximum process count and/or maximum memory per process to prevent memory overflow (thus the replicas will not be killed because of OOM). But this does create new problem, this settings will also limit maximum number of requests that my replica could handle. In case of DDOS attack (or just more traffic) the bottleneck could be the maximum process limit, not memory/cpu limit. I think that this could happen pretty often, as these limits are set to worst case scenario, not based on average traffic.
I want to configure autoscaler so that it will create multiple replicas in case of such event, not only when the cpu/memory usage is near limit.
How do I properly solve this problem? Thanks for help!
I would recommend doing the following instead of trying to configuring apache to limit itself internally:
Enforce resource limits on pods. i.e let them OOM. (but see NOTE*)
Define an autoscaling metric for your deployment based on your load.
Setup a namespace wide resource-quota. This enforeces a clusterwide limit on the resources pods in that namespace can use.
This way you can let your Apache+PHP pods handle as many requests as possible until they OOM, at which point they respawn and join the pool again, which is fine* (because hopefully they're stateless) and at no point does your over all resource utilization exceed the resource limits (quotas) enforced on the namespace.
* NOTE: This is only true if you're not doing fancy stuff like websockets or stream based HTTP, in which case an OOMing Apache instance takes down other clients that are holding an open socket to the instance. If you want, you should always be able to enforce limits on apache in terms of the number of threads/processes it runs anyway, but it's best not to unless you have solid need for it. With this kind of setup, no matter what you do, you'll not be able to evade DDoS attacks of large magnitudes. You're either doing to have broken sockets (in the case of OOM) or request timeouts (not enough threads to handle requests). You'd need far more sophisticated networking/filtering gear to prevent "good" traffic from taking a hit.
I want to avoid hot spot only in case of client requests. What criteria should I take into account?
Some papers define this threshold in 500 QPS (read) but i want something that based on some metrics in a real scenario. In my case when client request reach a threshold on a master node, i migrate the keys to other master (that do not exceed this threshold) and redirect the client there and the number of requests.
Can i define in Redis a threshold based on number of requests in every instance?
After several experiments, i find a solution. The threshold selected on the basis of the response time. As presented in below Figure, the response time significantly increased in case of Request Rate > 20000.
My machine has the following configuration:
Ubuntu 14.04 LTS 64-bit
Intel® CoreTM i5-4570 CPU # 3.20GHz × 4
7,7 GiB RAM
For this test, I have a simple Java servlet that reads data in and calculates the CRC32 for it. When making serial requests of 512MB each, I get about 600MB/sec. That makes sense since I can't use all 24 cores available to me to calculate a CRC. The program driving this I/O is sitting on the local box to eliminate the possibility of networking issues. I am running Tomcat 8.0.24.0 on FreeBSD using OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode).
Next, I attempt the same test with 6 concurrent requests, expecting that the performance per request might be lower than 600MB/sec, but that the aggregate performance across all 6 requests would be significantly higher.
What I see is the CPU has some idle time at ALL times (so it doesn't appear that I'm CPU-bound). I also see that all processing threads in Tomcat are running concurrently as anticipated. However, it looks like I'm only getting around 800MB/sec in aggregate. The threads in Tomcat spend most of their time waiting to read from the socket, as shown below.
I would appreciate any thoughts on how to improve Tomcat throughput / why so much time is spent waiting for more data (which I assume is what's going on below).
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1386)
at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1388)
at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:251)
at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:232)
at org.apache.coyote.http11.InternalNioInputBuffer.fill(InternalNioInputBuffer.java:133)
at org.apache.coyote.http11.InternalNioInputBuffer$SocketInputBuffer.doRead(InternalNioInputBuffer.java:177)
at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:110)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:416)
at org.apache.coyote.Request.doRead(Request.java:469)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:342)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:395)
at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:367)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:190)
...
During peak hours of our service, I notice the CPU load goes up to 100%. However, when I SSH into the machine and use top or htop, I don't ever see the CPU usage go above 25%. This instance is a dedicated load balancer running HAProxy.
Here is a screenshot of the Dashboard: https://gyazo.com/010715208f81ec97c1bc9b78123fe4d4
Here is a screenshot of top in the instance: https://gyazo.com/afa7098331f7a3f66c018041b9be686d
During these peak times, I noticed there are some latency and this is not caused by database or my other server instances as I checked their loads and were far below threshold. I was wondering if Compute Engine is throttling by mistaking it is at 100% CPU usage or something?
Has anyone had a similar experience?