I want to avoid hot spots, but only for client requests. What criteria should I take into account?
Some papers define this threshold at 500 QPS (reads), but I want something based on metrics from a real scenario. In my case, when the client request rate reaches a threshold on a master node, I migrate the keys to another master (one that does not exceed this threshold) and redirect the client, and its share of the requests, there.
Can I define a threshold in Redis based on the number of requests on every instance?
After several experiments, I found a solution. The threshold is selected on the basis of response time: as shown in the figure below, response time increases significantly once the request rate exceeds 20,000.
My machine has the following configuration:
Ubuntu 14.04 LTS 64-bit
Intel Core i5-4570 CPU @ 3.20 GHz × 4
7.7 GiB RAM
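If it helps, here is a minimal sketch of how one could watch each master's request rate with redis-py and flag nodes above such a threshold. The node addresses, the 20,000 ops/sec figure (taken from the experiment above), and the migrate_keys() hook are placeholders/assumptions, not part of any standard tooling:

```python
# Minimal sketch: poll each Redis master and flag the ones whose request
# rate exceeds the empirically chosen threshold (~20,000 ops/sec above).
# Node addresses and the migrate_keys() call are hypothetical placeholders.
import time
import redis

THRESHOLD_OPS = 20_000  # chosen from the response-time experiment above

masters = {
    "master-a": redis.Redis(host="10.0.0.1", port=6379),
    "master-b": redis.Redis(host="10.0.0.2", port=6379),
}

def hot_masters():
    """Return masters whose current request rate exceeds the threshold."""
    hot = []
    for name, client in masters.items():
        stats = client.info("stats")          # INFO stats section
        ops = stats["instantaneous_ops_per_sec"]
        if ops > THRESHOLD_OPS:
            hot.append((name, ops))
    return hot

while True:
    for name, ops in hot_masters():
        print(f"{name} is hot ({ops} ops/sec) -> migrate some slots elsewhere")
        # migrate_keys(name, target=...)      # application-specific, not shown
    time.sleep(1)
```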
I have a Python application writing Pub/Sub messages into BigQuery. The Python code uses the google-cloud-bigquery library, and the TableData.insertAll() method quota is 10,000 requests per second per table (see the Quotas documentation).
Cloud Run container auto-scaling is set to 100, with 1,000 requests per container. So technically I should be able to reach 10,000 requests/sec, right? With the BigQuery insert API being the biggest bottleneck.
I only reach a few hundred requests per second at the moment, with multiple services running at the same time.
CPU and RAM are at 50%.
Confirming your project structure and the details given in the comments, I would review the Pub/Sub quotas and limits, especially the Quota and Resource limits tables, where you can check this information depending on message size; the Throughput quota units section tells you how to calculate quota usage.
So my answer is yes, you should be able to reach 10,000 requests/sec. And as in this question, depending on the byte size you can insert up to 10,000 rows per request, although the recommendation is 500.
The concurrency in Cloud Run can be modified in case you need to change it.
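As a side note, here is a minimal sketch of batching the streaming inserts with the google-cloud-bigquery client, assuming roughly 500 rows per request as recommended above (the table id and row payloads are placeholders):

```python
# Minimal sketch: stream rows into BigQuery in batches of ~500 rows, the
# recommended batch size mentioned above. Table id and row contents are
# hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"   # placeholder
BATCH_SIZE = 500

def insert_batched(rows):
    """Insert rows via the streaming API, up to 500 rows per request."""
    for i in range(0, len(rows), BATCH_SIZE):
        batch = rows[i:i + BATCH_SIZE]
        errors = client.insert_rows_json(table_id, batch)
        if errors:
            # each entry describes a failed row; handle/retry as needed
            print("insert errors:", errors)

insert_batched([{"message": f"msg-{n}"} for n in range(2000)])
```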
Currently I am working my way into the topic of load and performance testing. In our planning, however, the customer now wants us to name indicators for the load and performance test, and here I am personally out of my depth. What exactly are the performance indicators within a load and performance test?
You can separate the Performance indicators based on Client Side and Server Side Indicators:
1. Client Side Indicators: JMeter Dashboard
Average Response Time
Minimum Response Time
Maximum Response Time
90th Percentile
95th Percentile
99th Percentile
Throughput
Network Bytes Sent
Network Bytes Received
Error% and different types of Error received
Response Time Over Time
Active Threads Over Time
Latencies Over Time
Connect Time Over Time
Hits Per Second
Response Codes Per Second
Transactions Per Second
Total Transactions Per Second etc.
You can also obtain Composite Graphs for better understanding.
2. Server Side Indicators (see the collection sketch after this list):
CPU Utilization
Memory Utilization
Disk Details
Filesystem Details
Network Traffic Details
Network Socket
Network Netstat
Network TCP
Network UDP
Network ICMP etc.
3. Component Level Monitoring:
Language-specific metrics, e.g. Java, .NET, Python, etc.
Database Server
Web Server
Application Server
Broker Statistics
Load Balancers etc.
Just to name a few.
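For the server-side indicators above, here is a minimal collection sketch using psutil (an assumption on my part; any monitoring agent, such as the JMeter PerfMon Plugin, works just as well):

```python
# Minimal sketch: sample a few of the server-side indicators listed above
# with psutil. This is one possible collector, not a required tool.
import time
import psutil

def sample():
    net = psutil.net_io_counters()
    disk = psutil.disk_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),   # CPU utilization
        "mem_percent": psutil.virtual_memory().percent,  # memory utilization
        "disk_read_bytes": disk.read_bytes,              # disk details
        "disk_write_bytes": disk.write_bytes,
        "net_bytes_sent": net.bytes_sent,                # network traffic
        "net_bytes_recv": net.bytes_recv,
        "tcp_connections": len(psutil.net_connections(kind="tcp")),
    }

while True:
    print(sample())
    time.sleep(5)
```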
While load/performance testing an API behind an ELB in AWS using JMeter, I see
an AWS CloudWatch Latency metric of 10 ms (seems good), while JMeter's Summary Report shows an Average of 3000 ms (seems bad).
The API returns 1 MB of JSON data. I don't understand why there is such a big difference between the numbers, and is this API performance acceptable
if the SLA says the API response time should be 100 ms?
You are looking into different metrics:
Latency: JMeter measures the latency from just before sending the request to just after the first response has been received.
Elapsed time: JMeter measures the elapsed time from just before sending the request to just after the last response has been received.
So Latency is included in the response time; it is the so-called Time To First Byte, and Elapsed Time is the Time To Last Byte. My expectation is that you should stick to what JMeter reports so you won't be confused by metrics coming from different sources; JMeter is at least open source, so you can have confidence in how the metrics are calculated.
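To make the difference concrete, here is a small sketch that measures both times for a single request with the Python requests library (the URL is a placeholder); JMeter's Latency corresponds roughly to the first measurement and Elapsed time to the second:

```python
# Minimal sketch: measure time-to-first-byte (≈ JMeter "Latency") and
# time-to-last-byte (≈ JMeter "Elapsed time") for one request.
# The URL is a placeholder.
import time
import requests

url = "https://api.example.com/big-json"     # placeholder endpoint

start = time.perf_counter()
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    first_chunk = next(resp.iter_content(chunk_size=1024))    # first bytes arrive
    ttfb = time.perf_counter() - start

    for _ in resp.iter_content(chunk_size=64 * 1024):         # drain the 1 MB body
        pass
    elapsed = time.perf_counter() - start

print(f"time to first byte: {ttfb * 1000:.1f} ms")
print(f"time to last byte:  {elapsed * 1000:.1f} ms")
```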
If response time of 3 seconds is too high you can start looking into the reasons for this which could be:
Your API server is simply overloaded; check CPU, RAM, network, and disk usage using, e.g., the aforementioned Amazon CloudWatch or the JMeter PerfMon Plugin
Your application configuration might not be ready for high loads. The default settings of most web/application/database servers are suitable for application development and debugging only (the same applies to JMeter), so most probably you will need to tune the infrastructure.
Your application uses non-optimal algorithms. Use profiler tools to inspect where it spends time, what are the "heaviest" methods, how long database calls last, etc.
Also, if your application is behind an ELB, JMeter can cache the IP address of one of the entry nodes so that all your requests hit only one host. To avoid this situation, add a DNS Cache Manager to your Test Plan.
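As a quick check, you can resolve the ELB hostname yourself and see that it usually returns several addresses; a client that caches only one of them will pin all traffic to a single node (the hostname below is a placeholder):

```python
# Minimal sketch: resolve an ELB hostname and list all returned addresses.
# A client that caches only the first address pins all traffic to one node,
# which is what the JMeter DNS Cache Manager is meant to avoid.
# The hostname is a placeholder.
import socket

hostname = "my-api-123456789.eu-west-1.elb.amazonaws.com"   # placeholder

addresses = sorted({info[4][0] for info in socket.getaddrinfo(hostname, 443)})
print(f"{hostname} resolves to {len(addresses)} address(es): {addresses}")
```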
References:
JMeter Glossary
JMeter Best Practices
The DNS Cache Manager: The Right Way To Test Load Balanced Apps
I was trying to do a capacity test on an Apache web server, but there are some results I can't understand: according to capacity-planning theory, I should see three different regions on the plot of throughput in vs. throughput out.
In the first region the expected result is the line y=x, meaning that the web server can follow my requests and reply to all with the code 200-OK (Thus, the throughput I request is equal to the throughput I get).
In the second region the expected result is the line y=k, where k is that throughput that indicates the saturation of the web server (Thus, the throughput I get can't go further k).
In the third region the expected result is a curve that goes from k down to zero, showing the degradation of the web server, which, due to memory or CPU exhaustion (e.g. leaks), starts to reject requests.
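To make the expected shape concrete, here is a small illustrative sketch of that idealized in/out curve (the saturation point k and the degradation slope are made-up examples, purely for illustration):

```python
# Illustrative sketch of the idealized offered vs. achieved throughput curve:
# region 1: y = x, region 2: y = k (saturation), region 3: degradation to 0.
# The saturation point k and the degradation shape are made-up examples.
import numpy as np
import matplotlib.pyplot as plt

k = 500.0                                   # example saturation throughput (req/s)
offered = np.linspace(0, 3 * k, 300)        # requested throughput (x axis)

achieved = np.where(offered <= k, offered, k)            # regions 1 and 2
degrading = offered > 2 * k                              # region 3 starts here (example)
achieved = np.where(degrading, np.maximum(k - (offered - 2 * k), 0), achieved)

plt.plot(offered, achieved)
plt.xlabel("offered throughput (req/s)")
plt.ylabel("achieved throughput (req/s)")
plt.title("Idealized capacity curve: linear, saturation, degradation")
plt.show()
```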
I have tried to replicate the experiment with a virtual machine running an instance of Apache as the server and the physical machine running an instance of Apache JMeter as the client. The result I get shows only the first two regions: even if I request a very large number of samples per second as throughput, I always get the saturation value.
Why can't I get the server to go down, even when the CPU is 0% idle and the remaining memory is about 10 MB? Or is this the correct behavior and my hypothesis was incorrect?
Thank you in advance.
I have a lot of clients (around 4000).
Each client pings my server every 2 seconds.
Can these ping requests put a load on the server and slow it down?
How can I monitor this load?
Right now the server responds slowly, but the processor is almost idle and free memory is fine.
I'm running Apache on Ubuntu.
Assuming you mean a UDP/ICMP ping just to see if the host is alive, 4000 hosts probably isn't much load, and it is fairly easy to calculate. CPU- and memory-wise, ping is handled by your kernel and should be optimized to not take many resources, so you need to look at network resources. The most critical point is if you have a half-duplex link: because all of your hosts are chatty, you'll cause a lot of collisions and retransmissions (and dropped pings). If the links are all full duplex, let's calculate the actual amount of bandwidth required at the server.
4000 clients @ one ping every 2 seconds.
Each ping is 74 bytes on the wire (32 bytes data + 8 bytes ICMP header + 20 bytes IP header + 14 bytes Ethernet). You might have some additional overhead if you use VLAN tagging or UDP-based pings.
If we can assume the pings are randomly distributed, we would have 2000 pings per second @ 74 bytes = 148,000 bytes per second.
Multiply by 8 to get bits per second = 1,184,000 bps, or about 1.2 Mbps.
On a 100 Mbps LAN, this would be about 1.2% utilization just for the pings.
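For reference, the same back-of-the-envelope arithmetic as a small script:

```python
# Back-of-the-envelope check of the ping bandwidth estimate above.
clients = 4000
interval_s = 2                     # each client pings every 2 seconds
frame_bytes = 32 + 8 + 20 + 14     # data + ICMP + IP + Ethernet = 74 bytes

pings_per_sec = clients / interval_s            # 2000 pings/s
bytes_per_sec = pings_per_sec * frame_bytes     # 148,000 B/s
bits_per_sec = bytes_per_sec * 8                # 1,184,000 bps

print(f"{bits_per_sec / 1e6:.2f} Mbps "
      f"({bits_per_sec / 100e6:.1%} of a 100 Mbps link)")
```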
If this is a LAN environment, I'd say this is basically no load at all; if it's going across a T1, then it's an immense amount of load. So you should run the same calculation on whichever network links may also be a bottleneck.
Lastly, if you're not using ICMP pings to check the host but have an application-level ping, you will have all the overhead of whatever protocol you are using; the ping will need to go all the way up the protocol stack, and your application needs to respond. Again, this could be a very minimal load or it could be immense, depending on the implementation details and the network speed. If the host is idle, I doubt this is a problem for you.
Yes, they can. A ping request does not put much CPU load on the server, but it certainly takes up bandwidth and a nominal amount of CPU.
If you want to monitor this, you might use either tcpdump or wireshark, or perhaps set up a firewall rule and monitor the number of packets it matches.
The other problem apart from bandwidth is the CPU. If a ping is directed up to the CPU for processing, thousands of these can cause a load on any CPU. It's worth monitoring, but as you said yours is almost idle, it's probably going to cope. Worth keeping in mind, though.
Depending on the clients, ping packets can be different sizes: their payload could be just "aaaaaaaaa", but some may be "thequickbrownfoxjumpedoverthelazydog", which obviously adds further bandwidth requirements.