I am testing an API in JMeter from a Linux load generator machine. When the test is executed in non-GUI mode, I see a latency of 35 s, but when I ran a ping from the LG server to the app server, the round-trip time was just a few milliseconds.
The View Results Tree listener shows the 35 s latency.
Both servers are on the same network, so why is there so much latency?
You're looking at two different metrics.
Ping sends an ICMP packet, which just indicates success or failure in communicating between two machines.
Latency includes:
Time to establish connection
Time to send the request
Time required for the server to process the request
Time to get 1st byte of the response
So, in other words, latency is the time to first byte, and if your server needs 35 seconds to process the request, that points to a server-side issue rather than a network issue.
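If you want to see those components for yourself outside of JMeter, the following is a minimal Go sketch using net/http/httptrace (the URL is just a placeholder, not your API). It reports when the connection was established and when the first response byte arrived, which is the latency JMeter reports:

package main

import (
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    // Placeholder URL; substitute the endpoint under test.
    req, err := http.NewRequest("GET", "https://example.com/", nil)
    if err != nil {
        panic(err)
    }

    start := time.Now()
    var connected, firstByte time.Time

    trace := &httptrace.ClientTrace{
        // Fired once the TCP (and TLS, if any) connection is ready.
        GotConn: func(httptrace.GotConnInfo) { connected = time.Now() },
        // Fired on the first byte of the response: this is the "latency".
        GotFirstResponseByte: func() { firstByte = time.Now() },
    }
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    resp, err := http.DefaultTransport.RoundTrip(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    fmt.Println("connection established after:", connected.Sub(start))
    fmt.Println("time to first byte (latency):", firstByte.Sub(start))
}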
More information:
JMeter Glossary
Understanding Your Reports: Part 1 - What are KPIs?
Related
I have both a Boost SSL client and server. Part of my testing is to use the client to send a small test file (~40K) many times to the server, using the pattern socket_connect, async_send, socket_shutdown each time the file is sent. I have noticed on both the client and the server that the ssl_socket.shutdown() call can take up to 10 milliseconds to complete. Is this typical behavior?
The interesting behavior is that the 10 millisecond completion time does not appear until I have executed the connect/send/shutdown pattern about 20 times.
I have an nginx instance between the clients of my web app and my Node.js Express server. Some of my clients have packet loss.
I'm testing with 20% packet loss (inbound or outbound, via Clumsy on Windows, with Chrome). I'm requesting a big resource (500k bytes) from a simple Express web server.
I'm encountering very slow response times: each chunk (about 16k) sometimes arrives within < 10 ms, sometimes > 100 ms, and in every test a few chunks arrive only after several seconds (even up to 20-40 seconds).
My upstream (Express) served the whole resource to nginx within 0.6 seconds (according to the per-chunk log in my Express app and the nginx access log).
For reference, when requesting a videoPlayback resource from YouTube under the same 20% packet loss, the chunks are about the same size and each one arrives in less than 100 ms. So although I expect a slower response time, 20 seconds for one chunk (not the whole resource) is a problem.
I found no errors in the nginx error_log and no packet loss on the nginx machine. I played with the nginx buffers (no buffers, bigger buffers) with no result, there is no writing to the machine's disk, and there are no timeouts on the nginx side.
Any ideas? Any other relevant nginx configuration? Maybe the kernel's TCP congestion control configuration on the nginx machine? Thanks!
tl;dr:
When a Google Cloud HTTPS load balancer opens a TCP stream (with a "Connection: keep-alive" header in the request), are there any guarantees around how long (at most) that stream will be kept open to the backend server?
longer:
I deployed a Go HTTP server behind an HTTPS load balancer and quickly ran into a lot of issues because I had set an aggressive (10 s) read deadline on my socket connections, which meant that my server often closed connections in the middle of reading subsequent requests. So clearly I'm doing that wrong, but at the same time I don't want to set no deadlines at all on my sockets, because I want to guard against these servers slowly leaking dead connections over time and eating up all my file descriptors.
As such, it would be nice if, for example, the load balancers automatically closed any TCP streams they have open after 5 minutes. That way I could set my server's read deadline to (e.g.) 6 minutes and be sure that I'll never interrupt any requests; the deadline would only be invoked in exceptional cases (e.g. the FIN packet from the load balancer was never received by my server).
I was unable to get an official answer on this from Google enterprise support, but from my experiments (analyzing multi-hour tcpdumps) it looks like the load balancer will close connections after ~10 minutes of idleness (meaning no tcp data packets for 10 minutes).
Per here, idle TCP connections to Compute Instances are timed out after 10 minutes, which would seem to confirm your hypothesis.
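If that matches what you're seeing, one way to express the policy (a sketch only, using Go's net/http server timeouts rather than raw socket deadlines, with an assumed 11-minute margin rather than any documented figure) is to make the backend's idle timeout longer than the balancer's, so the balancer is normally the side that closes an idle stream:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "ok")
    })

    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
        // Bound how long reading a single request's headers may take.
        ReadHeaderTimeout: 10 * time.Second,
        // Assumed margin: longer than the load balancer's ~10-minute idle
        // timeout, so the balancer normally closes idle keep-alive streams
        // first and this timeout only fires if the FIN never arrives.
        IdleTimeout: 11 * time.Minute,
    }
    if err := srv.ListenAndServe(); err != nil {
        panic(err)
    }
}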
I'm using JMeter to test our backend services using OS Samplers. I'm using curl in the OS Samplers to generate the load for a 4-step process:
POST the certificate to receive the token
POST the token to receive the session
GET session info
POST renew session
The issue I'm facing is that JMeter reports much higher response times than the service logs. We need to identify where the extra time (+125 ms with 1 concurrent user) is coming from during the transaction execution. The test environment is all on the same VLAN, with no firewalls or proxy servers between the client and target servers. The median latency between the two servers is 0.3 ms, with the average being 1.2 ms (with a small sample size). Speaking to the dev team, they state that the service logs don't record the very first moment a request is received, but they can't see how that could account for more than a few ms of difference. Data from a few tests, in which throughput increases while the overhead stays roughly constant, would be consistent with that assumption.
So at this point we're focusing on whether JMeter is causing the extra overhead. One assumption is that JMeter starts the transaction timer when it begins to generate the curl request, so the packaging of the request is included in the timing. We therefore want to remove the curl OS Sampler from the test and replace it with an HTTP Sampler.
When converting the JMeter OS Sampler curl request to an HTTP Sampler HTTPS request, we run into an error: JMeter: Non HTTP response message: Connection to URL refused. As stated above, we first POST the certificate, then POST the token, and then perform steps 3 and 4. The HTTP Sampler fails on the 2nd step, when posting the token acquired in the first step. We've verified that the token is good by continuing on error and processing the 2nd step's original curl POST request. So there are two things to note here: 1. the error message says the handshake never completes, so the request never gets to the point of being processed; 2. the following curl request using the same information completes the handshake and correctly processes the transaction.
Making the conversion boils down to the question: "Why would an OS Sampler curl command complete while an HTTP Sampler fails to complete the handshake?"
The OS Sampler curl command is configured as:
curl -k -d "" -v -H "{token}" {URL}
HTTP Sampler is configured as:
IP: {URL}
PORT: {Port#}
Implementation: HttpClient4
Protocol: HTTPS
Method: POST
Path: {path}
Use KeepAlive: Check
Header Manager: {token}
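For reference, this is roughly what the curl invocation above does, written out as a minimal Go client; the URL/path and the header name X-Auth-Token are placeholders standing in for {URL} and {token}, not our real values:

package main

import (
    "crypto/tls"
    "fmt"
    "io"
    "net/http"
    "strings"
)

func main() {
    // -k: skip server certificate verification.
    client := &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
        },
    }

    // -d "": POST with an empty form body.
    req, err := http.NewRequest("POST", "https://example.internal/path", strings.NewReader(""))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    // -H "{token}": the custom header carrying the token (header name assumed).
    req.Header.Set("X-Auth-Token", "{token}")

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}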
There are two separate questions in your post. Regarding the first:
How are you measuring latency between the servers? If you're using ping, you're measuring the round-trip time for one send and receive. An HTTP POST usually involves more than that: the TCP back-and-forth to handshake, then sending the content, which depending on size can be split across several packets (HTTP responses are usually larger than requests). There is also the possibility of latency being a bit higher for larger payload packets compared to a simple ping.
This might not account for the whole difference you're seeing (like you've noted, some of it comes from the delay in launching curl), but it is still something that contributes to increased overall latency. You should use a network analyzer of some sort, at the very least a sniffer like Wireshark, to understand the chattiness, i.e. the number of back-and-forth turns, for each HTTP step you're using.
WebKit is telling me that a page's load time (the page being served from EC2) is 651 ms. 502 ms of that was "latency" and 149 ms was "download". What could the 502 ms of latency be? Is that the time it takes to render the page on EC2 and send it back to the client?
Typically, the time required for a web request consists of:
1. DNS lookup
2. TCP handshake time + request (two round trips for a fresh connection)
3. Time to generate the page (server-side time)
4. Download time
1 + 2 + 3 is the latency.
Since the ping time in your case has very high variance, it could be due to the network on your side, on the EC2 side, or somewhere in between. Can you ping other EC2 boxes, or other boxes from your home/office, and try to isolate which side the issue is on?
Just add those pings to the question and let me see if I can help.