WebKit is telling me that a page's load time, for a page served from EC2, is 651ms. 502ms of that was "latency" and 149ms was "download". What could the 502ms of latency be? Is that the time it takes to render the page on EC2 and send it back to the client?
Typically the time required for a web request consists of:
1. DNS lookup
2. TCP handshake + request (two round trips for a fresh connection)
3. Time to generate the page (server-side time)
4. Download time
1 + 2 + 3 is the latency.
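If you want to see that split yourself rather than relying on WebKit's numbers, here is a minimal sketch using Go's net/http/httptrace to time each of the phases above (the URL is just a placeholder):

```go
// Rough per-phase timing of one HTTP request: DNS, TCP connect,
// server time (time to first byte), and download.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	var start, dnsDone, connectDone, firstByte time.Time

	req, _ := http.NewRequest("GET", "http://example.com/", nil)
	trace := &httptrace.ClientTrace{
		DNSStart:             func(httptrace.DNSStartInfo) { start = time.Now() },
		DNSDone:              func(httptrace.DNSDoneInfo) { dnsDone = time.Now() },
		ConnectDone:          func(string, string, error) { connectDone = time.Now() },
		GotFirstResponseByte: func() { firstByte = time.Now() },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain the body to measure download time
	done := time.Now()

	fmt.Println("DNS lookup:         ", dnsDone.Sub(start))
	fmt.Println("TCP connect:        ", connectDone.Sub(dnsDone))
	fmt.Println("Server time (TTFB): ", firstByte.Sub(connectDone))
	fmt.Println("Download:           ", done.Sub(firstByte))
}
```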
Since the ping time in your case has very high variance, it could be the network on your side, on the EC2 side, or somewhere in between. Can you ping other EC2 boxes, and other boxes from your home/office, to isolate which side the issue is on?
Just add those ping results to the question and I'll see if I can help.
Let's say there are 2 web services. The goal is for the app gateway to route requests to both of them. If one of them is down, it should cache all the requests. Once it is up again, which can happen hours later, all the requests cached in the meantime should be sent to it in the correct sequence. This is to keep both services in the same state. Is something like this possible with an application gateway? Or with any other web server/tool?
Thanks!
You can do that, but you need some configuration: HTTP Load Balancing.
Load Balancer Overview
The capacity of a single server is limited. Once a website gains more and more traction, the instance serving the site reaches a point where it cannot handle any more users. The website starts to slow down or even becomes unavailable as the server goes down under the traffic.
This is the point where a load balancer comes into play. It allows the "load" that all those visitors and their requests create to be "balanced" over a series of different instances.
When load on a setup increases, capacity can easily be added by attaching more instances to the load balancer's backend. This lets you scale your infrastructure without any downtime, and without delays while waiting for DNS zones to be updated.
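As a rough illustration of the idea (not a production load balancer), here is a minimal round-robin reverse proxy in Go; the two backend addresses are made-up placeholders:

```go
// Spread incoming requests round-robin over two backend instances.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	backends := []*url.URL{
		{Scheme: "http", Host: "10.0.0.11:8080"},
		{Scheme: "http", Host: "10.0.0.12:8080"},
	}
	var next uint64

	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// Pick the next backend in turn ("balancing" the "load").
			b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
			req.URL.Scheme = b.Scheme
			req.URL.Host = b.Host
		},
	}

	http.ListenAndServe(":8080", proxy)
}
```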
My configuration is Apache and Tomcat behind an AWS ELB. Apache is configured with no keep-alive, and MaxClients is set to a low number because each query is very CPU-intensive. I load-test the machine with queries. The number of available request slots then drops to zero, as can be seen by curl -s localhost/server-status?auto not responding immediately. When I stop the load test, the scoreboard from curl -s localhost/server-status?auto is still full of R's, even though the Tomcat logs make it clear nothing is happening. Does anyone have an idea of possible causes?
If your Apache displays 'R' in the status, it means there are open TCP connections from the ELB to Apache (just an open TCP connection, no data sent yet).
There is no official complete documentation on this subject (how the number of pre-opened connections is chosen), but the Amazon documentation states (on this page: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html ) that:
Classic Load Balancers use pre-open connections but Application Load Balancers do not.
So the answer is: it is an optimization from Amazon (opening a TCP connection costs a little bit).
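If you want to reproduce the 'R' state yourself without an ELB, a small sketch like this (assuming Apache is listening on localhost:80) opens a TCP connection and sends nothing, which is essentially what the pre-open optimization does:

```go
// Open a TCP connection to Apache without sending any data; Apache reserves a
// worker in the 'R' (reading request) state while it waits for the request.
package main

import (
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:80")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Keep the connection idle long enough to inspect
	// curl -s localhost/server-status?auto from another shell.
	time.Sleep(2 * time.Minute)
}
```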
tl;dr:
When a Google Cloud HTTPS load balancer opens a TCP stream (with a "Connection: keep-alive" header in the request), are there any guarantees about how long (at most) that stream will be kept open to the backend server?
longer:
I deployed a Go HTTP server behind an HTTPS load balancer and quickly ran into a lot of issues because I had set an aggressive (10s) read deadline on my socket connections, which meant that my server often closed connections in the middle of reading subsequent requests. So clearly I'm doing that wrong, but at the same time I don't want to set NO deadlines on my sockets, because I want to guard against these servers slowly leaking dead connections over time and eating up all my file descriptors.
As such, it would be nice if, for example, the load balancers automatically closed any TCP streams they have open after 5 minutes. That way I could set my server's read deadline to (e.g.) 6 minutes and be sure that I'll never interrupt any requests - the deadline would only be invoked in exceptional cases (e.g. the FIN packet from the load balancer was not received by my server).
I was unable to get an official answer on this from Google enterprise support, but from my experiments (analyzing multi-hour tcpdumps) it looks like the load balancer closes connections after ~10 minutes of idleness (meaning no TCP data packets for 10 minutes).
Per here, idle TCP connections to Compute Instances are timed out after 10 minutes, which would seem to confirm your hypothesis.
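Assuming that ~10-minute figure is right, one way to use it (a sketch, not an official recommendation) is to drop the per-connection read deadline and instead give the Go server an idle timeout slightly longer than the balancer's, so the balancer normally closes idle connections first and the server only reaps ones whose FIN never arrived:

```go
// Let the load balancer's ~10-minute idle timeout win; the server's own
// IdleTimeout is only a safety net for connections abandoned without a FIN.
package main

import (
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: http.DefaultServeMux,

		// Slightly longer than the balancer's observed idle timeout, so we
		// rarely close a connection the balancer still considers usable.
		IdleTimeout: 11 * time.Minute,

		// Still bound how long request headers may take to arrive.
		ReadHeaderTimeout: 30 * time.Second,
	}
	srv.ListenAndServe()
}
```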
I am writing a TCP/IP server that handles persistent connections. I'll be using TLS to secure the communication and have a question about how to do this:
Currently I have a load balancer (AWS ELB) in front of a single server. In order for the load balancer to do the TLS termination for the duration of the connection, it must hold on to the connection and forward the plain text to the application behind it.
client ---tls---> Load Balancer ---plain text---> App Server
This works great. Yay! My concern is that I'll need a load balancer in front of every app server because, presumably, the number of connections the load balancer can handle is the same as the number of connections the app server can handle (assuming the same OS and NIC). This means that if I had 1 load balancer and 2 app servers, I could wind up in a situation where the load balancer is at full capacity and each app server is at half capacity. In order to avoid this problem I'd have to create a 1 to 1 relationship between the load balancers and app servers.
I'd prefer the app server not have to do the TLS termination because, well, why reinvent the wheel? Are there better methods than a 1 to 1 relationship between the load balancer and the app server to avoid the capacity issue mentioned above?
There are two probable flaws in your presumption.
The first is the assumption that your application server will experience the same amount of load for a given number of connections as the load balancer. Unless your application server is extremely well-written, it seems reasonable that it would run out of CPU or memory or encounter other scaling issues before it reached the theoretical maximum of ~64K concurrent connections per source IP imposed by the 16-bit TCP port range. If that's really not the case, then great -- well done.
The second issue is that a single ELB is not necessarily a single machine. A single ELB launches a hidden virtual machine in each availability zone where you've attached the ELB to a subnet, regardless of the number of instances attached, and the number of ELB nodes scales up automatically as load increases. (If I remember right, I've seen as many as 8 nodes running at the same time -- for a single ELB.) Presumably the instance class of those ELB nodes can change too, but that's not a facet that's well documented. There's no separate charge for these machines; they are included in the ELB price, so as they scale up, the monthly cost for the ELB doesn't change... but provisioning qty = 1 ELB does not mean you get only 1 ELB node.
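One way to see this for yourself is to resolve the ELB's DNS name: each ELB node gets its own address, so the number of records grows as the ELB scales out. A quick sketch (the hostname below is a made-up placeholder):

```go
// Count the ELB node addresses behind a single ELB's DNS name.
package main

import (
	"fmt"
	"net"
)

func main() {
	addrs, err := net.LookupHost("my-elb-1234567890.us-east-1.elb.amazonaws.com")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d ELB node(s): %v\n", len(addrs), addrs)
}
```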
Can anybody suggest how to update a web service / WCF service holding live transactions without any downtime?
Please don't suggest updating at off-peak hours, because our service runs 24x7 and transactions may run for any period of time.
So what is the best way to update such a service so that current transactions don't get affected?
I use load balancing to prevent errors and keep the site up during updates. For this to work you need at least two servers behind a load balancer that sends traffic to whichever server is up. My procedure for updating my sites is:
1. Tell "Server A" to start serving an error page at the URL that the load balancer pings (see the sketch after this list). This tells the load balancer to stop sending traffic to this server.
2. Wait 30 seconds or so until traffic stops hitting the server.
3. Update the code on this server.
4. Tell the server to stop serving an error page at the ping URL.
5. Wait until that server is getting traffic again.
6. Repeat steps 1-5 with "Server B".
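For step 1, here is a minimal sketch of what "serving an error page at the ping URL" can look like, assuming the load balancer pings a path such as /healthcheck (the paths and port are made up for the example):

```go
// Health-check endpoint with a drain switch: flipping the switch makes the
// check fail, so the load balancer takes this server out of rotation.
package main

import (
	"net/http"
	"sync/atomic"
)

var draining atomic.Bool

func main() {
	http.HandleFunc("/healthcheck", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			// "Serve an error page at the ping URL" so the balancer
			// stops sending traffic here.
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	})

	// Flip the flag before updating (/drain), flip it back afterwards (/drain?off=1).
	http.HandleFunc("/drain", func(w http.ResponseWriter, r *http.Request) {
		draining.Store(r.URL.Query().Get("off") == "")
	})

	http.ListenAndServe(":8080", nil)
}
```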
From my experience: I use a Git server and a local machine to edit the code and make sure it all works, then push to the server, then pull on the live site, and that gives 0 downtime.
Strictly zero seconds of downtime is not possible.
The least you can do is update the web services and test them thoroughly on, say, a staging server. Verify and be sure the updates work perfectly.
Then update the services part (an xyz.war file in my case) and restart the server for the change to take effect.
If there are any DB or other changes involved in this process, I script them and test the script on the staging server before going live.
Just one suggestion: take a proper backup of the current live server in case things go wrong.