I'm new to HAProxy and load balancing. I want to see what happens when a backend host is turned off while the proxy is running.
The problem is, if I turn off one of the backends and refresh the browser, the page immediately exposes a 503 error to the user. After the next page load it no longer gets the error, since presumably that backend has been removed from the pool.
As a test I have set up two backend Flask apps and configured HAProxy to balance them like so:
backend app
mode http
balance roundrobin
server app1 127.0.0.1:5001 check
server app2 127.0.0.1:5002 check
My understanding according to this:
https://www.haproxy.com/doc/aloha/7.0/haproxy/healthchecks.html#check-parameters
is that every 2 seconds the backend hosts are pinged to see if they are up, and removed from the pool if they are down. The 5xx error happens in the window between the time I kill the backend and the next health check.
I would think there is a way to get around this 5xx error by having HAProxy apply a little logic: if a request from the frontend fails, remove the failed backend from the pool, switch to another one, and retry the request. This way the user would never see the failure.
Is there a way to do this, or should I try something else so that my user does not get an error?
By default HAProxy will retry 3 times (retries) with 1-second intervals against the same server. To allow it to pick another backend server instead, you should set option redispatch.
Also consider (carefully, it can be harmful):
decrease fall (default is 3),
decrease error-limit (default is 10) and set on-error to mark-down or sudden-death
tune healthcheck intervals with inter/fastinter/downinter
Note: HAProxy retries only on connection errors (e.g. ECONNREFUSED, as in your case); it will not resend/resubmit the request or its data.
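Put together, a minimal sketch of what that backend could look like with redispatch and tighter health checks (the interval and threshold values here are illustrative, not recommendations):

backend app
    mode http
    balance roundrobin
    # retry failed connections, and allow the retry to go to another server
    retries 3
    option redispatch
    # check every 2s while up, every 1s once a check fails, mark down after 2 failures
    default-server inter 2s fastinter 1s downinter 1s fall 2 rise 2
    server app1 127.0.0.1:5001 check
    server app2 127.0.0.1:5002 check

As noted above, redispatch only helps for connection-level failures; requests whose data was already sent are not replayed.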
I have been using jQuery to try to pull data from an API. However, I am getting a 504 error. This happens even when I test the endpoint with Postman. Can anyone suggest what I need to do to get around this?
I recently experienced this issue on one of my apps that was making an ambitious call to its Firebase database: it was grabbing a very large record, which took over 60 seconds (the default timeout) to retrieve.
For those experiencing this error who have access to their app or site's hosting environment and are proxying through NGINX, this issue can be fixed by extending the timeout for API requests.
In your /etc/nginx/sites-available/default or /etc/nginx/nginx.conf, add the following directives:
proxy_connect_timeout 240;
proxy_send_timeout 240;
proxy_read_timeout 240;
send_timeout 240;
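For context, this is roughly where they would sit if you keep them scoped to the proxied location rather than setting them globally (the server name and upstream address below are placeholders for your own setup):

server {
    listen 80;
    server_name example.com;               # placeholder
    location /api/ {
        proxy_pass http://127.0.0.1:3000;  # placeholder upstream
        proxy_connect_timeout 240;
        proxy_send_timeout 240;
        proxy_read_timeout 240;
        send_timeout 240;
    }
}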
Run sudo nginx -t to check the syntax, and then sudo service nginx restart.
This should effectively quadruple the time before NGINX times out your API requests (the default being 60 seconds, our new timeout being 240 seconds).
Hope this helps!
There is nothing you can do.
You are sending a request to a server. This particular request fails because that server in turn sends a request to a proxy and gets a timeout error; your server reports this back to you as status 504.
The only way to fix it is to fix the proxy (make it respond in a timely manner), or to change the server to not rely on that proxy. Both are outside your area.
You cannot prevent such errors. What you can do is decide what the user experience should be when such a problem happens, and implement it. By the way: if you get 504 errors, you should also expect timeout errors. Say you make a request to your server with a 60-second timeout, and your server makes a request to the proxy with a 60-second timeout. Because both timeouts are the same, sometimes your server will receive the proxy timeout and forward it to you (status 504), but sometimes your request to the server will time out just before that and you will get a timeout error instead.
One way to fix this is changing the proxy settings to "NO PROXY" in the browser.
This works in Firefox; I don't know about other browsers.
Follow the steps:
1. Go to Preferences (top-right corner, three lines; search in the drop-down).
2. Search for "proxy" (Ctrl+F).
3. Go into the settings.
4. Select "NO PROXY" in the choice buttons, below "Configure Proxy Access to the Internet".
5. Refresh the web page.
I have an Apache server with 16 GB of RAM. The script cap.php returns a very small chunk of data (500 B). It starts a MySQL connection and makes a simple query.
However, the response time from the server is, in my opinion, too long.
I attach a screenshot of the Developer Tool Panel in Chrome.
Besides the SSL and the TTFB, there is a strange delay of 300 ms (Stalled).
If I try a curl from the WebServer:
curl -w '\nLookup time:\t%{time_namelookup}\nConnect time:\t%{time_connect}\nPreXfer time:\t%{time_pretransfer}\nStartXfer time:\t%{time_starttransfer}\n\nTotal time:\t%{time_total}\n' -k -H 'miyazaki' https://127.0.0.1/ui/cap.php
Lookup time: 0.000
Connect time: 0.000
PreXfer time: 0.182
StartXfer time: 0.266
Total time: 0.266
Does anyone know what that is?
Eventually, I found that when you use SSL it really does matter to switch on the KeepAlive directive in Apache. See the picture below.
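For reference, a minimal sketch of the relevant directives in the Apache config (the values here are just common defaults, not tuned for this particular server):

# httpd.conf / apache2.conf
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5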
According to the Chrome documentation:
Stalled/Blocking
Time the request spent waiting before it could be sent. This time is inclusive of any time spent in proxy negotiation. Additionally, this time will include when the browser is waiting for an already established connection to become available for re-use, obeying Chrome's maximum six TCP connections per origin rule.
So this appears to be a client-side issue with Chrome talking to the network rather than a server config issue. As you are only making one request, I think we can rule out the six-connections-per-origin limit (unless you have lots of other tabs using up those connections), so I would guess either limitations on your PC (network card, RAM, CPU) or infrastructure issues (e.g. you connect via a proxy and it takes time to set up that connection).
Your curl request doesn't seem to show this delay: it has just a 0.182 wait time to send the request (which is easily explained by the HTTPS negotiation) and then a 0.266 total time to download (including the 0.182). This compares with 0.700 seconds when using Chrome, so I don't understand why you say "total time is similar" when to me it clearly is not.
Finally, I do not understand your follow-up answer. It looks to me like you made a request shortly after another recent request, as it has skipped the whole network connection stage (including any grey stalling, blue DNS lookup, orange initial connection and purple HTTPS connection). So of course this is quicker. But it is not comparing like for like with the first screenshot in your question and does not address your question.
But yes, you absolutely should be using keep-alives (they are on by default in most web servers, so it usually takes extra effort to turn them off) and HTTPS resumption techniques (not on by default unless you explicitly add them to your HTTPS config) to benefit any additional requests sent shortly after the first. But these will not benefit the first connection of the session.
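As a rough sketch of the HTTPS resumption side on Apache (assuming mod_ssl; the cache path and size below are illustrative):

# in the global SSL config, e.g. ssl.conf
SSLSessionCache shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout 300

With a session cache like this in place, clients that reconnect within the timeout can resume the TLS session instead of doing a full handshake.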
We have developed an application where, after a successful request, the session has to be destroyed. This works fine when we have a single Tomcat.
But this does not happen when we use multiple Tomcats behind a simple Apache load balancer (we use the load balancer to balance requests between two Tomcats hosting the same application).
A session ID that has been created and processed successfully can still be used for one more transaction, after which it gets killed.
Moreover, the session ID value has either 'n1' or 'n2' appended to it (SessionID-n1). I am not sure why this is happening.
Please help me to resolve this issue.
We have a configuration setup as below:
        Load Balancer
        /           \
  Cluster1        Cluster2
      |               |
  Tomcat1         Tomcat2
Thanks,
Sandeep
If you have configured each Tomcat node to have a "jvmRoute", then the string you specify there will be appended to the session identifier. This can help your load-balancer determine which back-end server should be used to serve a particular request. It sounds like this is exactly what you have done. Check your CATALINA_BASE/conf/server.xml file for the word "jvmRoute" to confirm.
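For example, a node configured as your first Tomcat would have something like this in server.xml (and jvmRoute="n2" on the second node), which would explain the -n1 / -n2 suffixes you are seeing:

<!-- CATALINA_BASE/conf/server.xml on node 1 -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="n1">
  <!-- Host, Realm, etc. as in the stock server.xml -->
</Engine>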
If you only use a session for a single transaction, why are you bothering to create the session in the first place? Is a request == transaction?
If you are sure you are terminating the session when the transaction is complete, you should be okay even if the client wants to try to make a new request with the same session id. It will no longer be valid and therefore useless to the client.
It's not clear from your question whether there is an actual problem with the session because you claim it is "getting killed" which sounds like what you want it to do. If you provide more detail on the session expiration thing, I'll modify my answer accordingly.
Config
We have a Play 2.1.0 with AngularJS setup running in production mode.
We have a reverse-proxy load balancer set up with Apache 2.2, something like what is described here:
http://www.playframework.com/documentation/2.1.0/HTTPServer
The whole app runs in an iframe, navigated to from a JBoss application.
Problem
Most of the time it works, but sometimes, when the connection has been left idle for two or three hours, untouched, with no one hitting the reverse-proxy URL to load JBoss/Play, we get the 502 Proxy Error in the iframe content after a wait of a few minutes.
Play receives the request but somehow decides not to respond at all. This occurs only for the first request or two after the wake-up; when we then refresh the page, Play receives the request and responds properly.
Tried
We took a tcpdump on the Play port: all the requests are being received, but no response is sent from Play in the failed scenario, whereas the same request gets a response from Play on subsequent attempts.
The X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Server and Connection: Keep-Alive headers are all present in the tcpdump of the request whose response was lost.
We tried KeepAlive with timeouts in the proxy server, without much help. Why doesn't Play respond to the initial connections after the idle period? Is there any configuration we can set to keep it alive?
Workaround
Polling the Play server URL every half an hour from the same server makes this issue no longer reproducible.
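For reference, the polling is nothing more than a cron entry along these lines (the URL is a placeholder for our reverse-proxy endpoint):

# crontab: hit the app through the reverse proxy every 30 minutes
*/30 * * * * curl -s -o /dev/null https://our-proxy-host/our-app/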
Still any help/suggestions would be really appreciated to fix this issue..
I tried to solve this problem myself. Approaches like the answers mentioned here and here did not change anything.
I then decided to go with nginx again, which I had been using with Play applications before. The setup can be found here. Since then the problem has been gone.
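The nginx side is essentially just a plain reverse proxy in front of Play; a minimal sketch, assuming Play listens on its default port 9000 (adjust the server name and paths to your setup):

server {
    listen 80;
    server_name example.com;               # placeholder
    location / {
        proxy_pass http://127.0.0.1:9000;  # Play's default HTTP port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}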
I tested my web site with 100 users over HTTP and HTTPS. The response time obtained with HTTPS is much higher than the response time obtained with HTTP, nearly four times greater. Can anyone explain why the response time is higher with HTTPS compared to HTTP, or do I need to change any SSL property in JMeter's system.properties? Thanks in advance!
The SSL handshake involves 4 requests to establish a connection, so the first request should be something like 4x longer than in the HTTP case. See The SSL handshake diagram for more info.
However, if you see a 4x performance degradation for all requests, that doesn't sound right.
The following JMeter properties control SSL flows:
https.sessioncontext.shared - controls whether SSL session contexts are created per thread (if it's set to false) or shared (if it's set to true)
https.use.cached.ssl.context - controls if cached SSL context is being reused between iterations
These properties live in the jmeter.properties file under the /bin folder of your JMeter installation. It is also possible to override them using the -J command-line argument, as follows:
jmeter -Jhttps.sessioncontext.shared=true -Jhttps.use.cached.ssl.context=true
See Apache JMeter Properties Customization Guide for more details.
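Alternatively, you can put the same properties in user.properties in JMeter's bin directory, which JMeter picks up automatically, for example:

# user.properties (in JMeter's /bin directory)
https.sessioncontext.shared=true
https.use.cached.ssl.context=true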
If the above settings don't help, you'll need to review your test plan and perhaps profile the application to see where the extra time is spent.