We have two recently upgraded Plone 4.3.2 instances behind an HAProxy load balancer, which itself is behind Apache.
We limit each Plone instance to serving two concurrent requests via the HAProxy configuration.
We recently encountered an issue whereby a client sent 4 byte-range requests in quick succession for a PDF, each of which took between 6 and 8 minutes to get a response. This tied up all available back-end request slots for over 6 minutes, so HAProxy timed out the other requests in its queue. The PDF is stored as an ATFile object in Plone, which I believe should have been migrated to blob storage in our recent upgrade.
My question is what steps should we take to prevent a similar scenario in the future?
I'm also interested in:
- how to debug why byte-range requests on an otherwise lightly loaded server take so long to respond (an example request is shown below)
- how plone.app.blob deals with byte-range requests
- whether it is possible to configure Apache so that byte-range requests are served from its cache but not from the back-end server
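For reference, a byte-range request is an ordinary GET with a Range header; the path below is hypothetical. Replaying such a request directly against a single Zope instance, bypassing Apache and HAProxy, is one way to narrow down where the time is being spent. A healthy response comes back as 206 Partial Content with a Content-Range header.

GET /plone/some-folder/large-file.pdf HTTP/1.1
Host: www.example.org
Range: bytes=0-1023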
As requested, here is the haproxy.cfg with superfluous configuration stripped out.
global
    maxconn 450
    spread-checks 3

defaults
    log /dev/log local0
    mode http
    option http-server-close
    option abortonclose
    option redispatch
    option httplog
    timeout connect 7s
    timeout client 300s
    timeout queue 120s
    timeout server 300s

listen cms 127.0.0.1:18181
    id 3
    balance leastconn
    option httpchk
    http-check send-state
    timeout check 10s
    acl cms_edit url_dom xxx.xxx.xxx.xxx
    acl cms_not_ok nbsrv() lt 2
    block if cms_edit cms_not_ok
    server cms_instance1 app:18081 check downinter 10s maxconn 2 rise 1 slowstart 300s
    server cms_instance2 app:18082 check downinter 10s maxconn 2 rise 1 slowstart 300s
You can install https://pypi.python.org/pypi/Products.LongRequestLogger and check its log file to see where the request gets stuck.
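A minimal buildout sketch for wiring it into an existing plone.recipe.zope2instance part might look like the following; the log path and thresholds are illustrative examples, so check the package documentation for the exact variable names and defaults.

[instance]
eggs +=
    Products.LongRequestLogger
environment-vars =
    longrequestlogger_file ${buildout:directory}/var/log/longrequest.log
    longrequestlogger_timeout 5
    longrequestlogger_interval 2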
I've opted to disable byte-range requests to the back-end Zope server by adding the following to the cms listen section in HAProxy:
reqidel ^Range:.*
This deletes any matching Range header (case-insensitively) from requests before they are passed to the back end, so Zope always serves the complete file.
Related
Our developers recently said they added code for load balancing to work in our webapp. I've spent the last 2 days on this and I'm at a loss as to why it's not working. First I found that the site IDs were different on the two IIS servers, so that is fixed. Anyway, when balancing is set up, the ASP.NET_SessionId keeps resetting as if it's not getting stored properly in the database. I have looked at the table and there is an entry in there.
We are using HAProxy and have two Windows Server 2016 machines running IIS 10. Looking at both servers, the code below is what the devs said they added.
<sessionState timeout="15" mode="SQLServer" sqlConnectionString="Integrated Security=SSPI;data source=DB1.resource.local;initial catalog=prodAPP;" allowCustomSqlDatabase="true" />
<machineKey decryptionKey="REMOVEDFORSECURITY" validationKey="REMOVEDFORSECURITY" />
The backend for HAProxy
backend prodAPP
    balance roundrobin
    mode http
    option httpchk GET https://prodAPP.com
    http-check disable-on-404
    option http-server-close
    option forwardfor
    option redispatch
    http-request set-header X-Forwarded-Port %[dst_port]
    server webprod01 10.10.10.10:443 check ssl verify none fall 3 rise 5 inter 60000 weight 10
    server webprod02 10.10.10.11:443 check ssl verify none fall 3 rise 5 inter 60000 weight 10
Any ideas on what I can try to get this thing working?
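Not a fix for whatever is resetting the session, but as a test it may be worth pinning each client to one back end with a persistence cookie: if the session then stops resetting, the problem is triggered by requests bouncing between the two servers. A minimal sketch (SRVID, web01 and web02 are arbitrary names, and the other backend options are assumed unchanged):

backend prodAPP
    balance roundrobin
    # ... existing options unchanged ...
    cookie SRVID insert indirect nocache
    server webprod01 10.10.10.10:443 cookie web01 check ssl verify none fall 3 rise 5 inter 60000 weight 10
    server webprod02 10.10.10.11:443 cookie web02 check ssl verify none fall 3 rise 5 inter 60000 weight 10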
EDIT/UPDATE
So I thought about taking the proxy/load balancer out of the mix and tested by pointing my hosts file at webserver1, logging into the webapp and moving around, then changing the hosts file to point to webserver2 and saving it. I then clicked a link and continued to use the webapp without the ASP.NET_SessionId resetting.
So, what is going on with the proxy that is making the ASP.NET_SessionId reset?
EDIT/UPDATE #2
After figuring out that my hosts file changes weren't actually taking effect properly, the Chrome session socket seemed to stick to the original host. The only way to properly get it to change servers was to disconnect the virtual NIC on the server.
Anyway, from what I can tell now it's an issue with the web servers, but I'm still confused about why each of them thinks the session is different and generates a new key.
I am looking for an HAProxy (version 1.5.18) configuration that will allow WebSocket load balancing as well as RabbitMQ load balancing. I have tried many options, but none seem to work. Below is my haproxy config file:
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 15s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
    timeout tunnel 3600s

frontend http_web *:80
    mode http
    default_backend rgw

backend rgw
    balance roundrobin
    server rgw1 173.36.22.49:8080 maxconn 10000 weight 10 cookie rgw1 check
    server rgw2 10.42.139.69:8080 maxconn 10000 weight 10 cookie rgw2 check

listen stats :9000
    mode http
    stats enable
    stats realm Haproxy\ Statistics
    stats uri /haproxy_stats # Stats URI
    stats auth websocketadmin:websocketadmin

listen ampq
    bind *:61613
    mode tcp
    option clitcpka
    server rabbit1 10.42.6.112:61613 check inter 1s rise 3 fall 1
    server rabbit2 10.42.6.113:61613 check inter 1s rise 3 fall 1
    server rabbit3 10.42.6.114:61613 check inter 1s rise 3 fall 1
    server rabbit4 10.42.6.115:61613 check inter 1s rise 3 fall 1
HAProxy doesn't give any error; it prints the messages below, but it doesn't work: I cannot connect to the WebSocket or to RabbitMQ. As soon as I remove the "listen ampq" section, everything starts working fine.
Sep 8 21:00:40 localhost haproxy[3184]: Proxy http_web started.
Sep 8 21:00:40 localhost haproxy[3184]: Proxy rgw started.
Sep 8 21:00:40 localhost haproxy[3184]: Proxy stats started.
The problem was port 61613, which was already taken by another process. I had to change to a new port and add it to the firewall rules, and it is working now.
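One quick way to find out what is already listening on a port is netstat -tlnp | grep 61613 (or ss -ltnp on newer systems), run as root; the output shows the PID and program name bound to the port.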
We are getting lots of 408 status codes in the Apache access log, and these started appearing after migrating from HTTP to HTTPS.
Our web server is behind a load balancer; we have KeepAlive on and a KeepAliveTimeout value of 15 seconds.
Can someone please help resolve this?
Same problem here, after migrating from HTTP to HTTPS. Do not panic, it is not a bug but a client feature ;)
I suppose that you see these log entries only in the logs of the default (or alphabetically first) Apache SSL conf, and that you have a low timeout (<20).
From my tests, these are clients establishing pre-connected/speculative sockets to your web server for faster loading of the next page or resource.
Since they only establish the initial socket connection or handshake (150 bytes to a few thousand), they connect to the IP without specifying a vhost name, and so get logged in the default/first Apache conf's log.
A few seconds after the initial connection they drop the socket if it is not needed, or they use it for a faster subsequent request.
If your timeout is lower than those few seconds you get the 408; if it is higher, Apache doesn't bother.
So either you ignore them / add a separate default conf for Apache (see the sketch below), or you raise the timeout, at the cost of more Apache processes sitting busy waiting for the client to drop or use the socket.
See https://bugs.chromium.org/p/chromium/issues/detail?id=85229 for some related discussion.
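If you go the separate-default-conf route, a minimal sketch might look like the following; the ServerName, certificate paths and log paths are placeholders. Because it is the default (first-loaded) vhost, speculative connections that never send a Host header land in its logs instead of your real vhosts' logs.

# catch-all default vhost, loaded before the real vhosts
<VirtualHost _default_:443>
    ServerName default.example.invalid
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/default.pem        # placeholder
    SSLCertificateKeyFile /etc/ssl/private/default.key   # placeholder
    ErrorLog /var/log/apache2/default-ssl-error.log
    CustomLog /var/log/apache2/default-ssl-access.log combined
</VirtualHost>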
I open a socket connection to an Apache server but don't send any requests yet, waiting for a specific time to do so. How long can I expect Apache to keep this idle socket connection alive?
The situation is that the Apache server has limited resources, and connections need to be allocated in advance before they are all gone.
After a request is sent, the server advertises its timeout policy:
Keep-Alive: timeout=15, max=50
If a subsequent request is sent after more than 15 seconds, it gets a 'server closed connection' error, so the policy is enforced.
However, it seems that if no requests are sent after the connection was opened, Apache will not close it even for as long as 10 minutes.
Can someone shed some light on Apache's behaviour in this situation?
According to Apache Core Features (TimeOut directive), the default timeout is 300 seconds, but it's configurable.
For keep-alive connections (after the first request) the default timeout is 5 seconds (see Apache Core Features, KeepAliveTimeout directive); in Apache 2.0 the default value was 15 seconds. It's also configurable.
Furthermore, there is the mod_reqtimeout Apache module, which provides some fine-tuning settings.
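For reference, the directives mentioned above look like this in the configuration; the values shown are the defaults discussed, not recommendations.

# core timeouts
Timeout 300               # connection/read timeout, default 300 seconds
KeepAlive On
KeepAliveTimeout 5        # keep-alive idle timeout, default 5 seconds (15 in Apache 2.0)
MaxKeepAliveRequests 100
# mod_reqtimeout fine-tuning (only if the module is loaded)
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500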
I don't think any of the mentioned values are exposed to HTTP clients via headers or in any other form (except the keep-alive value, of course).
I am using the appsession config element for sticky sessions. I have 5 WebLogic instances; 3 of them are active and serving load, and when load increases I start the additional 2 instances. HAProxy marks them healthy but does not send any traffic to them because of the stickiness.
How do I transfer existing sessions to the new WebLogic servers? I am using Terracotta for session clustering, so it does not matter which server serves the request. Below is my config for HAProxy.
# this config needs haproxy-1.1.28 or haproxy-1.2.1
global
    log 127.0.0.1 local0
    maxconn 1024
    daemon
    # debug
    #quiet

defaults
    log global
    mode http
    option httplog
    option httpchk
    option httpclose
    retries 3
    option redispatch
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    stats uri /admin?stats
    stats refresh 5s

listen terracotta 0.0.0.0:10001
    # balance url_param JSESSIONID
    balance roundrobin
    option httpchk OPTIONS /Townsend
    server L1_1 10.211.55.1:7003 check
    server L1_2 10.211.55.2:7004 check
    server L1_3 10.211.55.3:7004 check
    appsession JSESSIONID len 52 timeout 3h
If it does not matter which server serves the request, then disable stickiness and remove the appsession line. You must understand that stickiness is the opposite of load balancing. If your issue is that you don't scale, the first thing to do is to stop sticking.
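For illustration, the same listen section with stickiness removed would look like this; without the appsession line, roundrobin starts sending requests to newly started instances as soon as their health checks pass.

listen terracotta 0.0.0.0:10001
    balance roundrobin
    option httpchk OPTIONS /Townsend
    # appsession line removed: no stickiness, so newly started
    # instances receive traffic as soon as they pass health checks
    server L1_1 10.211.55.1:7003 check
    server L1_2 10.211.55.2:7004 check
    server L1_3 10.211.55.3:7004 check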