Apache mod_proxy content-length vs chunked encoding

I have Apache configured to proxy for an (old, creaky, and basically HTTP 1.0) web server that does not deliver a Content-Length header, relying instead on closing the connection to mark the end of the data.
Under Apache 2.2, mod_proxy handled this by using Transfer-Encoding: chunked and delivering the data as fast as the remote server produced it. Under Apache 2.4, mod_proxy instead waits for the entire response from the remote server and then delivers the page with a Content-Length. As the backend server can take 30+ seconds to gradually fill a page of results, the older behavior is preferable. There is no obvious configuration change that would have caused this; I've tried proxy-sendchunked, but it doesn't seem to help (as documented, it relates to data being uploaded by POST requests, which isn't the issue here).
Is this configurable and I've just missed it?
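For context, a minimal sketch of the sort of configuration involved (the /legacy prefix and backend address are placeholders, not my real values; proxy-sendchunked is the env var mentioned above):

    # Hypothetical mod_proxy setup; path and backend address are illustrative.
    ProxyPass        /legacy http://backend.example.com:8080/
    ProxyPassReverse /legacy http://backend.example.com:8080/
    # The env var I tried; per the docs it only affects how request bodies (POST
    # uploads) are sent to the backend, not how the response is streamed back.
    SetEnv proxy-sendchunked 1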

Keep-Alive vs proxy-send-timeout vs proxy-read-timeout vs the Express default keep-alive request header in the Nginx ingress-nginx-controller for Kubernetes

In a Kubernetes cluster I have an Nginx ingress controller that, I believe, acts as the "proxy" in front of the underlying Express server pod/service.
As such, there are controls on the Nginx controller / load balancer that must be taken into account, such as the keep-alive timeout.
With SSE you have to keep the connection alive, and I have the keep-alive timeout on my Express server set to 30 minutes.
What I am not sure of is how the Nginx service works in coordination with the Express server, since they perform a similar function.
Does it even matter if I set the Express server header timeout? Can I remove that and consolidate the functionality in the load balancer?
For the proxy-send and proxy-read timeouts, I am assuming the send timeout (which is described as a write) is the one that SSE maps to. For a WebSocket I would assume both apply, and that a message sent by the client would fall under the proxy-read timeout. I'm not sure if that is correct.
In that case, comparing the proxy timeouts with keep-alive on the Nginx controller: is the keep-alive setting even necessary? It seems to be a more granular, usage-based control rather than an overall one. Meaning, can I just remove the keep-alive setting, because if my proxy timeouts are shorter than the keep-alive it would, in theory, never be reached?
I have an illustration of the setup.
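To make the moving parts concrete, here is a rough sketch of the nginx directives that the controller's proxy-read-timeout, proxy-send-timeout and keep-alive settings ultimately correspond to (the values are illustrative, not recommendations):

    proxy_read_timeout  1800s;  # max time nginx will wait between two successive reads from the upstream (Express)
    proxy_send_timeout  1800s;  # max time nginx will wait between two successive writes to the upstream
    keepalive_timeout   75s;    # how long an idle keep-alive connection from the client to nginx stays open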

What does HTTP/2 mean for a reverse proxy server?

How does HTTP/2 affect the implementation of a proxy server? In particular, when a client sends an HTTP/2 request to a content server that only supports HTTP/1.x, should the proxy server transform the HTTP/2 request into an HTTP/1.x request before directing it to the content server? And upon receiving the response from the content server, should the proxy server transform the response into HTTP/2 format before sending it back to the client?
As dsign discussed, your understanding is correct.
However, I thought it was worth pointing out that there are still massive advantages to HTTP/2 at your edge connection (i.e. your reverse proxy), as the issues HTTP/2 solves (primarily latency) are less of a problem over the typically shorter, higher-bandwidth hop from the reverse proxy to the content server.
For example, if you have a 100 ms delay to the reverse proxy at the edge and only a 1 ms delay between the reverse proxy and the content server, then the fact that the content server speaks HTTP/1.1 to the proxy server probably won't have much impact on performance, thanks to that very low 1 ms latency. So the end client (speaking HTTP/2 to the reverse proxy) still sees a massive performance boost over HTTP/1.1.
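As a rough sketch of that arrangement (nginx syntax; hostnames, certificate paths and upstream address are all placeholders), the edge terminates HTTP/2 and speaks plain HTTP/1.1 to the content server:

    server {
        listen 443 ssl http2;                  # HTTP/2 (and TLS) on the high-latency client hop
        server_name edge.example.com;
        ssl_certificate     /etc/ssl/edge.crt;
        ssl_certificate_key /etc/ssl/edge.key;

        location / {
            proxy_http_version 1.1;            # the content server only speaks HTTP/1.x
            proxy_set_header Connection "";    # keep the upstream connection reusable
            proxy_pass http://content-server.internal:8080;
        }
    }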
Yes, it is exactly as you say. Conversion from HTTP/2 to HTTP/1.1 must happen in one direction, and from HTTP/1.1 to HTTP/2 in the other.
In practice this means that although the HTTP/2 protocol itself doesn't need a traditional text-based parser, a comprehensive HTTP/2 server still needs an HTTP/1.1 parser, not only to work with clients that are HTTP/1.1-only (among them crawlers) but also to talk to inner applications.
In terms of usage, one of the most important application protocols is FastCGI, which likewise requires parsing HTTP/1.1-style responses from the application and converting them into HTTP/2 responses for the client.

Varnish cluster and Last-Modified

In our setup we have 3 Varnish servers in front of a cluster of backend servers. Varnish is load-balancing using round-robin. Recently we started using bundles for CSS and JS. Each generated bundle gets a Last-Modified header from the backend server. We store the bundles for 24 hours.
The problem is that when Varnish retrieves the bundle from a backend server, the Last-Modified is slightly different depending on which backend server is hit at which time. The result is that approx. 50% of the browser requests get a 200 response instead of a 304 when asking with a conditional If-Modified-Since header.
I'm looking for some suggestions on how to solve this. Manipulating headers leaving the server could be a solution, but somehow it seems wrong. Having one backend server is unfortunately also not an option, due to scalability and deployment issues.
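If manipulating headers turns out to be the least bad option, a minimal VCL sketch (Varnish 4 syntax; the URL pattern is a placeholder) would be to drop the backend-specific Last-Modified on the bundles, so browsers no longer revalidate against a timestamp that differs per backend:

    sub vcl_backend_response {
        # The /bundles/ prefix is illustrative; match whatever path the generated bundles live under.
        if (bereq.url ~ "^/bundles/") {
            unset beresp.http.Last-Modified;
        }
    }

Without a validator, browsers would then rely on whatever Cache-Control/Expires the bundles carry rather than on conditional requests.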

Apache returns 502 Bad Gateway with mod_proxy and large file

I am seeing a problem when sending a file through Apache, configured as a proxy, to a local application using the Mongoose webserver.
My setup:
Apache (port 80) <-> mod_proxy <-> Mongoose (port 9090)
mod_proxy is configured to transfer certain URLs from port 80 to localhost:9090.
Mongoose only accepts authenticated requests. This works fine for normal (small) requests. With large file transfers, however, Apache returns a 502 Bad Gateway response.
What happens (well, actually just my analysis of what happens) is that when our client (a .NET client with Expect: 100-continue enabled) tries to send a file, it sends the headers followed directly by the contents of the file.
Mongoose receives the headers of the transfer, detects that the request is not authenticated, returns a 401 Unauthorized, and closes the connection. Now Apache (which is still receiving and processing the file transfer) cannot forward the data any more and returns a 502 Bad Gateway (The proxy server received an invalid response from an upstream server).
When sniffing the external interface I see that the .NET client sends the headers, followed within 20 msec by the contents, without ever receiving a 100 Continue. When the transfer is finished, Apache returns the 502.
When sniffing the internal interface I see that the header and body are combined into one TCP packet of 16384 bytes. Mongoose replies within a few msec with the 401 and closes the connection.
It looks like Apache detects the close of the connection but ignores the 401 and does not forward it. Is there a way to have Apache correctly forward the 401 instead of replying with the 502?
For the moment I changed our application to just read all the data from the connection if a 401 is detected, but this is just a workaround, as it comes down to sending the complete file twice. As the files can be hundreds of megabytes, this can put quite some stress on our system.
We are using Apache 2.2.9 (Debian) on an ARM system.
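For completeness, the proxy part of the setup is essentially this (the /app prefix is a placeholder for the URLs being forwarded to port 9090):

    ProxyPass        /app http://localhost:9090/
    ProxyPassReverse /app http://localhost:9090/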
You are probably experiencing the Apache bug filed here: https://bz.apache.org/bugzilla/show_bug.cgi?id=55433
Related links:
http://osdir.com/ml/bugs-httpd/2013-08/msg00119.html
http://osdir.com/ml/bugs-httpd/2013-08/msg00094.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg04580.html
PS: I've hit the same issue, and it's a rather obscure bug (hard to find information on, and obscure in itself). FWIW, nginx does not exhibit the same behaviour.

disable request buffering in nginx

It seems that nginx buffers requests before passing them to the upstream server. While that is OK in most cases, for me it is very bad :)
My case is like this:
I have nginx as a frontend server proxying to 3 different servers:
1. Apache with a typical PHP app
2. Shaveet (an open-source Comet server), built by me with Python and gevent
3. a file upload server, again built with gevent, that proxies the uploads to Rackspace Cloud Files while accepting the upload from the client
#3 is the problem. Right now nginx buffers the whole request and then sends it to the file upload server, which in turn sends it to Cloud Files, instead of forwarding each chunk as it receives it (which would make the upload faster, as I can push 6-7 MB/s to Cloud Files).
The reason I use nginx is to have 3 different domains on one IP; if I can't do that, I will have to move the file upload server to another machine.
As soon as this [1] feature is implemented, nginx will be able to act as a reverse proxy without buffering uploads (i.e. big client request bodies).
It should land in 1.7, which is the current mainline.
[1] http://trac.nginx.org/nginx/ticket/251
Update
This feature is available since 1.7.11 via the flag
proxy_request_buffering on | off;
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
The Gunicorn docs suggest you use nginx to actually buffer clients and prevent slowloris attacks, so this buffering is likely a good thing. However, I do see an option further down in the link I provided that talks about removing the proxy buffer; it's not clear whether this is within nginx or not, but it looks as though it is. Of course, this is under the assumption you have Gunicorn running, which you do not. Perhaps it's still useful to you.
EDIT: I did some research, and that buffer-disabling option in nginx is for outbound, long-polling data. Nginx states on their wiki that inbound requests have to be buffered before being sent upstream.
"Note that when using the HTTP Proxy Module (or even when using FastCGI), the entire client request will be buffered in nginx before being passed on to the backend proxied servers. As a result, upload progress meters will not function correctly if they work by measuring the data received by the backend servers."
Now available in nginx since version 1.7.11.
See documentation
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
To disable buffering of the upload, specify
proxy_request_buffering off;
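A slightly fuller sketch (the upstream address and path are placeholders) for streaming the upload through as it arrives:

    location /upload/ {
        proxy_request_buffering off;   # requires nginx >= 1.7.11
        proxy_http_version 1.1;        # so a chunked client body can be forwarded as it is received
        proxy_pass http://127.0.0.1:8080;
    }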
I'd look into haproxy to fulfill this need.