disable request buffering in nginx - file-upload

It seems that nginx buffers requests before passing them to the upstream server. While that is OK for most cases, for me it is very bad :)
My case is like this:
I have nginx as a frontend server proxying to 3 different servers:
1. Apache with a typical PHP app
2. shaveet (an open source comet server I built with Python and gevent)
3. a file upload server, also built with gevent, that proxies uploads to Rackspace Cloud Files while accepting the upload from the client
#3 is the problem: right now nginx buffers the whole request and then sends it to the file upload server, which in turn sends it on to Cloud Files, instead of forwarding each chunk as it is received (which would make the upload faster, since I can push 6-7 MB/s to Cloud Files).
The reason I use nginx is to serve 3 different domains from one IP; if I can't do that, I will have to move the file upload server to another machine.
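For illustration, the layout is roughly like this (the domains and backend ports below are placeholders, not the real config):
server {
    listen 80;
    server_name www.example.com;          # apache + php app
    location / { proxy_pass http://127.0.0.1:8080; }
}
server {
    listen 80;
    server_name comet.example.com;        # shaveet comet server
    location / { proxy_pass http://127.0.0.1:8081; }
}
server {
    listen 80;
    server_name upload.example.com;       # gevent upload proxy
    location / { proxy_pass http://127.0.0.1:8082; }
}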

As soon as this [1] feature is implemented, nginx will be able to act as a reverse proxy without buffering uploads (big client requests).
It should land in 1.7, which is the current mainline.
[1] http://trac.nginx.org/nginx/ticket/251
Update
This feature has been available since 1.7.11 via the directive
proxy_request_buffering on | off;
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering

The Gunicorn docs suggest using nginx in front to buffer clients and prevent slowloris attacks, so this buffering is likely a good thing. However, I do see an option further down on that link I provided that talks about removing the proxy buffer; it's not clear whether this happens within nginx or not, but it looks as though it does. Of course, this is under the assumption you have Gunicorn running, which you do not. Perhaps it's still useful to you.
EDIT: I did some research, and that buffer disabling in nginx is for outbound, long-polling data. Nginx states on their wiki that inbound requests have to be buffered before being sent upstream.
"Note that when using the HTTP Proxy Module (or even when using FastCGI), the entire client request will be buffered in nginx before being passed on to the backend proxied servers. As a result, upload progress meters will not function correctly if they work by measuring the data received by the backend servers."

Now available in nginx since version 1.7.11.
See the documentation:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
To disable buffering of the upload, specify:
proxy_request_buffering off;
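In context, a minimal location block could look like this (the upstream address is hypothetical; proxy_http_version 1.1 is usually recommended so the body can be streamed with chunked transfer encoding):
location /upload/ {
    proxy_request_buffering off;
    proxy_http_version 1.1;
    proxy_pass http://127.0.0.1:8082;
}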

I'd look into haproxy to fulfill this need.
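For what it's worth, a rough haproxy.cfg sketch of the same three-domain split could look like the following (hostnames and ports are made up; haproxy streams request bodies by default rather than buffering them whole):
defaults
    mode http
    timeout connect 5s
    timeout client  1m
    timeout server  1m
frontend www
    bind *:80
    acl is_upload hdr(host) -i upload.example.com
    acl is_comet  hdr(host) -i comet.example.com
    use_backend upload_servers if is_upload
    use_backend comet_servers  if is_comet
    default_backend apache_servers
backend apache_servers
    server apache1 127.0.0.1:8080
backend comet_servers
    server comet1 127.0.0.1:8081
backend upload_servers
    server upload1 127.0.0.1:8082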

Related

HTTPS request served by Apache2 slower than via Reverse Proxy

My situation:
A website is hosted using a default apache2 installation on an Ubuntu server.
It is served on port 443 using HTTPS and a self-signed certificate (for development).
Now I have a simple service written in golang that listens at port 8080 and acts as a Reverse Proxy to take https requests, forward them to apache locally and return the response back to the client. This webservice doesn't cache any files and only forwards requests.
Code: https://play.golang.org/p/tnfKVWyLuZQ
My "problem":
Calling apache directly, i.e. https://foo.com/bar/, is remarkably slower (200-400 ms) than calling the website through my reverse proxy, i.e. https://foo.com:8080/bar/.
Why is it slower to call apache2 directly? I expected overhead from using a reverse proxy, not a speedup. -> Comparison for an example page: https://i.imgur.com/TqznM2v.png
UPDATE: Sketch to show the current setup:
Current Setup
Regarding the encoding: the encoding is consistent in both situations. The Encoding header and Content-Length are the same in both cases (situation 1 vs 2), and the client also receives the file size. I am not sure why the HAR viewer only displays the uncompressed size in the second case; if I check in Chrome, I can see the compressed size in both cases.
Update #2: I came to the conclusion that the golang implementation handles multiple requests from the same client in a short time more efficiently than apache2 in its default configuration. Since I only test with a few clients, I can't say how well it scales - I imagine the webservice will fall behind when under load.
I see this as closed, thanks all for the help.
As far as I can see, there are two possible reasons:
1. The reverse proxy may be serving some cached content, such as static files like images, CSS or JavaScript.
2. When you browse an HTTPS URL, SSL termination takes place, and it can cause significant server load. So if the web application and SSL termination are deployed on the same server, that load may cause high latency. Generally, a dedicated device such as a load balancer is used to offload SSL, much like a reverse proxy (see the sketch below).
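To illustrate the second point, offloading SSL in front of Apache would look roughly like this with nginx (certificate paths and the backend address are hypothetical):
server {
    listen 443 ssl;
    server_name foo.com;
    ssl_certificate     /etc/ssl/certs/foo.com.crt;
    ssl_certificate_key /etc/ssl/private/foo.com.key;
    location / {
        proxy_pass http://127.0.0.1:8080;    # plain HTTP to apache2
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}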

Nginx serving application and ExpressJS just as backend

I think it's pretty common to use nginx to proxy connections to ExpressJS, so all is done through ExpressJS.
I was thinking, why not use nginx to serve the application, since it's simpler to set up things like rewrites, and leave ExpressJS as a backend only, with the application communicating with ExpressJS directly on port 3000.
Is that a bad idea? If not, how often do people do this?
It's very common. But having your front end code directly talk to the node server adds complexity.
You have to handle CORS issues on the node server, including preventing cross site form submissions. See here Properly Understanding CORS with Same Host / Different Port & Security.
SSL is also going to be a bit more complicated. You'll need a wild card certificate.
However, there are some big advantages to using something like nginx to host your assets. In addition to the ones you enumerated, it sets you up to go serverless. You can host your app out of an S3 bucket or through another content delivery network.
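One common middle ground, sketched below with made-up names and paths, is to let nginx host the static assets and proxy only the API to Express on the same origin, so CORS never comes into play:
server {
    listen 80;
    server_name app.example.com;
    root /var/www/app/dist;              # built front-end assets
    location / {
        try_files $uri /index.html;      # SPA-style rewrite
    }
    location /api/ {
        proxy_pass http://127.0.0.1:3000;    # ExpressJS backend
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}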

Delay issue with Websocket over SSL on Amazon's ELB

I followed the instructions from this link:
How do you get Amazon's ELB with HTTPS/SSL to work with Web Sockets? to set up ELB to work with WebSocket (having ELB forward 443 to 8443 in TCP mode). Now I am seeing this issue for wss: the server sends message1 and the client does not receive it; after a few seconds, the server sends message2 and the client receives both messages (both messages are around 30 bytes). I can reproduce the issue fairly easily. If I set up port forwarding with iptables on the server and have the client connect directly to the server (port 443), I don't have the problem. Also, the issue seems to happen only with wss; ws works fine.
The server is running jetty8.
I checked EC2 forums and did not really find anything. I am wondering if anyone has seen the same issue.
Thanks
From what you describe, this pretty likely is a buffering issue with ELB. Quick research suggests that this actually is the issue.
From the ELB docs:
When you use TCP for both front-end and back-end connections, your load balancer will forward the request to the back-end instances without modification to the headers. This configuration will also not insert cookies for session stickiness or the X-Forwarded-* headers.
When you use HTTP (layer 7) for both front-end and back-end connections, your load balancer parses the headers in the request and terminates the connection before re-sending the request to the registered instance(s). This is the default configuration provided by Elastic Load Balancing.
From the AWS forums:
I believe this is HTTP/HTTPS specific but not configurable but can't say I'm sure. You may want to try to use the ELB in just plain TCP mode on port 80 which I believe will just pass the traffic to the client and vice versa without buffering.
Can you try to make more measurements and see how this delay depends on the message size?
Now, I am not entirely sure what you already did and what failed and what did not fail. From the docs and the forum post, however, the solution seems to be using the TCP/SSL (Layer 4) ELB type for both front-end and back-end.
This resonates with "Nagle's algorithm": the TCP stack can be configured to bundle small writes before sending them over the wire to reduce traffic. This would explain the symptoms, so it is worth a try.

Socket RPC with Tornado in an established Apache/PHP environment

We've got an established Apache/PHP site, with a Flash front-end. We're going to need to implement some sort of socket communication, or 'long-polling', to push updates to the Flash app. Since this obviously isn't going to be a good situation for Apache or PHP, I'd like to use Tornado for this aspect of the functionality, but I also don't want to run Tornado on another port: since the Flash app will be running on the client machine, we don't want to have to deal with restrictive firewalls blocking the socket connections.
Ideally I'd like to run a proxy which can forward most requests to Apache, and other requests to Tornado. I saw some suggestions for using Apache as the first-contact proxy, forward requests to Tornado when necessary, but I've also seen this discounts a lot of the async capabilities of Tornado.
I thought, why not use Tornado as the first contact for port 80 and have it proxy back to Apache? I couldn't find anything on this at all and am wondering if it is even possible?
Another option would be to use something like lighttpd as the proxy and have it decide whether to pass things along to Apache or to Tornado, but does this kind of setup make sense? Or what about Nginx?
Any suggestions, advice or corrections on my understanding of things would be greatly appreciated!
This is called a reverse proxy and it's very easy to configure nginx to perform this. (lighttpd should also be able to do this job well, but I have no experience using it).
The tornado documentation has an example nginx configuration.
One thing to note when using a reverse proxy is that the connection to your upstream server will now be originating from the proxy, not the client. The de facto standard is to put information about the original request in certain http headers. In the example from the tornado docs, the X-Real-IP header is set to the IP of the original client and X-Scheme is set to the scheme of the original request (http/https for example).
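A minimal nginx snippet in the spirit of the Tornado docs' example, with made-up ports and paths, could look like this:
upstream tornado {
    server 127.0.0.1:8888;
}
server {
    listen 80;
    # long-polling / comet traffic goes to Tornado
    location /realtime/ {
        proxy_pass http://tornado;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme  $scheme;
        proxy_read_timeout 300s;         # keep long-poll connections open
    }
    # everything else goes to Apache
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header X-Real-IP $remote_addr;
    }
}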
This may require some modifications to your upstream server. With tornado this is done by constructing the HTTPServer with the xheaders argument set to True. This will instruct the server to try to pull the IP address and scheme from the X-headers. Note that if you use this with a server that isn't behind a reverse proxy that sets the appropriate headers, then you are open to IP address spoofing.
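On the Tornado side, a minimal sketch of the xheaders option (the handler is just a placeholder):
import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # with xheaders=True, remote_ip reflects X-Real-IP set by the proxy
        self.write("client ip: %s" % self.request.remote_ip)

application = tornado.web.Application([(r"/", MainHandler)])

# xheaders=True makes Tornado trust X-Real-IP / X-Scheme from the reverse proxy
server = tornado.httpserver.HTTPServer(application, xheaders=True)
server.listen(8888)
tornado.ioloop.IOLoop.instance().start()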

Can I use Apache mod_proxy as a connection pool, under the Prefork MPM?

Summary/Question:
I have Apache running with Prefork MPM, running php. I'm trying to use Apache mod_proxy to create a reverse proxy that I can re-route my requests through, so that I can use Apache to do connection pooling. Example impl:
in httpd.conf:
SSLProxyEngine On
ProxyPass /test_proxy/ https://destination.server.com/ min=1 keepalive=On ttl=120
but when I run my test, which is the following command in a loop:
curl -G 'http://localhost:80/test_proxy/testpage'
it doesn't seem to re-use the connections.
After some further reading, it sounds like I'm not getting connection pool functionality because I'm using the Prefork MPM rather than the Worker MPM. So each time I make a request to the proxy, it spins up a new process with its own connection pool (of size one), instead of using the single worker that maintains its own pool. Is that interpretation right?
Background info:
There's an external server that I make requests to, over https, for every page hit on a site that I run.
Negotiating the SSL handshake is getting costly, because I use php and it doesn't seem to support connection pooling - if I get 300 page requests to my site, they have to do 300 SSL handshakes to the external server, because the connections get closed after each script finishes running.
So I'm attempting to use a reverse proxy under Apache to function as a connection pool, to persist the connections across php processes so I don't have to do the SSL handshake as often.
Sources that gave me this idea:
http://httpd.apache.org/docs/current/mod/mod_proxy.html
http://geeksnotes.livejournal.com/21264.html
First of all, your test method cannot demonstrate connection pooling since for every call, a curl client is born and then it dies. Like dead people don't talk a lot, a dead process cannot keep a connection alive.
You have clients that bother your proxy server.
Client ====== (A) =====> ProxyServer
Let's call this connection A. Your proxy server does nothing, it is just a show off. The handsome and hardworking server is so humble that he hides behind.
Client ====== (A) =====> ProxyServer ====== (B) =====> WebServer
Here, if I am not wrong, the secured connection is A, not B, right?
Repeating my first point, on your test, you are creating a separate client for each request. Every client needs a separate connection. Connection is something that happens between at least two parties. One side leaves and connection is lost.
Okay, let's forget curl now and look together at what we really want to do.
We want to have SSL on A and we want A side of traffic to be as fast as possible. For this aim, we have already separated side B so it will not make A even slower, right?
Connection pooling? There is no such thing as connection pooling at A. Every client comes and goes making a lot of noise. Only thing that can help you to reduce this noise is "Keep-Alive" which means, keeping connection alive from a client for some short period of time so this very same client can ask for other files that will be required by this request. When we are done, we are done.
For connections on B, connections will be pooled; but this will not bring you any performance gain, since in a one-server setup you did not have this part of the noise production.
How do we help this system run faster?
If these two servers are on the same machine, we should get rid of the show-off server and continue with our hardworking webserver. It adds a lot of unnecessary work to the system.
If these are separate machines, then you are being nice to the web server by taking at least the encryption (for SSL) load off this poor guy. However, you can be even nicer.
If you want to continue on Apache, switch to mpm_worker from mpm_prefork. In case of 300+ concurrent requests, this will work much better. I really have no idea about the capacity of your hardware; but if handling 300 requests is difficult, I believe this little change will help your system a lot.
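For reference, a rough worker MPM block might look like this (the numbers are only illustrative and need tuning to your hardware; in Apache 2.4 MaxClients is called MaxRequestWorkers):
<IfModule mpm_worker_module>
    StartServers          4
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxClients          300
</IfModule>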
If you want to have an even more lightweight system, consider nginx as an alternative to Apache. It is very easy to set up to work with PHP and it will have better performance.
Other than the front-end side of things, also consider checking your database server. Connection pooling will make a real difference here. Make sure your PHP installation is configured to reuse connections to the database.
In addition, if you are hosting static files on the same system, then move them out either on another web server or do even better by moving static files to a cloud system with CDN like AWS's S3+CloudFront or Rackspace's CloudFiles. Even without CloudFront, S3 will make you happy. Rackspace's solution comes with Akamai!
Taking out static files will make your web server "oh what happened, what is this silence? ohhh heaven!" since you mentioned this is a website and web pages have many static files for each dynamically generated html page most of the time.
I hope you can save the poor guy from the killer work.
Prefork can still pool 1 connection per backend server per process.
Prefork doesn't necessarily create a new process for each frontend request, the server processes are "pooled" themselves and the behavior depends on e.g. MinSpareServers/MaxSpareServers and friends.
To maximise how often a prefork process will have a backend connection for you, avoid very high or low MaxSpareServers or a very high MinSpareServers, as these will result in "fresh" processes accepting new connections.
You can log %P in your LogFormat directive to help get an idea of how often processes are being reused.
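For example, a hypothetical log format with the child process id appended to each entry:
LogFormat "%h %l %u %t \"%r\" %>s %b pid=%P" withpid
CustomLog logs/access_log withpid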
The problem in my case was that connection pooling between the reverse proxy and the backend server was not taking place, because the backend Apache server was closing the SSL connection at the end of each HTTPS request.
The backend Apache server was doing this because of the following directive being present in httpd.conf:
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
This directive does not make sense when the backend server is reached via a reverse proxy, and it can be removed from the backend server's config.
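In other words, on the backend Apache the relevant part of the config ends up looking something like this (the keep-alive values are only illustrative):
# removed: SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15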