Apache returns 502 Bad Gateway with mod_proxy and large file - apache

I am seeing a problem when sending a file through Apache, configured as a proxy, to a local application that uses the Mongoose webserver.
My setup:
Apache (port 80) <-> mod_proxy <-> Mongoose (port 9090)
mod_proxy is configured to transfer certain URLs from port 80 to localhost:9090.
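Roughly, the proxy configuration is something like this (the /app path is a stand-in for the actual URLs; the port matches our setup):
ProxyPass        /app http://localhost:9090/app
ProxyPassReverse /app http://localhost:9090/app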
Mongoose only accepts authenticated requests. This works fine for normal (small) requests. With large file transfers, however, Apache returns a 502 Bad Gateway response.
What happens (well, actually just my analysis of what happens) is that when our client (a .NET client with Expect: 100-continue enabled) tries to send a file, it sends the headers followed directly by the contents of the file.
Mongoose receives the headers of the transfer, detects that the request is not authenticated, returns a 401 Unauthorized and closes the connection. Apache, which is still receiving and processing the file transfer, can no longer forward the data and returns a 502 Bad Gateway ("The proxy server received an invalid response from an upstream server").
When sniffing the external interface I see that the .NET client sends the headers, followed within 20 ms by the contents, without waiting for a 100 Continue. When the transfer is finished, Apache returns the 502.
When sniffing the internal interface I see that the header and body are combined into one TCP packet of 16384 bytes. Mongoose replies within a few milliseconds with the 401 and closes the connection.
It looks like Apache detects the close of the connection but ignores the 401 and does not forward it. Is there a way to have Apache forward the 401 correctly instead of replying with a 502?
For the moment I have changed our application to read all the data from the connection when a 401 is detected, but this is just a workaround, as it comes down to sending the complete file twice. Since the files can be hundreds of megabytes, this puts quite some stress on our system.
We are using Apache 2.2.9 (Debian) on an ARM system.

You are probably experiencing the Apache bug filed here https://bz.apache.org/bugzilla/show_bug.cgi?id=55433
Related links:
http://osdir.com/ml/bugs-httpd/2013-08/msg00119.html
http://osdir.com/ml/bugs-httpd/2013-08/msg00094.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg04580.html
PS: I've hit the same issue, and it's a rather obscure bug (hard to find information on, and obscure in itself). FWIW, nginx does not exhibit the same behaviour.
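For anyone trying to reproduce this, a quick check with curl against the proxied URL shows whether the backend's 401 makes it back through the proxy (the URL and file name are placeholders; -H "Expect:" suppresses the 100-continue handshake, mimicking a client that sends the body immediately):
curl -v -H "Expect:" --data-binary @bigfile.bin http://yourserver/app/upload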

Related

Is it normal for TCP requests to randomly "get lost in the internet"?

I created and manage a SOAP API built in ASP.NET ASMX. The API processes about 10,000 requests per day. Most days, about 3 requests sent by the client (we only have 1 client) do not reach the web server (IIS). There is no discernible pattern.
We are actually using 2 web servers that sit behind a load balancer. From the IIS logs, I am 100% confident that the requests are not reaching either web server.
The team that manages the network and the load balancer has not been able to 'confirm or deny' whether the problem is occurring at the load balancer. They suggested it's normal for requests to sometimes "get lost in the internet", and said that we should add retry logic to the API.
The requests are using TCP (and TLS). The client has confirmed that there is no problem occurring on their end.
My question is: is it normal for TCP requests to "get lost in the internet" at the frequency we are seeing (about 3 out of 10,000 per day)?
BTW, both the web server and the client are located in the same country. For what it's worth, the country in question is an anglosphere country, so it's not the case that our internet infrastructure is shoddy.
There is no such thing as a TCP request getting lost, because there is no such thing as a TCP request in the first place. There is a TCP connection, within that a TLS tunnel, and within that the HTTP protocol is spoken - and only at the HTTP level is there a concept of request and response, which is what ends up in the server logs.
Problems can occur in many places: failing to establish the TCP connection in the first place due to no route (i.e. no internet) or too much packet loss; random problems at the TLS level caused by bit flips, which lead to integrity errors and thus a closed connection; problems at the HTTP level, for example when using HTTP keep-alive and the server closes an idle connection at the same moment the client is trying to send another request on it. And probably more.
"The client has confirmed that there is no problem occurring on their end."
I have no idea what exactly this means. "No problem" would be the client sending the request and getting a response back. But this is obviously not the case here, so either the client is failing to establish the TCP connection, failing at the TLS level, failing while sending the request, failing while reading the response, getting timeouts ... Or maybe the client is simply ignoring some errors, so no problem is visible at the client's end.
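If you do end up adding the retry logic that was suggested, a rough sketch of retry-with-backoff (in Python purely to illustrate the idea - your client is .NET; the endpoint and payload are placeholders, and retrying non-idempotent SOAP calls needs care) could look like this:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry connection failures and 5xx responses a few times with exponential backoff.
retries = Retry(
    total=3,
    backoff_factor=1,                    # exponential backoff between attempts
    status_forcelist=[502, 503, 504],
    allowed_methods=["POST"],            # only safe if the call can be repeated
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))

# Placeholder endpoint and SOAP envelope.
response = session.post(
    "https://api.example.com/service.asmx",
    data="<soap:Envelope>...</soap:Envelope>",
    headers={"Content-Type": "text/xml; charset=utf-8"},
    timeout=30,
)
print(response.status_code)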

Apache web server serving incomplete pages

I have an Apache web server which several sites are connecting to.
From most sites it is accessible and serves content properly (the sites are remote and connected via Cisco VPNs), but there is one site where the server serves an incomplete page when asked for the login page of the application we are running.
I guess it does not matter what the application is, since it works fine at 10 other sites, just not at this one.
I am getting exactly 907 bytes of the page (the last 907 bytes out of 4000 bytes).
Wireshark reports that the server response is not the first packet and that a packet is missing from before the capture started. Needless to say, I waited several minutes after starting the capture before launching the browser, so there is no way a packet was really lost because Wireshark was still starting up.
Any idea where to look to resolve this?
As it works from everywhere else, this seems to indicate that something goes wrong on the network. Where would I look for such awkward behaviour, where 3000 bytes of a web server response get swallowed?
There is no indication in the Apache logs that anything special occurred for the failed pages.

iOS ASIHTTPRequest switching between HTTP/HTTPS connections

I have an iOS app which is using ASIHTTPRequest to talk to a REST server. The server supports connections on port 80 (HTTP) and port 443 (HTTPS) - I'm using a GeoTrust/RapidSSL certificate on port 443. The user can configure the app to choose which protocol to use. I'm monitoring the traffic on the server using Wireshark and what I'm finding is that occasionally, if the user switches between HTTP and HTTPS, the next time they submit a request I can see traffic for both protocols; every request after that is for the newly selected protocol only.
Also, when the app is shut down, a few packets are sent, which I guess is some kind of cleanup. The type of these final packets (HTTP/HTTPS) depends on what protocol the app has been using; if the app has used both HTTP and HTTPS during the same session, then both HTTP and HTTPS packets are sent at shutdown. These scenarios don't seem right to me and suggest that my ASIHTTPRequest is not being completely cleared down. I am also getting an occasional error where my request completes with the response 'HTTP/0.9 200 OK' but returns no data; I think this is caused by trying to talk plain HTTP to port 443.
Can anybody confirm my suspicions are true? Is there some command I should be using after an ASIHTTPRequest to clear it down so the next request can be sent on a different protocol?
What you are seeing sounds like what HTTP persistent connections are meant to do; see http://en.wikipedia.org/wiki/HTTP_persistent_connection and so on.
There's nothing you need to do; none of this is doing any harm. The few HTTP packets you see when switching protocols are just the old socket getting closed down, I believe - I presume you are just seeing packets to TCP port 80, and aren't seeing any packets with data / actual HTTP requests.

disable request buffering in nginx

It seems that nginx buffers requests before passing them to the upstream server. While that is fine in most cases, for me it is very bad :)
My case is like this:
I have nginx as a frontend server to proxy 3 different servers:
1. Apache with a typical PHP app
2. shaveet (an open source comet server) built by me with Python and gevent
3. a file upload server, again built with gevent, that proxies the uploads to Rackspace Cloud Files while accepting the upload from the client.
#3 is the problem: right now nginx buffers the whole request and then sends it to the file upload server, which in turn sends it to Cloud Files, instead of forwarding each chunk as it receives it (which would make the upload faster, as I can push 6-7 MB/s to Cloud Files).
The reason I use nginx is to serve 3 different domains from one IP; if I can't do that I will have to move the file upload server to another machine.
As soon as this feature [1] is implemented, nginx will be able to act as a reverse proxy without buffering uploads (big client requests).
It should land in 1.7, which is the current mainline.
[1] http://trac.nginx.org/nginx/ticket/251
Update
This feature has been available since 1.7.11 via the directive
proxy_request_buffering on | off;
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
The Gunicorn docs suggest putting nginx in front precisely to buffer clients and prevent slowloris attacks, so this buffering is likely a good thing. However, I do see an option further down on that link I provided where it talks about removing the proxy buffer; it's not clear whether this is within nginx or not, but it looks as though it is. Of course, this is under the assumption that you have Gunicorn running, which you do not. Perhaps it's still useful to you.
EDIT: I did some research, and that buffer-disabling option in nginx is for outbound, long-polling data. Nginx states on their wiki that inbound requests have to be buffered before being sent upstream.
"Note that when using the HTTP Proxy Module (or even when using FastCGI), the entire client request will be buffered in nginx before being passed on to the backend proxied servers. As a result, upload progress meters will not function correctly if they work by measuring the data received by the backend servers."
Now available in nginx since version 1.7.11; see the documentation:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
To disable buffering for uploads, specify:
proxy_request_buffering off;
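A minimal sketch of how this could look for an upload location (the path and upstream port are illustrative; requires nginx 1.7.11 or newer):
location /upload {
    proxy_pass http://127.0.0.1:8080;
    # HTTP/1.1 to the upstream so unbuffered/chunked bodies can be passed through
    proxy_http_version 1.1;
    proxy_request_buffering off;
}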
I'd look into haproxy to fulfill this need.

Connection Reuse with Curl, Apache and mod_wsgi

I am deploying a mod_wsgi application on top of Apache, and have a client program that uses curl.
On the client side I have curl attempt to reuse connections, but looking at the traffic in Wireshark, I see that a new connection is made for every HTTP request/response.
At the end of every HTTP request, the response headers contain "Connection: Close".
Is this the same as Keep-Alive? What do I need to do on the Apache/mod_wsgi side to enable connection reuse?
You would not generally need to do anything to Apache, as support for keep-alive connections is normally on by default. Look at the KeepAlive directive in your Apache configuration to work out what it is set to.
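For reference, the relevant directives look something like this (the values shown are the usual Apache defaults; adjust to taste):
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5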
On top of that, for keep-alive connections to work the WSGI application must set a Content-Length in the response, or return the response as a list containing only a single string; in the latter case mod_wsgi will automatically add a Content-Length for the response. The response would generally also need to be a successful one, as most error responses cause the connection to be closed regardless.
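As an illustration of that point, a minimal WSGI application that stays keep-alive friendly might look like this (a sketch, not your actual application):
def application(environ, start_response):
    body = b"Hello World!\n"
    # An explicit Content-Length (or returning a single-item list, which lets
    # mod_wsgi compute one) is what allows Apache to keep the connection alive.
    status = '200 OK'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response(status, headers)
    return [body]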
Even having done all that, the question is whether curl's ability to fetch multiple URLs actually makes use of keep-alive connections. Separate invocations of curl obviously cannot, so the fact that you are even asking this question suggests you are trying to use that feature of curl. The only other option would be that you are using a custom client linked against libcurl, in which case you meant libcurl.
Do note that if access to Apache is via a proxy, the proxy may not implement keep-alive and so may stop the whole mechanism from working.
To give more information, I would need to know how you are using curl.
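As a quick check, a single curl invocation given several URLs will try to reuse the connection where the server allows it; with -v you can see whether the second request reuses the first connection (the URLs are placeholders):
curl -v http://yourserver/app/one http://yourserver/app/two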