Does a reverse proxy such as Apache Knox preserve request order? - reverse-proxy

For instance, let us assume that I have a system whose entry point is Apache Knox, which forwards requests to Apache NiFi. Let us further assume that I send messages A and B to the server, and Apache Knox receives message A first and then message B. Is it possible that Apache Knox changes the order of the messages so that Apache NiFi receives message B first and then message A?

No, Apache Knox will not itself reorder the messages, but at the same time there is no guarantee that they will be delivered in a specific order. Apache Knox uses Apache HttpClient to forward requests to the proxied backend after rewriting them.
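If the order matters to the application, the only reliable approach is to serialize on the client side and send B only after the response to A has arrived. A minimal shell sketch, assuming a hypothetical Knox topology URL and message files:

# Send message A; send message B only once A's response has returned successfully.
curl -f -X POST --data-binary @messageA.json 'https://knox-host:8443/gateway/sandbox/nifi-endpoint' && \
curl -f -X POST --data-binary @messageB.json 'https://knox-host:8443/gateway/sandbox/nifi-endpoint'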

Related

health check for apache knox

I want to create a health check mechanism to make sure I remove unhealthy Knox instances that are configured behind a load balancer.
Normal ping to the underlying instances will help check whether the machine is reachable or not. But it will not help determine if the gateway is healthy/running to serve incoming requests to that instance.
I can make a request to Knox through the LB, but it will go to only one instance, and there is no way of knowing which one.
Is there any way to determine this? Or is there a mechanism provided in Knox itself through which I can make an HTTP call (non-secure, as direct HTTPS calls to the instances are not permitted) to the gateway server and determine its health?
Thanks!!
I am not sure which load balancer you are using. From "health check" I am assuming you are using an Elastic Load Balancer.
Create a health check with the TCP protocol. It will only check whether the port is open or not. If Knox is not running, those instances will be marked out of service and incoming requests will be redirected to the instances that are in service.
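If the load balancer is an AWS Classic ELB, the same check can also be created from the CLI; a sketch, assuming a hypothetical load balancer name and the default Knox port 8443:

# TCP health check against the Knox gateway port; an instance failing
# 2 consecutive checks is taken out of service.
aws elb configure-health-check --load-balancer-name knox-lb \
  --health-check Target=TCP:8443,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2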
I don't know how your load balancer is configured, but you could try pinging knox_host:knox_port directly; this would at least tell you whether Knox is up and running (and listening).
If you want to know whether Knox is healthy (specifically, your topology), then you can try issuing a test request periodically and checking for a 200 response code.
e.g.
curl -i -k -u guest:guest-password -X GET \
'https://<direct-knox>:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS'
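Wrapped in a small script, the same request works as a periodic probe (for example from cron); a sketch, assuming the sandbox topology and guest credentials from above:

#!/bin/sh
# Probe the topology and treat anything other than HTTP 200 as unhealthy.
status=$(curl -k -s -o /dev/null -w '%{http_code}' -u guest:guest-password \
  'https://<direct-knox>:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS')
if [ "$status" -ne 200 ]; then
  echo "Knox instance unhealthy (HTTP $status)" >&2
  exit 1
fi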
Hope that helps!

Apache returns 502 Bad Gateway with mod_proxy and large file

I am seeing a problem when sending a file through Apache, configured as a proxy, to a local application using the Mongoose web server.
My setup:
Apache (port 80) <-> mod_proxy <-> Mongoose (port 9090)
mod_proxy is configured to forward certain URLs from port 80 to localhost:9090.
Mongoose only accepts authenticated requests. This works OK for normal (small) requests. With large file transfers, however, Apache returns a 502 Bad Gateway response.
What happens (well, actually just my analysis of what happens) is that when our client (a .NET client with Expect: 100-continue enabled) tries to send a file, it sends the headers followed directly by the contents of the file.
Mongoose receives the headers of the transfer, detects that it is not authenticated, returns a 401 Unauthorized and closes the connection. Now Apache (which is still receiving and processing the file transfer) cannot forward the data any more and returns a 502 Bad Gateway ("The proxy server received an invalid response from an upstream server").
When sniffing on the external interface I see that the .NET client sends the headers, followed within 20 ms by the contents, without receiving a 100 Continue. When the receive is finished, Apache returns the 502.
When sniffing on the internal interface I see that the header and body are combined into one TCP packet of 16384 bytes. Mongoose replies within a few milliseconds with the 401 and closes the connection.
It looks like Apache detects the close of the connection but ignores the 401 and does not forward it. Is there a way to have Apache correctly forward the 401 instead of replying with the 502?
For the moment I have changed our application to just read all data from the connection when a 401 is detected, but this is only a workaround, as it comes down to sending the complete file twice. Since the files can be hundreds of megabytes, this puts considerable stress on our system.
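For reference, the same handshake can be reproduced from the command line; a sketch with a hypothetical URL and credentials:

# curl sends Expect: 100-continue for large bodies and, like the .NET client,
# transmits the body after a short timeout even without a 100 Continue.
curl -v -u user:password -H 'Expect: 100-continue' \
  --data-binary @largefile.bin 'http://apache-host/app/upload'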
We are using Apache 2.2.9 (Debian) on an ARM system.
You are probably experiencing the Apache bug filed here https://bz.apache.org/bugzilla/show_bug.cgi?id=55433
Related links:
http://osdir.com/ml/bugs-httpd/2013-08/msg00119.html
http://osdir.com/ml/bugs-httpd/2013-08/msg00094.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg04580.html
PS: I've hit the same issue, and it's a rather obscure bug (both hard to find information on and obscure in itself). FWIW, nginx does not exhibit the same behaviour.

disable request buffering in nginx

It seems that nginx buffers requests before passing them to the upstream server. While this is OK in most cases, for me it is very bad :)
My case is like this:
I have nginx as a frontend server to proxy 3 different servers:
Apache with a typical PHP app
shaveet (an open source comet server) built by me with Python and gevent
a file upload server, built again with gevent, that proxies the uploads to Rackspace Cloud Files while accepting the upload from the client.
#3 is the problem. Right now nginx buffers the whole request and then sends it to the file upload server, which in turn sends it to Cloud Files, instead of forwarding each chunk as it arrives (which would make the upload faster, as I can push 6-7 MB/s to Cloud Files).
The reason I use nginx is to have 3 different domains with one IP; if I can't do that, I will have to move the file upload server to another machine.
As soon as this [1] feature is implemented, nginx will be able to act as a reverse proxy without buffering uploads (big client request bodies).
It should land in 1.7, which is the current mainline.
[1] http://trac.nginx.org/nginx/ticket/251
Update
This feature has been available since 1.7.11 via the directive
proxy_request_buffering on | off;
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
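A minimal location block, assuming a hypothetical upstream address, would look like this:

# Stream upload bodies straight to the backend instead of spooling them in nginx.
location /upload/ {
    proxy_request_buffering off;
    proxy_http_version 1.1;   # needed so chunked request bodies are passed through unbuffered
    proxy_pass http://127.0.0.1:8080;
}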
According to Gunicorn, they suggest you use nginx to buffer clients and prevent slowloris attacks, so this buffering is likely a good thing. However, I do see an option further down on that link I provided that talks about removing the proxy buffer; it's not clear whether this happens within nginx or not, but it looks as though it does. Of course, this is under the assumption that you have Gunicorn running, which you do not. Perhaps it's still useful to you.
EDIT: I did some research, and that buffer-disable option in nginx is for outbound, long-polling data. Nginx states on their wiki that inbound requests have to be buffered before being sent upstream.
"Note that when using the HTTP Proxy Module (or even when using FastCGI), the entire client request will be buffered in nginx before being passed on to the backend proxied servers. As a result, upload progress meters will not function correctly if they work by measuring the data received by the backend servers."
Now available in nginx since version 1.7.11.
See the documentation:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
To disable buffering of the upload, specify:
proxy_request_buffering off;
I'd look into HAProxy to fulfill this need.

How do I configure Apache2 to allow multiple simultaneous connections from same IP address?

By default, Apache2 seems to allow only 1 connection per IP address.
How do I configure Apache2 to allow multiple simultaneous connections from the same IP address?
Here is my situation:
a web app being hosted on a server.
a remote client makes a request that may take 15 seconds to complete.
the same remote client makes another (independent) request.
at present, the 2nd request sits in a queue until the 1st request completes, since Apache2 seems to impose a limit of 1 connection per IP address.
How do I override this default behaviour and allow the 2nd request to be processed in parallel?
thanks in advance,
David Jones
I discovered the answer to my problem. It turns out others have encountered this difficulty before:
Simultaneous Requests to PHP Script
The key detail is that file-based sessions in PHP cause all requests from the same client to be processed sequentially in a queue, rather than in parallel.
In order to solve this problem, it is necessary to make a call to session_write_close() in every PHP script as soon as session handling is finished.
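A minimal sketch of the pattern (hypothetical session key, with sleep standing in for the 15-second request):

<?php
session_start();

// Read (or write) whatever session data the request needs up front.
$userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;

// Release the session file lock; from here on, parallel requests from
// the same client are no longer serialized by the session handler.
session_write_close();

// Stand-in for the long-running part of the request.
sleep(15);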
-- David Jones

apache to tomcat: mod_jk vs mod_proxy

What are the advantages and disadvantages of using mod_jk and mod_proxy for fronting a tomcat instance with apache?
I've been using mod_jk in production for years but I've heard that it's "the old way" of fronting tomcat. Should I consider changing? Would there be any benefits?
A pros/cons comparison for those modules exists on http://blog.jboss.org/
mod_proxy
Pros:
- No need for separate module compilation and maintenance: mod_proxy, mod_proxy_http, mod_proxy_ajp and mod_proxy_balancer come as part of the standard Apache 2.2+ distribution.
- Ability to use HTTP, HTTPS or AJP protocols, even within the same balancer.
Cons:
- mod_proxy_ajp does not support packet sizes larger than 8K.
- Basic load balancer.
- Does not support domain model clustering.
mod_jk
Pros:
- Advanced load balancer.
- Advanced node failure detection.
- Support for large AJP packet sizes.
Cons:
- Needs to be built and maintained as a separate module.
If you wish to stay in Apache land, you can also try the newer mod_proxy_ajp, which uses the AJP protocol to communicate with Tomcat instead of plain old HTTP, but which leverages mod_proxy to do the work.
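The wiring for mod_proxy_ajp is only a few directives; a sketch, assuming Tomcat's AJP connector is listening on localhost:8009:

# Forward /app to Tomcat over AJP; mod_proxy does the actual work.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
ProxyPass        /app ajp://localhost:8009/app
ProxyPassReverse /app ajp://localhost:8009/app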
AJP vs HTTP
When using mod_jk, you are using AJP. When using mod_proxy, you will use HTTP or HTTPS. And this is essentially what makes all the difference.
The Apache JServ Protocol (AJP)
The Apache JServ Protocol (AJP) is a binary protocol that can proxy inbound requests from a web server through to an application server that sits behind the web server. AJP is a highly trusted protocol and should never be exposed to untrusted clients, which could use it to gain access to sensitive information or execute code on the application server.
Pros
Easy to set up as the correct forwarding of HTTP headers is not required.
It is less resource intensive because the TCP packets are forwarded in binary format instead of doing a costly HTTP exchange.
Cons
Transferred data is not encrypted. It should only be used within trusted networks.
Hypertext Transfer Protocol (HTTP)
HTTP functions as a request–response protocol in the client–server computing model. A web browser, for example, may be the client and an application running on a computer hosting a website may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
Pros
Can be encrypted with SSL/TLS, making it suitable for traffic across untrusted networks.
It is flexible, as it allows modifying the request before forwarding it, for example by setting custom headers.
Cons
More overhead as the correct forwarding of the HTTP headers has to be ensured.
More resource intensive as the request is fully parsed before forwarding.
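To illustrate the encryption point: with mod_proxy the same forwarding can run over TLS; a sketch, assuming a hypothetical backend host and port (mod_ssl and mod_proxy_http loaded):

# Proxy /app to the backend over HTTPS; SSLProxyEngine enables TLS
# for the outbound proxied connections.
SSLProxyEngine On
ProxyPass        /app https://backend.example.com:8443/app
ProxyPassReverse /app https://backend.example.com:8443/app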