Apache to Tomcat: mod_jk vs mod_proxy

What are the advantages and disadvantages of using mod_jk and mod_proxy for fronting a Tomcat instance with Apache?
I've been using mod_jk in production for years, but I've heard that it's "the old way" of fronting Tomcat. Should I consider changing? Would there be any benefits?

A pros/cons comparison of these modules exists on http://blog.jboss.org/
mod_proxy
* Pros:
  o No need to compile and maintain a separate module. mod_proxy, mod_proxy_http, mod_proxy_ajp and mod_proxy_balancer come as part of the standard Apache 2.2+ distribution.
  o Ability to use the HTTP, HTTPS or AJP protocols, even within the same balancer.
* Cons:
  o mod_proxy_ajp does not support large (8K+) packet sizes.
  o Basic load balancer
  o Does not support domain model clustering
mod_jk
* Pros:
  o Advanced load balancer
  o Advanced node failure detection
  o Support for large AJP packet sizes
* Cons:
  o Need to build and maintain a separate module

If you wish to stay in Apache land, you can also try the newer mod_proxy_ajp, which uses the AJP protocol to communicate with Tomcat instead of plain old HTTP, but which leverages mod_proxy to do the work.

AJP vs HTTP
When using mod_jk, you are using AJP. When using mod_proxy with mod_proxy_http, you use HTTP or HTTPS. This is essentially what makes all the difference.
The Apache JServ Protocol (AJP)
The Apache JServ Protocol (AJP) is a binary protocol that can proxy inbound requests from a web server through to an application server that sits behind the web server. AJP is a highly trusted protocol and should never be exposed to untrusted clients, which could use it to gain access to sensitive information or execute code on the application server.
Pros
Easy to set up as the correct forwarding of HTTP headers is not required.
It is less resource intensive because requests are forwarded in a compact binary format instead of going through a costlier text-based HTTP exchange.
Cons
Transferred data is not encrypted. It should only be used within trusted networks.
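To make "binary protocol" a little more concrete, here is a minimal Python sketch of AJP13 packet framing as I understand it from the Tomcat connector documentation: a packet sent from the web server to the container starts with the magic bytes 0x12 0x34, followed by a two-byte big-endian payload length. The function names are made up for illustration, and treating the 8192-byte default packet size as including the 4-byte header is an assumption. This is not code from mod_jk or mod_proxy_ajp.

```python
import struct

SERVER_TO_CONTAINER_MAGIC = 0x1234  # web server -> Tomcat direction
HEADER_SIZE = 4                     # 2 bytes magic + 2 bytes payload length
DEFAULT_MAX_PACKET_SIZE = 8192      # default AJP packet size (8K); assumed to include the header


def frame_ajp_packet(payload, max_packet_size=DEFAULT_MAX_PACKET_SIZE):
    """Prefix a payload with the 4-byte AJP13 header (hypothetical helper)."""
    if len(payload) + HEADER_SIZE > max_packet_size:
        raise ValueError("payload does not fit in a single AJP packet")
    # ">HH" = two unsigned 16-bit integers, big-endian (network byte order).
    return struct.pack(">HH", SERVER_TO_CONTAINER_MAGIC, len(payload)) + payload


def parse_ajp_packet(packet):
    """Validate the header of a framed packet and return its payload."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("truncated AJP header")
    magic, length = struct.unpack(">HH", packet[:HEADER_SIZE])
    if magic != SERVER_TO_CONTAINER_MAGIC:
        raise ValueError("unexpected magic bytes")
    payload = packet[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length:
        raise ValueError("truncated AJP payload")
    return payload
```

The fixed-size limit is why request data that does not fit in one packet (large headers or certificate chains, for example) needs a larger packet size configured on both the web server and Tomcat sides.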
Hypertext Transfer Protocol (HTTP)
HTTP functions as a request–response protocol in the client–server computing model. A web browser, for example, may be the client and an application running on a computer hosting a website may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
Pros
Can be encrypted with SSL/TLS making it suitable for traffic across untrusted networks.
It is flexible, as it allows the request to be modified before forwarding, for example by setting custom headers.
Cons
More overhead as the correct forwarding of the HTTP headers has to be ensured.
More resource intensive as the request is fully parsed before forwarding.
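As a rough illustration of the header-forwarding work that an HTTP proxy takes on, here is a minimal Python sketch. It drops hop-by-hop headers and records the original client address and scheme in the de facto X-Forwarded-For and X-Forwarded-Proto headers. The function name and the sample values are hypothetical; a real proxy such as mod_proxy_http does considerably more than this.

```python
# Hop-by-hop headers apply to a single connection and are not passed on by proxies.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def forwarded_headers(headers, client_ip, scheme):
    """Build the header dict a proxy would send to the backend (illustrative only)."""
    out = {}
    prior_xff = None
    for name, value in headers.items():
        lower = name.lower()
        if lower in HOP_BY_HOP:
            continue  # never forwarded
        if lower == "x-forwarded-for":
            prior_xff = value  # extended below
            continue
        if lower == "x-forwarded-proto":
            continue  # replaced below with what this proxy observed
        out[name] = value
    # Append this hop's client address to any existing chain.
    out["X-Forwarded-For"] = f"{prior_xff}, {client_ip}" if prior_xff else client_ip
    out["X-Forwarded-Proto"] = scheme
    return out


if __name__ == "__main__":
    incoming = {"Host": "foo.com", "Connection": "keep-alive"}
    print(forwarded_headers(incoming, "192.0.2.7", "https"))
```

With AJP this information travels as part of the protocol itself, which is why the AJP setup is described as easier.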

Related

What is the difference between Reverse proxy and Load balancer?

I am trying to understand how reverse proxies and load balancers differ from each other. When is it useful to use a reverse proxy rather than a load balancer?
Both promise to improve efficiency and sit between the client and the server. They look nearly the same at first glance, but their functionality differs.
Load balancing: A load balancer is a hardware or software unit that spreads the total load on a website across multiple servers.
The load-balancing algorithm should be chosen so that it makes the best use of each server's capacity and returns results as fast as possible.
Load balancers fall into three categories: DNS round robin, L3/L4 load balancers (which work at the IP and TCP layers), and L7 load balancers (which work at the application layer).
The algorithms a load balancer uses to distribute load include IP hash, least connections, round robin, least traffic, etc.
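Three of the algorithms named above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular load balancer implements them; the class names and the server labels are invented for the example.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest connections currently open."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` finishes.
        self.active[server] -= 1


class IPHash:
    """Map each client IP to the same server as long as the pool is unchanged."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


if __name__ == "__main__":
    pool = ["app1", "app2", "app3"]
    rr = RoundRobin(pool)
    print([rr.pick() for _ in range(4)])
    print(IPHash(pool).pick("203.0.113.5"))
```

Round robin ignores server load entirely, least connections tracks it, and IP hash trades even distribution for stickiness, which matters when sessions are stored on the backend.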
Reverse proxy: A reverse proxy acts as the public face of a website; it serves as a gateway that all web traffic has to pass through. The main roles of a reverse proxy are:
Security: It acts as a wall in front of your backend servers, protecting them from direct interaction with clients and thus improving the security of the overall system.
Web acceleration: It can provide features such as caching, SSL/TLS termination, and compression to reduce the time needed to respond to clients.
Flexibility: Changes to the backend architecture become easier because clients only ever talk to the reverse proxy.
A reverse proxy can be relevant even when there is only one server in your system. In that case there is no need for a load balancer, but the reverse proxy can still provide security, flexibility and web acceleration.
According to this link,
A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server's response to the client. Reverse proxies typically play this role for HTTP traffic and application programming interfaces (APIs).
A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client. Load balancers can deal with multiple protocols: HTTP as well as the Domain Name System protocol, Simple Mail Transfer Protocol and Internet Message Access Protocol. A load balancer receives and routes client requests for application, text, image or video data to any server in a pool that is capable of fulfilling them and then returns the server's response to the client.
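The accept/forward/return cycle described above can be sketched with nothing but the Python standard library. This is a toy for illustration, assuming a single backend, GET requests only, and no error handling. The handler and helper names are invented, and BackendHandler merely stands in for a real application server.

```python
import http.client
import http.server
import threading


class BackendHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for the real server that sits behind the proxy."""

    def do_GET(self):
        body = f"backend saw {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_proxy_handler(backend_host, backend_port):
    """Create a handler class that forwards GET requests to one backend."""

    class ProxyHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # 1. Accept the client request, 2. forward it to the backend.
            conn = http.client.HTTPConnection(backend_host, backend_port, timeout=5)
            try:
                conn.request("GET", self.path,
                             headers={"X-Forwarded-For": self.client_address[0]})
                upstream = conn.getresponse()
                status = upstream.status
                content_type = upstream.getheader("Content-Type", "application/octet-stream")
                body = upstream.read()
            finally:
                conn.close()
            # 3. Return the backend's response to the client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass

    return ProxyHandler


def start_server(handler):
    """Serve `handler` on a free localhost port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    """Issue a GET against localhost and return (status, body text)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()


if __name__ == "__main__":
    backend = start_server(BackendHandler)
    proxy = start_server(make_proxy_handler("127.0.0.1", backend.server_address[1]))
    print(fetch(proxy.server_address[1], "/bar/"))
```

Turning this into a load balancer would mean keeping a list of backends and choosing one per request, which is exactly the point where the two concepts overlap.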

HTTPS request served by Apache2 slower than via Reverse Proxy

My situation:
A website is hosted using a default apache2 installation on an ubuntu server.
Served on port 443 using HTTPS and a self-signed certificate (for development).
Now I have a simple service written in golang that listens at port 8080 and acts as a Reverse Proxy to take https requests, forward them to apache locally and return the response back to the client. This webservice doesn't cache any files and only forwards requests.
Code: https://play.golang.org/p/tnfKVWyLuZQ
My "problem":
Calling apache directly, i.e. https://foo.com/bar/ is remarkably slower (200-400ms) than calling the website through my reverse proxy, i.e. https://foo.com:8080/bar/
Why is it slower to call apache2 directly? I expected to have overhead using a reverse proxy, not a speedup. -> Comparison for example page: https://i.imgur.com/TqznM2v.png
UPDATE: Sketch to show the current setup: [image: Current Setup]
Regarding the encoding: it is consistent in both situations. The encoding header and Content-Length are the same in both cases (situation 1 vs 2), and the client also receives the file size. I'm not sure why the HAR viewer only displays the uncompressed size in the second case. When checking in Chrome I can see the compressed size in both cases.
Update #2: I came to the conclusion that the golang implementation handles multiple requests from the same client in a short time more efficiently than apache2 in its default configuration. Since I only tested with a few clients, I can't say how well it scales. I imagine the webservice will fall behind under load.
I see this as closed, thanks all for the help.
As far as I can see, there are two possible reasons.
1. The reverse proxy may be serving some static files (images, CSS or JavaScript) from a cache.
2. When you browse an HTTPS URL, the server has to perform the TLS handshake and encryption work (the work that "SSL offloading" moves elsewhere), and this can cause significant server load. So if the web application and the TLS termination run on the same server, the load may cause high latency. Generally, a dedicated device such as a load balancer is used to offload SSL, just like a reverse proxy.

What makes nginx/apache a web server, HAProxy not?

What functionality does HAProxy lack that would make it a web server?
HAProxy can listen on port 80 and can speak HTTP but that's not what people mean when they say "web server."
HAProxy is not a web server, because "web server" implies an HTTP endpoint that can serve static content from files and/or dynamic content generated from code. That's not what HAProxy is for.
Technically, there are certain capabilities in HAProxy that can be misused to emulate some capabilities of a web server -- you can serve very small static files from memory buffers and you can generate small dynamic responses using the optional embedded Lua interpreter -- but it is not intended or designed to be used as a web server. It's a proxy server -- emulating a web server toward the client, and emulating a client toward the real back-end web server(s) behind it -- because bidirectional emulation is commonly what proxies do.
With Nginx and Apache, you can specify a root directory from which files are served, and you can specify paths that are to be serviced by code running in languages like Perl, PHP, Python, etc. Not with HAProxy, because, again, that isn't what it's designed to do.
Both Nginx and Apache can also be used as proxy servers, as HAProxy can, but HAProxy is specifically designed and optimized for that primary purpose -- proxying and load balancing against multiple back-ends, selecting the back-end using various rules and algorithms... in essence, HAProxy is an "intermediate router" for HTTP requests, delivering them rather than responding to them. It can also proxy and load balance non-HTTP protocols that rely on TCP.

AJP Connector or HTTP Connector

We have a web application (3rd party product) hosted on a Tomcat 6.x server. We will be installing an IBM HTTP Server (IHS) as a web server in front of the Tomcat server. While doing this, the product vendor has asked us to use the HTTP connector (instead of the AJP connector) for communication between Tomcat and the IHS web server.
The few articles I have read seem to indicate that
an AJP connector will provide faster performance than proxied HTTP.... It is otherwise functionally equivalent to HTTP clustering.
1. Apart from performance, are there any other reasons to choose an AJP connector over an HTTP connector, or vice versa?
2. Are there any other side effects of choosing the HTTP connector instead of the AJP connector?
Note: Our application has approx 80 concurrent users during peak time.
AJP permits the proxy to tell the backend about client SSL certificate details, which in Java EE are used to satisfy some HttpServletRequest APIs.
You shouldn't use either in IHS, though, with an application server IHS wasn't bundled with. You'll have no support, and the generic proxy support is not really maintained actively.

How to put up an off-the-shelf https to http gateway?

I have an HTTP server which is in our internal network and accessible only from inside it. I would like to put up another server that listens on an HTTPS port accessible from outside and forwards the requests to that HTTP server (and sends back the responses via HTTPS). I know that there are several ways to do this with some programming involved (and I myself made a temporary solution with Tomcat and a very simple servlet I wrote), but is there a way to do the same just by plugging together ready-made parts (like Apache + modules)?
This is the sort of use-case that stunnel is designed for. There is a specific example of using stunnel to wrap an HTTP server.
You should consider whether this is really a good idea, though. Web applications designed for use inside a corporate firewall are often fairly lax about security. Merely encrypting the connections prevents casual eavesdropping, but does not secure the site. If an attacker finds your outward facing server and starts connecting to it, they can still try to find exploitable flaws in the web service (SQL injection, cross-site scripting, etc).
With Apache look into mod_proxy.
Apache 2.2 mod_proxy docs
Apache 2.0 mod_proxy docs