Can Apache HTTP Server be configured to forward a request to multiple backend workers simultaneously? - apache

Are you aware of a mod_proxy, mod_proxy_balancer, mod_proxy_http configuration of Apache 2.2 that would allow HTTP requests to be replicated? That is: each matched request is sent to an existing balancer AND replicated to another worker node.
Goal:
Take production HTTP traffic coming into Apache 2.2, retain normal production load-balanced routing AND replicate that same traffic to one more [test] worker fronting a new back-end database required to be performance and load tested under production operations.
Background info:
Multi-tier system.
(a) Custom applications
(b) Redirector/Proxy [Apache 2.2 using mod_proxy, mod_proxy_balancer, mod_proxy_http]
(c) Workers [application server nodes: Tomcat 7.0.56 over Java 1.7.0_67 over 64-bit Linux kernels]
(d) Database [Oracle 11.2]
End-users driving custom applications generate HTTP requests that are funneled to the redirector. The redirector forwards application requests on a round-robin basis to a pool of worker nodes. Workers access the backend database directly. HTTP responses funnel back through the redirector to the end-user workstation.

No, this is not currently possible. You can, however, record the traffic and replay it later with a relatively new module called mod_firehose, although it is not an all-in-one tool.
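To make the requested behaviour concrete, here is a minimal Python sketch of the dispatch logic described in the question: round-robin selection over the production pool, plus one extra copy of each request for a test worker. It is not an Apache configuration and does not show any mod_proxy feature; the worker names and the make_dispatcher helper are hypothetical.

```python
from itertools import cycle

# Hypothetical worker names; not taken from any real configuration.
PRODUCTION_POOL = ["worker1", "worker2", "worker3"]
TEST_WORKER = "test-worker"


def make_dispatcher(pool, mirror):
    """Return a function that plans where each incoming request would go."""
    rotation = cycle(pool)  # endless round-robin over the production pool

    def plan_dispatch(request_id):
        primary = next(rotation)
        # Only the primary's response would go back to the client;
        # the mirrored copy's response would be discarded.
        return {"request": request_id, "primary": primary, "mirror": mirror}

    return plan_dispatch


dispatch = make_dispatcher(PRODUCTION_POOL, TEST_WORKER)
plans = [dispatch(i) for i in range(4)]
```

Each plan names one production worker in rotation and always the same test worker, which is the "balancer AND replica" behaviour the question asks for.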

Related

Why do we need web servers if we have load balancer to direct the requests?

Suppose we have two servers serving requests through a load balancer. Is it necessary to have a web server on both of our servers to process the requests? Can the load balancer itself act as a web server? Suppose we are using the Apache web server and HAProxy. Does that mean the web server (Apache) should be installed on both servers and the load balancer on just one of them? Why can't we have a load balancer on both of our server machines, receiving the requests and talking to each other to process them?
At the most basic level, you want web servers to fulfill requests for static content, while application servers handle business logic, i.e. requests for dynamic content.
But web servers can do many other things as well, such as authenticating and validating requests and logging metrics. Another important job of the web server is combining the content it gets from the application servers with a view to present to the client.
You want a load balancer sitting in front of both web and app servers if you have more than one server. There is also nothing preventing you from running the web server and the app server on the same machine.
The load balancer is in front of your webserver(s) to redirect requests according to number of sessions, a hash of source IP and destination IP, requested URL or other criteria. Additionally, it will check availability of the backend servers to ensure requests get answered even if one server fails.
It's not installed on every webserver - you only need one instance. It could be a hardware appliance, or software (like HAProxy) which may or may not be installed on one of the webservers. Installing it on a webserver would not be prudent, though, as that webserver could fail and then the proxy would not be able to redirect traffic to the remaining server.
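As an illustration of the selection criteria mentioned above, here is a minimal Python sketch that picks a backend from a hash of source and destination IP and skips servers that failed their availability check. The addresses and the pick_backend helper are assumptions for illustration; real load balancers such as HAProxy use their own algorithms.

```python
import hashlib

# Hypothetical backend addresses mapped to the result of their last health check.
BACKENDS = {"10.0.0.11": True, "10.0.0.12": True}


def pick_backend(source_ip, dest_ip, backends):
    """Pick a healthy backend from a hash of source and destination IP."""
    healthy = sorted(ip for ip, is_up in backends.items() if is_up)
    if not healthy:
        raise RuntimeError("no healthy backend available")
    digest = hashlib.sha256(f"{source_ip}-{dest_ip}".encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(healthy)
    return healthy[index]


first = pick_backend("203.0.113.7", "198.51.100.1", BACKENDS)
again = pick_backend("203.0.113.7", "198.51.100.1", BACKENDS)

# Simulate one server failing its health check: traffic goes to the survivor.
after_failure = pick_backend(
    "203.0.113.7", "198.51.100.1", {**BACKENDS, "10.0.0.11": False}
)
```

The same client and destination pair always lands on the same backend while the set of healthy servers is unchanged, and requests still get answered when one server fails.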
There are several different scenarios for this. One is load balancing requests to 2 webservers which serve the same HTML content, to provide redundancy.
Another would be to provide multiple websites using just one public address, i.e. choosing the destination according to the requested URL. For this, the software has to determine the URL in the HTTP request and redirect traffic to the backend webserver servicing this site. This is sometimes called a 'reverse proxy', as it hides the internal server addresses from the outside.
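The multiple-sites scenario can be sketched as a lookup from the requested host name to an internal server. This is a simplified Python illustration only; the site names, internal addresses and the route helper are hypothetical.

```python
# Hypothetical site names and internal addresses, for illustration only.
SITES = {
    "shop.example.com": "http://10.0.0.21:8080",
    "blog.example.com": "http://10.0.0.22:8080",
}


def route(host_header, path, sites=SITES):
    """Return the internal URL a reverse proxy would forward this request to."""
    host = host_header.split(":")[0].lower()  # drop an optional port
    backend = sites.get(host)
    if backend is None:
        return None  # unknown site; a real proxy would answer with an error
    return backend + path


target = route("Shop.example.com:80", "/cart")
```

The client only ever sees the public name; the internal address in the returned URL stays hidden behind the proxy.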

What makes nginx/apache a web server, HAProxy not?

What functionality does HAProxy lack to be a web server?
HAProxy can listen on port 80 and can speak HTTP but that's not what people mean when they say "web server."
HAProxy is not a web server, because "web server" implies an HTTP endpoint that can serve static content from files and/or dynamic content generated from code. That's not what HAProxy is for.
Technically, there are certain capabilities in HAProxy that can be misused to emulate some capabilities of a web server -- you can serve very small static files from memory buffers and you can generate small dynamic responses using the optional embedded Lua interpreter -- but it is not intended or designed to be used as a web server. It's a proxy server -- emulating a web server toward the client, and emulating a client toward the real back-end web server(s) behind it -- because bidirectional emulation is commonly what proxies do.
With Nginx and Apache, you can specify a root directory from which files are served, and you can specify paths that are to be serviced by code running in languages like Perl, PHP, Python, etc. Not with HAProxy, because, again, that isn't what it's designed to do.
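A rough Python sketch of that distinction follows: a request path is either mapped to a file under a root directory or handed to code. The document root, the /cgi-bin/ prefix and the dispatch helper are assumptions for illustration, not the actual behaviour of Nginx or Apache.

```python
import posixpath


def dispatch(url_path, document_root="/var/www/html", script_prefix="/cgi-bin/"):
    """Decide whether a path is answered from a file or by running code."""
    # Normalize "." and ".." so the path cannot climb above the root.
    cleaned = posixpath.normpath("/" + url_path.lstrip("/"))
    if cleaned.startswith(script_prefix):
        return ("run-code", cleaned)
    return ("serve-file", posixpath.join(document_root, cleaned.lstrip("/")))


static = dispatch("/index.html")
dynamic = dispatch("/cgi-bin/report.py")
contained = dispatch("/../etc/passwd")
```

A pure proxy has neither branch: it has no root directory to read from and no code to run, so it can only pass the request on.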
Both Nginx and Apache can also be used as proxy servers, as HAProxy can, but HAProxy is specifically designed and optimized for that primary purpose -- proxying and load balancing against multiple back-ends, selecting the back-end using various rules and algorithms... in essence, HAProxy is an "intermediate router" for HTTP requests, delivering them rather than responding to them. It can also proxy and load balance non-HTTP protocols that rely on TCP.

What is HTTPD exactly?

I mean is "httpd" only used by Apache for the download of the software or is it used by other websites as well? Also is it necessary to have httpd to run "cgi" or not?
And why does Apache use httpd to download the http server instead of having it in a file on their http website?
Apache HTTPD is an HTTP server daemon produced by the Apache Software Foundation. It is a piece of software that listens for network requests (which are expressed using the Hypertext Transfer Protocol) and responds to them.
It is open source and many entities use it to host their websites.
Other HTTP servers are available, including Apache Tomcat, which is designed for running server-side programs written in Java (which don't use CGI).
CGI is a protocol that allows an HTTP server to use an external piece of software to determine how to respond to a request instead of simply returning the contents of a static file. Many HTTP servers support the CGI protocol.
You can use CGI without an HTTP server, but this typically has few uses beyond allowing a developer to perform command line testing of the CGI program. (You certainly can't interact with it directly from a web browser.)
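For illustration, here is a minimal CGI-style program in Python. The HTTP server passes request details through environment variables such as QUERY_STRING, and the program writes headers, a blank line, and a body to standard output. The cgi_response helper and the "name" parameter are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the text a CGI program writes to stdout: headers, blank line, body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = f"<p>Hello, {html.escape(name)}!</p>\n"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # Request details arrive through environment variables set by the server.
    sys.stdout.write(cgi_response(os.environ))
```

Saved under a name of your choosing (say hello_cgi.py), it can be tested from the command line without any HTTP server by setting the variable by hand: QUERY_STRING='name=Ada' python3 hello_cgi.py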
An HTTP daemon is a software program that runs in the background on a web server and waits for incoming requests. The daemon answers each request automatically and serves hypertext and multimedia documents over the Internet using HTTP.
Apache httpd is basically a web server used for handling requests and delivering static content. CGI, by contrast, is a protocol that lets a script handle the request, so that the content delivered is generated by the script instead of simply being read from a static file. So it is not necessary to use CGI with Apache httpd, but httpd and CGI can be used together to deliver dynamic content.
Using httpd with CGI is also a very heavy way of delivering dynamic content, as it creates and destroys a process on every request-response cycle; there are many more efficient alternatives based on newer technology.
HTTPd - HyperText Transfer Protocol Daemon
HTTPd is a software program that usually runs in the background as a process.
It plays the role of the server in a client-server model using the HTTP and/or HTTPS network protocols.
HTTPd waits for incoming client requests and answers each one with the requested information.
The following are some commonly used HTTP daemons:
Apache
BusyBox
CERN HTTPd
Lighttpd
Nginx

Using Jetty to serve a web application

I am using Jetty for the first time to deploy a GWT web app connecting to a Restlet API and I am trying to understand the best way to use it.
I want to make it embeddable so that I can update config during run-time (allowing me to add new domain names etc).
Our web server currently runs Apache to serve a PHP web app and this will be our first time deploying a GWT app and using Jetty.
Is it possible to use Jetty in parallel with Apache (both serving requests on port 80)? And since I am embedding it, do I put Apache in front of Jetty, so that Apache receives the request and forwards it to Jetty?
Both servers cannot listen on the same port, but you can run both on the same machine. So use a separate port for Jetty.
Jetty receives requests through its own port and doesn't depend on the other server.
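The port conflict can be demonstrated with plain sockets. In this Python sketch the first socket stands in for Apache and the second for Jetty; it uses an OS-chosen free port rather than port 80 so it can run without special privileges.

```python
import socket

# Stand-in for the first server: bind and listen on a free port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
first.listen()
port = first.getsockname()[1]

# Stand-in for the second server: binding the same port is refused.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    conflict = False
except OSError:
    conflict = True
finally:
    second.close()

# A different port works fine on the same machine.
third = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
third.bind(("127.0.0.1", 0))
other_port = third.getsockname()[1]
third.close()
first.close()
```

The second bind fails with an "address already in use" error, while the third socket gets a port of its own, which is the arrangement suggested above for Jetty.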

apache to tomcat: mod_jk vs mod_proxy

What are the advantages and disadvantages of using mod_jk and mod_proxy for fronting a tomcat instance with apache?
I've been using mod_jk in production for years but I've heard that it's "the old way" of fronting tomcat. Should I consider changing? Would there be any benefits?
A pros/cons comparison for those modules exists on http://blog.jboss.org/
mod_proxy
* Pros:
  o No need for a separate module compilation and maintenance. mod_proxy, mod_proxy_http, mod_proxy_ajp and mod_proxy_balancer come as part of the standard Apache 2.2+ distribution
  o Ability to use HTTP, HTTPS or AJP protocols, even within the same balancer.
* Cons:
  o mod_proxy_ajp does not support large 8K+ packet sizes.
  o Basic load balancer
  o Does not support Domain model clustering
mod_jk
* Pros:
  o Advanced load balancer
  o Advanced node failure detection
  o Support for large AJP packet sizes
* Cons:
  o Need to build and maintain a separate module
If you wish to stay in Apache land, you can also try the newer mod_proxy_ajp, which uses the AJP protocol to communicate with Tomcat instead of plain old HTTP, but which leverages mod_proxy to do the work.
AJP vs HTTP
When using mod_jk, you are using AJP. When using mod_proxy, you will typically use HTTP or HTTPS (unless you add mod_proxy_ajp). And this is essentially what makes all the difference.
The Apache JServ Protocol (AJP)
The Apache JServ Protocol (AJP) is a binary protocol that can proxy inbound requests from a web server through to an application server that sits behind the web server. AJP is a highly trusted protocol and should never be exposed to untrusted clients, which could use it to gain access to sensitive information or execute code on the application server.
Pros
Easy to set up as the correct forwarding of HTTP headers is not required.
It is less resource intensive because the request is forwarded in a compact binary format rather than as a full text-based HTTP exchange.
Cons
Transferred data is not encrypted. It should only be used within trusted networks.
Hypertext Transfer Protocol (HTTP)
HTTP functions as a request–response protocol in the client–server computing model. A web browser, for example, may be the client and an application running on a computer hosting a website may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
Pros
Can be encrypted with SSL/TLS making it suitable for traffic across untrusted networks.
It is flexible, as it allows the request to be modified before forwarding, for example by setting custom headers.
Cons
More overhead as the correct forwarding of the HTTP headers has to be ensured.
More resource intensive as the request is fully parsed before forwarding.
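To show what "correct forwarding of the HTTP headers" can involve, here is a small Python sketch that adds the commonly used X-Forwarded-* headers before a request is passed to the back end. The forwarded_headers helper is hypothetical, and header names are treated as case-sensitive here for simplicity, which real HTTP is not.

```python
def forwarded_headers(headers, client_ip, scheme):
    """Return a copy of the request headers with proxy information added."""
    out = dict(headers)
    previous = out.get("X-Forwarded-For")
    # Append to an existing chain so earlier proxies stay visible.
    out["X-Forwarded-For"] = f"{previous}, {client_ip}" if previous else client_ip
    out["X-Forwarded-Proto"] = scheme
    # Keep the host the client originally asked for, if not already recorded.
    out.setdefault("X-Forwarded-Host", out.get("Host", ""))
    return out


single = forwarded_headers({"Host": "app.example.com"}, "203.0.113.7", "https")
chained = forwarded_headers(
    {"Host": "app.example.com", "X-Forwarded-For": "198.51.100.2"},
    "203.0.113.7",
    "http",
)
```

Without headers like these, the application server only sees the proxy's address and scheme, which is the extra care the HTTP option requires.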