Load balancing based on content

I want to send all requests for the same content to the same backend server. How can I do this? Are there any open source solutions, like HAProxy, that can do this?
For example: client 1 requests content A, and my load balancer directs that request to one of the backend servers, say X, on a round-robin basis. If I then receive a request from a different client 2 for the same content A, that request should be directed to the same backend server X. Is there an open source solution that can do this?
Any help/pointers would be appreciated.
Thanks, Nikhil

HAProxy can do what you want and more. It has many ACL options available to suit most requirements. Varnish is another option with a robust ACL language.
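In fact, HAProxy's `balance uri` mode does exactly what the question asks: it hashes the request URI so every request for the same path lands on the same backend server. A minimal sketch (server names and addresses are made up):

```
frontend http-in
    bind *:80
    default_backend content_servers

backend content_servers
    # Hash the request URI so the same content path always maps
    # to the same server, instead of rotating round robin.
    balance uri
    hash-type consistent   # keeps the mapping stable when servers come and go
    server x 10.0.0.1:8080 check
    server y 10.0.0.2:8080 check
```

With `hash-type consistent`, adding or removing a server only remaps a fraction of the URIs instead of reshuffling everything.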

Interesting question!
I'm afraid it depends on the technology. As long as you're in the HTTP domain, you can probably configure your load balancer to do this.
I'm a Java guy, so in Java you can have, say, EJBs. These are distributed components installed on a server that can be invoked remotely. Their communication protocol is binary, and I doubt a load balancer can read it.
So, in JBoss, for example you can create a cluster of servers, and deploy different EJBs on different servers.
For example, let's assume there are two EJBs in the system: one for buying milk and one for buying pizza.
So you deploy the milk EJB on server 1 and the pizza EJB on server 2.
Now you have a naming resolution service (in Java/JBoss it's called HA-JNDI).
Its basic idea is to provide a remote stub based on the name:
PizzaEJB pizzaEjb = NamingService.getMyStub(PizzaEJB.class);
It's not real working code, of course, but it demonstrates the idea.
The trick is that this naming service knows where each EJB is deployed, so if you have the pizza EJB only on server 2, it will always return a stub that goes to server 2 to buy the pizza :)
Java programmers, then, don't really care how it's implemented under the hood. Just to give an idea: the naming service has some form of agent deployed on each server, and the agents talk to each other.
This is how Java can work here.
Now, what I think is that maybe you can base your API on RESTful web services. In that case it's an easily parsable HTTP request, so the implementation can be relatively easy (again, if your load balancer supports this kind of processing).
Hope this helps somehow.
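To make the RESTful idea concrete: when requests are plain HTTP, content-affinity routing boils down to hashing some stable key of the request (the path, a content ID) onto the server list. A language-agnostic sketch in Python (server names are placeholders):

```python
import hashlib

SERVERS = ["server-x", "server-y", "server-z"]  # placeholder backend names

def pick_backend(content_key: str, servers=SERVERS) -> str:
    """Map the same content key to the same backend, deterministically."""
    digest = hashlib.sha256(content_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Two different clients asking for the same content hit the same server:
assert pick_backend("/content/A") == pick_backend("/content/A")
```

This is essentially what HAProxy's URI hashing or nginx's `hash $request_uri` do internally; the load balancer just needs to be able to see the key, which is why binary protocols are a problem.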

Related

Verifying individual servers in a load balancing configuration

Here is my situation. Recently, my production environment was burned by a few Windows updates that caused some production servers to stop responding. While we have since resolved the issue of both servers (which are in a load-balancing configuration) getting updates on the same day, the question arose: how do we check that the application running on each server is still working? If we call the load-balancing IP, we may or may not hit a server that is working. So if an update takes out the application on one server, how do we know this has happened?
The only idea I have is to purchase two more SSL certificates, allocate two IP addresses, and assign one to each server. That way I would be guaranteed to know each server is up (we have a third-party service pinging our servers). But I have to believe there is a better way to do this?
Please note that I am a .NET developer by trade with only an extremely small smattering of networking and IIS experience, but I'm what my small company has. So please assume I don't know where a lot of stuff is and dumb down the answer.
The load balancer maintains the live status of the servers (based on timeouts or HTTP health checks) and uses this status to route traffic only to active servers.
Generally, LBs have a dashboard through which you can check this status. If not, you can check its logs.
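As an illustration of what such a health check looks like, in HAProxy it is a couple of lines per backend (the `/health` path and addresses here are assumptions, not your setup):

```
backend app_servers
    # Ask each server for a known page; mark it down if the check fails.
    option httpchk GET /health
    http-check expect status 200
    server web1 10.0.0.1:443 check ssl verify none
    server web2 10.0.0.2:443 check ssl verify none
```

Independently of the load balancer, you can also probe each server directly from a monitoring box with `curl --resolve www.myapp.com:443:<server-ip> https://www.myapp.com/`, which forces the connection to a specific backend while keeping the hostname, so your existing SSL certificate still matches and you don't need extra certificates or IP addresses.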

For a SaaS running on Node.JS, is a web-server (nginx) or varnish necessary as a reverse proxy?

For a SaaS running on Node.JS, is a web-server necessary?
If yes, which one and why?
What would be the disadvantages of using just Node? Its role is just to handle the CRUD requests and serve JSON back for the client to parse the data (like Gmail).
"is a web-server necessary"?
Technically, no. Practically, yes: a separate web server is typically used, and for good reason.
In this talk by Ryan Dahl in May 2010, at 37'30" he states that he recommends running node.js behind a reverse proxy or web server for "security reasons". To elaborate on that, hardened web servers like nginx or apache have had their TCP stacks evolve for a long time in terms of stability and security. Node.js is not at that same level yet. Thus, since putting node.js behind nginx is easy, doesn't have many negative consequences, and in theory increases the security of your deployment somewhat, it is a good choice. At some point in time, node.js may be deemed officially "ready for live direct Internet connections" but wait for Ryan/Joyent to make some announcement to that effect.
Secondly, binding to sub-1024 ports (like 80 and 443) requires the process to be root. nginx and others automatically handle binding as root and then dropping privileges to a safer user account (www-data or nobody typically). Although node.js has system call wrappers in the process module to drop root privileges with setgid and setuid, AFAIK other than coding this yourself the node community hasn't yet seen a convention emerge for doing this. More on this topic in this discussion.
Thirdly, web servers are good at virtual hosting and in general there are convenient things you can do (URL rewriting and such) that require custom coding in node.js to achieve otherwise.
Fourthly, nginx is great at serving static files. Better than node.js (at least by a little as of right now). Again as time goes forward this point may become less and less relevant, but in my mind a traditional static file web server and a web application server still have distinct roles and purposes.
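The points about privileged ports and static files both show up in a typical nginx front-end config. A sketch, assuming node.js listens on 127.0.0.1:3000 and static assets live under /var/www/myapp (both are made-up values):

```
user www-data;      # master binds :80 as root; workers drop to www-data
events {}

http {
    server {
        listen 80;
        server_name example.com;

        # nginx serves static assets directly; node never sees them
        location /static/ {
            root /var/www/myapp;
        }

        # everything else is proxied to node on an unprivileged port
        location / {
            proxy_pass http://127.0.0.1:3000;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
    }
}
```

The node process itself then runs entirely as an unprivileged user and only ever listens on a high port.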
"If yes, which one and why"?
nginx. Because it has great performance and is simpler to configure than apache.

non-http server

I'm writing a server that needs to serve many clients. The traffic is NOT http (but rather some proprietary protocol on top of TCP). I'm not very familiar with commercial web servers such as IIS and Apache. Can anyone tell me if it's possible to write some sort of "extension" to run on top of one of these platforms so that I don't have to write the logic for the sockets? Or perhaps there is another way (not IIS or Apache) of doing it which is better?
My server is generally going to behave as a web service (gets request, queries db, sends response) however there is one scenario in which it stays connected to the client socket and sends updates at a given interval on that socket.
It seems reasonable that there would be a way to do this so that I'd only have to write my own logic without the general plumbing of a server. Any ideas?
Thanks!
Good question, and it's also good to look to leverage an existing web server: you get scalability and stability, effectively for free.
I've never done this myself, but it should be totally possible in IIS (I recommend v7+ for this; it makes things easier).
You can set up a new web site through the administration tool, and assign it a port to listen on - this bit is pretty straight forward. You should set its Binding Type to net.tcp (this is a dropdown in the dialog to add a new website, you can't miss it).
You can then use either modules or handlers to implement the rest of your custom functionality. The article Developing IIS 7.0 Modules and Handlers with the .NET Framework is a good intro to the subject. Most of the documentation out there about writing custom handlers and modules is focused on the HTTP protocol, but there are some snippets floating around for TCP and/or net.tcp (because IIS and Apache are web servers, and web is synonymous with HTTP). Another resource that may be useful: Configure Request-Processing for a Web Server (IIS 7).
Alternatively, you may consider changing your approach and do it as a net.tcp WCF service, with this you get the benefits of using IIS, the flexibility of choosing the protocol (can be statically configured, doesn't need to be compiled in), and you don't have to write handlers or modules.
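For the WCF route, the service's endpoint is declared in config rather than code, which is what makes the protocol swappable. A web.config sketch (service and contract names are placeholders; it also assumes net.tcp is enabled on the site and the Net.Tcp Listener Adapter service is running):

```xml
<system.serviceModel>
  <services>
    <service name="MyApp.UpdateService">
      <!-- netTcpBinding gives you binary TCP transport hosted in IIS/WAS -->
      <endpoint address=""
                binding="netTcpBinding"
                contract="MyApp.IUpdateService" />
    </service>
  </services>
  <serviceHostingEnvironment multipleSiteBindingsEnabled="true" />
</system.serviceModel>
```

Switching the same service to HTTP later is just a matter of changing the `binding` attribute, with no recompilation.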

IIS7 and ARR and WCF... Can we load balance our app servers?

Perhaps I have the wrong product in mind for our needs -- but I want to know if I can use Application Request Routing (ARR) in IIS7 to load balance requests for our application tier.
We have a farm of web servers. Each will be running our MVC web application. We load balance these servers through our web application firewall and load balancing appliances. In turn, they will be make WCF calls to our application servers. It's these calls that I want to use ARR to manage.
However, after looking at ARR, it seems like it's all about rewriting URLs coming from the client. But that's not how our situation works. If a user browses to www.myapp.com/home/index, we will in turn be making WCF calls to services configured in the web.config to say myappservice.foo.local/home/GetInfo.
How do I configure for this scenario, or am I looking at the wrong product?
I am not really sure I understand your scenario, but if I understand correctly, I think you should be able to call your WCF service. If you don't need to keep the session on your calls, just uncheck the client-affinity checkbox in the server-affinity configuration.
Configure your load balancing to round robin, or least response time, in the load-balancing interface, and your requests should be load balanced.
If you have more than one ARR server, I suggest you disable shared configuration on your ARR servers; we had some problems with that feature on the ARR server.
I agree with @Cedric; I'm not sure the question has been phrased as well as it could be.
What type of load balancing/distribution are you looking to achieve?
Are you looking to balance load? Distribute load to specific server farms based on request content? Some other function? A little more info here might help get a better answer.
Will ARR work with WCF?
Yes, but only with the HTTP bindings as far as I know (wsHttpBinding and basicHttpBinding).
I still imagine there being a hardware loadbalancer (or some other method) in front of your ARR servers.
Your web servers are going to act as clients of the application servers servicing your WCF requests. However, it appears you're already going to a DNS name of myappservice.foo.local/home/GetInfo; if that resolves to a virtual IP, you're already getting load balancing of some kind?
Why not use your existing "load balancing appliances" to do the load balancing?
I could definitely imagine a hardware load balancer servicing requests for myappservice.foo.local, which resolves to a virtual IP backed by your ARR servers. Conceivably, your ARR servers could then further refine who services the request, maybe based on the content of the request: map all /home requests to one group of servers and all /foo requests to another, say.
Pretty sure I muddied the waters a bit more! :) But I'm very curious for reasons for looking at ARR.
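For what it's worth, that kind of content-based farm routing in ARR is expressed as URL Rewrite rules that target a server farm, roughly like this in applicationHost.config (the farm and server names are invented for illustration):

```xml
<webFarms>
  <webFarm name="homeFarm">
    <server address="app1.foo.local" />
    <server address="app2.foo.local" />
  </webFarm>
</webFarms>
<rewrite>
  <globalRules>
    <!-- send every /home/* request to the homeFarm server farm -->
    <rule name="route-home" stopProcessing="true">
      <match url="^home/.*" />
      <action type="Rewrite" url="http://homeFarm/{R:0}" />
    </rule>
  </globalRules>
</rewrite>
```

The "URL rewriting" ARR talks about is this mapping of an incoming path onto a farm name, not rewriting URLs that the client sees.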

Why do some setups front-end Glassfish with Apache?

I've been trying to mug up on Glassfish and one thing that keeps coming up is the "how-to" on fronting Glassfish with Apache. Unfortunately, I have yet to find a description of why you would want to do this!
From my experimentation, Glassfish seems like a pretty fully featured web server-type service; but I might be missing a lot. So, is the notion of front-ending Glassfish more of a solution to integrate it with an existing architecture, or does front-ending (in a pure Java environment) provide extra benefits?
There's also another valid use case for fronting Glassfish with Apache: Apache can function as a reverse proxy for increased security of your Glassfish instance. The RP is configured to allow only certain URLs to be passed through to the application server. For example, you may have the app contexts /myApp and /myPrivApp deployed in Glassfish. In the RP server, you only configure /myApp to be passed to Glassfish. Anybody requesting /myPrivApp would see a 404, because the request stops right at the RP level.
In one of my deployments, I have a bunch of WARs deployed, some for users coming from the internet, some for intranet only. I have 2 RPs running, one for internet users and the other for intranet. I configure the internet RP to only allow URLs for approved internet applications to pass through while intranet users get to see everything.
Hope that helps.
It is usually used to speed things up. Since Apache is a very fast web server, it is used to deliver static content: images, CSS files, and so on. Glassfish serves the dynamic content (servlets, JSPs) in this scenario.
Another reason for using Apache as a frontend to Glassfish is the possibility to provide load balancing across a Glassfish cluster. See http://tiainen.sertik.net/2011/03/load-balancing-with-glassfish-31-and.html for details.
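A minimal sketch of that load-balancing setup with mod_proxy_balancer (hostnames and ports here are assumptions; Glassfish's HTTP listener defaults to 8080):

```
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so

# pool of Glassfish instances; requests are spread by request count
<Proxy "balancer://glassfish-cluster">
    BalancerMember "http://gf1.example.com:8080"
    BalancerMember "http://gf2.example.com:8080"
</Proxy>

ProxyPass        "/myApp" "balancer://glassfish-cluster/myApp"
ProxyPassReverse "/myApp" "balancer://glassfish-cluster/myApp"
```

Note this also gives you the reverse-proxy filtering described above for free: only the paths you `ProxyPass` ever reach Glassfish.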
Another reason is that Glassfish cannot (easily) run on port 80 without giving it root rights, of course.
So for most users it's easier to run a proxy of some sort (Apache, nginx, Varnish) in front of Glassfish and have both servers run under a normal user.
You then get the further advantage of the configuration options of your front end. Like others mentioned, caching, for example.