why do we need consistent hashing when round robin can distribute the traffic evenly - load-balancing

If the load balancer can use the round-robin algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing versus round robin?

From this blog,
With traditional “modulo hashing”, you simply consider the request
hash as a very large number. If you take that number modulo the number
of available servers, you get the index of the server to use. It’s
simple, and it works well as long as the list of servers is stable.
But when servers are added or removed, a problem arises: the majority
of requests will hash to a different server than they did before. If
you have nine servers and you add a tenth, only one-tenth of requests
will (by luck) hash to the same server as they did before.
Then
there’s consistent hashing. Consistent hashing uses a more elaborate
scheme, where each server is assigned multiple hash values based on
its name or ID, and each request is assigned to the server with the
“nearest” hash value. The benefit of this added complexity is that
when a server is added or removed, most requests will map to the same
server that they did before. So if you have nine servers and add a
tenth, about 1/10 of requests will have hashes that fall near the
newly-added server’s hashes, and the other 9/10 will have the same
nearest server that they did before. Much better! So consistent
hashing lets us add and remove servers without completely disturbing
the set of cached items that each server holds.
In short, the round-robin algorithm suits scenarios where the list of servers is stable and requests are independent of one another. Consistent hashing suits scenarios where the backend servers need to scale out or in and most requests should keep mapping to the same server they did before, while still achieving well-distributed uniformity.

Let's say we want to maintain user sessions on servers, so we want all requests from a user to go to the same server. Round robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve a 1:1 mapping between a user and a server, we need a hashing-based load balancer. Consistent hashing works on this idea, and it also elegantly handles the cases where we want to add or remove servers.
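A minimal sketch of the ring idea described above (the hash function and virtual-node count are arbitrary illustrative choices, not taken from any particular load balancer):

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each server gets several virtual points on a ring; a request is
    routed to the server owning the first point at or after its hash."""
    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def get(self, request_key: str) -> str:
        idx = bisect_right(self.keys, _hash(request_key)) % len(self.keys)
        return self.ring[idx][1]

# With modulo hashing, adding a tenth server remaps ~9/10 of keys;
# with the ring, only the keys landing near the new server's points move.
nine = ConsistentHashRing([f"server{i}" for i in range(9)])
ten = ConsistentHashRing([f"server{i}" for i in range(10)])
keys = [f"user{i}" for i in range(10000)]
moved = sum(nine.get(k) != ten.get(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/10
```

The same user key always hashes to the same server, which is what gives the session affinity that round robin cannot provide.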
References: see Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8

For completeness, I want to point out one other important feature of Consistent Hashing that hasn't yet been mentioned: DOS mitigation.
If a load balancer is getting spammed with requests (whether from too many customers, an attack, or a haywire local service), a round-robin approach will spread the spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DOS'd by the barrage.
Being able to "limit the blast radius" in this manner is a critical feature of production systems.

Consistent hashing fits stateful systems well (where context from previous requests is required by the current request). In a stateful system, if the previous and current requests land on different servers, the context for the current request is lost and the system cannot fulfil it. With consistent hashing we can route all requests for a particular user to the same server; round robin cannot achieve this, and is better suited to stateless systems.

Related

Caching authentication calls

Imagine that we have one authentication service and two applications that use it via HTTP calls. Where should we put the cache?
Only the authentication service has its own cache (Redis, for example)
Caching is part of applications number one and two
Putting the cache inside the applications is a judgement call. It will be far more efficient if they can cache in-memory, but the trade-off is that the cached data might be stale. This is ultimately decided case by case, and depends on the security your application needs.
You will have to figure out what an acceptable TTL for your cache is, it could be anywhere between zero and eternal. If it's zero, then you've answered your question, and there is no value in a cache layer at all in the application.
Most applications can accept some level of staleness, at least on the order of a few seconds. If you are doing banking transactions, you likely cannot get away with this, but if you are creating a social-media application, you can likely have a TTL at least in the several-minutes range, possibly hours or days.
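As an illustration of the TTL trade-off, here is a minimal in-process cache sketch (the `auth_check` function is a hypothetical stand-in for the HTTP call to the authentication service):

```python
import time

class TTLCache:
    """Minimal in-process cache with a per-entry expiry time."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        value, expires = self._store.get(key, (None, 0.0))
        if time.monotonic() < expires:
            return value          # still fresh: no remote call made
        value = compute(key)      # stale or missing: hit the service
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

cache = TTLCache(ttl_seconds=30)
calls = []
def auth_check(token):            # stand-in for the HTTP auth call
    calls.append(token)
    return token == "valid"

cache.get_or_compute("valid", auth_check)
cache.get_or_compute("valid", auth_check)   # served from cache
print(len(calls))  # 1 — the second lookup never hit the auth service
```

A TTL of 30 seconds means an auth revocation can go unnoticed for up to 30 seconds; that window is exactly the staleness trade-off discussed above.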
One more bit of advice: since you are using HTTP for your implementation, take a look at the Cache-Control mechanism baked into HTTP. Your client will likely support it out of the box, and most of the large, complicated problems (cache expiration, store sizing, etc.) have been solved by people long before you.

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (CPU mainly) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (not ms, as in traditional "web-like" applications). This means we serve relatively few requests per minute, where each one takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared-resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
Rate limiting: Define a maximum requests/min that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this and by the time these new machines are ready it's possibly too late.
Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is going to make (estimated, at least), we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
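The queue-then-release idea described above can be sketched as a single-process leaky bucket (the rate and queue size are arbitrary; in a real multi-instance gateway this state would live in something shared like Redis, and the actual release loop is omitted):

```python
from collections import deque
import time

class LeakyBucket:
    """Requests within the agreed rate pass immediately; bursts are
    queued (up to max_queue) for release at the fixed rate; the rest
    are rejected. Single-process sketch only."""
    def __init__(self, rate_per_sec: float, max_queue: int):
        self.interval = 1.0 / rate_per_sec
        self.max_queue = max_queue
        self.queue = deque()
        self.next_slot = 0.0      # earliest time the next request may pass

    def admit(self, request_id, now=None) -> str:
        now = time.monotonic() if now is None else now
        if now >= self.next_slot:
            self.next_slot = now + self.interval
            return "pass"
        if len(self.queue) < self.max_queue:
            self.queue.append(request_id)
            self.next_slot += self.interval   # reserve a future release slot
            return "queued"
        return "rejected"

# A burst of 5 simultaneous requests against a 1 req/s bucket with room for 2:
bucket = LeakyBucket(rate_per_sec=1, max_queue=2)
results = [bucket.admit(i, now=0.0) for i in range(5)]
print(results)  # ['pass', 'queued', 'queued', 'rejected', 'rejected']
```

The "dynamic" variant mentioned above would adjust `rate_per_sec` per tenant based on the other tenants' unused capacity.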
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic-shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so... is there any other API gateway that does?
In case Kong does not support this, how hard would it be to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed tricky, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
This pattern is usually offloaded to client libraries: when the client hits a rate-limit status code, it uses something like an exponential-backoff technique to retry requests. It is much easier to scale and implement.
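A sketch of that client-side pattern, assuming the gateway signals rate limiting with HTTP 429 and using jittered exponential backoff (the `call` function and the constants are illustrative):

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base=0.5, cap=30.0):
    """Retry `call` while it returns a 429 status, sleeping
    base * 2**attempt seconds (capped, with jitter) between tries."""
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        delay = min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
        time.sleep(delay)
    return status, body  # still rate-limited after all attempts

# Simulated gateway: rate-limited twice, then accepts the request.
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = retry_with_backoff(lambda: next(responses), base=0.0)
print(status)  # 200
```

The jitter matters: without it, all clients that were throttled at the same moment retry at the same moment, recreating the burst.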
Can't say for Kong, but Tyk, in this case, provides two basic numbers you can control: quota, the maximum number of requests a client can make in a given period of time, and rate limits, a safety protection. You can set rate limits 1) per "policy", e.g. for a group of consumers (for example if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, 3) globally for an API (works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with a global API setting.
If you want a fully dynamic scheme, re-calculating limits based on cluster load, it should be possible. You will need to write and run this scheduler somewhere; from time to time it will re-calculate limits based on current total usage (which Tyk calculates for you, and which you can get from Redis) and talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
Hope it makes sense :)

Can ImageResizer be exploited if presets are not enabled?

My project uses the Presets plugin with the flag onlyAllowPresets=true.
The reason for this is to close a potential vulnerability where a script might request an image thousands of times, resizing with 1px increment or something like that.
My question is: Is this a real vulnerability? Or does ImageResizer have some kind of protection built-in?
I kind of want to set the onlyAllowPresets to false, because it's a pain in the butt to deal with all the presets in such a large project.
I only know of one instance where this kind of attack was performed. If you're that valuable of a target, I'd suggest using a firewall (or CloudFlare) that offers DDOS protection.
An attack that targets cache-misses can certainly eat a lot of CPU, but it doesn't cause paging and destroy your disk queue length (bitmaps are locked to physical ram in the default pipeline). Cached images are still typically served with a reasonable response time, so impact is usually limited.
That said, run a test, fake an attack, and see what happens under your network/storage/cpu conditions. We're always looking to improve attack handling, so feedback from more environments is great.
Most applications or CMSes will have multiple endpoints that are storage- or CPU-intensive (often a wildcard search). Not to say that this is good - it's not - but the most cost-effective layer to handle this is often the firewall or CDN. And today, most CMSes include some (often poor) form of dynamic image processing, so remember to test or disable that as well.
Request signing
If your image URLs are originating from server-side code, then there's a clean solution: sign the URLs before emitting them, and validate during the Config.Current.Pipeline.Rewrite event. We'd planned to have a plugin for this ship in v4, but it was delayed - and we've only had ~3 requests for the functionality in the last 5 years.
The sketch for signing would be:
Sort querystring by key
Concatenate path and pairs
HMACSHA256 the result with a secret key
Append to end of querystring.
For verification:
Parse the query,
Remove the hmac
Sort query and concatenate path as before
HMACSHA256 the result and compare to the value we removed.
Raise an exception if it's wrong.
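The signing and verification steps above could be sketched like this (the secret key and URL shapes are illustrative, not ImageResizer's actual API):

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode

SECRET = b"server-side-secret"   # assumption: kept out of client reach

def sign(path: str, query: dict) -> str:
    """Sort the querystring, concatenate with the path, HMAC-SHA256,
    and append the signature as an extra parameter."""
    pairs = sorted(query.items())
    msg = path + urlencode(pairs)
    sig = hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()
    return f"{path}?{urlencode(pairs + [('hmac', sig)])}"

def verify(url: str) -> bool:
    """Remove the hmac, recompute over the sorted remainder, compare."""
    path, _, qs = url.partition("?")
    pairs = parse_qsl(qs)
    provided = dict(pairs).get("hmac", "")
    rest = sorted((k, v) for k, v in pairs if k != "hmac")
    msg = path + urlencode(rest)
    expected = hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(provided, expected)

signed = sign("/photo.jpg", {"width": "300", "mode": "crop"})
print(verify(signed))                        # True
print(verify(signed.replace("300", "999")))  # False - tampered width
```

Note the constant-time comparison (`hmac.compare_digest`) rather than `==`, which avoids leaking signature bytes through timing.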
Our planned implementation would permit "whitelisted" variations - certain values that a signature would allow the client to modify, say for breakpoint-based width values. This would be done by replacing the targeted key/value pairs with a serialized whitelist policy prior to signature. For validation, pairs targeted by a policy would be removed prior to signature verification, and policy enforcement would happen if the signature was otherwise a match.
Perhaps you could add more detail about your workflow and what is possible?

LDAP Server side sorting - really a good idea?

I'm toying with using server-side sorting in my OpenLDAP server. However, as I also get to write the client code, I can see that all it buys me in this case is one line of sorting code at the client. And as the client is one of presently 4, soon to be 16, Tomcats - maybe hundreds if usage balloons - sorting at the client actually makes more sense to me. I'm wondering whether SSS is really considered much of an idea. My search results in this case aren't large: dozens rather than hundreds. Just wondering whether it might be more of a weapon than a tool.
In OpenLDAP it is bundled with VLV (Virtual List View), which I will need some day, so it is already installed. It's really a programming question, not just a configuration question, hence SO not SF.
Server-side sorting is intended for use by clients that are unable or unwilling to sort results themselves; this might be useful in hand-held clients with limited memory and CPU mojo.
The advantages of server-side sorting include, but are not limited to:
the server can enforce a time limit on the processing of the sorting
clients can specify an ordering rule for the server to use
professional-quality servers can be configured to reject requests with sort controls attached if the client connection is not secure
the server can enforce resource limits, for example, the aforementioned time limit, or administration limits
the server can enforce access restrictions on the attributes and on the sort request control itself; this may not be that effective if the client can retrieve the attributes anyway
the server may indicate it is too busy to perform the sort or simply unwilling to perform the sort
professional-quality servers can be configured to reject search requests for all clients except for clients with the necessary mojo (privilege, bind DN, IP address, or whatever)
The disadvantages include, but are not limited to:
servers can be overwhelmed by sorting large result sets from multiple clients if the server software is unable to cap the number of sorts to process simultaneously
client-side APIs have to support the server-side sort request control and response
it might be easier to configure clients to sort by their own 'ordering rules'; although these can be added to professional-quality, extensible servers
To answer my own question, and not to detract from Terry's answer, use of the Virtual List View requires a Server Side Sort control.

using BOSH/similar technique for existing application/system

We have an existing system that connects to the back end via HTTP (Apache/SSL) and polls the server for new messages; needless to say, we have scalability issues.
I'm researching on removing this polling and have come across BOSH/XMPP but I'm not sure how we should take the BOSH technique (using long lived http connection).
I've seen that a few libraries are available, but the whole thing seems bloated, since we do not need buddy lists etc. and simply want to notify the clients of available messages.
The client is written in C/C++ and works across most OS so that is an important factor. The server is in Java.
Does BOSH result in a huge number of httpd processes? Since it has to keep all the clients connected, what would the limit on that be? We are also planning to move to a 64-bit JVM/Apache; what would be the maximum number of clients in that case?
Any hints?
I would note that BOSH is separate from XMPP, so there's no "buddy lists" involved. XMPP-over-BOSH is what you're thinking of there.
Take a look at collecta.com and associated blog posts (probably by Jack Moffitt) about how they use BOSH (and also XMPP) to deliver real-time information to large numbers of users.
As for the scaling issues with Apache, I don't know — presumably each connection is using few resources, so you can increase the number of connections per Apache process. But you could also check out some of the connection manager technologies (like punjab) mentioned on the BOSH page above.