Caching authentication calls - authentication

Imagine that we have one authentication service and two applications that call it over HTTP. Where should we put the cache?
1. Only the authentication service has a cache (Redis, for example).
2. Caching is part of applications one and two.

Putting your cache inside the applications is a judgement call. It will be far more efficient if the application can cache in memory, but the trade-off is that a cached entry may be stale or no longer valid. This is ultimately decided case by case, and depends on how much security your application needs.
You will have to figure out what an acceptable TTL for your cache is; it could be anywhere between zero and eternal. If it's zero, then you've answered your question: there is no value in a cache layer in the application at all.
Most applications can accept some level of staleness, at least on the order of a few seconds. If you are doing banking transactions, you likely cannot get away with this, but if you are creating a social-media application, you can likely have a TTL at least in the several-minutes range, possibly hours or days.
Just a bit of advice: since you are using HTTP for your implementation, take a look at the Cache-Control mechanism that is baked into HTTP. It's likely your client will support it out of the box, and most of the large, complicated issues (cache expiration, store sizing, etc.) will already have been solved by other people long before you.
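As a rough illustration of the TTL trade-off described above, here is a minimal in-memory cache sketch in Python. The names `TTLCache`, `get_or_load`, `check_token_remotely` and the injectable `clock` are hypothetical and not taken from any particular library; the loader stands in for the HTTP call to the authentication service.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key; call loader() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no call to the authentication service
        value = loader()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Usage with a fake clock and a stand-in for the remote authentication call.
fake_now = [0.0]
auth_calls = []


def check_token_remotely():
    auth_calls.append("call")  # a real loader would perform the HTTP request here
    return {"user": "alice", "valid": True}


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
cache.get_or_load("token-abc", check_token_remotely)  # miss: calls the service
cache.get_or_load("token-abc", check_token_remotely)  # hit: served from memory
calls_within_ttl = len(auth_calls)
fake_now[0] = 31.0  # 31 seconds later the entry has expired
cache.get_or_load("token-abc", check_token_remotely)  # expired: calls the service again
calls_after_expiry = len(auth_calls)
```

With `ttl_seconds=0` every lookup falls through to the loader, which matches the point above that a zero TTL makes the cache layer pointless.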

Related

How to limit the number of outgoing web requests per second?

Right now, I am working on an ASP.NET Core Web API that calls an external web service and uses the returned data in its own response. This is working fine.
However, I discovered that the external service is not as scalable as I would like it to be. Therefore, as discussed with the company providing this external service, the number of outgoing requests needs to be limited to one per second. I also use caching to reduce the number of outgoing requests, but this has proven not to be effective enough because, logically, it only works when many requests are the same so that cached data can be reused.
I have been doing some investigation on rate limiting, but the documented examples and types are far more complicated than what I need. This is not about partitions, tokens or concurrency. What I want is far simpler: just one outgoing request per second, and that's all.
I am not that experienced when it comes to rate limiting. I just started reading the referred documentation and found out that there is a nice and professional package for it. But the examples are more complicated than what I need for the reasons explained. It is not about tokens or concurrency or so. It is the number of outgoing requests per second that needs to be limited.
Possibly there is a way to do this with the System.Threading.RateLimiting package by applying the RateLimiter class correctly. Otherwise, I may need to write my own DelegatingHandler implementation to do this. But there must be a straightforward way that people with experience in rate limiting can explain to me.
So how can I limit the number of outgoing web requests per second?
In addition, what I want to prevent is a 429 or similar in case of too many requests. In such a situation, the process should simply wait longer before completing, so that the number of outgoing requests stays limited.
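To make the requirement concrete (delay callers instead of rejecting them), here is a minimal sketch of the idea in Python rather than C#. It is not the System.Threading.RateLimiting API; `MinIntervalLimiter` and `wait_for_slot` are hypothetical names, and the clock and sleep functions are injectable only so the behaviour can be shown without real waiting.

```python
import threading
import time
from typing import Callable


class MinIntervalLimiter:
    """Spaces calls at least min_interval seconds apart by making callers wait."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = float("-inf")  # earliest time the next request may start

    def wait_for_slot(self) -> float:
        """Block until this caller's slot arrives; return how long it had to wait."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval  # reserve the slot after ours
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # wait outside the lock so other callers can queue up
        return delay


# Usage with a fake clock: three requests fired back to back.
fake_time = [0.0]
limiter = MinIntervalLimiter(
    min_interval=1.0,
    clock=lambda: fake_time[0],
    sleep=lambda seconds: fake_time.__setitem__(0, fake_time[0] + seconds),
)
waits = [limiter.wait_for_slot() for _ in range(3)]  # waits == [0.0, 1.0, 1.0]
```

Calling `wait_for_slot()` immediately before each outgoing request means excess requests simply take longer, so the caller never produces a 429 itself. In an ASP.NET Core application the equivalent wait would presumably sit in a handler in front of the HTTP client.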

Why do we need consistent hashing when round robin can distribute the traffic evenly?

When the load balancer can use a round-robin algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing and round robin to distribute the load?
From this blog:
With traditional “modulo hashing”, you simply consider the request hash as a very large number. If you take that number modulo the number of available servers, you get the index of the server to use. It’s simple, and it works well as long as the list of servers is stable. But when servers are added or removed, a problem arises: the majority of requests will hash to a different server than they did before. If you have nine servers and you add a tenth, only one-tenth of requests will (by luck) hash to the same server as they did before.
Then there’s consistent hashing. Consistent hashing uses a more elaborate scheme, where each server is assigned multiple hash values based on its name or ID, and each request is assigned to the server with the “nearest” hash value. The benefit of this added complexity is that when a server is added or removed, most requests will map to the same server that they did before. So if you have nine servers and add a tenth, about 1/10 of requests will have hashes that fall near the newly-added server’s hashes, and the other 9/10 will have the same nearest server that they did before. Much better! So consistent hashing lets us add and remove servers without completely disturbing the set of cached items that each server holds.
In short, round robin suits scenarios where the list of servers is stable and the load-balanced traffic is random. Consistent hashing suits scenarios where the backend servers need to scale out or in while most requests still map to the same server as before. Consistent hashing can also achieve a well-distributed, uniform load.
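The nine-to-ten-server claim in the quoted passage can be checked with a small Python sketch. It is an illustration under assumptions: the server and request names are made up, SHA-256 is used only as a convenient stable hash, and the ring assigns each key to the next point clockwise (a common variant of the "nearest" rule in the quote).

```python
import bisect
import hashlib
from typing import Callable, List


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def modulo_server(key: str, servers: List[str]) -> str:
    """Traditional modulo hashing: hash the key, take it modulo the server count."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    """Consistent-hash ring with several points ("virtual nodes") per server."""

    def __init__(self, servers: List[str], points_per_server: int = 100):
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    def server_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(before: Callable[[str], str], after: Callable[[str], str],
                   keys: List[str]) -> float:
    """Fraction of keys whose server changes between the two assignments."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


nine = [f"server-{i}" for i in range(9)]
ten = nine + ["server-9"]
keys = [f"request-{i}" for i in range(10_000)]

modulo_moved = moved_fraction(lambda k: modulo_server(k, nine),
                              lambda k: modulo_server(k, ten), keys)
ring9, ring10 = HashRing(nine), HashRing(ten)
ring_moved = moved_fraction(ring9.server_for, ring10.server_for, keys)
print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%} of keys moved")
```

If the quote is right, the modulo figure should come out near nine-tenths and the ring figure near one-tenth, with every moved key on the ring landing on the newly added server.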
Let's say we want to maintain user sessions on servers. In that case, we would want all requests from a user to go to the same server. Round robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve a 1:1 mapping between a user and a server, we need to use hashing-based load balancers. Consistent hashing works on this idea, and it also elegantly handles cases where we want to add or remove servers.
References: Check out Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8
For completeness, I want to point out one other important feature of Consistent Hashing that hasn't yet been mentioned: DOS mitigation.
If a load balancer is getting spammed with requests (either from too many customers, an attack, or a haywire local service), a round-robin approach will apply the request spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DOS'd by the barrage.
Being able to "limit the blast radius" in this manner is a critical feature of production systems.
Consistent hashing fits stateful systems well (where the context of a previous request is required by the current request). In a stateful system, if the previous and current requests land on different servers, the context for the current request is lost and the system won't be able to fulfil it. With consistent hashing we can route all requests from a particular user to the same server. Round robin cannot achieve this; it is a good fit for stateless systems.

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (mainly CPU) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (and not milliseconds, as in traditional "web-like" applications). This means we serve relatively few requests per minute, and each one of them takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see that this is a problem: it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in quality of service (even for other tenants, because of the shared-resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
1. Rate limiting: Define a maximum number of requests per minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
2. Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
3. Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this, and by the time the new machines are ready it's possibly too late.
4. Request "throttling": Use algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is gonna be making (estimated at least) we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
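The queue-and-release behaviour described above can be sketched for a single tenant as a pure function. This is only an illustration in Python: `shape_arrivals`, `rate_per_second` and `max_queued` are hypothetical names, the state lives in one process (no shared Redis state across gateway instances), and the rate is fixed rather than dynamic.

```python
from typing import List, Optional


def shape_arrivals(arrivals: List[float], rate_per_second: float,
                   max_queued: int) -> List[Optional[float]]:
    """Return the release time for each arrival, or None if the queue was full.

    Requests arriving no faster than the agreed rate pass without delay; a
    burst is queued and drained one request every 1 / rate_per_second seconds.
    """
    interval = 1.0 / rate_per_second
    next_free = float("-inf")  # earliest time the next request may be released
    pending: List[float] = []  # release times of requests still waiting in the queue
    releases: List[Optional[float]] = []
    for arrival in sorted(arrivals):
        # Requests whose release time has passed have already left the queue.
        pending = [t for t in pending if t > arrival]
        release = max(arrival, next_free)
        if release > arrival:
            if len(pending) >= max_queued:
                releases.append(None)  # queue limit reached: cannot be delayed further
                continue
            pending.append(release)
        next_free = release + interval
        releases.append(release)
    return releases


# A burst of four requests at t=0, then one more at t=5, with room to queue two.
releases = shape_arrivals([0.0, 0.0, 0.0, 0.0, 5.0], rate_per_second=1.0, max_queued=2)
# releases == [0.0, 1.0, 2.0, None, 5.0]
```

The first request of the burst passes immediately, two are held and released one second apart, the fourth exceeds the queue limit, and the late request is unaffected. A dynamic variant would adjust `rate_per_second` per tenant over time, which this sketch does not attempt.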
My questions are:
1. Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
2. Is there an open-source, commercial or SaaS service that can provide these traffic-shaping capabilities? As far as I know, neither Kong nor Tyk supports anything like this, so is there any other API gateway that does?
3. In case Kong does not support this, how hard is it to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries: when a client hits the rate-limit status code, it uses something like an exponential backoff technique to retry requests. This is much easier to scale and implement.
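A client-side retry loop of the kind described above might look roughly like this in Python. It is a sketch under assumptions: `call_with_backoff` and `send` are hypothetical names, `send` is any callable returning a `(status, body)` pair, 429 is taken as the rate-limit status code, and the randomised ("full jitter") delay is one common choice rather than something the answer prescribes.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Retry send() while it returns HTTP 429, doubling the delay cap each time.

    max_attempts must be at least 1; the last response is returned if every
    attempt was rate limited.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts: give the 429 back to the caller
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, cap))  # random delay up to the cap spreads clients out
    return status, body


# Usage with canned responses instead of a real HTTP client.
responses = iter([(429, "slow down"), (429, "slow down"), (200, "ok")])
recorded_delays = []
result = call_with_backoff(lambda: next(responses), sleep=recorded_delays.append)
```

Here the third attempt succeeds after two recorded delays, the first capped at 0.5 seconds and the second at 1 second.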
Can't say for Kong, but Tyk, in this case, provides two basic numbers you can control: a quota (the maximum number of requests a client can make in a given period of time) and rate limits (a safety protection). You can set a rate limit 1) per "policy", i.e. for a group of consumers (for example, if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, or 3) globally for the API (this works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, that should be possible. You will need to write and run a scheduler somewhere. From time to time it will re-calculate the limits based on current total usage (which Tyk calculates for you, and which you can get from Redis), then talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
Hope it makes sense :)

Implementing IDuplexSessionChannel for Message Interception & Replacement

I would like to intercept WCF messages on the client side. I cannot use any MessageInspector for this, because I would like to implement a client-side WCF cache. If the request has been cached before, the response should come from the cache; otherwise the request is forwarded to the service.
As I am using netTcpBinding and netNamedPipeBinding, the "simple" way, implementing IRequestChannel is not possible. I need to implement IDuplexSessionChannel. Now, I am looking for a working sample how to intercept and replace messages.
But why is this important?
In theory, WCF services, like all other calls that may go over a network, should have coarse-grained interfaces. The reason is obvious: WCF has tons of features to secure the connection, enable reliable messaging, ensure authentication, enable transactions (... continued ...). This will not come for free, obviously.
In practice we often encounter "exceptions" to those rules: services that are called a thousand times in one service method, and other violations of best practices. Of course, the best way to deal with this situation would be to redesign the services. Unfortunately, that rarely happens (you name the reasons ...).
That is where caching comes into play. There are, basically, two ways of doing this:
1. Implementing a solution that needs to rewrite parts of your applications. One way of doing this is to write a proxy-caller (e.g. using generics for that).
2. Implementing a "transparent" solution, that works with all WCF-services, without any modifications
For obvious reasons the second solution seems more promising. Again, there are two alternatives:
1. Writing a server-side caching solution, using WCF behaviors and IOperationInvoker. This is pretty straightforward to accomplish, and the web gives you some good samples of how to do it. Such a solution is acceptable if the methods of the service to be cached are pretty expensive, e.g. loading lots of information from a database, so that looking up the result in your cache is much faster than performing all the necessary calculations and IO operations. However, the WCF call is still there, with all the overhead that comes with it. The advantage is that you only need to define that behavior once in your service, and all clients of this service will benefit from the cache.
2. Writing a client-side caching solution that prevents the WCF call if the response is already in the cache. This, of course, avoids all the WCF overhead, but requires defining the (endpoint) behavior in every client that accesses the services to be cached (e.g. any master data service or any other "slowly changing dimension" service).
The second solution is much more complicated, as you need a channel factory (and/or a listener) and an implementation of the channel itself. The channel could be an IRequestChannel or an IDuplexSessionChannel. Again, you will find a working solution for the first type on the web, but that, naturally, will not work for netTcpBinding or netNamedPipeBinding, which use IDuplexSessionChannel. That is why I am looking for a sample that illustrates how to do it right.
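For comparison, the simpler "proxy-caller" idea mentioned earlier (a wrapper that answers repeated calls locally so that only cache misses reach the service) can be sketched in a few lines of Python. This is not a WCF channel implementation and does not address IDuplexSessionChannel; `CachingProxy` and `FakeMasterDataService` are hypothetical names, arguments are assumed to be hashable, and there is no invalidation.

```python
from typing import Any, Dict, Hashable, Tuple


class CachingProxy:
    """Wraps a service client and answers repeated calls from a local cache."""

    def __init__(self, target: Any):
        self._target = target
        self._cache: Dict[Tuple[Hashable, ...], Any] = {}

    def __getattr__(self, name: str):
        # Only reached for names not defined on the proxy itself,
        # i.e. the operations of the wrapped service.
        method = getattr(self._target, name)

        def cached_call(*args: Hashable) -> Any:
            key = (name, *args)
            if key not in self._cache:
                self._cache[key] = method(*args)  # only a cache miss reaches the service
            return self._cache[key]

        return cached_call


class FakeMasterDataService:
    """Stand-in for a remote service; counts how often it is really called."""

    def __init__(self) -> None:
        self.calls = 0

    def get_country_name(self, code: str) -> str:
        self.calls += 1
        return {"de": "Germany", "fr": "France"}[code]


service = FakeMasterDataService()
proxy = CachingProxy(service)
names = [proxy.get_country_name(code) for code in ["de", "de", "fr", "de"]]
remote_calls = service.calls  # two distinct requests, so two remote calls
```

Four lookups produce two remote calls here, which is the same effect as the drop in call count reported for client-side caching, on a toy scale.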
Just to give an impression of what the benefits would be: for one solution with a long-running service method (approx. 150000 calls to other services within that service), the execution times are:
netTcpBinding, no caching: 65 minutes
netNamedPipeBinding, no caching: 40 minutes
netNamedPipeBinding, server-side-caching: 27 minutes
netNamedPipeBinding, client-side-caching: 19 minutes
The number of calls drops from 150000 to about 40000 in that scenario. However, my solution for client-side caching will not work well for duplex channels and other special communication types. Therefore, I am looking for a sample.
Any help would be appreciated.

Concurrent access to WCF client proxy

I'm currently playing around a little with WCF, and while doing so I stumbled upon a question where I'm not sure whether I'm on the right track.
Let's assume a simple setup that looks like this: client -> service1 -> service2.
The communication is tcp-based.
What I'm not sure about is whether it makes sense for service1 to cache the client proxy for service2. If it does, I might get multi-threaded access to that proxy, and I would have to deal with it.
I'd like to take advantage of the TCP session to get better performance, but I'm not sure if this "architecture" is supported by WCF/network/whatever at all. The problem I see is that all the communication goes over the same channel if I'm not using locks or some other synchronization.
I guess the better idea is to cache the proxy in a threadstatic variable.
But before I do that, I wanted to confirm that it's really not a good idea to have only one proxy instance.
tia
Martin
If you don't know that you have a performance problem, then why worry about caching? You're opening yourself to the risk of improperly implementing multithreading code, and without any clear, measurable benefit.
Have you measured performance yet, or profiled the application to see where it's spending its time? If not, then when you do, you may well find that the overhead of multiple TCP sessions is not where your performance problems lie. You may wish you had the time to optimize some other part of your application, but you will have spent that time optimizing something that didn't need to be optimized.
I am already using such a structure. I have one service that collaborates with some other services to realise the implementation. Of course, in my case the client calls a one-way method of the first service. I am getting a very good benefit. Of course, I have also configured it to limit the number of concurrent calls in some cases.
Yes, that architecture is supported by WCF. I deal with applications every day that use similar structures, using NetTCPBinding.
The biggest thing to worry about is the ConcurrencyMode of the various services involved, and making sure that they do not block unnecessarily. It is very easy to get into a scenario where you will be guaranteed timeouts, or at the least have poor performance due to multiple, synchronous calls across service boundaries. Even OneWay calls are not guaranteed to immediately return.
Be careful with ThreadStatic: .NET can change the thread, so the variable can become null.
For sessions, perhaps you could use session-enabled calls:
http://msdn.microsoft.com/en-us/library/ms733040.aspx
But I would not recommend using this if you do not have any performance issue. I would use the normal way, or, if service 1 is just for forwarding, you could get that functionality easily with 4.0:
http://www.sdn.nl/SDN/Artikelen/tabid/58/view/View/ArticleID/2979/Whats-New-in-WCF-40.aspx
Regards
Firstly, make sure you know about the behaviour of ThreadStatic in ASP.NET applications:
http://piers7.blogspot.com/2005/11/threadstatic-callcontext-and_02.html
The thread that started your request may not be the thread that finishes it. Basically, the only safe way of keeping thread-local storage in ASP.NET applications is inside HttpContext. The next obvious approach would be to create a wrapper client to manage your WCF client proxy and ensure each IO request is thread safe using locks.
My personal preference, though, would be to use a pool of proxy clients. Whenever you need one, pop it off the pool queue, and when you're finished with it, put it back.
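The pool idea can be sketched as follows, in Python rather than .NET. `ProxyPool`, `borrow` and `make_proxy` are hypothetical names, plain objects stand in for WCF client proxies, and a real pool would also need to discard proxies that are no longer usable, which is left out here.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ProxyPool(Generic[T]):
    """Fixed-size pool: borrow a proxy, use it on one thread, then return it."""

    def __init__(self, create_proxy: Callable[[], T], size: int):
        self._available: "queue.Queue[T]" = queue.Queue()
        for _ in range(size):
            self._available.put(create_proxy())

    @contextmanager
    def borrow(self, timeout: float = 5.0) -> Iterator[T]:
        # Blocks while all proxies are in use; raises queue.Empty after timeout.
        proxy = self._available.get(timeout=timeout)
        try:
            yield proxy
        finally:
            self._available.put(proxy)  # always hand the proxy back, even on error


# Usage: plain objects stand in for real client proxies.
created = []


def make_proxy() -> object:
    created.append(object())
    return created[-1]


pool = ProxyPool(make_proxy, size=2)
with pool.borrow() as first:
    with pool.borrow() as second:
        distinct = first is not second  # two concurrent borrowers get different proxies
with pool.borrow() as again:
    reused = any(again is proxy for proxy in created)  # no new proxy was created
```

Because each borrowed proxy is used by one caller at a time, no locking around individual calls is needed; the queue is the only synchronization point.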