WCF Throttling - Rationale behind the default values

The default values are
Concurrent Calls: 16 * processor count
Concurrent Sessions: 100 * processor count
Concurrent Instances: Concurrent Calls + Concurrent Sessions
This is fine, but I am trying to understand the rationale behind the default value for Concurrent Instances. Why is it the sum of the other two? Can somebody demystify this, please?
Note: Yes, we can override the values as we like.

From Wenlong Dong's old Blog: WCF 4: Higher Default Throttling Settings for WCF Services
The main purpose for the throttling settings can be classified into
the following two aspects:
Controlled resource usage: With the throttling of concurrent execution, the usage of resources such as memory or threads can be
limited to a reasonable level so that the system works well without
hitting reliability issues.
Balanced performance load: Systems always work in a balanced way when the load is controlled. If there is too much concurrent execution happening, a lot of contention and bookkeeping happens, and that hurts the performance of the system.
There's more detail in the blog...

A service can have its SessionMode set to Allowed, NotAllowed, or Required, so the instancing behaviour of the service can depend on the incoming connections. Look at the "Allowed" column of the table at the bottom of the "Sessions, Instancing, and Concurrency" documentation: with SessionMode.Allowed, a service can end up with an instance per session and an instance per call at the same time, depending on the connecting channels.
So the instance limit should be the sum of the session and call limits.
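As a quick illustration of how the three defaults relate, here is a tiny C# sketch that simply evaluates the documented WCF 4 formulas for the local machine (the comments are the point; the program only prints the numbers):

using System;

class DefaultThrottleValues
{
    static void Main()
    {
        int processorCount = Environment.ProcessorCount;

        // Per-call (non-sessionful) activations that may run at once.
        int maxConcurrentCalls = 16 * processorCount;

        // Sessionful activations (one instance pinned per session).
        int maxConcurrentSessions = 100 * processorCount;

        // Enough instances to cover both activation patterns at the same time,
        // which is why the default is the sum of the other two.
        int maxConcurrentInstances = maxConcurrentCalls + maxConcurrentSessions;

        Console.WriteLine($"{maxConcurrentCalls} calls / {maxConcurrentSessions} sessions / {maxConcurrentInstances} instances");
    }
}

On a 4-core machine that works out to 64 calls, 400 sessions and 464 instances.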

Related

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (mainly CPU) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (not ms like traditional "web-like" applications). This means we serve relatively few requests per minute, where each one takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared-resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
Rate limiting: Define a maximum requests/minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this and by the time these new machines are ready it's possibly too late.
Request "throttling": Using algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) are passed along to the services without delay. Since we know in advance how many requests per minute each tenant is going to be making (estimated at least), we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
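For concreteness, here is a minimal single-process C# sketch of the token-bucket idea (the class name, capacity and rate are purely illustrative; a real gateway-level version would keep this state in a shared store such as Redis and queue or delay rejected requests rather than drop them):

using System;

class TokenBucket
{
    readonly double capacity;        // maximum burst size
    readonly double refillPerSecond; // agreed steady request rate
    double tokens;
    DateTime lastRefill = DateTime.UtcNow;

    public TokenBucket(double capacity, double refillPerSecond)
    {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        tokens = capacity;
    }

    // Returns true if the request may be forwarded now, false if it is over the rate.
    public bool TryConsume()
    {
        var now = DateTime.UtcNow;
        tokens = Math.Min(capacity, tokens + (now - lastRefill).TotalSeconds * refillPerSecond);
        lastRefill = now;

        if (tokens < 1)
            return false; // over the agreed rate: hold in a queue and release later
        tokens -= 1;
        return true;
    }
}

class Demo
{
    static void Main()
    {
        var bucket = new TokenBucket(capacity: 10, refillPerSecond: 1.0); // roughly 60 requests per minute
        Console.WriteLine(bucket.TryConsume() ? "forward to service" : "hold in queue");
    }
}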
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic-shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so... is there any other API gateway that does?
In case Kong does not support this, how hard would it be to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases and, in addition, it is hard to make it "safe", because "slow" clients quickly consume machine resources.
Such a pattern is usually offloaded to client libraries, so when a client hits the rate-limit status code, it uses something like an exponential backoff technique to retry requests. That is way easier to scale and implement.
I can't say for Kong, but Tyk, in this case, provides two basic numbers you can control: quota - the maximum number of requests a client can make in a given period of time, and rate limits - a safety protection. You can set a rate limit 1) per "policy", e.g. for a group of consumers (for example if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, 3) globally for the API (works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, it should be possible. You will need to write and run this scheduler somewhere; from time to time it will perform the re-calculation based on current total usage (which Tyk calculates for you, and which you can get from Redis) and will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
Hope it makes sense :)
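To illustrate the client-library approach mentioned above, here is a rough C# sketch of retrying on a rate-limit status code with exponential backoff and jitter (the URL and retry limits are made up):

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class BackoffClient
{
    static readonly HttpClient http = new HttpClient();

    static async Task<HttpResponseMessage> GetWithBackoff(string url, int maxAttempts = 5)
    {
        var random = new Random();
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            var response = await http.GetAsync(url);
            if (response.StatusCode != (HttpStatusCode)429) // not rate limited: done
                return response;

            // Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise.
            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                      + TimeSpan.FromMilliseconds(random.Next(0, 1000));
            await Task.Delay(delay);
        }
        throw new Exception("Still rate limited after " + maxAttempts + " attempts");
    }

    static async Task Main()
    {
        var response = await GetWithBackoff("https://api.example.com/v1/jobs");
        Console.WriteLine(response.StatusCode);
    }
}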

Max IEndpointInstances per process

Is there an upper limit to the number of unique IEndpointInstances that can be hosted within a single process?
I'm considering a design that will see up to 100 unique IEndpointInstances, all listening on separate queues, active simultaneously.
Will this cause a problem for NServiceBus? Could the process deadlock or spin up so many threads as to be unresponsive and useless?
The question NServiceBus - How to get separate queue for each message type receiver subscribes to? seems to suggest that you cannot have multiple endpoints in a process, but this is an older post. I have built a small sample against NServiceBus 6 beta 4 that does work.
There is a similar question, NServiceBus Single Process, but Multiple Input queues, which concluded that, based on the OP's context, using Satellite Features was the recommended approach. However, in my case I have 100 (functionally different) sagas (1 per queue), where each saga could need to receive similar messages, but I need to make sure that only the correct saga receives the message. Therefore, I don't think implementing a custom feature will meet my requirements. Or will Satellite Features support Sagas?
One of the options is to self-host multiple endpoints. Using this approach, you host the endpoints yourself in the same process. There are a few things to take into consideration, such as:
Assembly scanning (might require custom scanning logic per endpoint).
Throughput (for heavy throughput endpoints I'd recommend a separate hosting process).
To update/redeploy a single endpoint, you'll be taking all of the other 99 endpoints down as well.
While there's no hard limit on how many endpoints can be co-hosted, 100 sounds like a lot. That said, it also depends on how heavy the load on those endpoints is. Whether you process 1 msg/sec or 1K msg/sec determines a lot about whether this is a viable option or not.
Have a look at the sample that does exactly that.
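A rough sketch of what that co-hosting can look like, assuming NServiceBus 6 with the MSMQ transport and in-memory persistence (the endpoint names and the count are illustrative):

using System.Collections.Generic;
using System.Threading.Tasks;
using NServiceBus;

class MultiHost
{
    static async Task Main()
    {
        var endpoints = new List<IEndpointInstance>();

        // Each endpoint gets its own name and therefore its own input queue.
        for (var i = 0; i < 100; i++)
        {
            var configuration = new EndpointConfiguration($"Sales.Saga{i}");
            configuration.UseTransport<MsmqTransport>();
            configuration.UsePersistence<InMemoryPersistence>();

            // Limit assembly scanning here if handlers/sagas must not leak
            // between endpoints (the first caveat in the list above).
            endpoints.Add(await Endpoint.Start(configuration));
        }

        // ... run until shutdown is requested ...

        foreach (var endpoint in endpoints)
        {
            await endpoint.Stop();
        }
    }
}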

WCF serviceBehaviors vs binding settings

In WCF, what's the difference between the binding setting maxConnections and the serviceBehaviors serviceThrottling settings (maxConcurrentCalls, maxConcurrentInstances, maxConcurrentSessions)?
I'm trying to get my WCF service set up and I'm not exactly sure how those work with each other to limit connections.
Two things are important to consider:
the serviceThrottling behavior is a service-/server-side setting that determines how many concurrent calls, instances and sessions are supported by the server. This is independent of any binding or service endpoint - it's a service-wide setting. It allows you to tweak how many concurrent requests (and/or sessions) a specific service can handle - that depends on things like server "power", RAM, CPU and a lot more factors. Those values are kept fairly low by default, to keep servers from being "overloaded" and thus rendered unresponsive by large floods of requests (erroneous or malicious)
the maxConnections setting on the binding is specific to the netTcpBinding (and its "cousins", like the netNamedPipe and various Azure-oriented net***Relay bindings) and has to do with connection pooling. Much like ADO.NET database connections are pooled, TCP/IP connections to the server can be pooled and reused to reduce the overhead of having to destroy and re-create them. This is mostly a client-side setting (although it also has effects on the server side), and again: it's specific to the netTcpBinding (and its cousins, all based on TCP/IP) and doesn't exist for any of the other bindings.
See More details on MaxConnections for great in-depth insights into the ins and outs of this setting.
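For reference, here is a hedged sketch that sets both knobs in code on the same host; the contract, address and numbers are illustrative, not recommendations:

using System;
using System.ServiceModel;
using System.ServiceModel.Description;

[ServiceContract]
interface ICalculator
{
    [OperationContract]
    int Add(int a, int b);
}

class CalculatorService : ICalculator
{
    public int Add(int a, int b) => a + b;
}

class Hosting
{
    static void Main()
    {
        var host = new ServiceHost(typeof(CalculatorService),
                                   new Uri("net.tcp://localhost:8081/calc"));

        // Service-wide throttle (behavior): independent of any particular binding.
        var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }
        throttle.MaxConcurrentCalls = 16 * Environment.ProcessorCount;
        throttle.MaxConcurrentSessions = 100 * Environment.ProcessorCount;
        throttle.MaxConcurrentInstances = throttle.MaxConcurrentCalls + throttle.MaxConcurrentSessions;

        // Binding-level setting: MaxConnections is about TCP connection pooling
        // and only exists on the net.tcp family of bindings.
        var binding = new NetTcpBinding { MaxConnections = 20 };
        host.AddServiceEndpoint(typeof(ICalculator), binding, string.Empty);

        host.Open();
        Console.WriteLine("Listening; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}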

Ability to deal with 10000 concurrent requests - how to do it?

My server needs to be able to handle 10000 requests at the same time.
I don't know how to define this in WCF - and I can't find any example or article that helps me understand how to do it.
Do I need to create a different thread for each incoming request?
You can do this by using the serviceThrottling element (inside a behavior under serviceBehaviors) in your app.config file:
<serviceThrottling maxConcurrentSessions="10000" maxConcurrentCalls="1000"/>
You don't need to create any thread, WCF will handle all this for you, using a thread pool.
That being said, if all of those 10000 requests come at the exact same time and should all be handled concurrently (say, for example, that they all increment an interlocked int and none of those calls will return before it reaches 10000), the .NET CLR might have trouble creating the 10000 threads needed...
I think you should carefully review what your concurrency requirements really are. Increasing maxConcurrentCalls to 10000 is not a silver bullet; it's not going to magically solve all the other performance bottlenecks and/or limitations you might have somewhere else in your software (and in your hardware!).
Being able to handle x concurrent requests is not going to help if all of those threads are using a single database connection, for example!

How well will WCF scale to a large number of client users?

Does anyone have any experience with how well web services built with Microsoft's WCF scale to a large number of users?
The level I'm thinking of is in the region of 1000+ client users connecting to a collection of WCF services providing the business logic for our application, with these services talking to a database - similar to a traditional 3-tier architecture.
Are there any particular gotchas that have slowed down performance, or any design lessons learnt that have enabled this level of scalability?
To ensure your WCF application can scale to the desired level, I think you need to reframe the requirement in terms of the stats your services have to meet.
You mention servicing "1000+ client users" but to gauge if your services can perform at that level you'll also need to have some estimated usage figures, which will help you calculate some simpler stats such as the number of requests per second your app needs to handle.
Having just finished working on a WCF project we managed to get 400 requests per second on our test hardware, which combined with our expected usage pattern of each user making 300 requests a day indicated we could handle an average of 100,000 users a day (assuming a flat usage graph across the day).
In addition, since it's fairly common to make the WCF service code stateless, it's pretty easy to scale out the actual WCF code by adding additional boxes, which means the overall performance of your system is much more likely to be limited by your business logic and persistence layer than it is by WCF.
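A back-of-the-envelope version of that arithmetic, using the figures quoted above:

using System;

class CapacityEstimate
{
    static void Main()
    {
        const long requestsPerSecond = 400;      // measured on the test hardware
        const long requestsPerUserPerDay = 300;  // expected usage per user
        const long secondsPerDay = 24 * 60 * 60; // 86,400

        long requestsPerDay = requestsPerSecond * secondsPerDay;   // 34,560,000
        long usersPerDay = requestsPerDay / requestsPerUserPerDay; // ~115,000, quoted conservatively as ~100,000

        Console.WriteLine($"{requestsPerDay:N0} requests/day -> roughly {usersPerDay:N0} users/day (flat usage assumed)");
    }
}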
See also: WCF configuration default limits, concurrency and scalability.
Probably the 4 biggest things you can start looking at first (besides just having good service code) are items related to:
Bindings - some bindings, and the protocols they run on, are just faster than others; TCP is going to be faster than any of the HTTP bindings
Instance Mode - this determines how instances of your service classes are allocated for callers and their sessions
One & Two Way Operations - if a response isn't needed back at the client, then use one-way operations (see the sketch below)
Throttling - Max Sessions / Concurrent Calls and Instances
They did design WCF to be secure by default, so the defaults are very limiting.
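As a small illustration of the one-way point in the list above (the contract is made up):

using System.ServiceModel;

[ServiceContract]
interface IAuditLog
{
    // Fire-and-forget: the client does not wait for (or receive) a reply message.
    [OperationContract(IsOneWay = true)]
    void Record(string entry);

    // Regular request/reply operation, for comparison.
    [OperationContract]
    int PendingEntries();
}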