How well will WCF scale to a large number of client users? - wcf

Does anyone have any experience with how well web services built with Microsoft's WCF scale to a large number of users?
The level I'm thinking of is in the region of 1000+ client users connecting to a collection of WCF services providing the business logic for our application, and these talking to a database - similar to a traditional 3-tier architecture.
Are there any particular gotchas that have slowed down performance, or any design lessons learnt that have enabled this level of scalability?

To ensure your WCF application can scale to the desired level I think you might need to tweak your thinking about the stats your services have to meet.
You mention servicing "1000+ client users" but to gauge if your services can perform at that level you'll also need to have some estimated usage figures, which will help you calculate some simpler stats such as the number of requests per second your app needs to handle.
Having just finished working on a WCF project we managed to get 400 requests per second on our test hardware, which combined with our expected usage pattern of each user making 300 requests a day indicated we could handle an average of 100,000 users a day (assuming a flat usage graph across the day).
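The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope calculation using the figures quoted above (400 requests per second, 300 requests per user per day) and the same flat-usage assumption; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def supported_users_per_day(requests_per_second: float,
                            requests_per_user_per_day: float) -> float:
    """Users per day the measured throughput can carry.

    Assumes load is spread evenly across the whole day (a flat usage graph),
    which real traffic rarely is, so treat the result as an upper bound.
    """
    requests_per_day = requests_per_second * SECONDS_PER_DAY
    return requests_per_day / requests_per_user_per_day


if __name__ == "__main__":
    # 400 req/s * 86,400 s = 34,560,000 requests/day; / 300 = 115,200 users
    print(supported_users_per_day(400, 300))
```

With a realistic peak (say most traffic arriving in a few busy hours) the supportable user count drops accordingly, so it is worth running the same calculation against your expected peak request rate rather than the daily average.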
In addition, since it's fairly common to make the WCF service code stateless, it's pretty easy to scale out the actual WCF code by adding additional boxes, which means the overall performance of your system is much more likely to be limited by your business logic and persistence layer than it is by WCF.

WCF configuration default limits, concurrency and scalability

Probably the 4 biggest things you can start looking at first (besides just having good service code) are items related to:
Bindings - some bindings, and the protocols they run on, are just faster than others; TCP is going to be faster than any of the HTTP bindings
Instance Mode - this determines how instances of your service class are allocated to callers and sessions
One & Two Way Operations - if a response isn't needed back to the client, then use a one-way operation
Throttling - Max Sessions / Concurrent Calls and Instances
WCF was designed to be secure by default, so the default limits are very restrictive.

Related

Is it wrong for a service to be producer and consumer of Rabbit MQ?

I want to create a "Notifications Microservice" that will handle different type of notifications (Google Chat, Email, etc).
For this task, we will create a microservice that contains the logic on how to process these messages, and we'll be using Rabbit MQ to manage the queue.
Now, the question that I have, is if it is possible (or if it is a bad practice) to expose two endpoints in the microservice like this:
registerNotification('channel', $data)
processNotification(Rabbit Message)
So I only have to implement the communication with RabbitMQ in one service, and other services will just register messages using this same service instead of directly talking to RabbitMQ.
This way for each channel I could validate in the service that I have everything that I need before enqueuing a message.
Is this a good approach?
I'd suggest splitting your question into two separate ones. As usual, it depends: there are pros and cons to either one. Below are my points, without claiming completeness. How you weigh them really depends on your specific needs in the end.
1) Is it a good practice to use a Notification / Event Gateway in front of a message queue (here RabbitMQ)?
Pros:
enforce strong guarantees on message structure / correctness
provide advanced authentication / authorization mechanisms if required
provide convenience if languages in your stack lack first-class client support
abstract away / encapsulate technology choices & deployment considerations from services (publishers)
eliminate routing logic for messages from individual services (though, using available routing topologies in RabbitMQ, it's hard to see any added value here)
Cons:
availability becomes a critical concern for your gateway, e.g. assuming you can guarantee an uptime of four nines per service, you are already down to three nines for the composed system by adding this dependency
added operational complexity
added latency
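To make the availability point concrete, here is a minimal sketch of how availability composes when one component depends on another. It assumes failures are independent and that the publisher is only "up" when the gateway is also up; the function names are illustrative.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain where every component must be up.

    Assumes independent failures, so the probabilities simply multiply.
    """
    return math.prod(availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single = 0.9999                      # four nines for one service
    composed = serial_availability(single, single)
    print(f"composed availability: {composed:.6%}")
    print(f"downtime alone:    {downtime_minutes_per_year(single):.1f} min/year")
    print(f"downtime composed: {downtime_minutes_per_year(composed):.1f} min/year")
```

Two four-nines components in series come out at roughly 99.98%, i.e. about twice the expected downtime of either one alone.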
An alternative consideration here might be to use a library to achieve some of the pros above. Though, this approach also comes with its own cons.
2) Is it a good practice to run both message publishers and consumers in one service?
Pros:
quick (shortcut?)
initially fewer deployed instances (until you have to scale up)
Cons:
operational requirements for producers and consumers (workers!) are typically very different
harder (and more expensive) to scale the system adequately and in a fine-grained way
(performance) metrics become difficult to interpret
consumers might impact producer latency negatively as everything is competing for the same resources
loss of flexibility on the consumer side (quick, low risk deployments)
harder to guarantee availability of producers
I hope this helps to better evaluate your architecture based on your own needs / priorities.

API Traffic Shaping/Throttling Strategies For Tenant Isolation

I'll start my question by providing some context about what we're doing and the problems we're facing.
We are currently building a SaaS (hosted on Amazon AWS) that consists of several microservices that sit behind an API gateway (we're using Kong).
The gateway handles authentication (through consumers with API keys) and exposes the APIs of these microservices that I mentioned, all of which are stateless (there are no sessions, cookies or similar).
Each service is deployed using ECS services (one or more docker containers per service running on one or more EC2 machines) and load balanced using the Amazon Application Load Balancer (ALB).
All tenants (clients) share the same environment, that is, the very same machines and resources. Given our business model, we expect to have few but "big" tenants (at first).
Most of the requests to these services translate into heavy resource usage (CPU mainly) for the duration of the request. The time needed to serve one request is in the range of 2-10 seconds (and not ms like traditional "web-like" applications). This means we serve relatively few requests per minute, each of which takes a while to process (background or batch processing is not an option).
Right now, we don't have a strategy to limit or throttle the number of requests that a tenant can make in a given period of time. Taking into account the last two considerations from above, it's easy to see this is a problem, since it's almost trivial for a tenant to make more requests than we can handle, causing a degradation in the quality of service (even for other tenants, because of the shared resources approach).
We're thinking of strategies to limit/throttle or in general prepare the system to "isolate" tenants, so one tenant can not degrade the performance for others by making more requests than we can handle:
1) Rate limiting: Define a maximum number of requests per minute that a tenant can make. If more requests arrive, drop them. Kong even has a plugin for it. Sadly, we use a "pay-per-request" pricing model and the business does not allow us to use this strategy, because we want to serve as many requests as possible in order to get paid for them. If excess requests take more time for a tenant, that's fine.
2) Tenant isolation: Create an isolated environment for each tenant. This one has been discarded too, as it makes maintenance harder and leads to lower resource usage and higher costs.
3) Auto-scaling: Bring up more machines to absorb bursts. In our experience, Amazon ECS is not very fast at doing this, and by the time these new machines are ready it's possibly too late.
4) Request "throttling": Use algorithms like Leaky Bucket or Token Bucket at the API gateway level to ensure that requests hit the services at a rate we know we can handle.
Right now, we're inclined to take option 4. We want to implement the request throttling (traffic shaping) in such a way that all requests made within a previously agreed rate with the tenant (enforced by contract) would be passed along to the services without delay. Since we know in advance how many requests per minute each tenant is gonna be making (estimated at least) we can size our infrastructure accordingly (plus a safety margin).
If a burst arrives, the excess requests would be queued (up to a limit) and then released at a fixed rate (using the leaky bucket or similar algorithm). This would ensure that a tenant can not impact the performance of other tenants, since requests will hit the services at a predefined rate. Ideally, the allowed request rate would be "dynamic" in such a way that a tenant can use some of the "requests per minute" of other tenants that are not using them (within safety limits). I believe this is called the "Dynamic Rate Leaky Bucket" algorithm. The goal is to maximize resource usage.
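The behaviour described above (pass requests within the agreed rate immediately, queue the excess up to a limit, release it at a fixed rate) can be sketched for a single tenant roughly as below. This is a minimal in-memory token bucket with a bounded queue, not a Kong or Tyk plugin: the class and parameter names are hypothetical, time is passed in explicitly to keep the logic testable, and the shared state across gateway instances (Redis or similar) and the "dynamic" borrowing of unused capacity are deliberately left out.

```python
from collections import deque


class TenantShaper:
    """Token bucket with a bounded overflow queue for one tenant (sketch)."""

    def __init__(self, rate_per_sec: float, burst: int, max_queue: int):
        self.rate_per_sec = rate_per_sec  # agreed sustained rate
        self.burst = burst                # bucket capacity (tokens)
        self.max_queue = max_queue        # how many excess requests we hold
        self.tokens = float(burst)
        self.last = 0.0                   # timestamp of the last refill
        self.queue = deque()

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last = now

    def submit(self, request, now: float) -> str:
        """Return 'forward', 'queued' or 'rejected' for an incoming request."""
        self._refill(now)
        # Only bypass the queue when nothing is waiting, to preserve ordering.
        if not self.queue and self.tokens >= 1:
            self.tokens -= 1
            return "forward"
        if len(self.queue) < self.max_queue:
            self.queue.append(request)
            return "queued"
        return "rejected"

    def release(self, now: float) -> list:
        """Drain queued requests that the refilled bucket can now pay for."""
        self._refill(now)
        released = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            released.append(self.queue.popleft())
        return released


if __name__ == "__main__":
    shaper = TenantShaper(rate_per_sec=1.0, burst=2, max_queue=2)
    for name in "abcde":
        print(name, shaper.submit(name, now=0.0))
    print(shaper.release(now=1.0))
```

In a real gateway, `release` would be driven by a timer, and held requests keep connections open while they wait, which is where most of the operational cost of this approach comes from.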
My questions are:
Is the proposed strategy a viable one? Do you know of any other viable strategies for this use case?
Is there an open-source, commercial or SaaS service that can provide these traffic shaping capabilities? As far as I know, Kong and Tyk do not support anything like this, so is there any other API gateway that does?
In case Kong does not support this, how hard is it to implement something like what I've described as a plugin? We have to take into account that it would need some shared state (using Redis, for example), as we're using multiple Kong instances (for load balancing and high availability).
Thank you very much,
Mikel.
Managing a request queue on the gateway side is indeed a tricky thing, and probably the main reason it is not implemented in these gateways is that it is really hard to do right. You need to handle all the distributed-system cases, and in addition it is hard to make it "safe", because "slow" clients quickly consume machine resources.
This pattern is usually offloaded to client libraries: when the client hits a rate-limit status code, it uses something like an exponential backoff technique to retry requests. That is way easier to scale and implement.
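A client-side retry loop along those lines might look like the sketch below. It assumes the gateway signals rate limiting with HTTP status 429 (Too Many Requests); `send` is a hypothetical callable standing in for whatever HTTP client you use, and the delays and attempt count are arbitrary illustration values.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry with exponential backoff while it returns 429.

    send must return a (status_code, body) tuple. The wait before retry n is
    a random fraction ("full jitter") of min(max_delay, base_delay * 2**n),
    so that many clients do not retry in lockstep.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())
    # Still rate limited after the last attempt; let the caller decide.
    return status, body


if __name__ == "__main__":
    replies = iter([(429, None), (429, None), (200, "ok")])
    print(call_with_backoff(lambda: next(replies), sleep=lambda s: None))
```

The `sleep` and `rng` parameters exist only so the function can be exercised without real waiting or randomness.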
I can't speak for Kong, but Tyk, in this case, provides two basic numbers you can control: quota, the maximum number of requests a client can make in a given period of time, and rate limits, which act as safety protection. You can set a rate limit 1) per "policy", i.e. for a group of consumers (for example, if you have multiple tiers of your service with different allowed usage/rate limits), 2) per individual key, and 3) globally for an API (this works together with key rate limits). So, for example, you can set some moderate client rate limits and cap the total with the global API setting.
If you want a fully dynamic scheme that re-calculates limits based on cluster load, it should be possible. You will need to write and run a scheduler somewhere; from time to time it will perform the re-calculation based on current total usage (which Tyk calculates for you, and which you can get from Redis) and will talk to the Tyk API, iterating through all keys (or policies) and dynamically updating their rate limits.
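The re-calculation step itself could be as simple as the sketch below. It is purely illustrative: it makes no calls to Tyk or Redis, the function name and the `share_fraction` safety factor are assumptions, and the policy (hand a fraction of unused capacity to tenants that are at their limit, in proportion to their contracted rate) is just one possible choice.

```python
def recalculate_limits(contracted: dict, observed: dict,
                       share_fraction: float = 0.5) -> dict:
    """Return new per-tenant rate limits (same unit as the inputs).

    contracted: agreed rate per tenant, e.g. requests per minute.
    observed:   measured usage per tenant over the last interval.
    share_fraction: portion of unused capacity that may be lent out; the
                    rest is kept as a safety margin.
    """
    limits = dict(contracted)
    spare = sum(max(0.0, contracted[t] - observed.get(t, 0.0))
                for t in contracted)
    # Tenants currently using their full allowance are candidates to borrow.
    busy = [t for t in contracted if observed.get(t, 0.0) >= contracted[t]]
    busy_total = sum(contracted[t] for t in busy)
    if busy_total <= 0:
        return limits
    pool = spare * share_fraction
    for t in busy:
        limits[t] = contracted[t] + pool * contracted[t] / busy_total
    return limits


if __name__ == "__main__":
    contracted = {"a": 100, "b": 50, "c": 50}
    observed = {"a": 100, "b": 10, "c": 50}
    print(recalculate_limits(contracted, observed))
```

The scheduler described above would run something like this periodically and then push the resulting numbers to the gateway through whatever management API it exposes.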
Hope it makes sense :)

Max IEndpointInstances per process

Is there an upper limit to the number of unique IEndpointInstances that can be hosted within a single process?
I'm considering a design that will see up to 100 unique IEndpointInstances, all listening on separate queues, be active simultaneously.
Will this cause a problem for NServiceBus? Could the process deadlock or spin up so many threads as to be unresponsive and useless?
The question "NServiceBus - How to get separate queue for each message type receiver subscribes to?" seems to suggest that you cannot have multiple endpoints in a process, but this is an older post. I have built a small sample against NServiceBus 6--beta4 that does work.
There is a similar question, "NServiceBus Single Process, but Multiple Input queues", which concluded that, based on the OP's context, using Satellite Features was the recommended approach. However, in my case, I have 100 (functionally different) sagas (1 per queue), where each saga could need to receive similar messages, but I need to make sure that only the correct saga receives the message. Therefore, I don't think implementing a custom feature will meet my requirements. Or will Satellite Features support Sagas?
One of the options is to use self multi-hosting. Using this approach, you host the endpoints yourself in the same process. There are a few things to take into consideration, such as:
Assembly scanning (might require custom scanning logic per endpoint).
Throughput (for heavy throughput endpoints I'd recommend a separate hosting process).
To update/redeploy a single endpoint, you'll be taking all of the other 99 endpoints down as well.
While there's no hard limit on how many endpoints can be co-hosted, 100 sounds like a lot. That said, it also depends on how heavy the load on those endpoints is. Whether you process 1 msg/sec or 1K msg/sec largely determines whether this is a viable option or not.
Have a look at the sample that does exactly that.

Having more WCF methods in a service can decrease performance?

What is the best practice for designing WCF services with regard to having more or fewer operations under a single service?
Taking into consideration that a service must be generic and business oriented, I have encountered some SOAP services at work that have too many XML elements per operation in their contracts and too many operations in a single service.
From my point of view, without testing, I think the number of operations within a service will not have any impact on performance in the middleware, since a response is built specifically for each operation and contains only the XML elements concerning that operation.
Or are there any issues for having too many operations within a SOAP service ?
There is an issue, and that is when trying to do a metadata exchange or a proxy creation against a service with many methods (probably in the thousands). Since it will be trying to do the entire thing at once, it could time out, or even hit an OutOfMemory exception.
I don't think it will impact performance much, but the important thing is that methods must be logically grouped into different services. A service with a large number of methods usually means they are not logically factored.

Limiting number of calls to a WCF Web Service

We are planning to develop a web service using WCF. Is there a good way to limit the number of calls to the service in a period of time for a single IP? We don't want to put a hard limit (X number of times an hour), but we want to be able to prevent a spike from a single user.
Rather than trying to come up with our own custom solution and reinventing the wheel, is there an existing implementation or strategy that can be used? Will the different hosting environments in WCF make any difference?
Note: This question is related to this question. Comments were made about better extensibility in WCF, so I'm keeping these two questions separate.