High CPU usage with NServiceBus when IIS hosted (ASP.NET and WCF)?

We noticed that CPU usage went up from 5% to 50% after adding NServiceBus to our ASP.NET MVC app. This was on a server that was not under any load. We noticed the same behavior on another server that hosted a WCF app. After trying out different things, we figured out that if we configured the bus as send-only, the CPU usage dropped to 5%. Does anybody know why the CPU usage was so high when the bus is not configured as send-only?

I've experienced this before.
What happened to me was I set up an application pool, and it started out running as Network Service. Before I had the chance to set the application pool identity to a domain-level user (for access to file shares, etc.), the pages had already been hit, so the NServiceBus installers had already created a queue under the Network Service credentials.
When I set the application pool user, all of a sudden it didn't have the proper permissions to the queue.
Normally NServiceBus checks for messages with a timeout if none are available to be received, but in this instance, it goes into a very tight loop of "Are there messages? I don't have permission. Are there messages? I don't have permission." and so you get the very high CPU.
I fixed the problem by deleting the queue and allowing NServiceBus to recreate it with the proper permissions.
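If you'd rather script that repair than click through it, here is a minimal sketch using System.Messaging; the queue path and account name below are placeholders, not values from the original setup:

```csharp
// Sketch of the fix described above: delete the queue that was created under
// the wrong identity and recreate it with the right permissions.
// Queue path and account name are placeholders.
using System.Messaging;

class QueueRepair
{
    static void Main()
    {
        const string path = @".\private$\mywebapp.input";

        if (MessageQueue.Exists(path))
            MessageQueue.Delete(path); // drop the queue owned by Network Service

        using (var queue = MessageQueue.Create(path, transactional: true))
        {
            // Grant the app pool's domain account full control so the receive
            // loop doesn't spin on access-denied errors.
            queue.SetPermissions(@"MYDOMAIN\AppPoolUser",
                                 MessageQueueAccessRights.FullControl);
        }
    }
}
```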

It's possible that the cause of the high CPU was the NServiceBus code that looks for a message in the queue, though I find that a bit hard to believe. Send-only mode prevents NServiceBus from looking for messages in the queue.
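For completeness, send-only mode is a one-line configuration switch. A minimal sketch, assuming NServiceBus 6+ with the MSMQ transport (the endpoint name is a placeholder):

```csharp
// Minimal sketch of a send-only endpoint, assuming NServiceBus 6+ with the
// MSMQ transport; the endpoint name is a placeholder.
using System.Threading.Tasks;
using NServiceBus;

class BusStartup
{
    public static async Task<IEndpointInstance> StartAsync()
    {
        var config = new EndpointConfiguration("MyWebApp");
        config.UseTransport<MsmqTransport>();
        config.SendOnly(); // no input queue is opened, so no receive loop runs

        return await Endpoint.Start(config);
    }
}
```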

Related

ASP.NET Core Application Process Isolation for IIS-hosted Kestrel Services

I'm migrating a service-based integration platform from .NET Framework to .NET Core. The original versions of the integration platform have proven very successful, and compared to replacing it with an 'off the shelf' integration solution, it has a far better ROI.
So after redeveloping the code, all tests have been working very well, and I have achieved higher levels of performance with a single IIS server than I could with two IIS servers running the original versions.
Except... if I go over ~3 messages/sec with multiple clients, I start seeing duplicate GUID key errors when trying to save instrumentation data to my DB. All these errors are generated from the on-ramp service. The on-ramp places the message on a queue. The messages are then consumed by an off-ramp service and sent to the destination (for this load test the destination is a file folder).
Even though the off-ramp is also running on the same server as the on-ramp, we do not see any duplication errors generated by the off-ramp. I suspect this is because the queue creates a linear process, so only one instance of the off-ramp is running at any time, versus the on-ramp, which has up to 4 clients firing concurrent messages at its API.
Initially I thought the issue was caused by a static global variable class I had implemented crossing process boundaries. But I would expect the issue to be seen on the off-ramp as well, as the service architecture of both is virtually identical.
Summary of thoughts on the issue:
If it were a pure coding issue, errors would also happen at low messaging rates.
The error would also be seen on the off-ramp if the GUID duplication were down to chance.
The on-ramp and off-ramp are both running on the same server, but duplication is only seen on the on-ramp, i.e. the on-ramp is not impacting the off-ramp and vice versa.
The duplication has to be due to memory shared between concurrently running on-ramp instances, triggered by the multiple-client scenario.
To try to resolve the issue I removed the static global variable class, but I'm still seeing the duplication errors.
This issue was never observed in the original IIS implementation (after millions of messages processed). I suspect the issue is with process isolation in the IIS-hosted Kestrel .NET Core service host. From what I have read there is good isolation between different apps (based on IIS path) but not within the same app, i.e. within the same IIS app pool. This could explain why .NET Core does not support multiple apps running in the same IIS app pool.
If anyone has a good idea how I can achieve process isolation between instances of the same app running in the same IIS app pool, I would appreciate your thoughts/suggestions.
After running more tests I was able to resolve the issue. The problem was with the scope of the instrumentation variable. At low rates there was never a problem, but at high throughput, the same instrumentation object was being accessed by a second instance of the process.
The issue was difficult to track down due to the short-lived nature of the integration services.
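For illustration only, the hazardous shape and the fix might look roughly like this; all names are hypothetical, and the scoped registration assumes ASP.NET Core dependency injection:

```csharp
// Hypothetical illustration of the bug: a static instrumentation object is
// shared by every concurrent request in the process, so two requests can end
// up writing the same key.
using System;

public static class Instrumentation
{
    public static Guid CurrentMessageId; // overwritten by concurrent requests
}

// Safer shape: one instrumentation instance per request, e.g. registered as
// scoped in ASP.NET Core DI, so keys are never shared.
public class InstrumentationScope
{
    public Guid MessageId { get; } = Guid.NewGuid();
}

// In Startup.ConfigureServices (assumed):
//   services.AddScoped<InstrumentationScope>();
```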
Thanks to anyone who reviewed the question.
Martin

Web service messages consumed by another service

I have a .NET 4.5 WCF web service that consumes messages from a local private MSMQ queue running on Windows Server 2008 R2 with AppFabric installed.
This service reads the messages off the queue and processes the files referenced in each message. I have used AppFabric to throttle the service to process 16 concurrent messages, 8 on each AppPool worker process.
The AppPool uses a domain account that has full privileges on the network share where the files to be processed are stored.
This service had been working fine for years, except that in the last week ~90% of the files it has been asked to process have failed with an UnauthorizedAccessException.
This behavior was exhibited across all of the services on that application server, no matter which file server the service was asked to process files from. Even files that had previously been processed successfully were now failing.
After a long and fruitless weekend of searching and hacking at various things, including:
Shared Folder Permissions and Quotas
Windows Licensing (CALs etc)
Firewalls
Various software patches to the Web app
I eventually discovered the actual issue by accident whilst redeploying the Web app, when I noticed something odd. When I stopped the web app via the WCF menu in IIS, the messages continued to be consumed, so I stopped the app pool running the web service, but the messages continued to be consumed. I thought this might be due to the large latency added to MSMQ message state by the distributed transaction service when lots of messages are rolled back to the poison message queue, so I went to lunch. When I came back the messages were still being consumed, and Process Explorer confirmed the app pool running my service was no longer executing.
Something was clearly up, but it was uncertain whether this was the cause, a symptom, or a coincidence. The clincher came when I throttled my service back to process only one message at a time, to see if access to the share was reaching some sort of limit: the failure rate went up to ~98%. This suggested that something else was processing the messages and failing, but also reporting those failures into my reporting system in a way only my application could.
A little further investigation revealed that the default application pool used to serve the default web site was also executing my WCF web service, but failing to access the files on the file server because the identity used to run the default application pool had no privileges. The failures took less time than the successful file processes, therefore the slower I made my service go, the more messages were failed by the default app pool.
The Cause
Whilst I was adjusting the throttling on my web app, I inadvertently set the throttling on the default web site that was the parent of the web application. I noticed this straight away and reset the values back to the defaults. What I hadn't realized at the time was that this had added a <system.serviceModel> section to the web.config of the default web site. The outcome was that my default web site started to behave like a web application and, for reasons I am yet to understand, started to execute the functionality of its child web application. It may be related to WAS activation; all I know is that it was most certainly not the desired behavior.
The Fix
I removed the <system.serviceModel> section and its contents from the web.config of the default web site, removed net.msmq from its list of enabled protocols, and everything is back to normal.
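For anyone chasing the same symptom, the stray section that the throttling UI wrote probably looked something like the sketch below (the values are illustrative, not the actual ones); deleting it from the default web site's web.config was the fix:

```xml
<!-- Hypothetical shape of the stray section added to the default web site's
     web.config when throttling was adjusted at the wrong level; removing it,
     plus net.msmq from the site's enabled protocols, restored normal behavior. -->
<configuration>
  <system.serviceModel>
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <serviceThrottling maxConcurrentCalls="16" maxConcurrentInstances="16" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>
```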

Why is an IIS/WAS service not cleaning up lock queues on Win 2k8 R2 with MSMQ 4.0

I have an IIS/WAS hosted service running under my own account in the development environment. It is activated using the net.msmq binding. It is configured to use BufferedReceive (it is a WF service), which involves locking the queue (subqueues prefixed with lock_ are created).
After some time and after the app pool is terminated, the lock queues are automatically removed.
After moving the system to production, with the environment AFAIK identical, but with the service running as a dedicated account, the lock queues are no longer cleaned up. If I stop the app pool, the lock queue remains, with messages.
The most obvious difference is the accounts under which the service runs, but I can't find what kind of permissions would affect this. Alternatively, is there some other configuration difference I am overlooking? Thanks for any help.
OK, this turns out to be a bug in IIS/WAS.
I had 3 app pools, 2 of which were auto-starting, one of them consuming the MSMQ queue. All shared the same AD domain identity, which was also used by the NetMsmqListenerActivator. That listener service was originally stopped, but after I had started it, the symptoms appeared.
Somehow IIS/WAS was creating multiple channels, sending messages to the wrong processes. I have worked around this by creating a separate identity, now used by both the MQ-consuming web service and the NetMsmqListenerActivator, followed by a server reboot.
This was a painful problem that only occurred on the production server. The lesson learned is to make sure that the production server shares the same hotfixes and patches as the test server.
UPDATE: Hotfix for Win 2k8 R2 SP1: http://support.microsoft.com/kb/2672277

Connect NServiceBus with an AIX Mainframe

I have a back end system that drops events to my system. It is critical that these events don't get lost (I work for a health care company and lost info can impact a patient's care).
I would like to make this system drop its data into NServiceBus so that it can be published to subscribers that need it. However, the server that is dropping these messages is an AIX machine, so it can't run .NET code.
This system can send the messages via a lot of standard protocols and communication styles (TCP, WSDL-based services, calling a database sproc, etc.).
One option I have considered is to set up a WCF service that the AIX mainframe will call. I can then have my WCF service make the call to NServiceBus.
But the number of events sent per minute by this back end system can at times be fairly high (about 500 messages per minute). I am worried that WCF is not up to this, while NServiceBus says it can handle 1000 messages per second. I am also worried about data loss in the event of downtime. NServiceBus claims it is not going to lose any data.
Am I wrong? Is WCF going to be just fine? Or am I making a weak link in the chain?
Is there a way I can use an established protocol to add items directly to an NServiceBus Queue?
Or should I just write my own .NET app that will allow NServiceBus to use a TCP connection?
Note: Because these messages are critical, the message must be acknowledged or the server will keep sending it.
I would take a look at the WCF integration that comes right out of the box. The WCF service is contained within the same host as NSB. The integration does nothing more than push the message onto the queue, so I don't think you'll have a throughput issue. Seeing that this is critical data, I would suggest clustering the service. The other option would be to install 2 or more instances of the service on different machines and load balance the HTTP calls across them. In essence you would have 1 logical publisher with 2 physical components doing the publishing.
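If the out-of-the-box integration doesn't fit, a hand-rolled on-ramp is only a few lines. A hedged sketch, assuming NServiceBus 3-5 style IBus injection; the service contract and message type are placeholders:

```csharp
// Sketch of a thin WCF "on-ramp": the operation does nothing but push the
// call onto the bus, so the mainframe's acknowledgment only comes back once
// the message is safely queued. Contract and message type are placeholders.
using System.ServiceModel;
using NServiceBus;

[ServiceContract]
public interface IEventIntake
{
    [OperationContract]
    void Submit(string payload);
}

public class EventIntakeService : IEventIntake
{
    private readonly IBus bus; // provided by the hosting container (assumed)

    public EventIntakeService(IBus bus) { this.bus = bus; }

    public void Submit(string payload)
    {
        // If Send throws, WCF faults the call and the mainframe retries,
        // so no event is silently lost.
        bus.Send(new BackendEventMessage { Payload = payload });
    }
}

public class BackendEventMessage : IMessage // durable NSB message (placeholder)
{
    public string Payload { get; set; }
}
```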

Limit WCF service resource usage

I'm developing a web application that needs to perform a task that consumes a lot of CPU and memory, and that may also last several minutes. In order to give a better user experience, I developed a Windows service that hosts a WCF service that performs this "high cost" task and communicates with the web app using MSMQ (message queues).
This worked great until I tried a load test... The Windows service starts consuming a lot of resources, putting the CPU at 100% and using more than 1 GB of memory. I've looked for optimizations and done a lot of tweaks to the code, and I think it is very efficient, but the task just requires a lot of resources.
The problem is that while the WCF service is working, the CPU gets used at 100% and the web app turns INCREDIBLY SLOW! I don't mind if the task that the WCF service does takes a couple of minutes more, but I want the web app to perform well for users.
So I'm wondering if there is a way to limit the resources that the WCF service can consume, giving priority to the web app.
Thanks in advance.
Juan
The easy solution would be to place the WCF service on a different machine.
The fact that the service is using a lot of CPU is probably not related to your use of WCF.
There are some ways that you may be able to improve the performance of your web app:
Process only one message at a time.
Break the jobs into smaller parts.
Set the priority of the Windows service to Below Normal in Task Manager (or in code, as sketched after this list).
Install more RAM on the server.
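The priority change can also be made in code at service startup so it survives restarts; a minimal sketch using Process.PriorityClass:

```csharp
// Sketch of the "below normal" suggestion done in code at service startup,
// so it persists across restarts instead of being reset each time.
using System.Diagnostics;

static class PriorityConfig
{
    public static void LowerOwnPriority()
    {
        // The scheduler will then favor the web app's worker process whenever
        // both it and this service are runnable.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
    }
}
```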
I guess this is a problem with your Windows service design. When you decide to host WCF in a Windows service, you have to control resource utilization, i.e. you have to control throttling. You have to create configurable control over the internal service processing so that you can change the load based on available resources. If you host WCF in IIS, it already provides such control at the AppPool level.
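As a concrete example of that kind of control, a self-hosted WCF service can cap its own concurrency with ServiceThrottlingBehavior. A sketch with illustrative limits and placeholder service/contract types:

```csharp
// Sketch of capping concurrency on a self-hosted WCF service with
// ServiceThrottlingBehavior; the limits are illustrative and the service
// type, contract, and endpoint address are placeholders.
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

[ServiceContract]
public interface IHeavyTask
{
    [OperationContract]
    void Run();
}

public class HeavyTaskService : IHeavyTask
{
    public void Run() { /* CPU-heavy work goes here */ }
}

class HostSetup
{
    static void Main()
    {
        var host = new ServiceHost(typeof(HeavyTaskService));
        host.AddServiceEndpoint(typeof(IHeavyTask),
                                new NetNamedPipeBinding(),
                                "net.pipe://localhost/heavytask");

        // Allow at most two "high cost" tasks at a time, leaving CPU headroom
        // for the web app on the same box.
        host.Description.Behaviors.Add(new ServiceThrottlingBehavior
        {
            MaxConcurrentCalls = 2,
            MaxConcurrentInstances = 2,
            MaxConcurrentSessions = 2
        });

        host.Open();
        Console.ReadLine();
        host.Close();
    }
}
```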
There are some freeware tools that allow limiting CPU usage for a given process, but that is not something I would recommend for production use.
Best regards, Ladislav