I'm migrating a service-based integration platform from .NET Framework to .NET Core. The original versions of the integration platform have proven very successful and, compared to replacing it with an 'off the shelf' integration solution, it has a far better ROI.
So after redeveloping the code, all tests have been passing, and I have achieved higher levels of performance with a single IIS server than I could with 2 IIS servers running the original versions.
Except... if I go over ~3 messages/sec with multiple clients, I start seeing duplicate GUID key errors when trying to save instrumentation data to my DB. All these errors are generated by the on-ramp service. The on-ramp places the message on a queue. The messages are then consumed by an off-ramp service and sent to the destination (for this load test the destination is a file folder).
Even though the off-ramp is also running on the same server as the on-ramp, we do not see any duplication errors generated by the off-ramp. I suspect this is because the queue creates a linear process, so only one instance of the off-ramp is running at any time, versus the on-ramp, which has up to 4 clients firing concurrent messages at its API.
Initially I thought the issue was caused by a static global variable class I had implemented crossing process boundaries. But I would expect the issue to be seen with the off-ramp as well, as the service architecture for both is virtually identical.
Summary of thoughts on issue:
If it were a pure coding issue, errors would happen at low messaging rates too.
The error would also be seen on the off-ramp if the GUID duplication were down to chance.
The on-ramp and off-ramp are both running on the same server, but duplication is only seen on the on-ramp, i.e. the on-ramp is not impacting the off-ramp and vice versa.
Duplication therefore has to be due to shared memory between concurrently running on-ramp instances, triggered by the multiple-client scenario.
To try to resolve the issue I removed the static global variable class, but I'm still seeing the duplication errors.
This issue was never observed in the original IIS implementation (after millions of messages processed). I suspect the issue is with process isolation in the IIS-hosted Kestrel .NET Core service host. From what I have read there is good isolation between different apps (based on the IIS path) but not within the same app, i.e. within the same IIS app pool. This could explain why .NET Core does not support multiple apps running in the same IIS app pool.
If anyone has a good idea how I can achieve process isolation between instances of the same app running in the same IIS app pool, I would appreciate your thoughts/suggestions.
After running more tests I was able to resolve the issue. The problem was with the scope of the instrumentation variable. At low rates there was never a problem, but at high throughput the same instrumentation object was being accessed by a second instance of the process.
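For anyone hitting something similar, here is a minimal sketch of the pattern in ASP.NET Core DI terms (the type names are made up; the point is the container lifetime):

    using System;
    using Microsoft.Extensions.DependencyInjection;

    // Hypothetical instrumentation type: holds the GUID used as the DB key.
    public class InstrumentationContext
    {
        public Guid MessageId { get; } = Guid.NewGuid();
    }

    public static class InstrumentationRegistration
    {
        public static void Register(IServiceCollection services)
        {
            // BUG: one shared instance for the whole app. Under concurrent
            // load, overlapping requests save the same MessageId, and the
            // DB rejects it as a duplicate key.
            // services.AddSingleton<InstrumentationContext>();

            // FIX: one instance per request, so every message gets its own GUID.
            services.AddScoped<InstrumentationContext>();
        }
    }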
The issue was difficult to track down due to the short-lived nature of the integration services.
Thanks to anyone who reviewed the question.
Martin
I have a web server implemented using .NET MVC4. There are clients connected to this web server which perform some operations and upload live logs to the server using the WebClient.UploadString method. The logs are sent from the client to the server in chunks of 2,500 characters at a time.
Things work fine while only 2-3 clients upload logs. However, when more than 3 clients try to upload logs simultaneously, they start receiving "HTTP 500 Internal Server Error".
I might have to scale up and add more slaves, but that will make the situation worse.
I want to implement Jenkins-like live logging, where logs from the slaves are updated live.
Please suggest a better and more scalable solution to this problem.
Have you considered looking into SignalR?
It can be used for anything from instant messaging to stocks! I have implemented both a chatbox and a custom system that sends off messages, does calculations, and then passes the results back down to the client. It is very reliable, there are some nice tutorials, and I think it's awesome.
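For the live-log case described here, a minimal SignalR 2.x hub (SignalR 2 works with MVC4-era projects) might look something like the following; the hub and method names are illustrative:

    using Microsoft.AspNet.SignalR;

    // Illustrative hub: slaves push log chunks and the hub fans them out
    // to every browser watching that slave's live log.
    public class LiveLogHub : Hub
    {
        // A viewer calls this once to subscribe to one slave's log stream.
        public void Watch(string slaveId)
        {
            Groups.Add(Context.ConnectionId, slaveId);
        }

        // A slave calls this instead of WebClient.UploadString. SignalR keeps
        // one persistent connection per client, so bursts of small chunks no
        // longer cost a full HTTP request each.
        public void Append(string slaveId, string logChunk)
        {
            Clients.Group(slaveId).appendLog(logChunk);
        }
    }

Viewers subscribe to a slave's group once and then receive pushed chunks, instead of the server fielding a separate HTTP POST for every 2,500-character block.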
I have a .NET 4.5 WCF web service that consumes messages from a local private MSMQ queue running on Windows Server 2008 R2 with AppFabric installed.
This service reads the messages off the queue and processes the files referenced in each message. I have used AppFabric to throttle the service to process 16 concurrent messages, 8 on each AppPool worker process.
The AppPool uses a domain account that has full privileges on the network share where the files to be processed are stored.
This service had been working fine for years, except that in the last week ~90% of the files it has been asked to process have failed with an UnauthorizedAccessException.
This behavior was exhibited across all of the services on that application server, no matter which file server the service was asked to process files from. Even files that had previously been processed successfully were now failing.
After a long and fruitless weekend of searching and hacking at various different things, including:
Shared Folder Permissions and Quotas
Windows Licensing (CALs etc.)
Firewalls
Various software patches to the Web app
I eventually discovered the actual issue by accident. Whilst redeploying the web app I noticed something odd: when I stopped the web app via the WCF menu in IIS, the messages continued to be consumed, so I stopped the app pool running the web service, but the messages still continued to be consumed. I thought this might be due to the large latency added to MSMQ message state by the Distributed Transaction Coordinator when lots of messages are rolled back to the poison message queue, so I went to lunch. When I came back the messages were still being consumed, and Process Explorer confirmed the app pool running my service was no longer executing.
Something was clearly up, but it was uncertain whether this was the cause, a symptom, or a coincidence. The clincher came when I throttled my service back to process only one message at a time, to see if access to the share was reaching some sort of limit: the failure rate went up to ~98%. This suggested that something else was processing the messages and failing, but also reporting those failures into my reporting system in a way only my application could.
A little further investigation revealed that the default application pool, used to serve the Default Web Site, was also executing my WCF web service, but it was failing to access the files on the file server because the identity used to run the default application pool had no privileges. The failures took less time than the successful file processes, therefore the slower I made my service go, the more messages were failed by the default app pool.
The Cause
Whilst I was adjusting the throttling on my web app, I inadvertently set the throttling on the default web site that was the parent of the web application. I noticed this straight away and reset the values back to their defaults. What I hadn't realized at the time was that this had added a <system.serviceModel> tag to the web.config of the default website. The outcome was that my default web site started to behave like a web application and, for reasons I have yet to understand, it started to execute the functionality of its child web application. It may be related to WAS activation; all I know is that it was most certainly not the desired behavior.
The Fix
I removed the <system.serviceModel> tag and its contents from the web.config of the default website, removed net.msmq from its list of enabled protocols, and everything is back to normal.
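For anyone chasing the same symptom, the block to look for is shaped roughly like this; the contents below are illustrative rather than the exact values from my config:

    <!-- Illustrative only: a <system.serviceModel> block like this, left in
         the parent Default Web Site's web.config after adjusting throttling
         at the site level, was enough to make the parent site start hosting
         the child application's queued WCF services. -->
    <system.serviceModel>
      <behaviors>
        <serviceBehaviors>
          <behavior>
            <serviceThrottling maxConcurrentCalls="16" maxConcurrentInstances="16" />
          </behavior>
        </serviceBehaviors>
      </behaviors>
    </system.serviceModel>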
I have an iPad client application that is installed on around 3,000-4,000 iPads. They are used in remote areas and talk to a web service to submit the data they collect. The data submission calls from the iPads may happen at the same time. I have one single server where all the data is stored in SQL Server. The web services are written in .NET and are hosted in IIS 7.
Currently the iPad application does not work as expected, as the web services are not able to handle that many requests simultaneously.
What is the best possible way to handle this scenario? Is the delay/scalability issue caused by DB access? Can in-memory caching on the web-service side solve the issue?
I am not in a position to invest in a separate server, so I would like to know the best solution for handling as many simultaneous requests as possible. The DB insertion can be done asynchronously; the most important task is to bring the data collected on the iPads to the server.
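For concreteness, here is the kind of asynchronous hand-off I have in mind, as a minimal sketch: the request handler only enqueues the payload and returns, and a single background worker drains the queue into SQL Server. (Names are illustrative, and an in-memory queue loses pending items if the worker process recycles.)

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public static class UploadBuffer
    {
        // Bounded in-memory queue: uploads are acknowledged as soon as they
        // are enqueued, and requests fail fast instead of stacking up worker
        // threads when the database cannot keep up.
        private static readonly BlockingCollection<string> Queue =
            new BlockingCollection<string>(boundedCapacity: 10000);

        static UploadBuffer()
        {
            // One long-running background writer drains the queue, keeping
            // the slow SQL Server inserts off the request path.
            Task.Factory.StartNew(Drain, TaskCreationOptions.LongRunning);
        }

        // Called from the web service method: cheap, so many simultaneous
        // iPad submissions can be accepted quickly.
        public static bool TryEnqueue(string payload)
        {
            return Queue.TryAdd(payload);
        }

        private static void Drain()
        {
            foreach (var payload in Queue.GetConsumingEnumerable())
            {
                SaveToDatabase(payload);
            }
        }

        // Hypothetical: wraps the existing ADO.NET/ORM insert, ideally batched.
        private static void SaveToDatabase(string payload)
        {
        }
    }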
I have a WCF service that does some document conversions and returns the document to the caller. When developing locally, the service is hosted on the ASP.NET Development Server; a console application invokes the operation and it executes within seconds.
When I host the service in IIS via a .svc file, two of the documents work correctly; the third one bombs out. It begins to construct the Word document using the Open XML SDK, but then just dies. I think this has something to do with IIS, but I cannot put my finger on it.
There are a total of three types of documents I generate. In a nutshell, this is how it works:
SQL 2005 DB/IBM DB2 -> WCF service written by another developer to expose the data. This service has only one endpoint, using basicHttpBinding.
My service invokes his service, gets the relevant data, uses the Open XML SDK to generate a Microsoft Word document, saves it on a server, and returns the path to the user.
The Word documents are no bigger than 100KB.
I am also using basicHttpBinding, although I have tried wsHttpBinding with the same results.
What is amazing is how fast it is locally, and even more that two of the documents generate just fine; it's the third document type that refuses to work.
To the error message:
An error occurred while receiving the HTTP Response to http://myservername.mydomain.inc/MyService/Service.Svc. This could be due to the service endpoint binding not using the HTTP Protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the server shutting down). See server logs for more details.
I have spent the last 2 days trying to figure out what is going on. I have tried everything, including changing maxReceivedMessageSize, maxBufferSize, maxBufferPoolSize, etc. to large values. I even included:
<httpRuntime maxRequestLength="2097151" executionTimeout="120"/>
To see if maybe IIS was choking because of that.
Programmatically the service does nothing special; it just constructs the Word documents from the data using the Open XML SDK, and, like I said, locally all 3 documents work when invoked via a console app running against the ASP.NET dev server, i.e. http://localhost:3332/myService.svc.
When I host it in IIS and try to get a Windows Forms application to invoke it, I get the error.
I know you will ask for logs, so yes, I have logging enabled on my host.
And there is no error in the logs; I am logging everything.
Basically I invoke two service operations written by another developer.
MyOperation calls HisOperation1 and then HisOperation2; both of those calls give me complex types. I am going to look at his code tomorrow, because he is using LINQ to SQL and there may be some funny business going on there. He is using a variety of collections etc., but the fact that I can run the exact same document, let's call it "Document 3", within seconds when the service is hosted locally on the ASP.NET WebDev Server is what is most odd. Why would it run on scaled-down Cassini and blow up on IIS?
From the log it seems that, after calling HisOperation1 and HisOperation2, the service just goes into la-la land and dies. There is an application pool (w3wp.exe) error in the Windows Event Log:
Faulting application w3wp.exe, version 6.0.3790.1830, stamp 42435be1, faulting module kernel32.dll, version 5.2.3790.3311, stamp 49c5225e, debug? 0, fault address 0x00015dfa.
It's classified as a .NET 2.0 Runtime error.
Any help is appreciated; the lack of sleep is getting to me.
Help me Obi-Wan Kenobi, you're my only hope.
I had this message appearing:
An error occurred while receiving the HTTP Response to http://myservername.mydomain.inc/MyService/Service.Svc. This could be due to the service endpoint binding not using the HTTP Protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the server shutting down). See server logs for more details.
And the problem was that the object I was trying to transfer was not [Serializable]. The object I was trying to transfer was a DataTable.
I believe the Word documents you were trying to transfer are also non-serializable, so that might be the problem.
Yes, we'd want logs, or at least some idea of what you're logging. I assume you have both message and transport logging on at the WCF level.
One thing to look at is permissions. When you run under Cassini, the web server runs as the currently logged-in user. This hides any SQL or CAS permission problems (as, let's be honest, your account is usually a local administrator). As soon as you publish to IIS, you are running under the application pool user, which is, by default, a lot more limited.
Try turning on IIS debug dumps and following the steps in KB919789.
FYI, I changed IIS 6 to work in IIS 5.0 isolation mode and everything works. Odd.
I had the same error when using an IEnumerable<T> DataMember in my WCF service. It turned out that in some cases I was returning an IQueryable<T> as an IEnumerable<T>, so all I had to do was add .ToList<T>() to my LINQ statements.
I changed the IEnumerable<T> to IList<T> to prevent making the same mistake again.
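For illustration, the before/after looks roughly like this (type and member names are made up):

    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.Serialization;

    [DataContract]
    public class ReportData
    {
        // BUG: assigning a deferred IQueryable<T> here means the query only
        // executes while WCF serializes the response, after the data context
        // may already be disposed, and the channel faults with the generic
        // "error receiving the HTTP response" message.
        // [DataMember]
        // public IEnumerable<int> Values { get; set; }

        // FIX: a concrete list, so the query must be materialized up front.
        [DataMember]
        public IList<int> Values { get; set; }
    }

    public static class ReportBuilder
    {
        public static ReportData Build(IQueryable<int> source)
        {
            // .ToList() runs the query inside the operation, before serialization.
            return new ReportData { Values = source.ToList() };
        }
    }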