This problem has taken us more than 2 weeks, and I have not managed to find a solution anywhere on the web.
Problem statement:
We have created a WCF service which has 3 layers:
1) WCF service layer
2) Facade Layer
3) Agent Layer
Layer 1 uses third-party DLLs for their out-of-the-box contracts and calls the facade layer (layer 2) to delegate the request.
Layer 2 simply forwards to the agent layer (layer 3) after business validations or any up-front processing.
Layer 3 uses the same third-party DLLs (as used in layer 1) to call their APIs and perform the business operations.
After deployment everything works perfectly well. The service handles every request that comes in, until someone recycles the application pool hosting this service, either deliberately or at a scheduled time on production.
After recycling, the service starts behaving strangely. When I debug the code, requests are still processed and still reach the agent layer (layer 3) for business operations as expected, but the third-party DLLs then start throwing an out-of-the-box business validation error. The same error keeps being thrown until I clean the temporary ASP.NET folder or re-publish the service code on IIS. This business validation error does not stop, fault or break our service, but it aborts further processing in layer 3 (agent layer) beyond the point where it originates.
I have tried everything I could find on the web or that other developers suggested, but nothing worked. Some (not all) of what I tried is listed below:
a. Cleared the cache (both client and server side of the third-party application) and re-ran the tests.
b. Created a different app pool for the WCF service and re-executed the tests.
c. Tried other development environments and the local development environment; the issue reproduces everywhere.
d. Changed the publish settings for the service and the app pool settings in IIS.
e. Cleared all the settings from the AssemblyInfo file in every project in the solution.
f. Played with different settings at the application pool level, such as the two overlap flags.
g. Gave all kinds of permissions on the temporary ASP.NET folder and the IIS publish folder.
It seems some references are getting broken at runtime.
In other services in our solution we have never used contracts from third-party DLLs directly; we always created a wrapper, and those services work fine. Until now, third-party DLLs were only ever used in the agent layer (layer 3).
In this service I used them in layer 1. Technically I didn't find anything wrong with using them in layer 1 (service code). Maybe I am wrong, or my design is; I'm not sure. But this has already taken more than 2 weeks.
Technology stack:
IIS 7
Visual Studio 2017
MS .NET Framework 4.5.2
Please suggest.
Related
I'm migrating a service-based integration platform from .NET Framework to .NET Core. The original versions of the integration platform have proven very successful, and compared to replacing it with an 'off the shelf' integration solution, it has a far better ROI.
After redeveloping the code, all tests have been working very well, and I have achieved higher levels of performance with a single IIS server than I could with 2 IIS servers running the original versions.
Except... If I go over ~3 messages/sec with multiple clients, I start seeing duplicate GUID key errors when trying to save instrumentation data to my DB. All these errors are generated by the on-ramp service. The on-ramp places the message on a queue. The messages are then consumed by an off-ramp service and sent to the destination (for this load test the destination is a file folder).
Even though the off-ramp is running on the same server as the on-ramp, we do not see any duplication errors generated by the off-ramp. I suspect this is because the queue creates a linear process, so only one instance of the off-ramp is running at any time, whereas the on-ramp has up to 4 clients firing concurrent messages at its API.
Initially I thought the issue was caused by a static global variable class I had implemented crossing process boundaries. But I would expect the issue to be seen with the off-ramp as well, as the service architecture of both is virtually identical.
Summary of thoughts on issue:
If it were a pure coding issue, the errors would also happen at low messaging rates.
The error would also be seen on the off-ramp if the GUID duplication were down to chance.
The on-ramp and off-ramp are both running on the same server, but duplication is only seen on the on-ramp, i.e. the on-ramp is not impacting the off-ramp and vice versa.
The duplication has to be due to memory shared between concurrently running on-ramp instances, triggered by the multiple-client scenario.
To try and resolve the issue I removed the static global variable class but I'm still seeing the duplication errors.
This issue was never observed in the original IIS implementation (after millions of messages processed). I suspect the issue is with process isolation in the IIS-hosted Kestrel .NET Core service host. From what I have read there is good isolation between different apps (based on IIS path) but not within the same app, so basically not within the same IIS app pool. This could explain why .NET Core does not support multiple apps running in the same IIS app pool.
If anyone has a good idea how I can achieve process isolation between instances of the same app running in the same IIS app pool, I would appreciate your thoughts/suggestions.
After running more tests I was able to resolve the issue. The problem was the scope of the instrumentation variable. At low rates there was never a problem, but at high throughput the same instrumentation object was being accessed by a second instance of the process.
The issue was difficult to track down due to the short-lived nature of the integration services.
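The scoping problem described above can be illustrated with a minimal Python sketch. The names `Instrumentation`, `handle_shared` and `handle_scoped` are hypothetical and not taken from the original platform; the sketch only assumes that each message should be saved under its own GUID key. It is deliberately simplified: here the reuse happens on every call, whereas in the original service it only surfaced when requests overlapped.

```python
import uuid


class Instrumentation:
    """Hypothetical instrumentation record keyed by a GUID."""

    def __init__(self):
        self.key = str(uuid.uuid4())
        self.message = None


# Too-wide scope: one object created once and reused by every request.
_shared = Instrumentation()


def handle_shared(message):
    # Every request writes to the same object, so every saved row
    # would carry the same GUID key.
    _shared.message = message
    return _shared.key


def handle_scoped(message):
    # Per-request scope: each call builds its own record and key.
    record = Instrumentation()
    record.message = message
    return record.key


shared_keys = [handle_shared(m) for m in ("a", "b", "c")]
scoped_keys = [handle_scoped(m) for m in ("a", "b", "c")]
```

With the shared object all three keys are identical, which is what a duplicate-key error on insert would look like; with per-request scope each key is distinct.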
Thanks to anyone who reviewed the question.
Martin
I'm using Simple Injector in my WCF service. While running it from VS2010 everything is fine. However, when I publish it to my server using IIS 7, after some time (20 minutes, I timed it) my WCF service loses all registered assemblies, modules and classes in the container.
I guess IIS recycles the WCF Service Application Pool and drops my container registrations.
Can anyone help me on this?
While there are many legitimate cases for self-hosting WCF services, moving to self-hosting just because of IIS recycling may be counterproductive.
Hosting in IIS gives you a lot of benefits during development and daily operations, and I am not going to repeat benefits that you can easily find with a Google search.
When IIS receives the first request to your application, it launches a worker process named "w3wp.exe" according to the settings of the application pool associated with your web app. By default IIS shuts that worker process down after 20 minutes of idle time. Check the Advanced Settings of the application pool and you will find a lot of settings for the life cycle. You won't get such flexibility and robustness from self-hosting out of the box.
So you have a few options, provided you decide to stay with IIS hosting.
Change the Idle Time-out to 24 hours or even a month.
Write a small program or use cURL to ping your application every 10 minutes.
Leave it as it is
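The second option could look roughly like the Python sketch below. It is only an illustration under assumptions: `ping` and `keep_alive` are hypothetical helper names, the URL in the usage comment is a placeholder, and the 600-second interval simply mirrors the "every 10 minutes" suggestion.

```python
import time
import urllib.error
import urllib.request


def ping(url, timeout=10.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections and timeouts.
        return False


def keep_alive(url, interval_seconds=600, rounds=None,
               sleep=time.sleep, check=ping):
    """Ping url every interval_seconds; rounds=None loops forever.

    Returns the number of successful pings once rounds is reached.
    """
    successes = 0
    done = 0
    while rounds is None or done < rounds:
        if check(url):
            successes += 1
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return successes


# Example (placeholder URL, runs until interrupted):
# keep_alive("http://localhost/MyService/Service.svc")
```

The `sleep` and `check` parameters exist only so the loop can be exercised without waiting or touching the network; a scheduled task that calls `ping` once per run would work just as well.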
If you want to keep state across recycles, save it to disk, then load it on the next launch triggered by a request.
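As a rough sketch of that idea in Python (the helper names `save_state` and `load_state` and the JSON file format are assumptions, not something the answer prescribes), state can be written atomically and read back on the first request after a restart:

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state via a temp file so a recycle mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # os.replace swaps the file in one step on the same volume.
    os.replace(tmp_path, path)


def load_state(path, default=None):
    """Return the saved state, or a default when no state file exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {} if default is None else default
```

A service would call `load_state` when it starts up and `save_state` whenever the state changes, so an idle shutdown loses nothing.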
I'm working on my fourth or fifth implementation of a WCF service over MSMQ with IIS/WAS activation, and I have never been able to make it work properly. It's always the same story: my services are activated only if the IIS web site is interacted with in some other way (like serving the service metadata page at /somewhere/myService.svc). If the only thing happening is messages being sent into the queue, my services stop processing messages, and they restart as soon as I visit the .svc page...
It's such a common pattern for me that I also arrived at a common solution: scheduling a job (every few minutes) that runs a PowerShell script that accesses that page. Quite simple, but not very elegant. And, furthermore, unnecessary in theory.
This has happened across different IIS versions (7.0 and 7.5), across various Win 2008 service packs and releases, and with servers in AD domains or workgroups. I think I've read every bit on the web about this, especially MSDN and Microsoft employees' blogs, so binding configuration, MSMQ permissions, and all the other small details you can discover here and there are set up.
So the question: has anybody been successful with WAS over MSMQ?
We have some WCF services implemented in an IIS application, communicating over net.tcp on the default port (808), using the Microsoft Net.Tcp Port Sharing Service, throwing an error on production servers. When I instantiate a connection to the first of the services, I get back an exception:
The server at <URL> rejected the session-establishment request.
All the other services respond fine.
But it runs fine on our test servers.
I initially thought there was something wrong with the particular service that was failing, but I tried rearranging the list of services into a different order, and it SEEMS to always be the first service that I hit that fails. (I say SEEMS because I think once, in the early iterations of testing, I saw it happen on the second service that it hit. But I haven't been able to reproduce that.)
I've looked at application startup delays, and that doesn't seem to be the problem, because I can come back and run the test again as soon as it finishes - a delay of only a minute or two - and get the same error. Also, in the lower level environments, there is a start up delay of probably 30 seconds to a minute, but the result still comes back as expected.
I've tried accessing the services over http from INetManager, and I get intermittent failures on all the services: a particular service will return a yellow screen of death on one invocation, then come up with the expected link to the WSDL on the next one seconds later.
I'm completely at a loss to explain this behavior, or how to resolve it. I've googled the error message and not found anything helpful. It may be a configuration issue: the production servers are newly provisioned VMs, and we may not have the config exactly right (whereas all the lower level environments have been running this and other similar apps for some time), but I have no idea what to look for. I've looked at the properties of the app pool that the app is running on and compared them to the lower level environments without finding any differences.
If somebody can point me in the right direction, you would have my undying gratitude.
Things I can find:
http://go4answers.webhost4life.com/Example/connect-busy-wcf-service-host-while-725.aspx:
"MaxConcurrentSessions (default = 10) [Per-channel] The maximum number of sessions that a service can accept at one time. Only comes into play with session-based bindings (wsHttp or netTcp)"
http://blogs.infosupport.com/unable-to-generate-a-wcf-proxy-using-svcutil-but-retreiving-the-wsdl-works/
So in the end the trick is to add the additional right on the c:\windows\temp folder for your App Pool Identity [for the service to be able to generate metadata] to solve the problem.
Also, are timeouts or other limits configured and being hit? Give tracing a look and access the service using WcfTestClient and see if you can find underlying errors.
I have two WCF services in a single solution (let's call it the service solution). I've deployed those services on a remote machine. I have another solution in which I consume those services by creating a reference to the services on the remote machine (let's call it the client solution). The code for both services is up to date on the remote machine (AFAIK), because I have deployed the most up to date code to it.
The issue is that in the service solution I can debug one of the services but not the other. The one that fails gives this error when I put a breakpoint on it.
The breakpoint will not currently be hit. The source code is different from the original version.
The question is, what could be different between the two WCF services to allow one to be debuggable and the other not?