Recovering from a CommunicationObjectFaultedException in WCF

I have a client app that tries every 10 seconds to send a message over a WCF web service. This client app will be on a computer on board a ship, which we know will have spotty internet connectivity. I would like for the app to try to send data via the service, and if it can't, to queue up the messages until it can send them through the service.
In order to test this setup, I start the client app and the web service (both on my local machine), and everything works fine. I try to simulate the bad internet connection by killing the web service and restarting it. As soon as I kill the service, I start getting CommunicationObjectFaultedExceptions--which is expected. But after I restart the service, I continue to get those exceptions.
I'm pretty sure that there's something I'm not understanding about the web service paradigm, but I don't know what that is. Can anyone offer advice on whether or not this setup is feasible, and if so, how to resolve this issue (i.e. re-establish the communications channel with the web service)?

Client service proxies cannot be reused once they have faulted. You must dispose of the old one and create a new one.
You must also make sure you close the client service proxy properly. A WCF service proxy can throw an exception on close, and if this happens the connection is not closed, so you must abort. Use the "try{Close}/catch{Abort}" pattern. Also bear in mind that the Dispose method calls Close (and hence can throw an exception from Dispose), so you can't simply wrap the proxy in a using block as you would with normal disposable classes.
For example:
if (yourServiceProxy != null) {
    try {
        if (yourServiceProxy.State != CommunicationState.Faulted)
            yourServiceProxy.Close();
        else
            yourServiceProxy.Abort();
    } catch (CommunicationException) {
        // Communication exceptions are normal when
        // closing the connection.
        yourServiceProxy.Abort();
    } catch (TimeoutException) {
        // Timeout exceptions are normal when closing
        // the connection.
        yourServiceProxy.Abort();
    } catch (Exception) {
        // Any other exception and you should
        // abort the connection and rethrow to
        // allow the exception to bubble upwards.
        yourServiceProxy.Abort();
        throw;
    } finally {
        // This is just to stop you from trying to
        // close it again (with the null check at the start).
        // This may not be necessary depending on
        // your architecture.
        yourServiceProxy = null;
    }
}
There was a blog article about this, but it now appears to be offline. An archived version is available on the Wayback Machine.
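For the "queue messages until the service is reachable" part of the question, here is a minimal sketch of one way to combine queuing with "never reuse a faulted proxy". It is only an illustration: QueuedSender and its createSender delegate are hypothetical names, not part of WCF, and the delegate stands in for "create a new client proxy and return its send operation".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of WCF): keeps unsent messages in a queue and
// asks for a brand-new sender on every flush, so a faulted proxy is never reused.
public sealed class QueuedSender
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly Func<Action<string>> createSender;

    // createSender stands in for "construct a new client proxy and return its send call".
    public QueuedSender(Func<Action<string>> createSender)
    {
        this.createSender = createSender ?? throw new ArgumentNullException(nameof(createSender));
    }

    public int PendingCount => pending.Count;

    public void Enqueue(string message) => pending.Enqueue(message);

    // Tries to send everything that is queued and returns how many messages went out.
    // On the first failure it stops and leaves the remaining messages queued.
    public int Flush()
    {
        int sent = 0;
        if (pending.Count == 0) return sent;
        try
        {
            Action<string> send = createSender(); // fresh sender for this attempt
            while (pending.Count > 0)
            {
                send(pending.Peek());  // may throw while the service is unreachable
                pending.Dequeue();     // remove only after a successful send
                sent++;
            }
        }
        catch (Exception)
        {
            // With a real proxy you would catch CommunicationException and
            // TimeoutException here and Abort() the proxy before discarding it.
        }
        return sent;
    }
}
```

With the 10-second timer from the question, each tick would call Enqueue for the new message and then Flush; anything that could not be sent simply stays queued for the next tick.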


How to restart Kestrel Hosted in Windows Service?

I want to restart Kestrel (ASP.NET Core 3.1) via an authorized HTTP request.
Kestrel is contained in a Windows service configured to restart itself automatically upon failure.
As far as I know, the simplest way is to return a non-zero exit code; Windows will do the rest.
In light of that, the code I wrote is actually simple:
public MaintenanceController(IHostApplicationLifetime hostLifetime)
{
    this.HostLifetime = hostLifetime ?? throw new ArgumentNullException(nameof(hostLifetime));
}

public ActionResult RestartService()
{
    Program.IsRestart = true; //see below
    //this does not work!
    if (HostLifetime is WindowsServiceLifetime lifetime)
    {
        lifetime.ExitCode = 1;
    }
    //neither this!
    Environment.ExitCode = 1;
    HostLifetime.StopApplication();
    return Ok();
}
The only way to make Windows restart the service is to actually call Environment.Exit(1)
without HostLifetime.StopApplication();
But the problem with calling Environment.Exit alone is that it causes a non-graceful shutdown, which I absolutely want to avoid.
Another approach I tried, to force an exit code different from 0, which did not work, was to put in Startup.cs:
public void Configure(IApplicationBuilder app, IWebHostEnvironment env, IHostApplicationLifetime applicationLifetime)
{
    applicationLifetime.ApplicationStopped.Register(() =>
    {
        //this can be set in the controller
        if (Program.IsRestart) Environment.Exit(1);
    });
}
But when called after ApplicationStopped, Environment.Exit(1) apparently does nothing. In fact, the Event Viewer shows no trace of the service shutting down with an error, so Windows does nothing.
Going back to the classic approach, I changed the Main entry point to return an int and returned 1.
Still, Windows does not restart the service, nor is a failure entry written to the Event Viewer. It looks like the service is always stopped gracefully.
Everything is working like it should. You may want to consider writing a wrapper batch file or watcher process.
Your watcher would wait for the process to exit, and if a flag file (e.g. .staydown) is present, it would exit gracefully. If the file doesn't exist, it would then restart the process.
In your RestartService method, use Environment.Exit(0) which would be a graceful shutdown, but since the .staydown file doesn't exist, your watcher would then restart the server. Your watcher would only stop running if the .staydown file exists and then the server is stopped, or the Windows Service itself is stopped.
When your app starts up, be sure to delete .staydown if present.
If you have node installed, you might be able to use a utility like forever in place of the watcher batch file.
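The watcher idea above could be sketched in C# roughly as follows. This is only an illustration under assumptions: the Watcher class, its method names, and the exePath and directory parameters are made up for the example; only the .staydown flag file comes from the answer.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical watcher: restarts a child process until a ".staydown" flag file appears.
public static class Watcher
{
    public const string FlagFileName = ".staydown";

    // True when the watcher should start the server again after it exits.
    public static bool ShouldRestart(string directory) =>
        !File.Exists(Path.Combine(directory, FlagFileName));

    // Call this when the server starts up so that a stale flag does not keep it down.
    public static void ClearFlag(string directory) =>
        File.Delete(Path.Combine(directory, FlagFileName)); // no error if the file is absent

    // Runs exePath repeatedly; returns once the server exits while the flag file exists.
    public static void Run(string exePath, string directory)
    {
        do
        {
            using (Process server = Process.Start(exePath))
            {
                server?.WaitForExit();
            }
        } while (ShouldRestart(directory));
    }
}
```

The restart decision is kept in its own small method so it can be checked without launching a process.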
You need to check 'Enable actions for stops with errors' on the service, or run 'sc failureflag "My Service" 1'.
(I think this is very strange behaviour on Windows' part; honouring exit codes should be the default!)

Websocket ping timeout freezes the Mattermost "bot"

I'm creating a Mattermost bot. It stops responding after the websocket connection receives a ping timeout (PingTimeoutChannel) after random periods of time (1 minute, 8 minutes, 2 hours etc.). Mattermost server is v.5.13, API v.4.
The bot connects to the Mattermost API by creating a new Client4. Next, it logs in as the user and then creates a WebSocket client with the authorization token it received. It starts listening on all channels, and when it receives an event that is a message directed to it (@botname), it responds automatically (creates
I chose simple username/password authentication for logging in, just as in the Mattermost sample bot. I then tried to rewrite it to use personal access token authentication (as in here), because I thought it would solve the timeout problem. However, that approach no longer works: it gives the "Invalid or expired session error, please login again" when trying to log in that way.
So I dropped this idea and started searching where the timeout happens. The server pings are ok, the websocket's are not. I tried many ways, to the point where I just reconnect (by creating new Mattermost API and Websocket clients again). The bot still does not respond. I've run out of ideas.
Websocket connection (skipped error handling):
if config.BotCfg.Port == "443" {
    protocol = "https"
    secure = true
}
config.ConnectionCfg.Client = model.NewAPIv4Client(fmt.Sprintf("%s://%s:%s", protocol, config.BotCfg.Server, config.BotCfg.Port))
user, resp := config.ConnectionCfg.Client.Login(config.BotCfg.BotName, config.BotCfg.Password)
if limit.Users == nil {
    // ...
}
ws := "ws"
if secure {
    ws = "wss"
}
if Websocket != nil {
    // ...
}
websocket, err := model.NewWebSocketClient4(fmt.Sprintf("%s://%s:%s", ws, config.BotCfg.Server, config.BotCfg.Port), config.ConnectionCfg.Client.AuthToken)
Listening function:
go func() {
    for {
        select {
        case <-connection.Websocket.PingTimeoutChannel:
            logs.WriteToFile("Websocket ping timeout. Connecting again.")
            log.Println("Websocket ping timeout. Connecting again.")
        case event := <-connection.Websocket.EventChannel:
            if event != nil {
                if event.IsValid() && isMessage(event.Event) {
                    // ...
                }
            }
        }
    }
}()
// block to the go function
select {}
I expect the bot to run continuously.
If you have any suggestions how to fix this issue, I'd really appreciate that!
Edit: As Cerise suggested, I added the SIGQUIT to the exit function and ran a race detector. Fixed the data race issue by deleting one if from the case event := [...]. Race detector doesn't report any issues anymore, however the bot still stops responding after some time.
I found out that the first time a PingTimeout occurs, the peer stops responding until I restart the app. Reconnecting the websocket doesn't help. However, I don't know how to solve this problem, or whether a solution even exists.

Topshelf Windows Service times out Error 7000 7009

I have a Windows service written in VB.NET, using Topshelf as the service host.
Once in a while the service doesn't start. In the event log, the SCM writes errors 7000 and 7009 (service did not respond in a timely fashion). I know this is a common issue, but I think I have tried everything, with no result.
The service only relies on WMI, and has no time-consuming operations.
I read this question (Error 1053: the service did not respond to the start or control request in a timely fashion), but none of the answers worked for me.
I tried:
Set Topshelf's start timeout.
Request additional time in the first line of the "OnStart" method.
Set a periodic timer which requests additional time from the SCM.
Remove Topshelf and build the service with the Visual Studio Service Template.
Move the initialization code and "OnStart" code to a new thread to return immediately.
Build in RELEASE mode.
Set GeneratePublisherEvidence = false in the app.config file (per application).
Unchecked "Check for publisher’s certificate revocation" in the internet settings (per machine).
Deleted all Alternate Streams (in case some dll was marked as web and blocked).
Removed any "Debug code"
Increased Windows' general service timeout to 120000ms.
The service doesn't try to communicate with the user's desktop in any way.
The UAC is disabled.
I believe that the code of the service itself is not the problem because:
It has been on production for over two years.
Usually the service starts fine.
There is no exception logged in the Event Log.
The "On Error" options for the service don't get called (since the service doesn't actually fail, it just doesn't respond to the SCM)
I've commented out almost everything in it while pursuing this error! ;-)
Any help is welcome, since I'm completely out of ideas and I've been struggling with this for over 15 days...
For me the 7009 error was produced by my .NET Core app because I was using this construct:
var builder = new ConfigurationBuilder()
and the appsettings.json file obviously couldn't be found in C:\WINDOWS\system32. Changing it to Path.Combine(AppContext.BaseDirectory, "appsettings.json") solved the issue.
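A minimal sketch of that fix, assuming only what the answer states (the file sits next to the binaries while the service's working directory is somewhere else). ConfigLocation and NextToExecutable are hypothetical names used for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds a path relative to the application's own folder.
public static class ConfigLocation
{
    // Resolves a file name against the folder that contains the application binaries,
    // not against the process's current working directory.
    public static string NextToExecutable(string fileName) =>
        Path.Combine(AppContext.BaseDirectory, fileName);
}
```

The resulting absolute path is what you would hand to the configuration builder in place of the bare "appsettings.json" file name.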
More general help: for Topshelf you can add custom exception handling, which is where I finally found some meaningful error info, unlike in the Event Viewer:
HostFactory.Run(x => {
    x.OnException(e => {
        using (var fs = new StreamWriter(@"C:\log.txt"))
            fs.WriteLine(e.ToString());
    });
});
I've hit the 7000 and 7009 issue, which fails straight away (even though the error message says A timeout was reached (30000 milliseconds)) because of misconfiguration between TopShelf and what the service gets installed as.
The bottom line - what you pass in HostConfigurator.SetServiceName(name) needs to match exactly the SERVICE_NAME of the Windows service which gets installed.
If they don't match it'll fail straight away and you get the two event log messages.
I had this start happening to a service after Windows Creator's Edition update installed. Basically it made the whole computer slower, which is what I think triggered the problem. Even one of the Windows services had a timeout issue.
What I learned online is that the constructor for the service needs to be fast, but OnStart has more leeway with the SCM. My service had a C# wrapper and it included an InitializeComponent() that was called in the constructor. I moved that call to OnStart and the problem went away.

What is the proper way to configure UploadReadAheadSize in an Azure web service?

I have an Azure web service sitting behind Azure API Management. This means that the API Management layer uses SSL to talk to my service, along with a client cert for authentication. I am running into what seems to be a common issue with this kind of setup where POST sizes greater than 49152 result in error 413 RequestEntityTooLarge. There are a number of docs that reference the UploadReadAheadSize setting, but all of my attempts to set this value in Web.config result in internal server errors. Here is how I am setting the value:
<serverRuntime uploadReadAheadSize="1048576" />
Ideally I want to use something larger, but I am just trying to get things to work first. The moment I deploy with this setting all subsequent requests fail with internal server error. I can't find anything in my diagnostic logs to indicate why that failure is occurring.
Looking for any pointers on where/how to set this value. Thanks!
Finally figured this out. Note that ideally, since I am using only cert auth, I should be able to set sslFlags to "required". I tried that but was unable to get it to work properly with Azure API Management; I kept getting 403.7 errors from IIS. For now I am leaving it set to "negotiate" and increasing the value of uploadReadAheadSize as outlined below:
public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        try {
            using (ServerManager server = new ServerManager()) {
                string siteName = $"{RoleEnvironment.CurrentRoleInstance.Id}_Web";
                Configuration config = server.GetApplicationHostConfiguration();
                ConfigurationSection accessSection = config.GetSection("system.webServer/security/access", siteName);
                accessSection["sslFlags"] = @"Ssl,SslNegotiateCert";
                ConfigurationSection runtimeSection = config.GetSection("system.webServer/serverRuntime", siteName);
                runtimeSection["uploadReadAheadSize"] = 5242880;
                server.CommitChanges(); // persist the changes to applicationHost.config
            }
        } catch (Exception e) {
            // ...
        }
        return base.OnStart();
    }
}

Task Persistence C#

I'm having a hard time trying to get my task to stay persistent and run indefinitely from a WCF service. I may be doing this the wrong way and am willing to take suggestions.
I have a task that starts to process any incoming requests that are dropped into a BlockingCollection. From what I understand, the GetConsumingEnumerable() method is supposed to allow me to persistently pull data as it arrives. It works with no problem by itself. I was able to process dozens of requests without a single error or flaw using a Windows Form to fill out the requests and submit them. Once I was confident in this process, I wired it up to my site via an asmx web service and used jQuery ajax calls to submit requests.
The site submits a request based on a URL that is submitted; the web service downloads the HTML content from the URL and looks for other URLs within the content. It then creates a request for each URL it finds and submits it to the BlockingCollection. Within the WCF service, if the application is Online (i.e. the task has started), it pulls the requests using GetConsumingEnumerable via a Parallel.ForEach and processes them.
This works for the first few submissions, but then the task just stops unexpectedly. Of course, this is doing 10x more requests than I could simulate in testing, but I expected it to just throttle. I believe the issue is in my method that starts the task:
public void Start()
{
    Online = true;
    Task.Factory.StartNew(() =>
    {
        tokenSource = new CancellationTokenSource();
        CancellationToken token = tokenSource.Token;
        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = 20;
        options.CancellationToken = token;
        try {
            Parallel.ForEach(FixedWidthQueue.GetConsumingEnumerable(token), options, (request) =>
            {
                // ...
            });
        } catch (OperationCanceledException e) {
            // ...
        }
    }, TaskCreationOptions.LongRunning);
}
I've thought about moving this into a WF4 Service and just wire it up in a Workflow and use Workflow Persistence, but am not willing to learn WF4 unless necessary. Please let me know if more information is needed.
The code you have shown is correct by itself.
However there are a few things that can go wrong:
If an exception occurs, your task stops (of course). Try adding a try-catch and log the exception.
If you start worker threads in a hosted environment (ASP.NET, WCF, SQL Server) the host can decide arbitrarily (without reason) to shut down any worker process. For example, if your ASP.NET site is inactive for some time the app is shut down. The hosts that I just mentioned are not made to have custom threads running. Probably, you will have more success using a dedicated application (.exe) or even a Windows Service.
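To illustrate the first point, here is a minimal sketch of a consumer that catches and records exceptions per request, so one bad request does not end the long-running task. ResilientConsumer and its members are hypothetical names, and for simplicity it uses a plain sequential foreach rather than the Parallel.ForEach from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical consumer: a failing request is recorded and skipped
// instead of ending the whole long-running task.
public sealed class ResilientConsumer
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();

    // Exceptions thrown while processing; a real service would write these to a log.
    public ConcurrentQueue<Exception> Errors { get; } = new ConcurrentQueue<Exception>();

    public void Add(string request) => queue.Add(request);

    public void CompleteAdding() => queue.CompleteAdding();

    public Task Start(Action<string> process, CancellationToken token)
    {
        return Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string request in queue.GetConsumingEnumerable(token))
                {
                    try { process(request); }
                    catch (Exception ex) { Errors.Enqueue(ex); } // record and keep consuming
                }
            }
            catch (OperationCanceledException)
            {
                // Cancellation was requested; leave the loop quietly.
            }
        }, TaskCreationOptions.LongRunning);
    }
}
```

If the task still stops with this in place, the recorded exceptions (or their absence) help tell a processing failure apart from the host shutting the worker down.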
It turns out the cause of this issue was the WCF binding configuration. The task suddenly stopped because WCF killed the connection due to an open timeout. The open timeout setting is the time that a request will wait for the service to open a connection before timing out. In certain situations, it reached the limit of 10 max connections, which caused incoming connections to back up waiting for a connection. I made sure that I closed all connections to the host after the transactions were complete, so in the end I raised the max connections and the open timeout period. After this, it ran flawlessly.