I'm having a hard time getting my task to stay persistent and run indefinitely from a WCF service. I may be going about this the wrong way and am willing to take suggestions.
I have a task that processes any incoming requests that are dropped into a BlockingCollection. From what I understand, the GetConsumingEnumerable() method should let me pull data persistently as it arrives. It works with no problem by itself: I was able to process dozens of requests without a single error or flaw using a Windows Forms app to fill out and submit the requests. Once I was confident in this process, I wired it up to my site via an ASMX web service and used jQuery AJAX calls to submit requests.
The site submits a request based on a URL that is entered; the web service downloads the HTML content from that URL and looks for other URLs within the content. It then creates a request for each URL it finds and submits it to the BlockingCollection. Within the WCF service, if the application is Online (i.e. the task has started), it pulls requests from the GetConsumingEnumerable via a Parallel.ForEach and processes them.
This works for the first few submissions, but then the task just stops unexpectedly. Of course, this is doing 10x more requests than I could simulate in testing, but I expected it to just throttle. I believe the issue is in my method that starts the task:
public void Start()
{
    Online = true;

    Task.Factory.StartNew(() =>
    {
        tokenSource = new CancellationTokenSource();
        CancellationToken token = tokenSource.Token;

        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = 20;
        options.CancellationToken = token;

        try
        {
            Parallel.ForEach(FixedWidthQueue.GetConsumingEnumerable(token), options, (request) =>
            {
                Process(request);
                options.CancellationToken.ThrowIfCancellationRequested();
            });
        }
        catch (OperationCanceledException e)
        {
            Console.WriteLine(e.Message);
            return;
        }
    }, TaskCreationOptions.LongRunning);
}
I've thought about moving this into a WF4 service, wiring it up in a workflow, and using workflow persistence, but I'm not willing to learn WF4 unless necessary. Please let me know if more information is needed.
The code you have shown is correct by itself.
However, there are a few things that can go wrong:
If an exception occurs, your task stops (of course). Add a try-catch and log the exception.
If you start worker threads in a hosted environment (ASP.NET, WCF, SQL Server), the host can decide at any time, for its own reasons, to shut down a worker process. For example, if your ASP.NET site is inactive for a while, the application is shut down. The hosts I just mentioned are not designed to keep custom threads running. You will probably have more success with a dedicated application (.exe) or even a Windows service.
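For the first point, here is a minimal sketch of what that could look like inside your Start() method; Logger is a placeholder for whatever logging framework you already use:

Task.Factory.StartNew(() =>
{
    tokenSource = new CancellationTokenSource();
    CancellationToken token = tokenSource.Token;

    ParallelOptions options = new ParallelOptions();
    options.MaxDegreeOfParallelism = 20;
    options.CancellationToken = token;

    try
    {
        Parallel.ForEach(FixedWidthQueue.GetConsumingEnumerable(token), options, request =>
        {
            Process(request);
        });
    }
    catch (OperationCanceledException)
    {
        // Expected when the token is cancelled during shutdown.
    }
    catch (Exception ex)
    {
        // Without this, any unhandled exception silently kills the consuming task.
        Logger.Error(ex);   // hypothetical logger
        Online = false;     // make the stopped state visible to the rest of the service
    }
}, TaskCreationOptions.LongRunning);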
It turns out the cause of this issue was the WCF binding configuration. The task suddenly stopped because WCF killed the connection due to an open timeout. The open timeout setting is the time a request will wait for the service to open a connection before timing out. In certain situations it hit the limit of 10 maximum connections, which caused incoming connections to back up waiting for a free one. I made sure that I closed all connections to the host after the transactions were complete, so I gave in and raised the maximum connections and the open timeout period. After that it ran flawlessly.
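For reference, a rough sketch of the two settings involved when the binding is built in code (assuming netTcpBinding; the same values can be set on the binding element in config, and the numbers are only examples):

var binding = new NetTcpBinding
{
    // How long a caller waits for a channel to open before timing out (default is 1 minute).
    OpenTimeout = TimeSpan.FromMinutes(2),

    // Pooled connection limit on the binding (default is 10); raise it if incoming
    // connections back up waiting for a free one.
    MaxConnections = 50
};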
We get these Hangfire error messages about 2 or 3 times a day at random times, even at night when there is no activity. The messages come at different times and seem to be independent of each other.
Execution Worker is in the Failed state now due to an exception, execution will be retried no more than in 00:00:04
Dispatcher is stopped due to an exception, you need to restart the server manually. Please report it to Hangfire developers.
We have Hangfire running in a website hosted in the Azure Cloud, and we have the "Always On" setting turned on.
It is an ASP.NET 4.6.1 MVC website.
Here is our configuration in Global.asax.cs:
HangfireAspNet.Use(GetHangfireServers);
.
.
.
private IEnumerable<IDisposable> GetHangfireServers()
{
    Hangfire.GlobalConfiguration.Configuration
        .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
        .UseSimpleAssemblyNameTypeSerializer()
        .UseRecommendedSerializerSettings()
        .UseNinjectActivator(_kernel)
        .UseNLogLogProvider()
        .UseSqlServerStorage("SQL connection string", new SqlServerStorageOptions
        {
            CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
            QueuePollInterval = TimeSpan.Zero,
            UseRecommendedIsolationLevel = true,
            DisableGlobalLocks = true
        });

    yield return new BackgroundJobServer();
}
We have this in the OWIN Startup.cs file:
app.UseHangfireDashboard("/hangfire", new DashboardOptions
{
    Authorization = Enumerable.Empty<IDashboardAuthorizationFilter>()
});
We have these packages installed:
Hangfire.AspNet version=0.2.0
Hangfire.Core version=1.7.28
Hangfire.Ninject version=1.2.0
Hangfire.SqlServer version=1.7.28
Our website (hosted in the Azure Cloud) has 5 load-balanced web servers, so we normally have 5 Hangfire servers running.
Does anyone have ideas on how we can solve this? Or any ideas on how we can troubleshoot this?
Thank You
Update
Here is the inner exception for the "Execution Worker" error.
Execution Worker is in the Failed state now due to an exception, execution will be retried no more than in 00:00:04 System.Data.SqlClient.SqlException Login failed for user '...'. Token is expired.
A severe error occurred on the current command. The results, if any, should be discarded.
Any ideas on how we can refresh the token? Maybe it expires when there is no activity? Perhaps we need a scheduled job that refreshes the token?
The "Dispatcher Error" stopped happening. I think it may be related to the "ghost servers" that #jbl mentioned below. I'm not sure, but I think we got ghost servers during the deployment process. We deploy new code to a "deployment slot", and then the deployment slot swaps into the production slot. Somewhere in the deployment process some hangfire servers were not shutdown properly and they became ghost servers (that's my guess). What seemed to solve the issue is that we restart both slots after a deployment to make sure all the hangfire servers are shutdown.
Thank You
I have a WCF application that involves some async communication with external services. When we start a new case, a new instance is created; it processes data, sends an XML document to an external service, and waits for the response. That response requires a person to review the XML and send a reply, so it is usually delayed for a long time. For this reason the workflow goes idle, and we use persistence with AppFabric.
The problem is that sometimes, when we receive the response, the following exception is raised:
The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}LoadWorkflowByInstanceKey was interrupted by an error.
Normally this error does not occur; it happens only very sporadically. However, we are trying to update the app to include new functionality (it does not modify the workflow), and when the new build is deployed to the server, the instances that were created with the old deployment and were waiting for a response throw this exception when the response arrives from the external service. Instances started with the new deployment process the response without any problem.
I have been looking for information about this problem but haven't found much. Can anybody help me?
SOLUTION:
Thanks a lot for your answer; it may be helpful to me in the future. In this case, the problem was that I had changed the assembly version of one of the projects involved (in order to publish a NuGet package), and for a reason I don't understand, the instances created with the old version raised this exception when the service built with the new version had to manipulate them.
If I change the assembly version just to publish the NuGet package, then set it back to the original version and deploy with that, everything works fine. Does anybody know the reason?
Thanks a lot.
This may be because there is a program running in the background that tries to extend the lock on the instance store every 30 seconds, and it seems that whenever the connection to the SQL service fails, it marks the instance store as invalid.
You can try <workflowIdle timeToUnload="0"/>; if that doesn't work, take a look at the approaches in the links below (a code-hosting sketch of the same setting follows them):
Windows workflow 4.0 InstancePersistenceCommand Error
Why do I get exception "The execution of the InstancePersistenceCommand named LoadWorkflowByInstanceKey was interrupted by an error"
WF4 InstancePersistenceCommand interrupted
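For reference, a rough sketch of the same idle setting when the workflow service is self-hosted in code rather than in IIS/AppFabric. MyWorkflow is a placeholder for your root activity, and the types live in System.ServiceModel.Activities and System.ServiceModel.Activities.Description:

var host = new WorkflowServiceHost(new MyWorkflow(), new Uri("http://localhost:8000/MyWorkflow"));

// Equivalent of <workflowIdle timeToUnload="0"/>: persist and unload
// as soon as the workflow goes idle.
host.Description.Behaviors.Add(new WorkflowIdleBehavior
{
    TimeToPersist = TimeSpan.Zero,
    TimeToUnload = TimeSpan.Zero
});

host.Open();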
I have a Windows service written in VB.NET, using Topshelf as the service host.
Once in a while the service doesn't start. In the event log, the SCM writes errors 7000 and 7009 (the service did not respond in a timely fashion). I know this is a common issue, but I (think I) have tried everything with no result.
The service only relies on WMI and has no time-consuming operations.
I read this question (Error 1053: the service did not respond to the start or control request in a timely fashion), but none of the answers worked for me.
I tried:
Set Topshelf's start timeout.
Requested additional time in the first line of the OnStart method.
Set up a periodic timer which requests additional time from the SCM.
Removed Topshelf and built the service with the Visual Studio service template.
Moved the initialization and OnStart code to a new thread so OnStart returns immediately.
Built in RELEASE mode.
Set GeneratePublisherEvidence = false in the app.config file (per application).
Unchecked "Check for publisher's certificate revocation" in the Internet settings (per machine).
Deleted all alternate data streams (in case some DLL was marked as coming from the web and blocked).
Removed any debug code.
Increased Windows' general service timeout to 120000 ms.
Also:
The service doesn't try to communicate with the user's desktop in any way.
UAC is disabled.
The service runs under the LOCAL SYSTEM account.
I believe the code of the service itself is not the problem, because:
It has been on production for over two years.
Usually the service starts fine.
There is no exception logged in the Event Log.
The "On Error" options for the service dosn't get called (since the service doesn't actually fails, just doesn't respond to the SCM)
I've commented out almost everything on it, pursuing this error! ;-)
Any help is welcome since i'm completely out of ideas, and i've been strugling with this for over 15 days...
For me the 7009 error was produced by my .NET Core app because I was using this construct:
var builder = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json");
and the appsettings.json file obviously couldn't be found in C:\WINDOWS\system32. Changing it to Path.Combine(AppContext.BaseDirectory, "appsettings.json") solved the issue.
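In other words, roughly:

// Resolve appsettings.json next to the executable rather than the SCM's working
// directory (which is C:\WINDOWS\system32 when running as a service).
var builder = new ConfigurationBuilder()
    .AddJsonFile(Path.Combine(AppContext.BaseDirectory, "appsettings.json"));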
More general help: for Topshelf you can add custom exception handling, which is where I finally found some meaningful error info, unlike the Event Viewer:
HostFactory.Run(x =>
{
    ...

    x.OnException(e =>
    {
        using (var fs = new StreamWriter(@"C:\log.txt"))
        {
            fs.WriteLine(e.ToString());
        }
    });
});
I've hit the 7000 and 7009 issue, which fails straight away (even though the error message says "A timeout was reached (30000 milliseconds)"), because of a mismatch between the name Topshelf is configured with and the name the service gets installed as.
The bottom line: whatever you pass to HostConfigurator.SetServiceName(name) needs to match exactly the SERVICE_NAME of the Windows service that gets installed.
If they don't match, it fails straight away and you get the two event log messages.
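For example (a sketch only; MyWorker is a placeholder class, and "MyWorkerService" must match the SERVICE_NAME the service was actually installed under, which you can check with sc query):

HostFactory.Run(x =>
{
    x.Service<MyWorker>(s =>
    {
        s.ConstructUsing(name => new MyWorker());
        s.WhenStarted(w => w.Start());
        s.WhenStopped(w => w.Stop());
    });

    // Must match the installed SERVICE_NAME exactly, or startup fails immediately
    // with events 7000/7009 even though the message mentions a 30000 ms timeout.
    x.SetServiceName("MyWorkerService");
    x.SetDisplayName("My Worker Service");
});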
I had this start happening to a service after the Windows Creators Update was installed. Basically it made the whole computer slower, which I think is what triggered the problem. Even one of the built-in Windows services had a timeout issue.
What I learned online is that the constructor for the service needs to be fast, but OnStart has more leeway with the SCM. My service had a C# wrapper that called InitializeComponent() in its constructor. I moved that call to OnStart and the problem went away.
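In other words, something like this (a sketch of the idea, not the original code; InitializeComponent is the designer-generated call mentioned above):

public partial class MyService : System.ServiceProcess.ServiceBase
{
    public MyService()
    {
        // Keep the constructor as cheap as possible; the SCM is far less
        // forgiving about time spent here than in OnStart.
    }

    protected override void OnStart(string[] args)
    {
        InitializeComponent();   // moved here from the constructor

        // Kick off the real work (threads, timers, listeners) here.
    }
}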
When attempting to connect/communicate with my service, I have to wait almost exactly 20 seconds each time before the exception fires. Since this will all be running on a local network, I would like to decrease that timeout period to 5 seconds. I tried decreasing the receiveTimeout on my client, but it didn't work. I looked all over my code for a 20-second timeout being set, but couldn't find any. What should I be changing?
There are different timeout settings (see http://msdn.microsoft.com/en-us/library/ms731078.aspx). They can be set, for example, in a config file (web.config or app.config); see http://msdn.microsoft.com/en-us/library/ms731343.aspx for an example. Under http://msdn.microsoft.com/en-us/library/ms731399.aspx you can pick the binding you use and set the corresponding settings.
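For example, a rough sketch of setting them in code on the client side (assuming a BasicHttpBinding and a hypothetical IMyService contract; the same values can go in the client's binding element in config):

var binding = new BasicHttpBinding
{
    OpenTimeout = TimeSpan.FromSeconds(5),    // time allowed to establish the channel
    SendTimeout = TimeSpan.FromSeconds(5),    // time allowed for a request/reply operation
    ReceiveTimeout = TimeSpan.FromMinutes(10) // idle time before the channel is torn down
};

// IMyService and the address are placeholders for your own contract and endpoint.
var factory = new ChannelFactory<IMyService>(binding, new EndpointAddress("http://server/MyService.svc"));
var client = factory.CreateChannel();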
UPDATED: You probably have the timeout at the TCP level. Try reducing the TcpMaxConnectRetransmissions (default value 2) or TcpInitialRTT (default value 3; on NT 4.0 the parameter is named InitialRTT) parameters in the registry, reboot your computer, and try your experiments one more time. You can read about where the roughly 21-second delay comes from in http://support.microsoft.com/kb/223450, http://support.microsoft.com/kb/175523, http://support.microsoft.com/kb/170359 or http://www.boyce.us/windows/tipcontent.asp?ID=189. A description of the TCP/IP default configuration values is available at http://support.microsoft.com/kb/314053 (for Windows XP) and http://technet.microsoft.com/en-us/library/cc739819(WS.10).aspx (for Windows Server 2003 with SP2).
What you may actually be seeing is the cold start of your web app. A Service Not Found exception would fire back pretty quickly unless you had hit it hard enough to start queueing service requests beyond what WCF was configured to handle.
However, if your website was unloaded (AppDomain and worker process), it could take 20 seconds just to reach the code that builds the channel to your service. So something else may be masked by the delay.
If your website and service are in different application pools, this is magnified, because it has to cold-start the website and then cold-start the service, which happens in succession instead of simultaneously.
To somewhat alleviate this you can use a keepalive/ping service: something that constantly hits the URL to keep the AppDomain in memory and the worker process alive (if not shared). By default IIS 6 will shut down the worker process after 20 minutes of inactivity, so when the first request comes in, http.sys starts a new worker process, which loads the framework, which loads your app, which starts the pipeline, which executes your code, which delivers to your user. :)
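A crude sketch of such a pinger (the URL is a placeholder; anything that issues a cheap GET on a schedule works, including an external uptime monitor):

// Keep a reference to the timer somewhere long-lived so it isn't garbage collected.
var keepAlive = new System.Threading.Timer(_ =>
{
    try
    {
        using (var client = new System.Net.WebClient())
        {
            // Placeholder URL: any cheap page in the target app will do.
            client.DownloadString("http://yourserver/yourapp/keepalive.aspx");
        }
    }
    catch
    {
        // Ignore failures; this is only a warm-up ping.
    }
}, null, TimeSpan.Zero, TimeSpan.FromMinutes(5));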
I have a client app that tries every 10 seconds to send a message over a WCF web service. This client app will be on a computer on board a ship, which we know will have spotty internet connectivity. I would like the app to try to send data via the service and, if it can't, to queue up the messages until it can send them.
To test this setup, I start the client app and the web service (both on my local machine), and everything works fine. I simulate a bad internet connection by killing the web service and restarting it. As soon as I kill the service, I start getting CommunicationObjectFaultedExceptions, which is expected. But after I restart the service, I continue to get those exceptions.
I'm pretty sure there's something I'm not understanding about the web service paradigm, but I don't know what it is. Can anyone offer advice on whether this setup is feasible, and if so, how to resolve this issue (i.e. re-establish the communication channel with the web service)?
Thanks!
Klay
Client service proxies cannot be reused once they have faulted. You must dispose of the old one and create a new one.
You must also make sure you close the client service proxy properly. It is possible for a WCF service proxy to throw an exception on close, and if that happens the connection is not closed, so you must abort it. Use the try{Close}/catch{Abort} pattern. Also bear in mind that the Dispose method calls Close (and hence can throw from Dispose), so you can't just wrap the proxy in a using block as you would with normal disposable classes.
For example:
try
{
    if (yourServiceProxy != null)
    {
        if (yourServiceProxy.State != CommunicationState.Faulted)
        {
            yourServiceProxy.Close();
        }
        else
        {
            yourServiceProxy.Abort();
        }
    }
}
catch (CommunicationException)
{
    // Communication exceptions are normal when
    // closing the connection.
    yourServiceProxy.Abort();
}
catch (TimeoutException)
{
    // Timeout exceptions are normal when closing
    // the connection.
    yourServiceProxy.Abort();
}
catch (Exception)
{
    // Any other exception means something went wrong:
    // abort the connection and rethrow to let the
    // exception bubble upwards.
    yourServiceProxy.Abort();
    throw;
}
finally
{
    // This is just to stop you from trying to
    // close it again (with the null check at the start).
    // This may not be necessary depending on
    // your architecture.
    yourServiceProxy = null;
}
There was a blog article about this, but it now appears to be offline. An archived version is available on the Wayback Machine.