I am writing a Windows service that needs to run 24/7. It is a fairly simple service that monitors a directory into which files are dropped and processes those files. I need to restart the service if an unhandled exception is thrown.
Is there a way for a service to restart itself in the event of an unhandled exception?
The Services applet has many different recovery features:
It can take different actions on the first, second, and subsequent failures:
Restart the service, after a configurable delay
Run a Program (passing command line parameters, possibly including the failure count)
Restart the Computer (after a configurable delay, and with a particular message being sent)
The program that runs should be able to look in the event log and see the reason for failure (especially if you log it), and should therefore be able to disable the service if the exception is one that is not recoverable.
And, of course, in the meantime, the service should be logging what's going on, which should enable any management tool to notify Operations of what's going on.
I agree that you should probably not configure "third and subsequent" to be "restart service", or you could wind up in a loop.
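The escalation described above can be pictured as a small lookup: the failure count selects an action, and the last configured action is reused for every later failure. The following is only a minimal C++ sketch of that policy. The names `RecoveryAction`, `FailureStep` and `actionForFailure` are hypothetical and it does not call the Windows service API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the recovery options offered by the Services applet.
enum class RecoveryAction { None, RestartService, RunProgram, RestartComputer };

struct FailureStep {
    RecoveryAction action;
    unsigned delayMs;  // delay before the action is taken
};

// Returns the step for the Nth failure (1-based). Failures beyond the
// configured list reuse the last entry ("subsequent failures").
inline FailureStep actionForFailure(const std::vector<FailureStep>& steps,
                                    std::size_t failureCount) {
    if (steps.empty() || failureCount == 0) {
        return {RecoveryAction::None, 0};
    }
    std::size_t index = failureCount - 1;
    if (index >= steps.size()) {
        index = steps.size() - 1;
    }
    return steps[index];
}
```

Ending the list with `RecoveryAction::None` is what prevents the endless restart loop: once the earlier entries are used up, every further failure maps to "do nothing".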
Have you tried using the Recovery tab of the Service entry? You can set rules for failures, including "Restart the Service". By default this is set to "No Action".
This can also be done programmatically. The code was not written by me; I am posting the link to the author's CodeProject page, which contains the source and binaries. Below the link I explain how I used the author's code.
http://www.codeproject.com/KB/install/sercviceinstallerext.aspx
Add a reference to the DLL.
Open ProjectInstaller.Designer.vb in notepad
In the InitializeComponent Sub
CHANGE
Me.ServiceProcessInstaller1 = New System.ServiceProcess.ServiceProcessInstaller
Me.ServiceInstaller1 = New System.ServiceProcess.ServiceInstaller
TO
Me.ServiceProcessInstaller1 = New System.ServiceProcess.ServiceProcessInstaller
Me.ServiceInstaller1 = New Verifide.ServiceUtils.ServiceInstallerEx
With the Friend Declarations in the ProjectInstaller.Designer.vb
CHANGE
Friend WithEvents ServiceProcessInstaller1 As System.ServiceProcess.ServiceProcessInstaller
Friend WithEvents ServiceInstaller1 As System.ServiceProcess.ServiceInstaller
TO
Friend WithEvents ServiceProcessInstaller1 As System.ServiceProcess.ServiceProcessInstaller
Friend WithEvents ServiceInstaller1 As Verifide.ServiceUtils.ServiceInstallerEx
CHANGE
Me.Installers.AddRange(New System.Configuration.Install.Installer() {Me.ServiceProcessInstaller1, Me.ServiceInstaller1})
TO
Me.Installers.AddRange(New System.Configuration.Install.Installer() {Me.ServiceInstaller1, Me.ServiceProcessInstaller1})
Import The Namespace On ProjectInstaller.vb
In ProjectInstaller.vb, in Public Sub New, after InitializeComponent has been called
ADD
'Set Reset Time Count - This Is 4 Days Before Count Is Reset
ServiceInstaller1.FailCountResetTime = 60 * 60 * 24 * 4
'ServiceInstaller1.FailRebootMsg = "Houston! We have a problem"
'Add Failure Actions
ServiceInstaller1.FailureActions.Add(New FailureAction(RecoverAction.Restart, 60000))
ServiceInstaller1.FailureActions.Add(New FailureAction(RecoverAction.Restart, 60000))
ServiceInstaller1.FailureActions.Add(New FailureAction(RecoverAction.None, 3000))
ServiceInstaller1.StartOnInstall = True
Build installer and install. Voila
Wrap your service code in a runner which can catch any errors and restart your service.
The best way is to wrap Try/Catch blocks around the methods in the service that can be allowed to throw exceptions without stopping it.
However, there may be serious exceptions thrown that should result in the service being stopped immediately. Don't ignore these! In these cases, handle the exception, log it, email it and then rethrow it. That way you will be informed that the exception has occurred and will know what went wrong. You can then fix the problem and re-start the service manually.
Just ignoring it could cause a major failure in your system which you would not know about. It could also be very expensive on CPU/RAM if the service stops then restarts then stops ad infinitum.
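The advice above (swallow what is recoverable; log and rethrow what is not) might look roughly like this. It is a minimal C++ sketch under stated assumptions: `FatalServiceError` and `runGuarded` are hypothetical names, and the log is a plain string standing in for the Event Log or a log file.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical marker type for errors the service cannot recover from.
struct FatalServiceError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs one unit of work. Returns true on success and false if a recoverable
// error was logged and swallowed. Fatal errors are logged and rethrown.
inline bool runGuarded(const std::function<void()>& work, std::string& log) {
    try {
        work();
        return true;
    } catch (const FatalServiceError& e) {
        log += "fatal: ";
        log += e.what();
        log += '\n';
        throw;  // let the service stop so the failure is visible
    } catch (const std::exception& e) {
        log += "recovered: ";
        log += e.what();
        log += '\n';
        return false;
    }
}
```

The caller decides what counts as fatal by choosing the exception type it throws; everything else derived from `std::exception` is treated as recoverable in this sketch.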
As suggested by "John Saunders" and "theGecko", you can monitor the service and restart it when it fails. The built-in Windows Service Recovery functionality will get you a long way, but if you find that you need more advanced features (for example, detection of CPU hogging and hangs), then please check out Service Protector. It is designed to keep your important Windows Services operating 24x7.
Good luck!
My main program is an ASP.NET Core Web API that has a third-party library in a hosted service. The third-party library initializes fine, but then it throws some errors at some point during its lifecycle.
It supplies a way of hooking into the object via an event and will let me know what the error is so that I can handle it, but it still throws in the third-party library.
Since I am handling the event myself, I want to completely ignore these errors that are occurring in this library. Is there any way that I can do that?
I have already tried to add a global exception handler and the strange thing is, this exception handler never gets hit. The only way I can get the exception is to set my exception settings to break when CLR exceptions happen like in the picture above
This does not crash my program. For some reason, the program just hangs. When I turn off CLR exceptions in the "Break when thrown" window, the program runs just fine. It is almost as if Visual Studio is doing something special to handle these types of exceptions that a console version cannot do.
The only way that I can seem to get a console version of this running is to attach a Visual Studio debugger to the process and, when the exception is hit, press the green "Continue" play button in Visual Studio. Otherwise the application just seems to hang on the exception being thrown by the third-party library.
The application will run fine as long as Visual Studio is attached and the CLR break exceptions are not checked.
Does anyone know how to make sure that these types of exceptions do not hang the program when released?
Additional Info:
The third party library is a .NET Framework 4 library
The ASP.NET project is targeting "net5.0-windows"
The 3rd party class is probably using multi-threading
if it helps, this is how I am creating the third party class
Handling NullReferenceException in release code (official advice)
It's usually better to avoid a NullReferenceException than to handle it after it occurs. Handling an exception can make your code harder to maintain and understand, and can sometimes introduce other bugs. A NullReferenceException is often a non-recoverable error. In these cases, letting the exception stop the app might be the best alternative.
However, there are many situations where handling the error can be useful:
1. Your app can ignore objects that are null. For example, if your app retrieves and processes records in a database, you might be able to ignore some number of bad records that result in null objects. Recording the bad data in a log file or in the application UI might be all you have to do.
2. You can recover from the exception. For example, a call to a web service that returns a reference type might return null if the connection is lost or the connection times out. You can attempt to reestablish the connection and try the call again.
3. You can restore the state of your app to a valid state. For example, you might be performing a multi-step task that requires you to save information to a data store before you call a method that throws a NullReferenceException. If the uninitialized object would corrupt the data record, you can remove the previous data before you close the app.
4. You want to report the exception. For example, if the error was caused by a mistake from the user of your app, you can generate a message to help them supply the correct information. You can also log information about the error to help you fix the problem. Some frameworks, like ASP.NET, have a high-level exception handler that captures all errors so that the app never crashes; in that case, logging the exception might be the only way you can know that it occurs.
So after days of research I've finally found an event to hook into to give you error messages from ANY source no matter how many level deep you go in threads.
AppDomain.CurrentDomain.FirstChanceException += CurrentDomain_FirstChanceException;
Hooking into this event will allow you to see errors from every library and every thread. Simply place the line above into your Program.cs (or whatever startup file you have) and magically you will be flooded with all of the unknown errors from all of the third-party libraries you once thought were flawless.
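private static void CurrentDomain_FirstChanceException(object sender, System.Runtime.ExceptionServices.FirstChanceExceptionEventArgs e)
{
    // Use an explicit format string; otherwise the message is treated as the format and the stack trace is never printed.
    Console.WriteLine("{0}\n{1}", e.Exception.Message, e.Exception.StackTrace);
}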
I did so with the method above and, lo and behold, the third-party library was trying to reference another project in an unsafe way and throwing an error. Since I had no direct reference to that project in my own project, the built exe did not have a reference to its assembly (darn smarty-pants who need to optimize everything). It ran correctly under Visual Studio because my solution had a reference to this other project, so the third-party library would pick up on it as soon as Visual Studio connected with the debugger, through some sort of dark magic.
Anyways, I made a throw away object that used the project that was required and the issue was solved.
I really hope that this helps someone else and saves them the days it took me to find this.
I am new to VB.NET and have a project in which I have made my first Windows service. I have a function that retrieves a count of transactions. I would like to call that function and put the results in a text file. I can hard-code a string to put into the text file, but whenever I call the function, the service just crashes. No errors, it just dies. What am I doing wrong?
I have tried coding the function inside of the service-nope
I coded the function in a separate class-Nope! dies when I call it
Private Sub BrowserMailSender(obj As Object, e As EventArgs)
    Try
        FileIO.WriteToFile("service is started:" + Now + vbNewLine)
        My_Count() 'service dies here
        FileIO.WriteToFile("end" + vbNewLine)
    Catch ex As Exception
        MsgBox(ex)
    End Try
End Sub
The function works if I call it from the main project, but I would like the service to run and save the data behind the scenes.
The call to MsgBox is at the root of the problem. A Windows Service runs in a context where it does not have the ability to present a User Interface to the user. You'll have to find another way to communicate errors, such as the Event Log or a log file.
Prior to Vista, the line between services and the user was permeable, partly because the OS wasn't yet designed to keep them isolated, and partly because most users ran with full administrative privileges all the time. From Vista forward, you have to work "in the dark".
There are ways to present a UI to the user, and one of the answers here briefly mentions one of them. However, I would caution you against trying to present a UI at all. The main principle of a service is that it sits in the background and does things without requiring the user to interact with it. Presenting a UI for events that the user does not know are happening at that moment is an asymmetrical relationship. It could block your service indefinitely, because the user isn't expecting to have to interact with it to allow it to continue.
I am trying to get a Visual Basic service to run.
Installing and uninstalling the service works perfectly, however, when trying to start the service via the task manager, the following error message is displayed:
Unable to start service
The operation could not be completed.
The service did not respond to the start or control request in a timely fashion.
In the Event Viewer, the following error messages are logged in regard to this:
A timeout was reached (30000 milliseconds) while waiting for the ABC service to connect.
The ABC service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
Trying to start the service via cmd using the following command:
net start "ABC"
...results in the following error message:
The service is not responding to the control function.
More help is available by typing NET HELPMSG 2186.
And unhelpfully, typing "NET HELPMSG 2186" only repeats the first half of the error message:
The service is not responding to the control function.
I've had a look at the source code of the service, but I'm not quite familiar with its architecture. However, I could identify a few functions that I feel may be relevant for the service, namely:
OnStart(String())
OnStop()
New()
From what I've gathered out of related threads thus far, the error message could possibly mean that the service does not have the proper functions to be addressed by the service control functions. Could this be the case here?
If not, what approach would you suggest for debugging this?
Additional Info (21-Jul-2018):
Here's what the OnStart method looks like:
Protected Overrides Sub OnStart(ByVal args() As String)
    Dim log As ILogger = CommonObjectFactory.instance.buildLogger(LogName.ABCDSys).open("OnStart")
    Dim oCallback As Threading.TimerCallback = Nothing
    Try
        Dim numTimeDuration As Integer = 30000
        Try
            'numTimeDuration = Convert.ToInt32(ABCDA.ABCProperties.Instance.GetValue("DTimer", "DSys", "5000"))
            Dim config As IDServiceConfiguration = CommonObjectFactory.instance.buildConfigurationFactory().buildDSConfiguration()
            numTimeDuration = config.DTimer
        Catch ex As Exception
            log.error(ex)
        End Try
        log.info("Setting up a " & numTimeDuration / 1000 & " second timer")
        oCallback = New Threading.TimerCallback(AddressOf TimerEvent)
        _timer = New System.Threading.Timer(oCallback, Nothing, numTimeDuration, numTimeDuration)
    Catch ex As Exception
        log.error(ex)
    End Try
    log.close()
    log = Nothing
End Sub
As far as I know, this service should check documents every 30 seconds, and from what I can see in the code that's what should happen.
However, when trying to start the service, it crashes immediately. Not after 30 seconds, but right away, even though the error message in the Event Log says "A timeout was reached (30000 milliseconds)". In fact, I had problems with this service before where it crashed after 30 seconds, but in all these cases it wrote a helpful error message in one of the logs. Now, not even the logs are created.
I wonder if something is wrong with the logging. That would be this line, right?
Dim log As ILogger = CommonObjectFactory.instance.buildLogger(LogName.ABCDSys).open("OnStart")
What would be the best way for me to debug this? Due to the company workflow, I can't test this on my development machine but have to run the entire code through TeamCity first before testing it on another machine, so I'm not sure how to handle debugging. Given that the service crashes immediately, I don't even have time to attach a debugger to it. Is there a way to start it attached to a debugger?
You need to share the code, particularly in the OnStart() method. My guess is that this code is blocking or long-running. You have only 30 seconds to start or stop a service before Windows thinks a problem happened. The typical approach is to start a thread in OnStart and do the real business logic there.
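The "start a thread in OnStart" pattern can be sketched in portable C++ as follows. This is only an illustration of the shape, not the service's real code: `WorkerService`, its member names and the 10 ms tick are assumptions made for the example, and the counter stands in for the real business logic.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical service skeleton: onStart() only launches a worker and
// returns, so the start request is acknowledged quickly.
class WorkerService {
public:
    ~WorkerService() { onStop(); }

    void onStart() {
        stopRequested_ = false;
        worker_ = std::thread([this] { run(); });
    }

    void onStop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopRequested_ = true;
        }
        wake_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

    int iterations() const { return iterations_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopRequested_) {
            ++iterations_;  // stand-in for the real periodic work
            // Sleep until the next tick or until a stop is requested.
            wake_.wait_for(lock, std::chrono::milliseconds(10),
                           [this] { return stopRequested_; });
        }
    }

    std::thread worker_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopRequested_ = false;
    std::atomic<int> iterations_{0};
};
```

Waiting on a condition variable rather than sleeping means a stop request wakes the worker immediately, so stopping is as prompt as starting. The sketch assumes `onStart()` is called at most once between stops.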
I'm working on a windows forms application (.NET 4.0).
My form contains a 'Fast Line' chart using the Microsoft chart control included in VS2010.
The chart gets filled with about 20,000 datapoints.
My application then starts receiving market data from a server via DDE (Dynamic Data Exchange) in real time and adds it to the chart.
Note: I have no control over the server and so I have to deal with DDE only even though it's an outdated technology. VS doesn't support DDE anymore and so I use the Ndde library which works like a charm.
First we connect to the server, create an advise loop, and then subscribe to the OnAdvise event to receive notifications of new data:
Dim client As DdeClient = New DdeClient("ServerApplication", "Bid")
Private Sub StartDDE()
    client.Connect()
    client.StartAdvise("EURUSD", 1, True, 60000)
    AddHandler client.Advise, AddressOf OnAdvise
End Sub
Now we can put the commands to update the chart inside the event:
Private Sub OnAdvise(ByVal sender As Object, ByVal args As DdeAdviseEventArgs)
    Dim myPrice As Double = args.Text
    Chart1.Series("Bid").Points.AddY(myPrice)
End Sub
You get the idea.
THE PROBLEM:
This works fine for a few seconds until the chart crashes throwing the exception:
"Collection was modified; enumeration operation may not execute."
I spent a lot of time researching what may be the cause of this in my particular case, and I've come to the conclusion that it's because the chart is receiving data quicker than it can handle. It's already loaded with a lot of data and needs a certain time (less than a second) to add the received data in a new DataPoint and invalidate (refresh) itself. Whereas the server often sends data values very quickly (for example 5ms in between). So I've tried the following:
System.Threading.Thread.Sleep(800)
Chart1.Series("Bid").Points.AddY(myPrice)
thus pausing the application to give time to the chart to finish its work before adding a new point, and guess what? The application now works for minutes before throwing the exception. (altering the value in Sleep() doesn't help any further however)
The only help I could find online is an old post by someone mentioning that you should put incoming data on a cache queue, with one new data value released from the cache at a time (every time the chart finishes working).
My question is how would you do this?
Other suggestions are welcome!
This is most likely an issue caused by attempting to modify a UI element from a thread other than the UI thread.
The way you have it coded now, the DdeClient.Advise event handler is being executed on a worker thread managed by the library. See, DDE sucks, and because it sucks it has these requirements that it has to run on a thread with a message pump. [1] To make the library compatible with other types of applications besides Windows Forms, I coded it in such a manner that it will create a dedicated thread with a message loop and marshal all of the operations onto that thread by default.
But, you can override this behavior by specifying an ISynchronizeInvoke instance manually in the DdeClient constructor. The library will then use whatever thread is hosting the ISynchronizeInvoke instance for all of its DDE operations. All Form and Control instances implement ISynchronizeInvoke so it is easy enough to tell the library to use the main UI thread.
Dim client As DdeClient = New DdeClient("ServerApplication", "Bid", yourForm)
If you tell the library to use your Form instance then the Advise event handlers will be executed on the same thread hosting that Form; the UI thread.
By the way, I realize that you have no control over the server, but I would at least begin talking with the vendor of the software to use more modern (not 20 years old) mechanisms for doing interprocess communications.
[1] It also has the unfortunate requirement of thread affinity, which made dealing with the garbage collector a real pain.
Get real ;) DDE is slow, graphics is slow. Do not do them in the same thread.
Try that:
Create a second thread that handles DDE, queues the items.
The chart thread then pulls the updates and draws them.
Now, here comes the point:
ONLY the UI thread is allowed to modify the chart control. Yes, sucks. No, not negotiable. It has been a UI rule since the dawn of time.
Threads need locking ;)
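The queue-between-threads idea can be sketched like this. It is a minimal, language-neutral illustration written in C++ rather than VB.NET; `TickBuffer`, `push` and `drain` are hypothetical names, and in the real application the drain would run on the UI thread (for example from a timer) and feed the chart.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical hand-off buffer between the thread that receives ticks and
// the UI thread that owns the chart.
class TickBuffer {
public:
    // Called from the receiving (DDE) thread: cheap, never touches the UI.
    void push(double price) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(price);
    }

    // Called from the UI thread: takes everything queued so far, in arrival
    // order, and leaves the buffer empty.
    std::vector<double> drain() {
        std::vector<double> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<double> pending_;
};
```

Draining in batches means the chart is updated once per batch instead of once per tick, which matters when values arrive a few milliseconds apart. The lock is held only for the push or the swap, never while drawing.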
I have a Windows service built upon ATL 7's CAtlServiceModuleT class. This service serves up COM objects that are used by various applications on the system, and these other applications naturally start getting errors if the service is stopped while they are still running.
I know that ATL DLLs solve this problem by returning S_OK in DllCanUnloadNow() if CComModule's GetLockCount() returns 0. That is, it checks to make sure no one is currently using any COM objects served up by the DLL. I want equivalent functionality in the service.
Here is what I've done in my override of CAtlServiceModuleT::OnStop():
void CMyServiceModule::OnStop()
{
    if( GetLockCount() != 0 ) {
        return;
    }
    BaseClass::OnStop();
}
Now, when the user attempts to Stop the service from the Services panel, they are presented with an error message:
Windows could not stop the XYZ service on Local Computer.
The service did not return an error. This could be an internal Windows error or an internal service error.
If the problem persists, contact your system administrator.
The Stop request is indeed refused, but it appears to put the service in a bad state. A second Stop request results in this error message:
Windows could not stop the XYZ service on Local Computer.
Error 1061: The service cannot accept control messages at this time.
Interestingly, the service does actually stop this time (although I'd rather it not, since there are still outstanding COM references).
I have two questions:
Is it considered bad practice for a service to refuse to stop when asked?
Is there a polite way to signify that the Stop request is being refused; one that doesn't put the Service into a bad state?
You can't do this. Once the SCM sends a SERVICE_CONTROL_STOP to your service, you have to stop.
If your other apps are also services, you can make them dependencies within the SCM. Of course, if the apps using this service are just regular applications, that can't be used.
When ATL's lock count increments to 1, call SetServiceStatus() with the SERVICE_ACCEPT_STOP flag omitted in the SERVICE_STATUS::dwControlsAccepted field. Then you will not receive any SERVICE_CONTROL_STOP requests at all. Any attempt to stop the service will fail immediately. When ATL's lock count falls back to 0, call SetServiceStatus() again with the SERVICE_ACCEPT_STOP flag specified.
I just had to do this in 2 (older) ATL-based services, and it works well. Granted, I was unable to figure out the best way to override Lock() and Unlock() directly, so I just put a small loop inside my service that checks GetLockCount() at frequent intervals and calls SetServiceStatus() when needed.
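The decision made by that polling loop can be isolated into two small functions. This is a portable sketch, not the ATL code itself: `kServiceAcceptStop` is a local copy of the Win32 `SERVICE_ACCEPT_STOP` value (0x00000001) so that the example compiles without `<windows.h>`, and `controlsAcceptedFor` and `needsStatusUpdate` are hypothetical helper names. A real service would pass the resulting mask to `SetServiceStatus()`.

```cpp
#include <cassert>
#include <cstdint>

// Value of the Win32 SERVICE_ACCEPT_STOP flag, redefined locally so that
// the sketch compiles without <windows.h>.
constexpr std::uint32_t kServiceAcceptStop = 0x00000001;

// Returns the dwControlsAccepted mask to report for the given lock count:
// stop is accepted only while no COM objects are outstanding.
constexpr std::uint32_t controlsAcceptedFor(long lockCount,
                                            std::uint32_t baseControls) {
    return lockCount > 0 ? (baseControls & ~kServiceAcceptStop)
                         : (baseControls | kServiceAcceptStop);
}

// True when the reported mask has to change, i.e. when a real service
// would call SetServiceStatus() again.
constexpr bool needsStatusUpdate(std::uint32_t currentControls,
                                 long lockCount) {
    return controlsAcceptedFor(lockCount, currentControls) != currentControls;
}
```

Checking `needsStatusUpdate` first keeps the loop from reporting the same status to the SCM on every iteration.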
In your constructor, update m_status.dwControlsAccepted, removing SERVICE_ACCEPT_STOP. For instance:
CMyServiceModule::CMyServiceModule()
    : ATL::CAtlServiceModuleT<CMyServiceModule, IDS_SERVICENAME>()
{
    m_status.dwControlsAccepted &= ~SERVICE_ACCEPT_STOP;
}