Regarding the worker pool, what happens when the computer restarts? Does the worker continue where it left off?
Found the answer. Actually workers terminate immediately upon navigating away from the page. This must protect from run away processes taking over the computer.
As a result, as soon as the browser closes on a computer restart, the worker will terminate.
And it will not continue where it left off.
Tip: One must program atomically as to expect if computation can stop at any moment in time.
Related
Is there a way to listen or wait for a specific time (e.g. 11:30 am) every day. The only way I know how is to set a timer that checks for the current time every 60 seconds which I have actually implemented using a backgroundworker. But is there a way to just wait and listen for the specified time (similar to monitoring for directory changes) and then take some action?
Thanks in advance.
Typically, rather than having a program resident in memory waiting, you would setup a Scheduled Task for this (or a cron job on linux). The scheduled task will run the program at the appropriate time. The program can still check (validate) the expected time if needed, but it shouldn't just always sit in the background using up resources if it's only going to run once per day.
The scheduled task is also better because it will recover automatically from computer reboots, crashes, etc. If something happens that interrupts your program's normal running, the scheduled task will still be able to run.
This is especially important in the .Net world, because .Net requires you to be very careful writing long-lived programs to avoid address space fragmentation. The .Net garbage collector is good at freeing up and returning old memory to the operating system, but over time your program's virtual address space can become fragmented and eventually you will not be able to allocate new memory any longer.
Even if this is part of a larger program, where there are also other things happening based on user interactions, it's still a good idea to split this off into a separate process.
I'm reading Google's Site Reliability Engineering book and ran across the word crashloop which I've never heard before and have not been able to locate a definition
"If a task tries to use more resources than it requested, Borg kills the task and restarts it (as a slowly crashlooping task is usually preferable to a task that hasn’t been restar‐ ted at all)."
What is a crashloop and how does it compare to an infinite loop if at all?
A crashloop is when a process crashes and is restarted by a watchdog daemon, indefinitely.
That is, the history is:
Process starts at time T.
Process crashes at time T+1.
Watchdog daemon restarts process.
Process started at time T+2.
Process crashes at time T+3.
Watchdog daemon restarts process.
Process starts...etc.
Here, the watchdog deamon is Borg, and the process is encapsulated into a task.
In general, in distributed computing if you want something to eventually succeed, you have to write down your intent for it to be completed and you need a worker to loop continually to act on this intent. This is "at least once delivery" of a work item.
Here, the intent is that the task runs (written down into Borg), and Borg itself is running the loop that is constantly trying to make sure the task runs. This is why when a task crashes, it is restarted. When a task crashes repeatedly, together you end up with a crashloop.
The title is a bit misleading, so let me explain further.
I have a non thread-safe dll I have no choice but to use as part of my back end
servers. I can't use it directly in my servers as the thread issues it has causes
it to crash. So, I created an akka.net cluster of N nodes each which hosts a single
actor. All of my API calls that were originally to that bad dll are now routed through
messages to these nodes through a round-robin group. As each node only has a single, single
threaded actor, I get safe access, but as I have N of them running I get parallelism, of a sort.
In production, I have things configured with auto-down = false and default timings on heartbeats
and so on. This works perfectly. I can fire up new nodes as needed, they get added to the group,
I can remove them with Cluster.Leave and that is happy as well.
My issue is with debugging. In our development environment we keep a cluster of 20 nodes each
exposing a single actor as described above that wraps this dll. We also have a set of nodes that act as
seed nodes and do nothing else.
When our application is run it joins the cluster. This allows it to direct requests through the round-robin
router to the nodes we keep up in our cluster. When doing development and testing and debugging the app, if I configure things to use auto-down = false
we end up with problems whenever a test run crashes or we stop the application with out going through
proper cluster leaving logic. Such as when we terminate the app with the stop button in the debugger.
With out auto-down, this leaves us with a missing member of the cluster that causes the leader to disallow
additions to the cluster. This means that the next time I run the app to debug, I cant join the cluster and am
stuck.
It seems that I have to have auto-down set to get debugging to work. If it is set, then when I crash my app
the node is removed from the cluster 5 seconds later. When I next fire up my
app, the cluster is back in a happy state and I can join just fine.
The problem with this is that if I am debugging the application and pause it for any amount of time, it is almost immediately
seen as unreachable and then 5 seconds later is thrown out of the cluster. Basically, I can't debug with these settings.
So, I set failure-detector.acceptable-heartbeat-pause = 600s to give me more time to pause the app
while debugging. I will get shutdown in 10 min, but I don't often sit in the debugger for that long, so its an acceptable
trade-off. The issue with this is of course that when I crash the app, or stop it in the debugger, the cluster thinks it
exists for the next 10 minutes. No one tries to talk to these nodes directly, so in theory that isn't a huge issue, but I keep
running into cases where the test I just ran got itself elected as role leader. So the role leader is now dead, but the cluster
doesn't know it yet. This seems to prevent me from joining anything new to the cluster until my 10 min are up. When I try to leave
the cluster nicely, my dead node gets stuck at the exiting state and doesn't get removed for 10 minutes. And I don't always get
notified of the removal either, forcing me to set a timeout on leaving that will cause it to give up.
There doesn't seem to be any way to say "never let me be the leader". When I have run the app with no role set for the cluster
it seems to often get itself elected as the cluster leader causing the same problem
as when the role leader is dead but unknown to be so, but at a larger level.
So, I don't really see any way around this, but maybe someone has some tricks to pull this off. I want to be able to debug
my cluster member without it being thrown out of the cluster, but I also don't want the cluster to think that leader nodes
are around when they aren't, preventing me from rejoining during my next attempt.
Any ideas?
I'm trying to keep an eye on how long an application runs. To do this, I capture every process's ID as it starts, and when that process is shut down, I log the time. However, Google's Chrome starts and stops like 6 processes when you start it up and shut it down, meaning each execution of Chrome gets logged multiple times.
Is there a better way to track the execution of an application than by process ID? Or is there, perhaps, a technique for getting around this particular problem? I'd considered not adding a process ID if a process with the same ID was added within a second or so, but that seems exploitable.
Any ideas?
I am not 100% but I would assume that one process in Chrome must be the parent. try eliminating processes from your list if their parent (PPID) is the same (and not init = PID 1)
I ended up just checking if I was adding a duplicate. Not very efficient, but easy and effective. It will serve for now.
Can anyone confirm that this statement "Select to recycle worker processes after a specific period of inactivity" in this Microsoft help file is wrong and should in fact not have the "of inactivity" at the end of it?
Yes, that seems wrong. As far as I'm aware, this option just recycles the processes regardless of whether they are idle or not.
This article seems to confirm that too.
The statement you quoted is correct. It's a way to allow you to free up resources that aren't being used.
When it is necessary to conserve
system resources by terminating unused
worker processes, you can configure a
worker process to gracefully close
after a specified period of time. You
can use this feature to better manage
the resources when the processing load
is heavy, when identified applications
consistently fall into an idle state,
or when new processing space is not
available. You can also start
additional worker processes to replace
a worker process that is finished.
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/83b35271-c93c-49f4-b923-7fdca6fae1cf.mspx?mfr=true