TASK_RUNNING process state

I am reading "Linux Kernel Development". It has a definition of TASK_RUNNING:
"TASK_RUNNING—The process is runnable; it is either currently running or on a runqueue waiting to run". My question is: why don't we have two separate states for "currently running" and "on a runqueue waiting to run", like TASK_RUNNING and TASK_READYTORUN?
1) because when I first looked at the word "TASK_RUNNING", I thought it referred only to a process that is currently running
2) a more exact definition would avoid a lot of confusion
So is there a strong reason not to do this?

The process is runnable.
Further state separation makes no sense from the point of view of the users of this field.
Because the users of that state do not interact with the scheduler, knowing whether a process is currently scheduled is useless: immediately after you obtain that knowledge, the scheduler may change it, so your knowledge becomes stale.
As for the name TASK_RUNNING, only the Linux developers know why it was chosen. It could be a historical reason, or intentional: "Think of the process as if it is running."
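Incidentally, Java's thread model makes the same choice: Thread.State.RUNNABLE covers both "executing" and "waiting for a processor". A small illustrative sketch (plain Java, not kernel code; the class name is made up):

public class RunnableStateDemo {
    public static void main(String[] args) throws InterruptedException {
        // A busy-looping worker: sometimes it is actually on a CPU,
        // sometimes it is merely waiting for one.
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) { }
        });
        worker.start();
        Thread.sleep(100);

        // Reported as RUNNABLE in both cases; and by the time we act on
        // the answer, the scheduler may have changed reality anyway.
        System.out.println(worker.getState());

        worker.interrupt();
        worker.join();
    }
}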


Is it possible to request more time for a running job in SLURM?

I know it's possible to change directives on a queued job via scontrol, for example:
scontrol update jobid=111111 TimeLimit=08:00:00
This only works in some cases, depending on the administrative configuration of the slurm instance (I'm not an admin). Thus this post does not answer my question.
What I'm looking for is a way to ask SLURM to add more time to a running job, if resources are available, and even if it's already running. Sort of like a nested job request.
Particularly a running job that was initiated with srun on-the-fly.
In https://slurm.schedmd.com/scontrol.html, it is clearly written under TimeLimit:
Only the Slurm administrator or root can increase job's TimeLimit.
So I fear what you want is not possible.
And it makes sense: since the scheduler looks at job time limits to decide which jobs to launch, and some short jobs can benefit from back-filling to start before longer jobs, it would be a real mess if users were allowed to change a job's length while it is running. Indeed, how would you define "when resources are available"? A node can sit IDLE for some time because Slurm knows it will need it soon for a large job.

Distributed workers that ensure a single instance of a task is running

I need to design a distributed system in which a scheduler sends tasks to workers on multiple nodes. Each task is assigned an id and may be executed more than once, scheduled by the scheduler (usually once per hour).
My only requirement is that a task with a specific id must not be executed twice at the same time by the cluster. I can think of a design where the scheduler holds a lock for each task id and sends the task to an appropriate worker. Once the worker has finished, the lock should be released and the scheduler might schedule it again.
What should my design include to ensure this? I'm concerned about cases where a task is sent to a worker that starts the task but then fails to inform the scheduler about it.
What would be the best practice in this scenario to ensure that only a single instance of a job is always executed at a time?
You could use a solution that implements a consensus protocol. Say, for example, that all your nodes in the cluster can communicate using the Raft protocol. Then, whenever a node X wants to start working on a task Y, it would attempt to commit a message "X starts working on Y". Once such messages are committed to the log, all the nodes will see all the messages in the log in the same order.
When node X finishes or aborts the task, it would attempt to commit "X no longer works on Y" so that another node can start/continue working on it.
It could happen that two nodes (X and Z) may try to commit their start messages concurrently, and the log would then look something like this:
...
N-1: ...
N+0: "X starts working on Y"
...
N+k: "Z starts working on Y"
...
But since there is no "X no longer works on Y" message between the N+0 and N+k entries, every node (including Z) would know that Z must not start working on Y.
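To make that replay rule concrete, here is a rough sketch in Java, assuming the committed log is visible as an in-order list of strings; the entry format and method names are made up for illustration:

import java.util.List;

class TaskOwnership {
    // Returns the node that currently owns taskId according to the log,
    // or null if nobody does. The first un-released "starts working on"
    // entry wins; later start entries for the same task are ignored.
    static String currentOwner(List<String> committedLog, String taskId) {
        String owner = null;
        for (String entry : committedLog) {
            String node = entry.split(" ")[0];
            if (entry.contains("starts working on " + taskId) && owner == null) {
                owner = node;
            } else if (entry.contains("no longer works on " + taskId) && node.equals(owner)) {
                owner = null;
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "X starts working on Y",
                "Z starts working on Y");
        // X never released Y, so Z sees the task is taken and backs off.
        System.out.println(currentOwner(log, "Y"));   // prints X
    }
}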
The only remaining problem would be if node X got partitioned from the cluster before it can attempt to commit its "X no longer works on Y" message, for which I believe there is no perfect solution.
A work-around could be for X to periodically commit a message "X still works on Y at time T"; if no such message has been committed to the log for some threshold duration, the cluster would assume that no one is working on that task anymore.
With this work-around however, you'd be allowing the possibility that two or more nodes will work on the same task (the partitioned node X and some new node that picks up the task after the timeout).
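The timeout rule itself is simple; a sketch (the threshold value and names are arbitrary):

import java.time.Duration;
import java.time.Instant;

class LeaseCheck {
    // If the newest committed "X still works on Y at time T" is older than
    // the threshold, the cluster treats the task as abandoned.
    static final Duration THRESHOLD = Duration.ofMinutes(5);

    static boolean looksAbandoned(Instant lastHeartbeat, Instant now) {
        return Duration.between(lastHeartbeat, now).compareTo(THRESHOLD) > 0;
    }
}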
After a thorough search, I came to the conclusion that this problem can be solved with a technique called fencing.
In essence, when you suspect that a node (worker) has failed, the only way to ensure that it will not corrupt the rest of the system is to provide a fence that stops the node from accessing the shared resource you need to protect. That must be a drastic measure, such as resetting the machine that runs the failed process or setting up a firewall rule that prevents the process from accessing the shared resource. Once the fence is in place, you can safely break the lock that was being held by the failed process and start a new process.
Another possibility is to use a relational database to store the task metadata, combined with a proper isolation level (you can't go wrong with SERIALIZABLE if performance is not your #1 priority).
SERIALIZABLE
This isolation level specifies that all transactions occur in a completely isolated fashion; i.e., as if all transactions in the system had executed serially, one after the other. The DBMS may execute two or more transactions at the same time only if the illusion of serial execution can be maintained.
Using either optimistic or pessimistic locking should work too (a minimal sketch follows below): https://learning-notes.mistermicheels.com/data/sql/optimistic-pessimistic-locking-sql/
If you need to rerun the task, simply update the metadata (or, even better, create a new task with different metadata to keep track of its execution history).
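As a concrete illustration of the pessimistic variant, a minimal JDBC sketch; the task table, its columns and the statuses are hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class TaskClaimer {
    // Locks the task row with SELECT ... FOR UPDATE so only one worker can
    // move it to RUNNING; the row lock is released at commit/rollback.
    static boolean tryClaim(Connection conn, String taskId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT status FROM task WHERE id = ? FOR UPDATE")) {
            select.setString(1, taskId);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next() || "RUNNING".equals(rs.getString("status"))) {
                    conn.rollback();          // someone else already runs it
                    return false;
                }
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE task SET status = 'RUNNING' WHERE id = ?")) {
            update.setString(1, taskId);
            update.executeUpdate();
        }
        conn.commit();                        // lock released here
        return true;
    }
}

Because the row stays locked until commit, two workers cannot both read a non-RUNNING status and claim the same task.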

Spring Batch restart crashed jobs

Hi Spring Batch users,
regarding the documentation http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320
"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."
I try to find and restart the stale job executions using:
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
for (JobExecution jobExecution : jobExecutions) {
    // mark the stale execution as failed so it becomes restartable
    jobExecution.setStatus(BatchStatus.FAILED);
    jobExecution.setEndTime(new Date());
    jobRepository.update(jobExecution);
    jobOperator.restart(jobExecution.getId());
}
But this seems to be very inconvenient.
1) I have to do this before other (new) jobs can be started.
2) I have to handle multiple running server instances, so findRunningJobExecutions alone will not do the trick.
You can find other questions regarding this topic:
https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
Spring Batch after JVM crash
I would love to see a solution that registers a "start-up clean jobs listener". This would still not fix the problems caused by the multi-server environment, because Spring Batch cannot tell whether a JobExecution marked as STARTED is actually running on another instance.
Thanks for any advice
Alex
Your job cannot and should not recover "automatically" from a kill -9 scenario. A kill -9 is treated very differently than your application throwing a caught Exception. The reason for this is that you've effectively pulled the rug out from under the application without giving it a chance to reach a synchronization point with the database to commit any necessary information to the ExecutionContext or update the job/step status(es). Therefore, the last status touchpoint with the database will remain and the job will still look STARTED.
"OK, fine," you say, "but if I start another execution, I want it to find that STARTED execution and pick up where it left off." The problem here is that there is no clean way for the application to distinguish a job that is ACTUALLY RUNNING from one that failed but couldn't update the database. The framework here correctly errs on the side of caution and prevents you from starting a job that already appears to be running, and this is a GOOD thing.
Why? Because let's assume your job was actually still running and you restarted it by accident. As coded, the framework will start to spin up, see your running execution and fail with the message "A job execution for this job is already running". I can't tell you how many times we've been saved by this because someone accidentally launched a job twice!
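For reference, here is roughly where that guard surfaces in application code; a hedged sketch, with bean wiring assumed to exist elsewhere:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;

class LaunchGuardExample {
    void launch(JobLauncher jobLauncher, Job job, JobParameters params) throws Exception {
        try {
            jobLauncher.run(job, params);
        } catch (JobExecutionAlreadyRunningException e) {
            // The repository refused to create a second execution while the
            // last one still looks STARTED; investigate before forcing a restart.
            System.err.println(e.getMessage());
        }
    }
}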
If you were to implement the listener you suggest, the 2nd execution would instead be allowed to start and you'd have 2 different JVMs repeating the same work, possibly writing to the same files/tables and causing a huge data mess that could be impossible to clean up.
Trust me, in the event the Linux terminal kills your job or your job dies because the connection to the database has been severed, you WANT human eyes on those execution states before you attempt a restart.
Finally, on the off chance you actually wanted to kill your job, you can leverage the standard patterns for stopping jobs:
Stop by throwing an exception
Stop via JobOperator.stop()
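A rough sketch of the JobOperator.stop() route, reusing the jobExplorer/jobOperator beans from the snippet above:

import java.util.Set;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

class GracefulStopper {
    void stopRunning(JobExplorer jobExplorer, JobOperator jobOperator, String jobName)
            throws Exception {
        Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
        for (JobExecution execution : running) {
            // Marks the execution as STOPPING; a chunk-oriented step exits
            // cleanly at the next chunk boundary instead of being killed mid-write.
            jobOperator.stop(execution.getId());
        }
    }
}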

process states - new state & ready state

As the Operating System Concepts book illustrates in the section "Process States":
A process has defined states: new, ready, running, waiting and terminated.
I am confused about the new and ready states. I know that in the ready state the process has been allocated memory, and all the resources needed at creation time have been allocated, and it is only waiting for CPU time (scheduling).
But what is the new state? What is the stage before the process is allocated in memory?
Not all the tasks submitted to the OS can be allocated memory immediately, so they have to remain in the new state. The decision as to when they move to the ready state is taken by the long-term scheduler. More info about the long-term scheduler here: http://en.wikipedia.org/wiki/Scheduling_(computing)#Long-term_scheduling
To be more precise, the new state is for processes that are just being created; they haven't been fully created yet and are still in their growing stage.
The ready state, on the other hand, means that the created process, stored in its PCB (Process Control Block), has got all the resources it requires for execution, but the CPU is not yet running that process's instructions.
A simple example:
Say you have two processes. Process A is syncing your data to cloud storage and Process B is printing other data.
While Process B is still being created and entered into its PCB, Process A has already been created and is simply not getting the chance to run because the CPU hasn't reached its instructions yet. Process B, in contrast, still requires the printer to be found and other drivers to be checked; it must also verify the pages to be printed.
So Process A has been created and is waiting for CPU time, hence it is in the ready state. Process B is waiting for the printer to be initialised and the files to be examined for printing, hence it is in the new state (meaning it has not yet been fully entered into a PCB).
One more thing to note: for each process there is a Process Control Block, PCB, which stores the process-specific information.
I hope this clears your doubt. Feel free to comment on whatever you don't understand.

How to get notified when a process terminates in Windows and Linux?

I want to write a program that is notified by the OS whenever any running process on that OS dies.
I don't want to poll and compare every time to check whether a previously existing process has died. I want my program to be alerted by the OS whenever a process terminates.
How do I go about it? Some sample code would be very helpful.
PS: Looking for approaches in Java/C++.
Sounds like you want PsSetCreateProcessNotifyRoutine(). See this article to get started:
http://www.codeproject.com/KB/threads/procmon.aspx
Under Unix, you could use the SIGCHLD signal to get notified of the death of the process. This requires, however, that the process being monitored is a child process of the monitoring process.
Under Windows, you might need a valid handle to the process. If you spawn the process yourself using CreateProcess, you get the handle for free; otherwise you must acquire it by other means. It might then be possible to wait for the process to terminate by calling WaitForSingleObject on the handle.
Sorry, I don't have any example code for this. I am not even sure that waiting on the process handle under Windows really waits for termination of the process (as opposed to some other "significant" condition that causes the process handle to enter the "signalled" state or something).
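Since the question also mentions Java: the same handle-based idea exists there as the standard ProcessHandle API (Java 9+). A minimal sketch; note that for processes that are not children of the JVM the JDK may still poll internally, and, like waiting on a Windows handle, it only tells you the process is gone, not why:

import java.util.concurrent.ExecutionException;

public class WatchPid {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        long pid = Long.parseLong(args[0]);   // PID of the process to watch
        ProcessHandle handle = ProcessHandle.of(pid)
                .orElseThrow(() -> new IllegalArgumentException("no such process: " + pid));

        // onExit() completes when the watched process terminates.
        handle.onExit()
              .thenRun(() -> System.out.println("process " + pid + " terminated"))
              .get();   // block until it fires
    }
}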
I don't have a code sample ready, but one idea on Linux might be to find out the ID of the process you'd like to watch when first starting your watcher program (e.g. using pgrep) and then use inotify to watch /proc/<PID>/, which gets deleted when the process dies. In contrast to polling, this doesn't cost any significant CPU resources.
Now, procfs is not completely supported by inotify, so I can't guarantee this approach would actually work, but it is certainly worth looking into.