I have an application that sends Redis job output to the front end via a socket. Some special node types need to wait for their jobs to finish, so I'm waiting in an infinite loop until those jobs complete, something like below.
while True:
    if job.status() == "finished":
        break
I want to know if there's any way to free the resources and re-run the jobs from their previous state.
I tried a solution that stays awake the whole time. I want a solution where the resources are not bound to one job, so the system can use them for other jobs.
What you can do is return immediately if the job is special, and save the state of the jobs in Redis.
If you have a back end in your application, it can periodically check whether the special jobs have finished running.
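A minimal sketch of that idea, under some assumptions: the job object exposes a status() method as in the question, and a plain dict stands in for Redis so the example runs without a server (with a real Redis client you would write and read the same JSON string under the same key). FakeJob, save_state, load_state and check_special_job are hypothetical names used only for illustration.

```python
import json

# Stand-in for Redis: a plain dict keeps this runnable without a server.
# (Assumption: a real Redis client would store the same JSON string.)
store = {}


class FakeJob:
    """Hypothetical job that reports "finished" after a few status checks."""

    def __init__(self, job_id, checks_needed):
        self.job_id = job_id
        self._remaining = checks_needed

    def status(self):
        self._remaining -= 1
        return "finished" if self._remaining <= 0 else "running"


def save_state(job, state):
    """Persist the job's state so the worker can return instead of blocking."""
    store[f"job:{job.job_id}"] = json.dumps({"state": state})


def load_state(job_id):
    """Return the saved state, or None if nothing was saved for this id."""
    raw = store.get(f"job:{job_id}")
    return json.loads(raw)["state"] if raw is not None else None


def check_special_job(job):
    """One non-blocking check; meant to be called from a periodic back-end task."""
    state = "finished" if job.status() == "finished" else "waiting"
    save_state(job, state)
    return state


job = FakeJob("special-1", checks_needed=3)
results = [check_special_job(job) for _ in range(3)]
print(results, load_state("special-1"))
```

Each call returns right away, so the worker is free to pick up other jobs between checks instead of spinning in a loop.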
I know it's possible on a queued job to change directives via scontrol, for example
scontrol update jobid=111111 TimeLimit=08:00:00
This only works in some cases, depending on the administrative configuration of the slurm instance (I'm not an admin). Thus this post does not answer my question.
What I'm looking for is a way to ask SLURM to add more time to a job that is already running, if resources are available. Sort of like a nested job request. In particular, I mean a running job that was started on the fly with srun.
In https://slurm.schedmd.com/scontrol.html, it is clearly written under TimeLimit:
Only the Slurm administrator or root can increase job's TimeLimit.
So I fear what you want is not possible.
And it makes sense: the scheduler looks at job time limits to decide which jobs to launch, and some short jobs can benefit from back-filling to start before longer jobs, so it would be a real mess if users were allowed to change the job length while it is running. Indeed, how would you define "when resources are available"? A node can be IDLE for some time because Slurm knows that it will need it soon for a large job.
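To make the back-filling argument concrete, here is a toy model. It is not Slurm's actual algorithm, and can_backfill and all the times are made up for illustration. It shows why an idle node may still be unavailable, and how extending a running job would shift the plan for a waiting large job.

```python
from datetime import datetime, timedelta


def can_backfill(now, job_time_limit, reserved_start):
    """Toy rule: a job may use an idle node only if it ends before the reservation."""
    return now + job_time_limit <= reserved_start


now = datetime(2024, 1, 1, 12, 0)

# A running job holds node B until 14:00; a large job needs nodes A and B,
# so the scheduler plans to start it at 14:00 and keeps idle node A reserved.
running_job_end = datetime(2024, 1, 1, 14, 0)
large_job_start = running_job_end

short_ok = can_backfill(now, timedelta(hours=1), large_job_start)   # fits before 14:00
long_ok = can_backfill(now, timedelta(hours=3), large_job_start)    # would overrun

# If the user could extend the running job by 2 hours, the large job's
# planned start would slip by the same amount.
extension = timedelta(hours=2)
delayed_start = running_job_end + extension
delay = delayed_start - large_job_start

print(short_ok, long_ok, delay)
```

In this model the one-hour job can start on the idle node and the three-hour job cannot, even though the node is idle right now. Every such decision depends on the running job's time limit staying fixed.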
Can someone help me make the service selected in the image go into wait mode after the server starts?
Please let me know if a developer trace needs to be posted to resolve this issue.
That particular process is a BATCH process, a process that runs scheduled background tasks (maintained by transaction SM36/SM37). If the process is busy right after starting the server, that means there were scheduled tasks with status released waiting for execution, and as soon as the server was up, it started those tasks.
If you want to make sure the system doesn't immediately start released background tasks, you'll have to set the status back to scheduled (which, thanks to a bit of weird translation, means they won't be executed because they are not released).
If you want to start the server without having a chance to first change the job status in SM37, you would either have to reset the status at database level (likely not officially supported by SAP) or first start the server without any BATCH processes (which would give you a number of great big warning messages upon login), change the job status, and then restart the server with the BATCH processes. You can set the number of processes of each type in the profile of your instance (parameter rdisp/wp_no_btc).
I am trying out RabbitMQ with Spring Boot. I have a main process, and within that process I create many small tasks that can be processed by other workers. From the main process's perspective, I would like to know when all of these tasks are completed so that it can move to the next step. I did not find an easy way to query RabbitMQ for whether the tasks are complete.
One solution I can think of is to store these tasks in a database and, when each message is completed, update the database with COMPLETE status. Once all jobs are in COMPLETE status, the main process knows the jobs are complete and can move to the next step of its process.
Another solution I can think of is for the main process to maintain the list of jobs that have been sent to other workers. Once a worker completes its job, it sends a message to the main process indicating that the job is complete. The main process then marks the job as complete and removes it from the list. Once the list is empty, the main process knows the jobs are complete and can move to the next step of its work.
I would like to learn the best practice for how other people have dealt with this kind of situation. I appreciate any suggestions.
Thank you!
There is no way to query RabbitMQ for this information.
The best way to approach this is with the use of a process manager.
The basic idea is to have your individual steps send a message back to a central process that keeps track of which steps are done. When that main process receives notice that all of the steps are done, it lets the system move on to the next thing.
The details of this approach are fairly complex, but I do have a blog post that covers the core of a process manager from a JavaScript/NodeJS perspective.
You should be able to find something like a "process manager" or "saga" as they are sometimes called, within your language and RabbitMQ framework of choice. If not, you should be able to write one for your process without too much trouble, as described in my blog post.
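As a rough illustration of the core of such a process manager, here is a minimal in-memory sketch. It is written in Python only to show the idea and is not tied to Spring Boot or to any RabbitMQ client. ProcessManager and handle_completion are hypothetical names, and it assumes each worker publishes a "task done" message carrying the task id to a completion queue that the main process consumes.

```python
class ProcessManager:
    """Tracks outstanding task ids and fires a callback once all are done."""

    def __init__(self, task_ids, on_all_done):
        self._pending = set(task_ids)
        self._on_all_done = on_all_done
        self._finished = False

    def handle_completion(self, task_id):
        """Call this for every message received on the (assumed) completion queue."""
        # discard() makes duplicate or unknown completion messages harmless.
        self._pending.discard(task_id)
        if not self._pending and not self._finished:
            self._finished = True
            self._on_all_done()

    @property
    def done(self):
        return self._finished


events = []
manager = ProcessManager(["a", "b", "c"], lambda: events.append("next step"))

# Simulated completion messages, including redelivered duplicates.
for message in ["a", "b", "b", "c", "c"]:
    manager.handle_completion(message)

print(events, manager.done)
```

Handling duplicates matters because a message broker may redeliver a completion message. In a real system the pending set would also need to survive a restart of the main process, for example by keeping it in a database.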
I have some recurring jobs that run frequently or last for a while.
It seems that Scheduler().get_jobs() only returns the scheduled jobs that are not currently running, so I cannot tell whether a job with a certain id does not exist or is actually running.
How can I test whether a job is running in this situation?
(I set these jobs up in an unusual way because I need them to run at random intervals rather than a fixed interval: each job executes only once but adds a job with the same id at the end of its execution, and stops doing so once a certain threshold is reached.)
APScheduler does not filter the list of jobs for get_jobs() in any way. If you need random scheduling, why not implement that in a custom trigger instead of constantly re-adding the job?
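A minimal sketch of such a custom trigger, assuming APScheduler 3.x, where custom triggers subclass apscheduler.triggers.base.BaseTrigger and implement get_next_fire_time(previous_fire_time, now). RandomIntervalTrigger and its parameters are hypothetical names, and the optional end_date is only a stand-in for the unspecified "threshold" in the question. The import is wrapped so the sketch still runs where APScheduler 3.x is not installed.

```python
import random
from datetime import datetime, timedelta

try:
    # Base class for custom triggers in APScheduler 3.x.
    from apscheduler.triggers.base import BaseTrigger
except ImportError:
    # Fallback so the sketch can be run without APScheduler 3.x installed.
    BaseTrigger = object


class RandomIntervalTrigger(BaseTrigger):
    """Fires again after a random delay between min_seconds and max_seconds."""

    def __init__(self, min_seconds, max_seconds, end_date=None, rng=None):
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self.end_date = end_date          # assumed stop condition
        self.rng = rng or random.Random()

    def get_next_fire_time(self, previous_fire_time, now):
        # The delay is counted from "now"; previous_fire_time is not needed here.
        delay = self.rng.uniform(self.min_seconds, self.max_seconds)
        next_time = now + timedelta(seconds=delay)
        if self.end_date is not None and next_time > self.end_date:
            return None  # returning None tells the scheduler there are no more runs
        return next_time


# With APScheduler 3.x this would be used roughly as:
#   scheduler.add_job(my_func, trigger=RandomIntervalTrigger(30, 120), id="my_job")
trigger = RandomIntervalTrigger(
    10, 20, end_date=datetime(2024, 1, 1, 0, 1), rng=random.Random(0)
)
start = datetime(2024, 1, 1, 0, 0, 0)
first = trigger.get_next_fire_time(None, start)
last = trigger.get_next_fire_time(first, datetime(2024, 1, 1, 0, 0, 55))
print(first, last)
```

With a trigger like this the job keeps a single id for its whole lifetime, so it stays in the job list instead of being removed and re-added after every run.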
Each Job in my system belongs to a specific userid and can be put in rabbitmq from multiple sources. My requirements:
No more than 1 job should be running per user at any given time.
Jobs for other users should not experience any delay because of job piling up for a specific user.
Each job should be executed at least once. Each job has a maximum retry count and, if it fails, is re-inserted into the queue after a delay.
Maintaining Sequence of Jobs (per user) is desirable but not compulsory.
Jobs should probably be persisted, as I need them to execute at least once. Jobs have no expiry time.
Any of the workers should be able to run jobs for any of the users.
With these requirements, I think maintaining a single queue for each individual user makes sense. I would also need all the workers to watch all user queues and execute a job only for a user whose jobs are not currently running anywhere (i.e., no more than 1 job per user).
Would this solution work using RabbitMQ in a cluster setup? Since the number of queues would be large, I am not sure whether each worker watching every user queue would cause significant overhead. Any help is appreciated.
As #dectarin has mentioned, having multiple workers listen to multiple job queues will make it hard to ensure that only one job per user is being executed.
I think it'd work better if the jobs go through a couple of steps:
User submits job
Job gets queued per user until no other jobs are running
Coordinator puts job on the active job queue that is consumed by the workers
Worker picks up the job and executes it
Worker posts the results in a result queue
The results get sent to the user
I don't know how the jobs get submitted to the system, so it's hard to tell whether actual per-user message queues would be the best way to hold the waiting jobs. If the jobs already sit in a mailbox, that might work as well, for instance. Or store the queued jobs in a database; as a bonus, that would allow you to write a little front end for users to inspect and manage their queued jobs.
Depending on what you choose, you can then find an elegant way to coordinate the single job per user constraint.
For instance, if the jobs sit in a database, the database keeps things synchronised and multiple coordinator workers could go through the following loop:
while( true ) {
    if incoming job present for any user {
        pick up first job from queue
        put job in database, marking it active if no other active job is present
        if job was marked active {
            put job on active job queue
        }
    }
    if result is present for any user {
        pick up first result from result queue
        send results to user
        mark job as done in database
        if this user has job waiting in database, mark it as active
        if job was marked active {
            put job on active job queue
        }
    }
}
Or, if the waiting jobs sit in per-user message queues, transactions will be easier, and a single coordinator going through the loop won't need to worry about multi-threading.
Making things fully transactional across the database and the queues may be hard, but it may not be necessary. Introducing a pending state should allow you to err on the side of caution, making sure no jobs get lost if a step fails.
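The coordinator loop can be sketched in Python as follows. This is a single-threaded, in-memory sketch only: the dict and deques stand in for the database and for the RabbitMQ queues, and Coordinator, submit and complete are hypothetical names. Persistence, retries and the pending state are left out.

```python
from collections import defaultdict, deque


class Coordinator:
    """Lets at most one job per user onto the worker-facing queue at a time."""

    def __init__(self):
        self.waiting = defaultdict(deque)  # per-user FIFO of waiting jobs
        self.active = {}                   # user_id -> job currently running
        self.active_queue = deque()        # stand-in for the active job queue

    def submit(self, user_id, job):
        """Handle an incoming job: activate it, or park it behind the user's active job."""
        if user_id in self.active:
            self.waiting[user_id].append(job)
        else:
            self._activate(user_id, job)

    def complete(self, user_id):
        """Handle a result: mark the job done and activate the user's next job, if any."""
        finished = self.active.pop(user_id)
        if self.waiting[user_id]:
            self._activate(user_id, self.waiting[user_id].popleft())
        return finished

    def _activate(self, user_id, job):
        self.active[user_id] = job
        self.active_queue.append((user_id, job))


coordinator = Coordinator()
coordinator.submit("alice", "a1")
coordinator.submit("alice", "a2")   # waits: alice already has an active job
coordinator.submit("bob", "b1")     # not delayed by alice's backlog
before = list(coordinator.active_queue)

done = coordinator.complete("alice")
after = list(coordinator.active_queue)
print(before, done, after)
```

Because jobs wait in per-user FIFOs, the order of jobs within a user is preserved, and a backlog for one user never blocks another user's job from reaching the workers.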