Block a node with several executors from accepting more jobs until a given job is completed - selenium

I am running a Selenium test suite on EC2 Windows instances. These instances should be restarted once every few days as maintenance, to free up memory etc.
The problem is that when I send the restart command to the slave from Jenkins, I can't be certain that the slave has no running jobs at that time, since the slave runs several executors.
Is there a way to tell a node to drop its number of executors to 0 as soon as job X is triggered? If not, is there a way to gracefully put a slave offline (i.e. "complete all jobs in the queue but don't accept any new jobs")?

(jenkins_url)/safeRestart - Allows all running jobs to complete. New jobs will remain in the queue to run after the restart is complete.
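
If you want to script this, here is a minimal sketch using Python's requests library; the URL, credentials and node name are placeholders, and it assumes API-token authentication. It uses the /safeRestart endpoint from the answer above, plus Jenkins' /computer/<node>/toggleOffline endpoint, which marks a node temporarily offline so that running builds finish but no new builds are accepted.

# Minimal sketch (assumptions: Jenkins at JENKINS_URL, a user API token,
# and an agent named "ec2-windows-1"; adjust to your setup).
import requests

JENKINS_URL = "https://jenkins.example.com"   # placeholder URL
AUTH = ("admin", "api-token")                 # placeholder credentials

# Take one node temporarily offline: running builds finish, no new builds start.
requests.post(
    f"{JENKINS_URL}/computer/ec2-windows-1/toggleOffline",
    params={"offlineMessage": "maintenance restart"},
    auth=AUTH,
).raise_for_status()

# Or restart the whole Jenkins master safely: running builds finish first and
# queued builds wait until Jenkins is back up.
requests.post(f"{JENKINS_URL}/safeRestart", auth=AUTH).raise_for_status()

Posting to toggleOffline again brings the node back online once the maintenance restart is done.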

Related

Celery 4.3.0 - Send Signal To a Task Without Termination

On a Celery service on CentOS which runs a single task at a time, terminating a task is simple:
revoke(id, terminate=True, signal='SIGINT')
However, while the interrupt signal is being handled, the running task gets revoked, and a new task from the queue starts on the node. This is troublesome: two tasks are then running at the same time on the node, and the signal handling could take up to a minute.
The question is: how can a signal be sent to a running task without actually terminating the task in Celery?
Or, put differently, is there any way to send a signal to a running task?
The assumption is that the user should be able to send the signal from a remote node; in other words, the user cannot list the running processes on the node.
Any other solution is welcome.
I don't understand your goal.
Are you trying to kill the worker? If so, I guess you are talking about a "warm shutdown", so you can send SIGTERM to the worker's process. The running task will get a chance to finish, but no new tasks will be started.
If you're just interested in revoking a specific task while keeping the same worker, can you share your Celery configuration and the worker command? Are you sure you're running with concurrency 1?
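
One avenue not covered above is Celery's custom remote control commands: instead of revoking the task, a broadcast command can flip a flag that the running task polls. Below is a minimal sketch, assuming Celery 4.3, a Redis broker, and a worker started with --pool=solo so the command handler and the task share one process; the names request_pause and long_running are hypothetical.

# tasks.py - minimal sketch; assumes the worker is started with:
#   celery -A tasks worker --pool=solo
# so the control command and the task run in the same process.
from celery import Celery
from celery.worker.control import control_command

app = Celery('tasks', broker='redis://localhost:6379/0')

_pause_requested = False  # module-level flag acting as the "signal"

@control_command()
def request_pause(state):
    # Remote control command: set the flag instead of revoking the task.
    global _pause_requested
    _pause_requested = True
    return {'ok': 'pause requested'}

@app.task
def long_running(items):
    global _pause_requested
    for item in items:
        if _pause_requested:
            _pause_requested = False
            # handle the "signal" gracefully (checkpoint, clean up, ...) and stop
            break
        # ... process one item ...

A remote node can then set the flag without terminating anything via app.control.broadcast('request_pause').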

Distributed job management system

I'm using Bee-Queue for video transcoding job scheduling and processing.
Everything is fine for now, but I'm facing the challenge of working in a distributed environment, e.g. auto-scaling Amazon instances to add more workers for the jobs pending in the queue. We scale well, but I need to implement a system that is fail-safe: if an instance whose workers were processing a job shuts down and we never receive the job's status or events, that job disappears into a black hole and can't be recovered and processed again.
What I did:
I'm looking for a ready-made solution that works fail-safe in a distributed environment.
Thanks

Apache Marathon/Docker Swarm: containers keep repeating

I have a very simple container which says "hello world".
I have successfully run it and scaled it to X instances.
They all seem to be stuck in a cycle where they run, sleep for a bit, then run again.
The Marathon cycle is: Waiting, Running, Delayed, and repeat.
The Swarm cycle is: Ready, Running, Shutdown, and repeat.
How do I specify that the container should finish after its first execution, whether in Swarm or Marathon?
You cannot; both Swarm and Marathon are designed for long-running services.
For running something just once, use a plain docker run command rather than a Swarm service, and on Mesos (which Marathon runs on) use another framework, e.g. Chronos, a cron replacement for Mesos that runs tasks periodically.
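
If you are driving Docker from Python, the same one-off run can be scripted with the Docker SDK for Python (the docker package); a minimal sketch, assuming the Docker daemon is reachable locally and using hello-world as a stand-in for your image:

# Minimal sketch: run a container once and let it exit, instead of a Swarm service.
import docker

client = docker.from_env()
# Equivalent to `docker run --rm hello-world`; returns the container's stdout.
output = client.containers.run("hello-world", remove=True)
print(output.decode())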

Amazon EMR managing my spark cluster

I have a Spark setup on Amazon EC2 machines with 2 worker machines running. It reads data from Cassandra, does some processing, and writes to SQL Server. I have heard about Amazon EMR and read about it. I want a managed system where worker machines are automatically added to my cluster if my job is taking more time, and shut down when my job completes.
Can I achieve this through Amazon EMR?
The requirements are:
My worker machines are automatically added to my cluster if my job is taking more time.
They shut down when my job completes.
No. 2 is definitely possible if your job is launched from steps. There is an option that auto-terminates the cluster after the last step completes. Alternatively, this could also be done programmatically with the SDK.
No. 1 is a little more difficult, but EMR has three classes of nodes: master, core, and task. Task nodes can be added after cluster creation. The trigger for that would probably have to be implemented programmatically, or by utilizing another Amazon service, like Lambda.
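
As a rough illustration of the programmatic route, resizing a task instance group with boto3 could look like the sketch below; the region, cluster ID and instance-group ID are placeholders, and the trigger (a schedule, a CloudWatch alarm, a Lambda function) is up to you.

# Minimal sketch: grow the task instance group of a running EMR cluster with boto3.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

CLUSTER_ID = "j-XXXXXXXXXXXXX"       # placeholder cluster id
TASK_GROUP_ID = "ig-XXXXXXXXXXXXX"   # placeholder task instance group id

# Request 4 task nodes; EMR adds or removes instances to reach the target count.
emr.modify_instance_groups(
    ClusterId=CLUSTER_ID,
    InstanceGroups=[{"InstanceGroupId": TASK_GROUP_ID, "InstanceCount": 4}],
)

For requirement 2, setting KeepJobFlowAliveWhenNoSteps to False when the cluster is created (e.g. in run_job_flow) is the auto-terminate-after-the-last-step option mentioned above.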

Jenkins removing queued and running build on restart

I have a Jenkins instance to which I am sending build requests programmatically through the API. My server gets restarted once a day.
I have observed that when the Jenkins server is restarted, it keeps no track of queued and running jobs; we lose those jobs entirely.
I also want to monitor programmatically whether a queued build was actually executed. But when we restart Jenkins, the queue IDs start again from one.
Is there any way [any plugin] to persist the queued builds and continue executing them after a restart, in the same order as they were queued?
I would also like the queue numbering to continue from where it was before the restart.
According to this and this, /safeRestart should be enough for what you need.
Or you can use the Naginator plugin to re-run builds that failed because Jenkins went down.
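
For the monitoring part of the question, one option is to use the queue-item URL that Jenkins returns when a build is triggered, rather than relying on queue IDs surviving a restart. A minimal sketch with Python's requests library, assuming API-token authentication and a job named my-job (both placeholders):

# Minimal sketch: trigger a build, then poll its queue item until it becomes a build.
import time
import requests

JENKINS_URL = "https://jenkins.example.com"   # placeholder URL
AUTH = ("user", "api-token")                  # placeholder credentials

# Triggering a build returns the queue item URL in the Location header.
resp = requests.post(f"{JENKINS_URL}/job/my-job/build", auth=AUTH)
resp.raise_for_status()
queue_item_url = resp.headers["Location"]

# Poll the queue item; once the build starts, it exposes an "executable" entry.
while True:
    item = requests.get(f"{queue_item_url}api/json", auth=AUTH).json()
    if "executable" in item:
        print("Started build:", item["executable"]["url"])
        break
    time.sleep(5)

Note that Jenkins discards queue items a few minutes after the build starts (and on restart), so the build URL should be recorded promptly.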