Jenkins removes queued and running builds on restart - API

I have a Jenkins instance to which I send build requests programmatically through the API. The server gets restarted once a day.
I have observed that when the Jenkins server restarts, it does not keep any track of queued or running jobs; we lose the jobs that were to be triggered.
I also wanted to monitor programmatically whether a queued build was actually executed, but when Jenkins restarts, the queue IDs start again from one.
Is there any way [any plugin] to persist the queued builds and continue executing them after a restart, in the same order in which they were queued?
I would also like the queue numbering to continue from where it was before the restart.

According to this and this, /safeRestart should be enough for what you need.
Alternatively, you can use the Naginator plugin to restart builds that failed because Jenkins went down.
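For reference, a minimal sketch of triggering the safe restart through the REST API with Python requests; JENKINS_URL and the credentials are placeholders for your own instance, and with API-token authentication a CSRF crumb is usually not required (older setups may still need one):

import requests
from requests.auth import HTTPBasicAuth

JENKINS_URL = "http://jenkins.example.com"   # placeholder

# POST to /safeRestart: running builds are allowed to finish and queued
# builds are kept to run after the restart.
resp = requests.post(
    f"{JENKINS_URL}/safeRestart",
    auth=HTTPBasicAuth("admin", "api-token"),  # placeholder credentials
    allow_redirects=False,
)
assert resp.status_code < 400, resp.text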

Related

Celery 4.3.0 - Send Signal To a Task Without Termination

On a Celery service on CentOS that runs a single task at a time, terminating a task is simple:
revoke(id, terminate=True, signal='SIGINT')
However, while the interrupt signal is being handled, the running task gets revoked and a new task from the queue starts on the node. This is troublesome: two tasks end up running at the same time on the node, and the signal handling can take up to a minute.
The question is: how can a signal be sent to a running task without actually terminating the task in Celery?
Or, put differently, is there any way to send a signal to a running task at all?
The assumption is that the user should be able to send the signal from a remote node; in other words, the user cannot list the running processes on the node.
Any other solution is welcome.
I don't understand your goal.
Are you trying to kill the worker? If so, I guess you are talking about a "warm shutdown", so you can send SIGTERM to the worker's process. The running task will get a chance to finish, but no new task will be started.
If you're just interested in revoking a specific task and keeping the same worker, can you share your Celery configuration and the worker command? Are you sure you're running with concurrency 1?
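For illustration, a minimal sketch of those two options, assuming a Celery app object named app, a broker that supports remote control, and a hypothetical task id:

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # hypothetical broker

task_id = "aaaa-bbbb-cccc"  # hypothetical task id

# Revoke without terminating: the task keeps running to completion,
# but it will not be executed if it is still waiting in the queue.
app.control.revoke(task_id, terminate=False)

# Warm shutdown of the worker (equivalent to sending SIGTERM): the running
# task finishes and no new task is picked up.
app.control.shutdown()

# The worker itself would have been started single-process, e.g.:
#   celery -A tasks worker --concurrency=1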

Distributed job management system

I'm using beeQueue for video transcoding job scheduling and processing.
For now everything is fine, but I'm facing the challenge of working in a distributed environment, e.g. auto-scaling Amazon instances to add more workers for the jobs pending in the queue. We scale well, but we need a system that is fail-safe: if an instance whose workers were processing a job shuts down and we get no job status or events, the job that was running on that instance falls into a black hole and can't be recovered and processed again.
What I did:
I'm looking for a ready-made solution that works fail-safe in a distributed environment.
Thanks

How to properly restart a kafka s3 sink connect?

I started a Kafka S3 sink connector (the bundled connector from the Confluent package) on 1 May. It worked fine until 8 May. Checking the status, it says that an AWS exception crashed the connector. This should not be a big problem, so I want to restore it.
I tried the following steps:
I POSTed to /connectors/s3sink/restart. Then I saw the connector in RUNNING state, but the task was still FAILED.
Then I PUT /connectors/s3sink/task/0/restart. OK, now the task is in RUNNING state.
But when I tailed the log, I found it had started to rewrite old data, such as the data from 3 May, and it messed up the old data!
So, does the Connect restart REST API reset the offset? I thought it would save the offset and just resume from where it failed.
And how do I restart a failed connector task correctly? By deleting those pods (we use Kubernetes), or by REST /task/0/restart? When should I use /connectors/s3sink/restart?
/connectors/:name/restart is a rolling restart operation on the worker leader that needs to propagate to all worker tasks asynchronously, so you need to ensure network connectivity between the leader worker and all the others.
/connectors/:name/tasks/:num/restart sends the request straight to the worker running that task, restarting its thread.
A restart should not reset the offsets, since they are stored in the consumer offsets topic for that Connect cluster. If anything, the tasks were not able to commit offsets back to the __consumer_offsets topic, but you should see logs for that.
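A minimal sketch of that flow against the Connect REST API, using Python requests; CONNECT_URL and the connector name are placeholders, and note that the task endpoint is /tasks/<id>/restart and both restart calls are POSTs:

import requests

CONNECT_URL = "http://localhost:8083"   # placeholder Connect worker URL
CONNECTOR = "s3sink"

# Inspect connector and task state first.
status = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/status").json()
print(status["connector"]["state"], [t["state"] for t in status["tasks"]])

# Restart the connector itself (this does not restart its tasks).
requests.post(f"{CONNECT_URL}/connectors/{CONNECTOR}/restart").raise_for_status()

# Restart each FAILED task individually.
for task in status["tasks"]:
    if task["state"] == "FAILED":
        requests.post(
            f"{CONNECT_URL}/connectors/{CONNECTOR}/tasks/{task['id']}/restart"
        ).raise_for_status()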

How to detach the JProfiler 8 'jpenable' remote agent

Since I need to profile an application running on a remote machine where a GUI is not allowed, I started a remote profiling session with JProfiler 8 and ran the /bin/jpenable agent on the remote host. After the analysis completed successfully, I need to stop that remote jpenable JProfiler 8 agent. How can I do that?
To check whether the previously started agent is still running, I ran /bin/jpenable again. Now I don't see the previously bound JVM, so I assume it is still bound to the previous agent.
Unfortunately, it is not possible to unload a JVMTI profiling agent. The JVM only unloads agents when it shuts down.

Block a node with several executors from accepting more jobs until a given job is completed

I am running a Selenium test suite on EC2 Windows instances. These instances should be restarted once every few days as maintenance, to free up memory etc.
The problem I have is that when I send the restart command to the slave from Jenkins, I can't be certain that the slave has no running jobs at that time, since the slave runs several executors.
Is there a way to tell a node that, as soon as job X is triggered, it should drop its number of executors to 0? If not, is there a way to gracefully take a slave offline (i.e. "complete all jobs in the queue but don't accept any new ones")?
(jenkins_url)/safeRestart - Allows all running jobs to complete. New jobs will remain in the queue to run after the restart is complete.
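If it helps, a minimal sketch of guarding the instance restart, assuming a node named windows-slave-1 (hypothetical) and API-token authentication: poll the computer API until the node reports that it is idle, then send the restart command to the instance.

import time
import requests
from requests.auth import HTTPBasicAuth

JENKINS_URL = "http://jenkins.example.com"   # placeholder
NODE = "windows-slave-1"                     # hypothetical node name
AUTH = HTTPBasicAuth("admin", "api-token")   # placeholder credentials

# Wait until no executor on the node is busy.
while True:
    info = requests.get(f"{JENKINS_URL}/computer/{NODE}/api/json", auth=AUTH).json()
    if info.get("idle"):
        break
    time.sleep(30)

# ... safe to send the restart command to the EC2 instance here ...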