Sidekiq stop one single, running job - jobs

I need to programmatically stop a running job in Sidekiq (3.1.2), not a scheduled one. I read the API documentation but didn't find anything about cancelling running jobs. Is this possible with Sidekiq?
If this is not directly possible, my idea was to work around it by raising an exception in the job when I send the signal, then deleting the job from the retry set. This is clearly not optimal, though.
Thanks in advance

Correct, the only way to stop a job is for the job to stop itself. Your application must implement that logic.
https://github.com/mperham/sidekiq/wiki/FAQ#how-do-i-cancel-a-sidekiq-job
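To illustrate what "the job stops itself" can look like, here is a minimal plain-Ruby sketch. CancellationStore and LongJob are hypothetical names, not Sidekiq API, and the in-memory store is only a stand-in: in a real deployment the flag would have to live somewhere every Sidekiq process can reach (a shared store such as Redis is assumed here, not shown).

```ruby
require 'set'

# Hypothetical in-memory stand-in for a shared cancellation store.
class CancellationStore
  def initialize
    @ids = Set.new
    @lock = Mutex.new
  end

  # Mark a job id as cancelled.
  def cancel!(job_id)
    @lock.synchronize { @ids << job_id }
  end

  def cancelled?(job_id)
    @lock.synchronize { @ids.include?(job_id) }
  end
end

# Illustrative job: works through items one at a time and checks the
# cancellation flag before each item, so it can only stop between items.
class LongJob
  attr_reader :processed

  def initialize(job_id, store)
    @job_id = job_id
    @store = store
    @processed = []
  end

  def perform(items)
    items.each do |item|
      return :cancelled if @store.cancelled?(@job_id)
      @processed << item
      yield item if block_given?
    end
    :finished
  end
end
```

The job only notices the flag between items, so how quickly it stops depends on how small each unit of work is.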

If you know the long-running job's thread ID, it's possible to terminate it from another task:
class ThreadLightly
  include Sidekiq::Worker
  def perform(tid)
    puts "I'm %s, and I'll be terminating TID: %s..." % [self.class, tid]
    Thread.list.each {|t|
      if t.object_id.to_s == tid
        puts "Goodbye %s!" % t
        t.exit
      end
    }
  end
end
You can trigger it from the sidekiq_pusher:
bundle exec ./pusher.rb ThreadLightly $YOURJOBSTHREADID
You'll need to log Thread.current.object_id from each job, since the UI doesn't show it. Also, if you run distributed Sidekiq processes, you'll need to re-run this task until it lands on the same instance as the job you want to stop.

Related

When running a sequence job in DataStage, I get a timeout error, code=-14 [Timed out while waiting for an event]

When running a sequence job in DataStage, I occasionally get an error like this:
Seq_CHECK_JV_OMS_DATA..JobControl (#Pjob_CHECK_JV_OMS_DATA): Controller problem: Error calling DSRunJob(Pjob_CHECK_JV_OMS_DATA.Seq_CHECK_JV_OMS_DATA), code=-14
[Timed out while waiting for an event]
Why would this happen? I'm not running these job instances concurrently; I execute them one by one (serially).
I have other problems too. The job instances under the sequence job controller often get stuck like this:
And this status lasts forever unless I clear the status file.
This is driving me crazy! Could anyone help? Thanks very much!
The problem is that your server is overloaded: when you try to execute your job Pjob_CHECK_JV_OMS_DATA.Seq_CHECK_JV_OMS_DATA, the call times out after 60 seconds and aborts with this error. Do you have many instances of the job Pjob_CHECK_JV_OMS_DATA running at the same time?
Try waiting a few seconds before starting this job.

How do I debug a Delayed::Worker.work_off that doesn't return success or failure

I am testing my Delayed::Job using Rspec.
In my rspec_controller:
it "queues up delayed job and fires" do
  setup
  expect {
    post :create, {:job => valid_attributes}
  }.to change(Delayed::Job, :count).by(2)
  Delayed::Worker.new.work_off.should == [2,0]
end
Delayed::Job.count passes as expected, but Delayed::Worker.new.work_off returns as [0,0], indicating there are 0 successes and 0 failures when there are 2 jobs.
How should I debug to find out why work_off doesn't fire the jobs?
Edit: The 2 jobs that are supposed to run have their run_at set in the future. Does work_off fire off jobs that are not meant to run immediately?
Although this is an older question, there's one parameter that isn't well documented. Try using
Delayed::Worker.new(quiet: false).work_off
to see the output of your background jobs. This could help you find out whether the fact that they're scheduled to run in the future is interfering with the assertion itself.
EDIT: Don't forget to take off the "quiet:false" when you're done, otherwise your tests will always output the results of the background jobs.
The construct
Delayed::Worker.new.work_off
immediately processes everything that is in the DJ queue, and in the same thread as the caller (it doesn't spawn a separate worker thread). But this doesn't explain why you're not getting [2, 0] for a result.
To answer your original question 'How should I debug to find out why work_off doesn't fire the jobs?', I suggest you use the callback hooks to trace the lifecycle of the jobs. Add a comment if you need to be shown how to do that... :)
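For anyone who does want to see it: Delayed Job calls optional hook methods (before, success, error, after, among others) on a custom job object around perform. Below is a rough sketch of a job that records each hook. TracedJob, its TRACE array, and the run_with_hooks helper are hypothetical; the helper only imitates the order in which a worker would call the hooks, so the example runs without a database or the gem.

```ruby
# Custom job defining some of Delayed Job's hook methods.
# TRACE is a hypothetical in-memory log used only for this illustration.
class TracedJob < Struct.new(:name, :should_fail)
  TRACE = []

  def before(job)
    TRACE << "before #{name}"
  end

  def perform
    raise "boom" if should_fail
    TRACE << "perform #{name}"
  end

  def success(job)
    TRACE << "success #{name}"
  end

  def error(job, exception)
    TRACE << "error #{name}: #{exception.message}"
  end

  def after(job)
    TRACE << "after #{name}"
  end
end

# Hypothetical stand-in for the worker: a simplified imitation of the
# hook order (before, perform, success or error, then after).
# nil is passed where the worker would pass the Delayed::Job record.
def run_with_hooks(payload)
  payload.before(nil)
  payload.perform
  payload.success(nil)
rescue StandardError => e
  payload.error(nil, e)
ensure
  payload.after(nil)
end
```

In a real spec you would enqueue the object with Delayed::Job.enqueue and write to a logger instead of an array. If none of the hooks fire during work_off, the jobs were never picked up at all, which points back at run_at.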

Spring Batch | Graceful job termination within the job

After launching a job, in the before-job phase there are certain occasions where we want to gracefully terminate the job (i.e. don't run the job at all, but don't complain either, i.e. no exception). The current way of doing this is to invoke jobExecution.stop. However, this results in a JobInterruptedException, which in turn results in a logger.error invocation.
Is there any other better programmatic alternative (without manual intervention)?
You may read Section 5.3.3, Configuring for Stop, and Section 5.3.4, Programmatic Flow Decisions.
Just introduce an end element for your first step based on condition:
The 'end' element instructs a Job to stop with a BatchStatus of
COMPLETED.
I solved the problem by adding a boolean flag executeTheJob to my "before job" listener, which I set to false when I don't want to execute the job.
Then I handle that in my firstStep with this configuration:
<step id="firstStep" >
    <tasklet ref="myFirstTasklet"/>
    <stop on="STOPPED" restart="firstStep" />
    <next on="COMPLETED" to="nextStep"/>
</step>
And at the beginning of my first tasklet I have this:
if (executeTheJob == false) {
    contribution.setExitStatus(ExitStatus.STOPPED);
}
The stop() instruction takes effect only if the transaction commits successfully.
If all your chunks roll back, your job doesn't stop.
I made this workaround:
Create a ChunkListener and, in the method afterChunkError(ChunkContext chunkCtx), put:
StepExecution stepExecution = chunkCtx.getStepContext().getStepExecution();
JobExecution jobExecution = jobExplorer.getJobExecution(stepExecution.getJobExecutionId());
if (jobExecution.getStatus().equals(BatchStatus.STOPPING)) {
    stepExecution.setTerminateOnly();
}
This will force a "controlled" stop.
Instead of invoking stop() on the job execution, try signalling it via the JobOperator, as shown in Stopping a Job.

Erlang finish or kill process

I have an Erlang application. In this application I run a process with spawn(?MODULE, my_foo, [my_param1, my_param2, my_param3]).
And my_foo is:
my_foo(my_param1, my_param2, my_param3) ->
...
some code here
...
ok.
When I open etop I see that the my_foo/3 process has this status: proc_lib:sync_wait/2
Then I tried putting exit(self(), normal) at the end of my function, but I see the same behavior: proc_lib:sync_wait/2 in etop.
How can I kill or exit the process correctly?
Thank you.
Note that exit(Pid, Reason) and exit(Reason) do NOT do the same thing if Pid is the process itself. exit/1 tells the current process to exit - from the inside if you like - while exit/2 sends an exit signal to the process, even if the process is itself. So when you do exit(self(), normal) you are actually sending the normal exit signal to yourself, which is ignored.
In this case putting the exit call at the end of the function should not make any difference as the process automatically dies (with reason normal) when the function with which it was started ends. It seems like the process is suspended somewhere before that.
proc_lib:sync_wait/2 is called inside proc_lib:start/start_link, where it sits and waits for the spawned process to call proc_lib:init_ack/1,2, which supplies the return value for start. It would appear that your process does not call init_ack.
Based on the limited information that you give in the question I would suspect that your process hasn't finished running yet.
Normally you don't need to add exit/2 to your process. It will exit automatically when the function has finished running.
You probably have a long-running call in the "some code here" part that has not finished running. I recommend that you add logging and see where you are stuck.

Is it possible to terminate an already running delayed job using Ruby Threading?

Let's say I have delayed_job running in the background. Tasks can be scheduled or run immediately(some are long tasks some are not)
If a task is too long, a user should be able to cancel it. Is it possible in delayed job? I checked the docs and can't seem to find a terminate method or something. They only provide a catch to cancel delayed job itself(thus cancelling all tasks...I need to just cancel a certain running task)
UPDATE
My boss(who's a great programmer btw) suggested to use Ruby Threading for this feature of ours. Is this possible? Like creating new threads per task and killing that thread while it's running?
something like:
t1 = Thread.new(task.run)
self.delay.t1.join (?) -- still reading on threads so correct me if im wrong
then to stop it i'll just use t1.stop (?) again don't know yet
Is this possible? Thanks!
It seems that my boss hit the spot, so here's what we did (please tell us if there's some possibility this is bad practice so I can bring it up):
First, we have a Job model that has def execute! (which runs what it's supposed to do).
Next, we have a delayed_job worker in the background, listening for new jobs. When you create a job, you can schedule it to run immediately or on certain days (we use rufus for this one).
When a job is created, it checks whether it's supposed to run immediately. If it is, it adds itself to the delayed job queue. The execute function creates a Thread, so each job has its own thread.
A user in the UI can see whether a job is running (there's a started_at and no finished_at). If it IS running, there's a button to cancel it. Canceling it just sets the job's canceled_at to Time.now.
While the job is running, it also checks whether it has a canceled_at or whether Time.now is > finished_at. If so, it kills the thread.
Voila! We've tested it for one job and it seems to work. Now the only problem is scaling...
If you see any problems with this, please say so in the comments or give more suggestions :) I hope this helps someone too!
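Roughly, the check-and-stop part could look like the sketch below. JobRecord and execute_in_thread are hypothetical stand-ins for the Job model and execute! described above, and canceled_at lives in memory here, whereas in the real app it would have to be re-read from the database. One deliberate difference: instead of killing the thread, this version lets the thread notice the flag between steps and finish on its own.

```ruby
# Hypothetical stand-in for the Job model described above.
JobRecord = Struct.new(:started_at, :canceled_at, :finished_at)

# Runs the steps in their own thread. Before each step the thread checks
# canceled_at and stops by itself if it has been set.
# The thread's value is the number of steps that completed.
def execute_in_thread(record, steps)
  Thread.new do
    record.started_at = Time.now
    completed = 0
    steps.each do |step|
      break if record.canceled_at
      step.call
      completed += 1
    end
    # Only mark the job as finished if it was not cancelled.
    record.finished_at = Time.now unless record.canceled_at
    completed
  end
end
```

Usage would be something like thread = execute_in_thread(record, steps), with the cancel button setting record.canceled_at = Time.now. Stopping between steps avoids cutting the job off halfway through a write, at the cost of waiting for the current step to finish.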
Delayed::Job is an < ActiveRecord::Base model, so you can query it just as you normally would, e.g. Delayed::Job.all(:conditions => {:last_error => nil}).
Delayed::Job objects have a payload field which contains a serialized version of the method or job that you're attempting to run. This object is accessed through the '#payload_object' method, which loads the object in question.
You can combine these two capabilities to make queryable job workers. For instance, if you have a User model, and the user has a paperclip'ed :avatar, then you can make a method to delete unprocessed jobs like so:
class User < ActiveRecord::Base
  has_attached_file :avatar, PaperclipOptions.new(:avatar)
  before_create :'process_avatar_later'

  def process_avatar_later
    filename = Rails.root.join('tmp/avatars_for_processing/',self.id)
    open(filename, 'w') do |file| file <<self.avatar.to_file end
    Delayed::Job.enqueue(WorkAvatar.new(self.id, filename))
    self.avatar = nil
  end

  def cancel_future_avatar_processing
    WorkAvatar.future_jobs_for_user(self.id).each(&:destroy)
    #ummm... tell them to reupload their avatar, I guess?
  end

  class WorkAvatar < Struct.new(:user_id, :path)
    # Memoized lookup of the user this job belongs to.
    def user
      @user ||= User.find(self.user_id)
    end

    def self.all_jobs
      Delayed::Job.scoped(:conditions => 'payload like "%WorkAvatar%"')
    end

    # Jobs not yet picked up by a worker (locked_at is nil) for this user.
    def self.future_jobs_for_user(user_id)
      all_jobs.scoped(:conditions => {:locked_at => nil}).select do |job|
        job.payload_object.user_id == user_id
      end
    end

    def perform
      # Go through the accessor: @user is nil until user has been called.
      user.avatar = File.open(path, 'rb')
      user.save()
    end
  end
end
It's possible someone has made a plugin to make queryable objects like this. Perhaps searching on GitHub would be fruitful.
Note also that you'd have to work with any process monitoring tools you might have to cancel any running job worker processes that are being executed if you want to cancel a job that has locked_at and locked_by set.
You can wrap the task into a Timeout statement.
require 'timeout'
class TaskWithTimeout < Struct.new(:parameter)
  def perform
    Timeout.timeout(10) do
      # ...
    end
  rescue Timeout::Error => e
    # the task took longer than 10 seconds
  end
end
No, there's no way to do this. If you're concerned about a runaway job, you should definitely wrap it in a timeout as Simone suggests. However, it sounds like you're in search of something more, but I'm unclear on your end goal.
There will never be a way for a user to have a "cancel" button, since this would involve finding a way to communicate directly with the worker process running the job. It would be possible to add a signal handler to the worker so that you could do something like kill -USR1 pid to have it abort the job it's currently working on and move on. Would this accomplish your goal?
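As a rough idea of what such a signal handler could look like in plain Ruby (this is not something delayed_job provides; run_job and the $abort_current_job flag are hypothetical, and USR1 assumes a Unix-like system):

```ruby
# Flag flipped by the signal handler; the job loop polls it.
$abort_current_job = false

# Keep the handler tiny: it only sets the flag.
Signal.trap("USR1") { $abort_current_job = true }

# Hypothetical job loop: runs up to `units` pieces of work and stops
# early if USR1 has arrived. Returns how many pieces were completed.
def run_job(units)
  done = 0
  units.times do
    break if $abort_current_job
    yield done if block_given?
    done += 1
  end
  $abort_current_job = false # reset so the next job starts clean
  done
end
```

With this in place, kill -USR1 pid from a shell makes the loop drop the current job at the next check. The worker only notices the signal between pieces of work, so a single long blocking call would still run to completion.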