I want to scrape some website & I want that this should be done by separate worker process. I came to know about delayed job to do jobs in background. I am using collectiveidea / delayed_job in my rails application.I followed the installation steps for rails 3.0 & active record.
After that I created a dj.rb in lib file & wrote code as follows.
require 'nokogiri'
require 'open-uri'
class Dj_testing
def perform
#code for scraping the site
#code to add entry into database
end
end
Now after that I use following command to start worker
script/delayed_job start
rake jobs:work
My worker started & on my terminal I can see
[Worker(host:user1234-desktop pid:9487)] Starting job worker
Now my problem is When I call the perform method directly It works fine. I mean following code works perfectly scrapes the site and populates the database.
ruby-1.9.2-p0 > Dj_testing.new.perform
But when I delay that same job it adds job to delayed_job table & does nothing :(
ruby-1.9.2-p0 > Dj_testing.delay.new
or
ruby-1.9.2-p0 > Delayed::Job.enqueue Dj_testing.new
#<Delayed::Backend::ActiveRecord::Job id: 150, priority: 0, attempts: 0, handler:
"---!ruby/object:Delayed::PerformableMethod \nargs: ...", last_error: nil,
run_at: "2012-04-27 05:25:29", locked_at: nil, failed_at: nil, locked_by: nil,
queue: nil,created_at: "2012-04-27 05:25:29", updated_at: "2012-04-27 05:25:29">
Why the job is not working as desired?
You need to call 'delay' on the object on which you are invoking the method. So, in your case, it should be:
Dj_testing.new.delay.perform
and not
Dj_testing.delay.new
#salil's way of calling the method is correct, however you might be suffering from other issues.
Follow the guidelines in my answer to this post (I did not post here since there are many :-))
https://stackoverflow.com/a/15000180/226255
I hope this helps
Related
I have many jobs that are calling other nested jobs using perform_later. However, during some tests on Cucumber, I'd like to execute those jobs immediately after to proceed with the rests of the tests.
I thought it would be enough to add
# features/support/active_job.rb
World(ActiveJob::TestHelper)
And to call jobs using this in a step definition file
perform_enqueued_jobs do
# call step that calls MyJob.perform_later(*args)
end
However I run into something like that
undefined method `perform_enqueued_jobs' for #<ActiveJob::QueueAdapters::AsyncAdapter:0x007f98fd03b900> (NoMethodError)
What am I missing / doing wrong ?
I switched to the :test adapter in tests and it worked out for me:
# initialisers/test.rb
config.active_job.queue_adapter = :test
# features/support/env.rb
World(ActiveJob::TestHelper)
It would seem as long as you call .perform_now inside the cucumber step, even if there are nested jobs with .deliver_later inside, it does work too
#support/active_job.rb
World(ActiveJob::TestHelper)
#my_job_steps.rb
Given(/^my job starts$/) do
MyJob.perform_now(logger: 'stdout')
end
#jobs/my_job.rb
...
MyNestedJob.perform_later(*args) # is triggered during the step
...
Also, in my environment/test.rb file I didn't write anything concerning ActiveJob, the default was working fine. I believe the default adapter for tests is :inline so calling .deliver_later _now shouldn't matter
So I need to stop a running Job in Sidekiq (3.1.2) programmatically, not a scheduled one. I did read the API documentation but didn't really find anything about cancelling running jobs. Is this possible with sidekiq?
When this is not directly possible, my idea was to circumvent this, by raising an exception in the job when I call the signal, then deleting the job from the retryset. This is clearly not optimal though.
Thanks in advance
Correct, the only way to stop a job is for the job to stop itself. Your application must implement that logic.
https://github.com/mperham/sidekiq/wiki/FAQ#how-do-i-cancel-a-sidekiq-job
If you know the long running job's Thread ID, its possible to terminate it from another task:
class ThreadLightly
include Sidekiq::Worker
def perform(tid)
puts "I'm %s, and I'll be terminating TID: %s..." % [self.class, tid]
Thread.list.each {|t|
if t.object_id.to_s == tid
puts "Goodbye %s!" % t
t.exit
end
}
end
end
You can trigger it from the sidekiq_pusher:
bundle exec ./pusher.rb ThreadLightly $YOURJOBSTHREADID
You'll need to log the Thread.current.object_id from each job since the UI dosn't show it. Also, if you run distributed sidekiqs, you'll need to run this task until it runs on the same instance.
I am testing my Delayed::Job using Rspec.
In my rspec_controller:
it "queues up delayed job and fires" do
setup
expect {
post :create, {:job => valid_attributes}
}.to change(Delayed::Job, :count).by(2)
Delayed::Worker.new.work_off.should == [2,0]
end
Delayed::Job.count passes as expected, but Delayed::Worker.new.work_off returns as [0,0], indicating there are 0 successes and 0 failures when there are 2 jobs.
How should I debug to find out why work_off doesn't fire the jobs.
Edit: The 2 jobs that are supposed to run, have their run_at set into the future. Does work_off fire off jobs that are not meant to be immediate?
Although this could be an older question, there's one parameter that's not much documented, try using
Delayed::Worker.new(quiet: false).work_off
to debug the result of your background jobs, this could help you to find out if the fact that they're supposed to run in the future is messing with the assert itself.
EDIT: Don't forget to take off the "quiet:false" when you're done, otherwise your tests will always output the results of the background jobs.
The construct
Delayed::Worker.new.work_off
immediately processes everything that is in the DJ queue, and in the same thread as the caller (it doesn't spawn a separate worker thread). But this doesn't explain why you're not getting [2, 0] for a result.
To answer your original question 'How should I debug to find out why work_off doesn't fire the jobs?', I suggest you use the callback hooks to trace the lifecycle of the jobs. Add a comment if you need to be shown how to do that... :)
I recollect getting log files that were nicely ordered, so that you could follow one request, then the next, and so on.
Now, the log files are, as my 4 year old says "all scroggled up", meaning that they are no longer separate, distinct chunks of text. Loggings from two requests get intertwined/mixed up.
For instance:
Started GET /foobar
...
Completed 200 OK in 2ms (Views: 0.4ms | ActiveRecord: 0.8ms)
Patient Load (wait, that's from another request that has nothing to do with foobar!)
[ blank space ]
Something else
This is maddening, because I can't tell what's happening within one single request.
This is running on Passenger.
I tried to search for the same answer but couldn't find any good info. I'm not sure if you should fix server or rails code.
If you want more info about the issue here is the commit that removed old way of logging https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
If you value production log readability over everything else you can use the
PassengerMaxInstancesPerApp 1
configuration. It might cause some scaling issues. Alternatively you could stuff something like this in application.rb:
process_log_filename = Rails.root + "log/#{Rails.env}-#{Process.pid}.log"
log_file = File.open(process_log_filename, 'a')
Rails.logger = ActiveSupport::BufferedLogger.new(log_file)
Yep!, they have made some changes in the ActiveSupport::BufferedLogger so it is not any more waiting until the request has ended to flush the logs:
http://news.ycombinator.com/item?id=4483390
https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
But they have added the ActiveSupport::TaggedLogging which is very funny and you can stamp every log with any kind of mark you want.
In your case could be good to stamp the logs with the request UUID like this:
# config/application.rb
config.log_tags = [:uuid]
Then even if the logs are messed up you still can follow which of them correspond to the request you are following up.
You can make more funny things with this feature to help you in your logs study:
How to log user_name in Rails?
http://zogovic.com/post/21138929607/running-time-in-rails-logs
Well, for me the TaggedLogging solution is a no go, I can live with some logs getting lost if the server crashes badly, but I want my logs to be perfectly ordered. So, following advice from the issue comments I'm applying this to my app:
# lib/sequential_logs.rb
module ActiveSupport
class BufferedLogger
def flush
#log_dest.flush
end
def respond_to?(method, include_private = false)
super
end
end
end
# config/initializers/sequential_logs.rb
require 'sequential_logs.rb'
Rails.logger.instance_variable_get(:#logger).instance_variable_get(:#log_dest).sync = false
As far as I can say this hasn't affected my app, it is still running and now my logs make sense again.
They should add some quasi-random reqid and write it in every line regarding one single request. This way you won't get confused.
I haven't used it, but I believe Lumberjack's unit_of_work method may be what you're looking for. You call:
Lumberjack.unit_of_work do
yield
end
And all logging done either in that block or in the yielded block are tagged with a unique ID.
Let's say I have delayed_job running in the background. Tasks can be scheduled or run immediately(some are long tasks some are not)
If a task is too long, a user should be able to cancel it. Is it possible in delayed job? I checked the docs and can't seem to find a terminate method or something. They only provide a catch to cancel delayed job itself(thus cancelling all tasks...I need to just cancel a certain running task)
UPDATE
My boss(who's a great programmer btw) suggested to use Ruby Threading for this feature of ours. Is this possible? Like creating new threads per task and killing that thread while it's running?
something like:
t1 = Thread.new(task.run)
self.delay.t1.join (?) -- still reading on threads so correct me if im wrong
then to stop it i'll just use t1.stop (?) again don't know yet
Is this possible? Thanks!
It seems that my boss hit the spot so here's what we did(please tell us if there's some possibility this is bad practice so I can bring it up):
First, we have a Job model that has def execute! (which runs what it's supposed to do).
Next, we have delayed_job worker in the background, listening for new jobs. Now when you create a job, you can schedule it to run immediately or run every certain day (we use rufus for this one)
When a job is created, it checks if its supposed to run immediately. If it is, it adds itself to the delayed job queue. The execute function creates a Thread, so each job has its own thread.
User in the ui can see if a job is running(if there's a started_at and no finished_at). If it IS running, there's a button to cancel it. Canceling it just sets the job's canceled_at to Time.now.
While the job is running it also checks itself if it has a canceled_at or if Time.now is > finished_at. If so, kill the thread.
Voila! We've tested it for one job and it seems to work. Now the only problem is scaling...
If you see any problems with this please do so in the comments or give more suggestions if ever :) I hope this helps some one too!
Delayed::Job is an < ActiveRecord::Base model, so you can query it just like you normally would like Delayed::Job.all(:conditions => {:last_error => nil}).
Delayed::Job objects have a payload field which contain a serialized version of the method or job that you're attempting to run. This object is accessed by their '#payload_object' method, which loads the object in question.
You can combine these two capabilities to make queriable job workers, for instance, if you have a User model, and the user has a paperclip'ed :avatar, then you can make a method to delete unprocessed jobs like so:
class User < ActiveRecord::Base
has_attached_file :avatar, PaperclipOptions.new(:avatar)
before_create :'process_avatar_later'
def process_avatar_later
filename = Rails.root.join('tmp/avatars_for_processing/',self.id)
open(filename, 'w') do |file| file <<self.avatar.to_file end
Delayed::Job.enqueue(WorkAvatar.new(self.id, filename))
self.avatar = nil
end
def cancel_future_avatar_processing
WorkAvatar.future_jobs_for_user(self.id).each(&:destroy)
#ummm... tell them to reupload their avatar, I guess?
end
class WorkAvatar < Struct.new(:user_id, :path)
def user
#user ||= User.find(self.user_id)
end
def self.all_jobs
Delayed::Job.scoped(:conditions => 'payload like "%WorkAvatar%"')
end
def self.future_jobs_for_user(user_id)
all_jobs.scoped(:conditions => {:locked_at => nil}).select do |job|
job.payload_object.user_id == user_id
end
end
def perform
#user.avatar = File.open(path, 'rb')
#user.save()
end
end
end
It's possible someone has made a plugin make queryable objects like this. Perhaps searching on GitHub would be fruitful.
Note also that you'd have to work with any process monitoring tools you might have to cancel any running job worker processes that are being executed if you want to cancel a job that has locked_at and locked_by set.
You can wrap the task into a Timeout statement.
require 'timeout'
class TaskWithTimeout < Struct.new(:parameter)
def perform
Timeout.timeout(10) do
# ...
end
rescue Timeout::Error => e
# the task took longer than 10 seconds
end
end
No, there's no way to do this. If you're concerned about a runaway job you should definitely wrap it in a timeout as Simone suggests. However, it sounds like you're in search of something more but I'm unclear on your end goal.
There will never be a way for a user to have a "cancel" button since this would involve finding a method to directly communicate with the worker running process running the job. It would be possible to add a signal handler to the worker so that you could do something like kill -USR1 pid to have it abort the job it's currently working and move on. Would this accomplish you goal?