How to configure and run remote celery worker correctly? - rabbitmq

I'm new to celery and may be doing something wrong, but I already
spent a lot of trying to figure out how to configure celery
correctly.
So, in my environment I have 2 remote servers; one is main (it has
public IP address and most of the stuff like database server, rabbitmq
server and web server running my web application is there) and another
is used for specific tasks which I want to asynchronously invoke from
the main server using celery.
I was planning to use RabbitMQ as a broker and as results back-end.
Celery config is very basic:
CELERY_IMPORTS = ("main.tasks", )
BROKER_HOST = "Public IP of my main server"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"
CELERY_RESULT_BACKEND = "amqp"
When I'm running a worker on the main server tasks are executed just
fine, but when I'm running it on the remote server only a few tasks
are executed and then worker gets stuck not being able to executed any
task. When I restart the worker it executes a few more tasks and gets
stuck again. There is nothing special inside the task and I even tried
a test task that just adds 2 numbers. I tried to run the worker
differently (demonizing and not, setting different concurrency and
using celeryd_multi), nothing really helped.
What could be the reason? Did I miss something? Do I have to run
something on the main server other than the broker (RabbitMQ)? Or is
it a bug in the celery (I tried a few version: 2.2.4, 2.3.3 and dev,
but none of them worked)?
Hm... I've just reproduced the same problem on the local worker, so I
don't really know what it is... Is it required to restart celery
worker after every N tasks executed?
Any help will be very much appreciated :)

Don't know if you ended up solving the problem, but I had similar symptoms. Turned out that (for whatever reason) print statements from within tasks was causing tasks not to complete (maybe some sort of deadlock situation?). Only some of the tasks had print statements, so when these tasks executed eventually the number of workers (set by concurrency option) were all exhausted, which caused tasks to stop executing.

Try to set your celery config to
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1
docs

Related

Monit cannot start/stop service

Monit cannot start/stop service,
If i stop the service, just stop monitoring the service in Monit.
Attached the log and config for reference.
#Monitor vsftpd#
check process vsftpd
matching vsftpd
start program = "/usr/sbin/vsftpd start"
stop program = "/usr/sbin/vsftpd stop"
if failed port 21 protocol ftp then restart
The log states: "stop on user request". The process is stopped and monitoring is disabled, since monitoring a stopped (= non existing) process makes no sense.
If you Restart service (by cli or web) it should print info: 'test' restart on user request to the log and call the stop program and continue with the start program (if no dedicated restart program is provided).
In fact one problem can arise: if the stop scripts fails to create the expected state (=NOT(check process matching vsftpd)), the start program is not called. So if there is a task running that matches vsftpd, monit will not call the start program. So it's always better to use a PID file for monitoring where possible.
Finally - and since not knowing what system/versions you are on, an assumption: The vsftpd binary on my system is really only the daemon. It is not supporting any options. All arguments are configuration files as stated in the man page. So supplying start and stop only tries to create new daemons loading start and stop file. -- If this is true, the one problem described above applies, since your vsftpd is never stopped.

Error While running Mule

While Running the Mule, I am facing the below error:
Timeout waiting for mule context to be completely started
Please let me know the work around solution for this. The same integration is working fine i.e the query fetching is happening fine with other system having mule but the same is not working in my system. Please Suggest a way to overcome this.
Thanks in Advance...!
Goutham ...Did you configured timeout in your flow? If it is configured ..
1. is it configured in Munit which we need to look into run and wait scope..
2. Or is this coming during the shutdown of mule ?
You can set a timeout value to enable the current flow to complete. However, there is no built in method or utility to check what messages are in flight. You can connect a profiler and see the active threads (or just a thread dump), this should provide you an overview of what’s happening at the JVM level.
To ensure all inflight messages are processed you can shutdown mule in two steps:
Stop the flow(s) manually (this will prevent new messages from coming)
Stop Mule
Alternatively, you can set shutdownTimeout to a milliseconds value for a flow; hwoever this is not a global value.
https://docs.mulesoft.com/mule-user-guide/v/3.8/starting-and-stopping-mule-esb
http://grepcode.com/file/repo1.maven.org/maven2/org.mule/mule-core/3.7.0/org/mule/transport/AbstractMessageDispatcher.java
The second link will provide you the internal implementation of Mule's AbstractMessageDispatcher .Hope this helps.
Thanks

Handling cache warm-up with twisted and systemd

I have a simple twisted application which I run using a systemd service, executing a script, which subsequently executes a .tac file.
The application is structured as a JSON RPC endpoint (fastjsonrpc), built into a t.w.r.Resource, which is in a t.w.s.Site, and served t.a.i.TCPServer, and the whole thing packed into a t.a.Application. This works fine.
Where I do run into trouble is when I try to warm up caches at startup. This warm-up process is pretty slow (~300 seconds), and makes systemd timeout and kill the process. Increasing the timeout is not really a viable option, since I wouldn't want this to block system boot.
Analogous code is used in a separate stack running on Flask from within Apache and wsgi. That server starts itself off and lets systemd go on while it takes its time building the caches. This behaviour is fine for me.
I've tried calling the warmup function using the following within the setup function of the t.w.r.Resource:
reactor.callLater(1, ep.warmup, None)
I've not yet tried using this from within systemd, and have been testing it from twistd directly on the command line. The server does work as expected, however it no longer responds to SIGINT (^C). Removing the callLater is all that's needed to let the server respond to SIGINT.
If the warmup function is called directly (not by callLater, i.e., the arrangement which makes systemd give up while waiting for warm up to complete), the resulting server also continues to respond to SIGINT.
Is there a better / good way to handle this sort of long-running warmup code?
Why would twistd / the reactor not respond to SIGINT? Am I missing something here?
Twisted is a single-threaded thing. It sounds like your "cache warmup" code is blocking the reactor for those 300 seconds. One easy way to fix this would be using deferToThread to let it run without blocking the reactor.

rufus-scheduler and delayed_job on Heroku: why use a worker dyno?

I'm developing a Rails 3.2.16 app and deploying to a Heroku dev account with one free web dyno and no worker dynos. I'm trying to determine if a (paid) worker dyno is really needed.
The app sends various emails. I use delayed_job_active_record to queue those and send them out.
I also need to check a notification count every minute. For that I'm using rufus-scheduler.
rufus-scheduler seems able to run a background task/thread within a Heroku web dyno.
On the other hand, everything I can find on delayed_job indicates that it requires a separate worker process. Why? If rufus-scheduler can run a daemon within a web dyno, why can't delayed_job do the same?
I've tested the following for running my every-minute task and working off delayed_jobs, and it seems to work within the single Heroku web dyno:
config/initializers/rufus-scheduler.rb
require 'rufus-scheduler'
require 'delayed/command'
s = Rufus::Scheduler.singleton
s.every '1m', :overlap => false do # Every minute
Rails.logger.info ">> #{Time.now}: rufus-scheduler task started"
# Check for pending notifications and queue to delayed_job
User.send_pending_notifications
# work off delayed_jobs without a separate worker process
Delayed::Worker.new.work_off
end
This seems so obvious that I'm wondering if I'm missing something? Is this an acceptable way to handle the delayed_job queue without the added complexity and expense of a separate worker process?
Update
As #jmettraux points out, Heroku will idle an inactive web dyno after an hour. I haven't set it up yet, but let's assume I'm using one of the various keep-alive methods to keep it from sleeping: Easy way to prevent Heroku idling?.
According to this
https://blog.heroku.com/archives/2013/6/20/app_sleeping_on_heroku
your dyno will go to sleep if he hasn't serviced requests for an hour. No dyno, no scheduling.
This could help as well: https://devcenter.heroku.com/articles/clock-processes-ruby

Heroku: What to do when your dyno/worker crashes?

I have a worker doing some processing 24/7. However, sometimes the code crashes and it needs to be restarted (even if I catch the exception, I have to restart the worker in order for it to work).
What do you do when this happens or am I doing something wrong and this shouldn't happen at all? Does your dynos/workers crash or it is just me?
thanks
Heroku is supposed to restart a worker every time it crashes. As far as I know, you don't have to select or configure anything. Whatever is in your jobs:work task will be executed as soon as it fails.
In the event that you are heavily dependent on background jobs in your web app. You could create a rake task that finds the last record to be updated and execute a background job to update it. Or perhaps automate the rake task to find the rest of the records that need updating, since the last crash.
Alternatively, you force worker restart manually as indicated in this article (using delayed_job):
heroku workers 0;
heroku workers 1;
Or perhaps you can restart a specific worker by doing (mentioned in this article):
heroku restart worker.1
By the way, try the 1.9 stack. Make sure your app is 1.9.2 compatible, before doing so. Hopefully crashes are less frequent there:
heroku stack:migrate bamboo-mri-1.9.2
In the event, that such issues still arise. Best to contact Heroku support. They are very responsive at what they do.
Latest command to restart a specific heroku web worker (2014):
heroku ps:restart web.1
(tested on Cedar stack)
At times, for instance in case of DB crashes, the worker may not restart automatically. you would need to do this.
heroku restart web.1
It worked for me.