Celery task for ML prediction hangs in execution - tensorflow

I'm trying to create a web application that receives input from a POST request and returns ML predictions based on that input.
Since the prediction model is quite heavy, I don't want the user to wait for the calculation to complete. Instead, I delegate the heavy computation to a Celery task, and the user can inspect the result later.
I'm using a simple Flask application with Celery, Redis and Flower.
My view.py:
@ns.route('predict/')
class Predict(Resource):
    ...
    def post(self):
        ...
        do_categorize(data)
        return jsonify(success=True)
My tasks.py file looks something like this:
from ai.categorizer import Categorizer
categorizer = Categorizer(
    model_path='category_model.h5',
    tokenizer_path='tokenize.joblib',
    labels_path='labels.joblib'
)

@task()
def do_categorize(data):
    result = categorizer.predict(data)
    print(result)
    # Write result to the DB
    ...
My predict() method inside Categorizer class:
def predict(self, value):
    K.set_session(self.sess)
    with self.sess.as_default():
        with self.graph.as_default():
            prediction = self.model.predict(np.asarray([value], dtype='int64'))
    return prediction
I'm running Celery like this:
celery worker -A app.celery --loglevel=DEBUG
The problem I've been having for the last couple of days is that the categorizer.predict(data) call hangs in the middle of execution.
I tried running categorizer.predict(data) inside the post method and it works. But as soon as I place it inside a Celery task, it stops working. There is no console output; if I try to debug it, it just freezes on .predict().
My questions:
How can I solve this issue?
Is there any memory, CPU limit for the worker?
Are Celery tasks the "right" way to do such heavy computations?
How can I debug this problem? What am I doing wrong?
Is it correct to initialize models at the top of the file?

Thanks to this SO question I found the answer to my problem:
It turns out that Keras works better with a thread pool instead of the default prefork (process) pool.
Luckily for me, the threaded task pool was reintroduced in Celery 4.4 not long ago.
You can read more in the Celery 4.4 changelog:
Threaded Tasks Pool
We reintroduced a threaded task pool using
concurrent.futures.ThreadPoolExecutor.
The previous threaded task pool was experimental. In addition it was based on the threadpool package which is obsolete.
You can use the new threaded task pool by setting worker_pool to 'threads' or by passing --pool threads to the celery worker command.
Now you can use threads instead of processes for pooling.
celery worker -A your_application --pool threads --loglevel=INFO
If you cannot use the latest Celery version, you can use the gevent package instead:
pip install gevent
celery worker -A your_application --pool gevent --loglevel=INFO
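If you prefer to keep the pool choice in code rather than on the command line, the worker_pool setting does the same thing. A minimal sketch (the module name and Redis URL here are assumptions, not taken from the question):

from celery import Celery

app = Celery('app', broker='redis://localhost:6379/0')

# Equivalent to passing --pool threads on the command line (Celery >= 4.4)
app.conf.worker_pool = 'threads'
# Optional: how many threads the pool should use
app.conf.worker_concurrency = 4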

Related

How to use a different named worker pool in same verticle?

I have one verticle in my service which takes in HTTP requests and uses executeBlocking to talk to a MySQL DB. I am using a named worker pool to interact with the DB. Now, for pushing application metrics (using a library which is blocking) I want to use a different named worker pool, as I don't want the DB operations to be interrupted by metrics.
I could use the event bus and a worker verticle to push the metrics, but as that has the overhead of transformation to a JsonObject, I want to use executeBlocking itself from the same verticle.
As mentioned here https://groups.google.com/d/msg/vertx/eSf3AQagGGU/9m8RizIJeNQJ
, the worker pool used in both cases is the same. So will creating a new worker verticle really help me decouple the threads used for DB operations from the ones used to push metrics?
Can anyone help me with a better design choice, or explain how I can use a different worker pool from the same verticle?
Try the following code (written in Kotlin, but you get the idea):
val workerExecutor1 = vertx.createSharedWorkerExecutor("executor1", 4)
val workerExecutor2 = vertx.createSharedWorkerExecutor("executor2", 4)
workerExecutor1.executeBlocking(...) // execute your db code here
workerExecutor2.executeBlocking(...) // execute your metrics code here
Don't forget to close the workerExecutor once it's not needed:
workerExecutor1.close()

Celery + HaProxy + RabbitMQ lose task messages

Probably not the best place to ask (maybe Server Fault), but I'll try here:
I have django-celery sending tasks via HAProxy to a RabbitMQ cluster, and we are losing messages (tasks not being executed) every now and then.
Observed
We turned off the workers and monitored the queue size; we noticed that we started 100 jobs but only 99 showed up in the queue.
It seems to happen when other processes are using RabbitMQ for other jobs.
Tried
I tried flooding RabbitMQ with dummy messages over many connections while putting some proper tasks into the queue, but I couldn't replicate the issue consistently.
Just wondering if anyone had experienced this before?
UPDATE:
So I dove into the code and eventually stumbled onto celery/app/amqp.py. I was debugging by adding an extra publish() call to a non-existent exchange, see below:
log.warning(111111111)
self.publish(
    body,
    exchange=exchange, routing_key=routing_key,
    serializer=serializer or self.serializer,
    compression=compression or self.compression,
    headers=headers,
    retry=retry, retry_policy=_rp,
    reply_to=reply_to,
    correlation_id=task_id,
    delivery_mode=delivery_mode, declare=declare,
    **kwargs
)
log.warning(222222222)
self.publish(
    body,
    exchange='celery2', routing_key='celery1',
    serializer=serializer or self.serializer,
    compression=compression or self.compression,
    headers=headers,
    retry=retry, retry_policy=_rp,
    reply_to=reply_to,
    correlation_id=task_id,
    delivery_mode=delivery_mode, declare=declare,
    **kwargs
)
log.warning(333333333)
Then I tried to trigger 100 tasks from the project code, and the result was that only 1 message got put into the celery queue. I think it's caused by the ProducerPool or ConnectionPool.

How to ACK celery tasks with parallel code in reactor?

I have a celery task that, when called, simply ignites the execution of some parallel code inside a twisted reactor. Here's some sample (not runnable) code to illustrate:
def run_task_in_reactor():
    # this takes a while to run
    do_something()
    do_something_more()

@celery.task
def run_task():
    print "Started reactor"
    reactor.callFromThread(run_task_in_reactor)
(For the sake of simplicity, please assume that the reactor is already running when the task is received by the worker; I used the @worker_process_init.connect signal to start my reactor in another thread as soon as the worker comes up.)
When I call run_task.delay(), the task finishes pretty quickly (since it does not wait for run_task_in_reactor() to finish, it only schedules its execution in the reactor). And when run_task_in_reactor() finally runs, do_something() or do_something_more() can throw an exception, which will go unnoticed.
Using pika to consume from my queue, I could use an ACK inside do_something_more() to make the worker signal the correct completion of the task, for instance. However, inside Celery this does not seem to be possible (or, at least, I don't know how to accomplish the same effect).
Also, I cannot remove the reactor, since it is a requirement of some third-party code I'm using. Other ways to achieve the same result are appreciated as well.
Use blockingCallFromThread (from twisted.internet.threads) instead.
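A minimal sketch of how the task could look, assuming the reactor is already running in another thread as described in the question (the task and function names are taken from the question):

from twisted.internet import reactor, threads

@celery.task
def run_task():
    # Schedules run_task_in_reactor in the reactor thread and blocks until it
    # returns. Exceptions raised inside the reactor are re-raised here, so the
    # task fails visibly instead of finishing "successfully" in the background.
    return threads.blockingCallFromThread(reactor, run_task_in_reactor)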

Using django-celery chord, celery.chord_unlock keeps executing forever not calling the provided callback

I'm using Django Celery with Redis to run a few tasks like this:
header = [
    tasks.invalidate_user.subtask(args=(user,)),
    tasks.invalidate_details.subtask(args=(user,))
]
callback = tasks.rebuild.subtask()
chord(header)(callback)
So basically the same as stated in documentation.
My problem is that when this chord is called, the celery.chord_unlock task keeps retrying forever. The tasks in the header finish successfully, but because chord_unlock is never done, the callback is never called.
Guessing that my problem is with not being able to detect that the tasks from the header are finished, I turned to the documentation to see how this can be customized. I found a section describing how the synchronization is implemented, and there is an example provided; what I'm missing is how to get that example function called (i.e. is there a signal for this?).
Further, there's a note that this method is not used with the Redis backend:
This is used by all result backends except Redis and Memcached, which increment a counter after each task in the header, then applying the callback when the counter exceeds the number of tasks in the set.
But it also says that the Redis approach is better:
The Redis and Memcached approach is a much better solution
What approach is that? How is it implemented?
So, why is chord_unlock never done and how can I make it detect finished header tasks?
I'm using: Django 1.4, celery 2.5.3, django-celery 2.5.5, redis 2.4.12
You don't have an example of your tasks, but I had the same problem and my solution might apply.
I had ignore_result=True on the tasks that I was adding to a chord, defined like so:
@task(ignore_result=True)
Apparently ignoring the result makes it so that the chord_unlock task doesn't know they're complete. After I removed ignore_result (even if the task only returns True), the chord called the callback properly.
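In other words, tasks used in a chord header must keep their results, since that is how chord_unlock detects completion. A minimal sketch of a corrected task definition (the import path is an assumption matching the django-celery / Celery 2.5 setup from the question):

from celery.task import task

@task   # note: no ignore_result=True here
def invalidate_user(user):
    # ... do the actual invalidation work here ...
    return True  # returning anything lets the chord see that the task finished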
I had the same error. I changed the broker to RabbitMQ, and chord_unlock now keeps working until my task finishes (2-3 minute tasks).
When using Redis, the task finishes but chord_unlock only retried about 8-10 times every 1s, so the callback was not executed correctly.
[2012-08-24 16:31:05,804: INFO/MainProcess] Task celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] retry: Retry in 1s
[2012-08-24 16:31:06,817: INFO/MainProcess] Got task from broker: celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] eta:[2012-08-24 16:31:07.815719-05:00]
... just like 8-10 times....
Changing the broker worked for me. I am now testing @Chris's solution, but my callback function never receives the results from the header subtasks :S, so it does not work for me.
celery==3.0.6
django==1.4
django-celery==3.0.6
redis==2.6
broker: redis-2.4.16 on Mac OS X
This could also cause the problem. From the documentation:
Note:
If you are using chords with the Redis result backend and also overriding the Task.after_return() method, you need to make sure to call the super method or else the chord callback will not be applied.
def after_return(self, *args, **kwargs):
    do_something()
    super(MyTask, self).after_return(*args, **kwargs)
As I understand it, if you have overridden the after_return() method in your task, it must be removed, or at least it must call the super method.
Bottom of the topic: http://celery.readthedocs.org/en/latest/userguide/canvas.html#important-notes

Adjust Thread pool size when using twistd

I am going to deploy my app the twistd way (Application, Service, etc.).
I'm wondering if there is a way to adjust the thread pool size of Twisted, like using reactor.suggestPoolSize().
I found an API called "adjustPoolsize" in twisted.python.threadpool.ThreadPool.
Can I call it directly for my purpose?
Thank you!
Recent versions of Twisted let you access the reactor's thread pool:
from twisted.internet import reactor
threadpool = reactor.getThreadPool()
threadpool.adjustPoolsize(3, 7)
However, there's no guarantee that the reactor itself won't re-adjust the size as it sees fit. If you need to control the size of the threadpool used by your application, it may be better to create your own ThreadPool instance, rather than using the reactor's.
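A minimal sketch of a dedicated pool, using twisted.python.threadpool directly (the sizes and the blocking function are placeholders, not part of any real application):

from twisted.internet import reactor
from twisted.python.threadpool import ThreadPool

# A pool of our own; the reactor will not resize it behind our back
pool = ThreadPool(minthreads=3, maxthreads=7, name='app-pool')
pool.start()
# Shut the pool down cleanly when the reactor stops
reactor.addSystemEventTrigger('during', 'shutdown', pool.stop)

# Run blocking work on the dedicated pool instead of the reactor's pool
pool.callInThread(some_blocking_function, arg1, arg2)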