I have a Celery task that, when called, simply kicks off the execution of some parallel code inside a Twisted reactor. Here's some sample (not runnable) code to illustrate:
def run_task_in_reactor():
    # this takes a while to run
    do_something()
    do_something_more()

@celery.task
def run_task():
    print "Started reactor"
    reactor.callFromThread(run_task_in_reactor)
(For the sake of simplicity, please assume that the reactor is already running when the task is received by the worker; I used the @worker_process_init.connect signal to start my reactor in another thread as soon as the worker comes up.)
When I call run_task.delay(), the task finishes pretty quickly (since it does not wait for run_task_in_reactor() to finish, it only schedules its execution in the reactor). And when run_task_in_reactor() finally runs, do_something() or do_something_more() can throw an exception, which will go unnoticed.
Using pika to consume from my queue, I can use an ACK inside do_something_more() to make the worker report the correct completion of the task, for instance. However, inside Celery, this does not seem to be possible (or, at least, I don't know how to accomplish the same effect).
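(Roughly what I mean with pika; the queue name is made up, and note that basic_consume's signature differs across pika versions — this sketch uses the 1.x keyword form:)

import pika

def on_message(channel, method, properties, body):
    do_something()
    do_something_more()
    # ack only after the work has actually finished, so the broker
    # treats the task as complete only when it really is
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="tasks", on_message_callback=on_message)
channel.start_consuming()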
Also, I cannot remove the reactor, since it is a requirement of some third-party code I'm using. Other ways to achieve the same result are appreciated as well.
Use twisted.internet.threads.blockingCallFromThread instead. It schedules the call in the reactor thread and blocks the calling thread until the call completes, re-raising any exception in the caller, so the Celery task will not report success until the reactor work is actually done.
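A minimal sketch of the task with that change (reusing run_task_in_reactor from the question and assuming, as described there, that the reactor is already running in another thread):

from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

@celery.task
def run_task():
    # Blocks this worker thread until run_task_in_reactor() has finished
    # inside the reactor thread; any exception raised there is re-raised
    # here, so Celery sees both failures and real completion.
    return blockingCallFromThread(reactor, run_task_in_reactor)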
I have a blocking workload that I want to execute on the bounded elastic scheduler. After this work is done, a lot of work that could be executed on the parallel scheduler follows, but it will automatically continue to run on the thread from the bounded elastic scheduler.
When is it "correct" to drop the previous scheduler you set earlier in the chain? Is there ever a reason to do so if it's not strictly necessary, because of thread starvation, for example?
I can switch the scheduler of a chain by "breaking" the existing chain with operators such as flatMap, then, or switchIfEmpty. Example:
public Mono<Void> multipleSchedulers() {
    return blockingWorkload()
            .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())) // Thread boundedElastic-1
            .subscribeOn(Schedulers.boundedElastic())
            .flatMap(result -> Mono.just(result)
                    .subscribeOn(Schedulers.parallel()))
            .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())); // Thread parallel-1
}
Generally, it is not bad practice to switch execution to another scheduler partway through your reactive chain. To switch to another scheduler in the middle of your chain, you can use the publishOn() operator; every operator that comes after it will then run on a worker from the supplied scheduler.
Take a look at section 4.5.1 of the Reactor reference documentation.
However, you should be clear about why you are doing it: is there an actual reason for the switch?
If you want to run some long computational process (some CPU-bound work), then it is recommended to execute it on Schedulers.parallel()
For making blocking calls it is recommended to use Schedulers.boundedElastic()
However, for making blocking calls we usually apply subscribeOn(Schedulers.boundedElastic()) directly to the "blocking publisher".
Wrapping blocking calls in reactor
We want to use the delay feature from ActiveMQ to delay particular events. How does AMQ_SCHEDULED_DELAY work internally? The documentation mentions a scheduler, but gives no information about the mechanism it uses to delay messages. For that reason we are not sure how delaying is going to affect ActiveMQ. Does ActiveMQ use polling or an asynchronous mechanism to achieve the delay?
I ask this question because people in my organization want to pick a different technology, and I have no proof that ActiveMQ's delay is any better.
Here is a link to the source code. I was thinking of looking through the code myself, but I'm not good at Java. Can anyone help?
The default implementation of ActiveMQ does use polling.
ActiveMQ internally keeps polling for scheduled (or delayed) messages using a background scheduler thread. This thread reads the list of scheduled events (or messages) and fires the jobs, rescheduling repeating jobs as needed before firing each job event.
The list of scheduled events is stored in sorted order in ActiveMQ's internal storage, so during a poll it only reads the events scheduled for the earliest processing. Since the messages are persisted during enqueuing, scheduling may not have a visible performance impact during processing.
However, before adopting it, you can set up your own benchmark, without worrying too much about internal implementation details, to verify that your performance/SLA requirements are being met.
For more details, you may refer to the Javadoc of the job scheduler API. For the default implementation, you can refer to the code.
Hope this helps.
In looking at the source code mentioned by @skadya, "polling" is not the term I would use. It appears to use the Java Object class's wait(long timeout) method to determine when to "wake up" the thread that runs the jobs.
So I wouldn't call it polling. I would call it an asynchronous mechanism in which the timeout is set so that the thread wakes up exactly when the next scheduled job is due to start.
Javadoc for Object.wait(long timeout)
Note that the implementation of Object.wait is native (i.e. non-Java), provided by the JDK/JRE/JVM for a given platform. For what that's worth.
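To make the distinction concrete, here is a minimal Python sketch of that wake-on-timeout pattern (an illustration of the mechanism only, not ActiveMQ's actual Java code, which uses Object.wait(long) in the same role):

import threading
import time

class DelayScheduler:
    # Toy wake-on-timeout scheduler: it sleeps exactly until the next
    # job is due instead of waking on a fixed polling interval.
    def __init__(self):
        self._jobs = []  # sorted list of (fire_time, callback) pairs
        self._cond = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def schedule(self, delay_seconds, callback):
        with self._cond:
            self._jobs.append((time.time() + delay_seconds, callback))
            self._jobs.sort(key=lambda job: job[0])
            self._cond.notify()  # wake the worker so it recomputes its timeout

    def _run(self):
        while True:
            with self._cond:
                if not self._jobs:
                    self._cond.wait()  # nothing scheduled: sleep until notified
                    continue
                timeout = self._jobs[0][0] - time.time()
                if timeout > 0:
                    # the analogue of Object.wait(timeout): sleep until the
                    # next job is due, or until a new job is scheduled
                    self._cond.wait(timeout)
                    continue
                _, callback = self._jobs.pop(0)
            callback()  # fire outside the lock

Delaying a message then amounts to something like scheduler.schedule(delay_ms / 1000.0, deliver); between jobs the thread is parked in wait(), consuming no CPU.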
It is possible to do a performance test with the ActiveMQ web console: there is an option to send messages with a configurable delay and a configurable number of messages. It doesn't answer my question, but it seems like the best option for comparing the two approaches.
I'm using Twisted's adbapi to asynchronously write items to an SQL database in the item pipeline. What happens if I insert a time.sleep(1000) in a runInteraction of Twisted's adbapi?
Does Twisted just see that the code is blocking and jump to doing something else until the block stops (i.e. can I do any blocking thing I want within runInteraction), or have I just made my code blocking?
runInteraction runs your interaction function in a thread from a thread pool, so it does not block the main reactor thread.
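A minimal sketch (the table, database file, and connection parameters are made up) showing that blocking inside runInteraction only ties up a pool thread:

import time

from twisted.enterprise import adbapi

dbpool = adbapi.ConnectionPool("sqlite3", "items.db", check_same_thread=False)

def store_item(txn, name):
    # This runs in a pool thread: blocking here (the query itself, or
    # even a time.sleep) stalls only this thread, not the reactor loop.
    time.sleep(1)
    txn.execute("INSERT INTO items (name) VALUES (?)", (name,))

d = dbpool.runInteraction(store_item, "example")
d.addCallback(lambda _: print("stored without blocking the reactor"))
# (reactor.run(), table creation, and error handling omitted for brevity)

One caveat: the pool has a bounded number of threads (cp_min/cp_max), so long blocks inside interactions will eventually make other interactions queue up behind them.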
I have a requirement where I need to make sure only one message is processed at a time by a Mule flow. The flow is triggered by a Quartz scheduler, which reads one file from an FTP server each time it fires.
My proposed solution is to keep a global variable FLOW_STATUS, which is set to RUNNING when a message is received and reset to STOPPED once processing of the message is done. Any message fed to the flow checks this variable and aborts if FLOW_STATUS is RUNNING.
This setup seems to be working, but I was wondering if there is a better way to do it. Are there any best practices around this, or any built-in Mule helper functions that achieve the same thing without relying on global variables?
It seems like a simpler solution would be to set maxActiveThreads for the flow to 1. In Mule, each message processed gets its own thread, so setting maxActiveThreads to 1 effectively makes your flow single-threaded. Other pending requests will wait in the receiver threads, so you will need to make sure your receiver thread pool is large enough to accommodate all of the potentially waiting threads. That may mean throttling back your Quartz scheduler to allow time to process the files, so the receiver thread pool doesn't fill up. For more information on thread pools and how to tune performance, here is a good link: http://www.mulesoft.org/documentation/display/current/Tuning+Performance
I'm using Django Celery with Redis to run a few tasks like this:
header = [
    tasks.invalidate_user.subtask(args=(user,)),
    tasks.invalidate_details.subtask(args=(user,)),
]
callback = tasks.rebuild.subtask()
chord(header)(callback)
So, basically the same as stated in the documentation.
My problem is that when this chord is called, the celery.chord_unlock task keeps retrying forever. The tasks in the header finish successfully, but because chord_unlock is never done, the callback is never called.
Guessing that my problem is that the finished header tasks are not being detected, I turned to the documentation to see how this can be customized. I found a section describing how the synchronization is implemented, with an example provided; what I'm missing is how that example function gets called (i.e. is there a signal for this?).
Further, there's a note saying that this method is not used with the Redis backend:
This is used by all result backends except Redis and Memcached, which increment a counter after each task in the header, then applying the callback when the counter exceeds the number of tasks in the set.
But it also says that the Redis approach is better:
The Redis and Memcached approach is a much better solution
What approach is that? How is it implemented?
So, why is chord_unlock never done, and how can I make it detect the finished header tasks?
I'm using: Django 1.4, celery 2.5.3, django-celery 2.5.5, redis 2.4.12
You don't have an example of your tasks, but I had the same problem and my solution might apply.
I had ignore_result=True on the tasks that I was adding to a chord, defined like so:
@task(ignore_result=True)
Apparently ignoring the result makes it so that chord_unlock doesn't know the header tasks are complete. After I removed ignore_result (even if the task only returns True), the chord called the callback properly.
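For reference, a minimal sketch of a chord-compatible header task (reusing invalidate_user from the question; the body is a placeholder):

from celery.task import task

@task  # no ignore_result=True, so the result is stored in the backend
def invalidate_user(user):
    # ... invalidate whatever is cached for this user ...
    return True  # any stored result lets chord_unlock see completion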
I had the same error. I changed the broker to RabbitMQ, and now chord_unlock keeps retrying until my tasks finish (2-3 minute tasks). When using Redis, the tasks finish but chord_unlock was only retried about 8-10 times, once per second, so the callback was not executed correctly.
[2012-08-24 16:31:05,804: INFO/MainProcess] Task celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] retry: Retry in 1s
[2012-08-24 16:31:06,817: INFO/MainProcess] Got task from broker: celery.chord_unlock[5a46e8ac-de40-484f-8dc1-7cf01693df7a] eta:[2012-08-24 16:31:07.815719-05:00]
... just like 8-10 times....
Changing the broker worked for me. Now I am testing @Chris's solution, but my callback function never receives the results from the header subtasks :S, so it does not work for me.
celery==3.0.6
django==1.4
django-celery==3.0.6
redis==2.6
broker: redis-2.4.16 on Mac OS X
This could be the cause of your problem. From the documentation:
Note:
If you are using chords with the Redis result backend and also overriding the Task.after_return() method, you need to make sure to call the super method or else the chord callback will not be applied.
def after_return(self, *args, **kwargs):
    do_something()
    super(MyTask, self).after_return(*args, **kwargs)
As I understand it, if you have overridden the after_return method in your task, it must either be removed or at least call the super method.
See the bottom of this page: http://celery.readthedocs.org/en/latest/userguide/canvas.html#important-notes