I created a small application (Spring Boot and camunda) to process an order process. The Order-Service receives the new order via Rest and calls the Start Event of the BPMN Order workflow. The order process contains two asynchronous JMS calls (Customer check and Warehouse Stock check). If both checks return the order process should continue.
The Start event is called within a Spring Rest Controller:
ProcessInstance processInstance =
runtimeService.startProcessInstanceByKey("orderService", String.valueOf(order.getId()));
The Send Task (e.g. the customer check) sends the JMS message into a asynchronous queue.
The answer of this service is catched by a another Spring component which then trys to send an intermediate message:
runtimeService.createMessageCorrelation("msgReceiveCheckCustomerCredibility")
.processInstanceBusinessKey(response.getOrder().getBpmnBusinessKey())
.setVariable("resultOrderCheckCustomterCredibility", response)
.correlate();
I deactivated the warehouse service to see if the order process waits for the arrival of the second call, but instead I get this exception:
1115 06:33:08.564 WARN [o.c.b.e.jobexecutor] ENGINE-14006 Exception while executing job 67d2cc24-0769-11ea-933a-d89ef3425300:
org.springframework.messaging.MessageHandlingException: nested exception is org.camunda.bpm.engine.MismatchingMessageCorrelationException: ENGINE-13031 Cannot correlate a message with name 'msgReceiveCheckCustomerCredibility' to a single execution. 4 executions match the correlation keys: CorrelationSet [businessKey=1, processInstanceId=null, processDefinitionId=null, correlationKeys=null, localCorrelationKeys=null, tenantId=null, isTenantIdSet=false]
This is my process. I cannot see a way to post my bpmn file :-(
What can't it not correlate with the message name and the business key? The JMS queues are empty, there are other messages with the same businessKey waiting.
Thanks!
Just to narrow the problem: Do a runtimeService eventSubscription query before you try to correlate and check what subscriptions are actually waiting .. maybe you have a duplicate message name? Maybe you (accidentally) have another instance of the same process running? Once you identified the subscriptions, you could just notify the execution directly without using the correlation builder ...
I want to be able to delete all job in queue, but I don't know what queue is it. I'm in perform method of my worker and I need to get the "current queue", the queue where the current job is come from.
for this time I use :
require 'sidekiq/api'
queue = Sidekiq::Queue.new
queue.each do |job|
job.delete
end
because I just use "default queue", It's work.
But now I will use many queues and I can't specify only one queue for this worker because I need use a lots for a server load balancing.
So how I can get the queue where we are in perform method?
thx.
You can't by design, that's orthogonal context to the job. If your job needs to know a queue name, pass it explicitly as an argument.
This is much faster:
Sidekiq::Queue.new.clear
These docs show that you can access all running job information wwhich includes the jid (job ID) and queue name for each job
inside the perform method you have access to the jid with the jid accessor. From that you can find the current job and get the queue name
workers = Sidekiq::Workers.new
this_worker = workers.find { |_, _, work|
work['payload']['jid'] == jid
}
queue = this_worker[2]['queue']
however, the content of Sidekiq::Workers can be up to 5 seconds out of date, so you should only try this after your worker has been running at least 5 seconds, which may not be ideal
I am using celery with a rabbitmq backend. It is producing thousands of queues with 0 or 1 items in them in rabbitmq like this:
$ sudo rabbitmqctl list_queues
Listing queues ...
c2e9b4beefc7468ea7c9005009a57e1d 1
1162a89dd72840b19fbe9151c63a4eaa 0
07638a97896744a190f8131c3ba063de 0
b34f8d6d7402408c92c77ff93cdd7cf8 1
f388839917ff4afa9338ef81c28aad75 0
8b898d0c7c7e4be4aa8007b38ccc00ea 1
3fb4be51aaaa4ac097af535301084b01 1
This seems to be inefficient, but further I have observed that these queues persist long after processing is finished.
I have found the task that appears to be doing this:
#celery.task(ignore_result=True)
def write_pages(page_generator):
g = group(render_page.s(page) for page in page_generator)
res = g.apply_async()
for rendered_page in res:
print rendered_page # TODO: print to file
It seems that because these tasks are being called in a group, they are being thrown into the queue but never being released. However, I am clearly consuming the results (as I can view them being printed when I iterate through res. So, I do not understand why those tasks are persisting in the queue.
Additionally, I am wondering if the large number queues that are being created is some indication that I am doing something wrong.
Thanks for any help with this!
Celery with the AMQP backend will store task tombstones (results) in an AMQP queue named with the task ID that produced the result. These queues will persist even after the results are drained.
A couple recommendations:
Apply ignore_result=True to every task you can. Don't depend on results from other tasks.
Switch to a different backend (perhaps Redis -- it's more efficient anyway): http://docs.celeryproject.org/en/latest/userguide/tasks.html
Use CELERY_TASK_RESULT_EXPIRES (or on 4.1 CELERY_RESULT_EXPIRES) to have a periodic cleanup task remove old data from rabbitmq.
http://docs.celeryproject.org/en/master/userguide/configuration.html#std:setting-result_expires
First of all please don't consider this question as a duplicate of this question
I have a setup an environment which uses celery and redis as broker and result_backend. My question is how can I make sure that when the celery workers crash, all the scheduled tasks are re-tried, when the celery worker is back up.
I have seen advice on using CELERY_ACKS_LATE = True , so that the broker will re-drive the tasks until it get an ACK, but in my case its not working. Whenever I schedule a task its immediately goes to the worker which persists it until the scheduled time of execution. Let me give some example:
I am scheduling a task like this: res=test_task.apply_async(countdown=600) , but immediately in celery worker logs i can see something like : Got task from broker: test_task[a137c44e-b08e-4569-8677-f84070873fc0] eta:[2013-01-...] . Now when I kill the celery worker, these scheduled tasks are lost. My settings:
BROKER_URL = "redis://localhost:6379/0"
CELERY_ALWAYS_EAGER = False
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
CELERY_ACKS_LATE = True
Apparently this is how celery behaves.
When worker is abruptly killed (but dispatching process isn't), the message will be considered as 'failed' even though you have acks_late=True
Motivation (to my understanding) is that if consumer was killed by OS due to out-of-mem, there is no point in redelivering the same task.
You may see the exact issue here: https://github.com/celery/celery/issues/1628
I actually disagree with this behaviour. IMO it would make more sense not to acknowledge.
I've had the issue, where I was using some open-source C libraries that went totaly amok and crashed my worker ungraceful without throwing an exception. For any reason whatsoever, one can simply wrap the content of a task in a child process and check its status in the parent.
n = os.fork()
if n > 0: //inside the parent process
status = os.wait() //wait until child terminates
print("Signal number that killed the child process:", status[1])
if status[1] > 0: // if the signal was something other then graceful
// here one can do whatever they want, like restart or throw an Exception.
self.retry(exc=SomeException(), countdown=2 ** self.request.retries)
else: // here comes the actual task content with its respected return
return myResult // Make sure there are not returns in child and parent at the same time.
I need to design a Redis-driven scalable task scheduling system.
Requirements:
Multiple worker processes.
Many tasks, but long periods of idleness are possible.
Reasonable timing precision.
Minimal resource waste when idle.
Should use synchronous Redis API.
Should work for Redis 2.4 (i.e. no features from upcoming 2.6).
Should not use other means of RPC than Redis.
Pseudo-API: schedule_task(timestamp, task_data). Timestamp is in integer seconds.
Basic idea:
Listen for upcoming tasks on list.
Put tasks to buckets per timestamp.
Sleep until the closest timestamp.
If a new task appears with timestamp less than closest one, wake up.
Process all upcoming tasks with timestamp ≤ now, in batches (assuming
that task execution is fast).
Make sure that concurrent worker wouldn't process same tasks. At the same time, make sure that no tasks are lost if we crash while processing them.
So far I can't figure out how to fit this in Redis primitives...
Any clues?
Note that there is a similar old question: Delayed execution / scheduling with Redis? In this new question I introduce more details (most importantly, many workers). So far I was not able to figure out how to apply old answers here — thus, a new question.
Here's another solution that builds on a couple of others [1]. It uses the redis WATCH command to remove the race condition without using lua in redis 2.6.
The basic scheme is:
Use a redis zset for scheduled tasks and redis queues for ready to run tasks.
Have a dispatcher poll the zset and move tasks that are ready to run into the redis queues. You may want more than 1 dispatcher for redundancy but you probably don't need or want many.
Have as many workers as you want which do blocking pops on the redis queues.
I haven't tested it :-)
The foo job creator would do:
def schedule_task(queue, data, delay_secs):
# This calculation for run_at isn't great- it won't deal well with daylight
# savings changes, leap seconds, and other time anomalies. Improvements
# welcome :-)
run_at = time.time() + delay_secs
# If you're using redis-py's Redis class and not StrictRedis, swap run_at &
# the dict.
redis.zadd(SCHEDULED_ZSET_KEY, run_at, {'queue': queue, 'data': data})
schedule_task('foo_queue', foo_data, 60)
The dispatcher(s) would look like:
while working:
redis.watch(SCHEDULED_ZSET_KEY)
min_score = 0
max_score = time.time()
results = redis.zrangebyscore(
SCHEDULED_ZSET_KEY, min_score, max_score, start=0, num=1, withscores=False)
if results is None or len(results) == 0:
redis.unwatch()
sleep(1)
else: # len(results) == 1
redis.multi()
redis.rpush(results[0]['queue'], results[0]['data'])
redis.zrem(SCHEDULED_ZSET_KEY, results[0])
redis.exec()
The foo worker would look like:
while working:
task_data = redis.blpop('foo_queue', POP_TIMEOUT)
if task_data:
foo(task_data)
[1] This solution is based on not_a_golfer's, one at http://www.saltycrane.com/blog/2011/11/unique-python-redis-based-queue-delay/, and the redis docs for transactions.
You didn't specify the language you're using. You have at least 3 alternatives of doing this without writing a single line of code in Python at least.
Celery has an optional redis broker.
http://celeryproject.org/
resque is an extremely popular redis task queue using redis.
https://github.com/defunkt/resque
RQ is a simple and small redis based queue that aims to "take the good stuff from celery and resque" and be much simpler to work with.
http://python-rq.org/
You can at least look at their design if you can't use them.
But to answer your question - what you want can be done with redis. I've actually written more or less that in the past.
EDIT:
As for modeling what you want on redis, this is what I would do:
queuing a task with a timestamp will be done directly by the client - you put the task in a sorted set with the timestamp as the score and the task as the value (see ZADD).
A central dispatcher wakes every N seconds, checks out the first timestamps on this set, and if there are tasks ready for execution, it pushes the task to a "to be executed NOW" list. This can be done with ZREVRANGEBYSCORE on the "waiting" sorted set, getting all items with timestamp<=now, so you get all the ready items at once. pushing is done by RPUSH.
workers use BLPOP on the "to be executed NOW" list, wake when there is something to work on, and do their thing. This is safe since redis is single threaded, and no 2 workers will ever take the same task.
once finished, the workers put the result back in a response queue, which is checked by the dispatcher or another thread. You can add a "pending" bucket to avoid failures or something like that.
so the code will look something like this (this is just pseudo code):
client:
ZADD "new_tasks" <TIMESTAMP> <TASK_INFO>
dispatcher:
while working:
tasks = ZREVRANGEBYSCORE "new_tasks" <NOW> 0 #this will only take tasks with timestamp lower/equal than now
for task in tasks:
#do the delete and queue as a transaction
MULTI
RPUSH "to_be_executed" task
ZREM "new_tasks" task
EXEC
sleep(1)
I didn't add the response queue handling, but it's more or less like the worker:
worker:
while working:
task = BLPOP "to_be_executed" <TIMEOUT>
if task:
response = work_on_task(task)
RPUSH "results" response
EDit: stateless atomic dispatcher :
while working:
MULTI
ZREVRANGE "new_tasks" 0 1
ZREMRANGEBYRANK "new_tasks" 0 1
task = EXEC
#this is the only risky place - you can solve it by using Lua internall in 2.6
SADD "tmp" task
if task.timestamp <= now:
MULTI
RPUSH "to_be_executed" task
SREM "tmp" task
EXEC
else:
MULTI
ZADD "new_tasks" task.timestamp task
SREM "tmp" task
EXEC
sleep(RESOLUTION)
If you're looking for ready solution on Java. Redisson is right for you. It allows to schedule and execute tasks (with cron-expression support) in distributed way on Redisson nodes using familiar ScheduledExecutorService api and based on Redis queue.
Here is an example. First define a task using java.lang.Runnable interface. Each task can access to Redis instance via injected RedissonClient object.
public class RunnableTask implements Runnable {
#RInject
private RedissonClient redissonClient;
#Override
public void run() throws Exception {
RMap<String, Integer> map = redissonClient.getMap("myMap");
Long result = 0;
for (Integer value : map.values()) {
result += value;
}
redissonClient.getTopic("myMapTopic").publish(result);
}
}
Now it's ready to sumbit it into ScheduledExecutorService:
RScheduledExecutorService executorService = redisson.getExecutorService("myExecutor");
ScheduledFuture<?> future = executorService.schedule(new CallableTask(), 10, 20, TimeUnit.MINUTES);
future.get();
// or cancel it
future.cancel(true);
Examples with cron expressions:
executorService.schedule(new RunnableTask(), CronSchedule.of("10 0/5 * * * ?"));
executorService.schedule(new RunnableTask(), CronSchedule.dailyAtHourAndMinute(10, 5));
executorService.schedule(new RunnableTask(), CronSchedule.weeklyOnDayAndHourAndMinute(12, 4, Calendar.MONDAY, Calendar.FRIDAY));
All tasks are executed on Redisson node.
A combined approach seems plausible:
No new task timestamp may be less than current time (clamp if less). Assuming reliable NTP synch.
All tasks go to bucket-lists at keys, suffixed with task timestamp.
Additionally, all task timestamps go to a dedicated zset (key and score — timestamp itself).
New tasks are accepted from clients via separate Redis list.
Loop: Fetch oldest N expired timestamps via zrangebyscore ... limit.
BLPOP with timeout on new tasks list and lists for fetched timestamps.
If got an old task, process it. If new — add to bucket and zset.
Check if processed buckets are empty. If so — delete list and entrt from zset. Probably do not check very recently expired buckets, to safeguard against time synchronization issues. End loop.
Critique? Comments? Alternatives?
Lua
I made something similar to what's been suggested here, but optimized the sleep duration to be more precise. This solution is good if you have few inserts into the delayed task queue. Here's how I did it with a Lua script:
local laterChannel = KEYS[1]
local nowChannel = KEYS[2]
local currentTime = tonumber(KEYS[3])
local first = redis.call("zrange", laterChannel, 0, 0, "WITHSCORES")
if (#first ~= 2)
then
return "2147483647"
end
local execTime = tonumber(first[2])
local event = first[1]
if (currentTime >= execTime)
then
redis.call("zrem", laterChannel, event)
redis.call("rpush", nowChannel, event)
return "0"
else
return tostring(execTime - currentTime)
end
It uses two "channels". laterChannel is a ZSET and nowChannel is a LIST. Whenever it's time to execute a task, the event is moved from the the ZSET to the LIST. The Lua script with respond with how many MS the dispatcher should sleep until the next poll. If the ZSET is empty, sleep forever. If it's time to execute something, do not sleep(i e poll again immediately). Otherwise, sleep until it's time to execute the next task.
So what if something is added while the dispatcher is sleeping?
This solution works in conjunction with key space events. You basically need to subscribe to the key of laterChannel and whenever there is an add event, you wake up all the dispatcher so they can poll again.
Then you have another dispatcher that uses the blocking left pop on nowChannel. This means:
You can have the dispatcher across multiple instances(i e it's scaling)
The polling is atomic so you won't have any race conditions or double events
The task is executed by any of the instances that are free
There are ways to optimize this even more. For example, instead of returning "0", you fetch the next item from the zset and return the correct amount of time to sleep directly.
Expiration
If you can not use Lua scripts, you can use key space events on expired documents.
Subscribe to the channel and receive the event when Redis evicts it. Then, grab a lock. The first instance to do so will move it to a list(the "execute now" channel). Then you don't have to worry about sleeps and polling. Redis will tell you when it's time to execute something.
execute_later(timestamp, eventId, event) {
SET eventId event EXP timestamp
SET "lock:" + eventId, ""
}
subscribeToEvictions(eventId) {
var deletedCount = DEL eventId
if (deletedCount == 1) {
// move to list
}
}
This however has it own downsides. For example, if you have many nodes, all of them will receive the event and try to get the lock. But I still think it's overall less requests any anything suggested here.