I started using the repeatable jobs in bull and everything seems to be working well for me. However, I noticed that each iteration leaves a job in Redis:
"test:foo:repeat:7a140b0cf5b3ee29cb164b7c9cc03bc3:1619132310000"
"test:foo:repeat:7a140b0cf5b3ee29cb164b7c9cc03bc3:1619132280000"
"test:foo:repeat:7a140b0cf5b3ee29cb164b7c9cc03bc3:1619132360000"
and the list keeps growing. I tried to do a job.remove() within the process function, but it threw an error... can't remove repeatable jobs I guess. I'm assuming these will eventually be cleaned up by Redis, but is there something more proactive I can do to keep Redis clean?
I needed to add the 'removeOnComplete' option to the jobOptions.
You can also clean up the completed jobs with:
queue.clean(0, 'completed');
Refer to https://github.com/OptimalBits/bull/issues/709#issuecomment-344561983
I'm new to Ignite and have a use case where I need to run a cleanup job. Ignite is embedded in our Spring Boot application, which runs as multiple instances. I'm thinking of having the job run on each instance, query only the local data, and clean that up. Do you see any issue with this? I'm also not sure how often Ignite reshuffles data.
Thanks
Shannon
You can surely do that.
With regard to data reshuffling, it only happens when a node is added to or removed from the cluster. However, the ignite.compute().affinityRun() family of calls guarantees that the code is run near the data.
Otherwise, you could do ignite.compute().broadcast() and only iterate on each affected cache's local entries. You don't have the aforementioned guarantee then, though.
When running Django/Celery/RabbitMQ on production server, some tasks are sent and consumed correctly. However, RabbitMQ starts using up all the CPU after processing is done. I believe this is related to the following report.
RabbitMQ on EC2 Consuming Tons of CPU
In that thread, it is suggested to set these config values:
CELERY_IGNORE_RESULT
CELERY_AMQP_TASK_RESULT_EXPIRES
I forked and customized the celery-haystack package to set both those values when calling apply_async(), however it seems to have had no effect.
I think Celery is creating a large number (one per task) of uid-named queues automatically to store results. But I don't seem to be able to stop it.
Any ideas?
I just spent a day digging into this problem myself. I think the two options you mentioned can be explained like this:
CELERY_IGNORE_RESULT: if True then the results of tasks will be ignored, hence they won't return anything where you call them with delay or apply_async.
CELERY_AMQP_TASK_RESULT_EXPIRES: the expiration time for a result stored in the result backend. You can set this option to a reasonable value so RabbitMQ can delete expired results.
The many queues generated are for storing results only. So in case you don't want to store any results, you can remove CELERY_RESULT_BACKEND option from your config file.
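As a rough sketch, the two options could sit in a Django settings module like this. The setting names are the ones from the question; the one-hour expiry is only an illustrative value, not a recommendation.

```python
# settings.py (sketch): old-style uppercase Celery settings, as named in the question.

# Do not store task return values at all.
CELERY_IGNORE_RESULT = True

# If results are stored in the AMQP backend, let them expire after this many
# seconds so RabbitMQ can delete the per-task result queues.
# 3600 is an assumed example value.
CELERY_AMQP_TASK_RESULT_EXPIRES = 3600

# To store no results at all, leave CELERY_RESULT_BACKEND out of this file.
```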
Have a nice day!
We are running Hangfire single threaded using BackgroundJobServerOptions.WorkerCount = 1 (because we have a requirement for ordered processing).
Most of the time this is no problem, but occasionally a job gets stuck for entirely expected reasons (e.g. the actual code it is running goes into an infinite loop), but because we are running single threaded this prevents other jobs in the queue from starting.
In order to try and work around this, we delete the job, but then it stays on the queue, blocking any other job from starting:
The only way I have found to resolve this is to drop and recreate the hangfire DB which is obviously not great.
Why does deleting a running job in Hangfire not also remove it from the queue? Is this weird delete behavior a bug which will be fixed in a later version, or is this behavior by design because we're running single threaded?
If this is by design then how do you cancel a processing job in a way which removes it from the queue?
Well it seems that this behavior is by design.
If the IIS app pool worker is recycled, Hangfire will start processing the next task immediately. However, without this restart Hangfire will "hang" indefinitely.
An issue was raised on github about this, however it has not been solved yet:
https://github.com/HangfireIO/Hangfire/issues/80
With no way to cancel or manually "fail" a job, this makes hangfire a lot less useful in a single threaded scenario.
Update: this has been partially or fully addressed in some later version of Hangfire.
I've got one specific job that seems to hang my celery workers every so often. I'm using rabbitmq as a broker. I've tried a couple things to fix this, to no avail:
Autoscaling the workers to allow the hung ones plenty of time to finish execution
Setting a global timeout
So I've come up a little short on what's causing this problem, and how I can fix it. Can anyone give me any pointers? The task in question is simply inserting a record into the database (MongoDB in this case.)
Update: I've added CELERYD_FORCE_EXECV. We'll see if that fixes it.
Update 2: nope!
A specific job making the child processes hang is often a symptom of IO that never completes, e.g. a web request or socket read without a timeout.
Most libraries support setting a timeout, but if not you can always use socket.setdefaulttimeout:
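import socket

import requests

# `task` is assumed to be your Celery task decorator (for example app.task).
@task
def http_get(url, timeout=1.0, retry_after=3.0, max_retries=None):
    # Remember the process-wide default so it can be restored afterwards.
    prev_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return requests.get(url)
    except socket.timeout as exc:
        # Re-queue the task instead of letting the worker hang.
        raise http_get.retry(exc=exc, countdown=retry_after, max_retries=max_retries)
    finally:
        socket.setdefaulttimeout(prev_timeout)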
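To see the effect of a timeout in isolation, here is a minimal standard-library sketch, not tied to Celery. It connects to a local listener that never sends any data, which is the kind of read that would otherwise block forever. The helper name read_with_timeout and the 0.2 second timeout are hypothetical choices for illustration.

```python
import socket


def read_with_timeout(host, port, timeout=0.2):
    """Return True if a byte arrived, False if the read timed out."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        try:
            return bool(conn.recv(1))
        except socket.timeout:
            # Without a timeout, recv() would block here indefinitely.
            return False


# A listening socket that accepts connections at the TCP level
# but never sends anything back.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
host, port = server.getsockname()

timed_out = not read_with_timeout(host, port)
server.close()
```

A task body that does its IO this way returns control to the worker after the timeout, so the task can fail or be retried instead of hanging the child process.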
You are most likely hitting an infinite loop bug in Celery / Kombu (see https://github.com/celery/celery/issues/3712) that was only fixed very recently and has not made it into a release yet. See https://github.com/celery/kombu/pull/760 for details. If you cannot use a repo build for your installation, a workaround is to either switch to Redis or set CELERY_WORKER_PREFETCH_MULTIPLIER=0 and run with -P solo for now.
I have a task that takes quite a long time. So I would like to let several programs/threads/computers execute the same task to speed things up. Each task requires unique ids which are stored in a db – so I thought these ids could be obtained like this:
NHibernateSession.Current.BeginTransaction(IsolationLevel.Serializable);
list = NHibernateSession.Current.CreateCriteria<RelevantId>().SetFirstResult(0).SetMaxResults(500).List<RelevantId>();
foreach (RelevantId x in list)
{
    RelevantIdsRepository.Delete(x);
}
NHibernateSession.Current.Transaction.Commit();
Unfortunately, this throws an exception after a while if several processes access the database (nr of deleted objects is not the same as batch size). Why is this? The isolation level of the db should be ok shouldn’t it? Thanks.
Best wishes,
Christian
I'm not sure that I understand what you are doing here. It looks like each process should take some ids and process them but no two processes should take the same.
It doesn't work the way you implemented it. All processes are reading the same ids. After the transaction is committed, they disappear from the database. Until then, they are visible to everyone. The isolation level only makes sure that other transactions can't read them after they have been deleted. But until then, they can all read them.
It's not so easy to distribute load. You could
maintain ids in a table where each process registers itself as the executor and commits that before starting (handling conflicts, e.g. StaleObjectStateException). Make sure to clean it up even when a process crashes.
write a central service which distributes ids.
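The first option can be sketched as follows. This is not NHibernate: it uses Python's sqlite3 module purely to illustrate the idea of claiming ids and committing the claim before processing. The table relevant_ids, the column claimed_by, and the function claim_ids are hypothetical names, and both workers share one in-memory connection only to keep the example self-contained.

```python
import sqlite3


def claim_ids(conn, worker, batch_size):
    """Mark up to batch_size unclaimed ids as owned by worker, commit, and return them."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE relevant_ids SET claimed_by = ? "
            "WHERE id IN (SELECT id FROM relevant_ids "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT ?)",
            (worker, batch_size),
        )
    # Returns every id currently claimed by this worker.
    rows = conn.execute(
        "SELECT id FROM relevant_ids WHERE claimed_by = ? ORDER BY id", (worker,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relevant_ids (id INTEGER PRIMARY KEY, claimed_by TEXT)")
conn.executemany("INSERT INTO relevant_ids (id) VALUES (?)", [(i,) for i in range(1, 7)])
conn.commit()

batch_a = claim_ids(conn, "worker-a", 3)
batch_b = claim_ids(conn, "worker-b", 3)
```

Because the claim is a single committed UPDATE, the second worker only sees ids that are still unclaimed. The sketch does not handle releasing claims after a crash, which a real setup would need.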
The slowness is possibly due to the fact that you execute multiple SQL statements in a loop.
You should check whether it is possible to delete all entities in one batch statement.
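As an illustration of the difference, the batch can be removed with one DELETE statement instead of one statement per row. Again this is a sqlite3 sketch rather than NHibernate, and the table name relevant_ids and the row counts are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relevant_ids (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO relevant_ids (id) VALUES (?)", [(i,) for i in range(1, 1001)]
)
conn.commit()

# One statement deletes the whole batch of 500 rows,
# instead of issuing 500 separate DELETEs in a loop.
cursor = conn.execute(
    "DELETE FROM relevant_ids WHERE id IN "
    "(SELECT id FROM relevant_ids ORDER BY id LIMIT 500)"
)
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM relevant_ids").fetchone()[0]
```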