I have a number of Celery tasks that currently use a non-blocking Redis lock to ensure atomic edits of a cluster of db tables for a particular user. The timing of the tasks is unpredictable and the frequency is high but sporadic. For example, some are triggered by webhooks, others by user interaction.
Right now I am using autoretry_for and retry_backoff to respond to failures to acquire the non-blocking lock. This is sub-optimal since it leaves lots of idle time once the lock has been released. I can't use a blocking lock because I need to let other tasks that don't require the lock to still run.
What I need is a way to re-run any tasks that failed to acquire the non-blocking lock as soon as possible after the lock is released. It doesn't matter if some handful of non-locking tasks are run in the meantime, I just need the tasks that failed to acquire to run reasonably soon but certainly without idle time.
How can this be done? It's as if I want to add the tasks to some kind of global Celery chain, but post hoc, and then fire any waiting chains every time the non-blocking lock is released. Or something like that. The user each task works on is calculated from the task's arguments rather than passed in directly (so key-based queues won't work).
I learned that when an interrupt occurs, the process goes to the ready queue rather than going through the blocked queue. However, in this picture, the interrupted process has moved to the blocked queue (the pink circle). I'm confused about which case goes to the ready queue and which goes to the blocked queue.
Process management in general is much more complex than this. A task is often tied to one specific processor core. Several tasks are tied to the same processor core and each of these tasks can be blocked waiting for IO. It means that any task can be interrupted at any time by an interrupt triggered by a device controller even if the task currently running on the core had nothing to do with that specific interrupt.
The diagram is thus incomplete. It doesn't take into account the complete process lifecycle. In your diagram, the process goes on the blocked queue if it is waiting for IO (after a syscall like read()). It goes to the ready queue if it was preempted by the kernel so that another process can have some time on that core.
I think people often have the misconception that each process will run all the time until completion. It cannot be that way, otherwise most processes would never get time on any core. Instead, if the number of processes is higher than the number of cores, the kernel uses the per-core local APIC's timer (the local APIC is on x86-64, but you will have similar mechanisms on every architecture) to give every process tied to that core a time slice. When a certain process is scheduled for a certain core, the kernel starts the timer with its time slice. When the time slice has elapsed, the local APIC triggers an interrupt letting the kernel know that another process should be scheduled on that core. This is why a process can be preempted in the middle of its execution. The process is still considered to be ready to run. It is simply that its time slice was exhausted, so the kernel decides to give some time to another process. The preempted process will be given some more time later. Since, in human terms, the time slice of each process is very short, it gives the impression that each process is running continuously without interruption when that is not really the case. (By the way, this diagram is very Linux kernel specific.)
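To make the two cases concrete, here is a minimal Python sketch of a toy single-core scheduler. It is only an illustration of the idea described above, not how any real kernel is implemented: the function name simulate, the "cpu"/"io" step labels, and the assumption that an I/O request completes during the next scheduling turn are all made up for this example.

```python
from collections import deque


def simulate(processes, time_slice):
    """Toy single-core scheduler; returns the ordered list of state transitions.

    processes maps a name to its remaining steps, each either "cpu" or "io".
    """
    ready = deque(processes)   # runnable, waiting for a turn on the core
    blocked = deque()          # waiting for I/O to complete
    steps = {name: deque(todo) for name, todo in processes.items()}
    log = []

    while ready or blocked:
        # Assumption: an I/O request finishes during the next scheduling turn.
        waking = list(blocked)
        blocked.clear()

        if ready:
            name = ready.popleft()
            used = 0
            while True:
                if not steps[name]:
                    log.append((name, "exit"))
                    break
                if steps[name][0] == "io":
                    # Blocking syscall: the process leaves the core voluntarily.
                    steps[name].popleft()
                    blocked.append(name)
                    log.append((name, "syscall -> blocked"))
                    break
                if used == time_slice:
                    # Timer interrupt: still runnable, so back to the ready queue.
                    ready.append(name)
                    log.append((name, "preempted -> ready"))
                    break
                steps[name].popleft()
                used += 1

        for name in waking:
            ready.append(name)
            log.append((name, "io done -> ready"))

    return log
```

With a process A that needs three CPU steps, a process B that does CPU, then I/O, then CPU, and a time slice of 2, A is preempted and goes back to the ready queue, while B goes to the blocked queue only because it issued the I/O request.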
I have a question about the following diagram from Operating Systems Concepts: http://unboltingbinary.in/wp-content/uploads/2015/04/image028.jpg
This diagram seems to imply that after every I/O operation, the process is placed back on the ready queue before being sent to the CPU again. However, is it possible for a process to terminate after I/O but before being sent to the ready queue?
Suppose we have a program that computes a number and then writes it to storage. In this case, does the process really need to return to the CPU after the I/O operation? It seems to me that the process should be allowed to terminate right after I/O. That way, there would be no need for a context switch.
Once one process has successfully executed a termination request on another, the threads of the terminated process should never run again, no matter what state they were in - blocked on I/O, blocked on inter-thread comms, running on a core, sleeping, whatever - they all must be stopped immediately if running and all be put in a state where they will never run again.
Anything else would be a security issue: terminated threads should not be given execution at all (otherwise it may not be possible to terminate the process).
Process termination requires the cpu. Changes to kernel mode structures on process exit, returning memory resources, etc. all require the cpu.
A process does not simply evaporate. The term you want here is process rundown, I think.
In Mike Ash's GCD article, he mentions: "Custom queues can be used as a synchronization mechanism in place of locks."
Questions:
1) How does dispatch_barrier_async work differently from dispatch_async? Doesn't dispatch_async achieve the same function as dispatch_barrier_async synchronization wise?
2) Is a custom queue the only option? Can't we use the main queue for synchronization purposes?
First, whether a call to submit a task to a queue is _sync or _async does not in any way affect whether the task is synchronized with other threads or tasks. It only affects whether the caller is blocked until the task completes executing or if it can continue on. The _sync stands for "synchronous" and _async stands for "asynchronous", which sound similar to but are different from "synchronized" and "unsynchronized". The former have nothing to do with thread safety, while the latter are crucial.
You can use a serial queue for synchronizing access to shared data structures. A serial queue only executes one task at a time. So, if all tasks which touch a given data structure are submitted to the same serial queue, then they will never be executing simultaneously and their accesses to the data structure will be safe.
The main queue is a serial queue, so it has this same property. However, any long-running task submitted to the main queue will block user interaction. If the tasks don't have to interact with the GUI or have a similar requirement that they run on the main thread, it's better to use a custom serial queue.
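The serial-queue idea above can be sketched in Python as an analogy. This is not the GCD API: a ThreadPoolExecutor with a single worker stands in for a custom serial queue, and the names SerializedCounter, increment_async, read_sync and run_demo are hypothetical. Submitting without waiting plays the role of the _async call, and waiting on the result plays the role of the _sync call; in both cases the work itself is serialized by the queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedCounter:
    """Shared state that is only ever touched from one single-worker queue."""

    def __init__(self):
        self.value = 0
        # One worker means tasks run one at a time, in submission order.
        self._queue = ThreadPoolExecutor(max_workers=1)

    def increment_async(self):
        # "async": the caller continues immediately; the task runs later.
        return self._queue.submit(self._increment)

    def _increment(self):
        self.value += 1  # safe: never runs concurrently with another task

    def read_sync(self):
        # "sync": the caller blocks until the queued read has executed.
        return self._queue.submit(lambda: self.value).result()

    def close(self):
        self._queue.shutdown(wait=True)


def run_demo(n_threads=4, per_thread=250):
    counter = SerializedCounter()

    def producer():
        for _ in range(per_thread):
            counter.increment_async()

    threads = [threading.Thread(target=producer) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The read is queued behind every increment, so it sees the full total.
    total = counter.read_sync()
    counter.close()
    return total
```

Many threads submit work, but no lock is needed around the counter because only the queue's single worker ever modifies it.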
It's also possible to achieve synchronization using a custom concurrent queue if you use the barrier routines. dispatch_barrier_async() is different from dispatch_async() in that the queue temporarily becomes a serial queue, more or less. When the barrier task reaches the head of the queue, it is not started until all previous tasks in that queue have completed. Once they do, the barrier task is executed. Until the barrier task completes, the queue will not start any subsequent tasks that it holds.
Non-barrier tasks submitted to a concurrent queue may run simultaneously with one another, which means they are not synchronized and, if they access shared data structures, they can corrupt that data structure or get incorrect results, etc.
The barrier routines are useful for read-write synchronization. It is usually safe for multiple threads to be reading from a data structure simultaneously, so long as no thread is trying to modify (write to) the data structure at the same time. A task that modifies or writes to the data structure must not run simultaneously with either readers or other writers. This can be achieved by submitting read tasks as non-barrier tasks to a given queue and submitting write tasks as barrier tasks to that same queue.
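The ordering rule for barrier tasks can be modelled with a small Python function. This is a toy model of the behaviour described above, not GCD itself: the function name concurrency_batches and the (name, is_barrier) tuples are invented for the illustration. It groups submitted tasks into batches, where tasks in the same batch may overlap with each other and batches run strictly one after another.

```python
def concurrency_batches(tasks):
    """Group tasks into batches that a concurrent queue could run together.

    tasks is a list of (name, is_barrier) pairs in submission order.
    A barrier task waits for everything before it and holds back everything
    after it, so it always ends up alone in its own batch.
    """
    batches = []
    current = []  # non-barrier tasks that may run simultaneously
    for name, is_barrier in tasks:
        if is_barrier:
            if current:
                batches.append(current)
                current = []
            batches.append([name])
        else:
            current.append(name)
    if current:
        batches.append(current)
    return batches
```

Submitting reads as non-barrier tasks and writes as barrier tasks then gives the read-write pattern: reads before a write may overlap with each other, the write runs alone, and later reads start only after it finishes.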
Scenario:
We have a wcf workflow with a client that does NOT use transactionflow.
The workflow contains several sequential TransactedReceiveScopes (using content-based correlation).
The TransactedReceiveScopes contain custom db operations.
Observations:
When we run SQL profiler against the first call, we see all the custom db calls, and the SaveInstance call in the profile trace.
We've noticed that, even though the SendReply is at the very end of the TransactedReceiveScope, sometimes the SendReply occurs a good 10 seconds before the transaction gets committed.
We tried changing the TimeToPersist and TimeToUnload to zero, but that had no effect. (The trace shows the SaveInstance happening immediately anyway; it is the commit that seems to be delayed.)
Questions:
Are our observations correct?
At what point is the transaction committed? Is this like garbage collection - i.e. it commits some time later when it's not busy?
Is there any way to control the commit delay, or is the only way to do this to use transactionflow from the client (and then it should all commit when the client commits, including the persist)?
The TransactedReceiveScope commits the transaction when the body is completed, but as all execution is done through the scheduler, that could be some time later. It is not related to garbage collection, and there is no real way to influence it other than to avoid a busy machine and a lot of other parallel activities that could also be in the execution queue.
Can someone tell me what the statuses mean in SQL Server's sp_who command? Why might a spid be suspended? What does it mean to be "runnable"?
Thanks!
The answer is pretty easy to find online:
dormant. SQL Server is resetting the session.
running. The session is running one or more batches. When Multiple Active Result Sets (MARS) is enabled, a session can run multiple batches. For more information, see Using Multiple Active Result Sets (MARS).
background. The session is running a background task, such as deadlock detection.
rollback. The session has a transaction rollback in process.
pending. The session is waiting for a worker thread to become available.
runnable. The session's task is in the runnable queue of a scheduler while waiting to get a time quantum.
spinloop. The session's task is waiting for a spinlock to become free.
suspended. The session is waiting for an event, such as I/O, to complete.
I believe part of the confusion is that statuses outside the list above also show up. Three that come to mind are:
Sleeping
Awaiting Command
Other