After a deadlock, why should application code wait before retrying?

After a deadlock, why should application code wait before retrying? - sql

I'm wanting to write some code to detect deadlocks, and if they occur, retry whatever DB operation was attempted up to n times. I've noticed that people often add a time delay in between retries. Here's some C# code to clarify what I mean:
void RetryIfDeadlocks(Action dbOperation, int maximumRetries)
{
try
{
dbOperation();
}
catch (DeadlockException)
{
var shouldRetry = maximumRetries > 0;
if (shouldRetry)
{
Task.Delay(millisecondsDelay: 300).Wait();
RetryIfDeadlocks(dbOperation, maximumRetries - 1);
}
else
throw;
}
}
Why should such retry logic include such a time delay between retries?

Without a delay, deadlock retries could 'slam' the network/disk/database with activity until the loop stops. It's much better to place a small delay in the loop, allowing other traffic to get through first (which may in fact be necessary to resolve the deadlock), before trying again.

Waiting is not necessary to make progress. The conflicting transaction that survived was likely granted the lock that the conflicting transactions contended on.
On the other hand, the other transaction is likely still active and might be doing similar things. Another deadlock is likely. After a small delay, the other transaction(s) are probably one or are doing other things now.
A retry makes succeeding in the second try more likely. It is not required for correctness.

Related

How can I tell selenide to refesh a page in between retries without raising an error?

I want to test a feature that requires a pageload to be updated. Since selenide only checks if the element has appeared without reloading the page, it will not pick up on the updated state.
I am currently doing it like this:
element(byAttribute("type", "submit")).click()
try {
element(byText("some specific text")).shouldBe(visible)
} catch (e: com.codeborne.selenide.ex.ElementNotFound){
refresh()
element(byText("some specific text")).shouldBe(visible)
}
This is bad in multiple ways:
it catches an error, which is not intended to be caught (otherwise it would have been an exception)
it triggers side effects, like creating error-screenshots
it only retries exactly once
it waits the full first timeout, which may not even be needed. e.g. maybe it could have refreshed immediately then be successful
in case you are wondering: the feature in question is related to a cache being warm or cold. Forcing the cache to be cold at the beginning of the test would probably be possible, but it would also add a lot of complexity (e.g. interacting directly with the database), which I would like to avoid.

I'm not familiar with selenide, but I think you can fix at least several problems you listed with the following changes:
Put the try-catch block inside a while true loop so as if the element found visible break is applied, otherwise continue looping.
In case you want to avoid infinite loop add some counter there so that in case after X iterations the element was not found visible - leave it.
Decrease the timeout for waiting for the expected condition. You did not share here where you define if, so I don't know how to change that.
The first paragraph changes could be something like the following:
element(byAttribute("type", "submit")).click()
for(int i=0;i<100;i++){
try {
element(byText("some specific text")).shouldBe(visible)
break
} catch (e: com.codeborne.selenide.ex.ElementNotFound){
refresh()
element(byText("some specific text")).shouldBe(visible)
}
}

Implementing Counting Semaphore using test&set without disabling interrupts

I am trying to implement Counting semaphore using Test&Set.How can I solve the problem of deadlock in this code without disabling interrupts? Is it the case that no other solution is possible?
void wait(semaphore *s){
while(test_and_set(&lock_wait)):
s->val--;
if(s->val<0){
s->queue.enque(This_process);
block();
}
lock_wait = false;
}
void signal(semaphore *s)
{
while(test_and_set(&lock_signal));
s->val++;
if(s->val <= 0)
{
process p = s->queue.dequeue();
wakeup(p);
}
lock_signal = false;
}

Please comment your code. And one more thing what do you got as output while implementing ? Add the output here, also ,bro !
While it is true that turning interrupts off on one processor is insufficient to guarantee atomic memory access in a multiprocessor system (because, as you mention, threads on other processors can still access shared resources), we turn interrupts off for part of the multiprocessor semaphore implementation because we do not want to be descheduled while we are doing a test and set.
If a thread holding the test and set is descheduled, no other threads can do anything with the semaphore (because its count is protected by that test and set) the thread was using while it's asleep (this is not good). In order to guarantee that this doesn't happen we'll turn interrupts on our processor off while using the test and set.

How to implement prioritized lock with only compare_and_swap?

Given only compare and swap, I know how to implement a lock.
However, how do I implement a spin lock
1) multiple threads can block on it while trying to lock
2) and then the threads are un-blocked (and acquire the lock) in the order that they blocked on it?
Is it even possible? If not, what other primitives do I need?
If so, how do I do it?
Thanks!

You are going to need a list for the waiting threads. You need to add and remove items from the list in a thread safe manner. You will need to be able to sleep threads that fail to acquire the lock. You will need to be able to wake 1 thread when the lock becomes available. In linux you can accomplish the sleep and wait thing by having the thread wait on a signal.
Now there is a lazy way to do this, you might not need to care about waking threads. Here is pseudo code for our skiplist. This is what we do to add an item.
cFails = 0
while (1) {
NewState = OldState = State;
if (cFails > 3 || OldState.Lock) {
sleep(); // not too sophisticated, because they cant be awoken
cFails = 0;
continue;
}
Look for item in skiplist
return item if we found it
// to add the item to the list we need to lock it
// ABA lock uses a version number
NewState.Lock=1;
NewState.nVer++;
if (!CAS(&State,OldState, NewState)) {
++cFails;
continue;
}
// if the thread gets preempted right here, the lock is left on, and other threads
// spinning would waste their entire time slice.
// unlock
OldState = NewState;
NewState.Lock = 0;
NewState.nVer++;
CAS(&State, OldState,NewState);
}
We expect the skiplist to usually find the item and only rarely have to add it. We rarely have a race to add, even with a lot of threads. We tested this with a worst case scenario consisting of lots of threads adding and searching for millions of items to a single list. The result is we rarely saw threads fail to get the lock. So the simple approach that is high performance for the expected case works for us. There is one bad thing that can happen - a thread gets preempted holding the lock. Thats when cFails > 3 catches this and sleeps waiting threads so we don't waste their timeslices with a million useless spins. So cFails is set high enough that it detects that the owner of the lock is not active.

Keeping time using timer interrupts an embedded microcontroller

This question is about programming small microcontrollers without an OS. In particular, I'm interested in PICs at the moment, but the question is general.
I've seen several times the following pattern for keeping time:
Timer interrupt code (say the timer fires every second):
...
if (sec_counter > 0)
sec_counter--;
...
Mainline code (non-interrupt):
sec_counter = 500; // 500 seconds
while (sec_counter)
{
// .. do stuff
}
The mainline code may repeat, set the counter to various values (not just seconds) and so on.
It seems to me there's a race condition here when the assignment to sec_counter in the mainline code isn't atomic. For example, in PIC18 the assignment is translated to 4 ASM statements (loading each byte at the time and selecting the right byte from the memory bank before that). If the interrupt code comes in the middle of this, the final value may be corrupted.
Curiously, if the value assigned is less than 256, the assignment is atomic, so there's no problem.
Am I right about this problem?
What patterns do you use to implement such behavior correctly? I see several options:
Disable interrupts before each assignment to sec_counter and enable after - this isn't pretty
Don't use an interrupt, but a separate timer which is started and then polled. This is clean, but uses up a whole timer (in the previous case the 1-sec firing timer can be used for other purposes as well).
Any other ideas?

The PIC architecture is as atomic as it gets. It ensures that all read-modify-write operations to a memory file are 'atomic'. Although it takes 4-clocks to perform the entire read-modify-write, all 4-clocks are consumed in a single instruction and the next instruction uses the next 4-clock cycle. It is the way that the pipeline works. In 8-clocks, two instructions are in the pipeline.
If the value is larger than 8-bit, it becomes an issue as the PIC is an 8-bit machine and larger operands are handled in multiple instructions. That will introduce atomic issues.

You definitely need to disable the interrupt before setting the counter. Ugly as it may be, it is necessary. It is a good practice to ALWAYS disable the interrupt before configuring hardware registers or software variables affecting the ISR method. If you are writing in C, you should consider all operations as non-atomic. If you find that you have to look at the generated assembly too many times, then it may be better to abandon C and program in assembly. In my experience, this is rarely the case.
Regarding the issue discussed, this is what I suggest:
ISR:
if (countDownFlag)
{
sec_counter--;
}
and setting the counter:
// make sure the countdown isn't running
sec_counter = 500;
countDownFlag = true;
...
// Countdown finished
countDownFlag = false;
You need an extra variable and is better to wrap everything in a function:
void startCountDown(int startValue)
{
sec_counter = 500;
countDownFlag = true;
}
This way you abstract the starting method (and hide ugliness if needed). For example you can easily change it to start a hardware timer without affecting the callers of the method.

Write the value then check that it is the value required would seem to be the simplest alternative.
do {
sec_counter = value;
} while (sec_counter != value);
BTW you should make the variable volatile if using C.
If you need to read the value then you can read it twice.
do {
value = sec_counter;
} while (value != sec_counter);

Because accesses to the sec_counter variable are not atomic, there's really no way to avoid disabling interrupts before accessing this variable in your mainline code and restoring interrupt state after the access if you want deterministic behavior. This would probably be a better choice than dedicating a HW timer for this task (unless you have a surplus of timers, in which case you might as well use one).

If you download Microchip's free TCP/IP Stack there are routines in there that use a timer overflow to keep track of elapsed time. Specifically "tick.c" and "tick.h". Just copy those files over to your project.
Inside those files you can see how they do it.

It's not so curious about the less than 256 moves being atomic - moving an 8 bit value is one opcode so that's as atomic as you get.
The best solution on such a microcontroller as the PIC is to disable interrupts before you change the timer value. You can even check the value of the interrupt flag when you change the variable in the main loop and handle it if you want. Make it a function that changes the value of the variable and you could even call it from the ISR as well.

Well, what does the comparison assembly code look like?
Taken to account that it counts downwards, and that it's just a zero compare, it should be safe if it first checks the MSB, then the LSB. There could be corruption, but it doesn't really matter if it comes in the middle between 0x100 and 0xff and the corrupted compare value is 0x1ff.

The way you are using your timer now, it won't count whole seconds anyway, because you might change it in the middle of a cycle.
So, if you don't care about it. The best way, in my opinion, would be to read the value, and then just compare the difference. It takes a couple of OPs more, but has no multi-threading problems.(Since the timer has priority)
If you are more strict about the time value, I would automatically disable the timer once it counts down to 0, and clear the internal counter of the timer and activate once you need it.

Move the code portion that would be on the main() to a proper function, and have it conditionally called by the ISR.
Also, to avoid any sort of delaying or missing ticks, choose this timer ISR to be a high-prio interrupt (the PIC18 has two levels).

One approach is to have an interrupt keep a byte variable, and have something else which gets called at least once every 256 times the counter is hit; do something like:
// ub==unsigned char; ui==unsigned int; ul==unsigned long
ub now_ctr; // This one is hit by the interrupt
ub prev_ctr;
ul big_ctr;
void poll_counter(void)
{
ub delta_ctr;
delta_ctr = (ub)(now_ctr-prev_ctr);
big_ctr += delta_ctr;
prev_ctr += delta_ctr;
}
A slight variation, if you don't mind forcing the interrupt's counter to stay in sync with the LSB of your big counter:
ul big_ctr;
void poll_counter(void)
{
big_ctr += (ub)(now_ctr - big_ctr);
}

No one addressed the issue of reading multibyte hardware registers (for example a timer.
The timer could roll over and increment its second byte while you're reading it.
Say it's 0x0001ffff and you read it. You might get 0x0010ffff, or 0x00010000.
The 16 bit peripheral register is volatile to your code.
For any volatile "variables", I use the double read technique.
do {
t = timer;
} while (t != timer);

Precisely time a function call

I am using a microcontroller with a C51 core. I have a fairly timeconsuming and large subroutine that needs to be called every 500ms. An RTOS is not being used.
The way I am doing it right now is that I have an existing Timer interrupt of 10 ms. I set a flag after every 50 interrupts that is checked for being true in the main program loop. If the Flag is true the subroutine is called. The issue is that by the time the program loop comes round to servicing the flag, it is already more than 500ms,sometimes even >515 ms in case of certain code paths. The time taken is not accurately predictable.
Obviously, the subroutine cannot be called from inside the timer interrupt due to that large time it takes to execute.The subroutine takes 50ms to 89ms depending upon various conditions.
Is there a way to ensure that the subroutine is called in exactly 500ms each time?

I think you have some conflicting/not-thought-through requirements here. You say that you can't call this code from the timer ISR because it takes too long to run (implying that it is a lower-priority than something else which would be delayed), but then you are being hit by the fact that something else which should have been lower-priority is delaying it when you run it from the foreground path ('program loop').
If this work must happen at exactly 500ms, then run it from the timer routine, and deal with the fall-out from that. This is effectively what a pre-emptive RTOS would be doing anyway.
If you want it to run from the 'program loop', then you will have to make sure than nothing else which runs from that loop ever takes more than the maximum delay you can tolerate - often that means breaking your other long-running work into state-machines which can do a little bit of work per pass through the loop.

I don't think there's a way to guarantee it but this solution may provide an acceptable alternative.
Might I suggest not setting a flag but instead modifying a value?
Here's how it could work.
1/ Start a value at zero.
2/ Every 10ms interrupt, increase this value by 10 in the ISR (interrupt service routine).
3/ In the main loop, if the value is >= 500, subtract 500 from the value and do your 500ms activities.
You will have to be careful to watch for race conditions between the timer and main program in modifying the value.
This has the advantage that the function runs as close as possible to the 500ms boundaries regardless of latency or duration.
If, for some reason, your function starts 20ms late in one iteration, the value will already be 520 so your function will then set it to 20, meaning it will only wait 480ms before the next iteration.
That seems to me to be the best way to achieve what you want.
I haven't touched the 8051 for many years (assuming that's what C51 is targeting which seems a safe bet given your description) but it may have an instruction which will subtract 50 without an interrupt being possible. However, I seem to remember the architecture was pretty simple so you may have to disable or delay interrupts while it does the load/modify/store operation.
volatile int xtime = 0;
void isr_10ms(void) {
xtime += 10;
}
void loop(void) {
while (1) {
/* Do all your regular main stuff here. */
if (xtime >= 500) {
xtime -= 500;
/* Do your 500ms activity here */
}
}
}

You can also use two flags - a "pre-action" flag, and a "trigger" flag (using Mike F's as a starting point):
#define PREACTION_HOLD_TICKS (2)
#define TOTAL_WAIT_TICKS (10)
volatile unsigned char pre_action_flag;
volatile unsigned char trigger_flag;
static isr_ticks;
interrupt void timer0_isr (void) {
isr_ticks--;
if (!isr_ticks) {
isr_ticks=TOTAL_WAIT_TICKS;
trigger_flag=1;
} else {
if (isr_ticks==PREACTION_HOLD_TICKS)
preaction_flag=1;
}
}
// ...
int main(...) {
isr_ticks = TOTAL_WAIT_TICKS;
preaction_flag = 0;
tigger_flag = 0;
// ...
while (1) {
if (preaction_flag) {
preaction_flag=0;
while(!trigger_flag)
;
trigger_flag=0;
service_routine();
} else {
main_processing_routines();
}
}
}

A good option is to use an RTOS or write your own simple RTOS.
An RTOS to meet your needs will only need to do the following:
schedule periodic tasks
schedule round robin tasks
preform context switching
Your requirements are the following:
execute a periodic task every 500ms
in the extra time between execute round robin tasks ( doing non-time critical operations )
An RTOS like this will guarantee a 99.9% chance that your code will execute on time. I can't say 100% because whatever operations your do in your ISR's may interfere with the RTOS. This is a problem with 8-bit micro-controllers that can only execute one instruction at a time.
Writing an RTOS is tricky, but do-able. Here is an example of small ( 900 lines ) RTOS targeted at ATMEL's 8-bit AVR platform.
The following is the Report and Code created for the class CSC 460: Real Time Operating Systems ( at the University of Victoria ).

Would this do what you need?
#define FUDGE_MARGIN 2 //In 10ms increments
volatile unsigned int ticks = 0;
void timer_10ms_interrupt( void ) { ticks++; }
void mainloop( void )
{
unsigned int next_time = ticks+50;
while( 1 )
{
do_mainloopy_stuff();
if( ticks >= next_time-FUDGE_MARGIN )
{
while( ticks < next_time );
do_500ms_thingy();
next_time += 50;
}
}
}
NB: If you got behind with servicing your every-500ms task then this would queue them up, which may not be what you want.

One straightforward solution is to have a timer interrupt that fires off at 500ms...
If you have some flexibility in your hardware design, you can cascade the output of one timer to a second stage counter to get you a long time base. I forget, but I vaguely recall being able to cascade timers on the x51.

Ah, one more alternative for consideration -- the x51 architecture allow two levels of interrupt priorities. If you have some hardware flexibility, you can cause one of the external interrupt pins to be raised by the timer ISR at 500ms intervals, and then let the lower-level interrupt processing of your every-500ms code to occur.
Depending on your particular x51, you might be able to also generate a lower priority interrupt completely internal to your device.
See part 11.2 in this document I found on the web: http://www.esacademy.com/automation/docs/c51primer/c11.htm

Why do you have a time-critical routine that takes so long to run?
I agree with some of the others that there may be an architectural issue here.
If the purpose of having precise 500ms (or whatever) intervals is to have signal changes occuring at specific time intervals, you may be better off with a fast ISR that ouputs the new signals based on a previous calculation, and then set a flag that would cause the new calculation to run outside of the ISR.
Can you better describe what this long-running routine is doing, and what the need for the specific interval is for?
Addition based on the comments:
If you can insure that the time in the service routine is of a predictable duration, you might get away with missing the timer interrupt postings...
To take your example, if your timer interrupt is set for 10 ms periods, and you know your service routine will take 89ms, just go ahead and count up 41 timer interrupts, then do your 89 ms activity and miss eight timer interrupts (42nd to 49th).
Then, when your ISR exits (and clears the pending interrupt), the "first" interrupt of the next round of 500ms will occur about a ms later.
Given that you're "resource maxed" suggests that you have your other timer and interrupt sources also in use -- which means that relying on the main loop to be timed accurately isn't going to work, because those other interrupt sources could fire at the wrong moment.

If I'm interpretting your question correctly, you have:
a main loop
some high priority operation that needs to be run every 500ms, for a duration of up to 89ms
a 10ms timer that also performs a small number of operations.
There are three options as I see it.
The first is to use a second timer of a lower priority for your 500ms operations. You can still process your 10ms interrupt, and once complete continue servicing your 500ms timer interrupt.
Second option - doe you actually need to service your 10ms interrupt every 10ms? Is it doing anything other than time keeping? If not, and if your hardware will allow you to determine the number of 10ms ticks that have passed while processing your 500ms op's (ie. by not using the interrupts themselves), then can you start your 500ms op's within the 10ms interrupt and process the 10ms ticks that you missed when you're done.
Third option: To follow on from Justin Tanner's answer, it sounds like you could produce your own preemptive multitasking kernel to fill your requirements without too much trouble.
It sounds like all you need is two tasks - one for the main super loop and one for your 500ms task.
The code to swap between two contexts (ie. two copies of all of your registers, using different stack pointers) is very simple, and usually consists of a series of register pushes (to save the current context), a series of register pops (to restore your new context) and a return from interrupt instruction. Once your 500ms op's are complete, you restore the original context.
(I guess that strictly this is a hybrid of preemptive and cooperative multitasking, but that's not important right now)
edit:
There is a simple fourth option. Liberally pepper your main super loop with checks for whether the 500ms has elapsed, both before and after any lengthy operations.
Not exactly 500ms, but you may be able to reduce the latency to a tolerable level.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas