Multiversion Timestamp-based concurrency control - optimistic-concurrency

In timestamp-based concurrency control, why do you have to reject a write by transaction T_i on element x_k if some transaction T_j with j > i has already read it?
As stated in the document.
If T_j is not planning to do any update at all, why is it necessary to be so restrictive about T_i's actions?

Assume that T_i starts first and T_j goes second, and assume T_i also writes to x. T_j's read (the second read) should fail because T_i is already using the value of x: T_i is older than T_j, so if T_j reads the last committed version of x while T_i's write is still in flight, T_j ends up using a stale value once T_i writes to x.
You need to abort the writing transaction during a read, a write, or at commit time, because of the potential for a stale value being used. If the writing transaction didn't abort and someone else read and used the old value, the database would not be serializable: you would get a different result if you ran the transactions in a different order. This is what the quoted text means by timestamp order.
Any two transactions using the same value at the same time is dangerous (once writes are involved), as it produces an inaccurate view of the database and reveals a non-serializable order. If three transactions are running and all use x, then the serializable order is undefined. You need to enforce one use of x at a time, which forces the transactions into single file, each seeing the previous transaction's x: T_i, then T_j, then T_k, in order, each finishing before the next one starts.
Think what could happen even if T_j were not to write: it would use a value that technically doesn't exist in the database - a stale one - and it would have ignored the outcome of T_i if T_i wrote.
If three transactions all read x and don't write x, then it is safe to run them at the same time. You would need to know in advance that all three transactions don't write to x.
As the Serializable Snapshot Isolation whitepaper attests, the dangerous structure is two consecutive read-write dependencies. But a read-write of x followed by a read of x is dangerous too, because the value read is stale if both transactions run at the same time. The schedule needs to be serializable, so you abort the second reader of x, since a younger transaction is using x.
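For reference, here is a minimal sketch in Java of the classic single-version timestamp-ordering rule the question is about. Names are illustrative; this is not the author's simulation code.

class Element {
    private long readTs = 0;   // largest timestamp that has read this element
    private long writeTs = 0;  // largest timestamp that has written it

    // Reject W_i(x) if a younger transaction already read or wrote x:
    // T_i's write would retroactively invalidate what T_j saw.
    synchronized boolean tryWrite(long ts) {
        if (ts < readTs || ts < writeTs) return false; // too late: abort/restart T_i
        writeTs = ts;
        return true;
    }

    // Reject R_i(x) if a younger transaction already overwrote x.
    synchronized boolean tryRead(long ts) {
        if (ts < writeTs) return false; // abort/restart T_i
        readTs = Math.max(readTs, ts);
        return true;
    }
}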
I wrote a multiversion concurrency control implementation in a simulation (see the simulation runner). It simulates 100 threads all trying to read and write two numbers, A and B, with each thread incrementing each number by 1. We set A to 1 and B to 2 at the beginning of the simulation.
The desired outcome is that A and B end up at 101 and 102 at the end of the simulation. This can only happen if there is locking, or serialization due to multiversion concurrency control. Without concurrency control or locking, the final numbers would be less than 101 and 102 due to data races.
When a thread reads A or B, we iterate over the versions of key A or B looking for a version whose timestamp is <= transaction.getTimestamp() and for which committed.get(key) == that version. If successful, we record the transaction as the last reader of that value: rts.put("A", transaction).
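Roughly, in code - a simplified sketch of that read path, where Version and the maps stand in for the simulation's actual types:

import java.util.List;
import java.util.Map;

record Version(long timestamp, long value) {}

class ReadPath {
    static Long read(String key, long txTimestamp,
                     Map<String, List<Version>> versions,
                     Map<String, Version> committed,
                     Map<String, Long> rts) {
        for (Version v : versions.getOrDefault(key, List.of())) {
            // visible iff written at or before our timestamp AND it is the
            // committed version of this key
            if (v.timestamp() <= txTimestamp && committed.get(key) == v) {
                rts.put(key, txTimestamp); // we are now the last reader
                return v.value();
            }
        }
        return null; // nothing visible: caller aborts and retries
    }
}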
At commit time, we check whether rts.get("A").getTimestamp() != committingTransaction.getTimestamp(). If this check is true - the last reader is not us - we abort the transaction and try again.
We also check whether someone committed since our transaction began - we don't want to overwrite their commit.
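A sketch of those two commit-time checks, with maps of plain timestamps standing in for the simulation's Transaction objects (txStartTs is an assumed "timestamp when we began" field):

import java.util.Map;

class CommitChecks {
    static boolean canCommit(String key, long txTimestamp, long txStartTs,
                             Map<String, Long> rts,           // last reader per key
                             Map<String, Long> committedTs) { // last committed write per key
        Long lastReader = rts.get(key);
        if (lastReader != null && lastReader != txTimestamp) {
            return false; // someone else read this key: abort and retry
        }
        Long lastCommit = committedTs.get(key);
        if (lastCommit != null && lastCommit > txStartTs) {
            return false; // a commit landed after we began: don't overwrite it
        }
        return true;
    }
}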
We also check, for each write, whether the other writing transaction is younger than us; if so, we abort. The if statement lives in a method called shouldRestart, which is called on reads, at commit time, and for all transactions that touched a value.
public boolean shouldRestart(Transaction transaction, Transaction peek) {
    // 'peek' is another transaction that touched the same value. The current
    // transaction is defeated (must restart) when peek has entered precommit
    // and the ordering rules below say peek goes first.
    boolean defeated = (((peek.getTimestamp() < transaction.getTimestamp() ||
            (transaction.getNumberOfAttempts() < peek.getNumberOfAttempts())) && peek.getPrecommit()) ||
            peek.getPrecommit() && (peek.getTimestamp() > transaction.getTimestamp() ||
            (peek.getNumberOfAttempts() > transaction.getNumberOfAttempts() && peek.getPrecommit())
            && !peek.getRestart()));
    return defeated;
}
(See the code here.) The OR branch guarded by && peek.getPrecommit() means that a transaction can be aborted when a later transaction gets ahead of it and that later transaction hasn't itself been restarted (aborted). Precommit occurs at the beginning of a commit.
During a read of a key, we check the RTS to see whether the last reader is ahead of our transaction. If so, we abort the transaction and restart: someone is ahead of us in the queue and they need to commit first.
On average, the system reaches 101 and 102 within roughly 300 transaction aborts, with many runs finishing well below 200 attempts.
EDIT: I changed the formula for deciding which transaction wins: if the other transaction is younger, or the other transaction has a higher number of attempts, the current transaction aborts. This reduces the number of attempts.
EDIT: the reason for the high abort counts was that a committing thread would be starved by reading threads, which would keep aborting and restarting because of the committing thread. I added a Thread.yield() when a read fails due to a transaction that is ahead; this reduces restart counts to under 200.
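A sketch of the retry loop with that yield, where the attempt supplier is an assumed stand-in for one full read/write/commit pass that returns false on a conflict:

import java.util.function.BooleanSupplier;

class RetryLoop {
    static int runToCompletion(BooleanSupplier attempt) {
        int aborts = 0;
        while (!attempt.getAsBoolean()) { // conflict: the pass aborted itself
            aborts++;
            Thread.yield(); // back off so the committing thread can finish
        }
        return aborts;
    }
}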

Why won't a signal be updated instantly in process statement? VHDL

In VHDL, you need to use a variable in a process statement for it to be updated instantaneously. A signal can be used instead, but it won't be updated immediately. Repeating my question from above: why won't a signal be updated instantly in a process statement?
The short answer is the VHDL execution model. The VHDL execution model runs a simulation cycle in two separate steps: update, then execution. A limited perspective (there are many details that I have abstracted away) of a simulation cycle is:
During the update phase, all signals that are scheduled to be updated in this simulation cycle are updated.
When signals are updated and change, processes that are sensitive to a changing signal are marked for execution.
During the execution phase, statements and processes that were marked for execution run until they encounter either a wait statement or loop back to the process sensitivity list (if the process has one).
Your question is: why do this? Because it guarantees that every compliant VHDL simulator executes the same code in exactly the same number of simulation cycles and produces exactly the same result.
To understand why it would be problematic if signals updated instantaneously, consider the following code:
proc1 : process(Clk)
begin
  if rising_edge(Clk) then
    Reg1 <= A ;
  end if ;
end process ;

proc2 : process(Clk)
begin
  if rising_edge(Clk) then
    Reg2 <= Reg1 ;
  end if ;
end process ;
In the above code, if signals updated instantaneously like variables, and the processes ran in the order proc1 then proc2, the simulator would show A reaching Reg2 through 1 flip-flop. OTOH, if the processes ran in the order proc2 then proc1, the simulator would show A reaching Reg2 through 2 flip-flops.
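To see how the two-step model removes that order dependence, here is a sketch (in Java, since the point is language-independent, with illustrative names): evaluate every "next" value against the old values, then update everything at once. With this scheme the result is the same whichever process runs first.

class TwoPhaseDemo {
    static int a = 1, reg1 = 0, reg2 = 0;

    static void risingEdge() {
        // Phase 1: evaluate all right-hand sides against the *old* values.
        int nextReg1 = a;     // proc1: Reg1 <= A
        int nextReg2 = reg1;  // proc2: Reg2 <= Reg1 (the old Reg1, not nextReg1)
        // Phase 2: update all signals together.
        reg1 = nextReg1;
        reg2 = nextReg2;
    }

    public static void main(String[] args) {
        risingEdge();
        System.out.println(reg1 + " " + reg2); // 1 0 : one flop of delay so far
        risingEdge();
        System.out.println(reg1 + " " + reg2); // 1 1 : A reaches Reg2 in 2 edges
    }
}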
This order dependence is also why shared variables of an ordinary type were only allowed temporarily in VHDL: introduced in 1993 and removed in 2000, when a more appropriate feature could be introduced (shared variables of a protected type).
Because a signal is designed to behave like a physically implemented value in the hardware, it only updates in response to determined stimuli and according to the progression of time.
In VHDL, this is reflected in that a signal assignment statement does not itself update the value of a signal. Rather, it schedules a transaction to occur on that signal that, when the appointed time comes, will trigger an event on the signal to change its value (if the assignment is to a changed value).
The default scheduling for a transaction is after a delta-delay in simulation, i.e. a simulation time instant just after all concurrently executing processes at that time complete. So if I'm operating in a clocked process and I update a signal value in a process triggered by rising_edge(clk), the new value won't be accessible within that current run of the process but will update just after the rising edge of the clock, when the process is complete.
This difference exists because VHDL is a hardware description language, not a programming language. Designs therefore must take into account the realities of hardware operation - the progression of time, the need for causal stimuli, &c. Thus in a good VHDL design, any value meant to persist through time will be defined as a signal so that the design takes into account that it ought to behave like a section of hardware. Within a process, a variable can provide an intermediate value for use in a combinatorial calculation - the synthesizer will determine whatever logic is necessary to accomplish that job, but the variable as a language element is a tool for calculations, not a way to define persistent values. Of course, variable abuse is possible and does exist... :^)

Can timestamp be used in synchronization of processes having Race Condition?

I am wondering whether timestamps can be used to solve the process synchronization problem when a race condition occurs. Below is an algorithm for the entry and exit sections for every process that wants to enter the critical section. The entry section uses the FCFS (First Come First Served) technique to give access to the critical section.
interested[N] is a shared array of N integers, where N is the number of processes.
// This section executed when process enters critical section.
entry_section (int processNumber) {
    interested[processNumber] = getCurrentTimeStamp(); // Gets the current timestamp.
    index = findOldestProcessNumber(interested);       // Find the process number with least timestamp.
    while (index != processNumber);
}

// This section executed when process leaves critical section.
exit_section (int processNumber) {
    interested[processNumber] = NULL;
}
As I see it, this algorithm satisfies all conditions for synchronization, i.e., mutual exclusion, progress, bounded waiting, and portability. So, am I correct?
Thanks for giving your time.
Short and sweet, here are the two issues with this approach.
All your processes are busy-waiting. Even when a process cannot enter the critical section, it never rests, which means the OS scheduler must constantly keep scheduling all interested processes even though they produce no meaningful output. This hurts performance and power consumption.
This is the big one: there is no guarantee that two processes will not get the same timestamp. It may be unlikely, but likelihood is not what you're looking for when you need to guarantee mutual exclusion to prevent a race condition.
Your code is just a sketch, but most likely it will not work in all the cases.
If there are no locks and all the functions are using non-atomic operations, there are no guarantees that the code will execute correctly. It is essentially the same as the first example here, except that you are using an array and assuming you don't need the atomicity since each process will only access its own element.
Let me try to come up with a counterexample.
A few minor clarifications.
As far as I understand, the omitted portion of each process is running in a loop:
while (!processExitCondition)
{
    // some non-critical code
    ...

    // your critical section as in the question
    entry_section (int processNumber) {
        interested[processNumber] = getCurrentTimeStamp(); // Gets the current timestamp.
        index = findOldestProcessNumber(interested);       // Find the process number with least timestamp.
        while (index != processNumber);
    }

    // This section executed when process leaves critical section.
    exit_section (int processNumber) {
        interested[processNumber] = NULL;
    }

    // more non-critical code
    ...
}
It seems to me that the scheduling portion should be busy-waiting, constantly re-evaluating the oldest process, as such:
while (findOldestProcessNumber (interested) != processNumber);
as otherwise all your threads can immediately hang in an infinite while loop, except for the first one, which will execute once and hang right after that.
Now, your scheduling function findOldestProcessNumber(interested) has some finite execution time, and if my assumption about the presence of the outer while(!processExitCondition) loop is correct, that execution time may turn out to be longer than the execution of the code inside, before, or after the critical section. As a result, a process that has completed can get back into the interested array before findOldestProcessNumber(interested) has finished iterating over it, and if getCurrentTimeStamp() is low fidelity (say, seconds), you can get two processes entering the critical section at once. Imagine adding a long sleep into findOldestProcessNumber(interested) and it becomes easier to see how that might happen.
You can say it is an artificial example, but the point is that there are no guarantees about how the processes will interleave with each other, so your synchronization relies on the assumption that certain portions of the code execute in times "large" or "small" enough. This is just an attempt to fake an atomic operation using those assumptions.
You can come up with counter-ideas to make it work. Say, you could implement getCurrentTimeStamp() to return a unique timestamp for each caller: either a simple atomic counter with a hardware guarantee that only one process can increment it at a time, or one that internally uses an atomic lock (mutex) - its own critical section with busy-waiting on that lock - to hand each caller process a distinct system clock value, if you want real time. But with a separate findOldestProcessNumber(interested) call, I find it hard to think of a way to make it guaranteed. I can't claim it is impossible, but the more complicated it gets, the more likely you are just hiding the absence of the mutual exclusion guarantee.
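For example, a minimal sketch of the atomic-counter idea in Java (illustrative names): unlike a wall clock, an atomic counter guarantees every caller a distinct, monotonically increasing value.

import java.util.concurrent.atomic.AtomicLong;

class UniqueStamp {
    private static final AtomicLong counter = new AtomicLong();

    static long getCurrentTimeStamp() {
        return counter.incrementAndGet(); // atomic: no two callers collide
    }
}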
So the simplest solution, with a lock (mutex), is the best: in your code snippet, put a mutex around the critical section, keep your current entry and exit code purely for scheduling on a first-come-first-served basis, and let the mutex give you the mutual exclusion guarantee.
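A sketch of that arrangement in Java (illustrative; a fair ReentrantLock hands the lock to the longest-waiting thread, so it covers both the FCFS scheduling and the mutual exclusion, with no busy-waiting):

import java.util.concurrent.locks.ReentrantLock;

class CriticalSection {
    private static final ReentrantLock lock = new ReentrantLock(true); // fair = FCFS-ish

    static void run(Runnable criticalCode) {
        lock.lock();          // blocks (sleeps) until it is our turn
        try {
            criticalCode.run();
        } finally {
            lock.unlock();
        }
    }
}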
If you want a lockless solution you can use the Boost lock-free queue, or implement a lockless ring buffer, to push the process numbers into the queue in entry_section and then have each process wait for its turn - although, counter-intuitive as it seems, performance can be worse.
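A minimal sketch of the queue-your-number-and-wait-your-turn idea is a ticket lock (Java, illustrative). Note it still busy-waits, which is part of why such solutions can perform worse than a blocking mutex under contention:

import java.util.concurrent.atomic.AtomicLong;

class TicketLock {
    private final AtomicLong nextTicket = new AtomicLong();
    private final AtomicLong nowServing = new AtomicLong();

    void lock() {
        long ticket = nextTicket.getAndIncrement(); // my place in the queue
        while (nowServing.get() != ticket) {
            Thread.onSpinWait(); // spin until it is my turn
        }
    }

    void unlock() {
        nowServing.incrementAndGet(); // hand over to the next ticket
    }
}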

How to discard / ignore one of a stored procedure's many return values

I have a stored procedure that returns 2 values.
In another procedure, I call this (edit: NOT selectable) procedure but only need one of the two returned values.
Is there a way to discard the other value? I'm wondering what is a good practice, and hoping for a small performance gain.
Here is how I call the procedure without error:
CREATE or ALTER procedure my_proc1
as
    declare variable v_out1 integer default null;
    declare variable v_out2 varchar(10) default null;
begin
    execute procedure my_proc2('my_param')
        returning_values :v_out1, :v_out2;
end;
That is the only way I found to call this procedure without getting error -607, "unsuccessful metadata update request depth exceeded. (Recursive definition?)", whenever I use only one variable, v_out1.
So my actual question is: can I avoid creating a v_out2 variable for nothing, as I will never use it (that value is only used in other procedures which also call my_proc2)?
Edit: the stored procedure my_proc2 is actually not selectable. But I made it selectable after all.
Because your stored procedure is selectable, you should call it with a SELECT statement, i.e.
select out1, out2 from my_proc2('my_param')
and in that case you can indeed omit some of the return values. However, I wouldn't expect a noticeable performance gain, as the logic inside the SP which calculates the omitted field is still executed.
If your procedure is not selectable, then creating a wrapper SP is the only way, but again, it wouldn't give any performance gain, as the code which does the hard work inside the original SP is still executed.
This answer uses a worked example to demonstrate the "race conditions" of multithreaded programming (which SQL is) that arise when [ab]using out-of-transaction objects (SQL sequences, aka Firebird generators).
So, the "use case".
Initial condition: table is empty, generator=0.
You start two concurrent transactions, A and B. For ease of imagining, you may think of those transactions as started from concurrent connections made by two persons working with your program on two networked computers. Actually it does not matter much; if you open the transactions from one and the same connection, the scenario would not change a bit. It is just for ease of imagining.
Tx.A issues UPDATE-OR-INSERT, which inserts a new row into the table, upticking the generator in the process. The transaction is not committed yet. Database condition: the table has one invisible (uncommitted) row with auto_id=1; the generator = 1.
Tx.B issues UPDATE-OR-INSERT too, which inserts yet another row into the table, also upticking the generator. The transaction maybe commits now, maybe later; it is irrelevant. Database condition: the table has two rows (one or both invisible/uncommitted) with auto_id=1 and auto_id=2; the generator = 2.
Tx.A meets some error, throws an exception, DOWNTICKS the generator, and rolls back. Database condition: the table has one row with auto_id=2; the generator = 1.
If Tx.B was not committed before, it is committed now. (This "if" is just to demonstrate that it does not matter when the other transactions commit, earlier or later; it only matters that Tx.A downticks the generator after any other transaction has upticked it.)
So, the final database condition: the table has one committed (visible) row with auto_id=2, and the generator = 1.
Any next attempt to add one more row would uptick the generator to 1+1=2, then fail to insert the new row with a PK violation, then downtick the generator back to 1, recreating the faulty condition outlined above.
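The trap is easy to reproduce in miniature; here is a sketch in Java, with an AtomicLong standing in for the Firebird generator:

import java.util.concurrent.atomic.AtomicLong;

class GeneratorDowntick {
    public static void main(String[] args) {
        AtomicLong gen = new AtomicLong(0);

        long idA = gen.incrementAndGet(); // Tx.A reserves id 1
        long idB = gen.incrementAndGet(); // Tx.B reserves id 2 and commits

        gen.decrementAndGet(); // Tx.A fails and "rolls back" the generator: now 1

        long idNext = gen.incrementAndGet(); // next insert gets 2 again...
        System.out.println(idNext == idB);   // true: PK collision with Tx.B's row
    }
}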
Your database is stuck: without direct intervention by the DB administrator, no further data can be added.
The very idea of rolling back the generator defeats all the intentions generators were created for, and all the expectations about generator behavior held by the database, the connection libraries, and other programmers.
You just placed a trap on the highway. It is only a matter of time until someone will be caught into it.
Even if you keep guarding this hack with other hacks for now - wasting a lot of time and attention to do so scrupulously and pervasively - still, one unlucky day in the future there will be another programmer, or even you yourself having forgotten these gory details, who starts using the generator in the standard, intended way - and walks into the trap.
Generators were not made to be backtracked during normal work.
existence of primary key is checked in the procedure before doing anything
Yep, that is the first reaction when a multithreading programmer meets his first race condition: let's just add more prior checks.
The first few checks can indeed decrease the probability of a clash, but they can never eliminate it completely. And the more use your program sees - the more transactions opened by more and more concurrent, active users - the sooner this somewhat lowered probability turns out to be too much after all.
Think about it: SQL is about transactions, yet they had to invent and introduce an explicitly out-of-transaction device, the generator/sequence. If there were a reliable solution without them, it would simply have been used instead of creating such a non-SQLish, transaction-boundary-breaking tool.
When you say your SP "checks for PK violation", it is exactly the same as if you dropped the generator altogether and instead just issued the "good old"
:new_id = ( select max(auto_id)+1 from MyTable );
By your description you actually do something like that, but in an indirect way. Something like:
while exists( select * from MyTable where auto_id = gen_id(MyGen, +1))
do ;
:new_id = gen_id(MyGen, 0);
You may feel that, because you mentioned generators, you somehow overcame the cross-transaction invisibility problem. But you did not, because the very check "was this PK already taken" is done against the in-transaction view of the table.
That changes nothing: your two transactions Tx.A and Tx.B would not see each other's records, because neither has committed yet. Now it only takes some unlucky Tx.C that fails and downticks the generator for them to collide on the same ID.
Or not - you do not even need Tx.C and downticking at all!
Here we bump into the multithreading idea about "atomic operations".
Let's look at it again.
while exists( select * from MyTable where auto_id = gen_id(MyGen, +1))
do ;
:new_id = gen_id(MyGen, 0);
In a single-threaded application that code is okay: you just run the generator up to the first free slot, then query the value without changing it. "What could possibly go wrong?" But in a multithreaded environment it is a rake waiting to be stepped on. Example:
Initial condition: the table has 100 rows (auto_id goes from 1 to 100); the generator = 100.
Tx.A starts adding a row: it upticks the generator in the while loop and exits the loop. It has not yet reached the second line, where the local variable gets assigned. Not yet. The generator = 101; no row added yet.
Tx.B starts adding a row: it upticks the generator in the while loop and exits the loop. The generator = 102; no rows added yet.
Tx.A reaches the second line and reads gen_id(MyGen, 0) into the variable for its new row. It was 101 when it left the loop, but it is 102 now!
Tx.B reaches the second line, reads gen_id(MyGen, 0), and gets 102 too.
Tx.A and Tx.B both try to insert a new row with auto_id=102.
RACE CONDITION - both Tx.A and Tx.B try to commit their work. One of them succeeds, the other fails. Which one? It is not predictable: the lucky one commits, the unlucky one fails.
The failed transaction downticks the generator.
Final condition: the table has 101 rows, auto_id runs consistently from 1 to 100 and then skips to 102. The generator = 101, which is less than MAX(auto_id).
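The same race distilled into Java, with an AtomicLong standing in for the generator: the uptick and the later read are two separate actions, which is exactly the non-atomicity being discussed. The fix is to use the value the atomic increment itself returns.

import java.util.concurrent.atomic.AtomicLong;

class CheckThenAct {
    static final AtomicLong gen = new AtomicLong(100);

    static long brokenNewId() {
        gen.incrementAndGet();   // like the while loop's gen_id(MyGen, +1)
        return gen.get();        // like gen_id(MyGen, 0): may already be stale
    }

    static long correctNewId() {
        return gen.incrementAndGet(); // one atomic action: take what it returns
    }
}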
Now you might want to add more hacks - I mean, more prior checks - before actually inserting rows and committing. That will make mistakes yet less probable, right? Wrong. The more checks you do, the slower the code gets. The slower the code gets, the greater the probability that while one thread runs through all those checks, another thread interferes and alters the situation that was checked a moment ago.
The fundamental issue with multithreading is that every check is a SEPARATE action, and between those actions the situation MAY change. Your procedure may check whatever it wants BEFORE actually inserting the row; it does not warrant much. By the time you finally get to the row-inserting statement, all the checks you did are a matter of the PAST, and the situation is potentially already altered. The warrants your checks gave you in the PAST belong only to that past, not to the moment at hand.
And even once you stop looking for sure-thing warranties, with every new check you add you cannot even be sure whether you just decreased or increased the probability of failure. Because multithreading is a bitch: it flows chaotically, out of your control.
So, remember the KISS principle. Until proven otherwise, you most probably do not need SP2 at all; you only need one single UPDATE-OR-INSERT statement.
PS. There was a pretty fun game in my school days called Pascal Robots. There are also C Robots, I've heard, and probably implementations in many other languages. Pascal Robots shipped with a number of pre-coded robots demonstrating different strategies and approaches, some of them thought out in very intricate detail. And there was one robot whose program was PRIMITIVE. It had only two loops: if you do not see an enemy, keep turning your radar around; if you do see an enemy, keep running toward it and shooting at it. That was all. What could this idiot do against sophisticated robots with creative attack and defense strategies, flanking maneuvers, optimal distances maintained by back-and-forth movements, escape tricks and more? Those sophisticated robots employed very extensive checks and very thought-through hacks triggered by those checks. So... that primitive idiot was the second or maybe third best robot in the shipped set; there were only one or two smarties who could outwit it. All the other robots this lean-and-fast idiot finished off before they could run through all their checks and hacks thrice. That is what multithreading does to programming. It was astonishing to watch those battles, which went so against our single-threaded intuition.

SQL MIN_ACTIVE_ROWVERSION() value does not change for a long while

We're troubleshooting a sort of sync framework between two SQL Server databases on separate servers (both SQL Server 2008 Enterprise 64-bit SP2 - 10.0.4000.0), through linked server connections, and we've reached a point where we're stuck.
The logic to identify which records are "pending to be synced" is of course based on ROWVERSION values, including the use of MIN_ACTIVE_ROWVERSION() to avoid dirty reads.
All SELECT operations are encapsulated in SPs on each "source" side. This is a schematic sample of one SP:
PROCEDURE LoaderRetrieve(@LastStamp bigint, @Rows int)
BEGIN
    ...
    (vars handling)
    ...
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT

    Select TOP (@Rows) Field1, Field2, Field3
    FROM Table
    WHERE [RowVersion] > @LastStampAsRowVersionDataType
      AND [RowVersion] < @MinActiveVersion
    Order by [RowVersion]
END
The approach works just fine; we usually sync records at the expected rate of 600k/hour (a job every 30 seconds, batch size = 5k). But at some point the sync process stops finding any records to transfer, even though there are several thousand records with a ROWVERSION value greater than the @LastStamp parameter.
When checking the reason, we found that MIN_ACTIVE_ROWVERSION() has a value less than (or only slightly greater than, just 5 or 10 increments above) the @LastStamp being searched. This on its own shouldn't be a problem, since the MIN_ACTIVE_ROWVERSION() approach was introduced precisely to avoid dirty reads and subsequent issues. BUT:
The problem we see on some occasions, when the above scenario occurs, is that the value of MIN_ACTIVE_ROWVERSION() does not change for a long (really long) period of time - 30/40 minutes, sometimes more than an hour - and this value is by far less than the @@DBTS value.
We first thought this was related to a pending DB transaction not yet committed, as per the MSDN definition of MIN_ACTIVE_ROWVERSION() (link):
Returns the lowest active rowversion value in the current database. A rowversion value is active if it is used in a transaction that has not yet been committed.
But when checking sessions (sys.sysprocesses) with open_tran > 0 for the duration of this issue, we couldn't find any session with a waittime greater than a few seconds; only one or two occurrences of sessions with +/- 5 minutes of waittime.
So at this point we're struggling to understand the situation: MIN_ACTIVE_ROWVERSION() does not change over a huge period of time, yet no uncommitted transactions with long waits are found within that time frame.
I'm not a DBA, and it could be that we're missing something in the picture when analyzing this problem; research on forums and blogs hasn't turned up any other clue. So far open_tran > 0 was the valid explanation, but under the circumstances I've described, it's clear that there's something else, and we don't know what.
Any feedback is appreciated.
Well, I finally found the solution after digging a bit more.
The problem is that we were looking for sessions with a long waittime, when the real deal was to find sessions that had had an active batch for a while.
If there's a session with open_tran = 1, then to find out exactly how long that transaction has been open (and of course still active, not yet committed), check the last_batch field of sys.sysprocesses.
Using this query:
select
    batchDurationMin  = DATEDIFF(second, last_batch, getutcdate()) / 60.0,
    batchDurationSecs = DATEDIFF(second, last_batch, getutcdate()),
    hostname, open_tran, *
from sys.sysprocesses a
where spid > 50
  and a.open_tran > 0
order by last_batch asc
we could identify a session with a transaction that had been open for 30+ minutes. With the hostname values, some more checks within the web services (and also using dbcc inputbuffer), we found the responsible process.
So the answer to the original question is: "there is indeed an active session with an uncommitted transaction", and that is why MIN_ACTIVE_ROWVERSION() does not change. We were just looking for processes with the wrong criteria.
Now that we know which process behaves like this, the next step will be to improve it.
Hope this proves useful to someone else.

Can I avoid locking rows for unnecessary amounts of time [in Django]?

Take the following code for example (ignore the lack of usefulness in its functionality, as it's just a simple example to include the things I need):
@transaction.commit_on_success
def test_for_success(username):
    person = Person.objects.select_for_update().get(username=username)
    response = urllib2.urlopen(URL_TO_SOME_SLOW_API, some_data)
    if '<YES>' in response.read():
        person.successes += 1
        person.save()
My question about the example has to do with when the queries hit the database. Clearly the first query will lock the Person row, and then I'm calling a slow API, which could take 3 seconds to respond, causing the row to be locked for 3 seconds. Am I understanding this correctly? And in the case of slow API hits happening in my transaction, if I move my queries so that the SELECT FOR UPDATE doesn't happen until after all the slow API requests, will this have the seemingly obvious effect of not locking my rows for seconds at a time (the case for select_for_update in my application is unavoidable)? Or am I misunderstanding, and somehow none of the SQL actually hits the database until the end of the transaction?
Your assumptions about your code are correct. As the select_for_update() docs describe, this call locks those rows in the database until the end of the transaction. That in effect holds the lock for the duration of your urllib request.
If you move the database call into the conditional after the request, you are correct again that the rows would be locked for a much shorter amount of time (though if that code is called a lot, some clients will still block on the call due to contention).
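As a generic illustration (plain JDBC rather than Django, with a hypothetical person table): select_for_update() maps to SELECT ... FOR UPDATE underneath, and the row lock is held from that SELECT until the transaction commits, which is why the slow API call belongs before the locking query.

import java.sql.*;

class ShortLock {
    // Call the slow API first, then pass its outcome in; the lock is held
    // only for the short span between the SELECT and the commit.
    static void recordSuccess(Connection conn, String username, boolean success)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT successes FROM person WHERE username = ? FOR UPDATE")) {
            ps.setString(1, username);
            ps.executeQuery();               // row is locked from here...
            if (success) {
                try (PreparedStatement upd = conn.prepareStatement(
                        "UPDATE person SET successes = successes + 1 WHERE username = ?")) {
                    upd.setString(1, username);
                    upd.executeUpdate();
                }
            }
            conn.commit();                   // ...until here
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}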