SQL Database transactions and PRAM/FIFO consistency

Do databases provide PRAM consistency (http://en.wikipedia.org/wiki/PRAM_consistency) for transactions across multiple clients?
Example:
Assume we have two tables, X and Y, each with a single record and a single column Value of type int, initially set to 0.
Two clients connect to the database.
Client 1 does
BEGIN TRAN
UPDATE X SET Value = 1
COMMIT
BEGIN TRAN
UPDATE Y SET Value = 1
COMMIT
Client 2 does
SELECT TOP 1 Value FROM Y -- statement 1
SELECT TOP 1 Value FROM X -- statement 2
Let's assume that statement 1 yielded value 1 from table Y. Is it guaranteed by the RDBMS (let's say MS SQL Server) that, under that condition, statement 2 will always yield 1 from table X?
In other words, will other clients always see transactions committed by some client in the same order, in which that client committed them?
More general question: exactly what type of consistency is guaranteed by RDBMSes, if it is not PRAM?

Do databases provide PRAM consistency (http://en.wikipedia.org/wiki/PRAM_consistency) for transactions across multiple clients?
Kind of. The direct consistency model appears to be atomic. (But see below.) SQL databases have broader concerns than just what happens in RAM.
exactly what type of consistency is guaranteed by RDBMSes, if it is not PRAM?
ACID: atomic, consistent, isolated, and durable.
Consistent here doesn't mean the same thing that consistent means in concurrency models. Here consistent means that a transaction changes a database from one valid state to another valid state. Neither successful nor failed transactions can leave the database in an invalid state.
For example, if you have a constraint that says email addresses in "users"."email_address" must be unique, a successful transaction can't write a duplicate email address into that column. Neither can an unsuccessful transaction.
And neither can a transaction that was still executing when somebody killed the database server.
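For illustration, such a constraint could be declared like this (a minimal sketch; the id column and the constraint name are assumptions, not taken from the answer):
CREATE TABLE users (
    id            int PRIMARY KEY,
    email_address varchar(255) NOT NULL,
    CONSTRAINT uq_users_email_address UNIQUE (email_address)
);
-- any transaction that tries to write a duplicate email_address fails,
-- and a failed or interrupted transaction is rolled back, so the
-- uniqueness invariant always holds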
Transaction isolation is the ACID concept closest in behavior to PRAM consistency, I think. A SQL DBMS typically has several options for setting the isolation level of a transaction.
SQL Server's options

Well, a simple answer is that the SERIALIZABLE isolation level implies total consistency. So yes, under that model a compliant RDBMS will provide PRAM guarantees.
Your use of transactions is confusing in the question. It seems that a "client" is to you what a "transaction" is to the PRAM model/Wikipedia page. The explicit transactions in the question seem to serve no purpose.
For SQL Server, the answer is yes. Statement 2 will always read 1, provided that statement 1 read 1. This is true for all isolation levels.
The SQL Standard, however, does not mandate this! READ COMMITTED can be satisfied by reading committed data from wildly varying points in time, for example. This is not causally consistent.
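As a hedged sketch of the point above: a reader that needs the ordering guarantee from a merely standard-compliant database, rather than from SQL Server's observed behavior, could raise its own isolation level before the two statements (T-SQL syntax shown):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT TOP 1 Value FROM Y; -- statement 1
SELECT TOP 1 Value FROM X; -- statement 2
COMMIT;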

Related

Is it possible to lock on a value of a column in SQL Server?

I have a table that looks like this:
| Id | GroupId |
|====|=========|
| 1  | G1      |
| 2  | G1      |
| 3  | G2      |
| 4  | G2      |
| 5  | G2      |
It should at any time be possible to read all of the rows (committed only). When there is an update, I want a transaction that locks on the group id, i.e. at any given time there should be only one transaction attempting updates per GroupId.
It should ideally still be possible to read all committed rows (i.e. other transactions/ordinary reads that do not try to acquire the "update per group" lock should still be able to read).
The reason I want to do this is that an update cannot rely on "outdated" data. I.e. I make some calculations in a transaction, and another transaction cannot edit the row with id 1 or add a new row with the same GroupId after these rows were read by the first transaction (even though the first transaction would never modify the row itself, it depends on its value).
Another "nice to have" requirement is that sometimes I would need the same requirement "cross group", i.e. the update transaction would have to lock 2 groups at the same time. (This is not a dynamic number of groups, but rather just 2)
Here are some ideas. I don't think any of them are perfect; I think you will need to give yourself a set of use cases and try them. Some of the situations I tried after applying locks:
SELECTs with the WHERE filter as another group
SELECTs with the WHERE filter as the locked group
UPDATEs on the table with the WHERE clause as another group
UPDATEs on the table where the Id (not the GrpId!) was not locked
UPDATEs on the table where the row was locked (e.g., Ids 1 and 2)
INSERTs into the table with that GrpId
I have the funny feeling that none of these will be 100%, but the most likely answer is the second one (setting the transaction isolation level). It will probably lock more than desired, but will give you the isolation you need.
Also one thing to remember: if you lock many rows (e.g., there are thousands of rows with the GrpId you want), then SQL Server can escalate the lock to a full-table lock (I believe the tipping point is 5000 locks, but I'm not sure).
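If escalation does become a problem, SQL Server (2008 and later) lets you disable it per table; a sketch, assuming the table name from the examples below:
ALTER TABLE YourTable SET (LOCK_ESCALATION = DISABLE);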
Old-school hackjob
At the start of your transaction, update all the relevant rows somehow e.g.,
BEGIN TRAN
UPDATE YourTable
SET GrpId = GrpId
WHERE GrpId = N'G1';
-- Do other stuff
COMMIT TRAN;
Nothing else can use them because (bravo!) they are a write within a transaction.
Convenient - set isolation level
See https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver15#isolation-levels-in-the-
Before your transaction, set the isolation level high e.g., SERIALIZABLE.
You may want to read all the relevant rows at the start of your transaction (e.g., SELECT GroupId FROM YourTable WHERE GroupId = N'G1') to lock them from being updated.
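A sketch of that approach against the table from the question (the group value N'G1' is illustrative; under SERIALIZABLE the initial read takes range locks that block concurrent updates and inserts for that group until commit):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT Id, GroupId FROM YourTable WHERE GroupId = N'G1';
-- do the calculations and updates that depend on those rows here
COMMIT TRAN;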
Flexible but requires a lot of coding
Use resource locking with sp_getapplock and sp_releaseapplock.
These are used to lock resources, not tables or rows.
What is a resource? Well, anything you want it to be. In this case, I'd suggest 'Grp1', 'Grp2' etc. It doesn't actually lock rows. Instead, you ask (via sp_getapplock, or APPLOCK_TEST) whether you can get the resource lock. If so, continue. If not, then stop.
Any code referring to these tables needs to be reviewed and potentially modified to ask whether it is allowed to run. If something doesn't ask for permission and just does it, there are no actual locks stopping it (except via any transactions you've explicitly specified).
You also need to ensure that errors are handled appropriately (e.g., still releasing the app_lock) and that processes that are blocked are re-tried.
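A minimal sketch of the pattern, assuming 'Grp1' as the resource name and a 5-second timeout:
BEGIN TRAN;
DECLARE @result int;
EXEC @result = sp_getapplock
    @Resource = N'Grp1',
    @LockMode = N'Exclusive',
    @LockOwner = N'Transaction',
    @LockTimeout = 5000; -- wait up to 5 seconds for the lock
IF @result >= 0
BEGIN
    -- lock granted (0 = immediately, 1 = after waiting):
    -- safe to run the update work for this group here
    PRINT N'lock acquired for Grp1';
END
ELSE
BEGIN
    -- timeout, deadlock or error: stop, or schedule a retry
    PRINT N'could not lock Grp1';
END
COMMIT TRAN; -- a transaction-owned applock is released here
For the "cross group" requirement, taking the two applocks in a fixed order (say, always 'Grp1' before 'Grp2') avoids deadlocks between updaters.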

How is Oracle ACID compliant when it does not honour 'isolation' property fully?

Claim: Oracle does not honour the isolation property of ACID.
As per Wikipedia page on ACID
"Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."
This can happen only if the transactions are serializable. Yes, Oracle has an isolation level called Serializable, but it is not true serializability; it is only snapshot isolation.
Read https://blog.dbi-services.com/oracle-serializable-is-not-serializable/
An excerpt from Wiki page of Snapshot isolation (https://en.wikipedia.org/wiki/Snapshot_isolation)
"In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle".
There are weaker isolation levels but they are not sufficient to guarantee that the sequence of transactions would lead to the outcome that would be obtained if they were executed sequentially. To guarantee it, serializability is a must.
Q1) Since Oracle does not provide it (its serializability is not a true one), it does not honor isolation 100 percent. How can it then be called ACID compliant?
Q2) Looks like Oracle was treated here leniently with regard to isolation. Is this leniency extended to other databases as well?
Q3) If we take an unforgiving stance and say (isolation means 100 percent isolation - no less is accepted), won't Oracle's claim of being ACID compliant fall to pieces? What about other relational databases? Will they be able to make the cut or will they fall short like Oracle?
SQL Server has SERIALIZABLE in addition to SNAPSHOT. But, at least in SQL Server, for most practical purposes SERIALIZABLE is useless: it's too expensive, and not really effective. Instead you use special constructs for the few transactions that actually need to be serialized (i.e., run one at a time).
SERIALIZABLE is too expensive because transaction ordering is accomplished by some combination of eliminating concurrency and generating run-time failures (deadlocks). Both of these are very expensive and troublesome.
SERIALIZABLE is not really effective because it doesn't actually accomplish complete transaction isolation. To do so would require every transaction to exclusively lock all data it reads, to prevent two transactions from reading the same data and then writing.
The classic example is where two sessions run
SELECT salary FROM emp where id = 1
and then compute a new value based on the existing one in the client, and then
UPDATE emp SET salary = :newSalary WHERE id = 1
The only way to make this work right is to place an exclusive lock on the first read, so a second session can't read the row too.
In Oracle this is accomplished with SELECT ... FOR UPDATE, and in SQL Server with an UPDLOCK hint. Or with an explicit "application lock": Oracle's DBMS_LOCK, or SQL Server's sp_getapplock.
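A minimal sketch of both variants for the salary example above (:newSalary and @newSalary stand for the value computed in the client):
-- Oracle: lock the row as it is read
SELECT salary FROM emp WHERE id = 1 FOR UPDATE;
-- ... compute the new salary in the client ...
UPDATE emp SET salary = :newSalary WHERE id = 1;
COMMIT;

-- SQL Server: the same intent via an UPDLOCK hint
BEGIN TRAN;
SELECT salary FROM emp WITH (UPDLOCK) WHERE id = 1;
-- ... compute the new salary in the client ...
UPDATE emp SET salary = @newSalary WHERE id = 1;
COMMIT TRAN;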

Generating a sequence number in a PostgreSQL database - concurrency & isolation levels

Similar question: Update SQL with consecutive numbering
I want to be able to generate a sequence number by incrementing a column num in a table called SeqNum. The SeqNum table layout:
|num|
|===|
| 0 |
The query being run;
BEGIN TRANSACTION
UPDATE SeqNum
SET num = num + 1
SELECT num from SeqNum
COMMIT TRANSACTION
My question is: if I have multiple processes running this query at the same time with a READ COMMITTED isolation level, will the select clause always return a unique updated value? I'm assuming this will be consistent and no two processes would ever return the same num... Obviously this is all running in the one transaction. If it wasn't in a transaction, I would expect it to potentially return duplicate values.
I'm not sure how the behavior changes (if at all) depending on the isolation level.
In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there are only three distinct isolation levels, which correspond to the levels Read Committed, Repeatable Read, and Serializable. When you select the level Read Uncommitted you really get Read Committed ...
At the read committed isolation level, dirty reads are not possible per the standard, which means these transactions cannot read data written by a concurrent uncommitted transaction. That can only happen at the read uncommitted level per the standard (but it won't happen in PostgreSQL: the four isolation levels only define which phenomena must not happen; they do not define which phenomena must happen).
In short, your select clause won't always return a unique value. Neither will it if you rewrite it to UPDATE ... RETURNING ..., but the time window will be really small, so the chance that multiple transactions return the same value will be much lower.
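The rewrite mentioned above would look like this (a sketch):
UPDATE SeqNum
SET num = num + 1
RETURNING num;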
But luckily for you, the one thing in PostgreSQL that isn't affected by transactions is the sequence:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values.
Because sequences are non-transactional, changes made by setval are not undone if the transaction rolls back.
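A minimal sketch using a sequence instead of the SeqNum table (the name seq_num is an assumption):
CREATE SEQUENCE seq_num;
SELECT nextval('seq_num'); -- each concurrent caller gets a distinct value,
                           -- even if its transaction later rolls back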

Locking database table between query and insert

Forgive me if this is a silly question (I'm new to databases and SQL), but is it possible to lock a table, similar to the lock keyword in C#, so that I can query the database to see if a condition is met, then insert a row afterwards while ensuring the state of the table has not changed between the two actions?
In this case, I have a table transactions which has two columns: user and product. This is a many-to-one relationship; multiple users can have the same product. However, the number of products is limited.
When a user adds a product to their account, I want to first check the total number of items with the same product value to see if it is under a certain threshold, then add the transaction afterwards. However, since this is a multithreaded application, multiple transactions can come in at the same time. I want to make sure that one of these is rejected, and one succeeds, such that the number of transactions with the same product value can never be higher than the limit.
Rough pseudo-code for what I am trying to do:
my_user, my_product = ....
my_product_count = 0
for each transaction in transactions:
    if transaction.product == my_product:
        my_product_count += 1
if my_product_count < LIMIT:
    insert (my_user, my_product) into transactions
    return SUCCESS
else:
    return FAILURE
I am using SQLAlchemy with SQLite3, if that matters.
Not needed if you do both operations in a transaction, which is supported by databases. Databases do maintain locks to guarantee transactional integrity. In fact, that is one of the four pillars of what a database does; they are called the ACID guarantees (Atomicity, Consistency, Isolation, Durability).
So, in your case, to ensure consistency you would make both operations in one transaction and set the transaction parameters in such a way as to block reads on the already-read rows.
SQL locking is WAY more powerful than the lock statement because, among other things, databases by definition have multiple threads (users) hitting the same data, something that is exceedingly rare in programming (where access to the same data is avoided in multi-threaded programming as much as possible).
I suggest a good book about SQL, because you need to simply LEARN some fundamental concepts at some point, or you will make mistakes that cost money.
Transactions allow you to use multiple SQL statements atomically.
(SQLite implements transactions by locking the entire database, but the exact mechanism doesn't matter, and you might want to use another database later anyway.)
However, you don't even need to bother with explicit transactions if your desired algorithm can be handled with a single SQL statement, like this:
INSERT INTO transactions (user, product)
SELECT :my_user, :my_product
WHERE (SELECT COUNT(*)
       FROM transactions
       WHERE product = :my_product) < :limit;
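After running the guarded INSERT, the client can tell which outcome it got (a sketch; changes() is SQLite-specific):
SELECT changes(); -- 1 = row inserted (success), 0 = limit reached (failure)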

Is non-atomic value SELECT specific to SQL Server, or is it possible in other DBMS-es?

My answer to my question "Is a half-written values reading prevented when SELECT WITH (NOLOCK) hint?" cites a script illustrating how to catch non-atomic reads (SELECTs) of partly updated values in SQL Server.
Is such a non-atomic (partly updated, inserted, or deleted) value reading problem specific to SQL Server?
Is it possible in other DBMS-es?
Update:
Not long ago I believed that the READ UNCOMMITTED transaction isolation level (also achieved through the WITH (NOLOCK) hint in SQL Server) permitted reading (from other transactions) uncommitted (or committed, if not yet changed) values, but not partly modified (partly updated, partly inserted, partly deleted) values.
Update2:
The first two answers diverted the discussion to attacking the phenomena of the READ UNCOMMITTED isolation level as specified by the ANSI/ISO SQL-92 specification.
This question is not about that.
Is non-atomicity of a value (not a row!) compliant with READ UNCOMMITTED and dirty reads at all?
I believed that READ UNCOMMITTED did imply reading of uncommitted rows in their entirety but not partly modified values.
Does the definition of "dirty read" include the possibility of non-atomic value modification?
Is it a bug or by design?
Or is it by the ANSI SQL-92 definition of "dirty read"? I believed that a "dirty read" included atomic reading of uncommitted rows, but not of non-atomically modified values...
Is it possible in other DBMS-es?
As far as I know the only other databases that allow READ UNCOMMITTED are DB2, Informix and MySQL when using a non-transactional engine.
All hell would break loose if atomic statements were in fact not atomic.
I can answer this for MSSQL: all single statements are atomic, and "dirty reads" refers to the possibility of reading a "phantom row" that might not exist after the transaction is committed or rolled back.
There is a difference between Atomicity and READ COMMITTED if the implementation of the latter relies on locking.
Consider transactions A and B. Transaction A is a single SELECT for all records with a status of 'Pending' (perhaps a full scan on a very large table, so it takes several minutes).
At 3:01 transaction A reads record R1 in the database and sees its status is 'New' so doesn't return it or lock it.
At 3:02 transaction B updates record R1 from 'New' to 'Pending' and record R2000 from 'New' to 'Pending' (single statement).
At 3:03 transaction B commits.
At 3:04 transaction A reads record R2000, sees it is 'Pending' and committed and returns it (and locks it).
In this situation, the select in transaction A has only seen part of Transaction B, violating atomicity. Technically though, the select has only returned committed records.
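A sketch of the two statements in that timeline (the table and column names are assumptions):
-- transaction A, started at 3:01: the long-running scan
SELECT * FROM orders WHERE status = 'Pending';
-- transaction B, at 3:02: one atomic statement, committed at 3:03
UPDATE orders SET status = 'Pending' WHERE id IN (1, 2000);
COMMIT;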
Databases relying on locking reads suffer from this problem because the only solution would be to lock the entirety of the table(s) being read so that no one can update any records in any of them, which would make any concurrent activity impractical.
In practice, most OLTP applications have very quick transactions operating on very small data volumes (relative to the database size), and concurrent operations tend to hit different 'slices' of data, so the situation occurs very rarely. Even when it does happen, it doesn't necessarily cause a noticeable problem; and when it does, such problems are very hard to reproduce, and fixing them would require a whole new architecture. In short, despite being a theoretical problem, in practice it often isn't worth worrying about.
That said, an architect should be aware of the potential issue, be able to assess the risk for a particular application and determine alternatives.
That's one reason why SQL Server added non-locking consistent reads in 2005.
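Those non-locking reads are enabled per database (the database name is a placeholder):
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;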
Database theory requires that at all isolation levels, individual UPDATE or INSERT statements are atomic. Their intermediate results should not be visible even to READ UNCOMMITTED transactions. This has been stated in a paper by a group of well-known database experts: http://research.microsoft.com/apps/pubs/default.aspx?id=69541
However, as read uncommitted results are not considered transactionally consistent by definition, it is possible that implementations contain bugs that result in partly updated row sets being returned, and that these bugs have not been noticed in tests because of the difficulty of determining the validity of the returned result sets.