Using XLOCK In SELECT Statements

Is using XLOCK (Exclusive Lock) in SELECT statements considered bad practice?
Let's assume the simple scenario where a customer's account balance is $40. Two concurrent $20 purchase requests arrive. The transaction includes:
Read balance
If customer has enough money, deduct the price of the product from the balance
So without XLOCK:
T1(Transaction1) reads $40.
T2 reads $40.
T1 updates it to $20.
T2 updates it to $20.
But there should be $0 left in the account.
Is there a way to prevent this without the use of XLOCK? What are the alternatives?

When you perform an update, apply the change directly to the stored value in a single UPDATE statement to prevent these issues. One safe way to do this is demonstrated in the sample code below:
CREATE TABLE #CustomerBalance (CustID int not null, Balance decimal(9,2) not null)
INSERT INTO #CustomerBalance Values (1, 40.00)
DECLARE @TransactionAmount decimal(9,2) = 19.00
DECLARE @RemainingBalance decimal(9,2)
UPDATE #CustomerBalance
SET @RemainingBalance = Balance - @TransactionAmount,
    Balance = @RemainingBalance
SELECT @RemainingBalance
(No column name)
21.00
One advantage of this method is that the row is locked as soon as the UPDATE statement starts executing. If two users update the value "simultaneously", one of them will reach the row first; that first UPDATE prevents the second from touching the data until the first one completes. When the second UPDATE then processes the record, it sees the Balance value already written by the first update.
As a consequence, you will want code that checks the balance after your update and rolls the change back if you have "overdrawn" the balance, or whatever is appropriate. That is why this sample code returns the remaining balance in the variable @RemainingBalance.
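For example, one way to act on that returned value is to wrap the statement in a transaction and undo the deduction when the balance would go negative. This is only a sketch building on the sample table above; the overdraft rule itself is an assumption:
BEGIN TRANSACTION;

DECLARE @TransactionAmount decimal(9,2) = 19.00;
DECLARE @RemainingBalance decimal(9,2);

-- Deduct and capture the new balance in one atomic statement.
UPDATE #CustomerBalance
SET @RemainingBalance = Balance - @TransactionAmount,
    Balance = @RemainingBalance
WHERE CustID = 1;

IF @RemainingBalance < 0
BEGIN
    -- Overdrawn: undo the deduction.
    ROLLBACK TRANSACTION;
END
ELSE
BEGIN
    COMMIT TRANSACTION;
END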

Depending on how you place the queries, isolation level READ COMMITTED should do the job.
Suppose the following code is executed:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
start transaction;
update account set balance=balance-20 where accountid = 'XY';
commit;
Assume T1 executes the statement update account set balance=balance-20 where accountid = 'XY'; it places a write lock on the record with accountid = 'XY'.
If a second transaction T2 now executes the same statement before T1 has committed, then T2's statement is blocked until T1 commits.
Afterwards, T2 continues. In the end, the balance will have been reduced by 40.

Your question is based on the assumption that using XLOCK is bad practice. While it is true that scattering this hint everywhere is generally not the best approach, there is no other way to achieve the required functionality in your particular situation.
When I encountered the same problem, I found that placing XLOCK, HOLDLOCK on the verification SELECT, inside the same transaction as the subsequent update, usually gets the job done. (I had a stored procedure that performed all necessary validations and then updated the Accounts table only if everything was fine. Yes, in a single transaction.)
However, there is one important caveat: if your database has RCSI enabled, other readers will be able to get past the lock by reading the previous value from the version store. In that case, adding READCOMMITTEDLOCK turns off optimistic versioning for the row(s) in question and reverts the behaviour back to standard read committed.
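A minimal sketch of that pattern, assuming a hypothetical Accounts table with AccountID and Balance columns (the names are illustrative, not from the original post):
DECLARE @AccountID int = 1;
DECLARE @Price decimal(9,2) = 20.00;
DECLARE @Balance decimal(9,2);

BEGIN TRANSACTION;

-- Verification SELECT: take an exclusive lock on the row and hold it until
-- the end of the transaction, so no concurrent transaction can read and
-- update the same balance in between.
SELECT @Balance = Balance
FROM Accounts WITH (XLOCK, HOLDLOCK)
WHERE AccountID = @AccountID;

IF @Balance >= @Price
BEGIN
    UPDATE Accounts
    SET Balance = Balance - @Price
    WHERE AccountID = @AccountID;
END

COMMIT TRANSACTION;
Under RCSI, the READCOMMITTEDLOCK hint mentioned above would additionally be needed on reads that must block on this lock rather than version past it.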

Related

How to establish read-only-once implement within SAP HANA?

Context: I am a long-time MSSQL developer... What I would like to know is how to implement a read-only-once select from SAP HANA.
High-level pseudo-code:
Collect request via db proc (query)
Call API with request
Store results of the request (response)
I have a table (A) that is the source of inputs to a process. Once a process has completed it will write results to another table (B).
Perhaps this is all solved if I just add a column to table A to prevent concurrent processors from selecting the same records from A?
I am wondering how to do this without adding the column to source table A.
What I have tried is a left outer join between tables A and B to get rows from A that have no corresponding rows (yet) in B. This doesn't work, or at least I haven't implemented it in a way that ensures rows are processed only once across the processors.
I have a stored proc to handle batch selection:
/*
* getBatch.sql
*
* SYNOPSIS: Retrieve the next set of criteria to be used in a search
* request. Use left outer join between input source table
* and results table to determine the next set of inputs, and
* provide support so that concurrent processes may call this
* proc and get their inputs exclusively.
*/
alter procedure "ACOX"."getBatch" (
in in_limit int
,in in_run_group_id varchar(36)
,out ot_result table (
id bigint
,runGroupId varchar(36)
,sourceTableRefId integer
,name nvarchar(22)
,location nvarchar(13)
,regionCode nvarchar(3)
,countryCode nvarchar(3)
)
) language sqlscript sql security definer as
begin
-- insert new records:
insert into "ACOX"."search_result_v4" (
"RUN_GROUP_ID"
,"BEGIN_DATE_TS"
,"SOURCE_TABLE"
,"SOURCE_TABLE_REFID"
)
select
in_run_group_id as "RUN_GROUP_ID"
,CURRENT_TIMESTAMP as "BEGIN_DATE_TS"
,'acox.searchCriteria' as "SOURCE_TABLE"
,fp.descriptor_id as "SOURCE_TABLE_REFID"
from
acox.searchCriteria fp
left join "ACOX"."us_state_codes" st
on trim(fp.region) = trim(st.usps)
left outer join "ACOX"."search_result_v4" r
on fp.descriptor_id = r.source_table_refid
where
st.usps is not null
and r.BEGIN_DATE_TS is null
limit :in_limit;
-- select records inserted for return:
ot_result =
select
r.ID id
,r.RUN_GROUP_ID runGroupId
,fp.descriptor_id sourceTableRefId
,fp.merch_name name
,fp.Location location
,st.usps regionCode
,'USA' countryCode
from
acox.searchCriteria fp
left join "ACOX"."us_state_codes" st
on trim(fp.region) = trim(st.usps)
inner join "ACOX"."search_result_v4" r
on fp.descriptor_id = r.source_table_refid
and r.COMPLETE_DATE_TS is null
and r.RUN_GROUP_ID = in_run_group_id
where
st.usps is not null
limit :in_limit;
end;
When running 7 concurrent processors, I get a 35% overlap. That is to say that out of 5,000 input rows, the resulting row count is 6,755. Running time is about 7 mins.
Currently my solution includes adding a column to the source table. I wanted to avoid that, but it seems to make for a simpler implementation. I will update the code shortly; it includes an update statement prior to the insert.
Useful references:
SAP HANA Concurrency Control
Exactly-Once Semantics Are Possible: Here’s How Kafka Does It
First off: there is no "read-only-once" in any RDBMS, including MS SQL.
Literally, this would mean that a given record can only be read once and would then "disappear" for all subsequent reads. (that's effectively what a queue does, or the well-known special-case of a queue: the pipe)
I assume that that is not what you are looking for.
Instead, I believe you want to implement a processing-semantic analogous to "once-and-only-once" aka "exactly-once" message delivery. While this is impossible to achieve in potentially partitioned networks it is possible within the transaction context of databases.
This is a common requirement, e.g. with batch data loading jobs that should only load data that has not been loaded so far (i.e. the new data that was created after the last batch load job began).
Sorry for the long pre-text, but any solution for this will depend on being clear on what we want to actually achieve. I will get to an approach for that now.
The major RDBMS have long figured out that blocking readers is generally a terrible idea if the goal is to enable high transaction throughput. Consequently, HANA does not block readers - ever (ok, not ever-ever, but in the normal operation setup).
The main issue with the "exactly-once" processing requirement really is not the reading of the records, but the possibility of processing more than once or not at all.
Both of these potential issues can be addressed with the following approach:
SELECT ... FOR UPDATE ... the records that should be processed (based on e.g. unprocessed records, up to N records, even-odd-IDs, zip-code, ...). With this, the current session has an UPDATE TRANSACTION context and exclusive locks on the selected records. Other transactions can still read those records, but no other transaction can lock those records - neither for UPDATE, DELETE, nor for SELECT ... FOR UPDATE ... .
Now you do your processing - whatever this involves: merging, inserting, updating other tables, writing log-entries...
As the final step of the processing, you want to "mark" the records as processed. How exactly this is implemented does not really matter.
One could create a processed-column in the table and set it to TRUE when records have been processed. Or one could have a separate table that contains the primary keys of the processed records (and maybe a load-job-id to keep track of multiple load jobs).
In whatever way this is implemented, this is the point in time, where this processed status needs to be captured.
COMMIT or ROLLBACK (in case something went wrong). This will COMMIT the records written to the target table, the processed-status information, and it will release the exclusive locks from the source table.
As you see, Step 1 takes care of the issue that records may be missed by selecting all wanted records that can be processed (i.e. they are not exclusively locked by any other process).
Step 3 takes care of the issue of records potentially being processed more than once by keeping track of the processed records. Obviously, this tracking has to be checked in Step 1 - both steps are interconnected, which is why I point them out explicitly. Finally, all the processing occurs within the same DB transaction context, allowing for guaranteed COMMIT or ROLLBACK across the whole transaction. That means that no "record marker" will ever be lost when the processing of the records was committed.
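A minimal sketch of these three steps in HANA SQL, assuming a hypothetical source_records table with an id key and a BOOLEAN processed flag (none of these names come from the question's schema):
-- Step 1: select and exclusively lock up to 100 unprocessed rows.
-- Other sessions can still read them, but cannot lock them themselves
-- (neither FOR UPDATE nor UPDATE/DELETE).
SELECT id
FROM source_records
WHERE processed = FALSE
LIMIT 100
FOR UPDATE;

-- Step 2: ... process the selected rows (merge, insert into target tables, write logs, ...) ...

-- Step 3: mark exactly the rows selected in step 1 as processed
-- (101, 102, 103 stand in for the ids returned by step 1).
UPDATE source_records
SET processed = TRUE
WHERE id IN (101, 102, 103);

-- COMMIT persists the processing results and the processed flags and
-- releases the exclusive locks together; ROLLBACK would undo all of it.
COMMIT;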
Now, why is this approach preferable to making records "un-readable"?
Because of the other processes in the system.
Maybe the source records are still read by the transaction system but never updated. This transaction system should not have to wait for the data load to finish.
Or maybe, somebody wants to do some analytics on the source data and also needs to read those records.
Or maybe you want to parallelise the data loading: it's easily possible to skip locked records and only work on the ones that are "available for update" right now. See e.g. Load balancing SQL reads while batch-processing? for that.
Ok, I guess you were hoping for something easier to consume; alas, that's my approach to this sort of requirement as I understood it.

What is the difference between inconsistent analysis and non-repeatable reads?

I've seen plenty of comparisons of inconsistent analysis to dirty reads, and of non-repeatable reads to dirty reads, but I can't seem to grasp the difference between an inconsistent (incorrect) analysis and a non-repeatable read.
Is there a better way to explain this?
My confusion comes from the fact that both involve multiple reads within one transaction while a second (or third) transaction makes updates that are committed.
Inconsistent (incorrect) analysis: the data read by the second transaction was committed by the transaction that made the change. Inconsistent analysis involves multiple reads (two or more) of the same row, with the information changed by another transaction each time, thus producing different results each time, hence "inconsistent".
Whereas:
Non-repeatable reads occur when one transaction attempts to access the same data twice and a second transaction modifies the data between the first transaction's read attempts. This may cause the first transaction to read two different values for the same data, making the original read non-repeatable.
I can't quite figure out how they are different.
Thank you.
In my view, the following is an example of an inconsistent analysis, but not a non-repeatable read.
The example uses a table for bank accounts:
CREATE TABLE ACCOUNT(
NO NUMERIC(10) NOT NULL PRIMARY KEY,
BALANCE NUMERIC(9,2) NOT NULL);
INSERT INTO ACCOUNT VALUES (1, 100.00);
INSERT INTO ACCOUNT VALUES (2, 200.00);
For performance reasons, the bank has another table that stores the sum of all account balances redundantly (therefore, this table has always only one row):
CREATE TABLE TOTAL(AMOUNT NUMERIC(9,2) NOT NULL);
INSERT INTO TOTAL SELECT SUM(BALANCE) FROM ACCOUNT;
Suppose that transaction A wants to check whether the redundant sum is indeed correct. It first computes the sum of the account balances:
START TRANSACTION; -- PostgreSQL and SQL Standard, switches autocommit off
SELECT SUM(BALANCE) FROM ACCOUNT;
Now the owner of account 1 makes a deposit of 50 dollars. This is done in transaction B:
START TRANSACTION;
UPDATE ACCOUNT SET BALANCE = BALANCE + 50 WHERE NO = 1;
UPDATE TOTAL SET AMOUNT = AMOUNT + 50;
COMMIT;
Finally, transaction A continues and reads the redundant sum:
SELECT AMOUNT FROM TOTAL;
It will see the increased value, which is different from the sum it computed (probably causing a false alarm).
In this example, transaction A did not read any table row twice, therefore this cannot be a non-repeatable read. However, it did not see a unique database state - some part of the information was from the old state before the update of transaction B, and some part from the new state after the update.
But this is certainly very related to a non-repeatable read: if A had read the ACCOUNT rows again, this would be a non-repeatable read. It seems that the same internal mechanisms that prevent non-repeatable reads also prevent this problem: one could keep read rows locked until the end of the transaction, or use multi-version concurrency control with the version from the beginning of the transaction.
However, there is also one nice solution here, namely to get all data in one query. At least Oracle and PostgreSQL guarantee that a single query is evaluated with respect to only one state of the database:
SELECT SUM(BALANCE) AS AMOUNT, 'SUM' AS PART FROM ACCOUNT
UNION ALL
SELECT AMOUNT, 'TOTAL' AS PART FROM TOTAL;
In a formal model of transaction schedules with reads and writes, this also looks very similar to a non-repeatable read: first A does read(x), then B does write(x), write(y), and then A does read(y). If only a single object were involved, this would be a non-repeatable read.

Avoiding deadlock when updating table

I've got a 3-tier app with data cached on the client side, so I need to know when data changes on the server to keep this cache in sync.
So I added a "lastmodification" field to the tables and update this field when data changes. But some 'parent' lastmodification rows must also be updated when child rows (linked by FK) are modified.
Fetching the MAX(lastmodification) from the main table, the MAX from each related table, and then the MAX of these values worked, but was a bit slow.
I mean:
MAX(MAX(MAIN_TABLE), MAX(CHILD1_TABLE), MAX(CHILD2_TABLE))
So I switched and added a trigger to this table so that it updates a row in a TABLE_METADATA table:
CREATE TABLE [TABLE_METADATA](
[TABLE_NAME] [nvarchar](250) NOT NULL,
[TABLE_LAST_MODIFICATION] [datetime] NOT NULL
)
Now a related table's trigger can bump the 'main' table's last modification time by also updating the corresponding row in the metadata table.
Fetching the lastmodification is now fast.
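A minimal sketch of what such a trigger could look like (the child table name and the metadata key are placeholders, not from the original post):
-- Hypothetical trigger on a child table: after any data change, bump the
-- parent table's entry in TABLE_METADATA so clients see a new timestamp.
CREATE TRIGGER TRG_CHILD1_TOUCH_METADATA
ON CHILD1_TABLE
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE TABLE_METADATA
    SET TABLE_LAST_MODIFICATION = GETDATE()
    WHERE TABLE_NAME = N'MAIN_TABLE';
END;
It is this UPDATE, fired from several triggers inside different transactions, that produces the deadlocks described next.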
But now I get random deadlocks related to updating this table.
They happen when two transactions modify TABLE_METADATA at different steps and then end up blocking each other.
My question: Do you see a way to keep this lastmodification update without locking the row?
In my case I really don't care if:
The lastmodification stays updated even if the transaction is rolled back
The 'dirty' lastmodification (updated but not yet committed) is overwritten by a new value
In fact, I don't really need these updates to be part of the transaction, but as they are executed by the trigger they automatically take part in the current transaction.
Thank you for any help
As far as I know, you cannot prevent a U-lock. However, you could try reducing the number of locks to a minimum by using with (rowlock).
This will tell the query optimiser to lock rows one by one as they are updated, rather than to use a page or table lock.
You can also use with (nolock) on tables which are joined to the table which is being updated. An alternative to this would be to use set transaction isolation level read uncommitted.
Be careful using this method though, as you can end up reading uncommitted data and producing incorrect results.
For example:
update mt with (rowlock)
set SomeColumn = Something
from MyTable mt
inner join AnotherTable at with (nolock)
on mt.mtId = at.atId
You can also add with (rowlock) and with (nolock)/set transaction isolation level read uncommitted to other database objects which often read and write the same table, to further reduce the likelihood of a deadlock occurring.
If deadlocks are still occurring, you can reduce read locking on the target table by self joining like this:
update mt with (rowlock)
set SomeColumn = Something
from MyTable mt
where mt.Id in (select Id from MyTable mt2 where Column = Condition)
More documentation about table hints can be found here.

MS SQL table hints and locking, parallelism

Here's the situation:
MS SQL 2008 database with table that is updated approximately once a minute.
The table structure is similar to following:
[docID], [warehouseID], [docDate], [docNum], [partID], [partQty]
Typical working cycle:
A user starts a data exchange from the in-house developed system:
BEGIN TRANSACTION
SELECT * FROM t1
WHERE [docDate] BETWEEN &DateStart AND &DateEnd
AND [warehouseID] IN ('w1','w2','w3')
...then the system performs a rather long processing of the selected data, generates the list of [docID]s to delete from t1, and then runs
DELETE FROM t1 WHERE [docID] IN ('d1','d2','d3',...,'dN')
COMMIT TRANSACTION
Here, the problem is that while the 1st transaction is processing the selected data, another one reads it too, and both end up loading the same data into the in-house system.
At first, I added the (TABLOCKX) table hint to the SELECT query. It worked pretty well until users started to complain about the system's performance.
Then I changed hints to (ROWLOCK, XLOCK, HOLDLOCK), assuming that it would:
exclusively lock...
selected rows (instead of whole table)...
until the end of transaction
But this seems to lock the whole table anyway. I have no access to the database itself, so I can't just analyze these locks (actually, I have no idea yet how to do that, even if I had access).
What I would like to have as a result:
users are able to process data related with different warehouses and dates in parallel
as a result of 1., avoid duplication of downloaded data
Apart from locks, the other solutions I have are (although they both seem clumsy):
Implement a flag in t1, showing that the data is under processing (and then do 'SELECT ... WHERE NOT [flag]')
Divide t1 into two parts: header and details, and apply locks separately.
I believe that I might have misunderstood some concepts with regard to transaction isolation levels and/or table hints, and that there is another (better) way.
Please, advise!
You could change the workflow concept.
Instead of deleting records, update them by setting an extra field, Deprecated, from 0 to 1.
Then read the data not from the table but from a view that filters on Deprecated = 0.
BEGIN TRANSACTION
SELECT * FROM vT1
WHERE [docDate] BETWEEN &DateStart AND &DateEnd
AND [warehouseID] IN ('w1','w2','w3')
where the vT1 view looks like this:
select *
from t1
where Deprecated = 0
And the deletion will look like this:
UPDATE t1 SET Deprecated = 1 WHERE [docID] IN ('d1','d2','d3',...,'dN')
COMMIT TRANSACTION
Using such a concept you will achieve two goals:
decrease the probability of locks
keep a history of movements in the warehouses
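If it helps, here is a sketch of the supporting DDL (the bit data type and the default are assumptions):
-- Add the flag with a default so existing and future rows start as active.
ALTER TABLE t1 ADD Deprecated bit NOT NULL DEFAULT 0;
GO

-- Readers query this view instead of t1, so "deleted" rows simply
-- disappear from their result sets without being physically removed.
CREATE VIEW vT1
AS
SELECT *
FROM t1
WHERE Deprecated = 0;
GO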

How to exclude one statement from current Sql transaction for Sql id generator?

I would like to implement an id generator so that I have unique record identification across multiple tables and can assign ids to structures of new records formed on the client side.
The obvious and standard answer is a Guid, but I want to use int for space efficiency and human readability.
It's OK to have gaps in the id sequence - they will happen with unfinished transactions, lost client connections and so on.
For the implementation I would have a table Counters with a field NextId int and increment that counter any time an id is requested. I may increment it by more than 1 when I need a range of ids for multiple or bulk inserts.
To avoid locking bottlenecks when updating the Counters table, I need to make id requests atomic and outside of any other transaction. So my question is: how do I do that?
It's not a problem at the application level - it can make one small atomic transaction to get a pool of ids and then use those ids in another, bigger transaction to insert records.
But what do I do if I want to get new ids inside a stored procedure or trigger?
If I wrap that update Counters set NextId = NextId + 1 statement in a nested transaction (begin tran ... commit tran), that does not exclude it from locking until the outer transaction ends.
Is there any way to exclude that one SQL statement from the current transaction, so that its lock is released as soon as the statement ends and it does not participate in the rollback if the outer transaction is rolled back?
You need to use a second connection. You cannot have multiple transactions at once per connection.
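A minimal sketch of what that second connection could run, using the Counters/NextId names from the question (the block-size value is an assumption):
-- Reserve a block of ids in a single atomic UPDATE and return the first
-- reserved value. Because this runs on its own connection, and therefore
-- in its own transaction, the lock on Counters is released as soon as this
-- batch completes, independently of the outer business transaction.
DECLARE @BlockSize int = 10;
DECLARE @FirstId int;

UPDATE Counters
SET @FirstId = NextId,
    NextId = NextId + @BlockSize;

SELECT @FirstId AS FirstReservedId;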