Synchronization of queries to a MariaDB database - sql

Because of some high-availability considerations, I am designing a system where multiple processes will communicate/synchronize via the database (most likely MariaDB, but I am open to looking into PostgreSQL and MySQL options).
One of the requirements identified is that a process must take a piece of work from the database, without letting another process take the same piece of work concurrently.
Specifically, here is the race condition I have in mind:
Process A starts a SQL transaction and runs SELECT * FROM requests WHERE ReservedTS IS NULL ORDER BY CreatedTS LIMIT 100. Here CreatedTS and ReservedTS are DATETIME columns storing the time the piece of work was created by a work submitter process and reserved by a work executor process, respectively.
Process B starts a transaction, runs the same query and gets the same set of results.
Process A runs UPDATE requests SET ReservedTS=NOW() WHERE id IN (<list of IDs selected above>) AND ReservedTS IS NULL
Process B runs the same query; however, because its transaction has its own snapshot of the data, ReservedTS will still appear NULL to Process B, so the items get reserved twice.
Process A commits the transaction.
Process B commits the transaction, overwriting the values of process A.
Could you please help to resolve the above data race?

You can easily do that by using exclusive locks:
A simplified test table:
CREATE TABLE t1 (id int not null auto_increment primary key, reserved int);
INSERT INTO t1 VALUES (1,0), (2,0);
Process A:
BEGIN;
SELECT id, reserved FROM t1 WHERE id=2 AND reserved=0 FOR UPDATE;
UPDATE t1 SET reserved=1 WHERE id=2 AND reserved=0;
COMMIT;
If Process B tries to update the same row before Process A has finished its transaction, it has to wait until the lock is released (or a timeout occurs):
update t1 set reserved=1 where id=2 and reserved=0;
Query OK, 0 rows affected (12.04 sec)
Rows matched: 0 Changed: 0 Warnings: 0
And as you can see, Process B didn't update anything.
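The "0 rows affected" result is what a worker can test in application code to decide whether it won the row. Below is a minimal sketch of that check, written in Python against SQLite only so that it runs standalone. SQLite has no FOR UPDATE and locks the whole database for a write, so this illustrates the conditional UPDATE plus row-count check, not MariaDB's row locking. The helper names try_reserve and make_demo_db are made up for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory copy of the t1 test table (ids 1 and 2, both unreserved)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (id INTEGER PRIMARY KEY, reserved INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, 0)", [(1,), (2,)])
    conn.commit()
    return conn


def try_reserve(conn: sqlite3.Connection, row_id: int) -> bool:
    """Try to claim a row; True only if this call changed reserved from 0 to 1."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE t1 SET reserved = 1 WHERE id = ? AND reserved = 0",
            (row_id,),
        )
    # rowcount is the number of rows the UPDATE actually modified
    return cur.rowcount == 1


if __name__ == "__main__":
    db = make_demo_db()
    print(try_reserve(db, 2))  # first caller claims the row
    print(try_reserve(db, 2))  # second caller matches no rows
```

A worker that gets False back simply moves on to another candidate row instead of processing the one it lost.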

Related

In sybase, how would I lock a stored procedure that is executing and alter the table that the stored procedure returns?

I have a table as follows:
id status
-- ------
1 pass
1 fail
1 pass
1 na
1 na
Also, I have a stored procedure that returns a table with top 100 records having status as 'na'. The stored procedure can be called by multiple nodes in an environment and I don't want them to fetch duplicate data. So, I want to lock the stored procedure while it is executing and set the status of the records obtained from the stored procedure to 'In Progress' and return that table and then release the lock, so that different nodes don't fetch the same data. How would I accomplish this?
A solution has already been provided for a similar question about MS SQL, but it produces errors when used in Sybase.
Assuming Sybase ASE ...
The bigger issue you'll likely want to consider is whether you want a single process to lock the entire table while you're grabbing your top 100 rows, or if you want other processes to still access the table?
Another question is whether you'd like multiple processes to concurrently pull 100 rows from the table without blocking each other?
I'm going to assume that you a) don't want to lock the entire table and b) you may want to allow multiple processes to concurrently pull rows from the table.
1 - if possible, make sure the table is using datarows locking (default is usually allpages); this will reduce the granularity of locks to the row level (as opposed to page level for allpages); the table will need to be datarows if you want to allow multiple processes to concurrently find/update rows in the table
2 - make sure the lock escalation setting on the table is high enough to ensure a single process's 100 row update doesn't lock the table (sp_setpglockpromote for allpages, sp_setrowlockpromote for datarows); the key here is to make sure your update doesn't escalate to a table-level lock!
3 - when it comes time to grab your set of 100 rows you'll want to ... inside a transaction ... update the 100 rows with a status value that's unique to your session, select the associated id's, then update the status again to 'In Progress'
The gist of the operation looks like the following:
declare @mysession varchar(10)
select @mysession = convert(varchar(10),@@spid) -- replace @@spid with anything that
-- uniquely identifies your session
set rowcount 100 -- limit the update to 100 rows
begin tran get_my_rows
-- start with an update so that we get exclusive access to the desired rows;
-- update the first 100 rows you find with your @@spid
update mytable
set status = @mysession -- need to distinguish your locked rows from
-- other processes; if we used 'In Progress'
-- we wouldn't be able to distinguish between
-- rows updated earlier in the day or updated
-- by other/concurrent processes
from mytable readpast -- 'readpast' allows your query to skip over
-- locks held by other processes but it only
-- works for datarows tables
where status = 'na'
-- select your reserved id's and send back to the client/calling process
select id
from mytable
where status = @mysession
-- update your rows with a status of 'In Progress'
update mytable
set status = 'In Progress'
where status = @mysession
commit -- close out txn and release our locks
set rowcount 0 -- set back to default of 'unlimited' rows
Potential issues:
if your table is large and you don't have an index on status then your queries could take longer than necessary to run; by making sure lock escalation is high enough and you're using datarows locking (so the readpast works) you should see minimal blocking of other processes regardless of how long it takes to find the desired rows
with an index on the status column, consider that all of these updates are going to force a lot of index updates which is probably going to lead to some expensive deferred updates
if using datarows and your lock escalation is too low then an update could lock the entire table, which would cause another (concurrent) process to readpast the table lock and find no rows to process
if using allpages you won't be able to use readpast so concurrent processes will block on your locks (ie, they won't be able to read around your lock)
if you've got an index on status, and several concurrent processes locking different rows in the table, there could be a chance for deadlocks to occur (likely in the index tree of the index on the status column) which in turn would require your client/application to be coded to expect and address deadlocks
To think about:
if the table is relatively small such that table scanning isn't a big cost, you could drop any index on the status column and this should reduce the performance overhead of deferred updates (related to updating the indexes)
if you can work with a session specific status value (eg, 'In Progress - @mysession') then you could eliminate the 2nd update statement (could come in handy if you're incurring deferred updates on an indexed status column)
if you have another column(s) in the table that you could use to uniquely identify your session's rows (eg, last_updated_by_spid = @@spid, last_updated_date = @mydate - where @mydate is initially set to getdate()) then your first update could set the status = 'In Progress', the select would use @@spid and @mydate for the where clause, and the second update would not be needed [NOTE: This is, effectively, the same thing Gordon is trying to address with his session column.]
assuming you can work with a session specific status value, consider using something that will allow you to track, and fix, orphaned rows (eg, row status remains 'In Progress - @mysession' because the calling process died and never came back to (re)set the status)
if you can pass the id list back to the calling program as a single string of concatenated id values you could use the method I outline in this answer to append the id's into a @variable during the first update, allowing you to set status = 'In Progress' in the first update and also allowing you to eliminate the select and the second update
how would you tell which rows have been orphaned? you may want the ability to update a (small)datetime column with the getdate() of when you issued your update; then, if you would normally expect the status to be updated within, say, 5 minutes, you could have a monitoring process that looks for orphaned rows where status = 'In Progress' and it's been more than, say, 10 minutes since the last update
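The orphan check in the last two points could look roughly like the sketch below. It is written in Python against SQLite only so that it runs standalone, not in Sybase. The reserved_at column (epoch seconds written when the row was claimed), the function name find_orphans and the use of SQLite's implicit rowid are all assumptions made for illustration; the 10 minute threshold is the one suggested above.

```python
import sqlite3
import time

# "say, 10 minutes" from the text above, expressed in seconds
STALE_AFTER_SECONDS = 10 * 60


def find_orphans(conn: sqlite3.Connection, now=None) -> list:
    """Return rowids of rows stuck in 'In Progress' for longer than the threshold.

    reserved_at is a hypothetical column holding the epoch time of the claim.
    """
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT rowid FROM mytable "
        "WHERE status = 'In Progress' AND reserved_at < ? "
        "ORDER BY rowid",
        (now - STALE_AFTER_SECONDS,),
    ).fetchall()
    return [row[0] for row in rows]
```

What the monitoring process does with the rows it finds (reset them to 'na', alert someone, and so on) is a separate decision that depends on whether the work can safely be repeated.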
If the datarows, readpast, lock escalation settings and/or deadlock potential is too much, and you can live with brief table-level locks on the table, you could have the process obtain an exclusive table level lock before performing the update and select statements; the exclusive lock would need to be obtained within a user-defined transaction in order to 'hold' the lock for the duration of your work; a quick example:
begin tran get_my_rows
-- request an exclusive table lock; wait until it's granted
lock table mytable in exclusive mode
update ...
select ...
update ...
commit
I'm not 100% sure how to do this in Sybase. But, the idea is the following.
First, add a new column to the table that represents the session or connection used to change the data. You will use this column to provide isolation.
Then, update the rows:
update top (100) t
set status = 'in progress',
session = @session
where status = 'na'
order by ?; -- however you define the "top" records
Then, you can return or process the 100 ids that are "in progress" for the given connection.
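Here is a rough sketch of the session-column idea, written in Python against SQLite only so that it runs standalone, not in Sybase. The table t and the session column follow the snippet above. The function name claim_batch, the uuid token and ordering by SQLite's implicit rowid are assumptions for illustration, since the real "top" ordering is left open above.

```python
import sqlite3
import uuid


def claim_batch(conn: sqlite3.Connection, batch_size: int = 100):
    """Tag up to batch_size 'na' rows with a fresh token and return (token, rowids)."""
    token = uuid.uuid4().hex  # hypothetical per-call session identifier
    with conn:  # a single transaction: commit on success, rollback on error
        conn.execute(
            "UPDATE t SET status = 'in progress', session = ? "
            "WHERE rowid IN ("
            "  SELECT rowid FROM t WHERE status = 'na' ORDER BY rowid LIMIT ?"
            ")",
            (token, batch_size),
        )
        # Only rows tagged by this call carry the token, so the read-back
        # cannot pick up rows claimed by another session.
        rows = conn.execute(
            "SELECT rowid FROM t WHERE session = ? ORDER BY rowid", (token,)
        ).fetchall()
    return token, [row[0] for row in rows]
```

Because the update comes first and the read-back filters on the token, two callers never receive the same row even though both end up with status 'in progress'.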
Create another table, proc_lock, that has one row.
When control enters the stored procedure, start a transaction and do a select for update on the row in proc_lock (see this link). If that doesn't work for Sybase, then you could try the technique from this answer to lock the row.
Before the procedure exits, make sure to commit the transaction.
This will ensure that only one user can execute the proc at a time. When the second user tries to execute the proc, it will block until the first user's lock on the proc_lock row is released (e.g. when the transaction is committed).

Procedure containing transaction leading to deadlock when executed in parallel in SQL Server

Current scenario: we have a T-SQL procedure which fetches unprocessed data from Table1 and dumps into Table2 with certain manipulations on each record. The entire process is being executed in a transaction and a flag is maintained to signify if a record has been processed.
A SQL Server job has been created to execute this procedure at an interval of 2 min. (another job has been created to load Table1 with fresh data at an interval of 1 min)
When the first job execution takes more than 2 min, the same job gets triggered again, and this leads to a deadlock.
The transaction within the T-SQL is per record, so shouldn't the table be released after each record is processed, so that even if the same job is triggered again it would be able to read the unprocessed rows?
Database isolation level maintained is "Read committed snapshot"
We were under the assumption that SQL Server would allow the process to run in parallel. What are we doing wrong here?
Note: we are a bunch of naive developers experimenting with SQL Server. Let us know if we are being dumb!
Here is the gist of what the procedure does
pr_process_data
begin
    select column1, column2, column3... from `table1`
    fetch into cursor
    open cursor
    while fetch status = 0
    begin loop
        begin transaction
        "queries to validate datatype, length of all the columns"
        if validation succeeds
            update `table1` valid flag = 'y'
        else
            update `table1` valid flag = 'n'
        commit
    end loop
    insert into `table2` select * from `table1` where valid = 'y'
end
The number of columns may vary anywhere from 20 to 40, and the number of validations can be anywhere between 3 and 5.
Hope the example is clear enough!

SQL Server : distributed transaction and duplicated rows

I have 1 core SQL Server and many secondary SQL Servers that transfer data to the core server.
Every secondary SQL Server has the core server configured as a linked server, and a stored procedure that runs from time to time.
This is the code from the stored procedure (some fields are removed, but that's not important):
BEGIN DISTRIBUTED TRANSACTION
SELECT TOP (@ReceiptsQuantity)
MarketId, CashCheckoutId, ReceiptId, GlobalReceiptId
INTO #Receipts
FROM dbo.Receipt
WHERE Transmitted = 0
SELECT ReceiptId, Barcode, GoodId
INTO #ReceiptGoodsStrings
FROM ReceiptGoodsStrings
WHERE ReceiptGoodsStrings.ReceiptId in (SELECT ReceiptId FROM #Receipts)
INSERT INTO [SyncServer].[POSServer].[dbo].[Receipt]
SELECT * FROM #Receipts
INSERT INTO [SyncServer].[POSServer].[dbo].[ReceiptGoodsStrings]
SELECT * FROM #ReceiptGoodsStrings
UPDATE Receipt
SET Transmitted = 1
WHERE ReceiptId in (SELECT ReceiptId FROM #Receipts)
DROP TABLE #Receipts
DROP TABLE #ReceiptGoodsStrings
COMMIT TRANSACTION
There are two tables: Receipt has many ReceiptGoodsStrings (key ReceiptId).
It's working fine. But sometimes on the core server I have duplicated rows in Receipt and ReceiptGoodsStrings. It happens very rarely and I cannot understand why.
Maybe I chose the wrong way to transfer data?
It seems it is a concurrency problem.
It is possible that two concurrent transactions are open and both read from your Receipt table. Each session writes to its own temp tables (#Receipts and #ReceiptGoodsStrings). In the end, each client in turn locks [SyncServer].[POSServer].[dbo].[Receipt] and [SyncServer].[POSServer].[dbo].[ReceiptGoodsStrings] to copy the rows from its temp tables to the destination, and both of them perform the update.
Thus, both transactions complete successfully and you have duplicate rows!
Fortunately, you can use the UPDLOCK hint on your first select from the Receipt table to lock the rows/pages you have already read while inside a transaction. The other client will have to wait for the lock to be released by the first client performing COMMIT. Then the second one will continue, read only the new rows to be transmitted, and copy them and only them.
SELECT TOP (@ReceiptsQuantity)
MarketId, CashCheckoutId, ReceiptId, GlobalReceiptId
INTO #Receipts
FROM dbo.Receipt WITH (UPDLOCK)
WHERE Transmitted = 0
EDIT
Finally, pay attention to the interval you use to call the sync transaction. It may be that the interval is too short, so one transaction has not yet finished when the next one starts. In that case you can expect to get duplicated rows. You could try to increase the interval.

SELECT COUNT(SomeId) with INSERT later to same SomeId: Appropriate locking strategy?

I am using SQL Server 2012. I have a repeatable read transaction where I perform this query:
select count(SomeId)
from dbo.MyTable
where SomeId = @SomeId
SomeId is a column whose value may repeat in the table (think foreign key). However, SomeId is not a member of any index nor is it a foreign key.
Later in the transaction, I insert a record into dbo.MyTable with the same @SomeId, thus changing what the select count(SomeId) would return were I to run it again:
insert into dbo.MyTable (SomeId, ...)
values (@SomeId, ...)
Several threads in my application can execute this transaction at the same time. Because of this, I'm getting deadlocks on the insert statement. At first, I thought an updlock would be appropriate on the select statement, but I quickly realized that it wouldn't work because I'm not actually updating the rows selected by the select count(SomeId).
My question is this: is there any way to avoid a potentially expensive table lock? Is there any way to lock just rows that involve SomeId, even if they haven't been inserted yet (strange, I know)? I want to force other threads to wait while the original transaction completes its work but I don't want to lock rows unnecessarily.
EDIT
Here's what I'm trying to accomplish:
I only want to insert up to eight rows for a particular SomeId. There are several unrelated processes that can start one of these transactions, potentially at the same time. The select count detects whether there are already eight rows and causes the operation to fail for that process. If the count is less than eight, that same transaction performs additional work, then inserts a record at the end, thus effectively incrementing the count were the select count to be run again. I am hitting the deadlock on the insert statement.
If you have several processes that try to do the same thing and you don't want to have more records than some number, you will need to actually prevent those processes from running at the same time.
One way would be to read the counts with exclusive lock:
select count(SomeId)
from dbo.MyTable with (xlock)
where SomeId = @SomeId
This way those records will be locked until the transaction completes.
You should create an index on the SomeId column though, as the locks will then most likely be held at the index level.
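The shape of "count under a lock, then insert in the same transaction" can be sketched as follows, in Python against SQLite only so that it runs standalone. This is not SQL Server: SQLite's BEGIN IMMEDIATE takes a database-wide write lock, which is much coarser than the row or index locks discussed here, so it only illustrates that the check and the insert must not be interleaved with another writer. The function name insert_if_room is made up, and the connection is assumed to be opened with isolation_level=None so that transactions are controlled explicitly.

```python
import sqlite3

MAX_ROWS_PER_ID = 8  # the limit described in the question


def insert_if_room(conn: sqlite3.Connection, some_id: int) -> bool:
    """Insert one row for some_id unless MAX_ROWS_PER_ID rows already exist.

    Expects a connection created with isolation_level=None.
    """
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before counting
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM MyTable WHERE SomeId = ?", (some_id,)
        ).fetchone()
        has_room = count < MAX_ROWS_PER_ID
        if has_room:
            conn.execute("INSERT INTO MyTable (SomeId) VALUES (?)", (some_id,))
        conn.execute("COMMIT")
        return has_room
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Taking the lock before the count is the point: if two transactions both count first and lock later, each can see fewer than eight rows and both will insert.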

What type of Transaction IsolationLevel should be used to ignore inserts but lock the selected row?

I have a process that starts a transaction, inserts a record into Table1, and then calls a long running web service (up to 30 seconds). If the web service call fails then the insert is rolled back (which is what we want). Here is an example of the insert (it is actually multiple inserts into multiple tables but I am simplifying for this question):
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (@UserId, 1)
I have a second process that queries Table1 from the first step like this:
SELECT TOP 1 * FROM Table1 WHERE StatusTypeId=2
and then updates that row for a user. When process 1 is running, Table1 is locked so process 2 will not complete until process 1 finishes which is a problem because a long delay is introduced while process 1 finishes its web service call.
Process 1 will only ever insert a StatusTypeId of 1 and it is also the only operation that inserts into Table1. Process 2 will only query on StatusTypeId = 2. I want to tell Process 2 to ignore any inserts into Table1 but lock the row that it selects. The default isolation level for Process 2 is waiting on too much but I have a fear that IsolationLevel.ReadUncommitted allows reading of too much dirty data. I do not want two users running Process 2 and then accidentally getting the same row.
Is there a different IsolationLevel to use other than ReadUncommitted that says ignore inserted rows but make sure the select locks the row that is selected?
Regarding the SELECT being blocked by the insert this should be avoidable by providing appropriate indexes.
Test Table.
CREATE TABLE Table1
(
UserId INT PRIMARY KEY,
StatusTypeId INT,
AnotherColumn varchar(50)
)
insert into Table1
SELECT number, (LEN(type)%2)+1, newid()
FROM master.dbo.spt_values
where type='p'
Query window one
BEGIN TRAN
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (5000, 1)
WAITFOR DELAY '00:01';
ROLLBACK
Query window two (Blocks)
SELECT TOP 1 *
FROM Table1
WHERE StatusTypeId=2
ORDER BY AnotherColumn
But if you retry the test after adding an index, it won't block:
CREATE NONCLUSTERED INDEX ix ON Table1 (StatusTypeId,AnotherColumn)
Regarding your locking of rows for Process 2, you can use the following (the READPAST hint will allow 2 concurrent Process 2 transactions to begin processing different rows rather than one blocking the other). You might find this article by Remus Rusanu relevant.
BEGIN TRAN
SELECT TOP 1 *
FROM Table1 WITH (UPDLOCK, READPAST)
WHERE StatusTypeId=2
ORDER BY AnotherColumn
/*
Rest of Process Two's code here
*/
COMMIT
Edit: Having re-read the question, the lock taken by an insert should not affect any select under READ COMMITTED; this could be an issue with your indexes.
However, from your comments and the rest of the question it seems you want only one transaction to be able to read a row at a time, which is not something an isolation level provides.
Isolation levels prevent the following anomalies:
Dirty read - reading uncommitted data from a transaction which could still be rolled back - occurs in READ UNCOMMITTED; prevented in READ COMMITTED, REPEATABLE READ, SERIALIZABLE
Non-repeatable read - a row is updated by another transaction whilst being read in an uncommitted transaction, meaning the same read of a particular row can occur twice in a transaction and produce different results - occurs in READ UNCOMMITTED, READ COMMITTED; prevented in REPEATABLE READ, SERIALIZABLE
Phantom rows - a row is inserted or deleted by another transaction whilst being read in an uncommitted transaction, meaning that the same read of multiple rows can occur twice in a transaction and produce different results, with either added or missing rows - occurs in READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ; prevented in SERIALIZABLE