Select and insert at the same moment in SQL Server 2016

I have a simple table in my application with the columns
ID | VALUE | DATE | ITEMID
Items are constantly inserting data into this table through my web service; I only insert data that is 5 or more minutes old. At the same time, my web application selects max, min, and current values for items. From time to time I get
Transaction (Process ID 62) was deadlocked on lock resources with
another process and has been chosen as the deadlock victim. Rerun the
transaction.
Is there any way to get the data such that, at the moment I am selecting, I don't have to care about the inserts?

You can use the WITH (NOLOCK) hint; this will read uncommitted data.
See the answer here:
What is "with (nolock)" in SQL Server?
and here:
http://sqlserverplanet.com/tsql/using-with-nolock

If using SQL Server 2016:
Problem:
When we run a SELECT statement, shared locks are held on the resources involved - say TableX, from which the max, min, and current values are being selected. If you INSERT data into the locked table (TableX) at that moment, the INSERT statement must wait for the SELECT to finish, because INSERT requires an exclusive lock on the resource (TableX).
Solution:
As suggested in the other answers, we can use the WITH (NOLOCK) or WITH (READUNCOMMITTED) table hints so that SELECT statements don't hold a lock on the underlying resources (TableX). But the problem is that such a SELECT might read uncommitted data that was changed during a transaction and later rolled back to its initial state (also called reading dirty data).
If you want to make sure that the SELECT reads only committed data and also doesn't block the INSERTs into TableX, then turn on READ_COMMITTED_SNAPSHOT at the database level.
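A minimal sketch of enabling it (the database name is a placeholder; WITH ROLLBACK IMMEDIATE rolls back in-flight transactions so the ALTER doesn't wait on open connections):
ALTER DATABASE [YourDatabase]  -- placeholder name
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
After this, the default READ COMMITTED level uses row versioning, so SELECTs read the last committed values instead of blocking on, or being blocked by, the INSERTs.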

You can change the SELECT query as follows to avoid deadlocks:
SELECT ITEMID, MAX(VALUE), MIN(VALUE)
FROM table_name WITH (NOLOCK)
GROUP BY ITEMID, DATE
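Note that if READ_COMMITTED_SNAPSHOT is enabled as sketched above, the hint becomes unnecessary; a plain query reads the latest committed row versions without blocking the inserts:
SELECT ITEMID, MAX(VALUE) AS MaxValue, MIN(VALUE) AS MinValue
FROM table_name
GROUP BY ITEMID, DATE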

Related

Synchronization of queries to a MariaDB database

Because of some high-availability considerations, I am designing a system where multiple processes will communicate/synchronize via the database (most likely MariaDB, but I am open to looking into PostgreSQL and MySQL options).
One of the requirements identified is that a process must take a piece of work from the database, without letting another process take the same piece of work concurrently.
Specifically, here is the race condition I have in mind:
Process A starts a SQL transaction and runs SELECT * FROM requests WHERE ReservedTS IS NULL ORDER BY CreatedTS LIMIT 100. Here ReservedTS and CreatedTS are DATETIME columns storing the time the piece of work was created by a work-submitter process and reserved by a work-executor process, respectively.
Process B starts a transaction, runs the same query and gets the same set of results.
Process A runs UPDATE requests SET ReservedTS=NOW() WHERE id IN (<list of IDs selected above>) AND ReservedTS IS NULL.
Process B runs the same query; however, because its transaction has its own snapshot of the data, ReservedTS still appears to be NULL to Process B, so the items get reserved twice.
Process A commits the transaction.
Process B commits the transaction, overwriting the values of process A.
Could you please help to resolve the above data race?
You can easily do that by using exclusive locks:
For simplicity, a test table:
CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, reserved INT);
INSERT INTO t1 (reserved) VALUES (0), (0); -- ids 1 and 2 are auto-generated
Process A:
BEGIN;
SELECT id, reserved FROM t1 WHERE id=2 AND reserved=0 FOR UPDATE;
UPDATE t1 SET reserved=1 WHERE id=2 AND reserved=0;
COMMIT;
If Process B tries to update the same entry before Process A has finished its transaction, it has to wait until the lock is released (or a timeout occurs):
update t1 set reserved=1 where id=2 and reserved=0;
Query OK, 0 rows affected (12.04 sec)
Rows matched: 0 Changed: 0 Warnings: 0
And as you can see, Process B didn't update anything.
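Applied to the original requests table, a sketch of the reservation transaction might look like this (the locking read blocks a concurrent executor until COMMIT, after which it sees the new ReservedTS values and skips those rows):
BEGIN;
-- rows returned here are exclusively locked until COMMIT
SELECT id FROM requests
WHERE ReservedTS IS NULL
ORDER BY CreatedTS
LIMIT 100
FOR UPDATE;
UPDATE requests SET ReservedTS = NOW()
WHERE id IN (/* ids returned above */) AND ReservedTS IS NULL;
COMMIT;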

How to SELECT COUNT from a table currently being INSERTed into?

Consider an INSERT statement running on a table TABLE_A that takes a long time; I would like to see how far it has progressed.
What I tried was to open a new session (a new query window in SSMS) while the long-running statement was still in progress, and run the query
SELECT COUNT(1) FROM TABLE_A WITH (NOLOCK)
hoping that it would return right away with the current number of rows every time I ran it. But in my test, even with (NOLOCK), it only returned after the INSERT statement had completed.
What have I missed? Do I add (NOLOCK) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first run CREATE TABLE TABLE_A and then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx without first creating TABLE_A, then none of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you, of course. Personally, if you really need to track the progress of something like this, I still prefer to insert in manageable batches, so I would rewrite the INSERT as multiple smaller statements, as sketched below. Depending on your implementation, that may be a trivial thing to do.
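A minimal batching sketch (the table, column, and key names are assumptions, not from the original question): each iteration copies one chunk, so a concurrent COUNT sees the table grow between batches.
DECLARE @batch INT = 10000;
WHILE 1 = 1
BEGIN
    -- copy one chunk of not-yet-copied rows (assumes a key column id)
    INSERT TOP (@batch) INTO TABLE_A (id, col1)
    SELECT s.id, s.col1
    FROM SourceTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM TABLE_A AS t WHERE t.id = s.id);

    IF @@ROWCOUNT < @batch BREAK;  -- last (partial) chunk done
END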
If you are using SQL Server 2016, the Live Query Statistics feature can show you the progress of the insert in real time.
The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index and would be followed by a sort operator to get the values into nonclustered index key order before inserting into the NCI. The sort is a blocking operator (it cannot output any rows until all input rows are consumed), so the operators to its left show 0% done.
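If you prefer a query over the SSMS UI, the same per-operator progress can be sampled from the DMV behind Live Query Statistics; a sketch, where the session_id is a placeholder and the watched session must have profiling enabled (e.g. by running SET STATISTICS XML ON before the INSERT; requirements vary by version):
SELECT session_id, physical_operator_name,
       row_count, estimate_row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 53;  -- placeholder: spid of the inserting session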
With respect to your question on NOLOCK: it is trivial to test.
Connection 1
USE tempdb;

CREATE TABLE T2
(
    X INT IDENTITY PRIMARY KEY,
    F CHAR(8000)
);

-- poll until Connection 2 has started inserting rows
WHILE NOT EXISTS (SELECT * FROM T2 WITH (NOLOCK))
    WAITFOR DELAY '00:00:01';

SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');

RAISERROR ('Waiting for 10 seconds', 0, 1) WITH NOWAIT;
WAITFOR DELAY '00:00:10';

SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');

RAISERROR ('Waiting to drop table', 0, 1) WITH NOWAIT;
DROP TABLE T2;
Connection 2
USE tempdb;

-- insert 2000 * 2000 = 4 million rows
WITH T AS (SELECT TOP 2000 'x' AS x
           FROM master..spt_values)
INSERT INTO T2 (F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1);  -- single-threaded, so the row count grows steadily
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked: they still need a Sch-S (schema stability) lock on the table (and on a heap they will also take a HoBT lock).
The only lock incompatible with a Sch-S is a Sch-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same transaction).
For the use case of a large insert where an approximate in-flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non-deprecated alternative DMVs are available).
When an insert has a wide update plan, you can even see it inserting into the various indexes in turn that way.
If the table is created inside the inserting transaction, this sysindexes query will still block, though, as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with NOLOCK instead.
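A sketch of that workaround, resolving the id from sys.tables under NOLOCK rather than via OBJECT_ID (which blocks on uncommitted DDL):
SELECT rows
FROM sysindexes
WHERE id = (SELECT object_id
            FROM sys.tables WITH (NOLOCK)
            WHERE name = 'T2');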
Use the query below to get the row count in seconds for any large table, locked table, or table currently being inserted into. Just replace TABLENAME with the table you want to check.
SELECT Total_Rows = SUM(st.row_count)
FROM sys.dm_db_partition_stats st
WHERE OBJECT_NAME(st.object_id) = 'TABLENAME'
  AND st.index_id < 2
For those who just need to see the record count while a long-running INSERT script executes, I found you can see the current count in SSMS by right-clicking the destination table, then Properties -> Storage, and viewing the "Row Count" value.
Close the window and repeat to see the updated record count.

SQL Server : distributed transaction and duplicated rows

I have one core SQL Server and many secondary SQL Servers that transfer data to the core server.
Every secondary SQL Server has the core server linked and a stored procedure that runs from time to time.
This is the code from the stored procedure (some fields are omitted, but that's not important):
BEGIN DISTRIBUTED TRANSACTION

SELECT TOP (@ReceiptsQuantity)
    MarketId, CashCheckoutId, ReceiptId, GlobalReceiptId
INTO #Receipts
FROM dbo.Receipt
WHERE Transmitted = 0

SELECT ReceiptId, Barcode, GoodId
INTO #ReceiptGoodsStrings
FROM ReceiptGoodsStrings
WHERE ReceiptGoodsStrings.ReceiptId IN (SELECT ReceiptId FROM #Receipts)

INSERT INTO [SyncServer].[POSServer].[dbo].[Receipt]
SELECT * FROM #Receipts

INSERT INTO [SyncServer].[POSServer].[dbo].[ReceiptGoodsStrings]
SELECT * FROM #ReceiptGoodsStrings

UPDATE Receipt
SET Transmitted = 1
WHERE ReceiptId IN (SELECT ReceiptId FROM #Receipts)

DROP TABLE #Receipts
DROP TABLE #ReceiptGoodsStrings

COMMIT TRANSACTION
There are two tables: Receipt has many ReceiptGoodsStrings (key ReceiptId).
It works fine, but sometimes on the core server I get duplicated rows in Receipt and ReceiptGoodsStrings. It happens very rarely and I cannot understand why.
Maybe I chose the wrong way to transfer data?
It seems to be a concurrency problem.
There is a possibility that two concurrent transactions open and both read from your Receipt table. Each session writes to its own temp tables (#Receipts and #ReceiptGoodsStrings). At the end, each client in turn locks [SyncServer].[POSServer].[dbo].[Receipt] and [SyncServer].[POSServer].[dbo].[ReceiptGoodsStrings] to stuff rows from its temp tables into the destination, and both of them perform the update.
Thus both transactions complete successfully and you have duplicate rows!
Fortunately, you can use the UPDLOCK hint on your first SELECT from the Receipt table to lock the rows/pages you have already read while inside the transaction. The other client will have to wait for the lock to be released when the first client performs its COMMIT. The second one will then continue, read only the new rows still to be transmitted, and copy them and only them.
SELECT TOP (@ReceiptsQuantity)
    MarketId, CashCheckoutId, ReceiptId, GlobalReceiptId
INTO #Receipts
FROM dbo.Receipt WITH (UPDLOCK)
WHERE Transmitted = 0
EDIT
Finally, pay attention to the interval at which you call the sync procedure. If the interval is too short, a new transaction may start before the previous one has finished, and in that case you can expect duplicated rows. You could try increasing the interval.
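One hedged way to guard against overlapping runs (not from the original answer; the lock name is arbitrary) is to take an application lock at the top of the procedure and skip the run if the previous one still holds it:
DECLARE @rc INT;
EXEC @rc = sp_getapplock @Resource = 'ReceiptSync',  -- arbitrary lock name
                         @LockMode = 'Exclusive',
                         @LockOwner = 'Session',
                         @LockTimeout = 0;            -- fail fast instead of waiting
IF @rc < 0
    RETURN;  -- previous sync is still running; skip this run

-- ... sync body from the procedure above ...

EXEC sp_releaseapplock @Resource = 'ReceiptSync', @LockOwner = 'Session';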

SELECT COUNT(SomeId) with INSERT later to same SomeId: Appropriate locking strategy?

I am using SQL Server 2012. I have a repeatable read transaction where I perform this query:
select count(SomeId)
from dbo.MyTable
where SomeId = @SomeId
SomeId is a column whose value may repeat in the table (think foreign key). However, SomeId is not a member of any index nor is it a foreign key.
Later in the transaction, I insert a record into dbo.MyTable with the same @SomeId, thus changing what the select count(*) would return were I to run it again:
insert into dbo.MyTable (SomeId, ...)
values (@SomeId, ...)
Several threads in my application can execute this transaction at the same time. Because of this, I'm getting deadlocks on the insert statement. At first, I thought an updlock would be appropriate on the select statement, but I quickly realized that it wouldn't work because I'm not actually updating the rows selected by the select count(SomeId).
My question is this: is there any way to avoid a potentially expensive table lock? Is there any way to lock just rows that involve SomeId, even if they haven't been inserted yet (strange, I know)? I want to force other threads to wait while the original transaction completes its work but I don't want to lock rows unnecessarily.
EDIT
Here's what I'm trying to accomplish:
I only want to insert up to eight rows for a particular SomeId. There are several unrelated processes that can start one of these transactions, potentially at the same time. The select count detects whether there are already eight rows and causes the operation to fail for that process. If the count is less than eight, that same transaction performs additional work, then inserts a record at the end, thus effectively incrementing the count were the select count to be run again. I am hitting the deadlock on the insert statement.
If you have several processes that try to do the same thing and you don't want to have more records than some number, you will need to actually prevent those processes from running at the same time.
One way would be to read the counts with an exclusive lock:
select count(SomeId)
from dbo.MyTable with (xlock)
where SomeId = @SomeId
This way those records will be blocked until the transaction completes.
You should create an index on the SomeId column, though, as the locks will then most likely be held at the index level.
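A sketch of that index (the name is a placeholder):
CREATE NONCLUSTERED INDEX IX_MyTable_SomeId  -- hypothetical name
ON dbo.MyTable (SomeId);
With this in place, the XLOCK read touches only the rows matching @SomeId rather than scanning, and locking, the whole table.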

What type of Transaction IsolationLevel should be used to ignore inserts but lock the selected row?

I have a process that starts a transaction, inserts a record into Table1, and then calls a long running web service (up to 30 seconds). If the web service call fails then the insert is rolled back (which is what we want). Here is an example of the insert (it is actually multiple inserts into multiple tables but I am simplifying for this question):
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (#UserId, 1)
I have a second process that queries Table1 from the first step like this:
SELECT TOP 1 * FROM Table1 WHERE StatusTypeId=2
and then updates that row for a user. When process 1 is running, Table1 is locked, so process 2 will not complete until process 1 finishes, which is a problem because a long delay is introduced while process 1 finishes its web service call.
Process 1 will only ever insert a StatusTypeId of 1, and it is also the only operation that inserts into Table1. Process 2 will only query on StatusTypeId = 2. I want to tell Process 2 to ignore any inserts into Table1 but lock the row that it selects. The default isolation level makes Process 2 wait on too much, but I fear that IsolationLevel.ReadUncommitted allows reading too much dirty data. I do not want two users running Process 2 to accidentally get the same row.
Is there a different IsolationLevel to use other than ReadUncommitted that says ignore inserted rows but make sure the select locks the row that is selected?
Regarding the SELECT being blocked by the INSERT: this should be avoidable by providing appropriate indexes.
Test table:
CREATE TABLE Table1
(
    UserId INT PRIMARY KEY,
    StatusTypeId INT,
    AnotherColumn VARCHAR(50)
)

INSERT INTO Table1
SELECT number, (LEN(type) % 2) + 1, NEWID()
FROM master.dbo.spt_values
WHERE type = 'p'
Query window one
BEGIN TRAN
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (5000, 1)
WAITFOR DELAY '00:01';
ROLLBACK
Query window two (Blocks)
SELECT TOP 1 *
FROM Table1
WHERE StatusTypeId=2
ORDER BY AnotherColumn
But if you retry the test after adding the following index, it won't block:
CREATE NONCLUSTERED INDEX ix ON Table1 (StatusTypeId, AnotherColumn)
Regarding your locking of rows for Process 2, you can use the following (the READPAST hint allows two concurrent Process 2 transactions to begin processing different rows rather than one blocking the other). You might find this article by Remus Rusanu relevant.
BEGIN TRAN
SELECT TOP 1 *
FROM Table1 WITH (UPDLOCK, READPAST)
WHERE StatusTypeId=2
ORDER BY AnotherColumn
/*
Rest of Process Two's code here
*/
COMMIT
Edit: Having re-read the question, the lock from an insert should not affect any select under READ COMMITTED; this could be an issue with your indexes.
However, from your comments and the rest of the question, it seems you want only one transaction at a time to be able to read a row, which is not something an isolation level provides.
They prevent:
Dirty reads - reading uncommitted data in a transaction that could be rolled back - occurs in READ UNCOMMITTED; prevented in READ COMMITTED, REPEATABLE READ, SERIALIZABLE.
Non-repeatable reads - a row is updated while being read in an uncommitted transaction, meaning the same read of a particular row can occur twice in the transaction and produce different results - occurs in READ UNCOMMITTED, READ COMMITTED; prevented in REPEATABLE READ, SERIALIZABLE.
Phantom rows - a row is inserted or deleted while being read in an uncommitted transaction, meaning the same read of multiple rows can occur twice in the transaction and produce different results, with rows added or missing - occurs in READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ; prevented in SERIALIZABLE.
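For reference, the level is chosen per session in T-SQL; a minimal sketch:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;  -- allows all three phenomena
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- the default
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- still allows phantoms
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;      -- prevents all three
None of these keep two transactions from reading the same row, which is why the UPDLOCK/READPAST pattern above is needed instead.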