SQL Server row versioning overhead on INSERT/SELECT workflow

SQL Server 2016
The workflow consists of continuous inserts from one writer and occasional selects from a separate reader, each returning several rows (all against the same table). Insert latency is prioritized over select performance. There are no updates/deletes, and the selects will never need to return rows that have recently been inserted.
Both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set to ON.
The issue is that whenever a select query is sent via SqlCommand.ExecuteReader, there is a significant spike in insert latency until SqlCommand.ExecuteReader returns with a SqlDataReader. Since insert latency is important, this degradation needs to be minimized. Select is under read committed isolation level.
Using NOLOCK table hint in the select query does not show this same spike in insert latency & given the use case of the table, dirty reads aren't a concern since they can't happen.
Using READPAST table hint gives similar results to no hint (read committed snapshot).
I haven't found anything online that explains this discrepancy. What overhead is there with read committed snapshot (current state) that impacts insert latency that is not seen when NOLOCK is used?
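One way to confirm (or rule out) row versioning as the source of the latency spike is to watch version-store activity while the workload runs. A rough diagnostic sketch, assuming a database named MyDatabase (all names here are placeholders; counter names can vary slightly by version):

```sql
-- Count version records generated for this database (rough indicator only):
SELECT COUNT(*) AS version_records
FROM sys.dm_tran_version_store
WHERE database_id = DB_ID('MyDatabase');

-- Version-store perf counters (instance-wide):
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Transactions%'
  AND counter_name IN ('Version Generation rate (KB/s)',
                       'Version Cleanup rate (KB/s)',
                       'Version Store Size (KB)');
```

If version generation spikes only while the reader's statement is open, that points at versioning overhead rather than lock contention.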

Related

How to reduce downtime of table during inserts SQL Server

I have an operative table, call it Ops. The table gets queried by our customers via a web service every other second.
There are two processes that affect the table:
Deleting expired records (daily)
Inserting new records (weekly)
My goal is to reduce downtime to a minimum during these processes. I know Oracle, but this is the first time I'm using SQL Server and T-SQL. In Oracle, I would do a truncate to speed up the first process of deleting expired records and a partition exchange to insert new records.
Partition Exchanges for SQL Server seem a bit harder to handle, because from what I can read, one has to create file groups, partition schemes and partition functions (?).
What are your recommendations for reducing downtime?
A table is not offline because someone is deleting or inserting rows. The table can be read and updated concurrently.
However, under the default isolation level READ COMMITTED, readers are blocked by writers and writers are blocked by readers. This means that a SELECT statement can take longer to complete because a not-yet-committed transaction is locking some rows the SELECT statement is trying to read. The SELECT statement is blocked until that transaction completes. This can be a problem if the transaction takes a long time, since it appears as if the table were offline.
On the other hand, under READ COMMITTED SNAPSHOT and SNAPSHOT isolation levels readers don't block writers and writers don't block readers. This means that a SELECT statement can run concurrently with INSERT, UPDATE and DELETE statements without waiting to acquire locks, because under these isolation levels SELECT statements don't request locks.
The simplest thing you can do is to enable READ COMMITTED SNAPSHOT isolation level on the database. When this isolation level is enabled it becomes the default isolation level, so you don't need to change the code of your application.
ALTER DATABASE MyDataBase SET READ_COMMITTED_SNAPSHOT ON
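A practical note on running this statement: enabling the option requires momentary exclusive access to the database, so on a live system it is common to add WITH ROLLBACK IMMEDIATE. A sketch (MyDataBase is a placeholder name):

```sql
-- WITH ROLLBACK IMMEDIATE rolls back other active sessions
-- instead of waiting indefinitely for them to finish.
ALTER DATABASE MyDataBase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Verify the setting took effect:
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'MyDataBase';
```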
If your problem is selects getting blocked, you can try the NOLOCK hint, but be sure to read up on its implications first. See https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/ for details.

SQL 2008+ NOLOCK vs READPAST Considerations for Reporting Accuracy

Understanding the final decision is business decision, what are the accuracy considerations between NOLOCK & READPAST running in SQL 2008 R2? I would like to have a better understanding before discussing changes with the business area.
I have inherited a number of queries, used to create data views for management reporting. WITH (NOLOCK) is used liberally but inconsistently. The data being read is from the production server of a widely used application that is constantly being updated. We are migrating from a SQL 2005 server to a SQL 2008 R2 server. These reports want data fresher than the 24-hour-old data on the archive server. The use of NOLOCK suggests a past decision that the potential for conflict exists and a bit of accuracy loss is acceptable. Data is used to populate dashboards for human awareness/decision making.
All the queries are SELECTs, with read-only access for the data-view login. The majority of the queries are single table, with a few 2- and 3-table joins. Given the low level of joins, WITH () table hints seem a better choice than SET TRANSACTION ISOLATION LEVEL {}.
Table Hints (Transact-SQL) http://msdn.microsoft.com/en-us/library/ms187373.aspx (as well as multiple questions on SO) says that NOLOCK and/or READUNCOMMITTED are likely to have duplicate read issues, in addition to missing locked records.
READPAST looks like the more accurate, as it will only miss locked records without a chance of duplicates. But I am not sure the level of missing locked records is consistent between it and NOLOCK.
There is a good article by Tim Chapman comparing the two, but it was written in 2007; most of the comments revolve around 2000 & 2005, with one comment indicating READPAST is problematic in 2008 R2.
References
Effect of NOLOCK hint in SELECT statements
When should you use "with (nolock)"
Using NOLOCK and READPAST table hints in SQL Server (By Tim Chapman)
Edit:
Snapshot isolation is suggested in two answers below. Snapshot isolation is a database-level setting; this Q/A https://serverfault.com/questions/117104/how-can-i-tell-if-snapshot-isolation-is-turned-on describes how to see which settings are in place on a database. I now know it is disabled. I am reading from a major application's database for reports, and changing the setting is not an option. Plus or minus a couple of percent accuracy is acceptable; application (OLTP) impact is not. Most simple queries do not need lock considerations, but in some extreme cases lock consideration is required. With the advent of snapshot isolation in SQL 2005, little information is available on NOLOCK & READPAST behavior in SQL 2008 or higher. Yet they remain my only choices.
A better option worth considering is enabling READ COMMITTED SNAPSHOT for the database itself. This uses row versioning in tempdb so that each statement sees the committed state of the data as of the moment the statement began.
There is a very good read on various aspects of NOLOCK, READPAST etc, at http://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
WITH (NOLOCK) can provide incorrect results if someone is updating the table when you are selecting from it. If a page-split happens as a result of an insert while you are reading the table, and the new page happens to be beyond the point you've read, WITH (NOLOCK) will have already returned rows from the old page, and will then return duplicate rows from the new page. This is just a single example of why (NOLOCK) is bad.
WITH (READPAST) will skip any records that are being updated or inserted while you are reading from the table. Neither option is good in a busy database.
In light of the recent edit to your question, where you state you cannot change the database setting for READ COMMITTED SNAPSHOT, perhaps you should consider using a stored procedure to gather data for your reports, and setting the transaction isolation level at the beginning of the stored proc using SET TRANSACTION ISOLATION LEVEL SNAPSHOT;. In order to do this, you would need to set the database option ALLOW_SNAPSHOT_ISOLATION to ON.
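A minimal sketch of that approach, with illustrative object names (the procedure and table names are placeholders, not from the original reports):

```sql
-- Prerequisite (one-time, database-level):
-- ALTER DATABASE MyDataBase SET ALLOW_SNAPSHOT_ISOLATION ON;

CREATE PROCEDURE dbo.GetReportData
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    BEGIN TRANSACTION;
        -- All reads inside this transaction see one consistent,
        -- committed snapshot, without taking shared locks.
        SELECT Id, SomeColumn
        FROM dbo.SomeReportTable;
    COMMIT TRANSACTION;
END;
```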
From SQL Server Books Online:
SNAPSHOT
Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction. The transaction can only recognize data modifications that were committed before the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.
Except when a database is being recovered, SNAPSHOT transactions do not request locks when reading data. SNAPSHOT transactions reading data do not block other transactions from writing data. Transactions writing data do not block SNAPSHOT transactions from reading data.
During the roll-back phase of a database recovery, SNAPSHOT transactions will request a lock if an attempt is made to read data that is locked by another transaction that is being rolled back. The SNAPSHOT transaction is blocked until that transaction has been rolled back. The lock is released immediately after it has been granted.
The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start a transaction that uses the SNAPSHOT isolation level. If a transaction using the SNAPSHOT isolation level accesses data in multiple databases, ALLOW_SNAPSHOT_ISOLATION must be set to ON in each database.
A transaction cannot be set to SNAPSHOT isolation level that started with another isolation level; doing so will cause the transaction to abort. If a transaction starts in the SNAPSHOT isolation level, you can change it to another isolation level and then back to SNAPSHOT. A transaction starts the first time it accesses data.
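In other words (a sketch with placeholder names), the snapshot is not established until the transaction first touches data, so the isolation level must be set before that point:

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;  -- must come before the first data access
BEGIN TRANSACTION;
    SELECT TOP (1) * FROM dbo.SomeTable;   -- the transaction "starts" here
    -- Every statement from here on sees the data as of this first access,
    -- regardless of what other sessions commit in the meantime.
COMMIT TRANSACTION;
```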
A transaction running under SNAPSHOT isolation level can view changes made by that transaction. For example, if the transaction performs an UPDATE on a table and then issues a SELECT statement against the same table, the modified data will be included in the result set.
NOLOCK can cause duplicate data to be read, data to be missed, and the query to actually fail with an error ("could not continue scan with NOLOCK due to data movement").
On the other hand, a non-NOLOCK query can also read duplicate data and miss data! It is by no means a consistent snapshot of the database. The difference is that it will not read uncommitted data and will never fail.
The problem with NOLOCK is mostly that it can fail randomly, so you need retry. Also, the probability of wrong data being read is slightly higher.
NOLOCK has a big advantage when you're doing table scans: SQL Server can use allocation order scanning instead of index-order scans. TABLOCK has the same effect. This can be a significant speedup in the presence of fragmentation.
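For example (the table name is a placeholder), a scan-heavy query can opt into the same allocation-order behavior without reading uncommitted data by using TABLOCK instead of NOLOCK:

```sql
-- TABLOCK takes a shared table lock, which also permits an
-- allocation-order scan, but it blocks concurrent writers while it runs.
SELECT COUNT(*) FROM dbo.BigTable WITH (TABLOCK);
```

The trade-off: NOLOCK avoids blocking writers but risks inconsistency; TABLOCK keeps reads consistent but blocks writers for the duration of the scan.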
Consider just using the snapshot isolation level as it gets rid of all of these concerns. It comes with some other trade-offs and you don't get allocation-order scans. But it permanently and comprehensively removes locking problems.
Answering my own question after stress testing with SQLQueryStress http://www.datamanipulation.net/sqlquerystress/ (a wonderful tool that is extremely easy to use). Results from SQLQueryStress were checked against SQL Server Profiler; the accuracy is the same as SQL Server Profiler, though the precision is two decimal places of a second less (which is sufficient for this test).
As mentioned in the question, the primary concern is application performance impact, with report accuracy and performance the secondary consideration. All testing occurred on the test server, where the test application is active and has some minor activity.
After downloading and becoming familiar with SQLQueryStress, I set up a simple 'ReportQuery' to act as a resource hog. It is set to run 15 iterations with 15 threads (225 total queries). Total run time is around 28 seconds, with an average iteration time of 1.49 seconds.
I created an Add/Delete 'ApplicationQuery' to represent ongoing application activity. It is set to run 2000 iterations with 1 thread. There are two versions: with a select statement (runs in 31 seconds) and without a select statement (runs in 28 seconds). These represent normal peak-time application activity.
Ten test runs of each of the three versions of 'ReportQuery' were run, to identify any performance benefit between WITH (NOLOCK), WITH (READPAST), and no hints. Results indicate no significant difference: the ReportQuery consistently runs in about 28 seconds with an average 1.5-second iteration time.
There were no big outliers, so I decided to drop to 5 test runs for the following tests.
5 test runs of ApplicationQuery with a select statement, with one of each of the three versions of 'ReportQuery' also running. In each of the 15 total tests, the ApplicationQuery is manually started, with the ReportQuery manually started immediately after. This scenario represents a resource-heavy report query competing with the application's ongoing activity for resources.
Repeated the test runs but this time used ApplicationQuery without a select statement.
Results: In every case the ApplicationQuery was throttled back to almost no forward progress, while the ReportQuery was running.
The ReportQuery had no significant loss of performance when struggling for resources with multiple ApplicationQuery’s against the database.
The ApplicationQuery was able to run queries parallel to the ReportQuery, but progress was very slow while competing for resources. In essence the total time to run 2000 Application Add/Delete Queries was extended by the time used by the ReportQuery.
The initial question, about which is more accurate, becomes moot: there is essentially no report or application performance difference between using or not using the NOLOCK or READPAST hints. So don't use either in a busy database, and get the highest accuracy possible.
‘ReportQuery’
select
ID
, [TABLE_NAME]
, NUMBER
, FIELD
, OLD_VALUE
, NEW_VALUE
, SYSMODUSER
, SYSMODTIME
, SYSMODCOUNT
from dbo.UPMCINCIDENTMGMTAUDITRECORDSM1
where Number like '%'
or NUMBER like '2010-01-01'
‘ApplicationQuery’ (with Select Statement)
select *
from dbo.UPMCINCIDENTMGMTAUDITRECORDSM1
where FIELD = 'JJTestingPerformance'
insert into dbo.UPMCINCIDENTMGMTAUDITRECORDSM1 (ID
, [TABLE_NAME]
, NUMBER
, FIELD
, OLD_VALUE
, NEW_VALUE
)
values ('Test+Time'
, 'none'
, 'tst01'
, 'JJTestingPerformance'
, 'No Value'
, 'Test'
)
delete from dbo.UPMCINCIDENTMGMTAUDITRECORDSM1
where FIELD = 'JJTestingPerformance'
‘ApplicationQuery’ (without Select Statement)
insert into dbo.UPMCINCIDENTMGMTAUDITRECORDSM1 (ID
, [TABLE_NAME]
, NUMBER
, FIELD
, OLD_VALUE
, NEW_VALUE
)
values ('Test+Time'
, 'none'
, 'tst01'
, 'JJTestingPerformance'
, 'No Value'
, 'Test'
)
delete from dbo.UPMCINCIDENTMGMTAUDITRECORDSM1
where FIELD = 'JJTestingPerformance'

In SQL Server 2005, when does a Select query block Inserts or Updates to the same or other table(s)?

In the past I always thought that a SELECT query would not block INSERT statements. However, I recently wrote a query that takes a long time (more than 2 min) to select data from a table, and during the select, a number of INSERT statements were timing out.
If SELECT blocks INSERT, what would be the best way to prevent the timeouts without causing dirty reads?
I have investigated the option of using snapshot isolation, but currently I have no access to change the client's database to enable ALLOW_SNAPSHOT_ISOLATION.
Thanks
When does a Select query block Inserts or Updates to the same or other table(s)?
When it holds a lock on a resource that is mutually exclusive with one that the insert or update statement needs.
Under the READ COMMITTED isolation level with no additional locking hints, the S locks taken out are typically released as soon as the data is read. Under REPEATABLE READ or SERIALIZABLE, however, they are held until the end of the transaction (or the end of the statement, for a single SELECT not running in an explicit transaction).
SERIALIZABLE will often take out key-range locks, which cause additional blocking over and above that caused by holding locks on the rows and pages actually read.
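As an illustration (the table and key values are placeholders, not from the question): under SERIALIZABLE, a concurrent INSERT into the scanned range will block until the reading transaction ends:

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    -- Takes key-range locks covering Id 100-200; an INSERT of Id = 150
    -- from another session now blocks until this transaction completes.
    SELECT * FROM dbo.SomeTable WHERE Id BETWEEN 100 AND 200;
COMMIT TRANSACTION;
```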
READPAST might be what you're looking for - check out this article.

Strange query performance of SQL Server 2005

I noticed some performance problems with my DB. Such query (just for a example):
SELECT *
FROM ActionHistory
WHERE ObjectId = @id
...executes with randomly varying reads and duration. ObjectId is a foreign key, with an index on it.
With SQL Profiler I found that sometimes the results are 5 reads, 0 duration, but in other cases 5 reads, 200 duration. Such large durations occur sporadically.
I use distributed transactions with WCF. I got these results when I was the only user at the time, so it is unlikely to be locks or anything similar.
What is the reason for such behaviour: low reads, but high query duration?
In general, distributed transactions are extremely expensive. Try disabling distributed transactions in your environment to see if that changes anything.
Since the query is exactly the same each time and the reads are the same, then it's most likely due to locking. Sometimes another query is executing and may have a lock on the records that need to be accessed. Waiting for the lock to be released would cause a slowdown.
Using SQL Profiler to compare start/stop times for queries you can identify overlapping queries that may cause locking.
This is not an indication of a problem, just an explanation of the differences you're seeing.
Enable read committed snapshot in the database:
ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON;
This will miraculously change your reads that occur under the default read-committed isolation into snapshot reads, which are not hindered by locks. See Choosing Row Versioning-based Isolation Levels for details, including the runtime resource usage caused by enabling snapshot reads.

sql queries and inserts

I have a random question. If I were to run a SQL SELECT, and while the server was processing my request someone else executed an INSERT statement... could the data inserted by that statement also be returned by my SELECT?
Queries are queued, so if the SELECT occurs before the INSERT there's no possibility of seeing the newly inserted data.
Using default isolation levels, SELECT is generally given higher privilege over others but still only reads COMMITTED data. So if the INSERT data has not been committed by the time the SELECT occurs--again, you wouldn't see the newly inserted data. If the INSERT has been committed, the subsequent SELECT will include the newly inserted data.
If the isolation level allowed reading UNCOMMITTED (AKA dirty) data, then yes--a SELECT occurring after the INSERT but before the INSERT data was committed would return that data. This is not recommended practice, because UNCOMMITTED data could be subject to a ROLLBACK.
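A two-session sketch of that scenario (the table and values are made up for illustration):

```sql
-- Session 1:
BEGIN TRANSACTION;
INSERT INTO dbo.Orders (Id, Status) VALUES (42, 'new');
-- ...not yet committed...

-- Session 2:
SELECT * FROM dbo.Orders WITH (NOLOCK);  -- dirty read: sees row 42
SELECT * FROM dbo.Orders;                -- READ COMMITTED: blocks until session 1 ends

-- Session 1:
ROLLBACK;  -- row 42 was never committed; the dirty read returned it anyway
```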
If the SELECT statement is executed before the INSERT statement, the selected data will certainly not include the new inserted data.
What happens in MySQL with MyISAM, the default engine, is that all INSERT statements require a table lock; as a result, once an INSERT statement is executed, it first waits for all existing SELECTs to complete before locking the table, performs the INSERT, and then unlocks it.
For more information, see: Internal Locking Methods in the MySQL manual
No, a SELECT that is already executing at the moment of the INSERT will never pick up new records that did not exist when the SELECT statement started executing.
Also if you use the transactional storage engine InnoDB, you can be assured that your SELECT will not include rows that are currently being inserted. That's the purpose of transaction isolation, or the "I" in ACID.
For more details see http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html because there are some nuances about read-committed and read-uncommitted transaction isolation modes.
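For reference, the isolation level in MySQL is set per session or per transaction; a minimal sketch (the table name is a placeholder):

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT * FROM orders;  -- sees only data committed before this statement began
COMMIT;
```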
I don't know particulars for MySQL, but in SQL Server it would depend on if there were any locking hints used, and the default behavior for locks. You have a couple of options:
Your SELECT locks the table, which means the INSERT won't process until your select is finished.
Your SELECT is able to do a "dirty read" which means the transaction doesn't care if you get slightly out-of-date data, and you miss the INSERT
Your SELECT is able to do a "dirty read" but the INSERT happens before the SELECT hits that row, and you get the result that was added.
The only way you do that is with a "dirty read".
Take a look at MYSql's documentation on TRANSACTION ISOLATION LEVELS to get a better understanding of what that is.