How to Select UNCOMMITTED rows only in SQL Server?

I am working on a DW project where I need to query a live CRM system. The standard isolation level negatively influences performance, so I am tempted to use NOLOCK / TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. I would like to know how many of the selected rows were obtained through dirty reads.

Maybe you can do this:
SELECT * FROM T WITH (SNAPSHOT)
EXCEPT
SELECT * FROM T WITH (READCOMMITTED, READPAST)
But this is inherently racy.

Why do you need to know that?
You use TRANSACTION ISOLATION LEVEL READ UNCOMMITTED just to indicate that the SELECT statement won't wait until any update/insert/delete transactions are finished on the table/page/rows, and will grab even dirty records. You do it to increase performance. Trying to find out which records were dirty is like punching a blender with your face: it hurts and gives you nothing but pain. They were dirty at some point, and now they aren't. Or are they still dirty? Who knows...
Update:
Now about data quality.
Imagine you read a dirty record with a query like:
SELECT *
FROM dbo.MyTable
WITH (NOLOCK)
and, for example, got a record with id = 1 and name = 'someValue'. Then you want to update the name and set it to 'anotherValue', so you run the following query:
UPDATE dbo.MyTable
SET
Name = 'anotherValue'
WHERE id = 1
So if the record still exists, it gets the new value; if it was deleted (even just dirty-deleted, i.e. deleted but not committed yet), nothing terrible happens - the query simply won't affect any rows. Is that a problem? Of course not, because in the time between your read and your update things could have changed a zillion times anyway. Just check @@ROWCOUNT to make sure the query did what it had to do, and warn the user about the result.
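A minimal sketch of that check, reusing the table and values from the example above (the error message text is just illustrative):
UPDATE dbo.MyTable
SET Name = 'anotherValue'
WHERE id = 1;

-- @@ROWCOUNT reports how many rows the statement above affected.
IF @@ROWCOUNT = 0
BEGIN
    -- The row was gone (or its delete was pending) by the time the update ran.
    RAISERROR('Record with id = 1 no longer exists; nothing was updated.', 16, 1);
END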
Anyway, it depends on the situation and on how important the data is. If the data MUST be current, don't use dirty reads.

The standard isolation level negatively influences performance
So why don't you address that? You know dirty reads are inconsistent reads, so you shouldn't use them. The obvious answer is to use snapshot isolation. Read Implementing Snapshot or Read Committed Snapshot Isolation in SQL Server: A Guide.
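For reference, read committed snapshot is a database-level switch; a minimal sketch, assuming a database named YourDatabase (substitute your real database name):
-- Plain SELECTs will then read the last committed version of each row
-- instead of blocking behind writers.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;
-- Note: this statement waits until it is the only connection to the database
-- unless you add a termination clause such as WITH ROLLBACK IMMEDIATE.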
But the problem actually goes deeper. Why do you encounter blocking in the first place? Why are reads blocked by writes? A DW workload should not be let loose on the operational transactional data; that is what ETL and OLAP products are for. Consider cubes, columnstores, PowerPivot - all the goodness that allows for incredibly fast DW and analysis. Don't burden the business's operational database with your analytical end-to-end scans, or you'll have nothing but problems.

Related

Using NOLOCK for reading a single static row. What's the harm?

Can anyone with NOLOCK experience enlighten me?
I read that it can cause log file corruption - is that possible? I think MS would never allow that. Also, if "some situations", like mine, are okay with NOLOCK, why not use it?
I have no datasets or return tables (like other posts on Stack Overflow). I have one SQL statement selecting by ID, which returns only one row, like:
sqlstr = "SELECT Parameter1 FROM Companies WITH (NOLOCK) WHERE ID = 25
Also, this parameter does not change. But as this is a heavy-load ASP.NET application (not a web site) and I run this kind of query again and again, every SQL read takes a lock in SQL Server. If possible I'd prefer to avoid that.
Every post on this site is about multiple records, recordsets, and dirty reads. I could not find anything about "reading a single record which is not changing all the time".
Any expert's opinion, please?
This simple select statement, when executed without any lock/nolock hints under the default transaction isolation level, obtains a shared lock on the row. That means other users can also read this row while it is being read by this query.
On the other hand, when you specify the WITH (NOLOCK) query hint, it does not obtain any locks at all. In this case other users can still read the row, but you might be reading a dirty row (data that has been modified by another transaction but not yet committed).
So in either case this simple select will not cause a deadlock. The question you should really be asking yourself is: should users be able to see dirty data or not? In most cases the answer is no.
Therefore do not worry about getting deadlocks with this select query, as long as you are using the default transaction isolation level. Under a stricter isolation level like SERIALIZABLE a select can lock out other users, but at the default isolation level you should be fine.
NOLOCK has two main disadvantages: it can return uncommitted data (you don't seem worried about that) and it can cause queries to spuriously fail under very rare circumstances. NOLOCK will never cause physical database corruption.
Consider using snapshot isolation for transactions that only read data. Readers under SI neither take locks nor block; it takes them out of the picture entirely and provides perfect consistency for read-only transactions. Be sure to read up on the drawbacks as well.
It isn't worth it.
NOLOCK is often exploited as a magic way to speed up database reads, but I try to avoid using it wherever possible.
The result set can contain rows that have not yet been committed and that may later be rolled back.
The query can fail with an error, or the result set can be empty, be missing rows, or display the same row multiple times.
This is because other transactions are moving data at the same time you're reading it.
READ UNCOMMITTED adds a further issue where data can come back corrupted within a single column, for example when multiple users change the same cell simultaneously.
There are other side effects too, which can end up sacrificing the very speed increase you were hoping to gain in the first place.
Now you know, never use it again.
After deep searching and asking many experts, I found out that using the NOLOCK hint causes no problem in this scenario, yet it is not advised. There is nothing wrong with NOLOCK, but as I use SQL Server 2014 I "should" use the ISOLATION LEVEL option instead; it is the approach that has superseded NOLOCK. For example, for huge table selects that cause deadlocks:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM HugeTable;
COMMIT TRANSACTION;
is very handy.
I had a HugeTable and a web form that uses an SqlAdapter and a RadGrid to show this data. Whenever I ran this report, even though the indexes and the RadGrid's paging were fine, it caused deadlocks, which makes sense. I changed the SqlAdapter's select statement to the above, and it is perfect now.
best.

Lock issues on large recordset

I have a database table that I use as a queue system, where separate processes that talk to each other create and read entries in the table. For example, when a user initiates a search an entry is created, then another process that runs every second or two will pick up that new entry, update the status, run the search, and update the entry again when the search is complete. This all seems to work well with thousands of searches per hour.
However, I have a master admin screen that lets me view the status of all of these 'jobs', and it runs very slowly. I basically return all entries in the table for the last hour so I can keep an eye on what's going on. I think I am running into lock issues of some sort. I only need to read each entry and don't really care if the data is a little bit out of date. I just use a standard 'SELECT * FROM Table' statement, so maybe it is waiting for other locks to expire before returning data, as the jobs are constantly updating the table.
Would this be handled better by a certain kind of cursor to return each row one at a time, etc? Any other ideas?
Thanks
If you really don't care if the data is a bit out of date... or if you only need the data to be 99.99% accurate, consider using WITH (NOLOCK):
SELECT * FROM Table WITH (NOLOCK);
This will instruct your query to use the READ UNCOMMITTED ISOLATION LEVEL, which has the following behavior:
Specifies that dirty reads are allowed. No shared locks are issued to prevent other transactions from modifying data read by the current transaction, and exclusive locks set by other transactions do not block the current transaction from reading the locked data.
Be aware that NOLOCK may cause some inaccuracies in your data, so it probably isn't a good idea to use it throughout the rest of your system.
You need the FROM yourtable WITH (NOLOCK) table hint.
You may also want to look at the transaction isolation level used by your update process, if you aren't doing so already.
An alternative to NOLOCK (which can lead to very bad things, such as missed rows or duplicated rows) is to enable snapshot isolation at the database level and then issue your query with:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
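For completeness, the statement above only works once snapshot isolation has been allowed at the database level; a minimal sketch, where YourDatabase, JobQueue and CreatedAt are placeholder names for this question's queue table:
-- One-time, database-level setting.
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, in the session that runs the admin screen's query:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM dbo.JobQueue WHERE CreatedAt >= DATEADD(HOUR, -1, GETDATE());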

Are there any performance issues when inserting into a large SQL Server table which is being queried?

I use SQL Server. I have a large table - millions of rows - and I iterate through them (SELECT .. WHERE ..). This is a long operation (and I assume it can't be made shorter).
So what I am asking is whether there will be any problems inserting data into that table while the select is in progress. If yes, what should I do to reduce them? The same question applies to the UPDATE command (with indexed parameters, of course).
Yes, you will have performance issues, and more specifically, locking and blocking issues. If your SELECT statements are using indexes, which they should be, those indexes will be locked every time you INSERT data into the table. Since the table is relatively large, the locks will probably be held long enough to block your SELECT statements, and deadlocks are likely as well.
This might be a scenario where you need to re-evaluate your table structure, and possibly even consider denormalizing to avoid this.
You might also consider Enabling Row Versioning-Based Isolation Levels, assuming that you can thoroughly test the rest of your system to understand the impact.
The answer is yes, absolutely. A simple solution (if it's an acceptable trade-off within your application) is to specify the NOLOCK locking hint, i.e.:
SELECT * FROM YourTable WITH (NOLOCK)
The trade-off is that you won't get a consistent read, but in many cases this isn't a problem.
It's generally not a good idea to have long-running queries on a database with frequent updates. This decreases performance significantly because of locking.
It might be a good idea to look into data warehousing and see if that is something you could use. That would let you keep the transactions in one database and bulk load from it into another database that holds your warehouse.
This would greatly improve performance for both inserts and queries. The transactional database could have no extra indexes, and the warehouse could have all the indexes you want.
You could also put the warehouse in a column-store database. That would give you the best query times with minimal effort, because there is no need to create indexes in a column store; all you have to do is design the schema properly. The drawback with column stores, however, is that inserts, updates and deletes are very slow compared to row-oriented databases. But bulk loading from the transactional database should do the trick. If you need the data to be very up to date, you could bulk load every few minutes. If you just need data from the previous day, you could bulk load into the warehouse each night.
The possibilities are endless. If you want to look into column-store warehouses you could try MonetDB. It's an open-source column store, so you can try it out and see if it suits you.
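If you would rather stay inside SQL Server, a columnstore index on the warehouse copy gives a similar effect; a minimal sketch, assuming SQL Server 2014 or later and made-up table and column names:
-- A reporting copy of the data, bulk loaded from the transactional table.
CREATE TABLE dbo.MyTable_Report (
    Id INT NOT NULL,
    CreatedAt DATETIME2 NOT NULL,
    Amount DECIMAL(18, 2) NOT NULL
);

-- A clustered columnstore index stores the whole table column-wise,
-- which is what makes analytical scans fast without extra indexes.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_MyTable_Report ON dbo.MyTable_Report;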
Do not assume the execution time can't be shorter. If you query a date range, an index on the date column is a must!
Solve your problem by indexing the date field:
-- please use correct names for your_table and date_field --
CREATE INDEX index_name ON your_table (date_field);
Warehousing, as per @Gisli's answer, is a good option: build a copy of the data elsewhere and run your long-running queries there, freeing up the "main" database for OLTP processing.
If this is not an option, you can mess around with snapshot isolation (something I know about, but have never worked with personally). Essentially, this takes a "snapshot" of the database at the point in time you start the query and executes the query as if no subsequent changes were made to the database, even if changes are made while the query is running. More importantly, any such changes are "real" and permanent. Think of it like a short-term branch of your database.
The duration of the branch (snapshot) is where my knowledge gets weak. I believe you can have the snapshot last for the duration of the query, which means you would (possibly) never get the same results for a given query twice if the data changes while you are running it; or you can create a "saved" snapshot that can be reused over and over until you get around to deleting it. Be wary with this: you don't want your system to get cluttered up with old, forgotten branches of past data!
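The "saved" variant described above sounds more like a database snapshot than snapshot isolation; a minimal sketch, where every name and the file path are made up (NAME must be the logical data-file name of the source database):
-- Create a point-in-time, read-only copy of MyDb that persists until dropped.
CREATE DATABASE MyDb_Snapshot
ON (NAME = MyDb_Data, FILENAME = 'C:\Snapshots\MyDb_Snapshot.ss')
AS SNAPSHOT OF MyDb;

-- Long-running reports can query the snapshot instead of the live database.
SELECT COUNT(*) FROM MyDb_Snapshot.dbo.MyTable;

-- Drop it when it is no longer needed, so old "branches" do not pile up.
DROP DATABASE MyDb_Snapshot;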
There is no PROBLEM. SQL Server is built to deal with this kind of situation; you just need to set the correct isolation level on the transactions.
There are several possible scenarios. For example, if you don't mind reading the data that is currently being inserted, set your read transaction's isolation level to READ UNCOMMITTED. If you are inserting values in one range and reading values in another range, you can use SERIALIZABLE.
Take a look at the possible isolation levels:
http://msdn.microsoft.com/en-us/library/ms173763.aspx
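As a sketch of the first scenario (a reader that tolerates dirty reads so it never blocks behind the inserts; table and column names are placeholders):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT Id, SomeColumn
FROM dbo.LargeTable
WHERE SomeColumn = 'some value';

-- Switch back to the default afterwards if the session is reused.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;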

What is the purpose of ROWLOCK on Delete and when should I use it?

Ex)
When should I use this statement:
DELETE TOP (@count)
FROM ProductInfo WITH (ROWLOCK)
WHERE ProductId = @productId_for_del;
And when should I just do:
DELETE TOP (@count)
FROM ProductInfo
WHERE ProductId = @productId_for_del;
The WITH (ROWLOCK) hint instructs the database to keep locks at row scope, which means it will avoid escalating them to page or table scope.
You use the hint when only a single row or only a few rows will be affected by the query, to keep the lock from covering rows that will not be deleted by it. That lets another query read unrelated rows at the same time instead of having to wait for the delete to complete.
If you use it on a query that deletes a lot of rows, it may degrade performance, because the database will try to avoid escalating the locks to a larger scope even when that would have been more efficient.
Normally you shouldn't need to add such hints to a query, because the database knows what kind of lock to use. It's only when you get performance problems because the database made the wrong decision that you should add such hints.
ROWLOCK is a query hint and should be used with caution (as should all query hints).
Omitting it will likely still result in exactly the same behaviour, and providing it does not guarantee that only row locks will be used; it is only a hint, after all. Unless you have very in-depth knowledge of lock contention, chances are the optimizer will pick the best possible locking strategy anyway, and these things are usually best left to the database engine to decide.
ROWLOCK means that SQL Server will lock only the affected rows, and not the entire table or the page where the data is stored, when performing the delete. This only matters for other people accessing the table at the same time your delete is running.
If a table lock is used, every query against the table will have to wait until your delete has completed; with a row lock, only selects reading the specific rows being deleted will be made to wait.
Deleting TOP N rows, for a sufficiently large N, will most likely end up locking the table in any case.
SQL Server defaults to page locks. This is the most efficient way for SQL Server to process multiple data sets. But SQL Server is not always multi-user friendly, so you may need to adjust the locking behaviour to keep data flowing in and out of the database. That is why people approach this problem with locking hints.
If everyone designed their database tables so that every row processed neatly within page width, the system would be very fast. But no one spends that detailed amount of time.
So you might see people use WITH (NOLOCK) on their SELECT statements and WITH (ROWLOCK) on their UPDATE and DELETE statements. An INSERT does not matter, because it will lock the page automatically. Sometimes, by using WITH (ROWLOCK), you can get better multi-user (multiple connection) performance.
The problem with WITH (NOLOCK) is that you can get back the committed record already sitting in the DB plus the dirty record that is about to update it - a double return of records to your SELECT statement. If you know the personality of your system and how the data runs through it, though, you can use WITH (NOLOCK) to your advantage quite a bit.
When do you know to use WITH (ROWLOCK)? When your system isn't letting users play nicely with each other in the same table or record. Still, rewrite and tune the query first, and adjust your locking only as a last resort.
But as a DBA, always blame the developer's code. It is your solemnly sworn duty to do so. If you are the developer writing this code, just blame yourself.

Should I break down large SQL queries (MS)

This is in regards to MS SQL Server 2005.
I have an SSIS package that validates data between two different data sources. If it finds differences it builds and executes a SQL update script to fix the problem. The SQL Update script runs at the end of the package after all differences are found.
I'm wondering if it is necessary, or a good idea, to somehow break the SQL update script down into multiple transactions, and what the best way to do this would be.
The update script looks similar to this, but longer (example):
UPDATE MyPartTable
SET MyPartGroup = (SELECT PartGroupID FROM MyPartGroupTable
                   WHERE PartGroup = 'Widgets'),
    PartAttr1 = 'ABC', PartAttr2 = 'DEF', PartAttr3 = '123'
WHERE PartNumber = 'ABC123';
For every error/difference found, an additional UPDATE query is added to the update script.
I only expect about 300 updates on a daily basis, but sometimes there could be 50,000. Should I break the script down into a separate transaction every, say, 500 update queries?
Don't optimize anything before you know there is a problem. If it is running fast, let it go; if it is running slow, make some changes.
No, I think the statement is fine as it is. It won't make much of a difference in speed at all.
Billy makes a valid point if you do care about the readability of the query (and you should, if it is a query that will be seen or used in the future).
Would your system handle other processes reading the data that has yet to be updated? If so, you might want to perform multiple transactions.
The benefit of performing multiple transactions is that you will not continually accumulate locks. If you perform all of these updates at once, SQL Server will eventually exhaust its budget for fine-grained locks (row/key) and escalate to a table lock. When it does this, nobody else will be able to read from these tables until the transaction completes (unless they use dirty reads or snapshot isolation).
The side effect is that other processes that read the data may get inconsistent results.
So if nobody else needs to use this data while you are updating it, then sure, do all the updates in one transaction. If there are other processes that need to use the table, then yes, do it in chunks.
It shouldn't be a problem to split things up. However, if you want to (a) maintain consistency between the items and/or (b) perform slightly better, you might want to use a single transaction for the whole thing.
BEGIN TRANSACTION;
-- Write 500 things
-- Write 500 things
-- Write 500 things
COMMIT TRANSACTION;
Transactions exist for just this reason -- where program logic would be clearer by splitting up queries but where data consistency between multiple actions is desired.
All records affected by the query will either be locked, or copied into tempdb if the transaction operates under the SNAPSHOT isolation level.
If the number of records is high enough, the locks may be escalated.
If the transaction isolation level is not SNAPSHOT, then a concurrent query will not be able to read the locked records, which may be a concurrency problem for your application.
If the transaction isolation level is SNAPSHOT, then tempdb needs enough space to accommodate the old versions of the records, or the query will fail.
If either of these is a problem for you, then you should split the update into several chunks.
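A minimal sketch of chunking, assuming the differences found by the package are staged in a hypothetical dbo.DiffQueue table with an Applied flag (all names and the batch size of 500 are made up):
DECLARE @batchSize INT = 500;
DECLARE @batch TABLE (PartNumber VARCHAR(50) PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Grab the next batch of differences that have not been applied yet.
    INSERT INTO @batch (PartNumber)
    SELECT TOP (@batchSize) PartNumber
    FROM dbo.DiffQueue
    WHERE Applied = 0;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to apply

    BEGIN TRANSACTION;

    -- Apply this batch of corrections to the target table.
    UPDATE p
    SET p.PartAttr1 = d.PartAttr1,
        p.PartAttr2 = d.PartAttr2,
        p.PartAttr3 = d.PartAttr3
    FROM MyPartTable AS p
    JOIN dbo.DiffQueue AS d ON d.PartNumber = p.PartNumber
    JOIN @batch AS b ON b.PartNumber = d.PartNumber;

    -- Mark the batch as done so the loop makes progress.
    UPDATE d
    SET d.Applied = 1
    FROM dbo.DiffQueue AS d
    JOIN @batch AS b ON b.PartNumber = d.PartNumber;

    COMMIT TRANSACTION;
END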