SQL Server locking on SELECT

I have built a web service that queries new data using an iterator (bigint) from a big table (100+ million rows) in an accounting system (SQL Server 2008 R2 Standard Edition).
The provider of the database has forced us to read uncommitted transactions to ensure that we do not lock up the table for inserts (updates are never made).
Lately this has caused us trouble due to rollbacks. The accounting system has rolled back rows that were already read by the web service (due to errors and timeouts), causing my system to store data that should never have existed.
I think reading committed data would solve this, but the accounting system provider will not let us, since they are worried that it will block inserts into the table.
Can the select actually block inserts and how would we best solve it?

Try selecting only data that you know isn't dirty (i.e. in the process of being written). For instance, if your table has a createddate column filled with the date the row was inserted into the database, then add a WHERE condition retrieving only rows that were inserted more than 5 minutes ago.
Example
SELECT col1, col2, col3, createddate
FROM table WITH(NOLOCK)
WHERE createddate < dateadd(minute,-5,getdate())

This is not a technical issue as much as a design issue. I would recommend that you create a new table that is a copy of this table with committed transactions after whatever time frame is necessary to determine the committed data. Otherwise I would recommend finding another vendor for your accounting system.
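A minimal sketch of that copy-table approach, combined with the time-window idea above (all table and column names here are placeholders, not the vendor's actual schema):
-- Hypothetical staging copy: only pull rows old enough that in-flight transactions
-- should have settled, so rolled-back inserts never make it into the committed copy
INSERT INTO dbo.AccountingRows_Committed (row_id, col1, col2, createddate)
SELECT src.row_id, src.col1, src.col2, src.createddate
FROM dbo.AccountingRows AS src          -- read with the default READ COMMITTED level
WHERE src.createddate < DATEADD(MINUTE, -5, GETDATE())
  AND src.row_id > (SELECT ISNULL(MAX(row_id), 0) FROM dbo.AccountingRows_Committed);
The web service would then read only from the copy, so the vendor's table is never touched by the reporting SELECTs.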

DROP TABLE or DELETE TABLE? Which is best practice?

Working on redesigning some databases in my SQL SERVER 2012 instance.
I have databases where I put my raw data (from vendors) and then I have client databases where I will (based on client name) create a view that only shows data for a specific client.
Because this data is volatile (Google Adwords & Google DFA), I typically just delete the last 6 days and insert 7 days' worth every day from the vendor databases. Doing this gives me comfort in knowing that Google has had time to solidify its data.
The question I am trying to answer is:
1. Instead of using views, would it be better to use a 'SELECT INTO' statement and DROP the table every day in the client database?
I'm afraid that automating my process using the 'DROP TABLE' method will not scale well long term. While testing it myself, it seems that performance is improved because it does not have to scan the entire table for the date range. I've also tested this with an index on the 'date' column, and performance still seemed better with the 'DROP TABLE' method.
I am looking for best practices here.
NOTE: This is my first post. So I am not too familiar with how to format correctly. :)
Deleting rows from a table is a time-consuming process. All the deleted records get logged, and the performance of the server suffers.
Instead, databases offer TRUNCATE TABLE. This removes all the rows of the table without logging the individual rows (only the page deallocations are logged), but keeps the structure intact. Also, triggers, indexes, constraints, stored procedures, and so on are not affected by the removal of rows.
In some databases, if you delete all rows from a table, then the operation really is a truncate. However, SQL Server is not one of those databases. In fact, the documentation lists TRUNCATE TABLE as a best practice for deleting all rows:
To delete all the rows in a table, use TRUNCATE TABLE. TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction log resources. TRUNCATE TABLE has restrictions, for example, the table cannot participate in replication. For more information, see TRUNCATE TABLE (Transact-SQL)
You can drop the table. But then you lose auxiliary metadata as well -- all the things listed above.
I would recommend that you truncate the table and reload the data using INSERT INTO or BULK INSERT.
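A rough sketch of that pattern for one client database (the database, table, and column names are made up for illustration):
-- Truncate keeps the table structure, indexes and permissions intact and is minimally logged
TRUNCATE TABLE ClientDB.dbo.AdwordsData;

-- Reload this client's rows from the raw vendor database
INSERT INTO ClientDB.dbo.AdwordsData (client_name, stat_date, impressions, clicks, cost)
SELECT client_name, stat_date, impressions, clicks, cost
FROM RawDB.dbo.AdwordsData
WHERE client_name = 'SomeClient';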

Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005

I have a database with a main table (let's call it Owner) and several sub-tables with holdings (like Cars, Books, etc.).
For example:
Owner has columns: owner_id, name
Cars has columns: owner_id (foreign key), brand
Books has columns: owner_id (foreign key), title, author
My program should calculate statistics like "how many BMW owners also own a Harry Potter book" using various third-party libraries. I want to read all rows from all tables at the same time and then do the analysis in non-SQL code.
I want to read all tables using separate Select * From X statements. I cannot use one big join since it would return too many rows ((owners * cars * books) instead of (owners + cars + books)). A Union doesn't cut it either since the tables contain different columns of different types.
I have set
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
but I'm having some issues anyway.
If I stress the database by running two threads, one randomly inserting or deleting and the other reading, I sometimes get inconsistent results, like cars having been deleted between reading the Owner table and reading the Cars table.
I have a few questions:
What's the proper way of preventing modification when reading from multiple tables one by one? No table must be modified until all have been read.
I'm using SQL Server 2005 (on network) and SQL Server 2005 Express (local). Can I explicitly get locks on multiple tables at the same time?
If I run against my local SQL Server Express database, I can't make it work no matter what I do. If I run against my networked SQL server 2005 database, I can make it work (with some effort). Does SQL Server Express support transaction isolation level SERIALIZABLE? I believe it should. The differences could be due to a slow network connection but I don't know.
On my local db, I can not prevent modification in between reads. That is, one thread is randomly deleting a random owner (first cars, then books, then owner) or inserting a new owner (insert owner, insert 2 cars, insert 2 books). Another thread is reading using:
Begin Tran
Select owner_id From Owner
Select owner_id, brand From Cars
Select owner_id, title, author From Books
Commit Tran
No matter what I do, sometimes I get an owner with zero cars or zero books. This should never happen since all inserts and deletes are in a single transaction. It seems like the Express server doesn't lock the Owner, Cars and Books tables at the same time.
On the networked SQL Server 2005, it works fine but it could be because of a slow connection and thus lower probability of simultaneous execution.
On my local db, I am starting every transaction with a dummy Select from all tables to prevent deadlocking. I don't understand why this prevents deadlocking but not modification of the tables. This is not necessary on the networked SQL Server 2005.
At the moment, I can't tell if I've misunderstood something about transaction isolation or if it's an issue with differences between SQL Server Express and SQL Server 2005. Any help or insights would be greatly appreciated.
Your choice of loading all data in one go means very few options:
Use sp_getapplock to serialise access through the relevant code
Use TABLOCKX, HOLDLOCK on the reads in a transaction
You have issues because SET TRANSACTION ISOLATION LEVEL SERIALIZABLE only affects the isolation level of the locks: you also need to control their duration (HOLDLOCK) and their granularity and mode (TABLOCKX).
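A sketch of the second option, reusing the tables from the question (whether an exclusive table lock is acceptable for your readers is a design decision, not something this snippet settles):
BEGIN TRAN;
-- TABLOCKX takes an exclusive table lock, HOLDLOCK keeps it until the transaction ends,
-- so no insert or delete can slip in between the three reads
SELECT owner_id FROM Owner WITH (TABLOCKX, HOLDLOCK);
SELECT owner_id, brand FROM Cars WITH (TABLOCKX, HOLDLOCK);
SELECT owner_id, title, author FROM Books WITH (TABLOCKX, HOLDLOCK);
COMMIT TRAN;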
I sometimes get inconsistent results,
Unless you will later do batch processing on a database that is otherwise not used, you had better get used to certain fluctuations, which WILL NOT MATTER ANYWAY.
Unless you have very few entries, the changes won't matter in absolute numbers. You are dealing with statistics anyway. Use READ COMMITTED and deal with the inconsistencies by accepting that the data set is not static.
Anything else will totally kill performance.
Or go with batch processing.
Alternatively: use SNAPSHOT isolation to seal a "view in time" of the database.
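A sketch of the snapshot route (the database name is a placeholder; snapshot isolation is available in SQL Server 2005 Express as well):
-- One-time database setting
ALTER DATABASE MyStatsDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the reading session: every SELECT in the transaction sees the same point-in-time
-- version of the data, and the reads do not block the writer thread
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT owner_id FROM Owner;
SELECT owner_id, brand FROM Cars;
SELECT owner_id, title, author FROM Books;
COMMIT TRAN;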

Does Adding a Column Lock a Table in SQL Server 2008?

I want to run the following on a table of about 12 million records.
ALTER TABLE t1
ADD c1 int NULL;
ALTER TABLE t2
ADD c2 bit NOT NULL
DEFAULT(0);
I've done it in staging and the timing seemed fine, but before I do it in production, I wanted to know how locking works on the table during new column creation (especially when a default value is specified). So, does anyone know? Does the whole table get locked, or do the rows get locked one by one during default value insertion? Or does something different altogether happen?
Prior to SQL Server 11 (Denali), adding a non-null column with a default will run an update behind the scenes to populate the new default values. Thus it will lock the table for the duration of the 12-million-row update. In SQL Server 11 this is no longer the case: the column is added online and no update occurs, see Online non-NULL with values column add in SQL Server 11.
Both in SQL Server 11 and prior, a Sch-M lock is acquired on the table to modify the definition (add the new column metadata). This lock is incompatible with any other possible access (including dirty reads). The difference is in the duration: prior to SQL Server 11 this lock will be held for a size-of-data operation (the update of 12M rows). In SQL Server 11 the lock is only held briefly. In the pre-SQL Server 11 update of the rows, no row lock needs to be acquired because the Sch-M lock on the table guarantees that there cannot be any conflict on any individual row.
Yes, it will lock the table.
A table, as a whole, has a single schema (set of columns, with associated types). So, at a minimum, a schema lock would be required to update the definition of the table.
Try to think about how things would work contrariwise - if each row was updated individually, how would any parallel queries work (especially if they involved the new columns)?
And default values are only useful during INSERT and DDL statements - so if you specify a new default for 10,000,000 rows, that default value has to be applied to all of those rows.
Yes, it will lock.
DDL statements issue a Schema Lock (see this link) which will prevent access to the table until the operation completes.
There's not really a way around this, and it makes sense if you think about it. SQL needs to know how many fields are in a table, and during this operation some rows will have more fields than others.
The alternative is to make a new table with the correct fields, insert into, then rename the tables to swap them out.
I have not read how the lock mechanism works when adding a column, but I am almost 100% sure row by row is impossible.
Watch out when you do these types of things in SQL Server Management Studio with drag and drop (I know you are not doing this here, but this is a public forum), as some changes are destructive (fortunately, SQL Server 2008, at least R2, is safer here as it tells you "no can do" rather than just doing it).
You can run both column additions in a single statement, however, and reduce the churn.
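If both columns are in fact going onto the same table (the question shows t1 and t2, so this is an assumption), a single statement takes the schema lock only once:
-- Assumes both columns belong on the same table
ALTER TABLE t1
    ADD c1 int NULL,
        c2 bit NOT NULL CONSTRAINT DF_t1_c2 DEFAULT (0);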

Auditing data changes in SQL Server 2008

I am trying to find a highly efficient method of auditing changes to data in a table. Currently I am using a trigger that looks at the INSERTED and DELETED tables to see what rows have changed and inserts these changes into an Audit table.
The problem is this is proving to be very inefficient (obviously!). It's possible that with 3,000 rows inserted into the database at one time (which wouldn't be unusual), 215,000 rows would have to be inserted in total to audit these rows.
What is a reasonable way to audit all this data without it taking a long time to insert in to the database? It needs to be fast!
Thanks.
A correctly written trigger should be fast enough.
You could also look at Change Data Capture
Auditing in SQL Server 2008
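If Change Data Capture is an option (it needs Enterprise Edition on SQL Server 2008), enabling it looks roughly like this; the table name is a placeholder:
-- Enable CDC for the database, then for the table to be audited
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'MyAuditedTable',
    @role_name     = NULL;   -- NULL means no gating role is required to query the change data
Changes then appear in the cdc.dbo_MyAuditedTable_CT change table without any trigger code of your own.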
I quite often use AutoAudit:
AutoAudit is a SQL Server (2005, 2008, 2012) Code-Gen utility that creates Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns added to the table
Insert events logged to the Audit table
Updates: old and new values logged to the Audit table
Deletes: all final values logged to the Audit table
A view to reconstruct deleted rows
A UDF to reconstruct row history
A Schema Audit Trigger to track schema changes
Re-code-gens triggers when ALTER TABLE changes the table
Update:
A major upgrade to version 3.20 was released in November 2013 with these added features:
Handles tables with up to 5 PK columns
Performance improvements up to 90% faster than version 2.00
Improved historical data retrieval UDF
Handles column/table names that need quotename [ ]
Archival process to keep the live Audit tables smaller/faster but retain the older data in archive AutoAudit tables
As others have already mentioned, you can use the Change Data Capture, Change Tracking, and Audit features in SQL Server, but to keep it simple and use one solution to track all SQL Server activities, including these DML operations, I suggest trying ApexSQL Comply. You can disable all the other options and leave only DML auditing enabled.
It uses a centralized repository for captured information on multiple SQL Server instances and their databases.
It would be best to read this article first, and then decide on using this tool:
http://solutioncenter.apexsql.com/methods-for-auditing-sql-server-data-changes-part-9-the-apexsql-solution/
SQL Server Notifications on insert update delete table change
The SqlTableDependency C# component provides the low-level implementation to receive database notifications, creating a SQL Server Queue and Service Broker.
Have a look at http://www.sqltabledependency.it/
For any record change, SqlTableDependency's event handler will get a notification containing the modified table record values as well as the DML change (insert, update, or delete) executed on your database table.
You could allow the table to be self-auditing by adding additional columns (a sketch follows this list), for example:
For an INSERT - this is a new record and its existence in the table is the audit itself.
With a DELETE - you can add columns like IsDeleted BIT \ DeletingUserID INT \ DeletingTimestamp DATETIME to your table.
With an UPDATE you add columns like IsLatestVersion BIT \ ParentRecordID INT to track version changes.
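A minimal sketch of those extra columns, assuming a hypothetical table name:
-- Soft-delete and versioning columns; the table name is a placeholder
ALTER TABLE dbo.MyTable
    ADD IsDeleted         BIT      NOT NULL CONSTRAINT DF_MyTable_IsDeleted DEFAULT (0),
        DeletingUserID    INT      NULL,
        DeletingTimestamp DATETIME NULL,
        IsLatestVersion   BIT      NOT NULL CONSTRAINT DF_MyTable_IsLatestVersion DEFAULT (1),
        ParentRecordID    INT      NULL;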

How to figure out which record has been deleted in an efficient way?

I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We need to transfer data incrementally (Change Data Capture?) into db2.
I have only read access to tables, so I can't create any table or trigger in Oracle db1.
The challenge I am facing is, how to detect record deletion in Oracle?
The solution I can think of is using an additional standalone/embedded db (e.g. Derby, H2, etc.). This db contains 2 tables, namely old_data and new_data.
old_data contains the primary key field from the table of interest in Oracle.
Every time the ETL process runs, the new_data table will be populated with the primary key field from the Oracle table. After that, I will run the following SQL command to get the deleted rows:
SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data)
I think this will be a very expensive operation when the volume of data becomes very large. Do you have any better idea of how to do this?
Thanks.
Which edition of Oracle? If you have Enterprise Edition, look into Oracle Streams.
You can grab the deletes out of the REDO log rather than the database itself.
One approach you could take is using the Oracle flashback capability (if you're using version 9i or later):
http://forums.oracle.com/forums/thread.jspa?messageID=2608773
This will allow you to select from a prior database state.
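For example, a flashback query against a hypothetical table, assuming the AS OF clause is available in your Oracle version and undo retention covers the interval:
-- Select the primary keys as they existed one hour ago
SELECT id
FROM table_of_interest AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);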
If there may not always be deleted records, you could be more efficient by:
Storing a row count with each query iteration.
Comparing that row count to the previous row count.
If they are different, you know you have a delete and you have to compare the current set with the historical data set from flashback. If not, then don't bother and you've saved a lot of cycles.
A quick note on your solution if flashback isn't an option: I don't think your SELECT query is a big deal - it's all those inserts to populate those side tables that will really take a lot of time. Why not just run that query against the Sybase production server before doing your update?