Bit of a long shot here, but I have a simple query below:
begin transaction
update s
set s.SomeField = null
from someTable s (NOLOCK)
rollback transaction
This runs in ~30 seconds sitting close to the SQL Server box. Are there any tricks I can use to improve the speed? The table has 144,306 rows in it.
thanks.
The single largest component of the performance of a large UPDATE command like this is going to be the speed of your DB log.
For best performance:
Make sure the DB log (LDF file) is on a separate physical spindle from the DB data (MDF file)
Avoid parity RAID for the log volume, such as RAID-5; RAID-1 or RAID-10 are better
Make sure that the DB log file is pre-grown, and that it's physically contiguous on disk
Make sure your server has enough RAM -- ideally, at least enough to hold all of the DB pages containing the modified rows
Using SSDs for your data drive may also help, because the command will create a large number of dirty buffers, which will be flushed to disk later by the lazy writer; this can make other operations on the DB slow while it's happening.
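To see how the log is currently laid out before changing anything, something like the following may help. It is only a sketch: the database name YourDb, the logical file name YourDb_log and the 4 GB target size are placeholders, not values from the question.

```sql
-- Run inside the database in question: list its data and log files.
SELECT name,
       type_desc,                    -- ROWS = data file, LOG = log file
       physical_name,                -- check that data and log sit on different volumes
       size * 8 / 1024 AS size_mb,   -- size is stored in 8 KB pages
       growth,
       is_percent_growth
FROM sys.database_files;

-- Pre-grow the log in one step so the UPDATE does not trigger repeated autogrowth.
-- YourDb, YourDb_log and 4096MB are assumed placeholders; the new size must be
-- larger than the current size of the file.
ALTER DATABASE [YourDb]
MODIFY FILE (NAME = N'YourDb_log', SIZE = 4096MB);
```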
If there's no constraint on it, and you really need to set all values of that column to NULL, then I would test dropping the column and re-adding it.
Not sure if that would be faster or not, but I'd investigate it.
Try disabling the index temporarily.
You could change the syntax of your query slightly, but I had no difference in my testing by doing that. I was using STATISTICS IO and STATISTICS TIME.
You mention the column is indexed. You could disable it / re-enable it as part of your transaction. The t-sql for that is simple, see this - http://blog.sqlauthority.com/2007/05/17/sql-server-disable-index-enable-index-alter-index/
I've had to do that in the past for similar jobs and it has worked out well for me.
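A minimal sketch of that approach, assuming the index on the column is a nonclustered index with the hypothetical name IX_someTable_SomeField (disabling a clustered index would make the whole table inaccessible, so this only applies to nonclustered indexes):

```sql
BEGIN TRANSACTION;

-- Stop maintaining the index while every row is modified.
-- IX_someTable_SomeField is an assumed name; substitute the real index.
ALTER INDEX IX_someTable_SomeField ON dbo.someTable DISABLE;

UPDATE dbo.someTable
SET SomeField = NULL;

-- REBUILD re-enables the index and builds it once from the final data.
ALTER INDEX IX_someTable_SomeField ON dbo.someTable REBUILD;

COMMIT TRANSACTION;
```

Whether this beats the plain UPDATE depends on the index and the table, so it is worth timing both with STATISTICS IO and STATISTICS TIME switched on.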
Try implementing it like this:
Disable Index
Drop the column
Create the column
Rebuild index
My guess is that this will improve performance.
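The steps above could look roughly like the sketch below. The index name and the int data type are assumptions, since the question shows neither. As far as I know, a column cannot be dropped while an index still references it, even a disabled one, so this sketch drops and recreates the index rather than disabling it.

```sql
-- Assumed names and types: IX_someTable_SomeField, SomeField int.
DROP INDEX IX_someTable_SomeField ON dbo.someTable;

-- Dropping the column removes every value at once.
ALTER TABLE dbo.someTable DROP COLUMN SomeField;

-- Re-adding it as a nullable column with no default leaves every row NULL.
ALTER TABLE dbo.someTable ADD SomeField int NULL;

-- Build the index once, on the final data.
CREATE NONCLUSTERED INDEX IX_someTable_SomeField
    ON dbo.someTable (SomeField);
```

Note that the re-added column ends up last in the table's column order, which matters to any code that relies on SELECT * or on ordinal positions.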
I realize everyone wants to just look the other way on this question. I appreciate it if you continue to read on. Of course the log grows when adding a field to a large table. Let me just explain to my best ability:
We have a database upgrade utility that we deploy to our customers. In that utility we manipulate the database with changes that are specific to our version.
Our testing department is seeing varying results between local and virtual machines, and between one virtual machine and another. Some VMs do not have much log growth while others grow by 30 GB. The database is set to SIMPLE. The transaction log shouldn't be used, "technically". I understand that the log is used as a cache until the disk is free enough to accept the requested change. I know there is not much to be done on the SQL side. We are stuck with some sort of shrink to handle the change after the upgrade is complete.
I am curious why physical machines and VMs would act differently, and what to look for in a VM environment to see whether this is going to be problematic. Do I look at something on the disk, MSINFO32, the CPU? I have checked that there is no compression on the VM. I also ran profiles and looked at fn_dblog to see the indexes growing on the specific table. I just can't figure out why some grow exponentially while others do not grow. Also, if you know of any shrink-style commands available at the db_owner permission level, I would appreciate it. Currently we are testing a checkpoint, since ShrinkDB will not be available to us due to permission level.
--The table usually has between 10 and 20 million records. It has 10 strictly typed fields.
IF col_length('[dbo].[foo]','Field1') IS NULL
BEGIN
ALTER TABLE [dbo].[foo] ADD [Field1] smalldatetime NOT NULL CONSTRAINT [df_Field1] DEFAULT (GETDATE())
END
--This has growth and can get a bit out of control on VM. Physical machines it does not affect as much.
--Try 2 hoping Getdate was the issue
IF col_length('[dbo].[foo]','Field1') IS NULL
BEGIN
ALTER TABLE [dbo].[foo] ADD [Field1] smalldatetime NULL
END
DECLARE @TheDate smalldatetime = GETDATE()
UPDATE [dbo].[foo] SET [Field1] = @TheDate
--This was even more problematic to the log and took considerably longer
Doing a CHECKPOINT after each column change would be my first try for SIMPLE recovery model, honestly. SQL doesn't let us have much control of how the transaction log behaves other than recovery mode, backups, growth size, and checkpoints.
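One way to combine this with the UPDATE from the question is to split it into batches and checkpoint between them, so that under SIMPLE recovery the log space used by each committed batch can be reused. This is only a sketch: the batching and the batch size of 50,000 are my assumptions, it relies on Field1 still being NULL in unprocessed rows, and it must not run inside an outer transaction.

```sql
DECLARE @TheDate smalldatetime = GETDATE();
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    -- Each UPDATE is its own autocommit transaction.
    UPDATE TOP (50000) [dbo].[foo]
    SET [Field1] = @TheDate
    WHERE [Field1] IS NULL;

    SET @Rows = @@ROWCOUNT;

    -- Under SIMPLE recovery, a checkpoint lets inactive log space be reused.
    CHECKPOINT;
END
```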
Check how many VLFs exist and their sizes with sys.dm_db_log_info or DBCC LOGINFO.
Check the recovery model - maybe it's either not SIMPLE or not always SIMPLE on some machines?
Check indexes and index fragmentation; that difference might matter, and it'll change after each column. Make sure you have a clustered index.
Check total table size.
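The checks above can be run with queries along these lines. This is a sketch: sys.dm_db_log_info only exists on newer versions of SQL Server (use DBCC LOGINFO on older ones), and dbo.foo is the table name from the question.

```sql
-- Number and total size of virtual log files (VLFs) in the current database.
SELECT COUNT(*)         AS vlf_count,
       SUM(vlf_size_mb) AS total_vlf_mb
FROM sys.dm_db_log_info(DB_ID());

-- Recovery model, and what (if anything) is preventing log reuse right now.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE database_id = DB_ID();

-- Total size of the table, including its indexes.
EXEC sp_spaceused N'dbo.foo';
```

Running the same three queries on a VM that grows and on one that does not should show whether the machines really start from the same configuration.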
I need to update a table in sql server 2008 along the lines of the Merge statement - delete, insert, updates. Table is 700k rows and I need users to still have read access to it assuming an isolation level of read committed.
I tried things like ALTER TABLE table SET (LOCK_ESCALATION=DISABLE) to no avail. I tested by doing a select top 50000 * from another window; obviously read uncommitted worked :). Is there any way around this without changing the user's isolation level, while retaining 'all or nothing' transaction behaviour?
My current solution of a cursor that commits in batches of n may allow users to work but loses the transactional behaviour. Perhaps I could just make the bulk update fast enough to always be less than 30 seconds (for timeout). The problem is that the users' target DBs are on very slow machines with only 512 MB of RAM. I'm not sure about the processor, but assume it is really slow, and I don't have access to them at this time!
I created a test that causes an update statement to need to run against all 700k rows:
I tried an update with a left join on my dev box (quite fast) and it was 17 seconds
The merge statement was 10 seconds
The FORWARD ONLY cursor was slower than both
These figures are acceptable on my machine but I would feel more comfortable if I could get the query time down to less than 5 seconds before allowing locks.
Any ideas on preventing any locking on the table/rows or making it faster still?
It sounds like this table may be queried a lot but not updated much. If it really is a true read-only table for everyone else, but you want to update it extremely quickly, you could create a script that uses this method (could even wrap it in a transaction, I believe, although it's probably unnecessary):
Make a copy of the table named TABLENAME_COPY
Update the copy table
Rename the original table to TABLENAME_ORIG
Rename the copy table to TABLENAME
You would only experience downtime in between steps 3 and 4, and a scripted table rename would probably be quicker than an update of so many rows.
This all assumes that no one else can update the table while your process is running, but they will still be able to read it fully at any point except between steps 3 and 4.
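The four steps could be scripted roughly as follows. dbo.MyTable stands in for TABLENAME and is a hypothetical name. Note that SELECT INTO copies only the data, so indexes, constraints and permissions would have to be recreated on the copy before the swap.

```sql
-- Step 1: copy the table (data only).
SELECT *
INTO dbo.MyTable_COPY
FROM dbo.MyTable;

-- Step 2: run the deletes, inserts and updates against dbo.MyTable_COPY here,
-- then recreate any indexes and constraints the original table has.

-- Steps 3 and 4: swap the names inside one transaction so that readers
-- never find the table name missing.
BEGIN TRANSACTION;
EXEC sp_rename N'dbo.MyTable', N'MyTable_ORIG';
EXEC sp_rename N'dbo.MyTable_COPY', N'MyTable';
COMMIT TRANSACTION;
```

Readers are blocked only for the duration of the two renames, which take a schema modification lock but move no data.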
I have a big table with around 70 columns in SQL Server 2008. A multithreaded .NET application is calling a stored proc on database to insert into / update the table. Frequency is around 3 times a second.
I have made weekly partitions on table since almost every query has a datetime constraint on the table.
Sometimes it takes a long time to insert/update the table. I am suspicious that sometimes INSERTION makes UPDATE wait; sometimes UPDATE makes INSERTION wait. Is it possible?
How can I design the table to avoid such locks? Performance is the main issue here.
You're right that you're probably hitting lock contention (blocking) that makes things wait. A couple of things to check first:
Are your indexes correct?
If your DB is in 'Full' recovery mode, do you need it? Simple recovery really speeds up inserts/updates, but you lose point-in-time restores for backups.
Are you likely to have multiple threads writing the same record? If not, NOLOCK might be your friend here, but that would mean your data might be inconsistent for a second or two on occasion.
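Before changing the design, it may be worth confirming that the waits really are caused by blocking. A sketch using the dynamic management views, to be run while the application is busy:

```sql
-- Requests that are currently waiting on another session.
SELECT r.session_id,
       r.blocking_session_id,   -- the session holding the conflicting lock
       r.wait_type,
       r.wait_time,             -- milliseconds spent waiting so far
       r.command,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If this regularly returns rows for the stored proc, the wait_type column shows which kind of lock the inserts and updates are contending for.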
I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL, using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS, drop all constraints, especially unique and primary key ones, load the data in, add the constraints back. I load about 100GB in just over an hour like this, with fairly minimal overhead. I am using BULK LOGGED recovery model, which just logs the existence of new extents during the logging, and then you can remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the point still applies to drop all of your constraints and use the BULK LOGGED recovery model. This greatly reduces the logging done on INSERT INTO statements and thus should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
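In plain T-SQL, the two usual ways to get a minimally logged copy from A to B look roughly like this. The column names Col1 and Col2 are hypothetical, and the second form assumes B is a heap with no indexes and that the database uses the SIMPLE or BULK LOGGED recovery model. Use one option or the other, not both.

```sql
-- Option 1: let SELECT INTO create B (fails if dbo.B already exists).
SELECT Col1, Col2
INTO dbo.B
FROM dbo.A;

-- Option 2: B already exists as an empty heap; the TABLOCK hint is what
-- allows the insert to be minimally logged.
INSERT INTO dbo.B WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.A;
```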
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610, see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
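A sketch of a batched copy with the trace flag enabled for the current session. The Id and Col1 columns and the batch size are assumptions: the loop relies on dbo.A having a unique, increasing, positive integer key Id and on dbo.B starting out empty. Enabling a trace flag with DBCC TRACEON normally requires sysadmin rights.

```sql
-- Enable trace flag 610 for this session only.
DBCC TRACEON (610);

DECLARE @BatchSize int = 100000;
DECLARE @LastId int = 0;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    -- Each INSERT is its own transaction, which keeps individual transactions small.
    INSERT INTO dbo.B (Id, Col1)
    SELECT TOP (@BatchSize) Id, Col1
    FROM dbo.A
    WHERE Id > @LastId
    ORDER BY Id;

    SET @Rows = @@ROWCOUNT;

    -- Remember how far the copy has got.
    SELECT @LastId = MAX(Id) FROM dbo.B;
END

DBCC TRACEOFF (610);
```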
We have a weekly maintenance plan to shrink all user databases and rebuild their indexes. This had been working fine until we created a read-only database; now each time the plan runs, it fails when it starts processing this database because of its read-only state.
As far as I can see, we have two options. The first is to remove the read-only flag from the database. This is possible, but as the database is only updated once a quarter, it makes sense from a performance point of view to make use of the read-only feature. The second is to manually select the databases the plan should run for, i.e. all the user databases apart from the read-only one, but this requires people to remember to add any new databases to the plan.
Does anyone have any suggestions of a better way of doing this?
Thanks
Neil
Why are you shrinking the database in the first place?
Also, there's no need to maintain read-only DBs like that.
I'd remove the read only flag if you don't want to customise the maint plan.
Why are you shrinking DBs too? If the database grows to a given size, then this is probably its natural current size.
Also remember that an index rebuild requires (as a rule of thumb) free space of about 120% of the target table size. E.g. a 500 MB table needs 600 MB of free space.
It's pointless to shrink then rebuild... and you'll have horrendous file fragmentation too
I suppose you could modify the maintenance plan to start with an 'Execute T-SQL Statement' step, which removes the read-only flag (ALTER DATABASE database-name SET READ_WRITE), and add a final step to reset it:
ALTER DATABASE database-name SET READ_ONLY
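If toggling the flag is not appealing, another possibility is to have a T-SQL step build its database list from sys.databases, so that read-only databases are skipped and new databases are picked up automatically. This is a sketch of the selection only, not a replacement for the whole plan; treating database_id values above 4 as user databases is the usual convention, not something stated in the question.

```sql
-- User databases that are online and writable.
SELECT name
FROM sys.databases
WHERE database_id > 4             -- skip master, tempdb, model, msdb
  AND is_read_only = 0            -- skip read-only databases
  AND state_desc = N'ONLINE';
```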