SQL Server bulk load into a table

I need to update a table in SQL Server 2008 along the lines of the MERGE statement - deletes, inserts and updates. The table is 700k rows and I need users to still have read access to it, assuming an isolation level of read committed.
I tried things like ALTER TABLE table SET (LOCK_ESCALATION=DISABLE) to no avail. I tested by doing a SELECT TOP 50000 * from another window; obviously read uncommitted worked :). Is there any way around this without changing the users' isolation level while retaining an 'all or nothing' transaction behaviour?
My current solution of a cursor that commits in batches of n may allow users to work, but it loses the transactional behaviour. Perhaps I could just make the bulk update fast enough to always take less than 30 seconds (the timeout). The problem is that the users' target DBs are on very slow machines with only 512 MB of RAM. I'm not sure about the processor, but I assume it is really slow, and I don't have access to them at this time!
I created a test that causes an update statement to need to run against all 700k rows:
An update with a left join on my dev box (quite fast) took 17 seconds
The MERGE statement took 10 seconds
The FORWARD_ONLY cursor was slower than both
These figures are acceptable on my machine but I would feel more comfortable if I could get the query time down to less than 5 seconds before allowing locks.
Any ideas on preventing any locking on the table/rows or making it faster still?

It sounds like this table may be queried a lot but not updated much. If it really is a true read-only table for everyone else, but you want to update it extremely quickly, you could create a script that uses this method (could even wrap it in a transaction, I believe, although it's probably unnecessary):
1. Make a copy of the table named TABLENAME_COPY
2. Update the copy table
3. Rename the original table to TABLENAME_ORIG
4. Rename the copy table to TABLENAME
You would only experience downtime in between steps 3 and 4, and a scripted table rename would probably be quicker than an update of so many rows.
This does all assume that no one else can update the table while your process is running, but they will still be able to read it fully at any point except between steps 3 and 4.
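For illustration, a hedged T-SQL sketch of that copy-and-swap sequence; the table name TABLENAME and the column being updated (Price) are placeholders rather than anything from the question:
-- 1. Make a copy of the table (note: SELECT INTO does not copy indexes or constraints)
SELECT * INTO TABLENAME_COPY FROM TABLENAME;
-- 2. Apply the bulk delete/insert/update logic to the copy (placeholder update shown)
UPDATE TABLENAME_COPY SET Price = Price * 1.1 WHERE Price IS NOT NULL;
-- 3 & 4. Swap the tables; readers are only blocked for the duration of the renames
BEGIN TRANSACTION;
EXEC sp_rename 'TABLENAME', 'TABLENAME_ORIG';
EXEC sp_rename 'TABLENAME_COPY', 'TABLENAME';
COMMIT TRANSACTION;
Because SELECT INTO does not carry over indexes, constraints or permissions, those would need to be scripted onto the copy before the swap.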


How do I lock out writes to a specific table while several queries execute?

I have a table set up in my SQL Server that keeps track of inventory items (in another database) that have changed. This table is fed by several different triggers. Every 15 minutes a scheduled task runs a batch file that executes a number of different queries that send updates on the items flagged in this table to several ecommerce websites. The last query in the batch file resets the flags.
As you can imagine there is potential to lose changes if an item is flagged while this batch file is running. I have worked around this by replaying the last 25 hours of updates every 24 hours, just in case this scenario happened. It works, but IMO is kind of clumsy.
What I would like to do is delay any writes to this table until my script finishes, and resets the flags on all the rows that were flagged when the script started. Then allow all of these delayed writes to happen.
I've looked into doing this with table hints (TABLOCK) but this seems to be limited to one query--unless I'm misunderstanding what I have read, which is certainly possible. I have several that run in succession. TIA.
Alex
Could you modify your script into a stored procedure that extracts all the data into a temporary table, using a select statement that applies a lock to the production table? You could then drop your lock on the main table and do all your processing in the temporary table (or a permanent table built for the purpose) away from the live system. It will be a lot slower and put more load on your SQL box, but speed shouldn't be an issue if you have a point-in-time snapshot of it.
If that option is not applicable then maybe you could play with wrapping the whole thing in a transaction and putting a table lock on your production table with the first select statement.
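A rough sketch of that second suggestion, assuming the flag table is called ItemChanges with an ItemId key and a ChangeFlag column (all hypothetical names); TABLOCKX with HOLDLOCK keeps an exclusive table lock for the whole transaction, so other writers queue up behind the batch:
BEGIN TRANSACTION;
-- The first read takes an exclusive table lock that is held until the transaction ends
SELECT ItemId
INTO #ChangedItems
FROM ItemChanges WITH (TABLOCKX, HOLDLOCK)
WHERE ChangeFlag = 1;
-- ... run the ecommerce update queries against #ChangedItems here ...
-- Reset only the flags captured at the start, then release the lock
UPDATE ItemChanges
SET ChangeFlag = 0
WHERE ItemId IN (SELECT ItemId FROM #ChangedItems);
COMMIT TRANSACTION;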
Good luck mate

Delete first X rows in Ingres ANSI

I have 730,000+ records which I need to delete in an Ingres DB that works with ANSI92, and I need to delete them without overloading the DB. A simple delete with a search condition doesn't work; the DB just uses all the memory and throws an error. I'm thinking of running it in a loop and deleting in portions of 10-20K records.
I tried to use TOP and it didn't work:
delete top (10) from TABLE where web_id < 0;
I also tried LIMIT, which didn't work either:
DELETE FROM TABLE where web_id < 0 LIMIT 10;
Any ideas how to do it? Thank you!
You could use a session temporary table to hold the first 10 tids (tuple id's) and then delete based on those:
declare global temporary table session.tenrows as
select first 10 tid the_tid from "table" where web_id<0
on commit preserve rows with norecovery;
delete from "table" where tid in (select the_tid from session.tenrows);
When you say "without overload db", do you mean avoiding hitting the force-abort limit of the transaction log file? If so what might work for you is:
set session with on_logfull=notify;
delete from "table" where web_id < 0;
This would automatically commit your transaction at points where force-abort is reached then carry on, rather than rolling back and reporting an error.
A downside of using this setting is that it can be tricky to unpick what has/hasn't been done if any other error should occur (your work will likely be partially committed), but since this appears to be a straight delete from a table it should be quite obvious which rows remain and which don't.
The "set session" statement must be run at the start of a transaction.
I would advise not running concurrent sessions with "on_logfull=notify" (there have been bugs in this area, whether they're fixed in your installation depends on your version/patch level).

Best way to do a long-running schema change (or data update) in MS SQL Server?

I need to alter the size of a column on a large table (millions of rows). It will be set to an nvarchar(n) rather than nvarchar(max), so from what I understand, it will not be a long change. But since I will be doing this on production I wanted to understand the ramifications in case it does take a long time.
Should I just hit F5 from SSMS like I execute normal queries? What happens if my machine crashes? Or goes to sleep? What's the general best practice for doing long running updates? Should it be scheduled as a job on the server maybe?
Thanks
Please DO NOT just hit F5. I did this once and lost all the data in the table. Depending on the change, the update statement that is created for you actually stores the data in memory, drops the table, creates the new one that has the change you want, and repopulates the data from memory. However, in my case one of the changes I made was adding a unique constraint, so the repopulation failed, and because the statement had finished, the data in memory was discarded. This left me with the new, empty table.
I would create the table you are changing, with the change(s) you want, as a new table. Then SELECT * INTO the new table, then rename the tables in a single statement. If there is potential for data to be entered into the table while this is running and that is an issue, you may want to lock the table.
Depending on the size of the table and the duration of the statement, you may want to save the locking and renaming for later, and after the initial population of the new table do a differential population of new data and then rename the tables.
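A hedged sketch of that approach; the table and column names (BigTable, Description) and the target size are made up for illustration:
-- 1. Create the new table with the smaller column size
CREATE TABLE dbo.BigTable_New
(
    Id          INT           NOT NULL PRIMARY KEY,
    Description NVARCHAR(500) NOT NULL
);
-- 2. Copy the data across (LEFT guards against values longer than the new size)
INSERT INTO dbo.BigTable_New (Id, Description)
SELECT Id, LEFT(Description, 500)
FROM dbo.BigTable;
-- 3. Swap the tables in one short transaction
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
COMMIT TRANSACTION;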
Sorry for the long post.
Edit:
Also, if the connection times out due to duration, then run the insert statement locally on the DB server. You could also create a job and run that, however it is essentially the same thing.
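If you do go the job route, a minimal sketch using SQL Server Agent; the job name, database name and command are placeholders:
EXEC msdb.dbo.sp_add_job @job_name = N'Resize Description column';
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'Resize Description column',
    @step_name = N'Run resize script',
    @subsystem = N'TSQL',
    @database_name = N'MyDatabase',
    @command   = N'EXEC dbo.usp_ResizeDescriptionColumn;';  -- placeholder for the actual script
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Resize Description column';
-- Start it immediately rather than waiting for a schedule
EXEC msdb.dbo.sp_start_job @job_name = N'Resize Description column';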

Improve update performance when setting column to null

Bit of a long shot here, but I have a simple query below:
begin transaction
update s
set s.SomeField = null
from someTable s (NOLOCK)
rollback transaction
This runs in ~30 seconds sitting close to the SQL Server box. Are there any tricks I can use to improve the speed? The table has 144,306 rows in it.
Thanks.
The single largest component of the performance of a large UPDATE command like this is going to be the speed of your DB log.
For best performance:
Make sure the DB log (LDF file) is on a separate physical spindle from the DB data (MDF file)
Avoid parity RAID for the log volume, such as RAID-5; RAID-1 or RAID-10 are better
Make sure that the DB log file is pre-grown (a sketch of this follows below), and that it's physically contiguous on disk
Make sure your server has enough RAM -- ideally, at least enough to hold all of the DB pages containing the modified rows
Using SSDs for your data drive may also help, because the command will create a large number of dirty buffers, which will be flushed to disk later by the lazy writer; this can make other operations on the DB slow while it's happening.
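To pre-grow the log file as suggested above, a hedged one-liner; the database name, logical log file name and size are placeholders:
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_log, SIZE = 4096MB);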
If there's no constraint on it, and you really need to set all values of that column to NULL, then I would test dropping the column and re-adding it.
Not sure if that would be faster or not, but I'd investigate it.
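A minimal sketch of that test, using the names from the query above and assuming the column is an INT (the data type isn't given in the question):
ALTER TABLE someTable DROP COLUMN SomeField;
-- A newly added nullable column contains NULL for every existing row
ALTER TABLE someTable ADD SomeField INT NULL;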
Try disabling the index temporarily.
You could change the syntax of your query slightly, but I saw no difference in my testing by doing that. I was using STATISTICS IO and STATISTICS TIME.
You mention the column is indexed. You could disable it / re-enable it as part of your transaction. The T-SQL for that is simple; see this: http://blog.sqlauthority.com/2007/05/17/sql-server-disable-index-enable-index-alter-index/
I've had to do that in the past for similar jobs and it has worked out well for me.
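A sketch of that pattern around the query from the question; the index name IX_someTable_SomeField is hypothetical:
ALTER INDEX IX_someTable_SomeField ON someTable DISABLE;
BEGIN TRANSACTION;
UPDATE s
SET s.SomeField = NULL
FROM someTable s;
COMMIT TRANSACTION;
-- Rebuilding re-enables the index
ALTER INDEX IX_someTable_SomeField ON someTable REBUILD;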
Try implementing it like this:
Disable the index
Drop the column
Re-create the column
Rebuild the index
I would guess that it will improve performance.

CREATE TRIGGER is taking more than 30 minutes on SQL Server 2005

On our live/production database I'm trying to add a trigger to a table, but have been unsuccessful. I have tried a few times, but it has taken more than 30 minutes for the create trigger statement to complete and I've cancelled it.
The table is one that gets read/written to often by a couple different processes. I have disabled the scheduled jobs that update the table and attempted at times when there is less activity on the table, but I'm not able to stop everything that accesses the table.
I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated in the table. However, when I created the trigger on the test database there was no load on the table, and it had considerably fewer rows, which is different from the live/production database (100 vs. 13,000,000+).
Here is the create trigger statement that I'm trying to run:
CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF update(State)
    BEGIN
        /* do some stuff including for each row updated call a stored
           procedure that increments a value in table based on the
           UserId of the updated row */
    END
END
Can there be issues with creating a trigger on a table while rows are being updated or if it has many rows?
In SQL Server, triggers are created enabled by default. Is it possible to create the trigger disabled by default?
Any other ideas?
The problem may not be in the table itself, but in the system tables that have to be updated in order to create the trigger. If you're doing any other kind of DDL as part of your normal processes, they could be holding it up.
Use sp_who to find out where the block is coming from, then investigate from there.
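For example, while the CREATE TRIGGER is sitting there, you could run something like this from another connection (sp_who2 is the slightly more informative variant; the DMV query works on 2005 and later):
EXEC sp_who2;   -- look at the BlkBy column for the session running CREATE TRIGGER
-- Or query the DMVs directly for anything that is currently blocked
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;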
I believe the CREATE Trigger will attempt to put a lock on the entire table.
If you have a lot of activity on that table it might have to wait a long time, and you could be creating a deadlock.
For any schema changes you should really get everyone off the database.
That said it is tempting to put in "small" changes with active connections. You should take a look at the locks / connections to see where the lock contention is.
That's odd. An AFTER UPDATE trigger shouldn't need to check existing rows in the table. I suppose it's possible that you aren't able to obtain a lock on the table to add the trigger.
You might try creating a trigger that basically does nothing. If you can't create that, then it's a locking issue. If you can, then you could disable that trigger, add your intended code to the body, and enable it. (I do not believe you can disable a trigger during creation.)
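A sketch of that test using the trigger from the question (SQL Server 2005 has no option to create a trigger disabled, which is why the create-empty-then-disable sequence is suggested):
-- 1. Try to create an essentially empty trigger; if even this blocks, it's a locking issue
CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;   -- no-op body
END
GO
-- 2. Disable it, add the real logic with ALTER TRIGGER, then re-enable it
DISABLE TRIGGER [OnItem_Updated] ON [Item];
GO
-- ALTER TRIGGER [OnItem_Updated] ON [Item] AFTER UPDATE AS ... (real body here)
ENABLE TRIGGER [OnItem_Updated] ON [Item];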
Part of the problem may also be the trigger itself. Could your trigger accidentally be updating all rows of the table? There is a big difference between 100 rows in a test database and 13,000,000. It is a very bad idea to develop code against such a small set when you have such a large dataset, as you have no way to predict performance. SQL that works fine for 100 records can completely lock up a system with millions for hours. You really want to know that in dev, not when you promote to prod.
Calling a stored proc in a trigger is usually a very bad choice. It also means that you have to loop through records, which is an even worse choice in a trigger. Triggers must always account for multiple-record inserts/updates or deletes. If someone inserts 100,000 rows (not unlikely if you have 13,000,000 records), then looping through a record-based stored proc could take hours, lock the entire table and cause all users to want to hunt down the developer and kill (or at least maim) him because they cannot get their work done.
I would not even consider putting this trigger on prod until you test against a record set similar in size to prod.
My friend Dennis wrote this article that illustrates why testing a small volume of information when you have a large volume of information can create difficulties on prod that you didn't notice on dev:
http://blogs.lessthandot.com/index.php/DataMgmt/?blog=3&title=your-testbed-has-to-have-the-same-volume&disp=single&more=1&c=1&tb=1&pb=1#c1210
Run DISABLE TRIGGER triggername ON tablename before altering the trigger, then re-enable it with ENABLE TRIGGER triggername ON tablename.