Query Session no longer respond - sql

I'm trying to execute the following T-SQL Statement:
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION
DECLARE #nRows INT = 1
DECLARE #DataCancellazione DATE = DATEADD(DAY, -7, GETDATE())
CREATE TABLE #IDToDel (ID BIGINT)
WHILE #nRows > 0
BEGIN
INSERT INTO #IDToDel
SELECT TOP 5000 LogID
FROM MioDB.Test
WHERE CAST(ReceivedDate AS date) < #DataCancellazione
SELECT #nRows = ##ROWCOUNT
DELETE RM WITH (PAGLOCK)
FROM MioDB.Test RM WITH (PAGLOCK)
INNER JOIN #IDToDel TBD ON RM.LogID = TBD.ID
TRUNCATE TABLE #IDToDel
END
ROLLBACK
When I launch the execution the query window seems to no longer respond and without having particular increase of CPUTime and DiskIO on the process. Can anyone help me thanks.

Honestly, I think you're overly complicating the problem. SQL Server can easily handle processing millions of rows in one go, and I suspect that you could likely do this in a few 1M row batches. If you have at least 4,000,000 rows you want to delete, then at 5,000 a batch that will take 800 iterations.
There is also no need for the temporary table, a DELETE can make use of a TOP, so you can just delete that many rows each cycle. I define this with a variable, and then pass that (1,000,000 rows). This would mean everything is deleted in 4 iterations, not 800. You may want to reduce the size a little, but I would suggest that 500,000 is easy pickings for instance.
This gives you the following more succinct batch:
SET NOCOUNT ON;
--The following transaction level seems like a terrible idea when you're performing DDL statements. Don't, just don't.
--SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DECLARE #BatchSize int = 1000000,
#DataCancellazione date = DATEADD(DAY, -7, GETDATE());
SELECT 1; --Dataset with 1 row
WHILE ##ROWCOUNT > 0
DELETE TOP (#BatchSize)
FROM MioDB.Test --A Schema called "MioDB" is a little confusing
WHERE ReceivedDate < #DataCancellazione; --Casting ReceivedDate would have had no change to the query
--And could well have slowed it down.

Related

How Do I Repeatedly Run SQL Delete Statements and Shrink Transaction Log?

I have a customer who is running out of space on their drive, and it's entirely taken up by the SQL DB and transaction log. Unfortunately, moving the DB and log are not an option available to us at the current time, so I need to figure out how to delete lines from 2 massive tables. So far a coworker has spent 4 days trying to do this and has not been able to put a dent in it. There are only 13GB available for the transaction log, and deleting large quantities from these tables wipes that 13GB out really quick. Obviously the quickest thing to do would be to move what we want to keep to a temporary table, truncate the existing tables, then move them back. Unfortunately, this is an extremely busy environment within a hospital, and there are tens of thousands of lines being written to these tables every hour. So, unfortunately, we can't temporarily stop writing to this table to be able to truncate.
So we've been deleting a month of data at a time from each of these tables, then shrinking the transaction log to do it again. I feel like there's got to be a way to just get this to repeat, but I'm not entirely sure what I'm doing... I tried:
delete top (10000)
from Table 1_
where CreationDate_ < '2017-06-01'
Go
delete top (10000)
from Table2_
where CreationDate_ < '2017-06-01'
Go
dbcc shrinkfile (SQL_Log,4)
Go 2
Go 2
This appears to remove 10,000 lines from each table, then runs the shrinkfile for the log twice (for some reason it doesn't fully shrink down to the 4224kb size when we only run it twice), but it does not seem to repeat. I've also tried adding () starting at the first delete statement and ending after the first "Go 2" line. When I do that it just says:
Incorrect syntax near "Go"
Anybody have any clue how to do this? If we can get this to work, I plan on increasing the delete statements to a number much larger than 10,000, and increasing the repeat on the script to something much larger than 2, but I need to get it to work before I can actually do that...
You can use a WHILE loop and control the number of iterations using a COUNT of records.
DECLARE #Chunk INT = 10000
DECLARE #Date DATE = '2017-06-01'
DECLARE #Cnt INT = SELECT COUNT(*) FROM Table 1_ WHERE CreationDate_ < #Date
WHILE #Cnt > 0
BEGIN
delete top (#Chunk)
from Table 1_
where CreationDate_ < #Date
SET #Cnt = #Cnt - #Chunk
END
//Move on to the next group
SET #Date = '2017-07-01'
SET #Cnt = SELECT COUNT(*) FROM Table 1_ WHERE CreationDate_ < #Date
WHILE #Cnt > 0
BEGIN
// Your delete query
SET #Cnt = #Cnt - #Chunk
END
//and so on
You don't need to shrink the log file repeatedly. If the database is in simple recovery mode the log file will not continue to grow, so long as you don't delete too many rows in a single transaction. Once all the rows have been purged you can shrink the log file down to something reasonable for the environment.

Inserting large number of records without locking the table

I am trying to insert 1,500,000 records into a table. Am facing table lock issues during the insertion. So I came up with the below batch insert.
DECLARE #BatchSize INT = 50000
WHILE 1 = 1
BEGIN
INSERT INTO [dbo].[Destination]
(proj_details_sid,
period_sid,
sales,
units)
SELECT TOP(#BatchSize) s.proj_details_sid,
s.period_sid,
s.sales,
s.units
FROM [dbo].[SOURCE] s
WHERE NOT EXISTS (SELECT 1
FROM dbo.Destination d
WHERE d.proj_details_sid = s.proj_details_sid
AND d.period_sid = s.period_sid)
IF ##ROWCOUNT < #BatchSize
BREAK
END
I have a clustered Index on Destination table (proj_details_sid ,period_sid ). NOT EXISTS part is just to restrict inserted records from again inserting into the table
Am I doing it right, will this avoid table lock ? or is there any better way.
Note : Time taken is more or less same with batch and without batch insert
Lock escalation is not likely to be related to the SELECT part of your statement at all.
It is a natural consequence of inserting a large number of rows
Lock escalation is triggered when lock escalation is not disabled on the table by using the ALTER TABLE SET LOCK_ESCALATION option, and when either of the following conditions exists:
A single Transact-SQL statement acquires at least 5,000 locks on a single nonpartitioned table or index.
A single Transact-SQL statement acquires at least 5,000 locks on a single partition of a partitioned table and the ALTER TABLE SET LOCK_ESCALATION option is set to AUTO.
The number of locks in an instance of the Database Engine exceeds memory or configuration thresholds.
If locks cannot be escalated because of lock conflicts, the Database Engine periodically triggers lock escalation at every 1,250 new locks acquired.
You can easily see this for yourself by tracing the lock escalation event in Profiler or simply trying the below with different batch sizes. For me TOP (6228) shows 6250 locks held but TOP (6229) it suddenly plummets to 1 as lock escalation kicks in. The exact numbers may vary (dependant on database settings and resources currently available). Use trial and error to find the threshold where lock escalation appears for you.
CREATE TABLE [dbo].[Destination]
(
proj_details_sid INT,
period_sid INT,
sales INT,
units INT
)
BEGIN TRAN --So locks are held for us to count in the next statement
INSERT INTO [dbo].[Destination]
SELECT TOP (6229) 1,
1,
1,
1
FROM master..spt_values v1,
master..spt_values v2
SELECT COUNT(*)
FROM sys.dm_tran_locks
WHERE request_session_id = ##SPID;
COMMIT
DROP TABLE [dbo].[Destination]
You are inserting 50,000 rows so almost certainly lock escalation will be attempted.
The article How to resolve blocking problems that are caused by lock escalation in SQL Server is quite old but a lot of the suggestions are still valid.
Break up large batch operations into several smaller operations (i.e. use a smaller batch size)
Lock escalation cannot occur if a different SPID is currently holding an incompatible table lock - The example they give is a different session executing
BEGIN TRAN
SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1=0
WAITFOR DELAY '1:00:00'
COMMIT TRAN
Disable lock escalation by enabling trace flag 1211 - However this is a global setting and can cause severe issues. There is a newer option 1224 that is less problematic but this is still global.
Another option would be to ALTER TABLE blah SET (LOCK_ESCALATION = DISABLE) but this is still not very targeted as it affects all queries against the table not just your single scenario here.
So I would opt for option 1 or possibly option 2 and discount the others.
Instead of checking the data exists in Destination, it seems better to store all data in temp table first, and batch insert into Destination
Reference: Using ROWLOCK in an INSERT statement (SQL Server)
DECLARE #batch int = 100
DECLARE #curRecord int = 1
DECLARE #maxRecord int
-- remove (nolock) if you don't want to have dirty read
SELECT row_number over (order by s.proj_details_sid, s.period_sid) as rownum,
s.proj_details_sid,
s.period_sid,
s.sales,
s.units
INTO #Temp
FROM [dbo].[SOURCE] s WITH (NOLOCK)
WHERE NOT EXISTS (SELECT 1
FROM dbo.Destination d WITH (NOLOCK)
WHERE d.proj_details_sid = s.proj_details_sid
AND d.period_sid = s.period_sid)
-- change this maxRecord if you want to limit the records to insert
SELECT #maxRecord = count(1) from #Temp
WHILE #maxRecord >= #curRecord
BEGIN
INSERT INTO [dbo].[Destination]
(proj_details_sid,
period_sid,
sales,
units)
SELECT proj_details_sid, period_sid, sales, units
FROM #Temp
WHERE rownum >= #curRecord and rownum < #curRecord + #batch
SET #curRecord = #curRecord + #batch
END
DROP TABLE #Temp
I added (NOLOCK) your destination table -> dbo.Destination(NOLOCK).
Now, You won't lock your table.
WHILE 1 = 1
BEGIN
INSERT INTO [dbo].[Destination]
(proj_details_sid,
period_sid,
sales,
units)
SELECT TOP(#BatchSize) s.proj_details_sid,
s.period_sid,
s.sales,
s.units
FROM [dbo].[SOURCE] s
WHERE NOT EXISTS (SELECT 1
FROM dbo.Destination(NOLOCK) d
WHERE d.proj_details_sid = s.proj_details_sid
AND d.period_sid = s.period_sid)
IF ##ROWCOUNT < #BatchSize
BREAK
END
To do this you can use WITH (NOLOCK) in your select statement.
BUT NOLOCK is not recommended on OLTP Databases.

How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table

Let's say we have table Sales with 30 columns and 500,000 rows. I would like to delete 400,000 in the table (those where "toDelete='1'").
But I have a few constraints :
the table is read / written "often" and I would not like a long "delete" to take a long time and lock the table for too long
I need to skip the transaction log (like with a TRUNCATE) but while doing a "DELETE ... WHERE..." (I need to put a condition), but haven't found any way to do this...
Any advice would be welcome to transform a
DELETE FROM Sales WHERE toDelete='1'
to something more partitioned & possibly transaction log free.
Calling DELETE FROM TableName will do the entire delete in one large transaction. This is expensive.
Here is another option which will delete rows in batches :
deleteMore:
DELETE TOP(10000) Sales WHERE toDelete='1'
IF ##ROWCOUNT != 0
goto deleteMore
I'll leave my answer here, since I was able to test different approaches for mass delete and update (I had to update and then delete 125+mio rows, server has 16GB of RAM, Xeon E5-2680 #2.7GHz, SQL Server 2012).
TL;DR: always update/delete by primary key, never by any other condition. If you can't use PK directly, create a temp table and fill it with PK values and update/delete your table using that table. Use indexes for this.
I started with solution from above (by #Kevin Aenmey), but this approach turned out to be inappropriate, since my database was live and it handles a couple of hundred transactions per second and there was some blocking involved (there was an index for all there fields from condition, using WITH(ROWLOCK) didn't change anything).
So, I added a WAITFOR statement, which allowed database to process other transactions.
deleteMore:
WAITFOR DELAY '00:00:01'
DELETE TOP(1000) FROM MyTable WHERE Column1 = #Criteria1 AND Column2 = #Criteria2 AND Column3 = #Criteria3
IF ##ROWCOUNT != 0
goto deleteMore
This approach was able to process ~1.6mio rows/hour for updating and ~0,2mio rows/hour for deleting.
Turning to temp tables changed things quite a lot.
deleteMore:
SELECT TOP 10000 Id /* Id is the PK */
INTO #Temp
FROM MyTable WHERE Column1 = #Criteria1 AND Column2 = #Criteria2 AND Column3 = #Criteria3
DELETE MT
FROM MyTable MT
JOIN #Temp T ON T.Id = MT.Id
/* you can use IN operator, it doesn't change anything
DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp)
*/
IF ##ROWCOUNT > 0 BEGIN
DROP TABLE #Temp
WAITFOR DELAY '00:00:01'
goto deleteMore
END ELSE BEGIN
DROP TABLE #Temp
PRINT 'This is the end, my friend'
END
This solution processed ~25mio rows/hour for updating (15x faster) and ~2.2mio rows/hour for deleting (11x faster).
What you want is batch processing.
While (select Count(*) from sales where toDelete =1) >0
BEGIN
Delete from sales where SalesID in
(select top 1000 salesId from sales where toDelete = 1)
END
Of course you can experiment which is the best value to use for the batch, I've used from 500 - 50000 depending on the table. If you use cascade delete, you will probably need a smaller number as you have those child records to delete.
One way I have had to do this in the past is to have a stored procedure or script that deletes n records. Repeat until done.
DELETE TOP 1000 FROM Sales WHERE toDelete='1'
You should try to give it a ROWLOCK hint so it will not lock the entire table. However, if you delete a lot of rows lock escalation will occur.
Also, make sure you have a non-clustered filtered index (only for 1 values) on the toDelete column. If possible make it a bit column, not varchar (or what it is now).
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
Ultimately, you can try to iterate over the table and delete in chunks.
Updated
Since while loops and chunk deletes are the new pink here, I'll throw in my version too (combined with my previous answer):
SET ROWCOUNT 100
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
WHILE ##rowcount > 0
BEGIN
SET ROWCOUNT 100
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
END
My own take on this functionality would be as follows.
This way there is no repeated code and you can manage your chunk size.
DECLARE #DeleteChunk INT = 10000
DECLARE #rowcount INT = 1
WHILE #rowcount > 0
BEGIN
DELETE TOP (#DeleteChunk) FROM Sales WITH(ROWLOCK)
SELECT #rowcount = ##RowCount
END
I have used the below to delete around 50 million records -
BEGIN TRANSACTION
DeleteOperation:
DELETE TOP (BatchSize)
FROM [database_name].[database_schema].[database_table]
IF ##ROWCOUNT > 0
GOTO DeleteOperation
COMMIT TRANSACTION
Please note that keeping the BatchSize < 5000 is less expensive on resources.
As I assume the best way to delete huge amount of records is to delete it by Primary Key. (What is Primary Key see here)
So you have to generate tsql script that contains the whole list of lines to delete and after this execute this script.
For example code below is gonna generate that file
GO
SET NOCOUNT ON
SELECT 'DELETE FROM DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';' + CHAR(13) + CHAR(10) + 'GO'
FROM DATA_ACTION
WHERE YEAR(AtTime) = 2014
The ouput file is gonna have records like
DELETE FROM DATA_ACTION WHERE ID = 123;
GO
DELETE FROM DATA_ACTION WHERE ID = 124;
GO
DELETE FROM DATA_ACTION WHERE ID = 125;
GO
And now you have to use SQLCMD utility in order to execute this script.
sqlcmd -S [Instance Name] -E -d [Database] -i [Script]
You can find this approach explaned here https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/
Here's how I do it when I know approximately how many iterations:
delete from Activities with(rowlock) where Id in (select top 999 Id from Activities
(nolock) where description like 'financial data update date%' and len(description) = 87
and User_Id = 2);
waitfor delay '00:00:02'
GO 20
Edit: This worked better and faster for me than selecting top:
declare #counter int = 1
declare #msg varchar(max)
declare #batch int = 499
while ( #counter <= 37600)
begin
set #msg = ('Iteration count = ' + convert(varchar,#counter))
raiserror(#msg,0,1) with nowait
delete Activities with (rowlock) where Id in (select Id from Activities (nolock) where description like 'financial data update date%' and len(description) = 87 and User_Id = 2 order by Id asc offset 1 ROWS fetch next #batch rows only)
set #counter = #counter + 1
waitfor delay '00:00:02'
end
Declare #counter INT
Set #counter = 10 -- (you can always obtain the number of rows to be deleted and set the counter to that value)
While #Counter > 0
Begin
Delete TOP (4000) from <Tablename> where ID in (Select ID from <sametablename> with (NOLOCK) where DateField < '2021-01-04') -- or opt for GetDate() -1
Set #Counter = #Counter -1 -- or set #counter = #counter - 4000 if you know number of rows to be deleted.
End

Copy one column to another for over a billion rows in SQL Server database

Database : SQL Server 2005
Problem : Copy values from one column to another column in the same table with a billion+
rows.
test_table (int id, bigint bigid)
Things tried 1: update query
update test_table set bigid = id
fills up the transaction log and rolls back due to lack of transaction log space.
Tried 2 - a procedure on following lines
set nocount on
set rowcount = 500000
while #rowcount > 0
begin
update test_table set bigid = id where bigid is null
set #rowcount = ##rowcount
set #rowupdated = #rowsupdated + #rowcount
end
print #rowsupdated
The above procedure starts slowing down as it proceeds.
Tried 3 - Creating a cursor for update.
generally discouraged in SQL Server documentation and this approach updates one row at a time which is too time consuming.
Is there an approach that can speed up the copying of values from one column to another. Basically I am looking for some 'magic' keyword or logic that will allow the update query to rip through the billion rows half a million at a time sequentially.
Any hints, pointers will be much appreciated.
I'm going to guess that you are closing in on the 2.1billion limit of an INT datatype on an artificial key for a column. Yes, that's a pain. Much easier to fix before the fact than after you've actually hit that limit and production is shut down while you are trying to fix it :)
Anyway, several of the ideas here will work. Let's talk about speed, efficiency, indexes, and log size, though.
Log Growth
The log blew up originally because it was trying to commit all 2b rows at once. The suggestions in other posts for "chunking it up" will work, but that may not totally resolve the log issue.
If the database is in SIMPLE mode, you'll be fine (the log will re-use itself after each batch). If the database is in FULL or BULK_LOGGED recovery mode, you'll have to run log backups frequently during the running of your operation so that SQL can re-use the log space. This might mean increasing the frequency of the backups during this time, or just monitoring the log usage while running.
Indexes and Speed
ALL of the where bigid is null answers will slow down as the table is populated, because there is (presumably) no index on the new BIGID field. You could, (of course) just add an index on BIGID, but I'm not convinced that is the right answer.
The key (pun intended) is my assumption that the original ID field is probably the primary key, or the clustered index, or both. In that case, lets take advantage of that fact, and do a variation of Jess' idea:
set #counter = 1
while #counter < 2000000000 --or whatever
begin
update test_table set bigid = id
where id between #counter and (#counter + 499999) --BETWEEN is inclusive
set #counter = #counter + 500000
end
This should be extremely fast, because of the existing indexes on ID.
The ISNULL check really wasn't necessary anyway, neither is my (-1) on the interval. If we duplicate some rows between calls, that's not a big deal.
Use TOP in the UPDATE statement:
UPDATE TOP (#row_limit) dbo.test_table
SET bigid = id
WHERE bigid IS NULL
You could try to use something like SET ROWCOUNT and do batch updates:
SET ROWCOUNT 5000;
UPDATE dbo.test_table
SET bigid = id
WHERE bigid IS NULL
GO
and then repeat this as many times as you need to.
This way, you're avoiding the RBAR (row-by-agonizing-row) symptoms of cursors and while loops, and yet, you don't unnecessarily fill up your transaction log.
Of course, in between runs, you'd have to do backups (especially of your log) to keep its size within reasonable limits.
Is this a one time thing? If so, just do it by ranges:
set counter = 500000
while #counter < 2000000000 --or whatever your max id
begin
update test_table set bigid = id where id between (#counter - 500000) and #counter and bigid is null
set counter = #counter + 500000
end
I didn't run this to try it, but if you can get it to update 500k at a time I think you're moving in the right direction.
set rowcount 500000
update test_table tt1
set bigid = (SELECT tt2.id FROM test_table tt2 WHERE tt1.id = tt2.id)
where bigid IS NULL
You can also try changing the recover model so you don't log the transactions
ALTER DATABASE db1
SET RECOVERY SIMPLE
GO
update test_table
set bigid = id
GO
ALTER DATABASE db1
SET RECOVERY FULL
GO
First step, if there are any, would be to drop indexes before the operation. This is probably what is causing the speed degrade with time.
The other option, a little outside the box thinking...can you express the update in such a way that you could materialize the column values in a select? If you can do this then you could create what amounts to a NEW table using SELECT INTO which is a minimally logged operation (assuming in 2005 that you are set to a recovery model of SIMPLE or BULK LOGGED). This would be pretty fast and then you can drop the old table, rename this table to to old table name and recreate any indexes.
select id, CAST(id as bigint) bigid into test_table_temp from test_table
drop table test_table
exec sp_rename 'test_table_temp', 'test_table'
I second the
UPDATE TOP(X) statement
Also to suggest, if you're in a loop, add in some WAITFOR delay or COMMIT between, to allow other processes some time to use the table if needed vs. blocking forever until all the updates are completed

Batch commit on large INSERT operation in native SQL?

I have a couple large tables (188m and 144m rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The keys on each table are over 50 composite bytes of columns. If the data was in tables, I could always think about using sp_rename to make the other new table, but that isn't really an option.
If I do a single INSERT operation, the process uses a huge amount of transaction log space, typicalyl filing it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect)
I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).
Any things other than to divide the process up into multiple INSERT operations using some kind of key to distribute the rows into different batches and doing a loop?
Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:
SELECT key_columns INTO dbo.temp FROM dbo.HugeView;
(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:
SET NOCOUNT ON;
DECLARE
#batchsize INT,
#ctr INT,
#rc INT;
SELECT
#batchsize = 10000,
#ctr = 0;
WHILE 1 = 1
BEGIN
WITH x AS
(
SELECT key_column, rn = ROW_NUMBER() OVER (ORDER BY key_column)
FROM dbo.temp
)
INSERT dbo.PrimaryTable(a, b, c, etc.)
SELECT v.a, v.b, v.c, etc.
FROM x
INNER JOIN dbo.HugeView AS v
ON v.key_column = x.key_column
WHERE x.rn > #batchsize * #ctr
AND x.rn <= #batchsize * (#ctr + 1);
IF ##ROWCOUNT = 0
BREAK;
BACKUP LOG PrimaryDB TO DISK = 'C:\db.bak' WITH INIT;
SET #ctr = #ctr + 1;
END
That's all off the top of my head, so don't cut/paste/run, but I think the general idea is there. For more details (and why I backup log / checkpoint inside the loop), see this post on sqlperformance.com:
Break large delete operations into chunks
Note that if you are taking regular database and log backups you will probably want to take a full to start your log chain over again.
You could partition your data and insert your data in a cursor loop. That would be nearly the same as SSIS batchinserting. But runs on your server.
create cursor ....
select YEAR(DateCol), MONTH(DateCol) from whatever
while ....
insert into yourtable(...)
select * from whatever
where YEAR(DateCol) = year and MONTH(DateCol) = month
end
I know this is an old thread, but I made a generic version of Arthur's cursor solution:
--Split a batch up into chunks using a cursor.
--This method can be used for most any large table with some modifications
--It could also be refined further with an #Day variable (for example)
DECLARE #Year INT
DECLARE #Month INT
DECLARE BatchingCursor CURSOR FOR
SELECT DISTINCT YEAR(<SomeDateField>),MONTH(<SomeDateField>)
FROM <Sometable>;
OPEN BatchingCursor;
FETCH NEXT FROM BatchingCursor INTO #Year, #Month;
WHILE ##FETCH_STATUS = 0
BEGIN
--All logic goes in here
--Any select statements from <Sometable> need to be suffixed with:
--WHERE Year(<SomeDateField>)=#Year AND Month(<SomeDateField>)=#Month
FETCH NEXT FROM BatchingCursor INTO #Year, #Month;
END;
CLOSE BatchingCursor;
DEALLOCATE BatchingCursor;
GO
This solved the problem on loads of our large tables.
There is no pixie dust, you know that.
Without knowing specifics about the actual schema being transfered, a generic solution would be exactly as you describe it: divide processing into multiple inserts and keep track of the key(s). This is sort of pseudo-code T-SQL:
create table currentKeys (table sysname not null primary key, key sql_variant not null);
go
declare #keysInserted table (key sql_variant);
declare #key sql_variant;
begin transaction
do while (1=1)
begin
select #key = key from currentKeys where table = '<target>';
insert into <target> (...)
output inserted.key into #keysInserted (key)
select top (<batchsize>) ... from <source>
where key > #key
order by key;
if (0 = ##rowcount)
break;
update currentKeys
set key = (select max(key) from #keysInserted)
where table = '<target>';
commit;
delete from #keysInserted;
set #key = null;
begin transaction;
end
commit
It would get more complicated if you want to allow for parallel batches and partition the keys.
You could use the BCP command to load the data and use the Batch Size parameter
http://msdn.microsoft.com/en-us/library/ms162802.aspx
Two step process
BCP OUT data from Views into Text files
BCP IN data from Text files into Tables with batch size parameter
This looks like a job for good ol' BCP.