Batch commit on large INSERT operation in native SQL?

I have a couple of large tables (188M and 144M rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The composite keys on each table span over 50 bytes of columns. If the data were in tables, I could always think about using sp_rename to swap in a newly built table, but that isn't really an option.
If I do a single INSERT operation, the process uses a huge amount of transaction log space, typically filling it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect.)
I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).
Is there any approach other than dividing the process into multiple INSERT operations, using some kind of key to distribute the rows into different batches, and looping?

Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:
SELECT key_columns INTO dbo.temp FROM dbo.HugeView;
(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
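For instance, a minimal sketch, with StagingDB as a hypothetical name for that scratch database:
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;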
Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:
SET NOCOUNT ON;

DECLARE
    @batchsize INT,
    @ctr INT,
    @rc INT;

SELECT
    @batchsize = 10000,
    @ctr = 0;

WHILE 1 = 1
BEGIN
    WITH x AS
    (
        SELECT key_column, rn = ROW_NUMBER() OVER (ORDER BY key_column)
        FROM dbo.temp
    )
    INSERT dbo.PrimaryTable(a, b, c, etc.)
        SELECT v.a, v.b, v.c, etc.
        FROM x
        INNER JOIN dbo.HugeView AS v
            ON v.key_column = x.key_column
        WHERE x.rn > @batchsize * @ctr
            AND x.rn <= @batchsize * (@ctr + 1);

    IF @@ROWCOUNT = 0
        BREAK;

    BACKUP LOG PrimaryDB TO DISK = 'C:\db.bak' WITH INIT;

    SET @ctr = @ctr + 1;
END
That's all off the top of my head, so don't cut/paste/run it, but I think the general idea is there. For more details (and why I back up the log / checkpoint inside the loop), see this post on sqlperformance.com:
Break large delete operations into chunks
Note that if you are taking regular database and log backups, you will probably want to take a full backup to start your log chain over again.
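For example (database name and path as placeholders):
BACKUP DATABASE PrimaryDB TO DISK = 'C:\PrimaryDB_full.bak' WITH INIT;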

You could partition your data and insert it in a cursor loop. That would be nearly the same as SSIS batch inserting, but it runs on your server.
DECLARE @year INT, @month INT;
DECLARE c CURSOR FOR
    SELECT DISTINCT YEAR(DateCol), MONTH(DateCol) FROM whatever;
OPEN c;
FETCH NEXT FROM c INTO @year, @month;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO yourtable(...)
    SELECT * FROM whatever
    WHERE YEAR(DateCol) = @year AND MONTH(DateCol) = @month;
    FETCH NEXT FROM c INTO @year, @month;
END
CLOSE c; DEALLOCATE c;

I know this is an old thread, but I made a generic version of Arthur's cursor solution:
--Split a batch up into chunks using a cursor.
--This method can be used for most any large table with some modifications.
--It could also be refined further with an @Day variable (for example).

DECLARE @Year INT
DECLARE @Month INT

DECLARE BatchingCursor CURSOR FOR
SELECT DISTINCT YEAR(<SomeDateField>), MONTH(<SomeDateField>)
FROM <SomeTable>;

OPEN BatchingCursor;
FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
WHILE @@FETCH_STATUS = 0
BEGIN

    --All logic goes in here
    --Any select statements from <SomeTable> need to be suffixed with:
    --WHERE YEAR(<SomeDateField>) = @Year AND MONTH(<SomeDateField>) = @Month

    FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
END;
CLOSE BatchingCursor;
DEALLOCATE BatchingCursor;
GO
This solved the problem on loads of our large tables.

There is no pixie dust, you know that.
Without knowing specifics about the actual schema being transferred, a generic solution would be exactly as you describe it: divide the processing into multiple inserts and keep track of the key(s). This is sort-of-pseudo-code T-SQL:
create table currentKeys ([table] sysname not null primary key, [key] sql_variant not null);
go

declare @keysInserted table ([key] sql_variant);
declare @key sql_variant;
begin transaction;
while (1 = 1)
begin
    select @key = [key] from currentKeys where [table] = '<target>';
    insert into <target> (...)
        output inserted.[key] into @keysInserted ([key])
        select top (<batchsize>) ... from <source>
        where [key] > @key
        order by [key];
    if (0 = @@rowcount)
        break;
    update currentKeys
        set [key] = (select max([key]) from @keysInserted)
        where [table] = '<target>';
    commit;
    delete from @keysInserted;
    set @key = null;
    begin transaction;
end
commit;
It would get more complicated if you want to allow for parallel batches and partition the keys.

You could use the BCP command to load the data, with its batch size parameter:
http://msdn.microsoft.com/en-us/library/ms162802.aspx
Two-step process:
BCP OUT data from the views into text files
BCP IN data from the text files into the tables, with the batch size parameter
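A rough sketch of the two commands (server, database, and file names are hypothetical; -T uses a trusted connection, -c uses character format, and -b sets the commit batch size):
bcp "SELECT * FROM MyDB.dbo.HugeView" queryout huge.dat -S MyServer -T -c
bcp MyDB.dbo.PrimaryTable in huge.dat -S MyServer -T -c -b 10000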

This looks like a job for good ol' BCP.

Related

SQL Server: Merge in iterations

I have to merge millions of rows into a table. The target table has an AFTER UPDATE trigger. The whole process is consuming a lot more memory than I'd like to allocate, and tempdb is eating up disk space.
I'd like to have the MERGE command run in batches of 100,000 records at a time. With SET ROWCOUNT being deprecated and cursors being inefficient, I'm not sure what the best approach for this would be.
A set-oriented approach would be the best way to run the query efficiently, so what you are doing seems fine to me. If it is consuming tempdb and so on, we would need to know whether you are doing any row-by-row operations that slow things down. Generally a MERGE is a single statement and is therefore efficient.
Another thing to consider: if you have indexes on the destination table, you could drop them, run the MERGE, and then recreate the indexes.
Coming to your question, you can split the work into batches by using the modulo operator and running the MERGE in a loop, e.g.:
declare @i int = 10; -- number of modulo buckets
while @i > 0
begin
    merge
    into dest d
    using (select *
           from source
           where id % 10 = @i - 1 -- here id is the primary key of the source table
          ) s
    on d.id = s.id
    when matched then
        update set ...
    when not matched then
        insert ...;
    -- rest of the insert/update logic goes here
    set @i = @i - 1;
end
Try a WHILE loop:
DECLARE @I INT = 1;
WHILE (@I > 0)
BEGIN
    MERGE INTO Dst USING (
        SELECT TOP (1000) *
        FROM Src
        WHERE NotUpdated -- however you flag rows not yet processed
    )
    ...
    SET @I = @@ROWCOUNT;
END

How to force a running t-sql query (half done) to commit?

I have a database on SQL Server 2008 R2.
On that database, a delete query on 400 million records has been running for 4 days, but I need to reboot the machine. How can I force it to commit whatever has been deleted so far? I want to keep the deletions the query has made up to this point.
The problem is that the query is still running and will not complete before the server reboots.
Note: I have not set any isolation level or explicit BEGIN/COMMIT TRANSACTION for the query. The query is running in SSMS.
If the machine reboots or I cancel the query, the database will go into recovery mode and keep recovering for the next 2 days; then I would need to re-run the delete, which would cost me another 4 days.
I really appreciate any suggestion, help, or guidance on this. I am a novice SQL Server user.
Thanks in advance.
There is no way to stop SQL Server from trying to bring the database into a transactionally consistent state. Every single statement is implicitly a transaction itself (if not part of an outer transaction) and executes either completely or not at all. So if you cancel the query, disconnect, or reboot the server, SQL Server will use the transaction log to write the original values back to the updated data pages.
Next time you delete so many rows, don't do it all at once. Divide the job into smaller chunks (I always use 5,000 as a magic number, meaning I delete 5,000 rows at a time in a loop) to minimize transaction log use and locking.
set rowcount 5000
delete from YourTable
while @@rowcount = 5000
    delete from YourTable
set rowcount 0
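Since SET ROWCOUNT is deprecated for data modification statements, here is a sketch of the same chunked pattern using DELETE TOP (table name hypothetical):
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM dbo.BigTable; -- add your WHERE clause here if you keep some rows
    IF @@ROWCOUNT < 5000 BREAK;
END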
If you are deleting that many rows you may have a better time with TRUNCATE, which removes all rows from the table very efficiently. However, I'm assuming that you would like to keep some of the records. The stored procedure below backs up the data you would like to keep into a temp table, truncates the original table, and then re-inserts the saved records. This can clean a huge table very quickly.
Note that TRUNCATE doesn't play well with foreign key constraints, so you may need to drop those and then recreate them after cleaning.
CREATE PROCEDURE [dbo].[deleteTableFast] (
    @TableName VARCHAR(100),
    @WhereClause VARCHAR(1000))
AS
BEGIN
    -- input:
    --   table name:   the table to use
    --   where clause: the where clause of the records to KEEP
    declare @tempTableName varchar(100);
    set @tempTableName = @TableName + '_temp_to_truncate';

    -- error checking
    if exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = (@tempTableName))
    begin
        print 'ERROR: already temp table ... exiting'
        return
    end
    if not exists (SELECT [Table_Name] FROM Information_Schema.COLUMNS WHERE [TABLE_NAME] = (@TableName))
    begin
        print 'ERROR: table does not exist ... exiting'
        return
    end

    -- save wanted records via a temp table to be able to truncate
    exec ('select * into ' + @tempTableName + ' from ' + @TableName + ' WHERE ' + @WhereClause);
    exec ('truncate table ' + @TableName);
    exec ('insert into ' + @TableName + ' select * from ' + @tempTableName);
    exec ('drop table ' + @tempTableName);
end
GO
You need to understand the D (Durability) in ACID before you can understand why the database goes into recovery mode.
Generally speaking, you should avoid long-running SQL if possible. Long-running SQL means more lock time on resources, a larger transaction log, and a huge rollback time when it fails.
Consider dividing your task by some ID or time range. For example, if you want to insert a large volume of data from TableSrc into TableTarget, you can write a query like:
DECLARE @BatchCount INT = 1000;
DECLARE @Id INT = 0;
DECLARE @Max INT = ...;
WHILE @Id < @Max
BEGIN
    INSERT INTO TableTarget
    SELECT * FROM TableSrc
    WHERE PrimaryKey >= @Id AND PrimaryKey < @Id + @BatchCount;
    SET @Id = @Id + @BatchCount;
END
It's uglier, more code, and more error prone, but it's the only way I know to deal with huge data volumes.

How to copy a huge table data into another table in SQL Server

I have a table with 3.4 million rows. I want to copy this whole data into another table.
I am performing this task using the query below:
select *
into new_items
from productDB.dbo.items
I need to know the best possible way to do this task.
I had the same problem, except I have a table with 2 billion rows, so the log file would grow to no end if I did this, even with the recovery model set to Bulk-Logged:
insert into newtable select * from oldtable
So I operate on blocks of data. This way, if the transfer is interrupted, you just restart it. Also, you don't need a log file as big as the table. You also seem to get less tempdb I/O; I'm not sure why.
set identity_insert newtable on

DECLARE @StartID bigint, @LastID bigint, @EndID bigint

select @StartID = isNull(max(id), 0) + 1
from newtable

select @LastID = max(ID)
from oldtable

while @StartID <= @LastID
begin
    set @EndID = @StartID + 1000000

    insert into newtable (FIELDS, GO, HERE)
    select FIELDS, GO, HERE from oldtable with (NOLOCK)
    where id between @StartID and @EndID

    set @StartID = @EndID + 1
end

set identity_insert newtable off
go
You might need to change how you deal with IDs, this works best if your table is clustered by ID.
If you are copying into a new table, the quickest way is probably what you have in your question, unless your rows are very large.
If your rows are very large, you may want to use the bulk insert functions in SQL Server (for example, the SqlBulkCopy class, which you can call from C#).
Or you can first download that data into a text file, then bulk-copy (bcp) it. This has the additional benefit of allowing you to ignore keys, indexes etc.
Also try the Import/Export utility that comes with SQL Server Management Studio; I'm not sure whether it will be as fast as a straight bulk copy, but it should allow you to skip the intermediate step of writing out a flat file and just copy directly table-to-table, which might be a bit faster than your SELECT INTO statement.
I have been working with our DBA to copy an audit table with 240M rows to another database.
Using a simple select/insert created a huge tempdb file.
Using the Import/Export wizard worked, but it copied 8M rows in 10 minutes.
Creating a custom SSIS package and adjusting its settings copied 30M rows in 10 minutes.
The SSIS package turned out to be the fastest and most efficient for our purposes.
Earl
Here's another way of transferring large tables. I've just transferred 105 million rows between two servers using this. Quite quick too.
Right-click on the database and choose Tasks/Export Data.
A wizard will take you through the steps; choosing your SQL Server client as the data source and target will allow you to select the database and table(s) you wish to transfer.
For more information, see https://www.mssqltips.com/sqlservertutorial/202/simple-way-to-export-data-from-sql-server/
If it's a one-time import, the Import/Export utility in SSMS will probably be the easiest and fastest option. SSIS also seems to work better for importing large data sets than a straight INSERT.
BULK INSERT or BCP can also be used to import large record sets.
Another option would be to temporarily remove all indexes and constraints on the table you're importing into and add them back once the import process completes. A straight INSERT that previously failed might work in those cases.
If you're dealing with timeouts or locking/blocking issues when going directly from one database to another, you might consider going from one db into TEMPDB and then going from TEMPDB into the other database as it minimizes the effects of locking and blocking processes on either side. TempDB won't block or lock the source and it won't hold up the destination.
Those are a few options to try.
-Eric Isaacs
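A minimal sketch of that tempdb staging idea (all object names hypothetical):
SELECT * INTO tempdb.dbo.ItemsStage FROM SourceDB.dbo.Items;        -- stage without holding locks on the target
INSERT INTO TargetDB.dbo.Items SELECT * FROM tempdb.dbo.ItemsStage; -- load the target from the stage
DROP TABLE tempdb.dbo.ItemsStage;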
Simple insert/select stored procedures work great until the row count exceeds 1 million. I've watched a tempdb file explode trying to insert/select 20+ million rows. The simplest solution is SSIS, setting the batch row size buffer to 5,000 and the commit size buffer to 1,000.
I know this is late, but if you are encountering semaphore timeouts then you can use ROW_NUMBER to set increments for your insert(s), using something like:
INSERT INTO DestinationTable (column1, column2, etc)
SELECT column1, column2, etc
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, column1, column2, etc
    FROM SourceTable
) AS A
WHERE A.RN >= 1 AND A.RN <= 10000;
The size of the log file will grow, so there is that to contend with. You get better performance if you disable constraints and indexes when inserting into an existing table, then re-enable the constraints and rebuild the indexes once the insertion is complete.
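A sketch of the disable/rebuild pattern (index and table names hypothetical):
ALTER TABLE dbo.DestTable NOCHECK CONSTRAINT ALL;        -- stop checking FK/check constraints during the load
ALTER INDEX IX_DestTable_Cover ON dbo.DestTable DISABLE; -- nonclustered only; disabling the clustered index makes the table unreadable

-- ... run the batched inserts ...

ALTER INDEX IX_DestTable_Cover ON dbo.DestTable REBUILD;
ALTER TABLE dbo.DestTable WITH CHECK CHECK CONSTRAINT ALL; -- revalidate so the constraints stay trusted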
I like the solution from @Mathieu Longtin of copying in batches, thereby minimising log file issues, and created a version with OFFSET FETCH as suggested by @CervEd.
Others have suggested using the Import/Export Wizard or SSIS packages, but that's not always possible.
It's probably overkill for many but my solution includes some checks for record counts and outputs progress as well.
USE [MyDB]
GO

SET NOCOUNT ON;

DECLARE @intStart INT = 1;
DECLARE @intCount INT;
DECLARE @intFetch INT = 10000;
DECLARE @strStatus VARCHAR(200);
DECLARE @intCopied INT = 0;

SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Getting count of HISTORY records currently in MyTable...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
SELECT @intCount = COUNT(*) FROM [dbo].MyTable WHERE IsHistory = 1;
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Count of HISTORY records currently in MyTable: ' + CONVERT(VARCHAR(20), @intCount);
RAISERROR (@strStatus, 10, 1) WITH NOWAIT; -- (note: PRINT resets @@ROWCOUNT to 0, so using RAISERROR instead)

SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Starting copy...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;

WHILE @intStart < @intCount
BEGIN
    INSERT INTO [dbo].[MyTable_History] (
        [PK1], [PK2], [PK3], [Data1], [Data2])
    SELECT
        [PK1], [PK2], [PK3], [Data1], [Data2]
    FROM [MyDB].[dbo].[MyTable]
    WHERE IsHistory = 1
    ORDER BY
        [PK1], [PK2], [PK3]
    OFFSET @intStart - 1 ROWS
    FETCH NEXT @intFetch ROWS ONLY;

    SET @intCopied = @intCopied + @@ROWCOUNT;
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Records copied so far: ' + CONVERT(VARCHAR(20), @intCopied);
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
    SET @intStart = @intStart + @intFetch;
END

-- Check the record count is correct.
IF @intCopied = @intCount
BEGIN
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Correct record count.';
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
END
ELSE
BEGIN
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Only ' + CONVERT(VARCHAR(20), @intCopied) + ' records were copied, expected: ' + CONVERT(VARCHAR(20), @intCount);
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
END
GO
If your focus is archiving (DW) and you are dealing with a VLDB with 100+ partitioned tables, and you want to isolate most of this resource-intensive work on a non-production server (OLTP), here is a suggestion (OLTP -> DW):
1) Use backup/restore to get the data onto the archive server (so now, on Archive or DW, you will have Stage and Target databases)
2) Stage database: use a partition switch to move the data to the corresponding stage table
3) Use SSIS to transfer the data from the stage database to the target database for each staged table on both sides
4) Target database: use a partition switch on the target database to move the data from stage to base table (see the sketch below)
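For steps 2 and 4, a partition switch is a metadata-only operation and is nearly instant regardless of row count; a minimal sketch with hypothetical table names (both tables must have identical structure on the same filegroup):
ALTER TABLE dbo.Sales_Stage SWITCH PARTITION 3 TO dbo.Sales PARTITION 3;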
Hope this helps.
select * into new_items from productDB.dbo.items
That pretty much is it. This is the most efficient way to do it.

Possible to implement a manual increment with just simple SQL INSERT?

I have a primary key that I don't want to auto increment (for various reasons) and so I'm looking for a way to simply increment that field when I INSERT. By simply, I mean without stored procedures and without triggers, so just a series of SQL commands (preferably one command).
Here is what I have tried thus far:
BEGIN TRAN
INSERT INTO Table1(id, data_field)
VALUES ( (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]');
COMMIT TRAN;
* Data abstracted to use generic names and identifiers
However, when executed, the command errors with:
"Subqueries are not allowed in this context. Only scalar expressions are allowed."
So, how can I do this/what am I doing wrong?
EDIT: Since it was pointed out as a consideration, the table to be inserted into is guaranteed to have at least 1 row already.
You understand that you will have collisions, right?
You need to do something like this, and it might cause deadlocks, so be very sure what you are trying to accomplish here:
DECLARE @id int
BEGIN TRAN

    SELECT @id = MAX(id) + 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
    INSERT INTO Table1(id, data_field)
    VALUES (@id, '[blob of data]')

COMMIT TRAN
To explain the collision thing, I have provided some code. First create this table and insert one row:
CREATE TABLE Table1 (id int PRIMARY KEY NOT NULL, data_field char(100))
GO
INSERT Table1 VALUES (1, '[blob of data]')
GO
Now open up two query windows and run this at the same time:
declare @i int
set @i = 1
while @i < 10000
begin
    BEGIN TRAN
    INSERT INTO Table1 (id, data_field)
    SELECT MAX(id) + 1, '[blob of data]' FROM Table1
    COMMIT TRAN;
    set @i = @i + 1
end
You will see a bunch of these
Server: Msg 2627, Level 14, State 1, Line 7
Violation of PRIMARY KEY constraint 'PK__Table1__3213E83F2962141D'. Cannot insert duplicate key in object 'dbo.Table1'.
The statement has been terminated.
Try this instead:
INSERT INTO Table1 (id, data_field)
SELECT id, '[blob of data]' FROM (SELECT MAX(id) + 1 as id FROM Table1) tbl
I wouldn't recommend doing it that way for any number of reasons though (performance, transaction safety, etc)
It could be because there are no records, so the subquery is returning NULL... try:
INSERT INTO tblTest(RecordID, Text)
VALUES ((SELECT ISNULL(MAX(RecordID), 0) + 1 FROM tblTest), 'asdf')
I don't know if somebody is still looking for an answer but here is a solution that seems to work:
-- Preparation: execute only once
CREATE TABLE Test (Value int)
CREATE TABLE Lock (LockID uniqueidentifier)
INSERT INTO Lock SELECT NEWID()
-- Real insert
BEGIN TRAN LockTran
-- Lock an object to block simultaneous calls.
UPDATE Lock WITH(TABLOCK)
SET LockID = LockID
INSERT INTO Test
SELECT ISNULL(MAX(T.Value), 0) + 1
FROM Test T
COMMIT TRAN LockTran
We have a similar situation where we needed to increment and could not have gaps in the numbers. (If you use an identity value and a transaction is rolled back, that number will not be inserted and you will have gaps because the identity value does not roll back.)
We created a separate table for last number used and seeded it with 0.
Our insert takes a few steps.
DECLARE @number INT;
BEGIN TRAN
    -- increment the number (this locks NumberTable for the rest of the transaction)
    UPDATE dbo.NumberTable
    SET number = number + 1
    -- find out what the incremented number is
    SELECT @number = number
    FROM dbo.NumberTable
    -- use the number
    INSERT INTO dbo.MyTable (...) -- using @number as the key
COMMIT -- or ROLLBACK, which also rolls back the NumberTable update
This causes simultaneous transactions to process in single file, as each concurrent transaction will wait because the NumberTable is locked. As soon as the waiting transaction gets the lock, it increments the current value and locks it from others. That current value is the last number used, and if a transaction is rolled back, the NumberTable update is also rolled back, so there are no gaps.
Hope that helps.
Another way to force single-file execution is to use a SQL Server application lock. We have used that approach for longer-running processes, like synchronizing data between systems, so only one synchronizing process can run at a time.
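A sketch of that approach using sp_getapplock (the lock name is hypothetical):
DECLARE @rc INT;
EXEC @rc = sp_getapplock @Resource = 'SyncJob',
                         @LockMode = 'Exclusive',
                         @LockOwner = 'Session',
                         @LockTimeout = 0; -- fail immediately if another session holds the lock
IF @rc >= 0
BEGIN
    -- ... do the long-running work here ...
    EXEC sp_releaseapplock @Resource = 'SyncJob', @LockOwner = 'Session';
END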
If you're doing it in a trigger, you could make sure it's an "INSTEAD OF" trigger and do it in a couple of statements:
DECLARE @next INT
SET @next = (SELECT MAX(id) + 1 FROM Table1)

INSERT INTO Table1 (id, datablob)
SELECT @next, datablob FROM inserted
The only thing you'd have to be careful about is concurrency: if two rows are inserted at the same time, they could attempt to use the same value for @next, causing a conflict.
Does this accomplish what you want?
It seems very odd to do this sort of thing without an IDENTITY (auto-increment) column, making me question the architecture itself. I mean, seriously, this is the perfect situation for an IDENTITY column. It might help us answer your question if you'd explain the reasoning behind this decision. =)
Having said that, some options are:
using an INSTEAD OF trigger for this purpose. So, you'd do your INSERT (the INSERT statement would not need to pass in an ID). The trigger code would handle inserting the appropriate ID. You'd need to use the WITH (UPDLOCK, HOLDLOCK) syntax used by another answerer to hold the lock for the duration of the trigger (which is implicitly wrapped in a transaction) and to elevate the lock type from a "shared" to an "update" lock (IIRC).
you can use the idea above, but have a table whose purpose is to store the last, max value inserted into the table. So, once the table is set up, you would no longer have to do a SELECT MAX(ID) every time. You'd simply increment the value in the table. This is safe provided that you use appropriate locking (as discussed). Again, that avoids repeated table scans every time you INSERT.
use GUIDs instead of IDs. It's much easier to merge tables across databases, since the GUIDs will always be unique (whereas records across databases will have conflicting integer IDs). To avoid page splitting, sequential GUIDs can be used (see the sketch after this list). This is only beneficial if you might need to do database merging.
Use a stored proc in lieu of the trigger approach (since triggers are to be avoided, for some reason). You'd still have the locking issue (and the performance problems that can arise). But sprocs are preferred over dynamic SQL (in the context of applications), and are often much more performant.
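For the sequential-GUID option, a minimal sketch (NEWSEQUENTIALID() can only be used in a DEFAULT constraint):
CREATE TABLE dbo.Table1 (
    id uniqueidentifier NOT NULL
        CONSTRAINT DF_Table1_id DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Table1 PRIMARY KEY,
    data_field char(100)
);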
Sorry about rambling. Hope that helps.
How about creating a separate table to maintain the counter? It has better performance than MAX(id), as it will be O(1); MAX(id) is at best O(log n), depending on the implementation.
And then when you need to insert, simply lock the counter table to read the counter and increment it. Then you can release the lock and insert into your table with the incremented counter value.
Have a separate table where you keep your latest ID and for every transaction get a new one.
It may be a bit slower but it should work.
DECLARE @NewID INT
BEGIN TRAN
    UPDATE CounterTable SET ID = ID + 1
    SELECT @NewID = ID FROM CounterTable
COMMIT TRAN
PRINT @NewID -- Do what you want with your new ID
Code without any transaction scope (I use it in my engineering course as an exercise):
-- Preparation: execute only once
CREATE TABLE increment (val int);
INSERT INTO increment VALUES (1);

-- Real insert
DECLARE @newIncrement INT;
UPDATE increment
SET @newIncrement = val,
    val = val + 1;

INSERT INTO Table1 (id, data_field)
SELECT @newIncrement, 'some data';
declare @nextId int
begin tran
set @nextId = (select MAX(id) + 1 from Table1)
insert into Table1(id, data_field) values (@nextId, '[blob of data]')
commit;
But perhaps a better approach would be using a scalar function getNextId('table1')
Any critiques of this? Works for me.
DECLARE @m_NewRequestID INT
      , @m_IsError BIT = 1
      , @m_CatchEndless INT = 0;

WHILE @m_IsError = 1
BEGIN TRY
    SELECT @m_NewRequestID = (SELECT ISNULL(MAX(RequestID), 0) + 1 FROM Requests);

    INSERT INTO Requests ( RequestID
                         , RequestName
                         , Customer
                         , Comment
                         , CreatedFromApplication)
    SELECT RequestID              = @m_NewRequestID
         , RequestName            = dbo.ufGetNextAvailableRequestName(PatternName)
         , Customer               = @Customer
         , Comment                = [Description]
         , CreatedFromApplication = @CreatedFromApplication
    FROM RequestPatterns
    WHERE PatternID = @PatternID;

    SET @m_IsError = 0;
END TRY
BEGIN CATCH
    SET @m_IsError = 1;
    SET @m_CatchEndless = @m_CatchEndless + 1;
    IF @m_CatchEndless > 1000
        THROW 51000, '[upCreateRequestFromPattern]: Unable to get new RequestID', 1;
END CATCH
This should work:
INSERT INTO Table1 (id, data_field)
SELECT (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]';
Or this (substitute LIMIT for other platforms), which takes the current highest id plus one:
INSERT INTO Table1 (id, data_field)
SELECT TOP 1
    id + 1, '[blob of data]'
FROM
    Table1
ORDER BY
    [id] DESC;