How to Batch INSERT in SQL Server?

I am trying to batch insert rows from one table to another.
DECLARE @batch INT = 10000;
WHILE @batch > 0
BEGIN
BEGIN TRANSACTION
INSERT into table2
select top (@batch) *
FROM table1
SET @batch = @@ROWCOUNT
COMMIT TRANSACTION
END
It runs on the first 10,000 and inserts them. Then I get the error message "Cannot insert duplicate key", which means it's trying to insert the same primary key, so I assume it's repeating the same batch. What logic am I missing here to loop through the batches? Probably something simple, but I can't figure it out.
Can anyone help? Thanks.

Your code keeps inserting the same rows. You can avoid it by "paginating" your inserts:
DECLARE @batch INT = 10000;
DECLARE @page INT = 0
DECLARE @lastCount INT = 1
WHILE @lastCount > 0
BEGIN
BEGIN TRANSACTION
INSERT into table2
SELECT col1, col2, ... -- list columns explicitly
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY YourPrimaryKey ) AS RowNum, *
FROM table1
) AS RowConstrainedResult
WHERE RowNum >= (@page * @batch) AND RowNum < ((@page+1) * @batch)
SET @lastCount = @@ROWCOUNT
SET @page = @page + 1
COMMIT TRANSACTION
END

You need some way to eliminate existing rows. You seem to have a primary key, so:
INSERT into table2
SELECT TOP (@batch) *
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);
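Dropped into the loop from your question, that would look roughly like this (a sketch; it assumes the primary key column is called id, so substitute your real key column):
DECLARE @batch INT = 10000;
WHILE @batch > 0
BEGIN
BEGIN TRANSACTION
-- copy only rows whose key is not in table2 yet; the loop ends when nothing is left to copy
INSERT INTO table2
SELECT TOP (@batch) *
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
SET @batch = @@ROWCOUNT
COMMIT TRANSACTION
END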

Related

DELETE of old rows causing deadlock

I have two SQL Server tables. Table1 is a live table that stores live events from users; the other table, Table2, stores these events as historical data for archive purposes.
When an INSERT happens on Table1, a trigger INSERTs that data into Table2 and DELETEs old rows from both tables with different time intervals.
When I only INSERT into Table2 and DELETE FROM Table1 everything works fine, but when it gets to the deletion of the old rows on Table2 I get a deadlock.
Can anyone give me some hints on why this can happen and what the cause could be? (I'm a newbie, sorry)
CREATE TRIGGER trigger_events ON dbo.TAble1
FOR INSERT
AS
INSERT INTO dbo.TAble2
(id, event_time, resource_type)
SELECT id, event_time, resource_type FROM inserted WITH (NOLOCK);
DECLARE @done_main bit = 0;
WHILE (@done_main = 0)
BEGIN
DELETE TOP(1000) FROM dbo.TAble1 WHERE EVENT_TIME < (SELECT cast(DATEDIFF(second,'1970-01-01 00:00:00',(DATEADD(HOUR,-1,GETDATE())))AS bigint)* 1000 );
IF @@rowcount < 1000 SET @done_main = 1;
END;
DECLARE @done bit = 0;
WHILE (@done = 0)
BEGIN
DELETE TOP(1000) FROM dbo.TAble2 WHERE EVENT_TIME < (SELECT cast(DATEDIFF(second,'1970-01-01 00:00:00',(DATEADD(MONTH,-1,GETDATE())))AS bigint)* 1000 );
IF @@rowcount < 1000 SET @done = 1;
END;
GO

SQL insert trigger condition statement and multiple rows

Could you please help me finish my trigger? What I've got so far:
CREATE TRIGGER [dbo].[atbl_Sales_OrdersLines_ITrigGG]
ON [dbo].[atbl_Sales_OrdersLines]
FOR INSERT
AS
BEGIN
DECLARE @ID INT = (SELECT ProductID
FROM INSERTED)
DECLARE @OrderedQ INT = (SELECT SUM(Amount)
FROM atbl_Sales_OrdersLines
WHERE ProductID = @ID)
DECLARE @CurrentQ INT = (SELECT Quantity
FROM atbl_Sales_Products
WHERE ProductID = @ID)
DECLARE @PossibleQ INT = (SELECT Amount
FROM INSERTED
WHERE ProductID = @ID)
IF (@CurrentQ - @OrderedQ >= @PossibleQ)
ELSE
END
I need to complete the code but cannot figure out how to do it. If the condition is met, the trigger should allow the insert. Otherwise, the trigger should stop the insert (or roll it back) and prompt a message that the quantity is not sufficient.
Also, will this code work if the insert contains multiple rows with different product IDs?
Thanks.
Something like this might work. This trigger checks the products that are in the insert, summing the total that has been ordered (now and in the past), and if any of them exceeds the available quantity, the whole transaction is rolled back. When writing triggers, you want to avoid any assumption that only a single row is being inserted/updated/deleted, and avoid cursors. You want to just use basic set-based operations.
CREATE TRIGGER [dbo].[atbl_Sales_OrdersLines_ITrigGG]
ON [dbo].[atbl_Sales_OrdersLines]
FOR INSERT
AS
BEGIN
IF (exists (select 1 from (
select x.ProductId, totalOrdersQty, ISNULL(asp.Quantity, 0) PossibleQty from (
select i.ProductId, sum(aso.Amount) totalOrdersQty
from (select distinct ProductId from inserted) i
join atbl_Sales_OrdersLines aso on aso.ProductId = i.ProductId
group by i.ProductId) x
left join atbl_Sales_Products asp on asp.ProductId = x.ProductId
) x
where PossibleQty < totalOrdersQty))
BEGIN
RAISERROR ('Quantity is not sufficient' ,10,1)
ROLLBACK TRANSACTION
END
END
I still think this is a horrible idea.
Try this:
CREATE TRIGGER [dbo].[atbl_Sales_OrdersLines_ITrigGG]
ON [dbo].[atbl_Sales_OrdersLines]
INSTEAD OF INSERT --FOR INSERT
AS
BEGIN
DECLARE @ID INT = (SELECT ProductID
FROM INSERTED)
DECLARE @OrderedQ INT = (SELECT SUM(Amount)
FROM atbl_Sales_OrdersLines
WHERE ProductID = @ID)
DECLARE @CurrentQ INT = (SELECT Quantity
FROM atbl_Sales_Products
WHERE ProductID = @ID)
DECLARE @PossibleQ INT = (SELECT Amount
FROM INSERTED
WHERE ProductID = @ID)
IF (@CurrentQ - @OrderedQ >= @PossibleQ)
BEGIN
INSERT INTO YOURTABLE (COLUMN1, COLUMN2, COLUMN3, ..)
SELECT COLUMN1, COLUMN2, COLUMN3, ..
FROM inserted
END
ELSE
BEGIN
RAISERROR ('Quantity is not sufficient' ,10,1)
ROLLBACK TRANSACTION
END
END

SQL Server: Why isn't this logic working when Chunking on inserts?

Fellow Techies--
I've got an endless loop condition happening here. Why is @@ROWCOUNT never getting set back to 0? I must not be understanding what @@ROWCOUNT really does, or I am setting the value in the wrong place. I think the value should be decreasing on each pass until I eventually hit zero.
DECLARE @ChunkSize int = 250000;
WHILE @ChunkSize <> 0
BEGIN
BEGIN TRANSACTION
INSERT TableName
(col1,col2)
SELECT TOP (@ChunkSize)
col1,col2
FROM TableName2
COMMIT TRANSACTION;
SET @ChunkSize = @@ROWCOUNT
END -- transaction block
END -- while-loop block
I'm not sure, from what you posted, how you are going to ensure you catch rows that you haven't already inserted. If you don't, it'll be an infinite loop, of course. Here is a way using test data, but naturally you'd want to base it off a PK or other unique column. Perhaps you just left that part off, or I'm missing something altogether. I'm just interested in what your final code is for your chunking and the logic behind it, so this is an answer and an inquiry.
if object_id('tempdb..#source') is not null drop table #source
if object_id('tempdb..#destination') is not null drop table #destination
create table #source(c1 int, c2 int)
create table #destination (c1 int, c2 int)
insert into #source (c1,c2) values
(1,1),
(2,1),
(3,1),
(4,1),
(5,1),
(6,1),
(7,1),
(8,1),
(9,1),
(10,1),
(11,1),
(12,1)
DECLARE @ChunkSize int = 2;
WHILE @ChunkSize <> 0
BEGIN
INSERT INTO #destination (c1,c2)
SELECT TOP (@ChunkSize) c1,c2 FROM #source WHERE c1 NOT IN (SELECT DISTINCT c1 FROM #destination) ORDER BY ROW_NUMBER() OVER (ORDER BY C1)
SET @ChunkSize = @@ROWCOUNT
--SELECT @ChunkSize
END
select * from #source
select * from #destination
Nothing is happening because you're setting @ChunkSize to itself without ever looking at what you've already inserted. Using your example, @ChunkSize = 250000. First, the SELECT performs a SELECT TOP 250000 and returns (presumably) 250000 rows. You then use @@ROWCOUNT to update @ChunkSize, but the row count returned will be 250000, so you just set it to 250000 again. Which could be fine, except that number will never change unless you rule out the rows you've already inserted - you will keep inserting the same 250000 rows over and over.
You need something like NOT EXISTS to filter out the rows you've already inserted:
DECLARE @ChunkSize int = 250000;
WHILE @ChunkSize > 0
BEGIN
BEGIN TRANSACTION
INSERT INTO TableName
(col1,col2)
SELECT TOP (@ChunkSize)
col1,col2
FROM TableName2 T2
WHERE NOT EXISTS (SELECT *
FROM TableName T
WHERE T.Col1 = T2.Col1
AND T.Col2 = T2.Col2)
SET @ChunkSize = @@ROWCOUNT
PRINT CONVERT(nvarchar(10),@ChunkSize) + ' Rows Inserted.';
COMMIT TRANSACTION
END -- while-loop block
Implemented solution
In the end, I decided to pump the SQL through SSIS, where I could set the commit batch size accordingly. Had I not chosen that route, I would have had to follow @scsimon's suggestion and basically maintain a tracking table for the records completed and the records left to cycle through.
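For reference, here is a minimal sketch of that tracking-table idea; the table, column, and key names (dbo.SourceTable, dbo.DestinationTable, Id, col1, col2) are placeholders, not names from the original post:
-- Tracking table holds the Ids still left to copy; each completed chunk is removed from it.
CREATE TABLE #ToCopy (Id int PRIMARY KEY);
INSERT INTO #ToCopy (Id)
SELECT Id FROM dbo.SourceTable;
DECLARE @ChunkSize int = 250000;
DECLARE @Chunk TABLE (Id int PRIMARY KEY);
WHILE 1 = 1
BEGIN
-- grab the next chunk of Ids from the tracking table
DELETE FROM @Chunk;
INSERT INTO @Chunk (Id)
SELECT TOP (@ChunkSize) Id FROM #ToCopy ORDER BY Id;
IF @@ROWCOUNT = 0 BREAK; -- nothing left to process
BEGIN TRANSACTION;
INSERT INTO dbo.DestinationTable (col1, col2)
SELECT s.col1, s.col2
FROM dbo.SourceTable s
JOIN @Chunk c ON c.Id = s.Id;
-- mark the chunk as completed
DELETE t
FROM #ToCopy t
JOIN @Chunk c ON c.Id = t.Id;
COMMIT TRANSACTION;
END
DROP TABLE #ToCopy;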

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a 70 million row table by moving about 50 million rows to another table; the id_col is an integer (primary identity key).
According to the last run I made, it works, but it is expected to take about 200 days:
SET NOCOUNT ON
-- define the last ID handled
DECLARE @LastID integer
SET @LastID = 0
declare @tempDate datetime
set @tempDate = dateadd(dd,-20,getdate())
-- define the ID to be handled now
DECLARE @IDToHandle integer
DECLARE @iCounter integer
DECLARE @watch1 nvarchar(50)
DECLARE @watch2 nvarchar(50)
set @iCounter = 0
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
-- as long as we have s......
WHILE @IDToHandle IS NOT NULL
BEGIN
IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col = @IDToHandle )=0)
BEGIN
INSERT INTO SECONDERY_TABLE
SELECT col1,col2,col3.....
FROM MAIN_TABLE WHERE id_col = @IDToHandle
EXEC [dbo].[DeleteByID] @ID = @IDToHandle --deletes the row from 2 other tables that is related to the MAIN_TABLE and than from the MAIN_TABLE
set @iCounter = @iCounter +1
END
IF (@iCounter % 1000 = 0)
begin
set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
raiserror ( @watch1, 10,1) with nowait
raiserror (@watch2, 10,1) with nowait
end
-- set the last handled to the one we just handled
SET @LastID = @IDToHandle
SET @IDToHandle = NULL
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
END
Any ideas or directions to improve this procedure's run time would be welcome.
Yes, try this:
Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
And someDateCol < @tempDate -- If there are times in these datetime fields,
-- then you may need to modify this condition.
And some_other_int_col In (1745, 1548, 4785)
And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
Where some_int_col = m.id_col)
And Not Exists (Select * From A_70k_rows_table
Where some_int_col = m.id_col)
Select id from @Ids -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated,
-- then delete or comment out the select and return lines above so insert runs.
Begin Transaction
Delete OT -- eliminate row-by-row call to second stored proc
From OtherTable ot
Join MAIN_TABLE m On m.id_col = ot.FKCol
Join @Ids i On i.Id = m.id_col
Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col
Delete m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
Join @Ids i On i.Id = m.id_col
Commit Transaction
Explanation:
You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan on every iteration of your loop instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value; that eliminates the opportunity for the query optimizer to use an index (a short illustration follows below).
You were executing the inserts one at a time. It is far better to generate the list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.
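As the short illustration promised above, compare the question's date filter with the rewritten one (using the same column names as the question; the two forms only line up at day granularity, which is what the inline comment about time components warns about):
DECLARE @tempDate datetime = DATEADD(dd, -20, GETDATE());
-- Not SARGable: DATEDIFF wraps someDateCol, so no index on someDateCol can be used for a seek
SELECT id_col
FROM MAIN_TABLE
WHERE DATEDIFF(DD, someDateCol, otherDateCol) < 1
AND DATEDIFF(DD, someDateCol, @tempDate) > 0;
-- SARGable: someDateCol is compared directly, so an index on it can be used for the @tempDate range
SELECT id_col
FROM MAIN_TABLE
WHERE someDateCol >= otherDateCol
AND someDateCol < @tempDate;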

SQL: Query timeout expired

I have a simple query to update a table (30 columns and about 150,000 rows).
For example:
UPDATE tblSomeTable set F3 = @F3 where F1 = @F1
This query will affect about 2,500 rows.
The tblSomeTable has a trigger:
ALTER TRIGGER [dbo].[trg_tblSomeTable]
ON [dbo].[tblSomeTable]
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
declare @operationType nvarchar(1)
declare @createDate datetime
declare @UpdatedColumnsMask varbinary(500) = COLUMNS_UPDATED()
-- detect operation type
if not exists(select top 1 * from inserted)
begin
-- delete
SET @operationType = 'D'
SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM deleted
end
else if not exists(select top 1 * from deleted)
begin
-- insert
SET @operationType = 'I'
SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM inserted
end
else
begin
-- update
SET @operationType = 'U'
SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM inserted
end
-- log data to tmp table
INSERT INTO tbl1
SELECT
@createDate,
@operationType,
@status,
@updatedColumnsMask,
d.F1,
i.F1,
d.F2,
i.F2,
d.F3,
i.F3,
d.F4,
i.F4,
d.F5,
i.F5,
...
FROM (Select 1 as temp) t
LEFT JOIN inserted i on 1=1
LEFT JOIN deleted d on 1=1
END
And when I execute the update query, I get a timeout.
How can I optimize the logic to avoid the timeout?
Thank you.
This query:
SELECT *
FROM (
SELECT 1 AS temp
) t
LEFT JOIN
INSERTED i
ON 1 = 1
LEFT JOIN
DELETED d
ON 1 = 1
will yield 2500 ^ 2 = 6,250,000 records from a Cartesian product of INSERTED and DELETED (that is, all possible combinations of all records in both tables), which will be inserted into tbl1.
Is that what you wanted to do?
Most probably, you want to join the tables on their PRIMARY KEY:
INSERT
INTO tbl1
SELECT @createDate,
@operationType,
@status,
@updatedColumnsMask,
d.F1,
i.F1,
d.F2,
i.F2,
d.F3,
i.F3,
d.F4,
i.F4,
d.F5,
i.F5,
...
FROM INSERTED i
FULL JOIN
DELETED d
ON i.id = d.id
This will treat update to the PK as deleting a record and inserting another, with a new PK.
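To make that concrete, here is a small standalone illustration using two table variables in place of the inserted/deleted pseudo-tables (the names and values are made up):
DECLARE @deleted TABLE (id int, F1 varchar(10));
DECLARE @inserted TABLE (id int, F1 varchar(10));
-- simulate an UPDATE that changed one row's PK from 2 to 5
INSERT INTO @deleted VALUES (1, 'old-A'), (2, 'old-B');
INSERT INTO @inserted VALUES (1, 'new-A'), (5, 'new-B');
SELECT d.id AS old_id, i.id AS new_id, d.F1 AS old_F1, i.F1 AS new_F1
FROM @inserted i
FULL JOIN @deleted d ON i.id = d.id;
-- id 1 pairs up normally; id 2 shows NULL new values (treated as a delete)
-- and id 5 shows NULL old values (treated as an insert)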
Thanks Quassnoi, the "FULL JOIN" is a good idea. It helped me.
I also tried updating the table in portions (1,000 items at a time) to make my code work faster, because for some companyId values I need to update more than 160,000 rows.
Instead of old code:
UPDATE tblSomeTable set someVal = @someVal where companyId = @companyId
I use the one below:
declare @rc integer = 0
declare @parts integer = 0
declare @index integer = 0
declare @portionSize int = 1000
-- select Ids for update
declare @tempIds table (id int)
insert into @tempIds
select id from tblSomeTable where companyId = @companyId
-- calculate amount of iterations
set @rc=@@rowcount
set @parts = @rc / @portionSize + 1
-- update table in portions
WHILE (@parts > @index)
begin
UPDATE TOP (@portionSize) t
SET someVal = @someVal
FROM tblSomeTable t
JOIN @tempIds t1 on t1.id = t.id
WHERE companyId = @companyId
delete top (@portionSize) from @tempIds
set @index += 1
end
What do you think about this? Does it make sense? If yes, how do I choose the correct portion size?
Or is a simple update also a good solution? I just want to avoid locks in the future.
Thanks