SQL Server: Why isn't this logic working when Chunking on inserts?

Fellow Techies--
I've got an endless loop condition happening here. Why is @@ROWCOUNT never getting set back to 0? I must not be understanding what @@ROWCOUNT really does--or I am setting the value in the wrong place. I think the value should be decrementing on each pass until I eventually hit zero.
DECLARE @ChunkSize int = 250000;
WHILE @ChunkSize <> 0
BEGIN
BEGIN TRANSACTION
INSERT TableName
(col1,col2)
SELECT TOP (@ChunkSize)
col1,col2
FROM TableName2
COMMIT TRANSACTION;
SET @ChunkSize = @@ROWCOUNT
END -- transaction block
END -- while-loop block

From what you posted, I'm not sure how you are going to ensure you only catch rows that you haven't already inserted. If you don't, it'll be an infinite loop, of course. Here is a way using test data--but naturally you'd want to base it on a PK or other unique column. Perhaps you just left that part off, or I'm missing something altogether. I'm just interested in what your final chunking code is and the logic behind it, so this is both an answer and an inquiry.
if object_id('tempdb..#source') is not null drop table #source
if object_id('tempdb..#destination') is not null drop table #destination
create table #source(c1 int, c2 int)
create table #destination (c1 int, c2 int)
insert into #source (c1,c2) values
(1,1),
(2,1),
(3,1),
(4,1),
(5,1),
(6,1),
(7,1),
(8,1),
(9,1),
(10,1),
(11,1),
(12,1)
DECLARE @ChunkSize int = 2;
WHILE @ChunkSize <> 0
BEGIN
INSERT INTO #destination (c1,c2)
SELECT TOP (@ChunkSize) c1,c2 FROM #source WHERE c1 NOT IN (SELECT DISTINCT c1 FROM #destination) ORDER BY ROW_NUMBER() OVER (ORDER BY C1)
SET @ChunkSize = @@ROWCOUNT
--SELECT @ChunkSize
END
select * from #source
select * from #destination

Nothing is changing because you're setting @ChunkSize back to itself without ever filtering out what you've already inserted. Using your example, @ChunkSize = 250000. The first pass performs SELECT TOP 250000 and returns (presumably) 250000 rows. You then use @@ROWCOUNT to update @ChunkSize, but the row count returned will be 250000, so you just set it to 250000 again. Which could be fine, except that number will never change without ruling out rows you've already inserted - you will keep inserting the same 250000 rows over and over.
You need something like NOT EXISTS to filter out the rows you've already inserted:
DECLARE @ChunkSize int = 250000;
WHILE @ChunkSize > 0
BEGIN
BEGIN TRANSACTION
INSERT INTO TableName
(col1,col2)
SELECT TOP (@ChunkSize)
col1,col2
FROM TableName2 T2
WHERE NOT EXISTS (SELECT *
FROM TableName T
WHERE T.Col1 = T2.Col1
AND T.Col2 = T2.Col2)
SET @ChunkSize = @@ROWCOUNT
PRINT CONVERT(nvarchar(10),@ChunkSize) + ' Rows Inserted.';
COMMIT TRANSACTION
END -- while-loop block
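If TableName2 is large, that NOT EXISTS probe gets more expensive as TableName fills up. A keyset-driven variant is worth considering; this is a minimal sketch, assuming col1 is a unique, indexed, ascending int key and TableName starts empty:
DECLARE @ChunkSize int = 250000;
DECLARE @LastKey int = 0; -- assumes col1 > 0; otherwise seed with MIN(col1) - 1
DECLARE @Rows int = 1;
WHILE @Rows > 0
BEGIN
BEGIN TRANSACTION
INSERT INTO TableName (col1,col2)
SELECT TOP (@ChunkSize) col1,col2
FROM TableName2
WHERE col1 > @LastKey
ORDER BY col1 -- ORDER BY makes TOP deterministic
SET @Rows = @@ROWCOUNT
IF @Rows > 0
SELECT @LastKey = MAX(col1) FROM TableName -- highest key copied so far
COMMIT TRANSACTION
END
Each pass seeks straight to the unread keys instead of re-checking every row already copied.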

Implemented solution
In the end, I decided to pump the SQL through SSIS, where I could set the commit batch size accordingly. Had I not chosen that route, I would have had to follow @scsimon's suggestion and basically maintain a tracking table for the records completed and the records left to cycle through.
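For reference, a rough sketch of that tracking-table idea (the queue table and names here are hypothetical, and col1 is assumed to be unique): claim a chunk of keys from the queue with DELETE ... OUTPUT, then copy just those rows.
-- hypothetical queue of keys still to be copied
CREATE TABLE dbo.CopyQueue (col1 int PRIMARY KEY);
INSERT INTO dbo.CopyQueue (col1) SELECT col1 FROM TableName2;
DECLARE @Batch TABLE (col1 int PRIMARY KEY);
DECLARE @Rows int = 1;
WHILE @Rows > 0
BEGIN
DELETE FROM @Batch;
BEGIN TRANSACTION
DELETE TOP (250000) FROM dbo.CopyQueue
OUTPUT deleted.col1 INTO @Batch; -- claim a chunk of keys
SET @Rows = @@ROWCOUNT
INSERT INTO TableName (col1,col2)
SELECT t2.col1, t2.col2
FROM TableName2 t2
JOIN @Batch b ON b.col1 = t2.col1 -- copy only the claimed rows
COMMIT TRANSACTION
END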

Related

How to Batch INSERT SQL Server?

I am trying to batch inserting rows from one table to another.
DECLARE @batch INT = 10000;
WHILE @batch > 0
BEGIN
BEGIN TRANSACTION
INSERT into table2
select top (@batch) *
FROM table1
SET @batch = @@ROWCOUNT
COMMIT TRANSACTION
END
It runs on the first 10,000 and inserts them. Then I get the error message "Cannot insert duplicate key", meaning it's trying to insert the same primary key, so I assume it's repeating the same batch. What logic am I missing here to loop through the batches? Probably something simple, but I can't figure it out.
Can anyone help? Thanks
Your code keeps inserting the same rows. You can avoid it by "paginating" your inserts:
DECLARE @batch INT = 10000;
DECLARE @page INT = 0
DECLARE @lastCount INT = 1
WHILE @lastCount > 0
BEGIN
BEGIN TRANSACTION
INSERT into table2
SELECT col1, col2, ... -- list columns explicitly
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY YourPrimaryKey ) AS RowNum, *
FROM table1
) AS RowConstrainedResult
-- note: ROW_NUMBER() starts at 1, so the first pass returns @batch - 1 rows;
-- every row is still covered exactly once across the passes
WHERE RowNum >= (@page * @batch) AND RowNum < ((@page+1) * @batch)
SET @lastCount = @@ROWCOUNT
SET @page = @page + 1
COMMIT TRANSACTION
END
You need some way to eliminate existing rows. You seem to have a primary key, so:
INSERT into table2
SELECT TOP (@batch) *
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);

Sql update in batches

What is the most effective way to update a table in SQL Server while limiting a single transaction to 10k records?
I read about the TOP and SET ROWCOUNT approaches used inside a while loop. Which of those is more effective? Or please share if you know of other effective ways. Thank you.
Here is one potential approach without using SET ROWCOUNT:
-- prepare test data
use tempdb
drop table dbo.t;
create table dbo.t (a int identity, b int)
go
insert into dbo.t ( b)
values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);
go
-- assume we do 3 records per time, put 10000 here if you want 10K records
-- also the update is just to update column [b] to [b] * 2, here is the code
declare @N int = 3; -- do a batch of @N records
declare @i int = 0, @max_loop int;
select @max_loop = count(*)/@N from dbo.t
-- the first batch may include <= @N-1 records and the last batch may include <= @N
while (@i <= @max_loop)
begin
; with c as (
select rnk=ROW_NUMBER() over (order by a)/@N, a, b from dbo.t
)
update c set b = b*2 -- double b
where rnk = @i;
set @i = @i + 1;
end
go
-- check the result
select * from dbo.t
You can try the approach below:
WHILE (1=1)
BEGIN
BEGIN TRANSACTION
UPDATE TOP (10000) XXX
SET XXX.YYY = <ValueToUpdate>
FROM XXX -- Update 10000 nonupdated rows
WHERE <condition> -- make sure that condition makes sure that it does not become infinite loop
IF @@ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END
EDIT
An update for all employees of an organization, making sure that it does not become an infinite loop. Here, I am updating ModifiedDate for each employee record.
DECLARE @updatedids table(id int)
WHILE (1=1)
BEGIN
BEGIN TRANSACTION
UPDATE TOP(10000) a
SET a.ModifiedDate = GETDATE()
OUTPUT inserted.BusinessEntityID INTO @updatedids
FROM HumanResources.Employee a
LEFT JOIN @updatedids u
ON a.BusinessEntityID = u.id
WHERE u.id IS NULL
-- Update 10000 nonupdated rows
IF @@ROWCOUNT = 0
BEGIN
COMMIT TRANSACTION
BREAK
END
COMMIT TRANSACTION
END

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a 70 million row table by moving about 50 million rows to another table. id_col is an integer (primary identity key).
Based on my last test run it works, but it is expected to take about 200 days:
SET NOCOUNT ON
-- define the last ID handled
DECLARE @LastID integer
SET @LastID = 0
declare @tempDate datetime
set @tempDate = dateadd(dd,-20,getdate())
-- define the ID to be handled now
DECLARE @IDToHandle integer
DECLARE @iCounter integer
DECLARE @watch1 nvarchar(50)
DECLARE @watch2 nvarchar(50)
set @iCounter = 0
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col > @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
-- as long as we have s......
WHILE @IDToHandle IS NOT NULL
BEGIN
IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col = @IDToHandle) = 0)
BEGIN
INSERT INTO SECONDERY_TABLE
SELECT col1,col2,col3.....
FROM MAIN_TABLE WHERE id_col = @IDToHandle
EXEC [dbo].[DeleteByID] @ID = @IDToHandle --deletes the row from 2 other tables that are related to MAIN_TABLE and then from MAIN_TABLE
set @iCounter = @iCounter + 1
END
IF (@iCounter % 1000 = 0)
begin
set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
set @watch2 = 'IDToHandle - ' + CAST(@IDToHandle AS VARCHAR)
raiserror ( @watch1, 10,1) with nowait
raiserror ( @watch2, 10,1) with nowait
end
-- set the last handled to the one we just handled
SET @LastID = @IDToHandle
SET @IDToHandle = NULL
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col > @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
END
Any ideas or directions for improving this procedure's run time would be welcome.
Yes, try this:
Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
And someDateCol < @tempDate -- If there are times in these datetime fields,
-- then you may need to modify this condition.
And some_other_int_col In (1745, 1548, 4785)
And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
Where some_int_col = m.id_col)
And Not Exists (Select * From A_70k_rows_table
Where some_int_col = m.id_col)
Select id from @Ids -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated,
-- then delete or comment out the select and return lines above so insert runs.
Begin Transaction
Delete ot -- eliminate row-by-row call to second stored proc
From OtherTable ot
Join MAIN_TABLE m On m.id_col = ot.FKCol
Join @Ids i On i.Id = m.id_col
Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col
Delete m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
Join @Ids i On i.Id = m.id_col
Commit Transaction
Explanation:
You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan for every iteration of your loop, instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. This eliminates the opportunity for the query optimizer to use an index.
You were executing the inserts one at a time... Way better to generate a list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.
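To illustrate the SARGability point with the predicate from the question (the caveat in the code comment above about time components applies):
-- Non-SARGable: the function wraps the column, so every row must be evaluated
WHERE DATEDIFF(DD, someDateCol, otherDateCol) < 1
-- SARGable rewrite: bare columns let the optimizer seek on an existing index
WHERE someDateCol >= otherDateCol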

Update multiple rows in table from table variable

I'm writing a stored procedure to update multiple records based on a table variable parameter.
The existing table is: Tb_Project_Image with relevant columns:
id PK (identity 1,1)
cat_ord decimal(4,2)
The procedure will receive a temporary table variable (shown in the code below) containing the id as PI_ID, and the new value for cat_ord as newCatOrd. idx is a simple identity for each row containing 1...n where n is the rowcount of #tempTable.
For each row in #tempTable, I want to update Tb_Project_Image where id = PI_ID to the corresponding value.
DECLARE @tempTable table (
idx smallint Primary Key IDENTITY(1,1),
PI_ID bigint,
newCatOrd decimal(4, 2) not null )
INSERT INTO @tempTable values (3, 7.01)
INSERT INTO @tempTable values (4, 7.02)
INSERT INTO @tempTable values (5, 7.03)
--etc...
DECLARE @error int
DECLARE @update int
DECLARE @iter int
SET @iter = 1
BEGIN TRAN
WHILE @iter <= (select COUNT(*) from @tempTable)
BEGIN
UPDATE Tb_Project_Image
SET cat_ord = (SELECT newCatOrd FROM @tempTable
WHERE idx = @iter)
WHERE id = (SELECT PI_ID FROM @tempTable
WHERE idx = @iter)
--error checking
set @error = @@ERROR
set @update = @@ROWCOUNT
IF ((@error = 0) AND (@update = 1))
BEGIN
SET @iter = @iter + 1
CONTINUE
END
ELSE
BREAK
END
IF ((@error = 0) AND (@update = 1))
COMMIT TRAN
ELSE
ROLLBACK TRAN
GO
Now, the error checking is there because, to ensure integrity, EACH row in the temporary table MUST make exactly 1 update. (Explanation omitted to save space.) If a single iteration of the while loop throws an error, or doesn't affect exactly 1 row, I want to break the loop and roll back the transaction.
THE PROBLEM I'm having is that this error checking is not working. I'm currently running it with 14 rows in @tempTable, and the 11th uses a PI_ID not found in the Tb_Project_Image table. Therefore @update = 0... but it continues the loop and commits the data.
I'd be doubly glad if someone had a method of doing this that only used a single update statement.
You cannot do it this way, because every statement - even SET - resets the values of @@ERROR and @@ROWCOUNT. In this case @@ROWCOUNT is set to 1 after set @error = @@ERROR. If you do not assign the values to local variables, your code will work:
IF ((@@ERROR = 0) AND (@@ROWCOUNT = 1))
But you would be better off with TRY...CATCH error handling, testing @@ROWCOUNT separately after the update.
UPDATE: doing it in a single update:
UPDATE t
SET cat_ord = tt.newCatOrd
FROM Tb_Project_Image t
INNER JOIN @tempTable tt
ON t.id = tt.PI_ID
-- If there was a PI_ID not found in Tb_Project_Image
-- But I think that this should have been dealt with
-- during the initial loading of the temporary table
IF @@ROWCOUNT <> (select count (*) from @tempTable)
BEGIN
-- Error reporting here
ROLLBACK TRANSACTION
END
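Combining this with the TRY...CATCH suggestion above - a minimal sketch, reusing the same names:
BEGIN TRY
BEGIN TRAN
UPDATE t
SET cat_ord = tt.newCatOrd
FROM Tb_Project_Image t
INNER JOIN @tempTable tt ON t.id = tt.PI_ID
-- every row in @tempTable must have updated exactly one row
IF @@ROWCOUNT <> (SELECT COUNT(*) FROM @tempTable)
RAISERROR('Not every row in @tempTable matched a row', 16, 1)
COMMIT TRAN
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRAN
-- report or rethrow the error here
END CATCH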
Instead of updating and then rolling back, you could also use a CTE to determine if any records should be updated prior to performing the update. Something like this should work:
WITH NON_SINGLETON AS (
-- Find any records in #tempTable that don't match
-- exactly one record in Tb_Project_Image
SELECT t.PI_ID, COUNT(pi.id) C
FROM @tempTable t
LEFT JOIN Tb_Project_Image pi ON t.PI_ID = pi.id
GROUP BY t.PI_ID
HAVING COUNT(pi.id) != 1
)
UPDATE Tb_Project_Image
SET cat_ord = t.newCatOrd
FROM Tb_Project_Image pi
JOIN @tempTable t ON pi.id = t.PI_ID
-- If any invalid records were found in the CTE,
-- then this condition will fail for all rows
-- and nothing will be updated
WHERE NOT EXISTS(SELECT 1 FROM NON_SINGLETON)
If it's possible for @tempTable to have duplicate entries for the same PI_ID, then this will handle those scenarios as well. And since it's a single statement, you don't have to explicitly manage the transaction in the proc (if it's the only thing that needs to be included in the transaction).

Check if a row exists, otherwise insert

I need to write a T-SQL stored procedure that updates a row in a table. If the row doesn't exist, insert it. All these steps are wrapped in a transaction.
This is for a booking system, so it must be atomic and reliable. It must return true if the transaction was committed and the flight booked.
I'm not sure how to use @@ROWCOUNT. This is what I've written so far. Am I on the right road?
-- BEGIN TRANSACTION (HOW TO DO?)
UPDATE Bookings
SET TicketsBooked = TicketsBooked + @TicketsToBook
WHERE FlightId = @Id AND TicketsMax < (TicketsBooked + @TicketsToBook)
-- Here I need to insert only if the row doesn't exist.
-- If the row exists but the TicketsMax condition is violated, I must not insert
-- the row and return FALSE
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO Bookings ... (omitted)
END
-- END TRANSACTION (HOW TO DO?)
-- Return TRUE (How to do?)
I assume a single row for each flight? If so:
IF EXISTS (SELECT * FROM Bookings WHERE FlightID = @Id)
BEGIN
--UPDATE HERE
END
ELSE
BEGIN
-- INSERT HERE
END
I assume a single row per flight because, as written, your approach can overbook a flight: it will insert a new row when there are 10 tickets max and you are booking 20.
Take a look at MERGE command. You can do UPDATE, INSERT & DELETE in one statement.
Here is a working implementation using MERGE. It checks whether the flight is full before doing an update, and otherwise does an insert.
if exists(select 1 from INFORMATION_SCHEMA.TABLES T
where T.TABLE_NAME = 'Bookings')
begin
drop table Bookings
end
GO
create table Bookings(
FlightID int identity(1, 1) primary key,
TicketsMax int not null,
TicketsBooked int not null
)
GO
insert Bookings(TicketsMax, TicketsBooked) select 1, 0
insert Bookings(TicketsMax, TicketsBooked) select 2, 2
insert Bookings(TicketsMax, TicketsBooked) select 3, 1
GO
select * from Bookings
And then ...
declare @FlightID int = 1
declare @TicketsToBook int = 2
--; This should add a new record
merge Bookings as T
using (select @FlightID as FlightID, @TicketsToBook as TicketsToBook) as S
on T.FlightID = S.FlightID
and T.TicketsMax > (T.TicketsBooked + S.TicketsToBook)
when matched then
update set T.TicketsBooked = T.TicketsBooked + S.TicketsToBook
when not matched then
insert (TicketsMax, TicketsBooked)
values(S.TicketsToBook, S.TicketsToBook);
select * from Bookings
Pass updlock, rowlock, holdlock hints when testing for existence of the row.
begin tran /* default read committed isolation level is fine */
if not exists (select * from Table with (updlock, rowlock, holdlock) where ...)
/* insert */
else
/* update */
commit /* locks are released here */
The updlock hint forces the query to take an update lock on the row if it already exists, preventing other transactions from modifying it until you commit or roll back.
The holdlock hint forces the query to take a range lock, preventing other transactions from adding a row matching your filter criteria until you commit or roll back.
The rowlock hint forces lock granularity to row level instead of the default page level, so your transaction won't block other transactions trying to update unrelated rows in the same page (but be aware of the trade-off between reduced contention and the increase in locking overhead - you should avoid taking large numbers of row-level locks in a single transaction).
See http://msdn.microsoft.com/en-us/library/ms187373.aspx for more information.
Note that locks are taken as the statements which take them are executed - invoking begin tran doesn't give you immunity against another transaction pinching locks on something before you get to it. You should try and factor your SQL to hold locks for the shortest possible time by committing the transaction as soon as possible (acquire late, release early).
Note that row-level locks may be less effective if your PK is a bigint, as the internal hashing on SQL Server is degenerate for 64-bit values (different key values may hash to the same lock id).
I'm adding my solution. My method doesn't need IF or MERGE, and it's simple:
INSERT INTO TableName (col1,col2)
SELECT @par1, @par2
WHERE NOT EXISTS (SELECT col1,col2 FROM TableName
WHERE col1=@par1 AND col2=@par2)
For Example:
INSERT INTO Members (username)
SELECT 'Cem'
WHERE NOT EXISTS (SELECT username FROM Members
WHERE username='Cem')
Explanation:
(1) SELECT col1,col2 FROM TableName WHERE col1=@par1 AND col2=@par2
It selects the searched values from TableName.
(2) SELECT @par1, @par2 WHERE NOT EXISTS
It returns the parameter values only if subquery (1) finds nothing.
(3) It inserts the values from step (2) into TableName.
I was finally able to insert a row, on the condition that it didn't already exist, using the following model:
INSERT INTO table ( column1, column2, column3 )
(
SELECT $column1, $column2, $column3
WHERE NOT EXISTS (
SELECT 1
FROM table
WHERE column1 = $column1
AND column2 = $column2
AND column3 = $column3
)
)
which I found at:
http://www.postgresql.org/message-id/87hdow4ld1.fsf@stark.xeocode.com
This is something I just recently had to do:
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[cjso_UpdateCustomerLogin]
(
@CustomerID AS INT,
@UserName AS VARCHAR(25),
@Password AS BINARY(16)
)
AS
BEGIN
IF ISNULL((SELECT CustomerID FROM tblOnline_CustomerAccount WHERE CustomerID = @CustomerID), 0) = 0
BEGIN
INSERT INTO [tblOnline_CustomerAccount] (
[CustomerID],
[UserName],
[Password],
[LastLogin]
) VALUES (
/* CustomerID - int */ @CustomerID,
/* UserName - varchar(25) */ @UserName,
/* Password - binary(16) */ @Password,
/* LastLogin - datetime */ NULL )
END
ELSE
BEGIN
UPDATE [tblOnline_CustomerAccount]
SET UserName = @UserName,
Password = @Password
WHERE CustomerID = @CustomerID
END
END
You could use MERGE functionality to achieve this. Otherwise you can do:
declare @rowCount int
select @rowCount = @@ROWCOUNT -- capture immediately after the preceding UPDATE
if @rowCount = 0
begin
--insert....
INSERT INTO [DatabaseName1].dbo.[TableName1] SELECT * FROM [DatabaseName2].dbo.[TableName2]
WHERE [YourPK] not in (select [YourPK] from [DatabaseName1].dbo.[TableName1])
end
The full solution is below (including the cursor structure). Many thanks to Cassius Porcus for the begin tran ... commit code from the posting above.
declare @mystat6 bigint
declare @mystat6p varchar(50)
declare @mystat6b bigint
DECLARE mycur1 CURSOR for
select result1,picture,bittot from all_Tempnogos2results11
OPEN mycur1
FETCH NEXT FROM mycur1 INTO @mystat6, @mystat6p, @mystat6b
WHILE @@FETCH_STATUS = 0
BEGIN
begin tran /* default read committed isolation level is fine */
if not exists (select * from all_Tempnogos2results11_uniq with (updlock, rowlock, holdlock)
where all_Tempnogos2results11_uniq.result1 = @mystat6
and all_Tempnogos2results11_uniq.bittot = @mystat6b )
insert all_Tempnogos2results11_uniq values (@mystat6, @mystat6p, @mystat6b)
--else
-- /* update */
commit /* locks are released here */
FETCH NEXT FROM mycur1 INTO @mystat6, @mystat6p, @mystat6b
END
CLOSE mycur1
DEALLOCATE mycur1
go
A simple way to copy data from T1 to T2 while avoiding duplicates in T2:
--Insert a new record
INSERT INTO dbo.Table2(NoEtu, FirstName, LastName)
SELECT t1.NoEtuDos, t1.FName, t1.LName
FROM dbo.Table1 as t1
WHERE NOT EXISTS (SELECT (1) FROM dbo.Table2 AS t2
WHERE t1.FName = t2.FirstName
AND t1.LName = t2.LastName
AND t1.NoEtuDos = t2.NoEtu)
Another variant uses EXCEPT to filter out rows that are already present (EXCEPT also treats NULLs as equal, which NOT EXISTS with = comparisons does not):
INSERT INTO table ( column1, column2, column3 )
SELECT $column1, $column2, $column3
EXCEPT SELECT column1, column2, column3
FROM table
The best approach to this problem is first making the database column UNIQUE. In SQL Server the closest equivalent to this pattern is a unique index created WITH (IGNORE_DUP_KEY = ON):
CREATE UNIQUE INDEX UX_table_name_col ON table_name (col_name) WITH (IGNORE_DUP_KEY = ON);
After that, a plain INSERT adds only rows whose key does not already exist in the table; rows that would produce a duplicate key are skipped with the warning "Duplicate key was ignored." instead of raising an error.