I have a very slow procedure in SQL Server 2016, and I don't think it should be that slow.
## the code ##
declare @count int
declare @Name nvarchar(100)
declare @UID int
declare @UID2 int

select @count = count(UID) from TABLE1 where IsProcess = 0
while @count > 0
begin
    select top 1 @UID = UID, @Name = Name from TABLE1 where IsProcess = 0

    set @UID2 = ISNULL((select top 1 UID from TABLE2 where Name like N'%' + @Name + N'%'), 0)
    if @UID2 = 0
    begin
        insert into TABLE2 (Name) values (@Name)
        set @UID2 = @@IDENTITY
    end

    insert into tablerelation (UID1, UID2) values (@UID, @UID2)
    update TABLE1 set IsProcess = 1 where UID = @UID

    select @count = count(UID) from TABLE1 where IsProcess = 0
end
There are about 5 million rows in TABLE1.
The server CPU is under 10%.
Memory usage is around 4 GB (of 8 GB on the server).
No other applications run on the server.
It currently processes about 3 to 4 rows per second.
I thought it should be at least 200 rows per second.
I also tried a cursor; it was just as slow.
Please help.
Thanks a lot.
Edit: added the code. This is my first post here, thanks for your patience.
Why are you using a loop instead of set-based operations? From what I can tell, you want:
insert into tablerelation (uid1, uid2)
select t1.UID, t2.UID
from Table1 t1 cross apply
(select top 1 t2.*
from table2 t2
where t2.name like N'%' + t1.Name + N'%'
) t2
where t1.IsProcess = 0;
This should be a bit faster, but it won't be an incredible speed-up. The gain comes from reducing the number of database operations.
However, the real problem is the join using LIKE. With a leading wildcard in the pattern, it is hard to speed up.
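For completeness, the original loop also inserts a new row into table2 when no wildcard match exists. A rough set-based sketch of that whole flow, reusing the table and column names from the question (note the matching can differ slightly from the row-by-row version when names overlap):

-- 1) add the names that have no wildcard match in TABLE2 yet
insert into TABLE2 (Name)
select distinct t1.Name
from TABLE1 t1
where t1.IsProcess = 0
  and not exists (select 1 from TABLE2 x
                  where x.Name like N'%' + t1.Name + N'%');

-- 2) build all relation rows in one statement
insert into tablerelation (UID1, UID2)
select t1.UID, m.UID
from TABLE1 t1
cross apply (select top 1 x.UID
             from TABLE2 x
             where x.Name like N'%' + t1.Name + N'%'
             order by x.UID) m
where t1.IsProcess = 0;

-- 3) mark everything as processed
update TABLE1 set IsProcess = 1 where IsProcess = 0;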
Related
I am trying to batch-insert rows from one table into another.
DECLARE @batch INT = 10000;

WHILE @batch > 0
BEGIN
    BEGIN TRANSACTION

    INSERT into table2
    select top (@batch) *
    FROM table1

    SET @batch = @@ROWCOUNT

    COMMIT TRANSACTION
END
It runs on the first 10,000 rows and inserts them. Then I get the error message "Cannot insert duplicate key", which means it is trying to insert the same primary keys again, so I assume it is repeating the same batch. What logic am I missing here to loop through the batches? It's probably something simple, but I can't figure it out.
Can anyone help? Thanks.
Your code keeps inserting the same rows. You can avoid it by "paginating" your inserts:
DECLARE @batch INT = 10000;
DECLARE @page INT = 0
DECLARE @lastCount INT = 1

WHILE @lastCount > 0
BEGIN
    BEGIN TRANSACTION

    INSERT into table2
    SELECT col1, col2, ... -- list columns explicitly
    FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY YourPrimaryKey ) AS RowNum, *
           FROM table1
         ) AS RowConstrainedResult
    WHERE RowNum >= (@page * @batch) AND RowNum < ((@page + 1) * @batch)

    SET @lastCount = @@ROWCOUNT
    SET @page = @page + 1

    COMMIT TRANSACTION
END
You need some way to eliminate existing rows. You seem to have a primary key, so:
INSERT into table2
SELECT TOP (@batch) *
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);
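If the rows are copied in primary-key order, you can also remember the highest key copied so far and avoid re-checking table2 on every pass. A minimal sketch, assuming both tables share an integer key column named id that is copied as-is:

DECLARE @batch INT = 10000;
DECLARE @lastId INT = 0;          -- highest key copied so far
DECLARE @rows INT = 1;
DECLARE @copied TABLE (id INT);

WHILE @rows > 0
BEGIN
    DELETE FROM @copied;

    INSERT INTO table2
    OUTPUT inserted.id INTO @copied (id)   -- capture the keys just inserted
    SELECT TOP (@batch) *
    FROM table1
    WHERE id > @lastId
    ORDER BY id;

    SET @rows = @@ROWCOUNT;

    IF @rows > 0
        SELECT @lastId = MAX(id) FROM @copied;   -- advance the watermark
END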
I am facing an issue with SQL Server where my stored procedure becomes slow after a couple of days.
Below is a sample of my stored procedure.
Could this be a caching issue on the server side? Can I increase the server's cache size to resolve the problem?
Normally the stored procedure returns data in one second.
    @START_VALUE INT = NULL,
    @END_VALUE INT = NULL,
    @UID NVARCHAR(MAX) = NULL
AS
BEGIN
    DECLARE @count INT;

    SELECT
        dbo.TABLE1.ID,
        ROW_NUMBER() OVER (ORDER BY TABLE1.UPDATED_ON DESC) AS RN,
        CONVERT(VARCHAR(10), dbo.TABLE1.DATE, 101) AS TDATE,
        CATEGORY = (
            SELECT TOP 1 COLUMN1
            FROM TABLE5 CT1
            WHERE TABLE1.CATEGORY = CT1.CATEGORY_ID
        ),
        TYPETEXT = (
            SELECT TOP 1 COLUMN1
            FROM TABLE6 CT1
            WHERE TABLE1.TYPE = CT1.TYPE_ID
        ),
        IMAGE = STUFF(( SELECT DISTINCT ',' + CAST(pm.C1 AS varchar(12))
                        FROM TABLE2 pm
                        WHERE pm.ID = TABLE1.ID AND pm.C1 IS NOT NULL AND pm.C1 <> ''
                        FOR XML PATH('')),
                      1, 1, '' )
    INTO #tempRecords
    FROM dbo.TABLE1
    WHERE (@UID IS NULL OR dbo.TABLE1.ID = @UID)
    ORDER BY TABLE1.UPDATED_ON DESC

    SELECT @count = COUNT(*) FROM #tempRecords;

    SELECT *, CONVERT([int], @count) AS 'TOTAL_RECORDS'
    FROM #tempRecords
    WHERE #tempRecords.RN BETWEEN CONVERT([bigint], @START_VALUE) AND CONVERT([bigint], @END_VALUE)
END
GO
A few performance tips:
1) @UID is null OR dbo.TABLE1.ID = @UID --> this is bad because the same execution plan is reused whether @UID is null or not. Build a dynamic SQL query instead and you'll get two execution plans, one per case (see the sketch after this list).
2) Update statistics in a maintenance plan.
3) Check index fragmentation.
4) Try to do the same thing without using a temp table.
5) Try to avoid casts.
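As a rough illustration of point 1, the optional filter can be built with dynamic SQL so each shape of the query gets its own plan. This is only a sketch with a trimmed column list, reusing the parameter and table names from the question:

DECLARE @sql NVARCHAR(MAX) =
    N'SELECT ID, UPDATED_ON FROM dbo.TABLE1';   -- column list trimmed for the sketch

IF @UID IS NOT NULL
    SET @sql += N' WHERE dbo.TABLE1.ID = @UID';

EXEC sp_executesql @sql, N'@UID NVARCHAR(MAX)', @UID = @UID;

Adding OPTION (RECOMPILE) to the original statement is a simpler alternative if you can afford compiling a fresh plan on every call.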
I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a 70 million row table by moving about 50 million rows to another table; id_col is an integer (primary identity key).
Based on my last run it works correctly, but it is expected to take about 200 days:
SET NOCOUNT ON

-- define the last ID handled
DECLARE @LastID integer
SET @LastID = 0

declare @tempDate datetime
set @tempDate = dateadd(dd, -20, getdate())

-- define the ID to be handled now
DECLARE @IDToHandle integer
DECLARE @iCounter integer
DECLARE @watch1 nvarchar(50)
DECLARE @watch2 nvarchar(50)
set @iCounter = 0

-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col > @LastID and DATEDIFF(DD, someDateCol, otherDateCol) < 1
  and datediff(dd, someDateCol, @tempDate) > 0
  and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col

-- as long as we have s......
WHILE @IDToHandle IS NOT NULL
BEGIN
    IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0
        and (select count(1) from A_70k_rows_table where some_int_col = @IDToHandle) = 0)
    BEGIN
        INSERT INTO SECONDERY_TABLE
        SELECT col1,col2,col3.....
        FROM MAIN_TABLE WHERE id_col = @IDToHandle

        EXEC [dbo].[DeleteByID] @ID = @IDToHandle -- deletes the row from 2 other tables that are related to MAIN_TABLE and then from MAIN_TABLE

        set @iCounter = @iCounter + 1
    END

    IF (@iCounter % 1000 = 0)
    begin
        set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
        set @watch2 = 'IDToHandle - ' + CAST(@IDToHandle AS VARCHAR)
        raiserror (@watch1, 10, 1) with nowait
        raiserror (@watch2, 10, 1) with nowait
    end

    -- set the last handled to the one we just handled
    SET @LastID = @IDToHandle
    SET @IDToHandle = NULL

    -- select the next to handle
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col > @LastID and DATEDIFF(DD, someDateCol, otherDateCol) < 1
      and datediff(dd, someDateCol, @tempDate) > 0
      and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col
END
Any ideas or directions for improving this procedure's run time are welcome.
Yes, try this:
Declare @Ids Table (id int Primary Key not Null)

Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
  And someDateCol < @tempDate -- If there are times in these datetime fields,
                              -- then you may need to modify this condition.
  And some_other_int_col In (1745, 1548, 4785)
  And Not Exists (Select * From SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                  Where some_int_col = m.id_col)
  And Not Exists (Select * From A_70k_rows_table
                  Where some_int_col = m.id_col)

Select id from @Ids -- this is to confirm the above code generates the correct list of Ids
return              -- this line stops here (no inserts/deletes) until you have verified @Ids is correct
-- Once you have verified that @Ids above is correctly populated,
-- delete or comment out the select and return lines above so the insert runs.

Begin Transaction

Delete ot -- eliminate row-by-row call to second stored proc
From OtherTable ot
Join MAIN_TABLE m On m.id_col = ot.FKCol
Join @Ids i On i.Id = m.id_col

Insert SECONDERY_TABLE (col1, col2, etc.)
Select col1,col2,col3.....
FROM MAIN_TABLE m
Join @Ids i On i.Id = m.id_col

Delete m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
Join @Ids i On i.Id = m.id_col

Commit Transaction
Explanation:
You had several filter conditions that were not SARGable, i.e., they forced a complete table scan on every iteration of your loop instead of letting the optimizer use any existing index. Always try to avoid filter conditions that apply processing logic to a table column before comparing it to some other value; doing so removes the query optimizer's opportunity to use an index.
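To make that concrete, this is the kind of rewrite used above, with the column names from the question:

DECLARE @tempDate datetime = dateadd(dd, -20, getdate())

-- not SARGable: the column is wrapped in DATEDIFF, so an index on someDateCol cannot be used for a seek
SELECT id_col
FROM MAIN_TABLE
WHERE DATEDIFF(dd, someDateCol, otherDateCol) < 1
  AND DATEDIFF(dd, someDateCol, @tempDate) > 0

-- better: the bare column is compared directly (day-level rounding aside, as noted in the code above),
-- so an index on someDateCol can support the @tempDate filter
SELECT id_col
FROM MAIN_TABLE
WHERE someDateCol >= otherDateCol
  AND someDateCol < @tempDate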
You were also executing the inserts one at a time. It is far better to generate the list of PK ids that need to be processed all at once, and then do all the inserts in a single statement.
Is there a more efficient way to write this code? Or with less code?
SELECT *
INTO #Temp
FROM testtemplate

DECLARE @id INT
DECLARE @name VARCHAR(127)

WHILE (SELECT COUNT(*) FROM #Temp) > 0
BEGIN
    SELECT TOP 1 @id = testtemplateid FROM #Temp
    SELECT TOP 1 @name = name FROM #Temp

    UPDATE testtemplate
    SET testtemplate.vendortestcode = (SELECT test_code FROM test_code_lookup WHERE test_name = @name)
    WHERE testtemplateid = @id

    -- finish processing
    DELETE #Temp WHERE testtemplateid = @id
END

DROP TABLE #Temp
You can do this in a single UPDATE with no need to loop.
UPDATE tt
SET vendortestcode = tcl.test_code
FROM testtemplate tt
INNER JOIN test_code_lookup tcl
ON tt.name = tcl.test_name
You could try a single update like this:
UPDATE A
SET A.vendortestcode = B.test_code
FROM testtemplate A
INNER JOIN test_code_lookup B
ON A.name = B.test_name
Also, the way you are doing it now is wrong, since you are taking a TOP 1 id and a TOP 1 name in two separate queries without an ORDER BY, so there is no guarantee that you get the name that belongs to that id.
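For example, inside the loop both variables could be assigned from the same row in a single query (a small sketch keeping the question's temp table):

SELECT TOP 1 @id = testtemplateid,
             @name = name
FROM #Temp
ORDER BY testtemplateid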
You could write a function to update vendortestcode. Then your code reduces to one SQL statement:
update testtemplate set vendortestcode = dbo.get_test_code_from_name(name)
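A minimal sketch of such a function, assuming test_name is unique in test_code_lookup and that test_code is a VARCHAR (the function name matches the statement above but is otherwise illustrative):

CREATE FUNCTION dbo.get_test_code_from_name (@name VARCHAR(127))
RETURNS VARCHAR(50)   -- adjust to the actual type of test_code
AS
BEGIN
    DECLARE @code VARCHAR(50)

    SELECT @code = test_code
    FROM test_code_lookup
    WHERE test_name = @name

    RETURN @code
END

Keep in mind that a scalar function runs once per row, so the single join-based UPDATE shown above will usually be faster on large tables.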
I have a simple query that updates a table (30 columns and about 150,000 rows).
For example:
UPDATE tblSomeTable set F3 = @F3 where F1 = @F1
This query affects about 2,500 rows.
The table tblSomeTable has the following trigger:
ALTER TRIGGER [dbo].[trg_tblSomeTable]
ON [dbo].[tblSomeTable]
AFTER INSERT, DELETE, UPDATE
AS
BEGIN
    declare @operationType nvarchar(1)
    declare @createDate datetime
    declare @UpdatedColumnsMask varbinary(500) = COLUMNS_UPDATED()

    -- detect operation type
    if not exists (select top 1 * from inserted)
    begin
        -- delete
        SET @operationType = 'D'
        SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM deleted
    end
    else if not exists (select top 1 * from deleted)
    begin
        -- insert
        SET @operationType = 'I'
        SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM inserted
    end
    else
    begin
        -- update
        SET @operationType = 'U'
        SELECT @createDate = dbo.uf_DateWithCompTimeZone(CompanyId) FROM inserted
    end

    -- log data to tmp table
    INSERT INTO tbl1
    SELECT
        @createDate,
        @operationType,
        @status,
        @updatedColumnsMask,
        d.F1,
        i.F1,
        d.F2,
        i.F2,
        d.F3,
        i.F3,
        d.F4,
        i.F4,
        d.F5,
        i.F5,
        ...
    FROM (Select 1 as temp) t
    LEFT JOIN inserted i on 1 = 1
    LEFT JOIN deleted d on 1 = 1
END
When I execute the update query, I get a timeout.
How can I optimize the logic to avoid the timeout?
Thank you.
This query:
SELECT *
FROM (
SELECT 1 AS temp
) t
LEFT JOIN
INSERTED i
ON 1 = 1
LEFT JOIN
DELETED d
ON 1 = 1
will yield 2500 ^ 2 = 6,250,000 records from the cartesian product of INSERTED and DELETED (that is, all possible combinations of all records in both tables), which will be inserted into tbl1.
Is that what you wanted to do?
Most probably, you want to join the tables on their PRIMARY KEY:
INSERT
INTO tbl1
SELECT @createDate,
       @operationType,
       @status,
       @updatedColumnsMask,
d.F1,
i.F1,
d.F2,
i.F2,
d.F3,
i.F3,
d.F4,
i.F4,
d.F5,
i.F5,
...
FROM INSERTED i
FULL JOIN
DELETED d
ON i.id = d.id
This will treat update to the PK as deleting a record and inserting another, with a new PK.
Thanks Quassnoi, the FULL JOIN is a good idea. It helped me.
I also now update the table in portions (1,000 rows at a time) to make the code faster, because for some companyId values I need to update more than 160,000 rows.
Instead of the old code:
UPDATE tblSomeTable set someVal = @someVal where companyId = @companyId
I now use this:
declare @rc integer = 0
declare @parts integer = 0
declare @index integer = 0
declare @portionSize int = 1000

-- select Ids for update
declare @tempIds table (id int)

insert into @tempIds
select id from tblSomeTable where companyId = @companyId

-- calculate the number of iterations
set @rc = @@ROWCOUNT
set @parts = @rc / @portionSize + 1

-- update the table in portions
WHILE (@parts > @index)
begin
    UPDATE TOP (@portionSize) t
    SET someVal = @someVal
    FROM tblSomeTable t
    JOIN @tempIds t1 on t1.id = t.id
    WHERE companyId = @companyId

    delete top (@portionSize) from @tempIds

    set @index += 1
end
What do you think about this? Does it make sense? If so, how do I choose the right portion size?
Or is a simple single UPDATE also a good solution? I just want to avoid locks in the future.
Thanks