I have to update 40 million rows in SQL Server [closed]

The code shown here takes a very long time to populate 42,673,844 rows. Could anyone help me simplify it and improve its performance?
I am trying to populate the session IDs based on some conditions.
Declaration of variables:
declare @sval as INT
declare @maxRow as INT
declare @currvlt as int
declare @prevvlt as int
declare @cashamount as int
declare @sec as INT
declare @next_action as INT
declare @previous_action as INT
declare @rec as int
declare @currentRow as INT
-- assigning the maximum value to @maxRow
set @maxRow = (select max(number) from dbo.stage3 as count)
print @maxRow
set @sval = 0
set @currentRow = 1
I put the conditions in a WHILE loop to update all 42 million rows. The SELECT below assigns the values to the declared variables:
While (@currentRow <= @maxRow)
begin
    select
        @currvlt = vltid,
        @prevvlt = previous_vltid,
        @cashamount = isnull(playercashableamount, 0),
        @sec = seconds,
        @next_action = next_action, --next record type
        @previous_action = previous_action,
        @rec = recordtype
    from dbo.stage3
    where number = @currentRow
    if @currvlt <> @prevvlt or @prevvlt is null
    begin
        update dbo.stage3 set sessionid = @sval + 1 where number = @currentRow
        set @sval = @sval + 1
    end
    else
    if (@rec = '4' and @sec > 30 and @previous_action in (1, 5)) or
       (@cashamount < 1 and @sec < 30 and @next_action in (1, 4)) or
       ((@rec = '5' and @sec < 30 and @next_action = '4') or
       @next_action = '4')
    begin
        update dbo.stage3 set sessionid = @sval where number = @currentRow
    end
    else
        update dbo.stage3 set sessionid = @sval where number = @currentRow
    set @currentRow = @currentRow + 1
end

This might help. You should not need to do a row by row evaluation. Instead, address the entire set in one call when possible. Since you have provided no information about the business challenge you are trying to solve, this code is posted as-is. This may not be exactly what you're after, but it should get you going in the right direction. Please adjust for your own needs.
Also, any update on a table with 40 million rows will take a long time, even if you have perfectly tuned indexes on lightning-fast storage. Instead of thinking "how do I get the answer I want?", try thinking "what am I asking the server to do for me?" and "is there a better way to make this request?" SQL Server runs best when dealing with a set of data in one pass. Analyzing row by row is not what the engine is built to do. It can be done, but it should be used only when all other means have been exhausted.
The code below is a select statement. Run it, and if the results are what you're looking for, uncomment the UPDATE section, then comment out the SELECT section and run it.
SELECT CASE
WHEN recordtype = '4'
AND seconds > 30
AND previous_action IN (1, 4) THEN 0
WHEN s.cash_amount < 1
AND seconds < 30
AND next_action IN (1, 4) THEN 0
WHEN next_action = 4 THEN 0
WHEN (vltid <> previous_vltid)
OR previous_vltid IS NULL THEN s.RowNum
ELSE s.RowNum
END AS session_id
,s.vltid
,s.previous_vltid
,s.cash_amount
,s.recordtype
,s.previous_action
,s.seconds
,s.next_action
--UPDATE s
--SET sessionid = CASE
-- WHEN recordtype = '4'
-- AND seconds > 30
-- AND previous_action IN (1, 4) THEN 0
-- WHEN s.cash_amount < 1
-- AND seconds < 30
-- AND next_action IN (1, 4) THEN 0
-- WHEN next_action = 4 THEN 0
-- WHEN (vltid <> previous_vltid)
-- OR previous_vltid IS NULL THEN s.RowNum
-- ELSE s.RowNum
-- END
FROM (
SELECT sessionid
,ROW_NUMBER() OVER (PARTITION BY vltid
ORDER BY
vltid
) AS RowNum
,vltid
,previous_vltid
,ISNULL(playercashableamount, 0) AS cash_amount
,recordtype
,previous_action
,seconds
,next_action
FROM dbo.stage3
) AS s;
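
Reading the loop in the question, both branches of the inner IF write the same value, so the session ID only ever advances when vltid differs from previous_vltid (or previous_vltid is NULL). Under that reading, one set-based way to produce the same numbering is a running count of those boundary rows, ordered by number. This is a minimal sketch, not a tested drop-in: it assumes SQL Server 2012 or later (for a windowed SUM with ORDER BY) and that number is unique with no gaps; new_sessionid is a name made up for this example.

```sql
-- Sketch only: assumes dbo.stage3.number is unique and gap-free.
WITH numbered AS (
    SELECT sessionid,
           SUM(CASE WHEN previous_vltid IS NULL
                      OR vltid <> previous_vltid THEN 1 ELSE 0 END)
               OVER (ORDER BY number
                     ROWS UNBOUNDED PRECEDING) AS new_sessionid  -- hypothetical alias
    FROM dbo.stage3
)
UPDATE numbered
SET sessionid = new_sessionid;  -- one pass instead of 42 million single-row updates
```

To check it first, replace the UPDATE with SELECT * FROM numbered on a small sample and compare against what the loop produces.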

Related

While Loop SQL not populating complete results

Question: the iteration runs correctly only up to record 131; after that, the variable @ADE_End_Date returns NULL. Why would that be? My code is below.
I also noticed that the column Leave_Date has NULL values, and the variable @ADE_End_Date starts returning NULL at the row where those NULL values begin.
Thanks for your help.
BEGIN
    DECLARE @HIREDATEPlus1Yr DATETIME
    DECLARE @ADE_Leave_Date DATETIME
    DECLARE @ADE_End_Date DATETIME
    DECLARE @ADE_Start_Date DATETIME
    DECLARE @DATECAL DATETIME
    DECLARE @i INT
    DECLARE @j INT
    DECLARE @Loop_length INT
    DECLARE @ID VARCHAR(18)
    -- start of loop
    SET @j = 1
    -- Loop length will equal to the list of all ADRs
    SET @Loop_length = (SELECT COUNT([AD_ID])
                        FROM [DS_ADHOC_MOPs].[ADE].[List]
                        WHERE Status NOT IN ('MANAGER', 'TBH', 'FROZEN'))
    -- Loop through each ADRs
    WHILE (@j <= @Loop_length)
    BEGIN
        -- Loop through each ADRs
        SET @i = 0
        -- Find AD ID
        SET @ID = (SELECT TOP 1 [AD_ID] FROM [DS_ADHOC_MOPs].[ADE].[List]
                   WHERE [AD_ID] NOT IN (SELECT TOP (@j-1) [AD_ID]
                                         FROM [DS_ADHOC_MOPs].[ADE].[List]
                                         WHERE ([AD_ID] IS NOT NULL
                                         AND Status NOT IN ('MANAGER', 'TBH', 'FROZEN'))))
        -- Find the start date of the ADR
        SET @ADE_Start_Date = (SELECT TOP 1 [Hire_Date]
                               FROM [DS_ADHOC_MOPs].[ADE].[List]
                               WHERE [AD_ID] NOT IN (SELECT TOP (@j-1) [AD_ID]
                                                     FROM [DS_ADHOC_MOPs].[ADE].[List]
                                                     WHERE ([AD_ID] IS NOT NULL
                                                     AND Status NOT IN ('MANAGER', 'TBH', 'FROZEN'))))
        -- Hire date plus 1 year
        SET @HIREDATEPlus1Yr = DATEADD(YEAR, 1, @ADE_Start_Date)
        --Adding Leave Date
        SET @ADE_Leave_Date = (SELECT TOP 1 [LEAVE_DATE]
                               FROM [DS_ADHOC_MOPs].[ADE].[List]
                               WHERE [AD_ID] NOT IN (SELECT TOP (@j-1) [AD_ID]
                                                     FROM [DS_ADHOC_MOPs].[ADE].[List]
                                                     WHERE ([AD_ID] IS NOT NULL
                                                     AND Status NOT IN ('MANAGER', 'TBH', 'FROZEN'))))
        -- Set a temporary variable which will be 1 year from now. Use the Date ADD formulae to start date, if they are leaver before one year then add leave date (Use IF): DONE
        -- Put everything inside the while loop and add opportunity selecting to it.
        IF @ADE_Leave_Date IS NULL
            SET @ADE_End_Date = DATEADD(YEAR, 1, @ADE_Start_Date)
        ELSE IF @HIREDATEPlus1Yr < @ADE_Leave_Date
            SET @ADE_End_Date = DATEADD(YEAR, 1, @ADE_Start_Date)
        ELSE
            SET @ADE_End_Date = @ADE_Leave_Date
        SET @DATECAL = datediff(DAY, @ADE_Start_Date, @ADE_End_Date)
        SET @j = @j + 1
        UPDATE #TEMPTABLEEEE
        SET [#ADE_End_Date] = @ADE_End_Date
        WHERE @ID = AD_ID
    END
    SELECT * FROM #TEMPTABLEEEE
END

I'm not sure why you are using a WHILE loop. It looks like this code could be much simplified. SQL is a set based language. Whenever possible, you should try to handle your data as a whole set instead of breaking it down into row by row evaluations.
Does this give you what you need? If the table has more than one row for each AD_ID, you will need to get the MAX() or MIN() Hire_Date/LEAVE_DATE. To improve the answer, consider providing sample data.
UPDATE t
SET [#ADE_End_Date] = ed.ADE_EndDate
FROM #TEMPTABLEEEE AS t
INNER JOIN (
SELECT AD_ID
,CASE
WHEN LEAVE_DATE IS NULL THEN DATEADD(YEAR,1,Hire_Date)
WHEN DATEADD(YEAR,1,Hire_Date) < LEAVE_DATE THEN DATEADD(YEAR,1,Hire_Date)
ELSE LEAVE_DATE
END AS ADE_EndDate
FROM DS_ADHOC_MOPs.ADE.List
WHERE Status NOT IN ('MANAGER', 'TBH', 'FROZEN')
) AS ed ON t.AD_ID = ed.AD_ID

Insert data based on non existence and different criteria's [closed]

I'm trying to add a row to my table only when two conditions are met, but the insert returns an error and I cannot figure out why.
Create PROC [dbo].[setvisitorqueue]
@pid bigint = null, @vid int = NULL, @regdate nvarchar(50) = NULL
AS
declare @queNum int = null
set @queNum = (select max([ticketNo]) + 1 from [dbo].[queue] where [ticketdate] = GetDate())
if (@queNum is null) begin set @queNum = 1 end
Declare @Tktt int = null
set @Tktt = (select count(queue.ticketid) from queue where (queue.pid = @pid) and (queue.ticketdate = GetDate()) and (queue.vid = @vid and queue.checked = 0))
if (@Tktt is null)
begin insert into queue (vid, pid, ticketNo, ticketdate) Values (@Vid, @pid, @queNum, @regdate) end
It's not working for me.

Can you try it a simpler way, like this?
CREATE PROC [dbo].[setvisitorqueue]
    @pid BIGINT = NULL,
    @vid INT = NULL,
    @regdate NVARCHAR(50) = NULL
AS
IF (
    SELECT COUNT(ticketid)
    FROM [dbo].[queue]
    WHERE checked = 0 AND pid = @pid AND vid = @vid AND ticketdate = GETDATE()
) = 0
    INSERT INTO [dbo].[queue] (vid, pid, ticketdate, ticketNo)
    SELECT @vid, @pid, @regdate, ticketNo = ISNULL(MAX([ticketNo]), 0) + 1
    FROM [dbo].[queue]
    WHERE [ticketdate] = GETDATE();
RETURN;
GO
In this code I've done the following:
Improved readability with consistent capitalization, indentation, spacing, etc.
Eliminated the variables; you do not need them in this code. You do not need to calculate a ticket number at the beginning if it won't be used, so it is now calculated only when needed, inside the IF statement.
Removed the BEGIN-END blocks. You do not need BEGIN-END around every single statement; a single statement is already a transaction in its own right.
I'm not sure what your error was, but your procedure won't insert anything, because COUNT always returns a number. That means your @Tktt variable would never be NULL.
I guess your intention is to run the INSERT statement when no records are found, so I compared the COUNT query to 0.

Here is your SP with all the issues I spotted corrected with comments, and with best practices added. As noted by the other answer you can probably simplify things. I have just aimed to correct existing issues.
-- NOTES: Keep your casing and layout consistent
-- Always terminate statements with a semi-colon
-- Don't add unnecessary brackets, they just clutter the code
-- You also have a concurrency issue:
-- if this proc is called twice at the same time you could issue the same ticket number twice
create proc [dbo].[setvisitorqueue]
(
    @pid bigint = null,
    @vid int = null,
    -- Only ever use a datetime/date datatype to store a datetime/date. Datetime2 is the current standard. Change precision to suit.
    @regdate datetime2(0) = null
)
as
begin
    -- Always start your SP with
    set nocount, xact_abort on;
    declare @queNum int = null;
    set @queNum = (
        select max([ticketNo]) + 1
        from dbo.[queue]
        -- This seems very unlikely to happen? It has to match down to the fraction of a second.
        where [ticketdate] = getdate()
    );
    if @queNum is null begin
        set @queNum = 1;
    end;
    declare @Tktt int = null;
    -- @Tktt will *never* be null after this, it may be zero though.
    set @Tktt = (
        select count(*)
        from dbo.[queue]
        where pid = @pid
        -- This seems very unlikely to happen? It has to match down to the fraction of a second.
        and ticketdate = getdate()
        and vid = @vid and checked = 0
    );
    -- Handle 0 or null just in case
    -- if @Tktt is null -- THIS IS WHAT PREVENTED YOUR INSERT
    if coalesce(@Tktt, 0) = 0
    begin
        insert into dbo.[queue] (vid, pid, ticketNo, ticketdate)
        values (@vid, @pid, @queNum, @regdate);
    end;
    -- Always return the status of the SP, 0 means OK
    return 0;
end;
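
Both answers point out that ticketdate = getdate() has to match down to the fraction of a second. If the intent was "tickets issued today", a half-open date range is one way to express that. The sketch below assumes ticketdate holds a date-and-time value and that "same calendar day" is what the asker meant; @today, @tomorrow and nextTicketNo are names made up for this example.

```sql
-- Sketch only: assumes dbo.[queue].ticketdate stores a date/time value.
declare @today    datetime2(0) = cast(cast(sysdatetime() as date) as datetime2(0));  -- midnight today
declare @tomorrow datetime2(0) = dateadd(day, 1, @today);                           -- midnight tomorrow

-- Next ticket number for today; starts at 1 when no ticket exists yet.
select coalesce(max(ticketNo), 0) + 1 as nextTicketNo
from dbo.[queue]
where ticketdate >= @today
  and ticketdate <  @tomorrow;
```

Comparing the column against a range, rather than wrapping the column in a function, leaves an index on ticketdate usable. It does not address the concurrency issue noted above.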

SQL Server: is this a bug or do I have a misunderstanding?

Today I found a very sticky problem on SQL Server 2014.
Scenario: I want to pay awards to my customers (PIN codes for a cell phone operator).
In the last cycle of the loop, the T.Used = 0 condition is bypassed and does not work. I know there is a mistake in another condition of that query, T.Cash < (@myAwards - @paid), and that I should use T.Cash <= (@myAwards - @paid) instead, but please focus on the main question.
Why, after I update the Used flag to 1 (true), is the same row selected again in the next iteration even though it no longer satisfies the condition T.Used = 0?
DECLARE @myAwards INT = 90000,
        @paid INT = 0;
DECLARE @Temp TABLE
(
    Id INT NOT NULL,
    Pin VARCHAR(100) NOT NULL,
    Cash INT NOT NULL,
    [Weight] INT NULL,
    Used BIT NOT NULL
)
INSERT INTO @Temp
SELECT
    UPFI.Id, UPFI.PinCode,
    PT.Cash, NULL, 0
FROM
    dbo.UploadedPinFactorItem UPFI WITH (NOLOCK)
INNER JOIN
    dbo.PinType PT WITH (NOLOCK) ON PT.ID = UPFI.PinTypeID
WHERE
    PT.Cash <= @myAwards
UPDATE T
SET [Weight] = ISNULL((SELECT COUNT(TT.Id)
                       FROM @Temp TT
                       WHERE TT.Cash = T.Cash), 0) * T.Cash
FROM @Temp T
--For debug (first picture)
SELECT * FROM @Temp
DECLARE @i int = 1
DECLARE @count int = 0
SELECT @count = COUNT([Id]) FROM @Temp C WHERE C.Used = 0
WHILE (@i <= @count AND @paid < @myAwards)
BEGIN
    DECLARE @nextId INT,
            @nextCash INT,
            @nextFlag BIT;
    -- 'T.Used = 0' condition is bypassed
    SELECT TOP (1)
        @nextId = T.Id, @nextCash = T.Cash, @nextFlag = T.Used
    FROM
        @Temp T
    WHERE
        T.Used = 0
        AND T.Cash < (@myAwards - @paid)
    ORDER BY
        T.[Weight] DESC, T.Cash DESC, T.Id DESC
    UPDATE @Temp
    SET Used = 1
    WHERE Id = @nextId
    SET @i = @i + 1
    SET @paid = @paid + @nextCash
    --Show result in second picture
    SELECT
        @i AS 'i', @paid AS 'paid', @nextFlag AS 'flag', @nextId AS 'marked Id', *
    FROM
        @Temp T
    ORDER BY
        T.[Weight] DESC, T.Cash DESC, T.Id DESC
END
SELECT 'final', @paid, *
FROM @Temp T
ORDER BY T.[Weight] DESC, T.Cash DESC, T.Id DESC
Please help me understand whether this is a bug or a misunderstanding on my part.
First screenshot:
Second screenshot (result of loop):
Third screenshot (final result):

As per my comments:
This isn't a problem with the condition; the problem is with the implemented logic. After i = 4, there are no more rows where T.Used = 0 AND T.Cash < (@myAwards - @paid), so the SELECT that reassigns your variables returns zero rows and the variables keep their previous values.
You can test this behavior by running:
DECLARE @A INT = 10;
SELECT @A = object_id
FROM sys.all_objects
WHERE name = 'an object that doesn''t exist'
SELECT @A;
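
If stale values are the concern, there are two common ways to notice that nothing was assigned. The sketch below reuses the same test query; @B is an extra variable added for this example. It shows SET with a scalar subquery, which assigns NULL when no row is returned, and a check of @@ROWCOUNT immediately after a SELECT assignment.

```sql
DECLARE @A INT = 10;

-- SET with a scalar subquery assigns NULL when the subquery returns no rows.
SET @A = (SELECT object_id
          FROM sys.all_objects
          WHERE name = 'an object that doesn''t exist');
SELECT @A;  -- NULL; the old value 10 is not kept

-- Alternatively, keep the SELECT assignment and test @@ROWCOUNT right after it.
DECLARE @B INT = 10;  -- hypothetical second variable for the comparison
SELECT @B = object_id
FROM sys.all_objects
WHERE name = 'an object that doesn''t exist';
IF @@ROWCOUNT = 0
    SET @B = NULL;  -- inside a WHILE loop, this is where a BREAK could go
SELECT @B;  -- NULL
```

In the loop from the question, either check would let the code stop as soon as no unused PIN fits the remaining amount.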

How to optimize this t-sql script code by avoiding loop?

I use the following SQL query to update MyTable. The code takes between 5 and 15 minutes as long as the table has at most 100,000,000 rows, but beyond 100,000,000 rows the update time grows dramatically. How can I change this code to be set-based instead of using a WHILE loop?
DECLARE @startTime DATETIME
DECLARE @batchSize INT
DECLARE @iterationCount INT
DECLARE @i INT
DECLARE @from INT
DECLARE @to INT
SET @batchSize = 10000
SET @i = 0
SELECT @iterationCount = COUNT(*) / @batchSize
FROM MyTable
WHERE LitraID = 8175
AND id BETWEEN 100000000 AND 300000000
WHILE @i <= @iterationCount BEGIN
    BEGIN TRANSACTION T
    SET @startTime = GETDATE()
    SET @from = @i * @batchSize
    SET @to = (@i + 1) * @batchSize - 1
    ;WITH data
    AS (
        SELECT DoorsReleased, ROW_NUMBER() OVER (ORDER BY id) AS Row
        FROM MyTable
        WHERE LitraID = 8175
        AND id BETWEEN 100000000 AND 300000000
    )
    UPDATE data
    SET DoorsReleased = ~DoorsReleased
    WHERE row BETWEEN @from AND @to
    SET @i = @i + 1
    COMMIT TRANSACTION T
END

One of your issues is that your select statement in the loop fetches all records for LitraID = 8175, sets row numbers, then filters in the update statement. This happens on every iteration.
One way round this would be to get all ids for the update before entering the loop and storing them in a temporary table. Then you can write a similar query to the one you have, but joining to this table of ids.
However, there is an even easier way if you know approximately how many records have LitraID = 8175 and if they are spread throughout the table, not bunched together with similar ids.
DECLARE @batchSize INT
DECLARE @minId INT
DECLARE @maxId INT
SET @batchSize = 10000 --adjust according to how frequently LitraID = 8175, larger numbers if infrequent
SET @minId = 100000000
WHILE @minId <= 300000000 BEGIN
    SET @maxId = @minId + @batchSize - 1
    IF @maxId > 300000000 BEGIN
        SET @maxId = 300000000
    END
    BEGIN TRANSACTION T
    UPDATE MyTable
    SET DoorsReleased = ~DoorsReleased
    WHERE LitraID = 8175 -- keep the original filter so only the intended rows are flipped
    AND id BETWEEN @minId AND @maxId
    COMMIT TRANSACTION T
    SET @minId = @maxId + 1
END
This will use the value of id to control the loop, meaning you don't need the extra step to calculate #iterationCount. It uses small batches so that the table isn't locked for long periods. It doesn't have any unnecessary SELECT statements and the WHERE clause in the update is efficient assuming id has an index.
It won't have exactly the same number of records updated in every transaction, but there's no reason it needs to.
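
The temporary-table approach mentioned earlier in this answer (collect the ids once, then join to them in each batch) could look roughly like this. It is a sketch under assumptions: id is taken to be unique in MyTable, and #ids, rn, IX_ids_rn, @from and @maxRn are names made up for this example.

```sql
-- Sketch only: assumes MyTable.id is unique.
-- Collect the target ids once and number them, instead of re-numbering on every iteration.
SELECT id,
       ROW_NUMBER() OVER (ORDER BY id) AS rn
INTO #ids
FROM MyTable
WHERE LitraID = 8175
  AND id BETWEEN 100000000 AND 300000000;

CREATE UNIQUE CLUSTERED INDEX IX_ids_rn ON #ids (rn);

DECLARE @batchSize INT = 10000;
DECLARE @from BIGINT = 1;
DECLARE @maxRn BIGINT = (SELECT MAX(rn) FROM #ids);

WHILE @from <= @maxRn
BEGIN
    -- Each UPDATE is its own transaction and touches exactly one batch of ids.
    UPDATE t
    SET DoorsReleased = ~DoorsReleased
    FROM MyTable AS t
    INNER JOIN #ids AS i ON i.id = t.id
    WHERE i.rn BETWEEN @from AND @from + @batchSize - 1;

    SET @from = @from + @batchSize;
END;

DROP TABLE #ids;
```

Unlike batching by raw id ranges, every batch here updates the same number of rows (except possibly the last), at the cost of one extra scan to build #ids.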

This will eliminate the loop:
UPDATE MyTable
SET DoorsReleased = ~DoorsReleased
WHERE LitraID = 8175
AND id BETWEEN 100000000 AND 300000000
AND DoorsReleased IS NOT NULL -- if DoorsReleased is nullable
-- AND DoorsReleased <> ~DoorsReleased (retracted)
If you are set on looping, note that the code below will NOT work. I thought ~ was part of the column name, but it is the NOT operator, so the condition DoorsReleased <> ~DoorsReleased is true for every non-NULL row and the loop keeps flipping the same rows.
select 1;
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE top (100000) MyTable
    SET DoorsReleased = ~DoorsReleased
    WHERE LitraID = 8175
    AND id BETWEEN 100000000 AND 300000000
    AND ( DoorsReleased <> ~DoorsReleased
          or ( DoorsReleased is null and ~DoorsReleased is not null )
        )
END
Inside a transaction I don't think looping would have value, as the transaction log cannot clear. And a batch size of 10,000 is small.
As stated in a comment, if you want to loop, then try batching on id; running ROW_NUMBER() on every iteration is expensive.
You might be able to use OFFSET.

Using a dummy variable to keep a count

I have a user (X) who has been sent 15 emails. There is a field called Opened which is 1 or 0 depending on whether he opened the email. I want to work down the list (ordered by date) and start a count each time I get to a zero, i.e. count each email he gets until he opens one. Then I want to average all the counts. I have tried partitioning and various other things, but nothing works.
Sorry if I am putting this in the wrong place. This is my final code, should anyone want to do something similar.
Declare @UserCount as Int
Declare @counter as int
Declare @StartEnd as varchar(10)
SET @counter = 1
SET @UserCount = 1
--Step through the table by user
WHILE @UserCount < (SELECT COUNT(UserID) FROM #Stage2)
BEGIN
    --UserProcessOrder is the row number - update the dummy parameter @StartEnd
    SET @StartEnd = (SELECT TOP(1) StartEnd FROM #Stage2 WHERE UserProcessOrder = @UserCount)
    --Update the table with the counter value
    UPDATE #Stage2 SET MaxBeforeOpen = @counter where UserProcessOrder = @UserCount
    --Add 1 to the counter so that every time we see START we add 1
    SET @counter = @counter + 1
    --If we are at the end of the group then re-set the counter
    IF @StartEnd = 'END' SET @counter = 1
    --Go to next record for that user
    SET @UserCount = @UserCount + 1
END
--select * from #Stage2
SELECT AVG(MaxBeforeOpen) as AverageEmailsBeforeOpen
FROM
(
    SELECT *
    from #Stage2
    WHERE StartEnd = 'END'
) as x
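
For comparison, the same count-until-opened idea can be written without a loop by treating each opened email as the end of a group. The sketch below uses invented sample data: @Emails and its columns (UserID, SentDate, Opened), as well as grp and EmailsBeforeOpen, are assumptions for illustration and not the asker's #Stage2 layout. It needs SQL Server 2012 or later for the window frame.

```sql
-- Sketch only: @Emails and its columns are made-up sample data.
DECLARE @Emails TABLE (UserID INT, SentDate DATE, Opened BIT);
INSERT INTO @Emails (UserID, SentDate, Opened)
VALUES (1, '2020-01-01', 0), (1, '2020-01-02', 0), (1, '2020-01-03', 1),
       (1, '2020-01-04', 1), (1, '2020-01-05', 0), (1, '2020-01-06', 1);

WITH grouped AS (
    -- grp = number of opened emails strictly before this one,
    -- so every run that ends with an open shares one grp value.
    SELECT UserID, Opened,
           COALESCE(SUM(CAST(Opened AS INT)) OVER (
               PARTITION BY UserID ORDER BY SentDate
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM @Emails
),
runs AS (
    -- Count the emails in each run; drop a trailing run that was never opened.
    SELECT UserID, grp, COUNT(*) AS EmailsBeforeOpen
    FROM grouped
    GROUP BY UserID, grp
    HAVING MAX(CAST(Opened AS INT)) = 1
)
SELECT AVG(CAST(EmailsBeforeOpen AS DECIMAL(10, 2))) AS AverageEmailsBeforeOpen
FROM runs;  -- runs of 3, 1 and 2 emails average to 2
```

The cast to DECIMAL avoids the integer division that AVG performs on an INT column.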
