Speed up simple update statement in postgres for 1 million rows - sql

I have a very simple sql update statement in postgres.
UPDATE p2sa.observation SET file_path = replace(file_path, 'path/sps', 'newpath/p2s')
The observation table has 1513128 rows. The query so far has been running for around 18 hours with no end in sight.
The file_path column is not indexed so I guess it is doing a top to bottom scan but it seems a bit excessive the time. Probably replace is also a slow operation.
Is there some alternative or better approach for doing this one off kind of update which affects all rows. It is essentially updating an old file path to a new location. It only needs to be updated once or maybe again in the future.
Thanks.

In SQL you could do a while loop to update in batches.
Try this to see how it performs.
Declare #counter int
Declare #RowsEffected int
Declare #RowsCnt int
Declare #CodeId int
Declare #Err int
DECLARE #MaxNumber int = (select COUNT(*) from p2sa.observation)
SELECT #COUNTER = 1
SELECT #RowsEffected = 0
WHILE ( #RowsEffected < #MaxNumber)
BEGIN
SET ROWCOUNT 10000
UPDATE p2sa.observation
SET file_path = replace(file_path, 'path/sps', 'newpath/p2s')
where file_path != 'newpath/p2s'
SELECT #RowsCnt = ##ROWCOUNT ,#Err = ##error
IF #Err <> 0
BEGIN
Print 'Problem Updating the records'
BREAK
END
ELSE
SELECT #RowsEffected = #RowsEffected + #RowsCnt
PRINT 'The total number of rows effected :'+convert(varchar,#RowsEffected)
/*delaying the Loop for 10 secs , so that Update is completed*/
WAITFOR DELAY '00:00:10'
END
SET ROWCOUNT 0

Related

SQL Stored Procedure - Understanding SQL Statement

New to stored procedures. Can anyone explain the following SQL sample which appears at the start of a stored procedure?
Begin/End - Encloses a series of SQL statements so that a group of SQL statements can be executed
SET NOCOUNT ON - the count (indicating the number of rows affected by a SQL statement) is not returned.
DECLARE - setting local variables
While - loops round
With - unsure
Update batch - unsure
SET #Rowcount = ##ROWCOUNT; - unsure
BEGIN
SET NOCOUNT ON;
--UPDATE, done in batches to minimise locking
DECLARE #Batch INT= 100;
DECLARE #Rowcount INT= #Batch;
WHILE #Rowcount > 0
BEGIN
WITH t
AS (
SELECT [OrganisationID],
[PropertyID],
[QuestionID],
[BaseAnsweredQuestionID]
FROM dbo.Unioned_Table
WHERE organisationid = 1),
s
AS (
SELECT [OrganisationID],
[PropertyID],
[QuestionID],
[BaseAnsweredQuestionID]
FROM dbo.table
WHERE organisationid = 1),
batch
AS (
SELECT TOP (#Batch) T.*,
s.BaseAnsweredQuestionID NewBaseAnsweredQuestionID
FROM T
INNER JOIN s ON t.organisationid = s.organisationid
AND t.PropertyID = s.PropertyID
AND t.QuestionID = s.QuestionID
WHERE t.BaseAnsweredQuestionID <> s.BaseAnsweredQuestionID)
UPDATE batch
SET
BaseAnsweredQuestionID = NewBaseAnsweredQuestionID
SET #Rowcount = ##ROWCOUNT;
END;
The clue is in the comment --UPDATE, done in batches to minimise locking.
The intent is to update dbo.table's column BaseAnsweredQuestionID with the equivalent column from dbo.Unioned_Table, in batches of 100. The comment suggests the batching logic is necessary to prevent locking.
In detail:
DECLARE #Batch INT= 100; sets the batch size.
DECLARE #Rowcount INT= #Batch; initializes the loop.
WHILE #Rowcount > 0 starts the loop. #Rowcount will become zero when the update statement affects no rows (see below).
with a as () is a common table expression (commonly abbreviated to CTE) - it creates a temporary result set which you can effectively treat as a table. The next few queries define CTEs t, s and batch.
CTE batch contains just 100 rows by using the SELECT TOP (#Batch) term - it selects a random 100 rows from the two other CTEs.
The next statement:
UPDATE batch
SET BaseAnsweredQuestionID = NewBaseAnsweredQuestionID
SET #Rowcount = ##ROWCOUNT
updates the 100 rows in the batch CTE (which in turn is a join on two other CTEs), and populates the loop variable #Rowcount with the number of rows affected by the update statement (##ROWCOUNT). If there are no matching rows, ##ROWCOUNT becomes zero, and thus the loop ends.

WAITFOR DELAY doesn't act separately within each WHILE loop

I've been teaching myself to use WHILE loops and decided to try making a fun Russian Roulette simulation. That is, a query that will randomly SELECT (or PRINT) up to 6 statements (one for each of the chambers in a revolver), the last of which reads "you die!" and any prior to this reading "you survive."
I did this by first creating a table #Nums which contains the numbers 1-6 in random order. I then have a WHILE loop as follows, with a BREAK if the chamber containing the "bullet" (1) is selected (I know there are simpler ways of selecting a random number, but this is adapted from something else I was playing with before and I had no interest in changing it):
SET NOCOUNT ON
CREATE TABLE #Nums ([Num] INT)
DECLARE #Count INT = 1
DECLARE #Limit INT = 6
DECLARE #Number INT
WHILE #Count <= #Limit
BEGIN
SET #Number = ROUND(RAND(CONVERT(varbinary,NEWID()))*#Limit,0,1)+1
IF NOT EXISTS (SELECT [Num] FROM #Nums WHERE [Num] = #Number)
BEGIN
INSERT INTO #Nums VALUES(#Number)
SET #Count += 1
END
END
DECLARE #Chamber INT
WHILE 1=1
BEGIN
SET #Chamber = (SELECT TOP 1 [Num] FROM #Nums)
IF #Chamber = 1
BEGIN
SELECT 'you die!' [Unlucky...]
BREAK
END
SELECT
'you survive.' [Phew...]
DELETE FROM #Nums WHERE [Num] = #Chamber
END
DROP TABLE #Nums
This works fine, but the results all appear instantaneously, and I want to add a delay between each one to add a bit of tension.
I tried using WAITFOR DELAY as follows:
WHILE 1=1
BEGIN
WAITFOR DELAY '00:00:03'
SET #Chamber = (SELECT TOP 1 [Num] FROM #Nums)
IF #Chamber = 1
BEGIN
SELECT 'you die!' [Unlucky...]
BREAK
END
SELECT
'you survive.' [Phew...]
DELETE FROM #Nums WHERE [Num] = #Chamber
END
I would expect the WAITFOR DELAY to initially cause a 3 second delay, then for the first SELECT statement to be executed and for the text to appear in the results grid, and then, assuming the live chamber was not selected, for there to be another 3 second delay and so on, until the live chamber is selected.
However, before anything appears in my results grid, there is a delay of 3 seconds per number of SELECT statements that are executed, after which all results appear at the same time.
I tried using PRINT instead of SELECT but encounter the same issue.
Clearly there's something I'm missing here - can anyone shed some light on this?
It's called buffering. The server doesn't want to return an only partially full response because most of the time, there's all of the networking overheads to account for. Lots of very small packets is more expensive than a few larger packets1.
If you use RAISERROR (don't worry about the name here where we're using 10) you can specify NOWAIT to say "send this immediately". There's no equivalent with PRINT or returning result sets:
SET NOCOUNT ON
CREATE TABLE #Nums ([Num] INT)
DECLARE #Count INT = 1
DECLARE #Limit INT = 6
DECLARE #Number INT
WHILE #Count <= #Limit
BEGIN
SET #Number = ROUND(RAND(CONVERT(varbinary,NEWID()))*#Limit,0,1)+1
IF NOT EXISTS (SELECT [Num] FROM #Nums WHERE [Num] = #Number)
BEGIN
INSERT INTO #Nums VALUES(#Number)
SET #Count += 1
END
END
DECLARE #Chamber INT
WHILE 1=1
BEGIN
WAITFOR DELAY '00:00:03'
SET #Chamber = (SELECT TOP 1 [Num] FROM #Nums)
IF #Chamber = 1
BEGIN
RAISERROR('you die!, Unlucky',10,1) WITH NOWAIT
BREAK
END
RAISERROR('you survive., Phew...',10,1) WITH NOWAIT
DELETE FROM #Nums WHERE [Num] = #Chamber
END
DROP TABLE #Nums
As Larnu already aluded to in comments, this isn't a good use of T-SQL.
SQL is a set-oriented language. We try not to write procedural code (do this, then do that, then run this block of code multiple times). We try to give the server as much as possible in a single query and let it work out how to process it. Whilst T-SQL does have language support for loops, we try to avoid them if possible.
1I'm using packets very loosely here. Note that it applies the same optimizations no matter what networking (or no-networking-local-memory) option is actually being used to carry the connection between client and server.

SQL/PLSQL - automate the following through batch

Please help me to do the following,
First I need to connect to sql plus with username/Passwd#SID to check for the value.
For this I have variable to the query like below,
declare #Extract uniqueidentifier
select #Extract = select extract from integration where number='1000';
Based on the value (say 0), it has to come out of the loop to trigger the batch.
If the value (say 1), batch should not be triggered and mail should be sent might be through batch .
Thanks for your time!
Here an example if I think I know what you are asking for
Declare #Extract int
Declare #Count int
set #Count = 1
while #Count < 12 --(select MAX(number) + 1 from tablename)
Begin
set #Extract = #count--(select extract from tablename where number = #count)
if #Extract = 0
Begin
--Trigger batch Code
select #count
End
set #count = #count + 1
End
Be sure to uncomment the select statement to do an efficient loop of the table

Does ms sql server not take ReadPast Hint in Functions?

I have a pending order table with a check constraint to prevent people from ordering an item we don't have in stock. This required me to create a counter function to decide if an insert can happen or not. It works until there is 1 item left in inventory then I get a message that we are out of stock of the item. I thought it was a dirty read issue but even after interducing a ReadPast hint I still see this behavior. Is there some other factor causing this problem? Or do I need to setup the isolation level differently?
I have tried calling this function with the sprokID and it returns true which is why I am thinking during insert there is a dirty read taking place.
ALTER TABLE [dbo].[PendingSprokOrders] WITH CHECK ADD CONSTRAINT [CK_SprokInStock] CHECK (([dbo].[SprokInStockCount]([SprokID])=(1)))
FUNCTION [dbo].[SprokInStockCount] ( #SprokId INT )
RETURNS INT
AS
BEGIN
DECLARE #Active INT
SET #Active = ( SELECT COUNT(*)
FROM [PendingSprokOrders] AS uac WITH(READPAST)
WHERE uac.SprokID = #SprokId
)
DECLARE #Total INT
SET #Total = ( SELECT
ISNULL(InStock, 0)
FROM SprokInvetory
WHERE id = #SprokId
)
DECLARE #Result INT
IF #Total - #Active > 0
SET #Result = 1
ELSE
SET #Result = 0
RETURN #Result;
END;
The math is off. Instead of:
IF #Total - #Active > 0
SET #Result = 1
ELSE
SET #Result = 0
it should be:
IF #Total - #Active > -1
SET #Result = 1
ELSE
SET #Result = 0
That's because your constraint function can see the row that you are attempting to add and is counting it.
Yes it does but your set#total statements are contradictory also there are a couple breaks in your code.

I need to run a stored procedure on multiple records

I need to run a stored procedure on a bunch of records. The code I have now iterates through the record stored in a temp table. The stored procedure returns a table of records.
I was wondering what I can do to avoid the iteration if anything.
set #counter = 1
set #empnum = null
set #lname = null
set #fname = null
-- get all punches for employees
while exists(select emp_num, lname, fname from #tt_employees where id = #counter)
begin
set #empnum = 0
select #empnum = emp_num, #lname = lname , #fname= fname from #tt_employees where id = #counter
INSERT #tt_hrs
exec PCT_GetEmpTimeSp
empnum
,#d_start_dt
,#d_end_dt
,#pMode = 0
,#pLunchMode = 3
,#pShowdetail = 0
,#pGetAll = 1
set #counter = #counter + 1
end
One way to avoid this kind of iteration is to analyze the code within the stored procedure and revised so that, rather than processing for one set of inputs at a time, it processes for all sets of inputs at a time. Often enough, this is not possible, which is why iteration loops are not all that uncommon.
A possible alternative is to use APPLY functionality (cross apply, outer apply). To do this, you'd rewrite the procedure as one of the table-type functions, and work that function into the query something like so:
INSERT #tt_hrs
select [columnList]
from #tt_employees
cross apply dbo.PCT_GetEmpTimeFunc(emp_num, #d_start_dt, #d_end_dt, 0, 3, 0, 1)
(It was not clear where all your inputs to the procedure were coming from.)
Note that you still are iterating over calls to the function, but now it's "packed" into one query.
I think you are on the right track.
you can have a temp table with identity column
CREATE TABLE #A (ID INT IDENTITY(1,1) NOT NULL, Name VARCHAR(50))
After records are inserted in to this temp table, find the total number of records in the table.
DECLARE #TableLength INTEGER
SELECT #TableLength = MAX(ID) FROM #A
DECLARE #Index INT
SET #Index = 1
WHILE (#Index <=#TableLength)
BEGIN
-- DO your work here
SET #Index = #Index + 1
END
Similar to what you have already proposed.
Alternative to iterate over records is to use CURSOR. CURSORS should be avoided at any cost.