SQL Count Unique Deleted Sets

I currently have a simple table in my database that stores sets and values. I want to be able to delete all entries in the table and return the number of distinct sets that were deleted.
create table Sets (
    SetId varchar(50),
    Value int
);
If I have two sets, each with two values, then the table will be loaded with four entries:
Set1, 0
Set1, 1
Set2, 0
Set2, 1
If I delete everything I want to be able to count how many unique SetIds were deleted, so in the example above it should return 2.
Right now I accomplish this by creating a temp table that captures the deleted SetIds, and then counting the distinct values:
CREATE TABLE #temp
(
SetId varchar(50)
);
delete from Sets
OUTPUT DELETED.SetId INTO #temp
select count(distinct SetId) from #temp;
Is there a better way to accomplish this without having to use a temp table?

If you have many rows and want to avoid a temp table (a lot of I/O):
declare @cnt int;
set xact_abort on;

begin transaction
begin try
    select @cnt = count(distinct SetId) from sets;
    delete from sets;
    commit transaction
end try
begin catch
    rollback;
end catch
Or, replacing the DELETE with TRUNCATE (minimally logged, but it requires ALTER permission on the table):
declare @cnt int;
set xact_abort on;

begin transaction
begin try
    select @cnt = count(distinct SetId) from sets;
    truncate table sets;
    commit transaction
end try
begin catch
    rollback;
end catch
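One caveat with both versions: under the default READ COMMITTED isolation, rows inserted by another session between the SELECT and the DELETE could be deleted without being counted. A minimal sketch that closes that window by taking an exclusive table lock up front (same Sets table as above):

declare @cnt int;
set xact_abort on;

begin transaction
    -- tablockx + holdlock holds an exclusive table lock until commit,
    -- so nothing can slip in between the count and the delete
    select @cnt = count(distinct SetId)
    from sets with (tablockx, holdlock);

    delete from sets;
commit transaction

select @cnt as DeletedSetCount;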


Best way to lock SQL Server database

I'm writing a C++ application that is connecting to a SQL Server database via ODBC.
I need an Archive function so I'm going to write a stored procedure that takes a date. It will total up all the transactions and payments prior to that date for each customer, update the customer's starting balance accordingly, and then delete all transactions and payments prior to that date.
It occurs to me that it could be very bad if someone else is adding or deleting transactions or payments at the same time this stored procedure runs. Therefore, I'm thinking I should lock the entire database while the procedure executes; it would not run often.
I'm curious if my logic is good and what would be the best way to lock the entire database for such a purpose.
UPDATE:
Based on user12069178's answer, here's what I've come up with so far. Would appreciate any feedback on it.
ALTER PROCEDURE [dbo].[ArchiveData] @ArchiveDateTime DATETIME
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    DECLARE @TempTable TABLE
    (
        CustomerId INT,
        Amount BIGINT
    );

    BEGIN TRANSACTION;

    -- Archive transactions
    DELETE Transactions WITH (TABLOCK)
    OUTPUT deleted.CustomerId, deleted.TotalAmount INTO @TempTable
    WHERE [TimeStamp] < @ArchiveDateTime;

    IF EXISTS (SELECT 1 FROM @TempTable)
    BEGIN
        -- Restrict to customers that actually had rows archived;
        -- otherwise the correlated SUM is NULL and wipes out StartingBalance
        UPDATE Customers SET StartingBalance = StartingBalance +
            (SELECT SUM(Amount) FROM @TempTable temp WHERE Customers.Id = temp.CustomerId)
        WHERE Id IN (SELECT CustomerId FROM @TempTable);
    END;

    DELETE FROM @TempTable;

    -- Archive payments
    DELETE Payments WITH (TABLOCK)
    OUTPUT deleted.CustomerId, deleted.Amount INTO @TempTable
    WHERE [Date] < @ArchiveDateTime;

    IF EXISTS (SELECT 1 FROM @TempTable)
    BEGIN
        UPDATE Customers SET StartingBalance = StartingBalance -
            (SELECT SUM(Amount) FROM @TempTable temp WHERE Customers.Id = temp.CustomerId)
        WHERE Id IN (SELECT CustomerId FROM @TempTable);
    END;

    COMMIT TRANSACTION;
END
Generally the way to make sure that the rows you are deleting are the ones that you are totalling and inserting is to use the OUTPUT clause while deleting. It can output the rows that were selected for deletion.
Here's a setup that will give us some transactions:
USE tempdb;
GO

DROP TABLE IF EXISTS dbo.Transactions;
GO

CREATE TABLE dbo.Transactions
(
    TransactionID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_dbo_Transactions
        PRIMARY KEY,
    TransactionAmount decimal(18,2) NOT NULL,
    TransactionDate date NOT NULL
);
GO

SET NOCOUNT ON;
DECLARE @Counter int = 1;

WHILE @Counter <= 50
BEGIN
    INSERT dbo.Transactions
    (
        TransactionAmount, TransactionDate
    )
    VALUES (ABS(CHECKSUM(NEWID())) % 10 + 1, DATEADD(day, 0 - @Counter * 3, GETDATE()));
    SET @Counter += 1;
END;

SELECT * FROM dbo.Transactions;
GO
Now the following code deletes the rows past a cutoff, outputs the deleted amounts into a table variable in the same statement, and then inserts the total row back into the transactions table.
DECLARE @CutoffDate date = DATEADD(day, 1, EOMONTH(DATEADD(month, -2, GETDATE())));

DECLARE @TransactionAmounts TABLE
(
    TransactionAmount decimal(18,2)
);

BEGIN TRAN;

DELETE dbo.Transactions WITH (TABLOCK)
OUTPUT deleted.TransactionAmount INTO @TransactionAmounts
WHERE TransactionDate < @CutoffDate;

IF EXISTS (SELECT 1 FROM @TransactionAmounts)
BEGIN
    INSERT dbo.Transactions (TransactionAmount, TransactionDate)
    SELECT SUM(TransactionAmount), DATEADD(day, 1, @CutoffDate)
    FROM @TransactionAmounts;
END;

COMMIT;
I usually try to avoid specifying locks whenever possible, but based on your suggestion I've added one. Without the table lock the statement would still be consistent; it would just mean that if someone inserts a new "old" row while this is running, that row is neither totalled nor deleted. Making the transaction serializable would also achieve the outcome, and would lock less than the table lock as long as the number of rows being deleted stays below the lock escalation threshold (5,000 by default).
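For reference, a minimal sketch of that serializable variant, reusing the @CutoffDate and @TransactionAmounts declarations from above; the key-range locks it takes cover only the rows that qualify:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRAN;

DELETE dbo.Transactions
OUTPUT deleted.TransactionAmount INTO @TransactionAmounts
WHERE TransactionDate < @CutoffDate;

-- ...same conditional INSERT of the summary row as above...

COMMIT;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;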
Hope that helps.

Catching multiple errors in loop SQL query

I have the below insert query which selects records from the OriginalData table, where everything is of datatype nvarchar(max), and inserts them into a temp table that has specific column definitions, e.g. MainAccount is of type INT.
I am doing a row-by-row insert because if there is a record in the OriginalData table where the MainAccount value is 'Test', then it will obviously cause a conversion error and the insert will fail. The TRY...CATCH block is used to update the table with the error.
However, if there are multiple errors on the same row, I want to be able to capture them all, not just the first one.
TRUNCATE TABLE [Temp];

DECLARE @RowId INT, @MaxRowId INT;

SET @RowId = 1;

SELECT @MaxRowId = MAX(RowId)
FROM [Staging].[FactFinancialsCoded_Abbas_InitialValidationTest];

WHILE (@RowId <= @MaxRowId)
BEGIN
    BEGIN TRY
        INSERT INTO [Temp] (ExtractSource, MainAccount,
                            RecordLevel1Code, RecordLevel2Code, RecordTypeNo,
                            TransDate, Amount, PeriodCode, CompanyCode)
        SELECT
            ExtractSource, MainAccount,
            RecordLevel1Code, RecordLevel2Code, RecordTypeNo,
            TransDate, Amount, PeriodCode, DataAreaId
        FROM
            [Staging].[FactFinancialsCoded_Abbas_InitialValidationTest]
        WHERE
            RowId = @RowId;
        PRINT @RowId;
    END TRY
    BEGIN CATCH
        UPDATE [Staging].[FactFinancialsCoded_Abbas_InitialValidationTest]
        SET ValidationErrors = ERROR_MESSAGE()
        WHERE RowId = @RowId;
    END CATCH
    SET @RowId += 1;
END
Instead of doing it this way, I handle this by using TRY_PARSE() or TRY_CONVERT() on each column that I am converting to a non-string column.
If you then need to store the validation failures in another table, you can make a second pass getting all the rows that have a non-null value in the source table and a null value in the destination table, and insert those rows into your "failed validation" table.
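A minimal sketch of what that could look like, assuming the Temp table's MainAccount, TransDate, and Amount columns are INT, DATE, and DECIMAL respectively (adjust the target types to your actual schema):

-- One set-based pass: a value that fails conversion becomes NULL
-- instead of aborting the whole INSERT.
INSERT INTO [Temp] (ExtractSource, MainAccount,
                    RecordLevel1Code, RecordLevel2Code, RecordTypeNo,
                    TransDate, Amount, PeriodCode, CompanyCode)
SELECT
    ExtractSource,
    TRY_CONVERT(int, MainAccount),           -- NULL if not a valid int
    RecordLevel1Code, RecordLevel2Code, RecordTypeNo,
    TRY_CONVERT(date, TransDate),            -- NULL if not a valid date
    TRY_CONVERT(decimal(18, 2), Amount),     -- NULL if not a valid decimal
    PeriodCode, DataAreaId
FROM [Staging].[FactFinancialsCoded_Abbas_InitialValidationTest];

-- Second pass: flag source rows whose value was present but failed to convert.
UPDATE src
SET ValidationErrors = 'MainAccount failed conversion'
FROM [Staging].[FactFinancialsCoded_Abbas_InitialValidationTest] src
WHERE src.MainAccount IS NOT NULL
  AND TRY_CONVERT(int, src.MainAccount) IS NULL;

If a row can fail on several columns, building ValidationErrors with CONCAT (one fragment per failed column) records them all in a single pass.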

Batch deletion correctly formatted?

I have multiple tables with millions of rows in them. To be safe and not overflow the transaction log, I am deleting them in batches of 100,000 rows at a time. I first filter by date, then delete all rows older than a certain date.
To do this, I am creating a table variable in my stored procedure which holds the IDs of the rows that need to be deleted.
I then insert into that table and delete the rows from the desired tables using loops. This seems to run successfully, but it is extremely slow. Is this being done correctly? Is this the fastest way to do it?
DECLARE @FILL_ID_TABLE TABLE (
    FILL_ID varchar(16)
);

DECLARE @TODAYS_DATE date;
SELECT @TODAYS_DATE = GETDATE();

-- This deletes all data older than 2 weeks ago from today
DECLARE @_DATE date;
SET @_DATE = DATEADD(WEEK, -2, @TODAYS_DATE);

DECLARE @BatchSize int;
SELECT @BatchSize = 100000;

BEGIN TRAN FUTURE_TRAN;
BEGIN TRY
    INSERT INTO @FILL_ID_TABLE
    SELECT DISTINCT ID
    FROM dbo.ID_TABLE
    WHERE CREATED < @_DATE;

    SELECT @BatchSize = 100000;
    WHILE @BatchSize <> 0
    BEGIN
        DELETE TOP (@BatchSize) FROM TABLE1
        OUTPUT DELETED.* INTO dbo.TABLE1_ARCHIVE
        WHERE ID IN (SELECT FILL_ID FROM @FILL_ID_TABLE);
        SET @BatchSize = @@ROWCOUNT;
    END

    SELECT @BatchSize = 100000;
    WHILE @BatchSize <> 0
    BEGIN
        DELETE TOP (@BatchSize) FROM TABLE2
        OUTPUT DELETED.* INTO dbo.TABLE2_ARCHIVE
        WHERE ID IN (SELECT FILL_ID FROM @FILL_ID_TABLE);
        SET @BatchSize = @@ROWCOUNT;
    END

    PRINT 'Succeed';
    COMMIT TRANSACTION FUTURE_TRAN;
END TRY
BEGIN CATCH
    PRINT 'Failed';
    ROLLBACK TRANSACTION FUTURE_TRAN;
END CATCH
Try a join instead of a subquery:
DELETE TOP (@BatchSize) T1
OUTPUT DELETED.* INTO dbo.TABLE1_ARCHIVE
FROM TABLE1 AS T1
JOIN @FILL_ID_TABLE AS FIL ON FIL.FILL_ID = T1.ID
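A related tweak, assuming the FILL_ID values are unique (not something stated in the original post): declare the column as the table variable's primary key, so the join has a unique index to work with.

DECLARE @FILL_ID_TABLE TABLE (
    FILL_ID varchar(16) PRIMARY KEY  -- creates a unique clustered index
);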

Insertion of records based on some condition

I'm trying to insert a few records from a temporary table using a SQL Server stored procedure. The temporary table has a percentage column and a PQ number column, and it may contain more than one row with the same PQ number. For the insert to happen, the percentages for a given PQ number must sum to 100%. I couldn't write the WHERE clause for this situation.
CREATE PROCEDURE [dbo].[Upsert_DebitSheet]
    @filename VARCHAR(250)
AS
BEGIN
    SET XACT_ABORT ON
RETRY: -- Label RETRY
    BEGIN TRANSACTION
    BEGIN TRY
        SET NOCOUNT ON;
        INSERT INTO [dbo].[DebitSheet]([Date], [RMMName], [Invoice], [PQNumber], [CAF],
                                       [Percentage], [Amount], [FileName])
        SELECT
            *, @filename
        FROM
            (SELECT
                 [Date], [RMMName], [Invoice], [PQNumber], [CAF],
                 [Percentage], [Amount]
             FROM
                 [dbo].[TempDebitSheet]
             WHERE /* the condition I couldn't write */) result
        SELECT @@ROWCOUNT
        TRUNCATE TABLE [dbo].[TempDebitSheet]
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE()
        ROLLBACK TRANSACTION
        IF ERROR_NUMBER() = 1205 -- Deadlock Error Number
        BEGIN
            WAITFOR DELAY '00:00:00.05' -- Wait 50 ms
            GOTO RETRY -- Go to Label RETRY
        END
    END CATCH
    SET ROWCOUNT 0;
END
(Screenshots of the temporary table's contents and the expected result in the main table omitted.)
You can use a subquery in the WHERE clause:
INSERT INTO [dbo].[DebitSheet]
    ([Date]
    ,[RMMName]
    ,[Invoice]
    ,[PQNumber]
    ,[CAF]
    ,[Percentage]
    ,[Amount]
    ,[FileName])
SELECT [Date]
    ,[RMMName]
    ,[Invoice]
    ,[PQNumber]
    ,[CAF]
    ,[Percentage]
    ,[Amount]
    ,@filename
FROM [dbo].[TempDebitSheet]
WHERE EXISTS (
    SELECT tmp.[PQNumber]
    FROM [dbo].[TempDebitSheet] tmp
    WHERE tmp.[PQNumber] = [TempDebitSheet].[PQNumber]
    GROUP BY tmp.[PQNumber]
    HAVING SUM(tmp.[Percentage]) = 100
)
Also modify your original query like this:
INSERT INTO ...
SELECT result.*, @filename FROM (....) result
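The same 100% check can also be written with a window function instead of the correlated EXISTS; a sketch of the equivalent query, computing each PQNumber's total once and filtering on it:

INSERT INTO [dbo].[DebitSheet] ([Date], [RMMName], [Invoice], [PQNumber], [CAF],
                                [Percentage], [Amount], [FileName])
SELECT [Date], [RMMName], [Invoice], [PQNumber], [CAF],
       [Percentage], [Amount], @filename
FROM (SELECT *,
             SUM([Percentage]) OVER (PARTITION BY [PQNumber]) AS PctTotal
      FROM [dbo].[TempDebitSheet]) result
WHERE result.PctTotal = 100;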

Update multiple rows in table from table variable

I'm writing a stored procedure to update multiple records based on a table variable parameter.
The existing table is: Tb_Project_Image with relevant columns:
id PK (identity 1,1)
cat_ord decimal(4,2)
The procedure will receive a temporary table variable (shown in the code below) containing the id as PI_ID, and the new value for cat_ord as newCatOrd. idx is a simple identity for each row containing 1...n where n is the rowcount of #tempTable.
For each row in #tempTable, I want to update Tb_Project_Image where id = PI_ID to the corresponding value.
DECLARE @tempTable table (
    idx smallint Primary Key IDENTITY(1,1),
    PI_ID bigint,
    newCatOrd decimal(4, 2) not null )

INSERT INTO @tempTable values (3, 7.01)
INSERT INTO @tempTable values (4, 7.02)
INSERT INTO @tempTable values (5, 7.03)
--etc...

DECLARE @error int
DECLARE @update int
DECLARE @iter int
SET @iter = 1

BEGIN TRAN
WHILE @iter <= (select COUNT(*) from @tempTable)
BEGIN
    UPDATE Tb_Project_Image
    SET cat_ord = (SELECT newCatOrd FROM @tempTable
                   WHERE idx = @iter)
    WHERE id = (SELECT PI_ID FROM @tempTable
                WHERE idx = @iter)

    --error checking
    set @error = @@ERROR
    set @update = @@ROWCOUNT

    IF ((@error = 0) AND (@update = 1))
    BEGIN
        SET @iter = @iter + 1
        CONTINUE
    END
    ELSE
        BREAK
END

IF ((@error = 0) AND (@update = 1))
    COMMIT TRAN
ELSE
    ROLLBACK TRAN
GO
Now, the error checking is there because, to ensure integrity, EACH row in the temporary table MUST produce exactly 1 update. (explanation omitted to save space) If a single iteration of the while loop threw an error, or didn't affect exactly 1 row, I want to break the loop and roll back the transaction.
THE PROBLEM I'm having is that this error checking is not working. I'm currently running it with 14 rows in @tempTable and the 11th uses a PI_ID not found in the Tb_Project_Image table. Therefore, @update = 0... but it continues the loop and commits the data.
I'd be doubly glad if someone had a method of doing this that only used a single update statement.
You cannot do it this way, because every statement, even SET, resets @@ERROR and @@ROWCOUNT. In this case @@ROWCOUNT is set to 1 by set @error = @@ERROR. If you do not assign the values to local variables, your code will work:
IF ((@@error = 0) AND (@@rowcount = 1))
But you might rather use TRY...CATCH error handling and test @@ROWCOUNT separately after the update.
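If you do want the values in local variables, the usual workaround is to capture both in a single SELECT immediately after the UPDATE, since one statement reads both system functions before either is reset. A minimal sketch of just the changed lines:

-- error checking: one statement captures both values atomically
SELECT @error = @@ERROR, @update = @@ROWCOUNT;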
UPDATE: doing it in a single update:
UPDATE t
SET cat_ord = tt.newCatOrd
FROM Tb_Project_Image t
INNER JOIN @tempTable tt
    ON t.id = tt.PI_ID

-- If there was a PI_ID not found in Tb_Project_Image...
-- But I think that this should have been dealt with
-- during the initial loading of the temporary table
IF @@ROWCOUNT <> (select count(*) from @tempTable)
BEGIN
    -- Error reporting here
    ROLLBACK TRANSACTION
END
Instead of updating and then rolling back, you could also use a CTE to determine if any records should be updated prior to performing the update. Something like this should work:
WITH NON_SINGLETON AS (
    -- Find any records in @tempTable that don't match
    -- exactly one record in Tb_Project_Image
    SELECT t.PI_ID, COUNT(pi.id) C
    FROM @tempTable t
    LEFT JOIN Tb_Project_Image pi ON t.PI_ID = pi.id
    GROUP BY t.PI_ID
    HAVING COUNT(pi.id) != 1
)
UPDATE pi
SET cat_ord = t.newCatOrd
FROM Tb_Project_Image pi
JOIN @tempTable t ON pi.id = t.PI_ID
-- If any invalid records were found in the CTE,
-- then this condition will fail for all rows
-- and nothing will be updated
WHERE NOT EXISTS (SELECT 1 FROM NON_SINGLETON)
If it's possible for @tempTable to have duplicate entries for the same PI_ID, then this will handle those scenarios as well. And since it's a single statement, you don't have to explicitly manage the transaction in the proc (if it's the only thing that needs to be included in the transaction).