Automatic Deletion of Duplicate SQL Records based on date or count

After importing Excel data from a LightSwitch application into a holding table in SQL Server I end up with duplicate records. I need a way to remove the duplicates that can either be executed from LightSwitch or something that will automatically run in SQL after/during insert. I thought about a trigger, but I'm not sure it's the best solution.
The duplicates will be something like this:
DocName|DocUser|DocType|DocDate
test|user1|word|10/12/2012
test|user1|word|10/12/2012
test2|user2|word|10/11/2012
test2|user2|word|10/12/2012
In the case of the first set of duplicates either record can be deleted so I have one record.
However in the second case the record with the date of 10/11/2012 would need to be deleted.
I'm not opposed to a stored procedure if it can be executed from LightSwitch. I know it can be done with a series of queries, but I'm not sure how those could be executed from LightSwitch.

I have no experience with LightSwitch, so apologies if any of this is not relevant, but speaking from the SQL side, a stored procedure you could use to delete the duplicates is:
CREATE PROCEDURE dbo.DeleteDuplicatesFromT
AS
BEGIN
    -- Number the rows in each duplicate group, newest DocDate first.
    WITH CTE AS
    (   SELECT  DocName,
                DocUser,
                DocType,
                DocDate,
                [RowNumber] = ROW_NUMBER() OVER(PARTITION BY DocName, DocUser, DocType ORDER BY DocDate DESC)
        FROM    T
    )
    -- Keep row 1 of each group and delete the rest.
    DELETE  CTE
    WHERE   RowNumber > 1;
END
Example on SQLFiddle
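To try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. This is a stand-in, not the procedure above: SQLite cannot delete through a CTE, so the sketch deletes by rowid instead, and it assumes SQLite 3.25 or later for window functions. The dates are stored as ISO strings (an assumption) so that they sort correctly as text.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep only the newest row per (DocName, DocUser, DocType)."""
    conn.execute(
        """
        DELETE FROM T
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY DocName, DocUser, DocType
                           ORDER BY DocDate DESC
                       ) AS RowNumber
                FROM T
            )
            WHERE RowNumber > 1
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (DocName TEXT, DocUser TEXT, DocType TEXT, DocDate TEXT)")
# The sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        ("test", "user1", "word", "2012-10-12"),
        ("test", "user1", "word", "2012-10-12"),
        ("test2", "user2", "word", "2012-10-11"),
        ("test2", "user2", "word", "2012-10-12"),
    ],
)

delete_duplicates(conn)
remaining = conn.execute(
    "SELECT DocName, DocUser, DocType, DocDate FROM T ORDER BY DocName"
).fetchall()
print(remaining)
```

For exact duplicates it does not matter which copy survives. For the second pair, the descending date order means the 10/11/2012 row is the one removed.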
HOWEVER
I'd advise managing this before or during the insert stage. You can either do this in the application code, ensuring that only unique records are passed to the table to begin with, or use a procedure to perform the insert. To do the latter, you will first need to create a TYPE to handle your new records:
CREATE TYPE dbo.TableTypeParameter AS TABLE
( DocName VARCHAR(5),
DocUser VARCHAR(5),
DocType VARCHAR(4),
DocDate DATETIME
);
You can fill this in your client code (using System.Data.DataTable) and pass it as a parameter to your stored procedure:
CREATE PROCEDURE dbo.BulkInsert @NewRecords dbo.TableTypeParameter READONLY
AS
BEGIN
    -- Collapse the incoming batch to one row per document, keeping the latest date.
    WITH NewRecords AS
    (   SELECT  DocName, DocType, DocUser, DocDate = MAX(DocDate)
        FROM    @NewRecords
        GROUP BY DocName, DocType, DocUser
    )
    MERGE INTO T
    USING NewRecords nr
        ON T.DocName = nr.DocName
        AND T.DocType = nr.DocType
        AND T.DocUser = nr.DocUser
    WHEN MATCHED AND nr.DocDate > T.DocDate THEN UPDATE
        SET DocDate = nr.DocDate
    WHEN NOT MATCHED THEN INSERT (DocName, DocUser, DocType, DocDate)
        VALUES (nr.DocName, nr.DocUser, nr.DocType, nr.DocDate);
END;
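The "insert, or update only when the incoming date is newer" rule can be exercised locally with the sketch below. It is not the MERGE procedure itself: it swaps in SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24 or later) from Python, and it assumes a unique index on (DocName, DocUser, DocType), which the original table may not have. The index name UQ_T_Doc and the helper insert_keep_latest are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (DocName TEXT, DocUser TEXT, DocType TEXT, DocDate TEXT)")
# Assumption: the three columns identify a document, so they can carry a unique index.
conn.execute("CREATE UNIQUE INDEX UQ_T_Doc ON T (DocName, DocUser, DocType)")


def insert_keep_latest(conn, rows):
    """Insert new documents; for existing ones keep whichever DocDate is later."""
    conn.executemany(
        """
        INSERT INTO T (DocName, DocUser, DocType, DocDate)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (DocName, DocUser, DocType)
        DO UPDATE SET DocDate = excluded.DocDate
        WHERE excluded.DocDate > T.DocDate
        """,
        rows,
    )
    conn.commit()


# First import: the sample rows from the question (ISO date strings).
insert_keep_latest(conn, [
    ("test", "user1", "word", "2012-10-12"),
    ("test", "user1", "word", "2012-10-12"),
    ("test2", "user2", "word", "2012-10-11"),
    ("test2", "user2", "word", "2012-10-12"),
])
# Second import with an older date: the stored row must not change.
insert_keep_latest(conn, [("test2", "user2", "word", "2012-10-01")])

stored = conn.execute(
    "SELECT DocName, DocUser, DocType, DocDate FROM T ORDER BY DocName"
).fetchall()
print(stored)
```

Because the rows are applied one at a time here, duplicates inside a single batch are resolved without the GROUP BY / MAX step that the MERGE version needs.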
EDIT
The procedure to insert can fairly easily be turned into a trigger if this is what is required:
CREATE TRIGGER dbo.T_InsteadOfInsert
ON T
INSTEAD OF INSERT
AS
BEGIN
    WITH NewRecords AS
    (   SELECT  DocName, DocType, DocUser, DocDate = MAX(DocDate)
        FROM    inserted
        GROUP BY DocName, DocType, DocUser
    )
    MERGE INTO T
    USING NewRecords nr
        ON T.DocName = nr.DocName
        AND T.DocType = nr.DocType
        AND T.DocUser = nr.DocUser
    WHEN MATCHED AND nr.DocDate > T.DocDate THEN UPDATE
        SET DocDate = nr.DocDate
    WHEN NOT MATCHED THEN INSERT (DocName, DocUser, DocType, DocDate)
        VALUES (nr.DocName, nr.DocUser, nr.DocType, nr.DocDate);
END

A user on the MSDN forums had a similar problem using the Excel Importer. He wrote a few functions to check and validate his tables to prevent entry of and remove existing duplicates. Take a look: Automatic Validation Error Handling.

Related

How can I do what the trigger does with the store procedure?

I have a trigger on my tbl_permissions table.
The trigger is:
ALTER TRIGGER [dbo].[trig_permissions] ON [dbo].[tbl_permissions]
FOR UPDATE
AS
IF (UPDATE(email) OR UPDATE(gsm))
BEGIN
    INSERT INTO dbo.tbl_permissions_log
        (customer_id,type,email_new_value,email_old_value,gsm_new_value,gsm_oldu_value,modify_user_id,modify_date)
    SELECT
        i.customer_id,
        d.type,
        i.email,
        d.email,
        i.gsm,
        d.gsm,
        i.modify_user_id,
        GETDATE()
    FROM inserted i, deleted d, dbo.tbl_permissions c
    WHERE c.pk_id = i.pk_id AND c.pk_id = d.pk_id AND
        (RTRIM(d.email) <> RTRIM(i.email)
        OR (RTRIM(d.gsm) <> RTRIM(i.gsm)))
END
The trigger is fired by two stored procedures.
The first one:
ALTER PROC [dbo].[sp_activation_email_update]
(
    @sEmail NVARCHAR(100),
    @lModifyUserId INT,
    @bEML BIT
)
AS
BEGIN
    UPDATE dbo.tbl_permissions
    SET email=@bEML,
        modify_user_id=@lModifyUserId,
        modify_date=GETDATE()
    WHERE customer_id
        IN(SELECT customer_id FROM dbo.tbl_contact_info WHERE email=@sEmail)
END
The second:
ALTER PROC [dbo].[sp_activation_sms_update]
(
    @sGsmNo NVARCHAR(15),
    @lModifyUserId INT,
    @bGsm BIT
)
AS
BEGIN
    UPDATE dbo.tbl_permissions
    SET gsm=@bGsm,
        modify_user_id=@lModifyUserId,
        modify_date=GETDATE()
    WHERE customer_id
        IN(SELECT customer_id FROM dbo.tbl_contact_info WHERE gsm_no=RIGHT(@sGsmNo, 10))
END
I want to remove the trigger because of a performance problem. How can I perform the trigger's work inside the stored procedures?
I tried calling another stored procedure from inside the update stored procedures and doing the work in that new procedure, but I couldn't get it to work.

In your stored procedures, you can use the OUTPUT clause to capture data from the UPDATE statement, and then use that captured data to insert rows into the log table.
Something like the following:
-- The bit types below are an assumption (email is set from @bEML, a BIT); match the real column type.
DECLARE @Updated TABLE (pk_id int, newEmail bit, oldEmail bit, modify_user_id int)
UPDATE dbo.tbl_permissions
SET email=@bEML,
    modify_user_id=@lModifyUserId,
    modify_date=GETDATE()
OUTPUT inserted.pk_id, inserted.email, deleted.email, inserted.modify_user_id
    INTO @Updated
WHERE customer_id
    IN(SELECT customer_id FROM dbo.tbl_contact_info WHERE email=@sEmail)
INSERT INTO dbo.tbl_permissions_log
    (customer_id,type,email_new_value,email_old_value,gsm_new_value,gsm_oldu_value,modify_user_id,modify_date)
SELECT
    p.customer_id,
    p.type,
    u.newEmail,
    u.oldEmail,
    p.gsm, -- gsm is not changed by this update, so new and old are the same
    p.gsm,
    u.modify_user_id,
    GETDATE()
FROM @Updated u
JOIN dbo.tbl_permissions p ON p.pk_id = u.pk_id
WHERE u.newEmail <> u.oldEmail
You can also send the output straight into the log table, but the statement gets pretty cluttered at that point.
UPDATE dbo.tbl_permissions
SET email=@bEML,
    modify_user_id=@lModifyUserId,
    modify_date=GETDATE()
OUTPUT
    inserted.customer_id,
    inserted.type,
    inserted.email,
    deleted.email,
    inserted.gsm,
    deleted.gsm,
    inserted.modify_user_id,
    GETDATE()
INTO dbo.tbl_permissions_log
    (customer_id,type,email_new_value,email_old_value,gsm_new_value,gsm_oldu_value,modify_user_id,modify_date)
WHERE customer_id
    IN(SELECT customer_id FROM dbo.tbl_contact_info WHERE email=@sEmail)
    AND email <> @bEML
In both cases we limit the log insert to cases where the updated value actually changed. In the latter case, this necessitates applying that condition to the actual update statement. This would modify the original behavior by also inhibiting the update to modify_user_id and modify_date when email is unchanged. (This might be a positive change.)

Conditionally insert a row if it does not exist already

[Note: I found a few answers for this, such as 9911659 and 16636698, but they were not quite clear enough on the syntax.]
I want to insert a row into a table, but only if that row does not already exist. The column values for the inserted row come from variables (procedure arguments) and not from another table, so I won't be using merge.
I do not want to use a separate if exists followed by an insert, but rather I'd like to accomplish this in a single (insert) statement.
I have @bookID, @userID, @reviewDate, and @reviewYear as arguments to my proc, which I want to insert as a new row into table my_table.
So I have this:
insert into my_table
    (bookID, reviewYear, userID, reviewDate)
select
    @bookID, @reviewYear, @userID, @reviewDate -- Proc arguments, values for new row
from my_table
where not exists (
    select bookID -- Find existing matching row
    from my_table
    where bookID = @bookID
    and reviewYear = @reviewYear
)
In other words, the insert adds a new row only if there is not already an existing row with the same bookID and reviewYear. So a given user can add a review for a given book for a given year, but only if no user has already done so.
Have I got this correct, or is there a simpler syntax to accomplish the same thing (in a single statement)?
Addendum (2020-Jan-10)
As pointed out, the select will choose multiple rows, and the whole insert statement will end up inserting many rows, potentially as many rows as there are currently in my_table.
A second attempt, which adds a distinct clause to the select:
insert into my_table
    (bookID, reviewYear, userID, reviewDate)
select distinct -- Only one possible match
    @bookID, @reviewYear, @userID, @reviewDate -- Proc arguments, values for new row
from my_table
where not exists (
    select bookID -- Find existing matching row
    from my_table
    where bookID = @bookID
    and reviewYear = @reviewYear
)

I would recommend catching an error instead:
create unique index unq_my_table on my_table(bookID, reviewYear);
begin try
    insert into my_table (bookID, reviewYear, userID, reviewDate)
    values (@bookID, @reviewYear, @userID, @reviewDate); -- Proc arguments, values for new row
end try
begin catch
    -- do something here if you want
end catch;
Your code does not work because you are selecting from the table. You will get one inserted row for every row already in the table, so you are likely to insert duplicates.
To prevent duplication, let the database ensure uniqueness. This is one of the things a database can guarantee, and a unique index or constraint does exactly that.
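The same "let the unique index reject it and catch the error" pattern can be tried without SQL Server. The sketch below uses Python's sqlite3 module as a stand-in, so sqlite3.IntegrityError plays the role of the T-SQL CATCH block. The helper name add_review and the sample values are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (bookID INTEGER, reviewYear INTEGER, userID INTEGER, reviewDate TEXT)"
)
# The database, not the application, enforces one review per book per year.
conn.execute("CREATE UNIQUE INDEX unq_my_table ON my_table (bookID, reviewYear)")


def add_review(conn, book_id, review_year, user_id, review_date):
    """Return True if the row was inserted, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO my_table (bookID, reviewYear, userID, reviewDate) "
                "VALUES (?, ?, ?, ?)",
                (book_id, review_year, user_id, review_date),
            )
        return True
    except sqlite3.IntegrityError:
        # Unique index violation: a review for this book and year already exists.
        return False


first = add_review(conn, 1, 1999, 111, "2019-09-11")
second = add_review(conn, 1, 1999, 222, "2019-09-12")
row_count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(first, second, row_count)
```

The second call targets the same bookID and reviewYear, so the index rejects it and the table still holds a single row.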

Edited:
Based on the clearer description above (and possibly on my coffee), I should note that you CAN use MERGE with just variables. Using your parameters above, here is one method of doing that:
WITH Src AS (
    SELECT
        @bookID AS bookID,
        @reviewYear AS reviewYear,
        @userID AS userID,
        @reviewDate AS reviewDate
)
MERGE my_table AS TARGET
USING Src AS SOURCE
    ON TARGET.bookID = SOURCE.bookID
    AND TARGET.reviewYear = SOURCE.reviewYear
WHEN NOT MATCHED BY TARGET -- BY TARGET is optional and is the default
    THEN INSERT (bookID, reviewYear, userID, reviewDate)
    VALUES (SOURCE.bookID, SOURCE.reviewYear, SOURCE.userID, SOURCE.reviewDate);
Original Answer:
Actually, I ran this code as posted, and it did not correctly enter the data into the table. Your basic problem here is your SELECT ... FROM my_table. This will attempt to insert as many rows into your table as the table contains. So, if the table is empty, no rows will be inserted, but if it has 20 rows, another 20 rows will be inserted.
Here is a correct method to do this. It uses your basic logic, but takes the conditional check out of the INSERT statement.
CREATE TABLE #my_table (BookID int, ReviewYear int, UserId int, ReviewDate date)
DECLARE @BookID int = 1,
    @ReviewYear int = 1999,
    @UserId int = 111,
    @ReviewDate date = '2019-09-11'
IF NOT EXISTS (
    select * -- Find existing matching row
    from #my_table
    where bookID = @bookID
    and reviewYear = @reviewYear
)
    insert into #my_table
        (bookID, reviewYear, userID, reviewDate)
    VALUES
        (@bookID, @reviewYear, @userID, @reviewDate) -- Proc arguments, values for new row
SELECT *
FROM #my_table

Invalid column name error in SQL Server after adding new column

At first the solution worked perfectly. Then, as required by our project manager, I added two columns to a table. Since then, one insert/update stored procedure no longer works; it shows "Invalid column name" (naming the two newly added columns). I think some details are cached somewhere, but I don't know how to find and fix that.
I tried the following:
Removed all constraints and ran the stored procedure again, but it made no difference.
Removed the two newly added columns, and it works perfectly again.
Tried adding the columns through an ALTER query.
My stored procedure is:
ALTER PROCEDURE [Page].[SP_INSERT_EXPERIENCEDETAILS]
(@EXPERIENCEDETAILS [PAGE].[EXPERIENCEDETAILS] READONLY)
AS --drop PROCEDURE [Page].[SP_INSERT_EXPERIENCEDETAILS]
BEGIN
    DECLARE @TEMPTABLE AS TABLE
    (
        ID INT,
        [ACTION] VARCHAR(50)
    )
    MERGE INTO [PAGE].[EXPERIENCEDETAILS] AS TARGET
    USING (SELECT
               ID, Description, ISCurrent, COMPANYID,
               Designationid, locationid, FROMDAY, FromMonth, FromYear,
               TODAY, TOMONTH, Toyear
           FROM
               @EXPERIENCEDETAILS) AS SOURCE ON TARGET.ID = SOURCE.ID
    WHEN MATCHED THEN
        UPDATE
        SET TARGET.[DESCRIPTION] = SOURCE.[DESCRIPTION],
            TARGET.ISCURRENT = SOURCE.ISCURRENT,
            TARGET.COMPANYID = SOURCE.COMPANYID,
            TARGET.DESIGNATIONID = SOURCE.DESIGNATIONID,
            TARGET.LOCATIONID = SOURCE.LOCATIONID,
            TARGET.FROMDAY = SOURCE.FROMDAY,
            TARGET.FROMMONTH = SOURCE.FROMMONTH,
            TARGET.FROMYEAR = SOURCE.FROMYEAR,
            TARGET.TODAY = SOURCE.TODAY,
            TARGET.TOMONTH = SOURCE.TOMONTH,
            TARGET.TOYEAR = SOURCE.TOYEAR
    WHEN NOT MATCHED THEN
        INSERT
        VALUES (SOURCE.MEMBERID, SOURCE.PAGEID, SOURCE.COMPANYID,
                SOURCE.DESIGNATIONID, SOURCE.LOCATIONID,
                SOURCE.FROMDAY, SOURCE.FROMMONTH, SOURCE.FROMYEAR,
                SOURCE.TODAY, SOURCE.TOMONTH, SOURCE.TOYEAR,
                SOURCE.[DESCRIPTION], SOURCE.[ISCURRENT],
                SOURCE.ENTRYDATE)
    OUTPUT INSERTED.ID, $ACTION INTO @TEMPTABLE;
    SELECT ID FROM @TEMPTABLE
END
Error shown in the following lines
TARGET.FROMDAY= SOURCE.FROMDAY
TARGET.TODAY=SOURCE.TODAY
SOURCE.FROMDAY
SOURCE.TODAY

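You also need to add those columns to the table type [PAGE].[EXPERIENCEDETAILS], which your stored procedure uses as its table-valued parameter (TVP) type. SQL Server has no ALTER for table types, so the type has to be dropped and re-created, which in turn means dropping and re-creating the procedures that reference it.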

Cannot add an ORDER BY command to a stored procedure

I'm converting a stored procedure in some software I'm maintaining from SQL Server SQL to Informix SQL, and problems are abundant.
Basically I'm converting each section line-by-line until I have the whole thing converted.
I have the following CREATE PROCEDURE:
CREATE PROCEDURE ifxdbase:dc_buildSP (WorkID INT, CompNo smallint)
CREATE TEMP TABLE Items
(
Code smallint,
Qty int,
Total int
);
INSERT INTO Items
SELECT
tblDetails.code,
tblDetails.quantity,
tblHead.quantity
FROM
tblHead
INNER JOIN tblDetails ON (tblDetails.compno = tblDetails.compno AND tblDetails.id_num = tblHead.id_num)
WHERE tblHead.compno = CompNo AND tblHead.id_num = WorkID;
--ORDER BY tblDetails.code;
DROP TABLE Items;
END PROCEDURE
As it stands, this works fine, but when I uncomment the line --ORDER BY tblDetails.seqno; (and remove the semicolon from the previous line) I get a "-201 A syntax error has occurred" error.
Basically tblHead is a series of order headers and tblDetails is a table of the details of each of those orders. Selecting and joining the data works fine, trying to order it fails.
Ordering should work with anything from the original SELECT, IIRC, so I can't see what could be going wrong, here...

As stated here:
..... not all clauses and options of
the SELECT statement are available for
you to use in a query within an
INSERT statement. The following SELECT
clauses and options are not supported
by Informix in an INSERT statement:
FIRST and INTO TEMP
ORDER BY and UNION
so ORDER BY is not supported in the INSERT command in Informix.
I don't have something to test right now, but you could try something like this, as a workaround:
INSERT INTO Items
SELECT code, dQuantity, hQuantity
FROM (
    SELECT
        tblDetails.code,
        tblDetails.quantity dQuantity,
        tblHead.quantity hQuantity
    FROM
        tblHead
        INNER JOIN tblDetails ON (tblDetails.compno = tblDetails.compno AND tblDetails.id_num = tblHead.id_num)
    WHERE tblHead.compno = CompNo AND tblHead.id_num = WorkID
    ORDER BY tblDetails.code
);

Weird trigger problem when I do an INSERT into a table

I've got a trigger attached to a table.
ALTER TRIGGER [dbo].[UpdateUniqueSubjectAfterInsertUpdate]
ON [dbo].[Contents]
AFTER INSERT,UPDATE
AS
BEGIN
    -- Grab the Id of the row just inserted/updated
    DECLARE @Id INT
    SELECT @Id = Id
    FROM INSERTED
END
Every time a new entry is inserted or modified, I wish to update a single field (in this table). For the sake of this question, imagine i'm updating a LastModifiedOn (datetime) field.
Ok, so what i've got is a batch insert thingy..
INSERT INTO [dbo].[Contents]
SELECT Id, a, b, c, d, YouDontKnowMe
FROM [dbo].[CrapTable]
Now all the rows are correctly inserted. The LastModifiedOn field defaults to null. So all the entries for this are null -- EXCEPT the first row.
Does this mean that the trigger is NOT called for each row that is inserted into the table, but once AFTER the insert query is finished, i.e. after ALL the rows are inserted? Which means the INSERTED table (in the trigger) has not one, but 'n' rows?!
If so .. er.. :( Would that mean I would need a cursor in this trigger (if I need to do some unique logic for each single row, which I currently do)?
UPDATE
I'll add the full trigger code, to see if it's possible to do it without a cursor.
BEGIN
    SET NOCOUNT ON
    DECLARE @ContentId INTEGER,
        @ContentTypeId TINYINT,
        @UniqueSubject NVARCHAR(200),
        @NumberFound INTEGER
    -- Grab the Id. Also, convert the subject to a (first pass, untested)
    -- unique subject.
    -- NOTE: ToUriCleanText just replaces bad uri chars with a ''.
    -- eg. an '#' -> ''
    SELECT @ContentId = ContentId, @ContentTypeId = ContentTypeId,
        @UniqueSubject = [dbo].[ToUriCleanText]([Subject])
    FROM INSERTED
    -- Find out how many items we have, for these two keys.
    SELECT @NumberFound = COUNT(ContentId)
    FROM [dbo].[Contents]
    WHERE ContentId = @ContentId
    AND UniqueSubject = @UniqueSubject
    -- If we have at least one identical subject, then we need to make it
    -- unique by appending the current found number.
    -- Eg. The first instance has no number.
    -- Second instance has subject + '1',
    -- Third instance has subject + '2', etc...
    IF @NumberFound > 0
        SET @UniqueSubject = @UniqueSubject + CAST(@NumberFound AS NVARCHAR(10))
    -- Now save this change.
    UPDATE [dbo].[Contents]
    SET UniqueSubject = @UniqueSubject
    WHERE ContentId = @ContentId
END

Why not change the trigger to deal with multiple rows?
No cursor or loops needed: it's the whole point of SQL ...
UPDATE
    dbo.SomeTable
SET
    LastModifiedOn = GETDATE()
WHERE
    EXISTS (SELECT * FROM INSERTED I WHERE I.[ID] = dbo.SomeTable.[ID])
Edit: Something like...
INSERT @ATableVariable
    (ContentId, ContentTypeId, UniqueSubject)
SELECT
    ContentId, ContentTypeId, [dbo].[ToUriCleanText]([Subject])
FROM
    INSERTED
UPDATE
    C
SET
    UniqueSubject = C.UniqueSubject + CAST(foo.NumberFound AS NVARCHAR(10))
FROM
    --Your original COUNT feels wrong and/or trivial
    --Do you expect 0, 1 or many rows.
    --Edit2: I assume 0 or 1 because of original WHERE so COUNT(*) will suffice
    -- .. although, this implies an EXISTS could be used but let's keep it closer to OP post
    (
    SELECT ContentId, UniqueSubject, COUNT(*) AS NumberFound
    FROM @ATableVariable
    GROUP BY ContentId, UniqueSubject
    HAVING COUNT(*) > 0
    ) foo
    JOIN
    [dbo].[Contents] C ON C.ContentId = foo.ContentId AND C.UniqueSubject = foo.UniqueSubject
Edit 2: and again with RANKING
UPDATE
    C
SET
    UniqueSubject = C.UniqueSubject + CAST(foo.Ranking - 1 AS NVARCHAR(10))
FROM
    (
    SELECT
        ContentId, --not needed? UniqueSubject,
        ROW_NUMBER() OVER (PARTITION BY ContentId ORDER BY UniqueSubject) AS Ranking
    FROM
        @ATableVariable
    ) foo
    JOIN
    dbo.Contents C ON C.ContentId = foo.ContentId
    /* not needed? AND C.UniqueSubject = foo.UniqueSubject */
WHERE
    foo.Ranking > 1

The trigger will be run only once for an INSERT INTO query. The INSERTED table will contain multiple rows.

OK folks, I think I figured it out myself. Inspired by the previous answers and comments, I've done the following. (Can you folks have a quick look to see if I've over-engineered this baby?)
.1. Created an indexed view representing the 'Subject' field, which needs to be cleaned. This is the field that has to be unique .. but before we can make it unique, we need to group by it.
-- Create the view.
CREATE VIEW ContentsCleanSubjectView with SCHEMABINDING AS
SELECT ContentId, ContentTypeId,
[dbo].[ToUriCleanText]([Subject]) AS CleanedSubject
FROM [dbo].[Contents]
GO
-- Index the view with three indexes: a clustered PK and a non-clustered one,
-- which is where most of the joins will be done against.
-- The last one is because the execution plan reckons I was missing statistics
-- against one of the fields, so I added that index and the stats got generated.
CREATE UNIQUE CLUSTERED INDEX PK_ContentsCleanSubjectView ON
ContentsCleanSubjectView(ContentId)
CREATE NONCLUSTERED INDEX IX_BlahBlahSnipSnip_A ON
ContentsCleanSubjectView(ContentTypeId, CleanedSubject)
CREATE INDEX IX_BlahBlahSnipSnip_B ON
ContentsCleanSubjectView(CleanedSubject)
.2. Created the trigger code, which now
a) grabs all the items 'changed' (nothing new/hard about that)
b) orders all the inserted rows, row-numbered with partitioning by the clean subject
c) updates the single row we're up to in the main update clause.
Here's the code...
ALTER TRIGGER [dbo].[UpdateUniqueSubjectAfterInsertUpdate]
ON [dbo].[Contents]
AFTER INSERT,UPDATE
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @InsertRows TABLE (ContentId INTEGER PRIMARY KEY,
        ContentTypeId TINYINT,
        CleanedSubject NVARCHAR(300))
    DECLARE @UniqueSubjectRows TABLE (ContentId INTEGER PRIMARY KEY,
        UniqueSubject NVARCHAR(350))
    -- Grab all the records that have been updated/inserted.
    INSERT INTO @InsertRows(ContentId, ContentTypeId, CleanedSubject)
    SELECT ContentId, ContentTypeId, [dbo].[ToUriCleanText]([Subject])
    FROM INSERTED
    -- Determine the correct unique subject by using ROW_NUMBER partitioning.
    INSERT INTO @UniqueSubjectRows
    SELECT SubResult.ContentId, UniqueSubject = CASE SubResult.RowNumber
        WHEN 1 THEN SubResult.CleanedSubject
        ELSE SubResult.CleanedSubject + CAST(SubResult.RowNumber - 1 AS NVARCHAR(5)) END
    FROM (
        -- Order all the cleaned subjects, partitioned by the cleaned subject.
        SELECT a.ContentId, a.CleanedSubject, ROW_NUMBER() OVER (PARTITION BY a.CleanedSubject ORDER BY a.ContentId) AS RowNumber
        FROM ContentsCleanSubjectView a
        INNER JOIN @InsertRows b ON a.ContentTypeId = b.ContentTypeId AND a.CleanedSubject = b.CleanedSubject
        GROUP BY a.contentId, a.cleanedSubject
    ) SubResult
    INNER JOIN [dbo].[Contents] c ON c.ContentId = SubResult.ContentId
    INNER JOIN @InsertRows d ON c.ContentId = d.ContentId
    -- Now update all the affected rows.
    UPDATE a
    SET a.UniqueSubject = b.UniqueSubject
    FROM [dbo].[Contents] a INNER JOIN @UniqueSubjectRows b ON a.ContentId = b.ContentId
END
Now, the subquery correctly returns all the cleaned subjects, partitioned and numbered correctly. I never knew about the 'PARTITION BY' clause, so that trick was the big answer here :)
Then I just joined the subquery with the row that is being updated in the parent query. The row number is correct, so now I just do a CASE: if this is the first time the cleaned subject exists (i.e. row_number = 1), don't modify it; otherwise, append the row_number minus one. This means that for the 2nd instance of the same subject, the unique subject will be => cleansubject + '1'.
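The numbering rule described above (first occurrence unchanged, second gets '1', third gets '2') can be checked in isolation with a small sketch. It uses Python's sqlite3 module rather than SQL Server, assumes SQLite 3.25 or later for ROW_NUMBER, and replaces the indexed view with a plain table that already holds a cleaned subject. The sample subjects are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the indexed view: one cleaned subject per content row.
conn.execute("CREATE TABLE Contents (ContentId INTEGER PRIMARY KEY, CleanedSubject TEXT)")
conn.executemany(
    "INSERT INTO Contents VALUES (?, ?)",
    [(1, "hello-world"), (2, "hello-world"), (3, "other"), (4, "hello-world")],
)

unique_subjects = conn.execute(
    """
    SELECT ContentId,
           CASE RowNumber
               WHEN 1 THEN CleanedSubject
               ELSE CleanedSubject || CAST(RowNumber - 1 AS TEXT)
           END AS UniqueSubject
    FROM (
        -- Number the rows within each group of identical cleaned subjects.
        SELECT ContentId,
               CleanedSubject,
               ROW_NUMBER() OVER (
                   PARTITION BY CleanedSubject
                   ORDER BY ContentId
               ) AS RowNumber
        FROM Contents
    )
    ORDER BY ContentId
    """
).fetchall()
print(unique_subjects)
```

The whole set is numbered in one statement, which is why the trigger needs no cursor even when many rows are inserted at once.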
The reason I believe I need an indexed view is that two very similar subjects can end up identical once all the bad chars (the ones I've decided are bad) have been stripped out, i.e. cleaned. As such, I need to do all my joins on a cleanedSubject instead of a subject. Now, for the massive number of rows I have, this is crap for performance when I don't have the view. :)
So .. is this over-engineered?
Edit 1:
Refactored the trigger code so it's way more performant.
