Avoid key clash on SQL Merge Update statement

I have a table with two columns, Prime_GuestID and Dup_GuestID, that links each GuestID to the ID(s) it is replacing (duplicate records).
Now I want to go through a number of other relationship tables of the same format as Thing_Relation below and update every occurrence of a Dup_GuestID to its Prime_GuestID.
However, if the Prime_GuestID already has an entry for a given ThingID, I need to delete that row from the relationship table instead.
Currently I'm using the following script. It works for most cases, but it fails if two Dup_GuestIDs update to the same Prime_GuestID for a given ThingID. It seems the MERGE statement queues up all the changes before making them, so my clash check doesn't detect them.
MERGE Thing_Relation AS t
USING Guest_Relation AS g
    ON t.GuestID = g.Dup_GuestID
WHEN MATCHED AND EXISTS ( -- Clash Check here
        select *
        from Thing_Relation AS t2
        where t2.ThingID = t.ThingID
          and t2.GuestID = g.Prime_GuestID
    )
    THEN DELETE
WHEN MATCHED
    THEN UPDATE SET t.GuestID = g.Prime_GuestID;
Is there a better way to be doing the check at 'When matched and exists' to check for clashes that would result from this merge? Or is there a better way to do this whole thing?
EDIT: Here is some sample data for the tables:
Thing_Relation
ThingID | GuestID
------------------
1 | 101
1 | 102
2 | 103
3 | 104
3 | 105
Guest_Relation
Prime_GuestID | Dup_GuestID
---------------------------
101 | 102
107 | 104
107 | 105
Thing_Relation after merge
ThingID | GuestID
------------------
1 | 101
2 | 103
3 | 107
1|102 would change to 1|101, which already exists, so the row is deleted.
2|103 isn't affected.
3|104 is changed to 3|107. 3|105 also changes to 3|107, but because the previous update hasn't happened yet, the clash isn't picked up by the EXISTS clause.

You are right: MERGE does not run your checks against the changes made by the MERGE statement itself. That is a basic property of MERGE. A couple of options:
Create a TRIGGER on UPDATE that checks for clashes on every update and deletes the duplicate rows, or
simply write two separate statements: one UPDATE, followed by a DELETE for the duplicate entries.
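The two-statement route could look roughly like the sketch below. It is only an illustration, not the answerer's code: it drives SQLite through Python's sqlite3 module as a stand-in for SQL Server, it assumes a unique key on (ThingID, GuestID), and it assumes each Dup_GuestID maps to exactly one Prime_GuestID with no chains. It also runs the DELETE before the UPDATE so the key is never violated. The names merge_duplicate_guests and build_sample_db are hypothetical.

```python
import sqlite3

# Step 1: delete each duplicate row whose remapped (ThingID, Prime_GuestID)
# pair would collide with a row that stays. A row stays if it already holds
# the prime ID, or if it is the lowest-numbered duplicate for that pair.
DELETE_CLASHES = """
DELETE FROM Thing_Relation
WHERE rowid IN (
    SELECT t.rowid
    FROM Thing_Relation AS t
    JOIN Guest_Relation AS g ON g.Dup_GuestID = t.GuestID
    WHERE EXISTS (
        SELECT 1
        FROM Thing_Relation AS t2
        LEFT JOIN Guest_Relation AS g2 ON g2.Dup_GuestID = t2.GuestID
        WHERE t2.ThingID = t.ThingID
          AND COALESCE(g2.Prime_GuestID, t2.GuestID) = g.Prime_GuestID
          AND (g2.Dup_GuestID IS NULL OR t2.GuestID < t.GuestID)
    )
)
"""

# Step 2: the surviving duplicate rows can now be remapped safely.
REMAP_SURVIVORS = """
UPDATE Thing_Relation
SET GuestID = (SELECT g.Prime_GuestID
               FROM Guest_Relation AS g
               WHERE g.Dup_GuestID = Thing_Relation.GuestID)
WHERE GuestID IN (SELECT Dup_GuestID FROM Guest_Relation)
"""


def merge_duplicate_guests(conn):
    """Run both steps in one transaction (assumed schema, see lead-in)."""
    with conn:
        conn.execute(DELETE_CLASHES)
        conn.execute(REMAP_SURVIVORS)


def build_sample_db():
    """Create an in-memory database holding the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Guest_Relation (
            Prime_GuestID INTEGER NOT NULL,
            Dup_GuestID   INTEGER NOT NULL PRIMARY KEY
        );
        CREATE TABLE Thing_Relation (
            ThingID INTEGER NOT NULL,
            GuestID INTEGER NOT NULL,
            PRIMARY KEY (ThingID, GuestID)
        );
        INSERT INTO Guest_Relation VALUES (101, 102), (107, 104), (107, 105);
        INSERT INTO Thing_Relation VALUES
            (1, 101), (1, 102), (2, 103), (3, 104), (3, 105);
    """)
    return conn
```

Calling merge_duplicate_guests(build_sample_db()) is meant to leave the rows the question lists under "Thing_Relation after merge". The tie-break on the lowest GuestID is an arbitrary choice made for this sketch.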

Related

Is there a way to insert a record in SQL server if it does not match the latest version of the record based on three of the columns?

Consider the following table named UserAttributes:
+----+--------+----------+-----------+
| Id | UserId | AttrName | AttrValue |
+----+--------+----------+-----------+
| 4 | 1 | FavFood | Apples |
| 3 | 2 | FavFood | Burgers |
| 2 | 1 | FavShape | Circle |
| 1 | 1 | FavFood | Chicken |
+----+--------+----------+-----------+
I would like to insert a new record in this table only if the new value differs from the latest version of that attribute for the user.
What I mean by the latest is, for example, if I was to do:
SELECT TOP(1) * FROM [UserAttributes] WHERE [UserId] = 1 AND [AttrName] = 'FavFood' ORDER BY [Id] DESC
I will be able to see that user ID 1's current favorite food is "Apples".
Is there a query safe for concurrency that will only insert a new favorite food if it doesn't match the current favorite food for this user?
I tried using a MERGE query with HOLDLOCK, but the problem is that WHEN MATCHED/WHEN NOT MATCHED compares against any existing row, which only works if I never want to insert a new record once a user has previously set their favorite food (in this example) to the new value. It does not consider that a user might switch to a new favorite food and then change back to their old favorite food. I would like to keep all the changes as a historical record.
In the data set above, I would like to insert a new record if the user ID 1's new favorite food is "Burgers", but I do not want to insert a record if their new favorite food is "Apples" (since that is their current favorite food). I would also like to make this operation safe for concurrency.
Thank you for your help!
EDIT: I should probably also mention that when I split this operation into two queries (ie: first select their current favorite food, then do an insert query only if there is a new food detected) it works under normal conditions. However, we are observing race conditions (and therefore duplicates) since (as you may have guessed) the data set above is simply an example and there are many threads operating on this table at the same time.

A bit ugly, but to do it in one command, you could insert the user's (new) favorite food but filter with an EXCEPT of their current values.
e.g. (assuming the user's new data is in @UserID and @FavFood):
; WITH LatestFavFood AS
    (SELECT TOP(1) UserID, AttrName, AttrValue
     FROM [UserAttributes]
     WHERE [UserId] = @UserID AND [AttrName] = 'FavFood'
     ORDER BY [Id] DESC
    )
INSERT INTO UserAttributes (UserID, AttrName, AttrValue)
    SELECT @UserID, 'FavFood', @FavFood
    EXCEPT
    SELECT UserID, AttrName, AttrValue
    FROM LatestFavFood
Here's a DB_Fiddle with three runs.
EDIT: I have changed the above to assume varchar types for AttrName rather than nvarchar. The fiddle has a mixture. Would be good to ensure you get them correct (especially food as it may have special characters).
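For anyone who wants to try the EXCEPT idea without a SQL Server instance, here is a rough sketch that runs the same pattern on SQLite via Python's sqlite3 module. This is a stand-in and not the fiddle's code: LIMIT 1 replaces TOP(1), the latest row is read through an inline subquery instead of a CTE, and the helper names set_attribute and build_sample_db are made up for the example. It says nothing about how the statement behaves under concurrent writers on SQL Server.

```python
import sqlite3

# Insert the new value unless it equals the latest stored value for the
# same user and attribute (latest = highest Id).
INSERT_IF_CHANGED = """
INSERT INTO UserAttributes (UserId, AttrName, AttrValue)
SELECT :user_id, :attr_name, :attr_value
EXCEPT
SELECT UserId, AttrName, AttrValue
FROM (
    SELECT UserId, AttrName, AttrValue
    FROM UserAttributes
    WHERE UserId = :user_id AND AttrName = :attr_name
    ORDER BY Id DESC
    LIMIT 1
)
"""


def set_attribute(conn, user_id, attr_name, attr_value):
    """Return True if a new history row was inserted, False otherwise."""
    before = conn.total_changes
    with conn:
        conn.execute(
            INSERT_IF_CHANGED,
            {"user_id": user_id, "attr_name": attr_name, "attr_value": attr_value},
        )
    return conn.total_changes > before


def build_sample_db():
    """Create an in-memory copy of the question's UserAttributes table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserAttributes (
            Id        INTEGER PRIMARY KEY AUTOINCREMENT,
            UserId    INTEGER NOT NULL,
            AttrName  TEXT NOT NULL,
            AttrValue TEXT NOT NULL
        );
        INSERT INTO UserAttributes (Id, UserId, AttrName, AttrValue) VALUES
            (1, 1, 'FavFood', 'Chicken'),
            (2, 1, 'FavShape', 'Circle'),
            (3, 2, 'FavFood', 'Burgers'),
            (4, 1, 'FavFood', 'Apples');
    """)
    return conn
```

With the sample data, setting user 1's FavFood to 'Apples' should insert nothing, 'Burgers' should add a row, and a later 'Apples' should add another, which keeps the history the question asks for.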

Update if not exists in SQL Server?

Is there any way to run an update statement in SQL Server that skips rows that already exist in the target?
For instance, I have a view vw_BranchCaseCurrent which contains a CaseID and a BranchID, in addition to an auto-incremented ID. I want to do this:
update a
set
    a.CaseID = @NewCaseID
from vw_BranchCaseCurrent a
where
    a.CaseID = @OldCaseID;
But the problem is, if there is already an existing row in vw_BranchCaseCurrent for the new CaseID and the existing BranchID then this SQL will crash because it is violating the unique constraint on the backing table. So I'd need to skip that row when performing the update.
I was thinking maybe I could use a merge statement but I'm not entirely familiar with how those work...
There are about a dozen other views that need to be updated so I'm looking for something simple, if possible...
edit: let me clarify with an example:
| CaseID | BranchID |
|--------|----------|
| 42 | 8008 |
| 42 | 9001 |
| 86 | 9001 |
So I want to merge case 42 into case 86 by updating the CaseID field in this view. I want to change the first CaseID from 42 to 86. But the second row, I can't do anything with this because the BranchID of 9001 already exists for CaseID 86. So I leave that one alone.
This is a simple example; some of the other views I need to merge have multiple ID fields in addition to the CaseID...

You can express this using not exists, not in, or a left join . . . or even using window functions.
with toupdate as (
      select bcc.*,
             -- per branch, count the rows that already carry the new CaseID
             sum(case when CaseID = @NewCaseID then 1 else 0 end) over (partition by BranchID) as num_new
      from vw_BranchCaseCurrent bcc
     )
update toupdate
    set CaseID = @NewCaseID
    where CaseID = @OldCaseID and num_new = 0;
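The not exists variant mentioned above might look like the following sketch. It is run against SQLite through Python's sqlite3 module purely for illustration. BranchCase is a hypothetical plain table standing in for vw_BranchCaseCurrent, and merge_case and build_sample_db are made-up helper names.

```python
import sqlite3

# Move rows from the old case to the new one, skipping any branch that the
# new case already has (those rows would violate the unique constraint).
UPDATE_SKIPPING_CLASHES = """
UPDATE BranchCase
SET CaseID = :new_case_id
WHERE CaseID = :old_case_id
  AND NOT EXISTS (
      SELECT 1
      FROM BranchCase AS existing
      WHERE existing.CaseID = :new_case_id
        AND existing.BranchID = BranchCase.BranchID
  )
"""


def merge_case(conn, old_case_id, new_case_id):
    """Return the number of rows that were actually moved."""
    before = conn.total_changes
    with conn:
        conn.execute(
            UPDATE_SKIPPING_CLASHES,
            {"old_case_id": old_case_id, "new_case_id": new_case_id},
        )
    return conn.total_changes - before


def build_sample_db():
    """Create an in-memory table holding the question's three example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BranchCase (
            ID       INTEGER PRIMARY KEY AUTOINCREMENT,
            CaseID   INTEGER NOT NULL,
            BranchID INTEGER NOT NULL,
            UNIQUE (CaseID, BranchID)
        );
        INSERT INTO BranchCase (CaseID, BranchID) VALUES
            (42, 8008), (42, 9001), (86, 9001);
    """)
    return conn
```

On the example data, merge_case(conn, 42, 86) should move only the 8008 row and leave the 42/9001 row alone, as the question describes. Views with several ID columns would need one extra equality per column inside the NOT EXISTS.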

Oracle SQL - How to do massive updates more efficient and faster?

I'm trying to update 500,000 rows at once. I have a table with products like this:
+------------+----------------+--------------+-------+
| PRODUCT_ID | SUB_PRODUCT_ID | DESCRIPTION | CLASS |
+------------+----------------+--------------+-------+
| A001 | ACC1 | coffeemaker | A |
| A002 | ACC1 | toaster | A |
| A003 | ACC2 | coffee table | A |
| A004 | ACC5 | couch | A |
+------------+----------------+--------------+-------+
I have sets of individual statements, for example:
update products set class = 'A' where product_id = 'A001';
update products set class = 'B' where product_id = 'A005';
update products set class = 'Z' where product_id = 'A150';
I'm running a script with one update statement after another, with a commit every 1,000 rows.
It works fine (slow, but fine), but I'd like to do it better if possible.
Is there a more efficient and faster way to do this?

One approach would be to create a temporary table holding your update information:
new_product_class:
product_id class
========== =====
A001       A
A005       B
A150       Z
product_id should be an indexed primary key on this new table. Then you can do an UPDATE or a MERGE on the old table joined to this temporary table:
UPDATE (SELECT p.class AS old_class, n.class AS new_class
        FROM products p
        JOIN new_product_class n ON (p.product_id = n.product_id))
SET old_class = new_class
or
MERGE INTO products p
USING new_product_class n
ON (p.product_id = n.product_id)
WHEN MATCHED THEN
    UPDATE SET p.class = n.class
Merge should be fast. Other things that you could look into depending on your environment: create a new table based on the old table with nologging followed by some renaming (should backup before and after), bulk updates.
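As a rough illustration of the staging-table idea, the sketch below loads the pairs into a temporary table and applies them with a single UPDATE. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle, so the correlated-subquery UPDATE replaces the join-view and MERGE syntax shown above. apply_class_updates is a hypothetical helper name.

```python
import sqlite3


def apply_class_updates(conn, new_classes):
    """Apply (product_id, class) pairs to products in one set-based UPDATE."""
    # Staging table keyed by product_id, as the answer recommends.
    conn.execute("""
        CREATE TEMP TABLE IF NOT EXISTS new_product_class (
            product_id TEXT PRIMARY KEY,
            class      TEXT NOT NULL
        )
    """)
    with conn:  # one transaction, one commit
        conn.execute("DELETE FROM new_product_class")
        conn.executemany(
            "INSERT INTO new_product_class (product_id, class) VALUES (?, ?)",
            new_classes,
        )
        conn.execute("""
            UPDATE products
            SET class = (SELECT n.class
                         FROM new_product_class AS n
                         WHERE n.product_id = products.product_id)
            WHERE product_id IN (SELECT product_id FROM new_product_class)
        """)


def build_sample_db():
    """Create a small in-memory products table for trying the helper."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            product_id TEXT PRIMARY KEY,
            class      TEXT
        );
        INSERT INTO products VALUES
            ('A001', 'X'), ('A004', 'A'), ('A005', 'A'), ('A150', 'A');
    """)
    return conn
```

A call such as apply_class_updates(conn, [('A001', 'A'), ('A005', 'B'), ('A150', 'Z')]) replaces the three individual statements from the question. Products not listed in the staging table are left untouched by the WHERE clause.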

Unless you have an index, each of your update statements scans the entire table. Even if you do have an index, there is a cost associated with the compilation and execution of each statement.
If you have a lot of conditions, and those conditions can vary, then I think Glenn's solution is clearly the way to go. This does everything in a single transaction, and there is no reason to run batches of 1,000 rows -- just do everything all at once.
If the number of conditions is relatively finite (as your example), and they don't change very often, then you can also do this as a simple case:
update products
set class =
  case product_id
    when 'A001' then 'A'
    when 'A005' then 'B'
    when 'A150' then 'Z'
  end
where
  product_id in ('A001', 'A005', 'A150')
If it's possible your class field is already set to the correct value, then there is also great value in adding a condition to make sure you are not updating something to the same value. For example if this:
update products set class = 'A' where product_id = 'A001';
Updates 5,000 records, 4,000 of which are already set to 'A', then this would be significantly more efficient:
update products
set class = 'A'
where
product_id = 'A001' and
(class is null or class != 'A')
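To see the effect of that extra condition, here is a small sketch on SQLite via Python's sqlite3 module (a stand-in, not Oracle). It reports how many rows the guarded statement actually touched. set_class_if_different is a made-up name, and the sample table deliberately allows repeated product_id values to mirror the "5,000 records" scenario.

```python
import sqlite3

# Only rows whose class would really change are matched by the WHERE clause.
GUARDED_UPDATE = """
UPDATE products
SET class = :new_class
WHERE product_id = :product_id
  AND (class IS NULL OR class != :new_class)
"""


def set_class_if_different(conn, product_id, new_class):
    """Return the number of rows the guarded UPDATE modified."""
    before = conn.total_changes
    with conn:
        conn.execute(
            GUARDED_UPDATE,
            {"product_id": product_id, "new_class": new_class},
        )
    return conn.total_changes - before


def build_sample_db():
    """Create an in-memory table where product_id is intentionally not unique."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id TEXT NOT NULL, class TEXT);
        INSERT INTO products VALUES
            ('A001', 'A'), ('A001', 'A'), ('A001', NULL),
            ('A001', 'B'), ('A002', 'B');
    """)
    return conn
```

On this sample, only the NULL row and the 'B' row for 'A001' are rewritten, and running the same call a second time touches nothing.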

Sql server dynamic sequences

Here is a theoretical scenario,
Suppose I have a client table and an invoice table.
1 client can have many invoices.
Now I want each invoice to have an invoice number that is unique to that client
i.e.
ClientId InvoiceNo
1 IN0001
2 IN0001
2 IN0002
2 IN0003
3 IN0001
Currently I am controlling this in my application by looking at max values etc but this is obviously not a great solution. I would much rather get my database to do this for me, as it should remove the risk of creating duplicate invoice numbers for a single client (race condition?)
I have been reading up on SQL Server 2012's SEQUENCE, which sounds great, but the problem is that I would still need a separate sequence per client
i.e.
CREATE SEQUENCE InvoiceNum_Client1.....
CREATE SEQUENCE InvoiceNum_Client2.....
CREATE SEQUENCE InvoiceNum_Client3.....
but something feels very dirty about making db meta changes from my app every time a new client is created. Also, my trigger would then have to do something like this (which I wouldn't even begin to know how to do)
NEXT VALUE FOR InvoiceNum_Client + @ClientId
etc
So my next thought was to have a "sequence" table, i.e.
ClientID INSequenceNumber
1 1
2 3
3 3
And in my trigger grab the InSequenceNumber for a given client, use it to make my invoiceNumber, and then update the sequence table, incrementing InSequenceNumber by 1 for the same client. It should have the same effect, but I am just not sure about the inner workings of transactions and scope etc
So my questions are
Is there any big disadvantage to my self rolled sequence method?
Is there another solution that I am perhaps overlooking?
Thanks!

Is there a specific reason for requiring that the invoice numbers are only unique per client? In most ERP systems, invoice numbers are typically globally unique, making implementation much easier. No matter what, you should have an Invoice table that contains a primary key (and you shouldn't use composite primary keys - that's just downright bad data modelling).
This leaves us with a scenario where you might not need to store the "per-client-invoice-number" in the database at all. Assuming you have a table called "Invoices" containing the following data:
Id | ClientId
---------------
1 | 1
2 | 1
3 | 2
4 | 1
5 | 3
6 | 2
Here, Id is the Primary Key of the Invoices table, and ClientId is a foreign key. A query like this:
SELECT
    ClientId,
    'IN' + RIGHT('0000' +
        CONVERT(VARCHAR, ROW_NUMBER() OVER (PARTITION BY ClientId
                                            ORDER BY Id)), 4) AS InvoiceNo,
    Id
FROM Invoices
ORDER BY ClientId, InvoiceNo
Would return:
ClientId | InvoiceNo | Id
---------------------------
1 | IN0001 | 1
1 | IN0002 | 2
1 | IN0003 | 4
2 | IN0001 | 3
2 | IN0002 | 6
3 | IN0001 | 5
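The same calculation can be tried out on SQLite through Python's sqlite3 module, as sketched below. This is a stand-in for SQL Server: printf('IN%04d', ...) replaces the RIGHT/CONVERT padding, and the sketch assumes an SQLite build with window-function support (3.25 or later). per_client_invoice_numbers is a hypothetical helper name.

```python
import sqlite3

# Derive the per-client invoice number at query time instead of storing it.
PER_CLIENT_NUMBERS = """
SELECT ClientId,
       printf('IN%04d',
              ROW_NUMBER() OVER (PARTITION BY ClientId ORDER BY Id)) AS InvoiceNo,
       Id
FROM Invoices
ORDER BY ClientId, InvoiceNo
"""


def per_client_invoice_numbers(conn):
    """Return (ClientId, InvoiceNo, Id) tuples ordered by client and number."""
    return conn.execute(PER_CLIENT_NUMBERS).fetchall()


def build_sample_db():
    """Create an in-memory Invoices table with the answer's six sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Invoices (
            Id       INTEGER PRIMARY KEY,
            ClientId INTEGER NOT NULL
        );
        INSERT INTO Invoices (Id, ClientId) VALUES
            (1, 1), (2, 1), (3, 2), (4, 1), (5, 3), (6, 2);
    """)
    return conn
```

One thing to keep in mind with this design: because the number is computed from row order, deleting an invoice renumbers the later invoices of that client.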

Why do the clients have to have their own sequences? Have a global sequence number. Then, if you want to get the client sequences in order, use row_number():
select i.*, row_number() over (partition by clientid order by invoiceno) as ClientInvoiceSequence
from invoices i;
Note: you might want the order by to be by another field such as date.
If you start storing this information in the database, you will need to do a lot of bookkeeping and careful transaction management:
What happens when two invoices are entered "at the same time"?
What happens when an invoice is deleted?
What happens when an invoice is modified in such a way that it might change the sequence?
You are much better off using an identity column and calculating the sequence when you need it.

Remove rows NOT referenced by a foreign key

This is somewhat related to this question:
I have a table with a primary key, and I have several tables that reference that primary key (using foreign keys). I need to remove rows from that table, where the primary key isn't being referenced in any of those other tables (as well as a few other constraints).
For example:
Group
groupid | groupname
1 | 'group 1'
2 | 'group 3'
3 | 'group 2'
... | '...'
Table1
tableid | groupid | data
1 | 3 | ...
... | ... | ...
Table2
tableid | groupid | data
1 | 2 | ...
... | ... | ...
and so on. Some of the rows in Group aren't referenced in any of the tables, and I need to remove those rows. In addition to this, I need to know how to find all of the tables/rows that reference a given row in Group.
I know that I can just query every table and check the groupid's, but since they are foreign keys, I imagine that there is a better way of doing it.
This is using Postgresql 8.3 by the way.

DELETE
FROM    "group" g
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    table1 t1
        WHERE   t1.groupid = g.groupid
        UNION ALL
        SELECT  NULL
        FROM    table2 t2
        WHERE   t2.groupid = g.groupid
        UNION ALL
        …
        )
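For a quick experiment with this anti-join pattern, the sketch below runs it on SQLite through Python's sqlite3 module rather than PostgreSQL 8.3. It covers only the two example tables from the question and uses one NOT EXISTS per referencing table instead of the UNION ALL, which is an equivalent formulation. delete_unreferenced_groups is a hypothetical helper name.

```python
import sqlite3

# Delete every group that no row in table1 or table2 points at.
DELETE_UNREFERENCED = """
DELETE FROM "group"
WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1
                  WHERE t1.groupid = "group".groupid)
  AND NOT EXISTS (SELECT 1 FROM table2 AS t2
                  WHERE t2.groupid = "group".groupid)
"""


def delete_unreferenced_groups(conn):
    """Return the number of group rows that were removed."""
    before = conn.total_changes
    with conn:
        conn.execute(DELETE_UNREFERENCED)
    return conn.total_changes - before


def build_sample_db():
    """Create an in-memory schema with two referencing tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "group" (
            groupid   INTEGER PRIMARY KEY,
            groupname TEXT NOT NULL
        );
        CREATE TABLE table1 (
            tableid INTEGER PRIMARY KEY,
            groupid INTEGER REFERENCES "group" (groupid)
        );
        CREATE TABLE table2 (
            tableid INTEGER PRIMARY KEY,
            groupid INTEGER REFERENCES "group" (groupid)
        );
        INSERT INTO "group" VALUES
            (1, 'group 1'), (2, 'group 3'), (3, 'group 2'), (4, 'group 4');
        INSERT INTO table1 VALUES (1, 3);
        INSERT INTO table2 VALUES (1, 2);
    """)
    return conn
```

Here groups 1 and 4 are unreferenced and get removed, while 2 and 3 survive. Each additional referencing table needs its own NOT EXISTS branch.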

At the heart of it, SQL servers don't maintain two-way information for constraints, so your only option is to do what the server would do internally if you were to delete the row: check every other table.
If (and be damn sure first) your constraints are simple checks and don't carry any "on delete cascade" type clauses, you can attempt to delete everything from your group table. Any row that does get deleted would thus have had nothing referencing it. Otherwise, you're stuck with Quassnoi's answer.
