Update query with 'not exists' check causes primary key violation

Update query with 'not exists' check causes primary key violation - sql

The following tables are involved:
Table Product:
product_id
merged_product_id
product_name
Table Company_Product:
product_id
company_id
(Company_Product has a primary key on both the product_id and company_id columns)
I now want to run an update on Company_Product to set the product_id column to a merged_ product_id. This update could cause duplicates which would trigger a primary key violation, so therefore I added a 'not exists' check in the where clause and my query looks like this:
update cp
set cp.product_id = p.merged_product_id
from Company_Product cp
join Product p on p.product_id = cp.product_id
where p.merged_product_id is not null
and not exists
(select * from Company_Product cp2
where cp2.company_id = cp.company_id and
cp2.product_id = p.merged_product_id)
But this query fails with a primary key violation.
What I think might happen is that because the Product table contains multiple rows with the same merged_product_id, it will succeed the for the first product, but when going to the next product with the same merged_product_id, it'll fail because the 'not exists' subquery does not see the first change, as the query has not finished and committed yet.
Am I right in thinking this, and how would I change the query to make it work?
[EDIT] Some data examples:
Product:
product_id merged_product_id
23 35
24 35
25 12
26 35
27 NULL
Company_Product:
product_id company_id
23 2
24 2
25 2
26 3
27 4
[EDIT 2] Eventually I went with this solution, which uses a temporary table to to the update on and then inserts the updated data into the original Company_Product table:
create table #Company_Product
(product_id int, company_id int)
insert #Company_Product select * from Company_Product
update cp
set cp.product_id = p.merged_product_id
from #Company_Product cp
join Product p on p.product_id = cp.product_id
where p.merged_product_id is not null
delete from Company_Product
insert Company_Product select distinct * from #Company_Product
drop table #Company_Product

A primary key is supposed to be three things:
Non-null
Unique
Unchanging
By altering part of the primary key you're violating requirement #3.
I think you'd be better off creating a new table, populating it, then drop the constraints, drop the original table, and rename the new table to the desired name (then of course, re-apply the original constraints). In my experience this gives you the chance to check out the 'new' data before making it 'live'.
Share and enjoy.

You can use MERGE if you are on SQL 2008 at least.
Otherwise you're going to have to choose a criteria to establish which merged_product_id you want in and which one you leave out:
update cp
set cp.product_id = p.merged_product_id
from Company_Product cp
cross apply (
select top(1) merged_product_id
from Product
where product_id = cp.product_id
and p.merged_product_id is not null
and not exists (
select * from Company_Product cp2
where cp2.company_id = cp.company_id and
cp2.product_id = merged_product_id)
order by <insert diferentiating criteria here>) as p
Note that this is not safe if multiple concurrent requests are running the merge logic.

I can't quite see how your structure is meant to work or what this update is trying to achieve. You seem to be updating Company_Product and setting a (new) product_id on an existing row that apparently has a different product_id; e.g., changing the row from one product to another. This seems...an odd use case, I'd expect you to be inserting a new unique row. So I think I'm missing something.
If you're converting Company_Product to using a new set of product IDs instead of an old set (the name "merged_product_id" makes me speculate this), are you sure that there is no overlap between the old and new? That would cause a problem like what you're describing.

Without seeing your data, I believe your analysis is correct - the entire set is updated and then the commit fails since it results in a constraint violation. An EXISTS is never re-evaluated after "partial commit" of some of the UPDATE.
I think you need to more precisely define your rules regarding attempting to change multiple products to the same product according to the merged_product_id and then make those explicit in your query. For instance, you could exclude any products which would fall into that category with a further NOT EXISTS with appropriate query.

I think you are correct on why the update is failing. To fix this, run a delete query on your company_product table to remove the extra product_ids where the same merged_prduct_id will be applied.
here is a stab at what the query might be
delete company_product
where product_id not in (
select min(product_id)
from product
group by merged_product_id
)
and product_id not in (
select product_id
from product
where merged_product_id is null
)
-- Explanation added in resonse to comment --
What this tries to do is to delete rows that will be duplicates after the update. Since you have products with multiple merged ids, you really only need one of those products (for each company) in the table when you are done. So, my query (if it works...) will keep the min original product id for each merged product id - then your update will work.
So, let's say you have 3 product ids which will map to 2 merged ids: 1 -> 10, 2 -> 20, 3 -> 20. And you have the following company_product data:
product_id company_id
1 A
2 A
3 A
If you run your update against this, it will try to change both the second and third rows to product id 20, and it will fail. If you run the delete I suggest, it will remove the third row. After the delete and the update, the table will look like this:
product_id company_id
10 A
20 A

Try this:
create table #Company_Product
(product_id int, company_id int)
create table #Product (product_id int,merged_product_id int)
insert into #Company_Product
select 23, 2
union all select 24, 2
union all select 25, 2
union all select 26, 3
union all select 27, 4
insert into #product
Select 23, 35
union all select 24, 35
union all select 25, 12
union all select 26, 35
union all select 27, NULL
update cp
set product_id = merged_product_id
from #company_product cp
join
(
select min(product_id) as product_id, merged_product_id
from #product where merged_product_id is not null
group by merged_product_id
) a on a.product_id = cp.product_id
delete cp
--select *
from #company_product cp
join #product p on cp.product_id = p.product_id
where cp.product_id <> p.merged_product_id
and p.merged_product_id is not null

Related

Copy data on parent and child table

I have two database tables. What I need to do is to copy specific data from one storage to another, but also keep the mapping to the photos. First part I can do easily writing
INSERT INTO item (storage_id, price, quantity, description, document_id)
SELECT 10, price, quantity, description, document_id
FROM item
WHERE quantity >= 10 AND price <= 100
but after that newly inserted items does not have photos. Note, that document_id field is unique for not copied items.

Assuming id columns are auto-generated surrogate primary keys, like a serial or IDENTITY column.
Use a data-modifying CTE with RETURNING to make do with a single scan on each source table:
WITH sel AS (
SELECT id, price, quantity, description, document_id
FROM item
WHERE quantity >= 10
AND price <= 100
)
, ins_item AS (
INSERT INTO item
(storage_id, price, quantity, description, document_id)
SELECT 10 , price, quantity, description, document_id
FROM sel
RETURNING id, document_id -- document_id is UNIQUE in this set!
)
INSERT INTO item_photo
(item_id, date, size)
SELECT ii.id , ip.date, ip.size
FROM ins_item ii
JOIN sel s USING (document_id) -- link back to org item.id
JOIN item_photo ip ON ip.item_id = s.id; -- join to org item.id
CTE sel reads all we need from table items.
CTE ins_item inserts into table item. The RETURNING clause returns newly generated id values, together with the (unique!) document_id, so we can link back.
Finally, the outer INSERT inserts into item_photo. We can select matching rows after linking back to the original item.id.
Related:
Insert New Foreign Key Row for new Strings
But:
document_id field is unique for not copied items.
Does that guarantee we are dealing with unique document_id values?

Given that document_id is the same in the two sets, we can used that to ensure that after the first copy, that all duplicate entries that have photos are copied across.
Note: This is still a dirty hack, but it will work. Ideally with data synchronizations we make sure that there is a reference or common key in all the target tables. You could also use output parameters to capture the new id values or use a cursor or other looping constructs to process the records 1 by 1 and copy the photos at the same time instead of trying to update the photos after the initial copy stage.
This query will insert photos for items that do NOT have photos but another item with the same document_id does have photos.
INSERT INTO item_photo (item_id, "date", size)
SELECT "source_photo".item_id, "source_photo"."date", "source_photo". Size
FROM item "target_item"
INNER JOIN item "source_item" on "target_item".document_id = "source_item".document_id
INNER JOIN item_photo "source_photo" ON "source_item".id = "source_photo".item_id
WHERE "target_item".id <> "source_item".id
AND NOT EXISTS ( SELECT id FROM item_photo WHERE item_id = "target_item".id)
AND source_item.id IN (
SELECT MIN(p.item_id) as "item_id"
FROM item_photo p
INNER JOIN item i ON p.item_id = i.id
GROUP BY document_id
)

How to copy the column id from another table?

I'm stuck with this since last week. I have two tables, where the id column of CustomerTbl correlates with CustomerID column of PurchaseTbl:
What I'm trying to achieve is I want to duplicate the data of the table from itself, but copy the newly generated id of CustomerTbl to PurchaseTbl's CustomerID
Just like from the screenshots above. Glad for any help :)

You may use OUTPUT clause to access to the new ID. But to access to both OLD ID and NEW ID, you will need to use MERGE statement. INSERT statement does not allow you to access to the source old id.
first you need somewhere to store the old and new id, a mapping table. You may use table variable or temp table
declare #out table
(
old_id int,
new_id int
)
then the merge statement with output clause
merge
#CustomerTbl as t
using
(
select id, name
from CustomerTbl
) as s
on 1 = 2 -- force it to `false`, not matched
when not matched then
insert (name)
values (name)
output -- the output clause
s.id, -- old_id
inserted.id -- new_id
into #out (old_id, new_id);
after that you just use the #out to join back using old_id to obtain the new_id for the PurchaseTbl
insert into PurchaseTbl (CustomerID, Item, Price)
select o.new_id, p.Item, p.Price
from #out o
inner join PurchaseTbl p on o.old_id = p.CustomerID

Not sure what your end game is, but one way you could solve this is this:
INSERT INTO purchaseTbl ( customerid ,
item ,
price )
SELECT customerid + 3 ,
item ,
price
FROM purchaseTbl;

How to Retrieve id of inserted row when using upsert with WITH cluase in Posgres 9.5?

I'm trying to do upset query in Postgres 9.5 using "WITH"
with s as (
select id
from products
where product_key = 'test123'
), i as (
insert into products (product_key, count_parts)
select 'test123', 33
where not exists (select 1 from s)
returning id
)
update products
set product_key='test123', count_parts=33
where id = (select id from s)
returning id
Apparently I'm retrieving the id only on the updates and get nothing on insertions even though I know insertions succeeded.
I need to modify this query in a way I'll be able the get the id both on insertions and updates.
Thanks!

It wasn't clear to me why you do at WITH first SELECT, but the reason you get only returning UPDATE id is because you're not selecting INSERT return.
As mentioned (and linked) in comments, Postgres 9.5 supports INSERT ON CONFLICT Clause which is a much cleaner way to use.
And some examples of before and after 9.5:
Before 9.5: common way using WITH
WITH u AS (
UPDATE products
SET product_key='test123', count_parts=33
WHERE product_key = 'test123'
RETURNING id
),i AS (
INSERT
INTO products ( product_key, count_parts )
SELECT 'test123', 33
WHERE NOT EXISTS( SELECT 1 FROM u )
RETURNING id
)
SELECT *
FROM ( SELECT id FROM u
UNION SELECT id FROM i
) r;
After 9.5: using INSERT .. ON CONFLICT
INSERT INTO products ( product_key, count_parts )
VALUES ( 'test123', 33 )
ON CONFLICT ( product_key ) DO
UPDATE
SET product_key='test123', count_parts=33
RETURNING id;
UPDATE:
As hinted in a comment there might be slight cons using INSERT .. ON CONFLICT way.
In case table using auto-increment and this query happens a lot, then WITH might be a better option.
See more: https://stackoverflow.com/a/39000072/1161463

Database design - Multiple 1 to many relationships

What would be the best way to model 1 table with multiple 1 to many relatiionships.
With the above schema if Report contains 1 row, Grant 2 rows and Donation 12. When I join the three together I end up with a Cartesian product and result set of 24. Report joins to Grant and creates 2 rows, then Donation joins on that to make 24 rows.
Is there a better way to model this to avoid the caresian product?
example code
DECLARE #Report
TABLE (
ReportID INT,
Name VARCHAR(50)
)
INSERT
INTO #Report
(
ReportID,
Name
)
SELECT 1,'Report1'
DECLARE #Grant
TABLE (
GrantID INT IDENTITY(1,1) PRIMARY KEY(GrantID),
GrantMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO #Grant
(
GrantMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',999,1
DECLARE #Donation
TABLE (
DonationID INT IDENTITY(1,1) PRIMARY KEY(DonationID),
DonationMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO #Donation
(
DonationMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',3434,1
UNION ALL
SELECT 'Grantmaker3',45645,1
UNION ALL
SELECT 'Grantmaker4',3,1
UNION ALL
SELECT 'Grantmaker5',34,1
UNION ALL
SELECT 'Grantmaker6',23,1
UNION ALL
SELECT 'Grantmaker7',67,1
UNION ALL
SELECT 'Grantmaker8',78,1
UNION ALL
SELECT 'Grantmaker9',98,1
UNION ALL
SELECT 'Grantmaker10',43,1
UNION ALL
SELECT 'Grantmaker11',107,1
UNION ALL
SELECT 'Grantmaker12',111,1
SELECT *
FROM #Report r
INNER JOIN
#Grant g
ON r.ReportID = g.ReportID
INNER JOIN
#Donation d
ON r.ReportID = d.ReportID
Update 1 2011-03-07 15:20
Cheers for the feedback so far, to add to this scenario there are also 15 other 1 to many relationships coming from the one report table. These tables can't for various business reasons be grouped together.

Is there any relationship at all between Grants and Donations? If there isn't, does it make sense to pull back a query that shows a pseudo relationship between them?
I'd do one query for grants:
SELECT r.*, g.*
FROM #Report r
JOIN #Grant g ON r.ReportID = g.ReportID
And another for donations:
SELECT r.*, d.*
FROM #Report r
JOIN #Donation d ON r.ReportID = d.ReportID
Then let your application show the appropriate data.
However, if Grants and Donations are similar, then just make a more generic table such as Contributions.
Contributions
-------------
ContributionID (PK)
Maker
Amount
Type
ReportID (FK)
Now your query is:
SELECT r.*, c.*
FROM #Report r
JOIN #Contribution c ON r.ReportID = c.ReportID
WHERE c.Type = 'Grant' -- or Donation, depending on the application

If you're going to join on ReportID, then no, you can't avoid a lot of rows. When you omit the table "Report", and just join "Donation" to "Grant" on ReportId, you still get 24 rows.
SELECT *
FROM Grant g
INNER JOIN
Donation d
ON g.ReportID = d.ReportID
But the essential point is that it doesn't make sense in the real world to match up donations and grants. They're completely independent things that essentially have nothing to do with each other.
In the database, the statement immediately above will join each row in Grants to every matching row in Donation. The resulting 24 rows really shouldn't surprise you.
When you need to present independent things to the user, you should use a report writer or web application (for example) that selects the independent things, well, independently. Select donations and put them into one section of a report or web page, then select grants and put them into another section of the report or web page, and so on.
If the table "Report" is supposed to help you record which sections go into a particular report, then you need a structure more like this:
create table reports (
reportid integer primary key,
report_name varchar(35) not null unique
);
create table report_sections (
reportid integer not null references reports (reportid),
section_name varchar(35), -- Might want to reference a table of section names
section_order integer not null,
primary key (reportid, section_name)
);

The donation and grant tables look almost identical. You could make them one table and add a column that is something like DonationType. Would reduce complexity by 1 table. Now if donations and grants are completely different and have different subtables associated with them then keeping them seperate and only joining on one at a time would be ideal.

Tricky MS Access SQL query to remove surplus duplicate records

I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with the same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programatically but I'd rather have a single DELETE query.

See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));

DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depend on your RDBMS you may use function different from Length (Oracle - length, mysql - length, sql server - LEN)

delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.

I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP Table foo;
DROP Table bar;
DROP Table bat;
DROP Table baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b where a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then when you're ready, use this command to delete the rows using this command.
delete from foo where id in (select id from baz);
This seems like more work because of the different tables, but it's safer probably just as fast as the other ways. Plus you can stop at any step and make sure the data is what you want before you do any actual deletes.

If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;

try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:,...
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
if either of these selects the records that should be deleted, just change it to a delete
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Update query with 'not exists' check causes primary key violation - sql

Related

Copy data on parent and child table

How to copy the column id from another table?

How to Retrieve id of inserted row when using upsert with WITH cluase in Posgres 9.5?

Database design - Multiple 1 to many relationships

Tricky MS Access SQL query to remove surplus duplicate records

Categories

Resources