I have a n : m relationship between two tables, Product and Tag, called ProductTag. It has only two columns: ProductId and TagId.
The primary key is a composite of these two columns.
Inserting into this table is straightforward.
But given a new set of tags that should be associated with the product, what are my options for updating the table in one go?
Right now, inside a transaction, I delete all product tags associated with a product, and insert the 'updated' ones. This works, is simple, and did not take a long time to code.
I'm still curious how it could be solved more elegantly, maybe even with PostgreSQL-specific functionality?
Example:
Say you had 3 entries in this table:
product_id | tag_id
-----------+-------
1 | 2
1 | 3
1 | 6
A request arrives to update the product tags to look as follows:
product_id | tag_id
-----------+-------
1 | 3
1 | 6
1 | 7
Tag with tag_id 2 was removed, and a new tag with tag_id 7 was added. What is the best way of achieving this state in a single statement?
If we are talking about the "usual" amount of tags - say "tens" of tags, rather than "thousands" - then the delete/insert approach isn't such a bad idea.
You can, however, do it in a single statement that applies the changes:
with new_tags (product_id, tag_id) as (
    values (1,3),(1,6),(1,7)
), remove_tags as (
    delete from product_tag pt1
    using new_tags nt
    where pt1.product_id = nt.product_id
      and pt1.tag_id <> all (select tag_id from new_tags)
)
insert into product_tag (product_id, tag_id)
select product_id, tag_id
from new_tags
on conflict do nothing;
The above assumes that (product_id,tag_id) is defined as the primary key in product_tag.
Online example: https://rextester.com/VVL1293
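From application code, the same statement can take the new tag set as a single array parameter instead of a VALUES list. A sketch, assuming the product id is bound as $1 and the new tag ids as $2 (an int[]); the parameter numbering is illustrative:

```sql
-- $1 = product_id, $2 = int[] containing the new tag set
with new_tags as (
    select $1::int as product_id, unnest($2::int[]) as tag_id
), remove_tags as (
    -- drop tags of this product that are not in the new set
    delete from product_tag pt
    where pt.product_id = $1
      and pt.tag_id <> all ($2::int[])
)
-- add tags from the new set that are not there yet
insert into product_tag (product_id, tag_id)
select product_id, tag_id
from new_tags
on conflict do nothing;
```

This keeps the whole update a single round trip regardless of how many tags changed.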
Related
What I would like to do is have SQL maintain the integrity of my data as opposed to doing it through my application. I would like to have it that the SQL will not let me add duplicate items per price list.
If I have 3 tables "PriceList", "Prices" and "InventoryItems"
Columns
PriceList: ID, Name
InventoryItem: ID, SKU, Name
Prices: ID, ListID, ItemID, Price
I cannot put a unique index on Prices.ItemID alone, as this won't allow me to add the same item to multiple price lists.
ID | ListID | ItemID | Price
---+--------+--------+------
 1 |      1 |    106 | 25.35
 2 |      1 |    122 | 45.85
 3 |      1 |    122 | 33.24
 4 |      1 |    136 | 86.33
In the example above I would like there to be a constraint which will prevent the item on line 3 from being added as there is already an itemID 122 linked to ListID 1
Can this be done in SQL with relations/indexes or some other methodology?
You can create a UNIQUE constraint, as in:
alter table Prices
    add constraint uq1 unique (ListID, ItemID);
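With the constraint in place, the duplicate from your example is rejected by the database itself rather than by application code. A sketch, assuming ID is auto-generated (the exact error text varies by DBMS):

```sql
-- attempt to add itemID 122 to ListID 1 a second time
insert into Prices (ListID, ItemID, Price)
values (1, 122, 33.24);
-- fails with a unique-constraint violation on uq1 (ListID, ItemID)
```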
We have lots of tables in MS SQL that were created without table relations many years ago. Now we are trying to create relationships between these tables. The problem is that in many of them the developers used fake ids.
For example:
TABLE A, ID (primary key) -> TABLE B, AID needs to be a relation. But the developers used fake ids like -1, -2 to solve some problems on their side, and now when I try to create the relation between TABLE A, ID (primary key) and TABLE B, AID, I get errors.
TABLE A
ID | NAME
1 | name01
2 | name02
TABLE B
ID | NAME | AID
1 | name01 | 1
2 | name02 | -1
3 | name03 | -2
Is there a way to solve this problem? And is what the developers did meaningful - not using any relations in SQL and controlling everything in the code-behind?
Thanks
You need to add those to your reference table. Something like this:
insert into a (id, name)
select distinct aid, 'Automatically Generated'
from b
where not exists (select 1 from a where b.aid = a.id)
  and b.aid is not null;
Then you can add the foreign key relationship:
alter table b add constraint fk_b_aid foreign key (aid) references a(id);
The general idea of referential integrity is exactly that you can't have invalid references.
So the best course of action here would be to suck it up and manually clean it up. Create the missing entries in the other table, or delete the records.
You can also skip checks on existing data. If you are using SQL Server Management Studio to create the relation, there is an option for that in the foreign-key dialog: set "Check Existing Data On Creation Or Re-Enabling" to No.
Hope it helps
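The T-SQL equivalent of that SSMS option is to add the constraint WITH NOCHECK. Note that SQL Server then marks the constraint as not trusted, so the optimizer cannot use it to simplify queries, and the fake ids remain in place:

```sql
-- create the FK without validating existing rows;
-- new rows are still checked
alter table b with nocheck
    add constraint fk_b_aid foreign key (aid) references a(id);
```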
I am attempting to duplicate an entry. That part isn't hard. The tricky part is: there are n entries connected with a foreign key. And for each of those entries, there are n entries connected to that. I did it manually using a lookup to duplicate and cross reference the foreign keys.
Is there some subroutine or method to duplicate an entry and search for and duplicate foreign entries? Perhaps there is a name for this type of replication I haven't stumbled on yet, is there a specific database related title for this type of operation?
PostgreSQL 8.4.13
main entry (uid is serial)
uid | title
-----+-------
1 | stuff
department (departmentid is serial, uidref is foreign key for uid above)
departmentid | uidref | title
--------------+--------+-------
100 | 1 | Foo
101 | 1 | Bar
sub_category of department (textid is serial, departmentref is foreign key for departmentid above)
textid | departmentref | title
-------+---------------+----------------
1000 | 100 | Text for Foo 1
1001 | 100 | Text for Foo 2
1002 | 101 | Text for Bar 1
You can do it all in a single statement using data-modifying CTEs (requires Postgres 9.1 or later).
Your primary keys being serial columns makes it easier:
WITH m AS (
   INSERT INTO main (title)
   SELECT title
   FROM   main
   WHERE  uid = 1
   RETURNING uid AS uidref                  -- returns the new uid
   )
, d AS (
   INSERT INTO department (uidref, title)
   SELECT m.uidref, d.title
   FROM   m, department d                   -- m holds exactly one row
   WHERE  d.uidref = 1                      -- departments of the original entry
   RETURNING departmentid AS departmentref, title
   )
INSERT INTO sub_category (departmentref, title)
SELECT d.departmentref, s.title
FROM   d
JOIN   department d_old ON d_old.uidref = 1
                       AND d_old.title = d.title   -- pair old and new departments
JOIN   sub_category s ON s.departmentref = d_old.departmentid;
Extend the column lists with any further payload columns. Since RETURNING can only hand back values of the newly inserted rows, the last step pairs each new department with its original by title; this assumes department titles are unique per main entry.
As written, the query returns nothing. You can RETURNING pretty much anything; you just didn't ask for anything.
You wouldn't call that "replication". That term is usually applied to keeping multiple database instances or objects in sync. You are just duplicating an entry - and its dependent objects, recursively.
Aside about naming conventions:
It would get even simpler with a naming convention that labels all columns signifying "ID of table foo" with the same (descriptive) name, like foo_id. There are other naming conventions floating around, but this is the best for writing queries, IMO.
I have two models, A and B. A has many B. Originally, both A and B had an auto-incrementing primary key field called id, and B had an a_id field. Now I have found myself needing a unique sequence of numbers for each B within an A. I was keeping track of this within my application, but then I thought it might make more sense to let the database take care of it. I thought I could give B a compound key where the first component is a_id and the second component auto-increments, taking into consideration the a_id. So if I insert two records with a_id 1 and one with a_id 2 then I will have something like:
a_id | other_id
1 | 1
1 | 2
2 | 1
If ids with lower numbers are deleted, then the sequence should not recycle these numbers. So if (1, 2) gets deleted:
a_id | other_id
1 | 1
2 | 1
When the next record with a_id 1 is added, the table will look like:
a_id | other_id
1 | 1
2 | 1
1 | 3
How can I do this in SQL? Are there reasons not to do something like this?
I am using in-memory H2 (testing and development) and PostgreSQL 9.3 (production).
The answer to your question is that you would need a trigger to get this functionality. However, you could just create a view that uses the row_number() function:
create view v_b as
select t.*,
       row_number() over (partition by a_id order by id) as seqnum
from b t;
Here id is B's primary key. Note, though, that row_number() is recomputed on every read, so unlike the behavior you describe it will recycle numbers after deletes; if numbers must never be reused, the trigger approach is the one to take.
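A minimal sketch of the trigger approach for PostgreSQL, assuming a per-parent counter column b_seq is added to A (all names here are illustrative). The UPDATE takes a row lock on the parent, which also serializes concurrent inserts for the same a_id:

```sql
-- counter of numbers handed out per A row; never decremented,
-- so deleted numbers are not recycled
alter table a add column b_seq integer not null default 0;

create function assign_other_id() returns trigger as $$
begin
    update a
    set    b_seq = b_seq + 1
    where  id = new.a_id
    returning b_seq into new.other_id;   -- stamp the new B row
    return new;
end;
$$ language plpgsql;

create trigger b_assign_other_id
before insert on b
for each row execute procedure assign_other_id();
```

Because the counter lives in a regular column rather than a sequence, it rolls back with the transaction, so you get gap-free numbering at the cost of serializing inserts per parent.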
I have 2 tables
Table name: Attributes
attribute_id | attribute_name
-------------+---------------
           1 | attr_name_1
           2 | attr_name_2
           3 | attr_name_1
           4 | attr_name_2
Table name: Products
product_id | product_name | attribute_id
-----------+--------------+-------------
         1 | prod_name_1  |            1
         2 | prod_name_2  |            2
         3 | prod_name_3  |            3
         4 | prod_name_4  |            4
As you can see, attribute_id in the Products table has the ids (1,2,3,4) instead of (1,2,1,2).
The problem is in the Attributes table: there are repeating values (attribute_names) with different ids. So I want to:
Pick one id of the repeating ones from the Attributes table
Update the Products table with that "picked" id (only where the attribute_id has the same name in the Attributes table)
And after that, delete the repeating values from the Attributes table which are no longer used in the Products table
Output:
Table name: Attributes
attribute_id | attribute_name
1 attr_name_1
2 attr_name_2
Table name: Products
product_id | product_name | attribute_id
1 prod_name_1 1
2 prod_name_2 2
3 prod_name_3 1
4 prod_name_4 2
Demo on SQLFiddle
Note:
It will help me a lot if I can do this in SQL instead of fixing the issue manually.
update Products
set attribute_id = (
    select min(attribute_id)
    from Attributes a
    where a.attribute_name = (
        select attribute_name
        from Attributes a2
        where a2.attribute_id = Products.attribute_id
    )
);

DELETE FROM Attributes
WHERE attribute_id NOT IN (
    SELECT MIN(attribute_id)
    FROM Attributes
    GROUP BY attribute_name
);
The following may be faster than @Alexander Sigachov's suggestion, but it requires at least SQL Server 2005 to run, while Alexander's solution would work on any (reasonable) version of SQL Server. Still, even if only for the sake of providing an alternative, here you go:
WITH Min_IDs AS (
SELECT
attribute_id,
min_attribute_id = MIN(attribute_id) OVER (PARTITION BY attribute_name)
FROM Attributes
)
UPDATE p
SET p.attribute_id = a.min_attribute_id
FROM Products p
JOIN Min_IDs a ON a.attribute_id = p.attribute_id
WHERE a.attribute_id <> a.min_attribute_id
;
DELETE FROM Attributes
WHERE attribute_id NOT IN (
SELECT attribute_id
FROM Products
WHERE attribute_id IS NOT NULL
)
;
The first statement's CTE returns a row set where every attribute_id is mapped to the minimum attribute_id for the same attribute_name. By joining to this mapping set, the UPDATE statement uses it to replace attribute_ids in the Products table.
When subsequently deleting from Attributes, it is enough to check whether Attributes.attribute_id is not found in the Products.attribute_id column, which is what the second statement does. That is to say, the grouping and aggregation from the other answer are not needed at this point.
The WHERE attribute_id IS NOT NULL condition is added to the second query's subquery in case the column is nullable and may indeed contain NULLs. NULLs must be filtered out here, or their presence would make the NOT IN predicate evaluate to UNKNOWN, which SQL Server treats the same as FALSE (so no rows would effectively be deleted). If Products.attribute_id cannot contain NULLs, the condition may be dropped.
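An equivalent formulation of that second statement uses NOT EXISTS, which is immune to the NULL issue and so needs no extra condition; a sketch:

```sql
-- delete attributes no longer referenced by any product
DELETE FROM Attributes
WHERE NOT EXISTS (
    SELECT 1
    FROM Products p
    WHERE p.attribute_id = Attributes.attribute_id
);
```

Whether NOT IN or NOT EXISTS performs better depends on the plan, but NOT EXISTS has the simpler semantics when the compared column is nullable.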