Update table rows with a chosen id while deleting the duplicate rows - SQL

I have 2 tables:
Table name: Attributes
attribute_id | attribute_name
1 | attr_name_1
2 | attr_name_2
3 | attr_name_1
4 | attr_name_2
Table name: Products
product_id | product_name | attribute_id
1 | prod_name_1 | 1
2 | prod_name_2 | 2
3 | prod_name_3 | 3
4 | prod_name_4 | 4
As you can see, attribute_id in the Products table holds the IDs (1,2,3,4) instead of (1,2,1,2).
The problem is in the Attributes table: it contains repeated values (attribute_name) with different IDs. So I want to:
Pick one ID for each repeated name from the Attributes table.
Update the Products table with that "picked" ID (only where attribute_id points to a name that is repeated in the Attributes table).
Then delete the repeated rows from the Attributes table, which are no longer used in the Products table.
Output:
Table name: Attributes
attribute_id | attribute_name
1 | attr_name_1
2 | attr_name_2
Table name: Products
product_id | product_name | attribute_id
1 | prod_name_1 | 1
2 | prod_name_2 | 2
3 | prod_name_3 | 1
4 | prod_name_4 | 2
Demo on SQLFiddle
Note: it would help me a lot to fix this issue with SQL instead of manually.

update Products
set attribute_id = (
    -- lowest id among all attributes that share the current attribute's name
    select min(attribute_id)
    from Attributes a
    where a.attribute_name = (
        select attribute_name
        from Attributes a2
        where a2.attribute_id = Products.attribute_id
    )
);
DELETE
FROM Attributes
-- keep only the lowest id for each name
WHERE attribute_id NOT IN
(
    SELECT MIN(attribute_id)
    FROM Attributes
    GROUP BY attribute_name
);
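To try the two statements above end to end, here is a self-contained script that recreates the sample data from the question. It is a sketch written for SQLite (an assumption; the thread itself targets SQL Server), and the column types are inferred from the sample rows rather than taken from the question.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE Attributes (
    attribute_id   INTEGER PRIMARY KEY,
    attribute_name TEXT NOT NULL
);
CREATE TABLE Products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    attribute_id INTEGER
);
INSERT INTO Attributes (attribute_id, attribute_name) VALUES
    (1, 'attr_name_1'), (2, 'attr_name_2'),
    (3, 'attr_name_1'), (4, 'attr_name_2');
INSERT INTO Products (product_id, product_name, attribute_id) VALUES
    (1, 'prod_name_1', 1), (2, 'prod_name_2', 2),
    (3, 'prod_name_3', 3), (4, 'prod_name_4', 4);

-- Step 1: point every product at the lowest id sharing its attribute name.
UPDATE Products
SET attribute_id = (
    SELECT MIN(a.attribute_id)
    FROM Attributes a
    WHERE a.attribute_name = (
        SELECT a2.attribute_name
        FROM Attributes a2
        WHERE a2.attribute_id = Products.attribute_id
    )
);

-- Step 2: drop every attribute row that is not the lowest id for its name.
DELETE FROM Attributes
WHERE attribute_id NOT IN (
    SELECT MIN(attribute_id)
    FROM Attributes
    GROUP BY attribute_name
);

-- Attributes now keeps ids 1 and 2; products 3 and 4 point at 1 and 2.
SELECT product_id, attribute_id FROM Products ORDER BY product_id;
```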

The following may be faster than Alexander Sigachov's suggestion, but it requires at least SQL Server 2005, while Alexander's solution would work on any (reasonable) version of SQL Server. Still, if only for the sake of providing an alternative, here you go:
WITH Min_IDs AS (
SELECT
attribute_id,
min_attribute_id = MIN(attribute_id) OVER (PARTITION BY attribute_name)
FROM Attributes
)
UPDATE p
SET p.attribute_id = a.min_attribute_id
FROM Products p
JOIN Min_IDs a ON a.attribute_id = p.attribute_id
WHERE a.attribute_id <> a.min_attribute_id
;
DELETE FROM Attributes
WHERE attribute_id NOT IN (
SELECT attribute_id
FROM Products
WHERE attribute_id IS NOT NULL
)
;
The first statement's CTE returns a row set where every attribute_id is mapped to the minimum attribute_id for the same attribute_name. By joining to this mapping set, the UPDATE statement uses it to replace attribute_ids in the Products table.
When subsequently deleting from Attributes, it is enough to check whether Attributes.attribute_id is absent from the Products.attribute_id column, which is what the second statement does. That is to say, grouping and aggregation, as in the other answer, are not needed at this point.
The WHERE attribute_id IS NOT NULL condition is added to the second statement's subquery in case the column is nullable and may indeed contain NULLs. NULLs need to be filtered out in this case; otherwise their presence would make the NOT IN predicate evaluate to UNKNOWN, which SQL Server treats the same as FALSE (so no row would be deleted at all). If there cannot be NULLs in Products.attribute_id, the condition may be dropped.
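The NULL behaviour described above is easy to reproduce. The sketch below assumes SQLite and cuts the tables down to the columns involved; it shows NOT IN returning nothing once a NULL is present, and then uses NOT EXISTS, a swapped-in alternative to the NOT IN form, which is not affected by NULLs because it compares row by row.

```sql
CREATE TABLE Attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, attribute_id INTEGER);
INSERT INTO Attributes VALUES (1, 'attr_name_1'), (2, 'attr_name_2'), (3, 'unused');
-- Product 2 has no attribute, so Products.attribute_id contains a NULL.
INSERT INTO Products VALUES (1, 1), (2, NULL), (3, 2);

-- Without an IS NOT NULL filter, "3 NOT IN (1, NULL, 2)" is UNKNOWN,
-- so this query returns no rows even though attribute 3 is unused.
SELECT attribute_id FROM Attributes
WHERE attribute_id NOT IN (SELECT attribute_id FROM Products);

-- NOT EXISTS never compares against the NULL as a whole list, so id 3 is deleted.
DELETE FROM Attributes
WHERE NOT EXISTS (
    SELECT 1
    FROM Products p
    WHERE p.attribute_id = Attributes.attribute_id
);
```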

Related

Updating uniqueidentifier column with same value for rows with matching column value

I need a little help. I have this (simplified) table:
ID | Title | Subtype | RelatedUniqueID
1 | My Title 1 | 1 | NULL
2 | My Title 2 | 1 | NULL
3 | My Title 3 | 2 | NULL
4 | My Title 4 | 2 | NULL
5 | My Title 5 | 2 | NULL
6 | My Title 6 | 3 | NULL
What I am trying to accomplish is generating the same uniqueidentifier for all rows having the same subtype.
So the result would be this:
ID | Title | Subtype | RelatedUniqueID
1 | My Title 1 | 1 | 439753d3-9103-4d0e-9dd0-569dc71fd6a3
2 | My Title 2 | 1 | 439753d3-9103-4d0e-9dd0-569dc71fd6a3
3 | My Title 3 | 2 | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
4 | My Title 4 | 2 | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
5 | My Title 5 | 2 | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a
6 | My Title 6 | 3 | 055838c6-a814-4bd1-a859-63d4544bb449
Requirements
One query to update all rows at once
The actual table has many more rows with hundreds of subtypes, so manually building a query for each subtype is not an option
Using SQL Server 2017
Thanks for any help.
Because newid() is evaluated per row, you have to generate the values first, so this has to involve a temporary or permanent table that stores the Subtype-to-ID mapping.
So first you need to generate one GUID value per Subtype:
with subtypes as (
select distinct subtype
from t
)
select Subtype, NewId() RelatedId into #Id
from subtypes
And then you can use an updatable CTE to apply these to your base table:
with r as (
select t.*, id.RelatedId
from #id id
join t on t.subtype=id.Subtype
)
update r
set relatedUniqueId=RelatedId
See example DB<>Fiddle
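The same two-step idea (store one value per Subtype first, then copy it to the rows) can be sketched outside SQL Server. The script below assumes SQLite, so lower(hex(randomblob(16))) stands in for NEWID() and produces a 32-character hex string rather than a real uniqueidentifier. The table name t follows the answer above, subtype_ids is a hypothetical name, and a correlated subquery replaces the updatable CTE, which SQLite does not support.

```sql
CREATE TABLE t (
    ID              INTEGER PRIMARY KEY,
    Title           TEXT,
    Subtype         INTEGER,
    RelatedUniqueID TEXT
);
INSERT INTO t (ID, Title, Subtype) VALUES
    (1, 'My Title 1', 1), (2, 'My Title 2', 1),
    (3, 'My Title 3', 2), (4, 'My Title 4', 2),
    (5, 'My Title 5', 2), (6, 'My Title 6', 3);

-- Step 1: one row per Subtype, then one random value per row of that table.
CREATE TEMP TABLE subtype_ids (Subtype INTEGER PRIMARY KEY, RelatedId TEXT);
INSERT INTO subtype_ids (Subtype) SELECT DISTINCT Subtype FROM t;
UPDATE subtype_ids SET RelatedId = lower(hex(randomblob(16)));

-- Step 2: copy the stored value to every row with the same Subtype.
UPDATE t
SET RelatedUniqueID = (
    SELECT s.RelatedId
    FROM subtype_ids s
    WHERE s.Subtype = t.Subtype
);
```

Because the random values are fixed in subtype_ids before the second UPDATE runs, it no longer matters how often the random function would have been evaluated inside a join.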
You can use an updatable CTE with a window function to get this data:
with r as (
select t.*,
RelatedId = first_value(newid()) over (partition by t.Subtype order by ID rows unbounded preceding)
from t
)
update r
set relatedUniqueId = RelatedId;
db<>fiddle
A warning, though: when newid() gets evaluated is somewhat unpredictable, so don't try a joined update unless you pre-save the IDs as @Stu has done.
For example, in this fiddle the IDs were calculated differently for every row.
I have found a single-query solution.
The prerequisite is that RelatedUniqueID must already contain random values (e.g. set the column's default value to newid()).
UPDATE TestTable
SET RelatedUniqueID = TG.RelatedUniqueID
FROM TestTable TG
INNER JOIN TestTable ON TestTable.SubType = TG.SubType
-- Caution: each row matches several TG rows, and SQL Server does not define which one supplies the value.
Update
As Stu mentions in the comments, this solution might affect performance on large datasets. Please keep that in mind.

union table, change serial primary key, postgresql

PostgreSQL:
I have two tables, 'abc' and 'xyz', in PostgreSQL. Both tables have an 'id' column of type 'serial primary key'.
The abc id column contains 1,2,3 and the xyz id column contains 1,2,3,4.
I want to combine both tables with UNION ALL, but I want to renumber the xyz id values so they continue after the last abc id, giving 1,2,3,4,5,6,7:
select id from abc
union all
select id from xyz
|id|
1
2
3
1
2
3
4
My desired result:
|id|
1
2
3
4
5
6
7
BETTER - Thanks to @CaiusJard
This should do it for you
select id FROM abc
UNION ALL select x.id + a.maxid FROM xyz x,
(SELECT MAX(id) as maxid from abc) a
ORDER BY id
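The MAX(id) offset only yields 1..7 when the ids in both tables start at 1 with no gaps (deleting rows leaves gaps in a serial column). If that may not hold, a swapped-in alternative is to renumber the combined rows with ROW_NUMBER(). The sketch below assumes SQLite 3.25 or later; the view name renumbered and the helper columns src and orig_id are hypothetical, with src keeping the abc rows ahead of the xyz rows.

```sql
CREATE TABLE abc (id INTEGER PRIMARY KEY);
CREATE TABLE xyz (id INTEGER PRIMARY KEY);
INSERT INTO abc (id) VALUES (1), (2), (3);
INSERT INTO xyz (id) VALUES (1), (2), (3), (4);

-- src = 1 for abc rows and 2 for xyz rows, so abc is numbered first.
CREATE VIEW renumbered AS
SELECT src,
       orig_id,
       ROW_NUMBER() OVER (ORDER BY src, orig_id) AS id
FROM (
    SELECT 1 AS src, id AS orig_id FROM abc
    UNION ALL
    SELECT 2 AS src, id AS orig_id FROM xyz
) AS combined;

SELECT id FROM renumbered ORDER BY id;
```

The original ids stay available in orig_id, which helps if other tables still reference them.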
For anyone who's doing something like this:
I had a similar problem: I had table A and table B, which had two different serials. My solution was to create a new table C, identical to table B except that it had an "oldid" column and its id column used the same sequence as table A. I then inserted all the data from table B into table C (putting the old id in the oldid field). Once I had fixed the references to point from the oldid to the new id, I was able to drop the oldid column.
In my case I needed to fix the old relations and keep the ids unique in the future (but I don't care whether the ids from table A all come before those from table C). Depending on what you're trying to accomplish, this approach may be useful.
If anyone is going to use this approach, strictly speaking, there should be a trigger to prevent someone from manually setting an id in one table to match another. You should also alter the sequence to be owned by NONE so it's not dropped with table A, if table A is ever dropped.

How to write sql query in teradata based on multiple conditions

I have two tables. Table Bill has the following fields:
Field_Name Field_Type
===============================
Bill_Sts_Sk decimal(18) PK
epn_id bigint child key
epn_seq_id bigint child key
ref_id integer child key
Table CLM_Crg has following fields:
Field_Name Field_Type
===============================
Bill_Sts_Sk decimal(18)
epn_id bigint PK
epn_seq_id bigint PK
ref_id integer PK
I need to look up Bill_Sts_Sk against the parent key (Bill_Sts_Sk) in the BILL table. These are the lookup conditions:
Look up BILL matching on epn_id, epn_seq_id and ref_id.
If not found, try again using only the first 2 fields.
If still not found, use the default value -1.
If more than 1 key is found, use the maximum value.
How can I achieve this with an SQL query? I have written the following query for the first part:
select Bill_Sts_Sk
from Bill bl
left join CLM_CRG crg
ON bl.epn_id = crg.epn_id
and bl.epn_seq_id = crg.epn_seq_id
and bl.ref_id = crg.ref_id
Can anyone please help me write an SQL query in Teradata (14.10.06.05) for the conditions above?
You can join on the first two columns and then look for the best match using a ROW_NUMBER:
SELECT bl.*, COALESCE(crg.Bill_Sts_Sk, -1)
FROM Bill bl
LEFT JOIN CLM_CRG crg
  ON bl.epn_id = crg.epn_id
 AND bl.epn_seq_id = crg.epn_seq_id
QUALIFY
  ROW_NUMBER()
  OVER (PARTITION BY bl.epn_id, bl.epn_seq_id
        ORDER BY CASE WHEN bl.ref_id = crg.ref_id THEN 1 ELSE 2 END -- best: matching ref_ids
                ,crg.Bill_Sts_Sk DESC) = 1                          -- 2nd best: highest Bill_Sts_Sk
The best primary index (PI) for this would be on (epn_id, epn_seq_id) for both tables.
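SQLite has no QUALIFY clause, so to try the ranking logic outside Teradata the sketch below (SQLite 3.25 or later, an assumption) moves ROW_NUMBER() into a derived table and filters on it in the outer query; Teradata also accepts this form. The sample rows are made up purely for illustration, and bill_lookup, rn and lookup_sk are hypothetical names.

```sql
CREATE TABLE Bill (
    Bill_Sts_Sk DECIMAL(18) PRIMARY KEY,
    epn_id      BIGINT,
    epn_seq_id  BIGINT,
    ref_id      INTEGER
);
CREATE TABLE CLM_CRG (
    Bill_Sts_Sk DECIMAL(18),
    epn_id      BIGINT,
    epn_seq_id  BIGINT,
    ref_id      INTEGER,
    PRIMARY KEY (epn_id, epn_seq_id, ref_id)
);
-- epn 1: exact ref_id match exists; epn 2: only the first 2 fields match; epn 3: no match.
INSERT INTO Bill VALUES (100, 1, 1, 10), (200, 2, 1, 10), (300, 3, 1, 10);
INSERT INTO CLM_CRG VALUES (7, 1, 1, 10), (9, 1, 1, 11), (5, 2, 1, 20), (8, 2, 1, 21);

CREATE VIEW bill_lookup AS
SELECT epn_id, epn_seq_id, ref_id, lookup_sk
FROM (
    SELECT bl.epn_id, bl.epn_seq_id, bl.ref_id,
           COALESCE(crg.Bill_Sts_Sk, -1) AS lookup_sk,  -- -1 when nothing matched
           ROW_NUMBER() OVER (
               PARTITION BY bl.epn_id, bl.epn_seq_id
               ORDER BY CASE WHEN bl.ref_id = crg.ref_id THEN 1 ELSE 2 END,
                        crg.Bill_Sts_Sk DESC
           ) AS rn
    FROM Bill bl
    LEFT JOIN CLM_CRG crg
      ON bl.epn_id = crg.epn_id
     AND bl.epn_seq_id = crg.epn_seq_id
) AS ranked
WHERE rn = 1;  -- plays the role of QUALIFY ... = 1

SELECT * FROM bill_lookup ORDER BY epn_id;
```

With these rows, epn 1 picks 7 (exact match beats the higher 9), epn 2 picks 8 (highest of the partial matches) and epn 3 falls back to -1.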

Merge and order rows

I have a table with the following structure. I am writing a query to get all item_ids where key_name='topic' and key_string_value='investing', which is the simple part.
select item_id from table where key_name='topic' and key_string_value='investing'
But then, for all the item_ids returned above, I want to order them by the values set for each item_id under key_name='importance' and key_name='product'. The table structure makes this very difficult, as I am not an SQL expert. Any help would be appreciated.
item_id key_name key_string_value Key_float_value
1 topic investing null
1 importance null 500
1 product A null
1 product B null
2 topic Starting null
2 product B null
2 importance null 300
2 topic retail null
3 importance null 400
3 topic investing null
3 product C null
4 topic Starting null
4 topic investing null
4 importance null 400
4 product D null
@Schwern is right: your structure should be normalized, and the names could be better too. All this makes me think: homework.
The answer to the homework question is a self join, and looks like this:
select t1.item_id , imp.key_float_value, prd.key_string_value
from [table] t1
LEFT OUTER JOIN [table] imp on imp.item_id = t1.item_id and imp.key_name='importance'
LEFT OUTER JOIN [table] prd on prd.item_id = t1.item_id and prd.key_name='product'
where t1.key_name='topic' and t1.key_string_value='investing'
ORDER BY imp.key_float_value, prd.key_string_value
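Here is the self-join run against the question's sample rows. It is a sketch assuming SQLite; kv and investing_items are hypothetical names (kv stands in for the question's table, avoiding the reserved word). Note that item 1 has two product rows, so the join returns it twice, once per product.

```sql
-- kv is a hypothetical stand-in for the question's table.
CREATE TABLE kv (
    item_id          INTEGER,
    key_name         TEXT,
    key_string_value TEXT,
    key_float_value  REAL
);
INSERT INTO kv VALUES
    (1, 'topic', 'investing', NULL), (1, 'importance', NULL, 500),
    (1, 'product', 'A', NULL),       (1, 'product', 'B', NULL),
    (2, 'topic', 'Starting', NULL),  (2, 'product', 'B', NULL),
    (2, 'importance', NULL, 300),    (2, 'topic', 'retail', NULL),
    (3, 'importance', NULL, 400),    (3, 'topic', 'investing', NULL),
    (3, 'product', 'C', NULL),       (4, 'topic', 'Starting', NULL),
    (4, 'topic', 'investing', NULL), (4, 'importance', NULL, 400),
    (4, 'product', 'D', NULL);

-- One alias of kv per "column" we want to pull out of the key/value rows.
CREATE VIEW investing_items AS
SELECT t1.item_id,
       imp.key_float_value  AS importance,
       prd.key_string_value AS product
FROM kv t1
LEFT OUTER JOIN kv imp ON imp.item_id = t1.item_id AND imp.key_name = 'importance'
LEFT OUTER JOIN kv prd ON prd.item_id = t1.item_id AND prd.key_name = 'product'
WHERE t1.key_name = 'topic' AND t1.key_string_value = 'investing';

SELECT * FROM investing_items ORDER BY importance, product;
```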
The square brackets around [table] are needed because table is a reserved keyword, so using it as a table name requires the name to be delimited. Square brackets work in T-SQL; other databases use double quotes (").
You have a very poorly designed table that will be slow and difficult to work with. SQL is not a key/value store; it works on rows, columns and relationships. Rather than fight it, I would suggest redesigning it: either use a NoSQL database, which is easier to use and works more like normal programming data structures, or redesign the schema.
Here's the redesign I would suggest.
CREATE TABLE item (
id INTEGER PRIMARY KEY,
importance INTEGER DEFAULT 0
);
CREATE TABLE item_topics (
item_id INTEGER REFERENCES item(id),
topic TEXT NOT NULL
);
CREATE TABLE item_products (
item_id INTEGER REFERENCES item(id),
product TEXT NOT NULL
);
The item itself and any scalar (i.e. single-valued) attributes go into one table. Anything that can be a list (products and topics) needs its own table relating each item to its elements. If this seems clunky, that's because it is, but that's how SQL works.
To find all items whose topic is investing, you have to join on the item_topics table.
SELECT item.id
FROM item
JOIN item_topics ON item.id = item_topics.item_id
WHERE topic = 'investing'
Then to order them, add ORDER BY item.importance.
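Putting the redesigned schema, the join on item_topics.item_id and the ORDER BY together, a small sketch with the question's data might look like this. SQLite is assumed, and item.id is added as a tie-breaker (an addition, not part of the answer) because items 3 and 4 share importance 400.

```sql
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    importance INTEGER DEFAULT 0
);
CREATE TABLE item_topics (
    item_id INTEGER REFERENCES item(id),
    topic   TEXT NOT NULL
);
CREATE TABLE item_products (
    item_id INTEGER REFERENCES item(id),
    product TEXT NOT NULL
);
INSERT INTO item VALUES (1, 500), (2, 300), (3, 400), (4, 400);
INSERT INTO item_topics VALUES
    (1, 'investing'), (2, 'Starting'), (2, 'retail'),
    (3, 'investing'), (4, 'Starting'), (4, 'investing');
INSERT INTO item_products VALUES (1, 'A'), (1, 'B'), (2, 'B'), (3, 'C'), (4, 'D');

-- Items tagged 'investing', least important first.
SELECT item.id, item.importance
FROM item
JOIN item_topics ON item.id = item_topics.item_id
WHERE item_topics.topic = 'investing'
ORDER BY item.importance, item.id;
```

Because each item now has exactly one row in item, ordering by importance no longer duplicates items that have several products.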

T-SQL to "Merge" two rows, or "Rekey" all FK relationships

I have a production database where occasionally redundant rows in a single table need to be "Merged".
Let's assume that both rows in this table have identical values, except their IDs.
Table "PrimaryStuff"
ID | SomeValue
1 | "I have value"
2 | "I have value"
3 | "I am different"
Let's also assume that a number of related tables exist. Because duplicates were created in the "PrimaryStuff" table, rows are often created in these child tables that SHOULD all be related to a single record in the PrimaryStuff table. The number and names of these tables are not under my control and have to be discovered dynamically at runtime. That is, I don't know the names or even the number of related tables, as other people may edit the database without my knowledge.
Table "ForeignStuff"
ID | PrimaryStuffId | LocalValue
1| 1| "I have the correct FK"
2| 1| "I have the correct FK"
3| 2| "I should get pointed to an FK of 1"
To resolve the duplication of PrimaryStuff's rows 1 and 2, I want ALL related tables to change their FKs to 1 and then delete PrimaryStuff's row 2. This SHOULD be trivial: if PrimaryStuff's row 1 didn't exist, I could just update the primary key on row 2 to 1, and the changes would cascade out. I cannot do this because that would create a duplicate key in PrimaryStuff's unique index.
Feel free to ask questions and I'll try to clear up anything that's confusing.
First, let's get a list of the rows that need to be updated (as I understand it, you want the lowest ID to replace all the higher IDs):
SELECT MIN(ID) OVER (PARTITION BY SomeValue ORDER BY SomeValue, ID ASC) AS FirstID,
ID,
SOMEVALUE
FROM PrimaryStuff
We can remove the rows where FirstID and ID match, since those don't need changing:
SELECT FirstID, ID FROM
(
SELECT MIN(ID) OVER (PARTITION BY SomeValue ORDER BY SomeValue, ID ASC) AS FirstID,
ID,
SOMEVALUE
FROM PrimaryStuff
) T
WHERE FirstID != ID
Now we have a change list. We can use it in an UPDATE statement, either by putting it in a temp table or, as below, in a CTE:
WITH ChangeList AS
(
    SELECT FirstID, ID FROM
    (
        SELECT MIN(ID) OVER (PARTITION BY SomeValue ORDER BY SomeValue, ID ASC) AS FirstID,
        ID
        FROM PrimaryStuff
    ) T
    WHERE FirstID != ID
)
UPDATE ForeignStuff
SET PrimaryStuffId = ChangeList.FirstID
FROM ForeignStuff
-- match on the FK column, not on ForeignStuff's own ID
JOIN ChangeList ON ForeignStuff.PrimaryStuffId = ChangeList.ID
NB - Code not tested, might have typos.
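To check the change-list logic on the sample rows, here is a sketch for SQLite 3.25 or later (an assumption). SQLite versions before 3.33 have no UPDATE ... FROM, so the join is expressed as a correlated subquery, and a final DELETE removes the now-unreferenced duplicate, which is the last step the question asks for. It only handles the one known child table; discovering related tables dynamically is not covered.

```sql
CREATE TABLE PrimaryStuff (ID INTEGER PRIMARY KEY, SomeValue TEXT);
CREATE TABLE ForeignStuff (
    ID             INTEGER PRIMARY KEY,
    PrimaryStuffId INTEGER REFERENCES PrimaryStuff(ID),
    LocalValue     TEXT
);
INSERT INTO PrimaryStuff VALUES (1, 'I have value'), (2, 'I have value'), (3, 'I am different');
INSERT INTO ForeignStuff VALUES
    (1, 1, 'I have the correct FK'),
    (2, 1, 'I have the correct FK'),
    (3, 2, 'I should get pointed to an FK of 1');

-- Map every duplicate ID to the lowest ID with the same SomeValue.
CREATE TEMP TABLE ChangeList AS
SELECT FirstID, ID
FROM (
    SELECT MIN(ID) OVER (PARTITION BY SomeValue) AS FirstID, ID
    FROM PrimaryStuff
) AS T
WHERE FirstID <> ID;

-- Re-point child rows that reference a duplicate.
UPDATE ForeignStuff
SET PrimaryStuffId = (
    SELECT c.FirstID FROM ChangeList c WHERE c.ID = ForeignStuff.PrimaryStuffId
)
WHERE PrimaryStuffId IN (SELECT ID FROM ChangeList);

-- Finally remove the duplicates from the parent table.
DELETE FROM PrimaryStuff WHERE ID IN (SELECT ID FROM ChangeList);
```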
Could you be more proactive? Either reuse the existing ID when SomeValue already exists and enforce a unique constraint on PrimaryStuff.SomeValue, or make SomeValue the primary key of PrimaryStuff. With it as the primary key, you would only ever add a record to PrimaryStuff if SomeValue did not already exist in it.
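The first suggestion can be sketched as a UNIQUE constraint plus an insert that only runs when the value is new. SQLite syntax is assumed; the NOT EXISTS guard skips the duplicate insert, and the constraint rejects any duplicate that bypasses the guard.

```sql
CREATE TABLE PrimaryStuff (
    ID        INTEGER PRIMARY KEY,
    SomeValue TEXT NOT NULL UNIQUE  -- the database now refuses duplicate values
);

-- Insert only if the value is not there yet.
INSERT INTO PrimaryStuff (SomeValue)
SELECT 'I have value'
WHERE NOT EXISTS (SELECT 1 FROM PrimaryStuff WHERE SomeValue = 'I have value');

-- A second attempt with the same value inserts nothing.
INSERT INTO PrimaryStuff (SomeValue)
SELECT 'I have value'
WHERE NOT EXISTS (SELECT 1 FROM PrimaryStuff WHERE SomeValue = 'I have value');

-- Callers then reuse the existing ID instead of creating a new row.
SELECT ID FROM PrimaryStuff WHERE SomeValue = 'I have value';
```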
Lastly, and most simply, if SomeValue is always arbitrarily defined by others and you take whatever they give you, why not just drop PrimaryStuff altogether and let users enter whatever they wish into ForeignStuff? If you need a unique listing for SomeValue, create a view based on your main table. If you need to speed up querying then add an index to ForeignStuff.SomeValue field.
Here's an (untested) view for when there are multiple tables like ForeignStuff:
-- the view name AllSomeValues is illustrative
CREATE VIEW AllSomeValues AS
-- generates the distinct list of values of interest at query time
select SomeValue from ForeignStuffA
union select SomeValue from ForeignStuffB
union select SomeValue from ForeignStuffC
-- and so on; UNION (without ALL) removes duplicates