Update rows only when a join is unambiguous


I've got two tables, a and b, both with product_name and value. The value column in a is null.
I'd like to update the a table with values from the b table. Because of a quirk in the data, product_name is not unique in either table.
I only want to set the value when there is one unambiguous match on product_name between the two. When more than one row in a has the same product name, or more than one row matches from b, I'd like to keep the value empty. Is there an efficient way to do this in Postgres?
A simpler version of this would be to first identify unique product names in a. Then, update rows where only a single row in b matches -- but I'm also not sure how to write that constraint.

The simple way:
update a
set value = (select min(b.value) from b where b.product_name = a.product_name)
where product_name in (select product_name from a group by product_name having count(*) = 1)
and product_name in (select product_name from b group by product_name having count(*) = 1)
;
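To see what this statement does on a small data set, here is a minimal sketch that runs it through Python's built-in sqlite3 module rather than Postgres. The in-memory database, the function name run_simple_update and the sample rows are assumptions made for illustration only; the UPDATE itself is the one above.

```python
import sqlite3

def run_simple_update():
    """Build small sample tables in memory, apply the update, return a's rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table a (id integer primary key, product_name text, value int);
        create table b (id integer primary key, product_name text, value int not null);
        insert into a (product_name) values ('A'),('B'),('C'),('D'),('E'),('E');
        insert into b (product_name, value) values
            ('A',1),('A',1),('B',42),('C',1),('C',2),('E',1);
    """)
    # Only names that occur exactly once in a AND exactly once in b are touched.
    conn.execute("""
        update a
        set value = (select min(b.value) from b where b.product_name = a.product_name)
        where product_name in (select product_name from a group by product_name having count(*) = 1)
          and product_name in (select product_name from b group by product_name having count(*) = 1)
    """)
    rows = conn.execute("select product_name, value from a order by id").fetchall()
    conn.close()
    return rows
```

With these rows only 'B' gets a value: 'A' and 'C' match two rows in b, 'D' has no match, and 'E' is duplicated in a.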

You can use aggregation:
update a
set value = b.value
from (select b.product_name, max(b.value) as value
from b
group by b.product_name
having min(b.value) = max(b.value) -- there is only one value
) b
where b.product_name = a.product_name;
Note that this assumes that b.value is not null. It is easy to include logic for null values, if that is needed.
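One possible way to add the null logic is sketched below. It assumes that a product with any NULL value in b should be treated as ambiguous; that rule is my assumption, not something stated in the question. The extra condition count(value) = count(*) rejects groups that contain a NULL. The sketch runs only the grouped subquery, through Python's sqlite3 module, and the function name is hypothetical.

```python
import sqlite3

def single_valued_products(rows):
    """Return (product_name, value) for groups in b with one distinct, never-NULL value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table b (product_name text, value int)")
    conn.executemany("insert into b values (?, ?)", rows)
    result = conn.execute("""
        select product_name, max(value) as value
        from b
        group by product_name
        having min(value) = max(value)   -- all non-NULL values agree
           and count(value) = count(*)   -- and the group contains no NULL
        order by product_name
    """).fetchall()
    conn.close()
    return result
```

A group made up only of NULLs drops out as well, because min(value) = max(value) evaluates to NULL rather than true.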

Just put together "How to Select Every Row Where Column Value is NOT Distinct" and @Gordon Linoff's answer:
create table a (
id serial primary key
,product_name text
,value int
);
create table b (
id serial primary key
,product_name text
,value int not null
);
insert into a (product_name) values
('A')
,('B')
,('C')
,('D')
,('E')
,('E');
insert into b (product_name,value) values
('A',1)
,('A',1)
,('B',42)
,('C',1)
,('C',2)
,('E',1)
;
update a
set value = b.value
from (select product_name, min(value) as value
from b
group by b.product_name
having 1 = count(*)
) b
where b.product_name = a.product_name
and a.product_name not in
(select product_name
from a
group by product_name
having 1 < count(*));
@Gordon Linoff, your answer fails if both product_name and value are duplicated in b (for example, product_name = 'A'), and it misses the requirement that product_name must not be duplicated in a.
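The difference between the two HAVING conditions can be checked on the b rows from the example. This is a rough sketch using Python's sqlite3 module; the helper name matching_products is made up for the illustration.

```python
import sqlite3

# Same rows as the insert into b in the example.
B_ROWS = [("A", 1), ("A", 1), ("B", 42), ("C", 1), ("C", 2), ("E", 1)]

def matching_products(having_clause):
    """Return the product names in b that pass the given HAVING condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table b (product_name text, value int not null)")
    conn.executemany("insert into b values (?, ?)", B_ROWS)
    # The clause is a fixed string chosen by the caller below, not user input.
    sql = ("select product_name from b group by product_name having "
           + having_clause + " order by product_name")
    names = [row[0] for row in conn.execute(sql).fetchall()]
    conn.close()
    return names

by_value = matching_products("min(value) = max(value)")  # one distinct value
by_count = matching_products("count(*) = 1")             # one row
```

Here by_value still contains 'A', because its two rows carry the same value, while by_count does not. Neither condition looks at duplicates in a, which is why the update also needs the not in filter on a.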

Related

HIVE - Update a Column value from Record A using the same table Record B

I'm looking for a sample Hive UPDATE query that uses the same table as its source to update a column. For example: two records share the same id, one of them has the account number, and I am trying to copy that account number into the other record. Could you please share one if you have it?
I tried the query below, but it does not help:
UPDATE a FROM udc.hive.horton.eoquality.testtable a,
(
select id, cust_id from (select cust_id,id,row_number() over (partition by id by evnt_dt_pst_ts desc) as findacc
from
udc.hive.horton.eoquality.testtable) where findacc = 1 and cust_id <> '' and id = '1087'
) b
SET cust_id= b. cust_id
WHERE a.id = b.id

Oracle query to find the latest record which is not null and based on the other table field value

Table A looks like,
I am selecting the details for max(key) with, say,
select * from A where key in (select max(key) from A);
Running the above query gives output,
Key Number type
2915935 B
where Number is Null.
I want to take the number from the next max(key) but keep the type from the current max(key). If that number is null as well, the number should come from the next max(wo_key) after that, so that I get output like this:
2915935 06924278753 B
Please suggest a way I can do the above.

If I understood correctly, the idea is to fetch the number from the next max(key) when the number for the current max(key) is null.
select a.key,c.Number,a.type from tableA a
join (select max(key) as key from tableA) b on a.key = b.key
join (select number from
(
select number from tableA
where number is not null
order by key desc
)where rownum = 1)c on 1=1;
Let me know if this is what you are looking for.

Try this:
CREATE TABLE A("Key" NUMBER(20), "Number" NUMBER(20), "Type" VARCHAR(10));
INSERT INTO A VALUES(2915929,'','A');
INSERT INTO A VALUES(2915935,'','B');
INSERT INTO A VALUES(1582987,'03892448882','A');
INSERT INTO A VALUES(2175622,'05924488825','C');
INSERT INTO A VALUES(2385156,'06924278753','V');
select "Key"
,NVL(NULLIF("Number",0), (SELECT MAX("Number") FROM A)) AS "Number"
,"Type"
from A
where "Key" in (select max("Key") from A);
Output:
KEY NUMBER TYPE
2915935 6924278753 B
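Note that MAX("Number") returns the right row here only because the largest number happens to sit on the highest key that has one. A sketch that instead picks the number from the highest key with a non-null number is shown below. It uses Python's sqlite3 module, not Oracle, and it stores "Number" as text so the leading zero survives; both choices, and the function name, are assumptions for the illustration.

```python
import sqlite3

def latest_with_backfilled_number():
    """Return (Key, Number, Type) for the max key, with Number taken from
    the highest key that has a non-null number."""
    conn = sqlite3.connect(":memory:")
    conn.execute('create table A ("Key" integer, "Number" text, "Type" text)')
    conn.executemany("insert into A values (?, ?, ?)", [
        (2915929, None, "A"),
        (2915935, None, "B"),
        (1582987, "03892448882", "A"),
        (2175622, "05924488825", "C"),
        (2385156, "06924278753", "V"),
    ])
    row = conn.execute('''
        select cur."Key",
               (select prev."Number"
                  from A prev
                 where prev."Number" is not null
                 order by prev."Key" desc
                 limit 1) as "Number",
               cur."Type"
          from A cur
         where cur."Key" = (select max("Key") from A)
    ''').fetchone()
    conn.close()
    return row
```

In Oracle the limit 1 would be written with rownum = 1 over an ordered subquery, as in the answer above.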

SQL Server Group By in an UPDATE statement

How is it possible to use GROUP BY in this statement?
UPDATE LoanMaster
SET LeadsID1 = NEXT VALUE FOR LM
WHERE PrdAcctId IS NOT NULL
GROUP BY LBrCode, CustNo

I am going to speculate that you want a unique id for the pair LBrCode and CustNo. You can do this as:
with nums as (
SELECT t.*, (NEXT VALUE FOR LM) as newval
FROM (SELECT DISTINCT LBrCode, CustNo
FROM LoanMaster
WHERE PrdAcctId IS NOT NULL
) t
)
update lm
set LeadsId1 = newval
from LoanMaster lm JOIN
nums
on lm.LBrCode = nums.LBrCode and lm.CustNo = nums.CustNo;
Note: Although this should work, you should really create a Leads table with one row per value. It seems like you want a foreign key relationship, and you should have an entity for that relationship.
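The Leads table idea could look roughly like the sketch below. It runs on SQLite through Python's sqlite3 module instead of SQL Server, so an autoincrement key stands in for the sequence LM. The Leads table layout, its column names and the function name are hypothetical.

```python
import sqlite3

def assign_lead_ids(loans):
    """loans: list of (LBrCode, CustNo, PrdAcctId).
    Returns LoanMaster rows as (LBrCode, CustNo, PrdAcctId, LeadsID1)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table LoanMaster (LBrCode int, CustNo int, PrdAcctId text, LeadsID1 int);
        create table Leads (
            LeadsID integer primary key autoincrement,  -- stands in for NEXT VALUE FOR LM
            LBrCode int,
            CustNo int,
            unique (LBrCode, CustNo)
        );
    """)
    conn.executemany(
        "insert into LoanMaster (LBrCode, CustNo, PrdAcctId) values (?, ?, ?)", loans)
    # One Leads row, and therefore one id, per distinct (LBrCode, CustNo) pair.
    conn.execute("""
        insert into Leads (LBrCode, CustNo)
        select distinct LBrCode, CustNo
        from LoanMaster
        where PrdAcctId is not null
    """)
    # Copy the id back onto every qualifying LoanMaster row.
    conn.execute("""
        update LoanMaster
        set LeadsID1 = (select l.LeadsID from Leads l
                        where l.LBrCode = LoanMaster.LBrCode
                          and l.CustNo = LoanMaster.CustNo)
        where PrdAcctId is not null
    """)
    rows = conn.execute(
        "select LBrCode, CustNo, PrdAcctId, LeadsID1 from LoanMaster order by rowid"
    ).fetchall()
    conn.close()
    return rows
```

Rows that share a pair end up with the same LeadsID1, and rows with a null PrdAcctId are left alone, as in the WHERE clause of the question.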

sql - select row id based on two column values in same row as id

Using a SELECT, I want to find the row ID of 3 columns (each value is unique/dissimilar and is populated by separate tables.) Only the ID is auto incremented.
I have a middle table I reference that has 3 values: ID, A, B.
A is based on data from another table.
B is based on data from another table.
How can I select the row ID when I only know the value of A and B, and A and B are not the same value?

Do you mean that columns A and B are foreign keys?
Does this work?
SELECT [ID]
FROM tbl
WHERE A = @a AND B = @b

SELECT ID FROM table WHERE A=value1 and B=value2

It's not very clear. Do you mean this:
SELECT ID
FROM middletable
WHERE A = knownA
AND B = knownB
Or this?
SELECT ID
FROM middletable
WHERE A = knownA
AND B <> A
Or perhaps "I know A" means you have a list of values for A, which come from another table?
SELECT ID
FROM middletable
WHERE A IN
( SELECT otherA FROM otherTable ...)
AND B IN
( SELECT otherB FROM anotherTable ...)
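If the first reading is the right one, the lookup from application code is a plain parameterized query. A minimal sketch with Python's sqlite3 module follows; the table contents and the helper names are invented for the example.

```python
import sqlite3

def make_demo_db():
    """Create a small middletable with an auto-incremented ID (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table middletable (ID integer primary key autoincrement, A int, B int)")
    conn.executemany("insert into middletable (A, B) values (?, ?)",
                     [(1, 7), (2, 7), (2, 9)])
    return conn

def find_id(conn, known_a, known_b):
    """Return the ID of the row matching both values, or None if there is none."""
    row = conn.execute(
        "select ID from middletable where A = ? and B = ?", (known_a, known_b)
    ).fetchone()
    return row[0] if row else None
```

If the pair (A, B) is not guaranteed to be unique, fetchone only returns the first match, so a unique constraint on (A, B) may be worth adding.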

Tricky MS Access SQL query to remove surplus duplicate records

I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with the same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programmatically but I'd rather have a single DELETE query.

See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));

DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depending on your RDBMS, you may need a function other than Length (Oracle: length, MySQL: length, SQL Server: LEN).

delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.
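The same rule (keep the longest name per scheme number, break ties on the lowest ID) can be tried out on the sample rows from the question. The sketch below expresses it as a DELETE with EXISTS and runs it on SQLite through Python's sqlite3 module, so it is an illustration of the logic rather than Access or SQL Server syntax; the function name is made up.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber).
SCHEMES = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

def dedupe_schemes(rows):
    """Delete every row that is beaten by another row with the same SchemeNumber;
    return the surviving IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tblSchemes (ID integer primary key, SchemeName text, SchemeNumber text)")
    conn.executemany("insert into tblSchemes values (?, ?, ?)", rows)
    # A row goes if a better row exists: a longer name, or equal length and lower ID.
    conn.execute("""
        delete from tblSchemes
        where exists (
            select 1
            from tblSchemes as better
            where better.SchemeNumber = tblSchemes.SchemeNumber
              and (length(better.SchemeName) > length(tblSchemes.SchemeName)
                   or (length(better.SchemeName) = length(tblSchemes.SchemeName)
                       and better.ID < tblSchemes.ID))
        )
    """)
    survivors = [r[0] for r in conn.execute("select ID from tblSchemes order by ID").fetchall()]
    conn.close()
    return survivors
```

For ABC123 this removes 714, 80 and 101 and leaves 96, as asked for in the question.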

I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS bar;
DROP TABLE IF EXISTS bat;
DROP TABLE IF EXISTS baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by a.SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b where a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then, when you're ready, delete the rows with this command:
delete from foo where id in (select id from baz);
This seems like more work because of the extra tables, but it's safer and probably just as fast as the other ways. Plus, you can stop at any step and make sure the data is what you want before you do any actual deletes.
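On engines that support transactions, a trial run inside a transaction is another safety net that fits this step-by-step style: do the delete, look at what is left, then roll back. Below is a minimal sketch using Python's sqlite3 module, which opens a transaction implicitly before a DELETE. The helper names are hypothetical, and whether your own database and client behave this way is something to check first.

```python
import sqlite3

def make_foo():
    """Create and commit the sample table foo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table foo (id integer primary key, SchemeName text, SchemeNumber text)")
    conn.executemany("insert into foo values (?, ?, ?)", [
        (714, "Malcolm", "ABC123"),
        (80, "Malcolm", "ABC123"),
        (96, "Malcolms Scheme", "ABC123"),
        (101, "Malcolms Scheme", "ABC123"),
        (98, "Malcolms Scheme", "DEF888"),
        (654, "Another Scheme", "BAR876"),
        (543, "Whatever Scheme", "KJL111"),
    ])
    conn.commit()
    return conn

def all_ids(conn):
    """Return every id currently visible in foo."""
    return [r[0] for r in conn.execute("select id from foo order by id").fetchall()]

def trial_delete(conn, ids):
    """Delete the given ids, report the survivors, then undo the delete."""
    conn.executemany("delete from foo where id = ?", [(i,) for i in ids])
    survivors = all_ids(conn)  # what the table would look like after the delete
    conn.rollback()            # nothing was committed, so the rows come back
    return survivors
```

Once the survivors look right, the same delete followed by conn.commit() makes the change permanent.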

If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;

try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
If either of these selects the records that should be deleted, just change it to a delete:
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
