SQL Merge Duplicate Records

I'm trying to merge two tables in DB2 SQL, but it keeps giving me SQLSTATE 21506 and error message SQL0788.
This is what I have so far:
merge into table1 as tgt
using table2 as src
on src.key1 = tgt.key1 and src.key2 = tgt.key2
when matched then
update set
(fld1, fld2, fld3) = (src.fld1, src.fld2, src.fld3)
when not matched then
insert (fld1, fld2, fld3) values (src.fld1, src.fld2, src.fld3)
I searched for duplicates like this:
select src.key1, src.key2, count(*)
from table1 as tgt
inner join table2 as src on tgt.key1 = src.key1 and tgt.key2 = src.key2
group by src.key1, src.key2
having count(*) > 1
It returned no (duplicate) records.
What am I missing?

SQLSTATE 21506 means:
The same row of the target table was identified more than once for an
update, delete, or insert operation of the MERGE statement.
So you've got two rows in SRC with the same keys.
You're not finding them with this statement:
select src.key1, src.key2, count(*)
from table1 as tgt
inner join table2 as src on tgt.key1 = src.key1 and tgt.key2 = src.key2
group by src.key1, src.key2
having count(*) > 1
That's because of the inner join: the duplicate rows in table2 don't have a match in table1, so the join filters them out before they can be counted.
This should show them to you:
select src.key1, src.key2, count(*)
from table2 as src
group by src.key1, src.key2
having count(*) > 1
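To see why the joined check comes back empty while the source-only check does not, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for DB2. The sample rows and the reduced column list are assumptions made up for illustration; only the two duplicate-check queries come from the thread.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key1 INTEGER, key2 INTEGER, fld1 TEXT);
    CREATE TABLE table2 (key1 INTEGER, key2 INTEGER, fld1 TEXT);
    INSERT INTO table1 VALUES (1, 1, 'old');
    -- (2, 2) appears twice in the source and has no match in the target
    INSERT INTO table2 VALUES (1, 1, 'new'), (2, 2, 'a'), (2, 2, 'b');
""")

# The check from the question: the inner join drops unmatched source rows.
joined_check = conn.execute("""
    SELECT src.key1, src.key2, COUNT(*)
    FROM table1 AS tgt
    INNER JOIN table2 AS src ON tgt.key1 = src.key1 AND tgt.key2 = src.key2
    GROUP BY src.key1, src.key2
    HAVING COUNT(*) > 1
""").fetchall()

# The check from the answer: look at the source table on its own.
source_check = conn.execute("""
    SELECT src.key1, src.key2, COUNT(*)
    FROM table2 AS src
    GROUP BY src.key1, src.key2
    HAVING COUNT(*) > 1
""").fetchall()

print(joined_check)  # []
print(source_check)  # [(2, 2, 2)]
```

With this data, both (2, 2) rows would fall into the WHEN NOT MATCHED branch of the MERGE, which is the situation the answer describes.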

So the issue was in the update.
Instead of:
update set
(fld1, fld2, fld3) = (src.fld1, src.fld2, src.fld3)
I did it this way:
update set
fld1 = src.fld1,
fld2 = src.fld2,
fld3 = src.fld3
Without changing anything else, this worked. Why? I still don't know.

Related

Hive - Update From statement with a inline query

I have the following query and I wanted to run it in Hive but Hive does not support inline queries in update. Can anyone please help me with this update query in Hive?
UPDATE TABLE1 FROM
(SELECT COUNT(*) AS NEW_COUNT
FROM TABLE2
WHERE XTRCT_DT IN (SELECT MAX(XTRCT_DT) FROM TABLE3)) AS T
SET TBL = T.NEW_COUNT
WHERE XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE4) AND TN=1;
I am currently using a Hive version above 3.0.
I tried a MERGE statement for this update, but it didn't work. Can someone please help?
This is the MERGE statement I tried, but I got an error in the ON clause because of the IN:
MERGE INTO TABLE1 USING (
SELECT COUNT(*) AS NEW_COUNT FROM TABLE2 WHERE XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE3)) AS T
ON XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE4) AND TN=1
SET TBL=T.NEW_COUNT;

Your MERGE is missing a WHEN MATCHED THEN UPDATE clause and also does not have a join condition between the source subquery T and the target TABLE1. Move the filter date into the source subquery and use that to join:
MERGE INTO TABLE1
USING (SELECT COUNT(*) AS NEW_COUNT,
(SELECT MAX(XTRCT_DT) FROM TABLE4) AS MATCH_DT
FROM TABLE2
WHERE XTRCT_DT IN (SELECT MAX(XTRCT_DT) FROM TABLE3)
) AS T
ON XTRCT_DT = T.MATCH_DT AND TN=1
WHEN MATCHED THEN UPDATE
SET TBL=T.NEW_COUNT;
You could instead cross-join (JOIN with no ON clause) the subquery for MATCH_DT with TABLE2 rather than using a scalar subquery in the SELECT list.

unable to get a stable set of rows error

I am trying to perform a merge into a table (let's call it table1) from a table2. In the USING condition I need a third table (table3). This third table contains some IDs that I need in table1. A simplified version of my merge looks like:
MERGE INTO table1 a
USING (
SELECT ID, address
FROM table3 b
Where address IN
(
SELECT address
FROM table3
WHERE address IS NOT NULL
AND ID> 0
GROUP BY address
HAVING COUNT(*) = 1
)
) c
ON (a.address = c.address)
WHEN MATCHED THEN
UPDATE SET a.ID = c.ID
WHERE a.ID = 0
I know that the error I get is usually caused by the query in the USING clause, but theoretically this problem should be eliminated by the count(*)=1 condition.
I have duplicates in table2, but they should all get an ID from table3 or ID 0 if the address is duplicated in table3.
IDs are unique for an address, so they should be distinct.
P.S. This merge is performed automatically by a script, so I can modify the query to add more conditions/restrictions, but I cannot change the structure [meaning I have to use these 3 tables as they are].
I hope this makes sense.
Any ideas why this still does not work for me?

Try this:
MERGE INTO table1 a
USING (
SELECT max(ID) AS ID, address
FROM table3 b
WHERE address IS NOT NULL AND ID > 0
GROUP BY address
HAVING COUNT(*) = 1
) c
ON (a.address = c.address)
WHEN MATCHED THEN
UPDATE SET a.ID = c.ID
WHERE a.ID = 0;
You have the WHERE condition in the inner query but not in the outer query. If you want to keep your original query, try:
MERGE INTO table1 a
USING (
SELECT ID, address
FROM table3 b
WHERE address IN
(
SELECT address
FROM table3
WHERE address IS NOT NULL
AND ID> 0
GROUP BY address
HAVING COUNT(*) = 1
)
AND address IS NOT NULL
AND ID> 0
) c
ON (a.address = c.address)
WHEN MATCHED THEN
UPDATE SET a.ID = c.ID
WHERE a.ID = 0

The issue is more than likely due to the duplicate rows from table2. Here's a simple test case demonstrating the issue:
Setup:
CREATE TABLE t1 (ID INTEGER PRIMARY KEY,
val VARCHAR2(1));
CREATE TABLE t2 (ID INTEGER,
val VARCHAR2(1));
INSERT INTO t1 (ID, val) VALUES (1, 'A');
INSERT INTO t2 (ID, val) VALUES (1, 'B');
INSERT INTO t2 (ID, val) VALUES (1, 'B');
COMMIT;
Merge that will error:
MERGE INTO t1 USING t2
ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET t1.val = t2.val;
ORA-30926: unable to get a stable set of rows in the source tables
Merge that will succeed:
MERGE INTO t1 USING (SELECT DISTINCT id, val FROM t2) t2
ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET t1.val = t2.val;
N.B. The second merge will still fail if you have different values returned for val for the same id; that means you will have more than one row returned for a given id, and Oracle won't know which one to use to update the target table with.
In order to make sure your merge statement will work, you will need to ensure that you will return at most 1 row per address in the source subquery.
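The N.B. above can be checked before running the merge. The following is a rough sketch using Python's built-in sqlite3 module as a stand-in for Oracle; the extra rows for id 2 are made-up sample data. It shows that SELECT DISTINCT collapses exact duplicates but still leaves two rows for an id whose val differs, and that a GROUP BY on the key finds those ids.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER, val TEXT);
    -- id 1: exact duplicates; id 2: same key, different values
    INSERT INTO t2 VALUES (1, 'B'), (1, 'B'), (2, 'C'), (2, 'D');
""")

# DISTINCT removes the exact duplicate for id 1 but keeps both rows for id 2.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, val FROM t2 ORDER BY id, val"
).fetchall()

# Keys that would still hit the target row more than once after DISTINCT.
ambiguous_ids = conn.execute("""
    SELECT id
    FROM (SELECT DISTINCT id, val FROM t2)
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

print(distinct_rows)  # [(1, 'B'), (2, 'C'), (2, 'D')]
print(ambiguous_ids)  # [(2,)]
```

If the second query returns anything, the source subquery needs a rule for picking one row per key (an aggregate such as MAX, for example) before the merge can be expected to succeed.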

Updating a key on table from another table in Oracle

I am trying to update a key on a table (t1) when the key value is (abc) by getting the value from table (t2).
It is working as expected when I am limiting it to a specific person
update table_a t1
set t1.u_key = (select t2.u_key
from table_b t2
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and rownum<=1
and t2='NEVADA')
where t1.u_key = 'abc'
and t1.name_f='Lori'
and t1.name_l='U'
;
I initially tried it without rownum and got a "too many rows returned" error.
To run it on all rows with t1.u_key='abc', I removed the name filter and tried the following, which runs until it times out:
update table_a t1
set t1.u_key = (select t2.u_key
from table_b t2
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and rownum<=1
and t2='NEVADA')
where t1.u_key = 'abc'
;
Can you please look at it and suggest what I am missing?

You should first take a look at what is returned when you run the inner SELECT statement alone:
SELECT t2.u_key FROM table_b t2
WHERE t2.name_f IN (SELECT name_f FROM table_a WHERE u_key = 'abc')
AND t2.name_l IN (SELECT name_l FROM table_a WHERE u_key = 'abc')
AND t2='NEVADA'
Examine the results and you will see that more than one row is returned.
If there should be only one matching row per key, you would need to add the key to the inner SELECT as well, but I can't tell you what it should look like without additional table descriptions and possibly some sample entries from table_a and table_b.
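One way to examine this is to count the candidate keys per name. This is a small sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are invented, and the 'NEVADA' filter from the question is left out because its column name is not given.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name_f TEXT, name_l TEXT, u_key TEXT);
    CREATE TABLE table_b (name_f TEXT, name_l TEXT, u_key TEXT);
    INSERT INTO table_a VALUES ('Lori', 'U', 'abc'), ('Sam', 'K', 'abc');
    -- hypothetical data: two different keys exist for Lori U
    INSERT INTO table_b VALUES ('Lori', 'U', 'k1'), ('Lori', 'U', 'k2'), ('Sam', 'K', 'k3');
""")

# Names in table_a that have more than one distinct candidate u_key in table_b.
candidates = conn.execute("""
    SELECT t1.name_f, t1.name_l, COUNT(DISTINCT t2.u_key) AS candidate_keys
    FROM table_a t1
    JOIN table_b t2 ON t2.name_f = t1.name_f AND t2.name_l = t1.name_l
    WHERE t1.u_key = 'abc'
    GROUP BY t1.name_f, t1.name_l
    HAVING COUNT(DISTINCT t2.u_key) > 1
""").fetchall()

print(candidates)  # [('Lori', 'U', 2)]
```

Every name listed here is one where `rownum<=1` would be picking an arbitrary key, so these are the rows that need an extra matching condition.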

Use this:
update (
SELECT t2.u_key t2key,
t1.u_key t1key
FROM table_b t2,
table_a t1
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and t2='NEVADA'
and rownum<=1 )
SET t1key = t2key
where t1key = 'abc';

merge into table_a t1
using(
select name_f, name_l, max(u_key) as new_key
from table_b t2
where t2='NEVADA'
group by name_f, name_l
) t2
on (t1.name_f=t2.name_f and t1.name_l=t2.name_l)
when matched then
update set t1.u_key=t2.new_key
where t1.u_key='abc'

Error: ORA-30926: unable to get a stable set of rows in the source tables

I am getting the error : unable to get a stable set of rows in the source tables when I ran the below statement.
merge into table_1 c
using (select rep_nbr, T_nbr, SF from table_2) b
on (c.rep_id=b.rep_nbr)
when matched then
update set
c.T_ID =b.T_nbr,
c.SF=b.SF
when not matched then
insert(T_id, SF)
values(null, null);
When I put DISTINCT before rep_nbr:
merge into table_1 c
using (select distinct rep_nbr,t_nbr,SF from table_2) b
on (c.rep_id=b.rep_nbr)
when matched then
update set
c.T_ID =b.T_nbr,
c.SF=b.SF
when not matched then
insert(T_id, SF)
values(null, null);
I get ORA-01400: cannot insert NULL into c.rep_id.
I don't want to populate c.rep_id; I just need the rows to match and then update the records that matched.

If I understood correctly, you don't need MERGE, just an UPDATE. Something like this should work:
UPDATE table_1 t1
SET t1.t_id = (SELECT DISTINCT t2.t_nbr FROM table_2 t2 WHERE t1.rep_id = t2.rep_nbr),
t1.sf = (SELECT DISTINCT t2.sf FROM table_2 t2 WHERE t1.rep_id = t2.rep_nbr)
WHERE t1.rep_id IN (SELECT rep_nbr FROM table_2)
However, in later versions of Oracle (from 10g or 11g) you can leave out parts of the MERGE, for example the WHEN NOT MATCHED part.

Remove the "when not matched then" part:
merge into table_1 c
using (select distinct rep_nbr,t_nbr,SF from table_2) b
on (c.rep_id=b.rep_nbr)
when matched then
update set
c.T_ID =b.T_nbr,
c.SF=b.SF

How can I make this SQL update statement more efficient?

I am trying to add count, sum, and average values from one table to another, but I end up querying the same data for each value. I'm using PostgreSQL. I'm turning this over to the experts to learn how to make this update statement more efficient. Here it is:
update "table1" set
"col1" = (SELECT COUNT(*) FROM "table2" WHERE "table2Id" = "table1"."table1Id"),
"col2" = (SELECT AVG("someCol") FROM "table2" WHERE "table2Id" = "table1"."table1Id"),
"col3" = (SELECT SUM("someCol") FROM "table2" WHERE "table2Id" = "table1"."table1Id");
I should be able to run a subquery like this once and access the returned values for the update, correct?
SELECT COUNT(*), AVG("someCol"), SUM("someCol") FROM "table2" WHERE "table2Id" = "table1"."table1Id";
Any help is much appreciated.

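Try joining to a subquery that computes all three aggregates in one pass. In PostgreSQL the target table must not be repeated in the FROM clause, so the join condition goes in WHERE:
UPDATE table1 t1
SET col1 = t2.YourCount, col2 = t2.YourAverage, col3 = t2.YourSum
FROM (
SELECT table2Id, COUNT(*) AS YourCount, AVG(someCol) AS YourAverage,
SUM(someCol) AS YourSum
FROM table2
GROUP BY table2Id
) t2
WHERE t1.table1Id = t2.table2Id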

I believe in recent (9.1+) versions of PostgreSQL, it is possible to use a CTE for a cleaner-looking query.
WITH calculations AS
(SELECT table2ID, COUNT(*) AS n, SUM(someCol) AS s, AVG(someCol) AS a
FROM table2
GROUP BY table2ID)
UPDATE table1
SET col1=n, col2=a, col3=s
FROM calculations WHERE calculations.table2ID=table1.table1ID;
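The question also asks whether a single subquery can feed all three columns. A minimal sketch of that idea uses a row-value assignment, `SET (col1, col2, col3) = (SELECT ...)`. This is a different technique from the two answers above. It is run here through Python's built-in sqlite3 module as a stand-in; the same syntax should be available in PostgreSQL 9.5 and later, but that is not tested here. The sample rows are made up, and the identifiers are left unquoted for brevity.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (table1Id INTEGER PRIMARY KEY, col1 INTEGER, col2 REAL, col3 REAL);
    CREATE TABLE table2 (table2Id INTEGER, someCol REAL);
    INSERT INTO table1 (table1Id) VALUES (1), (2), (3);
    -- id 3 deliberately has no rows in table2
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 5);
""")

# One correlated subquery per target row supplies count, average and sum together.
conn.execute("""
    UPDATE table1
    SET (col1, col2, col3) = (
        SELECT COUNT(*), AVG(someCol), SUM(someCol)
        FROM table2
        WHERE table2Id = table1.table1Id
    )
""")

rows = conn.execute(
    "SELECT table1Id, col1, col2, col3 FROM table1 ORDER BY table1Id"
).fetchall()
for row in rows:
    print(row)
# (1, 2, 15.0, 30.0)
# (2, 1, 5.0, 5.0)
# (3, 0, None, None)
```

Note the difference for id 3: like the original three-subquery UPDATE, this form sets a count of 0 and NULL average and sum for rows without a match, whereas the join-based and CTE-based versions leave unmatched rows untouched.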
