Merge into (in SQL), but ignore the duplicates

I am trying to merge two tables in Snowflake with:
ON CONCAT(tab1.column1, tab1.column2) = CONCAT(tab2.column1, tab2.column2)
The problem is that there are duplicates, that is, rows where column1 and column2 in table2 are identical and the only difference is the timestamp column. I see two options: either ignore the duplicates and take only one row (the one with the latest timestamp), or distinguish the rows again based on the timestamp. The second would be nicer.
But I have no clue how to do it.
Example:
Table1:
Col1 Col2 Col3 Timestamp
24 10 3 05.05.2022
34 19 2 04.05.2022
24 10 4 06.05.2022
Table2:
Col1 Col2 Col3
24 10 Null
34 19 Null
What I want to do:
MERGE INTO table1 AS dest USING
(SELECT * FROM table2) AS src
ON CONCAT(dest.col1, dest.col2) = CONCAT(src.col1, src.col2)
WHEN MATCHED THEN UPDATE
SET dest.col3 = src.col3

It looks like you want to update TABLE2 from TABLE1, not the other way around, because in your example the table you use as the source (TABLE2) contains no duplicates; the duplicates are in TABLE1.
It also looks like you want two equi-join conditions, on col1 AND col2, rather than concatenating the columns together.
So, given how I read your data and the words you used, I think you should do this:
create or replace table table1(Col1 number, Col2 number, Col3 number, timestamp date);
insert into table1 values
(24, 10, 3, '2022-05-05'::date),
(34, 19, 2, '2022-05-04'::date),
(24, 10, 4, '2022-05-06'::date);
create or replace table table2(Col1 number, Col2 number, Col3 number);
insert into table2 values
(24, 10 ,Null),
(34, 19 ,Null);
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON d.col1 = s.col1 AND d.col2 = s.col2
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
which runs fine:
number of rows updated
2
select * from table2;
shows that it has been updated:
COL1 COL2 COL3
24 10 4
34 19 2
The join written your way, on the concatenated columns, also works, if that is really correct for your application, although it feels very wrong to me:
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON concat(d.col1, d.col2) = concat(s.col1, s.col2)
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
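As a rough, self-contained check of the approach above, here is a sketch that uses Python's sqlite3 module as a stand-in for Snowflake. That is an assumption made only so the example can be run anywhere: SQLite has no MERGE or QUALIFY, so ROW_NUMBER() in a subquery plus a correlated UPDATE takes their place, and the timestamp column is renamed ts. The last query hints at why joining on concatenated columns can be risky: different value pairs can concatenate to the same string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, ts TEXT);
    INSERT INTO table1 VALUES
        (24, 10, 3, '2022-05-05'),
        (34, 19, 2, '2022-05-04'),
        (24, 10, 4, '2022-05-06');
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO table2 VALUES (24, 10, NULL), (34, 19, NULL);
""")

# Keep only the newest table1 row per (col1, col2), then copy its col3
# into the matching table2 row.
conn.execute("""
    WITH latest AS (
        SELECT col1, col2, col3
        FROM (
            SELECT col1, col2, col3,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2 ORDER BY ts DESC
                   ) AS rn
            FROM table1
        )
        WHERE rn = 1
    )
    UPDATE table2
    SET col3 = (SELECT l.col3 FROM latest l
                WHERE l.col1 = table2.col1 AND l.col2 = table2.col2)
    WHERE EXISTS (SELECT 1 FROM latest l
                  WHERE l.col1 = table2.col1 AND l.col2 = table2.col2)
""")

rows = conn.execute(
    "SELECT col1, col2, col3 FROM table2 ORDER BY col1"
).fetchall()
print(rows)

# ('1', '23') and ('12', '3') are different pairs but concatenate identically.
collision = conn.execute("SELECT ('1' || '23') = ('12' || '3')").fetchone()[0]
print(collision)
```

Comparing the two columns separately, as in the first MERGE, avoids that kind of accidental match.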

This is it:
WITH CTE AS
(
SELECT *,
RANK() OVER (PARTITION BY col1,col2
ORDER BY Timestamp desc) AS rn
FROM table1
)
UPDATE CTE
SET col3 = (select col3 from table2 where CONCAT(table2.col1,table2.col2) = CONCAT(CTE.col1, CTE.col2))
where CTE.rn =1;

Related

Match columns 1 if data not found then search column 2 oracle query

I am trying to find a way to search on another column's value when no data is found based on col1 of a table:
SELECT * FROM TABLE
WHERE COL1='123'
IF NULL
THEN
SELECT * FROM TABLE
WHERE COL2='ABC';
Thanks
This is a typical SQL select statement involving an OR expression.
SELECT * from TABLE WHERE Col1 = '123' or Col2 = 'ABC';
You want all rows that satisfy the first condition - but if no row matches, then you want all rows that satisfy the second condition.
I would address this with a row-limiting clause (available starting with version 12c):
select *
from mytable
where 'ABC' in (col1, col2)
order by rank() over(order by case when col1 = 'ABC' then 1 else 2 end)
fetch first 1 row with ties
This is more efficient than union all because it does not require two scans on the table.
You can use exists with union all :
select t.*
from table t
where col1 = '123'
union all
select t.*
from table t
where col2 = 'ABC' and
not exists (select 1 from table t1 where t1.col1 = '123');
If you are expecting only one row, you can use:
SELECT t.*
FROM TABLE t
WHERE COL1 = '123' OR COL2 = 'ABC'
ORDER BY (CASE WHEN COL1 = '123' THEN 1 ELSE 2 END)
FETCH FIRST 1 ROW ONLY;
With multiple possible rows in the result set, I would go for:
SELECT t.*
FROM TABLE t
WHERE COL1 = '123' OR
(COL2 = 'ABC' AND
NOT EXISTS (SELECT 1 FROM TABLE t2 WHERE t2.COL1 = '123'));
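To make the difference between a plain OR and the "fall back only if nothing matched" logic concrete, here is a small sketch run against SQLite through Python's sqlite3 module instead of Oracle (an assumption for the sake of a runnable example). The table name items and its rows are made up for illustration.

```python
import sqlite3

# Rows matching col1, or rows matching col2 only when no row matches col1.
FALLBACK_QUERY = """
    SELECT id, col1, col2
    FROM items
    WHERE col1 = :c1
       OR (col2 = :c2
           AND NOT EXISTS (SELECT 1 FROM items WHERE col1 = :c1))
    ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO items VALUES
        (1, '123', 'XYZ'),
        (2, '456', 'ABC'),
        (3, '789', 'ABC');
""")

# col1 = '123' exists, so the col2 condition is ignored.
primary_hit = conn.execute(FALLBACK_QUERY, {"c1": "123", "c2": "ABC"}).fetchall()

# No row has col1 = '000', so the search falls back to col2 = 'ABC'.
fallback_hit = conn.execute(FALLBACK_QUERY, {"c1": "000", "c2": "ABC"}).fetchall()

# A plain OR returns rows matching either condition, with no fallback.
plain_or = conn.execute(
    "SELECT id FROM items WHERE col1 = :c1 OR col2 = :c2 ORDER BY id",
    {"c1": "123", "c2": "ABC"},
).fetchall()

print(primary_hit, fallback_hit, plain_or)
```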

Oracle SQL How to find duplicate values in different columns?

I have a set of rows with many columns. For example,
ID | Col1 | Col2 | Col3 | Duplicate
------------------------------------
81 | 101 | 102 | 101 | YES
82 | 101 | 103 | 104 | NO
I need to calculate the "Duplicate" column. Row 81 is a duplicate because it has the same value in Col1 and Col3. I know there is the LEAST function, which is similar to the MIN function but works across columns. Does something similar exist to achieve this?
The approach I have in mind is to write all possible combinations in a case like this:
SELECT ID, col1, col2, col3,
CASE WHEN col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
FROM table
But I wish to avoid that, since in some cases I have too many columns, and it is very error-prone.
What is the best way to solve this?
Hmmm. You are looking for within-row duplicates. This is painful. More recent versions of Oracle support lateral joins. But for just a handful of non-NULL columns, you can do:
select id, col1, col2, col3,
(case when col1 in (col2, col3) or col2 in (col3) then 1 else 0 end) as Duplicate
from t;
For each additional column, you need to add one more in comparison and update the other in-lists.
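If the number of columns makes hand-writing these comparisons error-prone, one option is to generate the predicate. The sketch below is only an illustration: the helper name duplicate_predicate is made up, the column names are hard-coded (never build SQL from untrusted identifiers), and SQLite via Python's sqlite3 stands in for Oracle.

```python
import sqlite3
from itertools import combinations


def duplicate_predicate(columns):
    """Build 'a = b OR a = c OR ...' covering every pair of columns."""
    return " OR ".join(f"{a} = {b}" for a, b in combinations(columns, 2))


columns = ["col1", "col2", "col3"]  # trusted, hard-coded identifiers
predicate = duplicate_predicate(columns)
print(predicate)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO t VALUES (81, 101, 102, 101), (82, 101, 103, 104);
""")

# Same CASE expression as above, with the generated pairwise comparisons.
flags = conn.execute(
    f"SELECT id, CASE WHEN {predicate} THEN 1 ELSE 0 END AS duplicate "
    "FROM t ORDER BY id"
).fetchall()
print(flags)
```

With n columns this produces n(n-1)/2 comparisons, so it stays practical only for a moderate number of columns, and like the hand-written version it assumes the columns are non-NULL.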
Something like this... note that in the lateral clause we still need to unpivot, but that is one row at a time - resulting in possibly much faster execution than simple unpivot and standard aggregation.
with
input_data ( id, col1, col2, col3 ) as (
select 81, 101, 102, 101 from dual union all
select 82, 101, 103, 104 from dual
)
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select i.id, i.col1, i.col2, i.col3, l.duplicates
from input_data i,
lateral ( select case when count (distinct val) = count(val)
then 'NO' else 'YES'
end as duplicates
from input_data
unpivot ( val for col in ( col1, col2, col3 ) )
where id = i.id
) l
;
ID COL1 COL2 COL3 DUPLICATES
-- ---- ---- ---- ----------
81 101 102 101 YES
82 101 103 104 NO
You can do this by unpivoting and then counting the distinct values per id and checking whether that equals the number of rows for that id. Equal means there are no duplicates. Then left join this result to the original table to calculate the duplicate column.
SELECT t.*,
CASE WHEN x.id IS NOT NULL THEN 'Yes' ELSE 'No' END AS duplicate
FROM t
LEFT JOIN
(SELECT id
FROM
(SELECT *
FROM t
unpivot (val FOR col IN (col1,col2,col3)) u
) t
GROUP BY id
HAVING count(*)<>count(DISTINCT val)
) x ON x.id=t.id
The best way† is to avoid storing repeating groups of columns. If you have multiple columns that essentially store comparable data (i.e. a multi-valued attribute), move the data to a dependent table, and use one column.
CREATE TABLE child (
ref_id INT,
col INT
);
INSERT INTO child VALUES
(81, 101), (81, 102), (81, 101),
(82, 101), (82, 103), (82, 104);
Then it's easier to find cases where a value occurs more than once:
SELECT ref_id, col, COUNT(*)
FROM child
GROUP BY ref_id, col
HAVING COUNT(*) > 1;
If you can't change the structure of the table, you could simulate it using UNIONs:
SELECT id, col, COUNT(*)
FROM (
SELECT id, col1 AS col FROM mytable
UNION ALL SELECT id, col2 FROM mytable
UNION ALL SELECT id, col3 FROM mytable
... for more columns ...
) t
GROUP BY id, col
HAVING COUNT(*) > 1;
† Best for the query you are trying to run. A denormalized storage strategy might be better for some other types of queries.
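For reference, the UNION ALL simulation can be combined with the count comparison to produce the YES/NO column from the question. This is a minimal sketch run on SQLite through Python's sqlite3 rather than Oracle (an assumption, chosen so it runs without a server); the table name mytable follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO mytable VALUES (81, 101, 102, 101), (82, 101, 103, 104);
""")

# Unpivot with UNION ALL, then flag ids whose value count differs from
# their distinct value count (meaning some value repeats within the row).
result = conn.execute("""
    SELECT t.id,
           CASE WHEN d.id IS NULL THEN 'NO' ELSE 'YES' END AS duplicate
    FROM mytable t
    LEFT JOIN (
        SELECT id
        FROM (
            SELECT id, col1 AS val FROM mytable
            UNION ALL SELECT id, col2 FROM mytable
            UNION ALL SELECT id, col3 FROM mytable
        ) u
        GROUP BY id
        HAVING COUNT(*) <> COUNT(DISTINCT val)
    ) d ON d.id = t.id
    ORDER BY t.id
""").fetchall()
print(result)
```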
SELECT ID, col1, col2,
NVL2(NULLIF(col1, col2), 'Not duplicate', 'Duplicate')
FROM table;
If you want to compare more than two columns, you can implement the same logic with COALESCE.
I think you want fresh data that does not contain any duplicate values in the table. If that is right, use a SELECT DISTINCT statement like
SELECT DISTINCT * FROM TABLE_NAME
The result will contain duplicate-free data.
Note: this also applies to a particular column, like
SELECT DISTINCT col1 FROM TABLE_NAME

Identifying rows for deletion/update based on criteria from matching rows

I have a data set that contains rows considered duplicates based on certain fields. I need to match the duplicate rows, evaluate the non-matching fields, and flag one of them for deletion. A sample table is:
ID Col1 Col2 Col3
1 A B CC
2 A B DD
3 E F GG
4 E F HH
So I need to identify rows 1 & 2 as duplicates based on Col1 and Col2 matching, and compare the Col3 fields, ultimately flagging either row 1 or 2 for deletion. And the same for rows 3 & 4. This table consists entirely of rows that match at least one other row across Col1 and Col2.
My first thought was to join onto itself to flatten the rows into this format:
t1.ID t2.ID t1.Col1 t1.Col2 TableOneCol3 TableTwoCol3
1 2 A B CC DD
3 4 E F GG HH
Then it would be simple to evaluate TableOneCol3 and TableTwoCol3 for each row.
I tried to do this with a self join:
select t1.ID as TableOneID, t2.ID as TableTwoID, t1.Col1, t1.Col2, t1.Col3 as TableOneCol3, t2.Col3 as TableTwoCol3
into #temptable
from tableOne t1
join tableOne t2
on t1.Col1 = t2.Col1
and t1.Col2 = t2.Col2
and t1.ID <> t2.ID
But of course this doesn't remove duplicates - just adds the duplicate field information to each row.
I went down the path of pivoting the data, but I end up with a similar result: I pivot the duplicates as well.
I dug through SO but not sure if I have the specific words for what I need to do (the admittedly vague title might be a giveaway - apologies for that). I found many examples of flattening data into single columns and pivots, but nothing that would flatten paired rows and remove one of them from the resultset.
Not sure if I'm going down the wrong road for this or not. It seems I need to evaluate each row in the context of what has been evaluated prior - but I'm not sure how to do this without resorting to a cursor.
It is extremely unclear what you are trying to do. I tossed together a couple of quick ideas that might be what you are trying to do.
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something
(
ID int
, Col1 char(1)
, Col2 char(1)
, Col3 char(2)
)
insert #Something
(
ID
, Col1
, Col2
, Col3
)
VALUES
(1, 'A', 'B', 'CC'),
(2, 'A', 'B', 'DD'),
(3, 'E', 'F', 'GG'),
(4, 'E', 'F', 'HH');
with SortedResults as
(
select *
, ROW_NUMBER() over(partition by Col1, Col2 order by Col3) as RowNum
from #Something
)
delete SortedResults
where RowNum > 1
select *
from #Something;
--OR maybe you want to cross tab the data???
drop table #Something
GO
create table #Something
(
ID int
, Col1 char(1)
, Col2 char(1)
, Col3 char(2)
)
insert #Something
(
ID
, Col1
, Col2
, Col3
)
VALUES
(1, 'A', 'B', 'CC'),
(2, 'A', 'B', 'DD'),
(3, 'E', 'F', 'GG'),
(4, 'E', 'F', 'HH');
with SortedResults as
(
select *
, ROW_NUMBER() over(partition by Col1, Col2 order by Col3) as RowNum
from #Something
)
select
MAX(case when RowNum = 1 then ID end) as ID_1
, MAX(case when RowNum = 2 then ID end) as ID_2
, Col1
, Col2
, MAX(case when RowNum = 1 then Col3 end) as Col3_1
, MAX(case when RowNum = 2 then Col3 end) as Col3_2
from SortedResults
group by
Col1
, Col2
You could obtain a table in a form similar to the one you describe with use of the LEAD() analytic function. This will have the benefit that it works reasonably well when your dupes come in groups larger than two. For example:
select
ID,
lead(ID) over (partition by col1, col2 order by col3) as nextId,
Col1,
Col2,
Col3,
lead(Col3) over (partition by col1, col2 order by col3) as nextCol3
into #temptable
from tableOne
Results would be of the form
ID nextId Col1 Col2 Col3 nextCol3
1 2 A B CC DD
2 NULL A B DD NULL
3 4 E F GG HH
4 NULL E F HH NULL
If you are confident that you don't need to deal with groups larger than two then you could get the exact table you wanted by afterward filtering out, say, the rows having nextId IS NULL.
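A runnable sketch of that pairing and the follow-up filter is shown below. It uses SQLite through Python's sqlite3 instead of SQL Server (an assumption; it needs an SQLite build with window functions, 3.25 or later), and the table name table_one is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table_one VALUES
        (1, 'A', 'B', 'CC'),
        (2, 'A', 'B', 'DD'),
        (3, 'E', 'F', 'GG'),
        (4, 'E', 'F', 'HH');
""")

# Pair each row with the next row of its (col1, col2) group, then drop
# the last row of each group, which has no partner.
pairs = conn.execute("""
    SELECT id, next_id, col1, col2, col3, next_col3
    FROM (
        SELECT id,
               LEAD(id)   OVER (PARTITION BY col1, col2 ORDER BY col3) AS next_id,
               col1,
               col2,
               col3,
               LEAD(col3) OVER (PARTITION BY col1, col2 ORDER BY col3) AS next_col3
        FROM table_one
    )
    WHERE next_id IS NOT NULL
    ORDER BY id
""").fetchall()
print(pairs)
```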

Writing SQL Query to query and update same table

I am not very familiar with SQL, and I need to write a SQL update/delete query on a table. The requirement is as below.
Table A:
Col1(PK) Col2 Col3 Quantity
1 Code1 Value1 5
2 Code2 Value2 2
3 Code1 Value1 3
4 Code3 Value3 8
Consider the above table, in which there are multiple rows with the same values for Col2 and Col3. I want to write a query that deletes the duplicate combinations of Col2 and Col3 and sums the Quantity into the remaining record.
The result should be like this:
Col1(PK) Col2 Col3 Quantity
1 Code1 Value1 8
2 Code2 Value2 2
4 Code3 Value3 8
You will need to do this in two parts, and if you want to ensure the integrity of the data, the two parts should be wrapped in a transaction.
The first part updates the required rows' Quantity, the second deletes the now duplicated rows.
BEGIN TRANSACTION
UPDATE TableA
SET Quantity=upd.Quantity
FROM TableA a
INNER JOIN (
SELECT MIN(Col1) AS Col1, SUM(Quantity) AS Quantity
FROM TableA
GROUP BY Col2, Col3
) upd
ON a.Col1 = upd.col1
;WITH DELETES
AS
(
SELECT Col1,ROW_NUMBER() OVER (PARTITION BY Col2,Col3 ORDER BY Col1) rn
FROM TableA
)
DELETE FROM TableA WHERE Col1 IN (
SELECT Col1 FROM DELETES WHERE rn>1
)
COMMIT TRANSACTION
Live example: http://www.sqlfiddle.com/#!3/9efa9/7
(EDIT: Updated to fix issue noted in comments)
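The same two-step idea (sum into the surviving row, then delete the rest, all in one transaction) can be tried out with the sketch below. It is not the T-SQL above: it uses SQLite through Python's sqlite3 as an assumed stand-in, computes the totals in Python to keep the UPDATE simple, and the table name table_a is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (
        col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT, quantity INTEGER
    );
    INSERT INTO table_a VALUES
        (1, 'Code1', 'Value1', 5),
        (2, 'Code2', 'Value2', 2),
        (3, 'Code1', 'Value1', 3),
        (4, 'Code3', 'Value3', 8);
""")

with conn:  # commits on success, rolls back if an exception is raised
    # Step 1: total per (col2, col3) group, keyed by the row that survives.
    totals = conn.execute(
        "SELECT MIN(col1), SUM(quantity) FROM table_a GROUP BY col2, col3"
    ).fetchall()
    conn.executemany(
        "UPDATE table_a SET quantity = ? WHERE col1 = ?",
        [(total, keep_id) for keep_id, total in totals],
    )
    # Step 2: delete every row that is not the lowest col1 of its group.
    conn.execute(
        "DELETE FROM table_a WHERE col1 NOT IN "
        "(SELECT MIN(col1) FROM table_a GROUP BY col2, col3)"
    )

result = conn.execute(
    "SELECT col1, col2, col3, quantity FROM table_a ORDER BY col1"
).fetchall()
print(result)
```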
Use this:
select *
from (select *, sum(Quantity) over (partition by col2,col3) as TotalQuantity
from tableA
) t
One option would be to just SELECT the result set you want into a new table, and then drop the previous table:
CREATE TABLE A_new(Col1 INT PRIMARY KEY,
Col2 varchar(255),
Col3 varchar(255),
Quantity INT);
INSERT INTO A_new (Col1, Col2, Col3, Quantity)
SELECT MIN(Col1) AS Col1, Col2, Col3, SUM(Quantity) AS Quantity
FROM A
GROUP BY Col2, Col3
Next you can drop table A and rename A_new to A:
DROP TABLE A;
EXEC sp_rename 'A_new', 'A';
For the first update step (assuming that Table A is named Table_1):
Update Table_1 set Quantity = t.total
from Table_1 As p inner join
(select Min(Col1) as Col1,SUM(quantity) as total from Table_1 group by Col2,Col3) as t
on p.Col1=t.Col1
This updates the first row (lowest Col1) of each Col2/Col3 group with the group's summed Quantity.
Then you can delete the remaining rows that share the same Col2 and Col3 values with:
WITH CTE AS(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY Col2, Col3 ORDER BY Col1)
FROM Table_1
)
DELETE FROM CTE WHERE RN > 1;
Sorry, I thought that Col2 would always be the same as Col3. I have edited my statements. If a group has more than one row, this deletes all of them except the first row.
Please execute the below query:
SELECT
Col2,
Col3,
SUM(Quantity)
FROM table_1
GROUP BY
Col2,
Col3

How to pivot rows without grouping, counting, averaging

I am reworking some tables from a screwed up database. A few of the tables had the same data with different table names, and each one of them also had similar data but different column names. Anyway, this is a weird request but this has to be done like this.
I need to pivot rows up to simulate one row so I can create one record from two different tables.
I have attached a photo. The table on the left will pull a single row and the table on the right will supply 1 - n rows based on the id from the left table. I need to pivot the rows up to simulate one row and create one record with the two results.
From my checking online the pivot seems to be the way to go but it seems to want me to group or do some type of aggregating.
What is the best way to go about doing this?
table1 ---Produces one row
table1id | col1 | col2 | col3
1 Wow Wee Zee
table2 ---Produces 1 - n rows
table2id | table1id | col1 | col2 | col3
1 1 sock cloth sup
2 1 bal baa zak
3 1 x y fooZ
needs to look like this (the below is not column names, they're the result set)
Wow,Wee,Zee,sock,cloth,sup,bal,baa,zak,x,y,fooZ
If using MySQL:
SELECT a.table1id, GROUP_CONCAT(a.col) AS col_values
FROM
(
SELECT table1id, col1 col FROM table1 UNION ALL
SELECT table1id, col2 FROM table1 UNION ALL
SELECT table1id, col3 FROM table1 UNION ALL
SELECT table1id, col1 FROM table2 UNION ALL
SELECT table1id, col2 FROM table2 UNION ALL
SELECT table1id, col3 FROM table2
) a
GROUP BY a.table1id
If using SQL-Server:
SELECT a.table1id, b.colnames
FROM table1 a
CROSS APPLY
(
SELECT STUFF((
SELECT ',' + aa.col
FROM
(
SELECT table1id, col1 col FROM table1 UNION ALL
SELECT table1id, col2 FROM table1 UNION ALL
SELECT table1id, col3 FROM table1 UNION ALL
SELECT table1id, col1 FROM table2 UNION ALL
SELECT table1id, col2 FROM table2 UNION ALL
SELECT table1id, col3 FROM table2
) aa
WHERE aa.table1id = a.table1id
FOR XML PATH('')
), 1, 1, '') AS colnames
) b
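SQLite also has GROUP_CONCAT, so the MySQL variant can be tried locally with the sketch below, using Python's sqlite3 (an assumption; it is not one of the engines named above). Note that none of these queries specifies an order for the concatenated values, so the sketch only checks which values are present, not their sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (table1id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES (1, 'Wow', 'Wee', 'Zee');
    CREATE TABLE table2 (
        table2id INTEGER, table1id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT
    );
    INSERT INTO table2 VALUES
        (1, 1, 'sock', 'cloth', 'sup'),
        (2, 1, 'bal', 'baa', 'zak'),
        (3, 1, 'x', 'y', 'fooZ');
""")

# Stack every column of both tables into one column, then concatenate
# the stacked values per table1id (comma is the default separator).
row = conn.execute("""
    SELECT a.table1id, GROUP_CONCAT(a.col) AS col_values
    FROM (
        SELECT table1id, col1 AS col FROM table1 UNION ALL
        SELECT table1id, col2 FROM table1 UNION ALL
        SELECT table1id, col3 FROM table1 UNION ALL
        SELECT table1id, col1 FROM table2 UNION ALL
        SELECT table1id, col2 FROM table2 UNION ALL
        SELECT table1id, col3 FROM table2
    ) a
    GROUP BY a.table1id
""").fetchone()

values = row[1].split(",")
print(row[0], values)
```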