Writing SQL Query to query and update same table - sql

I am not very familiar with SQL, and I need to write a SQL update/delete query on a table. The requirement is as follows.
Table A:
Col1(PK) Col2 Col3 Quantity
1 Code1 Value1 5
2 Code2 Value2 2
3 Code1 Value1 3
4 Code3 Value3 8
In the table above, multiple rows have the same values for Col2 and Col3. I want to write a query that deletes the duplicate combinations of Col2 and Col3 and stores the summed Quantity in the remaining record.
The result should be like this:
Col1(PK) Col2 Col3 Quantity
1 Code1 Value1 8
2 Code2 Value2 2
4 Code3 Value3 8

You will need to do this in two parts, and if you want to ensure the integrity of the data, the two parts should be wrapped in a transaction.
The first part updates the required rows' Quantity, the second deletes the now duplicated rows.
BEGIN TRANSACTION;
-- Step 1: store each group's total on its lowest-Col1 row
UPDATE a
SET Quantity = upd.Quantity
FROM TableA a
INNER JOIN (
    SELECT MIN(Col1) AS Col1, SUM(Quantity) AS Quantity
    FROM TableA
    GROUP BY Col2, Col3
) upd
    ON a.Col1 = upd.Col1;
-- Step 2: delete every row except the first of each group
WITH DELETES AS (
    SELECT Col1, ROW_NUMBER() OVER (PARTITION BY Col2, Col3 ORDER BY Col1) rn
    FROM TableA
)
DELETE FROM TableA WHERE Col1 IN (
    SELECT Col1 FROM DELETES WHERE rn > 1
);
COMMIT TRANSACTION;
Live example: http://www.sqlfiddle.com/#!3/9efa9/7
(EDIT: Updated to fix issue noted in comments)
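The two-step idea can also be tried without a SQL Server instance. The following is a minimal sketch that assumes an in-memory SQLite database driven from Python; the `merge_duplicates` helper is hypothetical, and the UPDATE uses a correlated subquery rather than T-SQL's `UPDATE ... FROM`.

```python
import sqlite3

def merge_duplicates(conn):
    """Sum Quantity into the lowest-Col1 row of each (Col2, Col3) group,
    then delete the other rows of the group (hypothetical helper)."""
    with conn:  # both statements commit together, or roll back on error
        conn.execute("""
            UPDATE TableA
            SET Quantity = (SELECT SUM(b.Quantity) FROM TableA AS b
                            WHERE b.Col2 = TableA.Col2 AND b.Col3 = TableA.Col3)
            WHERE Col1 IN (SELECT MIN(Col1) FROM TableA GROUP BY Col2, Col3)
        """)
        conn.execute("""
            DELETE FROM TableA
            WHERE Col1 NOT IN (SELECT MIN(Col1) FROM TableA GROUP BY Col2, Col3)
        """)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (Col1 INTEGER PRIMARY KEY, Col2 TEXT, Col3 TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [(1, "Code1", "Value1", 5), (2, "Code2", "Value2", 2),
     (3, "Code1", "Value1", 3), (4, "Code3", "Value3", 8)],
)
conn.commit()

merge_duplicates(conn)
rows = conn.execute("SELECT * FROM TableA ORDER BY Col1").fetchall()
print(rows)
```

With the sample data from the question, row 3 is removed and row 1 ends up with a Quantity of 8.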

Use this:
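select Col1, Col2, Col3, TotalQuantity as Quantity
from (select Col1, Col2, Col3,
sum(Quantity) over (partition by Col2, Col3) as TotalQuantity,
row_number() over (partition by Col2, Col3 order by Col1) as rn
from tableA
) t
where rn = 1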

One option would be to just SELECT the result set you want into a new table, and then drop the previous table:
CREATE TABLE A_new(Col1 INT PRIMARY KEY,
Col2 varchar(255),
Col3 varchar(255),
Quantity INT);
INSERT INTO A_new (Col1, Col2, Col3, Quantity)
SELECT MIN(Col1) AS Col1, Col2, Col3, SUM(Quantity) AS Quantity
FROM A
GROUP BY Col2, Col3;
Next you can drop table A and rename A_new to A:
DROP TABLE A;
EXEC sp_rename 'A_new', 'A';
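As a rough illustration of the copy-and-swap approach, here is a sketch that assumes SQLite (via Python's `sqlite3`) in place of SQL Server, so the rename is done with `ALTER TABLE ... RENAME TO` instead of `sp_rename`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Col1 INTEGER PRIMARY KEY, Col2 TEXT, Col3 TEXT, Quantity INTEGER);
    INSERT INTO A VALUES (1, 'Code1', 'Value1', 5), (2, 'Code2', 'Value2', 2),
                         (3, 'Code1', 'Value1', 3), (4, 'Code3', 'Value3', 8);

    -- Build the deduplicated copy, then swap it in under the old name.
    CREATE TABLE A_new (Col1 INTEGER PRIMARY KEY, Col2 TEXT, Col3 TEXT, Quantity INTEGER);
    INSERT INTO A_new (Col1, Col2, Col3, Quantity)
        SELECT MIN(Col1), Col2, Col3, SUM(Quantity)
        FROM A
        GROUP BY Col2, Col3;
    DROP TABLE A;
    ALTER TABLE A_new RENAME TO A;
""")

swapped = conn.execute("SELECT * FROM A ORDER BY Col1").fetchall()
print(swapped)
```

Note that dropping the original table also drops anything attached to it (indexes, permissions, and so on), so this variant suits simple tables best.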

For the first step, the update (assuming Table A is named Table_1):
UPDATE p SET Quantity = t.total
FROM Table_1 AS p INNER JOIN
(SELECT MIN(Col1) AS Col1, SUM(Quantity) AS total FROM Table_1 GROUP BY Col2, Col3) AS t
ON p.Col1 = t.Col1;
This sets Quantity on the first row (lowest Col1) of each Col2/Col3 group to the group's total.
Then you can delete the remaining rows of each group with:
WITH CTE AS (
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY Col2, Col3 ORDER BY Col1)
FROM Table_1
)
DELETE FROM CTE WHERE RN > 1;
Sorry, I originally thought that Col2 would always be the same as Col3, so I have edited my statements. For each group with more than one row, this deletes every row except the first.

Please execute the below query:
SELECT
Col2,
Col3,
SUM(Quantity) AS Quantity
FROM table_1
GROUP BY
Col2,
Col3

Related

Merge into (in SQL), but ignore the duplicates

I am trying to merge two tables in Snowflake with:
ON CONCAT(tab1.column1, tab1.column2) = CONCAT(tab2.column1, tab2.column2)
The problem is that there are duplicates, that is, rows where column1 and column2 in table2 are identical and the only difference is the timestamp column. So I would like one of two options: either ignore the duplicates and take only one row (the one with the latest timestamp), or distinguish between the rows based on the timestamp. The second would be nicer.
But I have no clue how to do it.
Example:
Table1:
Col1 Col2 Col3 Timestamp
24 10 3 05.05.2022
34 19 2 04.05.2022
24 10 4 06.05.2022
Table2:
Col1 Col2 Col3
24 10 Null
34 19 Null
What I want to do:
MERGE INTO table1 AS dest USING
(SELECT * FROM table2) AS src
ON CONCAT(dest.col1, dest.col2) = CONCAT(src.col1, src.col2)
WHEN MATCHED THEN UPDATE
SET dest.col3 = src.col3
It feels like you want to update TABLE2 from TABLE1, not the other way around, because in your example as written the source has no duplicates.
It also feels like you want two equi-join conditions, on col1 AND col2, rather than concatenating them together.
So, based on how I read your data and the words you used, I think you should do this:
create or replace table table1(Col1 number, Col2 number, Col3 number, timestamp date);
insert into table1 values
(24, 10, 3, '2022-05-05'::date),
(34, 19, 2, '2022-05-04'::date),
(24, 10, 4, '2022-05-06'::date);
create or replace table table2(Col1 number, Col2 number, Col3 number);
insert into table2 values
(24, 10 ,Null),
(34, 19 ,Null);
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON d.col1 = s.col1 AND d.col2 = s.col2
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
which runs fine:
number of rows updated
2
select * from table2;
shows that it has been updated:
COL1 COL2 COL3
24 10 4
34 19 2
The join written your way, with CONCAT, also works if that is correct for your application, although it feels very wrong to me:
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON concat(d.col1, d.col2) = concat(s.col1, s.col2)
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
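To experiment with the "newest row per key" idea outside Snowflake, here is a minimal sketch that assumes SQLite 3.25 or later (for window functions) driven from Python. SQLite has neither `QUALIFY` nor `MERGE`, so a filtered subquery and a correlated `UPDATE` stand in for them, and the `ts` column name is an assumption replacing `timestamp`. The last query illustrates one reason to prefer column-wise join conditions: concatenated keys can collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, ts TEXT);
    INSERT INTO table1 VALUES (24, 10, 3, '2022-05-05'),
                              (34, 19, 2, '2022-05-04'),
                              (24, 10, 4, '2022-05-06');
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO table2 VALUES (24, 10, NULL), (34, 19, NULL);
""")

# Newest table1 row per (col1, col2); the outer filter plays the role of QUALIFY.
latest = """
    SELECT col1, col2, col3 FROM (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY ts DESC) AS rn
        FROM table1
    ) WHERE rn = 1
"""

# Update only the table2 rows that have a match in table1.
conn.execute(f"""
    UPDATE table2
    SET col3 = (SELECT s.col3 FROM ({latest}) AS s
                WHERE s.col1 = table2.col1 AND s.col2 = table2.col2)
    WHERE EXISTS (SELECT 1 FROM table1 AS t
                  WHERE t.col1 = table2.col1 AND t.col2 = table2.col2)
""")
updated = conn.execute("SELECT col1, col2, col3 FROM table2 ORDER BY col1").fetchall()
print(updated)

# Concatenated keys can collide: (1, 23) and (12, 3) both become '123'.
concat_equal, columns_equal = conn.execute(
    "SELECT (1 || 23) = (12 || 3), (1 = 12 AND 23 = 3)"
).fetchone()
print(concat_equal, columns_equal)
```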
This is it:
WITH CTE AS
(
SELECT *,
RANK() OVER (PARTITION BY col1,col2
ORDER BY Timestamp desc) AS rn
FROM table1
)
UPDATE CTE
SET col3 = (select col3 from table2 where CONCAT(table2.col1,table2.col2) = CONCAT(CTE.col1, CTE.col2))
where CTE.rn =1;

Delete duplicate records in SQL Server if 2 columns match

Col1 Col2 Col3
A B 1
A B 1
A B 2
A B 2
A c 1
When the Col1 and Col2 values are the same but the Col3 values differ, I don't want those rows in the result set.
I want the result below. I tried row_number, group by, and many other things, but none of them worked. Please help me here.
Col1 Col2 Col3
A c 1
You can use exists:
delete from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and
t2.col3 <> t.col3
);
You can also use window functions:
with todelete as (
select t.*,
min(col3) over (partition by col1, col2) as min_col3,
max(col3) over (partition by col1, col2) as max_col3
from t
)
delete from todelete
where min_col3 <> max_col3;
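The EXISTS variant can be checked against the sample data from the question. This is a small sketch that assumes SQLite via Python rather than SQL Server; the table name `t` follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER);
    INSERT INTO t VALUES ('A', 'B', 1), ('A', 'B', 1),
                         ('A', 'B', 2), ('A', 'B', 2),
                         ('A', 'c', 1);
""")

# Delete every row that shares (col1, col2) with a row holding a different col3.
conn.execute("""
    DELETE FROM t
    WHERE EXISTS (SELECT 1 FROM t AS t2
                  WHERE t2.col1 = t.col1
                    AND t2.col2 = t.col2
                    AND t2.col3 <> t.col3)
""")
remaining = conn.execute("SELECT col1, col2, col3 FROM t").fetchall()
print(remaining)
```

Only the `A, c, 1` row survives, which matches the result the question asks for.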
The best approach is to make these columns a unique composite key. But here is a query that deletes all records other than your desired result.
delete from Table_1
where exists (SELECT 1
FROM Table_1 d
WHERE d.Col1 = Table_1.Col1
AND d.Col2 = Table_1.Col2
GROUP BY d.Col1, d.Col2
HAVING Count(*) > 1)
This might not be the most optimized or efficient query, but it works. If you don't want to delete duplicated records and just want to retrieve the unique ones:
SELECT Col1,Col2
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) = 1
To get the duplicated records:
SELECT Col2,Col1
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) > 1

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns based on 2 columns and load the unique rows with 139 columns into another table.
Example:
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
Expected output:
col1 col2 col3 .....col139
a b .............
b c .............
I need a SQL query for this in DB2.
If the "other table" does not exist yet you can create it like this
CREATE TABLE othertable LIKE originaltable
And then insert the requested rows with this statement:
INSERT INTO othertable
SELECT col1,...,coln
FROM (SELECT
t.*,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
FROM t) t
WHERE num = 1
There are numerous tools out there that generate queries and column lists - so if you do not want to write it by hand you could generate it with these tools or use another SQL statement to select it from the Db2 catalog table (syscat.columns).
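Generating the column list from the catalog can be sketched as follows. This assumes SQLite via Python, where `PRAGMA table_info` plays the role of the Db2 catalog lookup; `copy_deduplicated` is a hypothetical helper, and it interpolates identifiers directly, so it should only be given trusted table and column names.

```python
import sqlite3

def copy_deduplicated(conn, source, target, key_cols):
    """Copy one row per key_cols combination from source into target.

    Hypothetical helper: the column list is read from the catalog so that
    a wide table does not have to be spelled out by hand.
    """
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({source})")]
    col_list = ", ".join(cols)
    keys = ", ".join(key_cols)
    conn.execute(f"""
        INSERT INTO {target} ({col_list})
        SELECT {col_list} FROM (
            SELECT s.*,
                   ROW_NUMBER() OVER (PARTITION BY {keys} ORDER BY {keys}) AS num
            FROM {source} AS s
        ) WHERE num = 1
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE originaltable (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO originaltable VALUES ('a', 'b', 'x'), ('b', 'c', 'y'), ('a', 'b', 'z');
    -- Empty copy with the same columns (stands in for CREATE TABLE ... LIKE).
    CREATE TABLE othertable AS SELECT * FROM originaltable WHERE 0;
""")

copy_deduplicated(conn, "originaltable", "othertable", ["col1", "col2"])
kept_keys = conn.execute("SELECT col1, col2 FROM othertable ORDER BY col1").fetchall()
print(kept_keys)
```

Because the window is ordered only by the key columns, which of the duplicate rows is kept is arbitrary; order by another column if a specific row should win.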
You might be better off just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
FROM t
)
WHERE
DUP > 1
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by col1, col2 order by col1) as seqnum
from t
) t
where seqnum = 1;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
And if you want to delete those rows using the value of col1, you can run the following query. Note that it deletes every row whose col1 value is duplicated, including the first occurrence:
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
You can use the same approach to delete duplicate rows from the table using col2 values.
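The difference between removing every row with a duplicated value and keeping one row per value is easy to see on a tiny table. The sketch below assumes SQLite via Python; the second variant relies on SQLite's implicit `rowid`, which other databases do not necessarily provide.

```python
import sqlite3

def make_table():
    """Build a fresh three-row sample table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE your_table (col1 TEXT, col2 TEXT);
        INSERT INTO your_table VALUES ('a', 'b'), ('b', 'c'), ('a', 'b');
    """)
    return conn

# Variant 1: delete by duplicated value. Both 'a' rows disappear.
c1 = make_table()
c1.execute("""
    DELETE FROM your_table
    WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1)
""")
all_removed = c1.execute("SELECT col1 FROM your_table ORDER BY col1").fetchall()

# Variant 2: keep the first row of each col1 value and delete the rest.
c2 = make_table()
c2.execute("""
    DELETE FROM your_table
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM your_table GROUP BY col1)
""")
one_kept = c2.execute("SELECT col1 FROM your_table ORDER BY col1").fetchall()

print(all_removed, one_kept)
```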

HAVING clause: at least one of the ungrouped values is X

Example table:
Col1 | Col2
A | Apple
A | Banana
B | Apple
C | Banana
Output:
A
I want to get all values of Col1 which have more than one entry and at least one with Banana.
I tried to use GROUP BY:
SELECT Col1
FROM Table
GROUP BY Col1
HAVING count(*) > 1
AND ??? some kind of ONEOF(Col2) = 'Banana'
How can I rephrase the HAVING clause so that my query works?
Use conditional aggregation:
SELECT Col1
FROM Table
GROUP BY Col1
HAVING COUNT(DISTINCT col2) > 1 AND
COUNT(CASE WHEN col2 = 'Banana' THEN 1 END) >= 1
You can conditionally check for Col1 groups having at least one 'Banana' value using COUNT with CASE expression inside it.
Please note that the first COUNT has to use DISTINCT so that only groups with at least two different Col2 values are detected. If by "more than one entry" you also mean rows where the same Col2 value is repeated, then you can skip DISTINCT.
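Conditional aggregation can be tried on the example table with a short sketch. It assumes SQLite via Python, and the table is named `fruit` here only because `Table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit (Col1 TEXT, Col2 TEXT);
    INSERT INTO fruit VALUES ('A', 'Apple'), ('A', 'Banana'),
                             ('B', 'Apple'), ('C', 'Banana');
""")

# COUNT ignores NULLs, so the CASE expression counts only the 'Banana' rows.
matches = [row[0] for row in conn.execute("""
    SELECT Col1
    FROM fruit
    GROUP BY Col1
    HAVING COUNT(*) > 1
       AND COUNT(CASE WHEN Col2 = 'Banana' THEN 1 END) >= 1
""")]
print(matches)
```

Group `C` has a Banana but only one row, and group `B` has no Banana, so only `A` qualifies.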
SELECT Col1
FROM Table
GROUP BY Col1
HAVING count(*) > 1
AND Col1 in (select distinct Col1 from Table where Col2 = 'Banana');
Here is a simple approach:
SELECT Col1
FROM table
GROUP BY Col1
HAVING COUNT(DISTINCT CASE WHEN col2= 'Banana' THEN 1 ELSE 2 END) = 2
Try this,
declare @t table(Col1 varchar(20), Col2 varchar(20))
insert into @t values('A','Apple')
,('A','Banana'),('B','Apple'),('C','Banana')
select col1 from @t A
where exists
(select col1 from @t B where a.col1=b.col1 and b.Col2='Banana')
group by col1
having count(*)>1

select all columns with one column has different value

In my table, some records have the same values in every column except one. I need to write a query to get those records. What's the best way to do it? The table looks like this:
colA colB colC
a b c
a b d
a b e
What's the best way to get all records with all the columns? Thanks for everyone's help.
Assuming you know that column3 will always be different, to get the rows that have more than one value:
SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
If you need all the values in the three columns, then you can join this back to the original table:
SELECT t.*
FROM table t join
(SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
) cols
on t.col1 = cols.col1 and t.col2 = cols.col2
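The join-back pattern can be run against the question's sample rows. This sketch assumes SQLite via Python and uses the question's column names (colA, colB, colC); the extra `x, y, z` row is an assumption added to show that groups with a single value are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (colA TEXT, colB TEXT, colC TEXT);
    INSERT INTO t VALUES ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'),
                         ('x', 'y', 'z');
""")

# Find (colA, colB) groups with several distinct colC values,
# then join back to fetch the complete rows of those groups.
differing = conn.execute("""
    SELECT t.colA, t.colB, t.colC
    FROM t
    JOIN (SELECT colA, colB
          FROM t
          GROUP BY colA, colB
          HAVING COUNT(DISTINCT colC) > 1) AS g
      ON t.colA = g.colA AND t.colB = g.colB
    ORDER BY t.colC
""").fetchall()
print(differing)
```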
Just select those rows that have the different values:
SELECT col1, col2
FROM myTable
WHERE colWanted != knownValue
If this is not what you are looking for, please post examples of the data in the table and the wanted output.
How about something like
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) = 1
This will give you Col1, Col2 that have unique data.
Assuming col3 has the diffs:
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
Or, to show all 3 columns:
SELECT Col1, Col2, Col3
FROM (SELECT Col1, Col2, Col3,
COUNT(*) OVER (PARTITION BY Col1, Col2) AS cnt
FROM Table1) x
WHERE cnt > 1