De-duplicate data in postgresql based on few columns

I have a table in psql where column T1 represents time (t5 > t4 > t3 > t2 > t1).
I want to remove all the rows except the one with the latest time among rows that have the same values in columns C1 and C2.
Can someone please help me with the query for this? I am new to psql, so I am not able to figure this out on my own.
Thanks.

Use distinct on:
select distinct on (c1, c2) t.*
from t
order by c1, c2, t1 desc;
In a delete, you can use:
delete from t
where t.t1 < (select max(t2.t1)
from t t2
where t2.c1 = t.c1 and t2.c2 = t.c2
);
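To try the delete on a small data set, here is a minimal sketch that runs the same correlated-subquery delete against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for PostgreSQL here only because it needs no server, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; table t and columns c1, c2, t1 follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT, t1 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 3), ("a", "x", 2), ("b", "y", 5), ("b", "z", 4)],
)

# Delete every row that is older than the newest row in its (c1, c2) group.
conn.execute(
    """
    DELETE FROM t
    WHERE t1 < (SELECT MAX(t2.t1)
                FROM t AS t2
                WHERE t2.c1 = t.c1 AND t2.c2 = t.c2)
    """
)

remaining = conn.execute("SELECT c1, c2, t1 FROM t ORDER BY c1, c2").fetchall()
print(remaining)
```

Rows that tie for the latest t1 within a group are all kept, because none of them is strictly less than the group maximum.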

Related

Is there a way of switching/changing the "ON" condition in a SQL query when the columns do not match?

I have two tables (t1 and t2). I want to print/keep all records from t1 (we'll treat t1 as the left table). I want to perform a JOIN with t1 and t2 on two columns, but there's a problem. Table t1 consists of columns c1 and c2. Table t2 consists of columns c3, c4, and c5. I want to perform a JOIN between t1 and t2 on c1 (from t1) and c3 (from t2), but I also want to do a JOIN between t1 and t2 on c2 (from t1) and c4 (from t2) if records from c1 and c3 do not match.
Here are the two tables. They are completely fictitious, but applicable to my real, work-related problem.
The table below is what I want. All the records/rows from t1 are printed.
I greatly appreciate anyone who comes forward with query solutions. With that being said, is there a way to solve this problem without UNION? Also, I am using SQL Server.

Looking at the input and output data, I think you mean to compare c2 and c4 if c1 and c3 differ, not c1 and c2. I recreated the tables in sql and this code below gives the result you're looking for.
In that case you can just join and use an OR:
SELECT
*
FROM
t1
LEFT JOIN t2 ON
t1.c1 = t2.c3
OR t1.c2 = t2.c4;
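As a quick check of how the OR join behaves, here is a small sketch using Python's sqlite3 module with an in-memory database. SQLite is only a stand-in for SQL Server, and the rows are invented, since the original tables are not shown.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INTEGER, c2 TEXT)")
conn.execute("CREATE TABLE t2 (c3 INTEGER, c4 TEXT, c5 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(1, "zz", "first"), (9, "b", "second"), (8, "q", "third")],
)

# Every t1 row is kept; a t2 row is attached when either column pair matches.
joined = conn.execute(
    """
    SELECT t1.c1, t1.c2, t2.c3, t2.c4, t2.c5
    FROM t1
    LEFT JOIN t2 ON t1.c1 = t2.c3 OR t1.c2 = t2.c4
    ORDER BY t1.c1
    """
).fetchall()
for row in joined:
    print(row)
```

Note that OR accepts a match on either condition, so a t1 row that matches one t2 row on c1 = c3 and a different t2 row on c2 = c4 appears twice in the result.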

Return differing rows in tsql

I apologize for the basic non specific title. I can’t conceptualize how to ask this question or write the query I need in tsql. Any suggestions or guidance would be helpful. I have four columns that matter to me in a table:
c1(primarykey), c2, c3, c4
For any two rows, If c3 and c4 match but c2 doesn’t I want to return the rows. Amplify this to the entire table.
I’ve tried joining on a temp table then finding the difference through a left join on the table to itself but maybe I’m doing something incorrectly. Thank you in advance.

You could use:
WITH cte AS (
SELECT *, MIN(c2) OVER(PARTITION BY c3,c4) AS m, MAX(c2) OVER(PARTITION BY c3,c4) AS m2
FROM tab
)
SELECT *
FROM cte
WHERE m <> m2;

If you want to return the rows, then exists is a good way to go:
select t.*
from t
where exists (select 1
from t t2
where t2.c3 = t.c3 and t2.c4 = t.c4 and
t2.c2 <> t.c2
);
You do not mention NULL values in your question. If you have NULL values in any of the three columns, you would need to tweak the logic.
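To make the NULL caveat concrete, here is a small sketch using Python's sqlite3 module with an in-memory database. The rows are invented, and SQLite is only a stand-in for SQL Server. The second query uses SQLite's `IS NOT` operator as a NULL-safe inequality; other engines spell this differently (for example `IS DISTINCT FROM` in PostgreSQL), and whether NULL should count as "different" is an assumption about what you want.

```python
import sqlite3

# Hypothetical sample data; rows 4 and 5 share c3/c4 but row 4 has a NULL c2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER PRIMARY KEY, c2 TEXT, c3 INTEGER, c4 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "a", 10, 20), (2, "b", 10, 20), (3, "c", 30, 40),
     (4, None, 50, 60), (5, "d", 50, 60)],
)

query = """
    SELECT t.c1 FROM t
    WHERE EXISTS (SELECT 1 FROM t AS t2
                  WHERE t2.c3 = t.c3 AND t2.c4 = t.c4 AND t2.c2 {op} t.c2)
    ORDER BY t.c1
"""

# Plain <> yields NULL when either side is NULL, so rows 4 and 5 are skipped.
strict_ids = [r[0] for r in conn.execute(query.format(op="<>"))]

# IS NOT treats NULL as an ordinary value, so rows 4 and 5 are returned too.
null_safe_ids = [r[0] for r in conn.execute(query.format(op="IS NOT"))]

print(strict_ids, null_safe_ids)
```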
If you just wanted the c3/c4 pairs with different c2 values, you can use aggregation:
select c3, c4
from t
group by c3, c4
having min(c2) <> max(c2);
Finally, if you wanted to see pairs of non-matches on a single row, then:
select t.*, t2.c1, t2.c2
from t join
t t2
on t2.c3 = t.c3 and t2.c4 = t.c4 and
t2.c2 > t.c2;

With EXISTS:
select t.* from tablename t
where exists (
select 1 from tablename
where c2 <> t.c2 and c3 = t.c3 and c4 = t.c4
)

You can use Except
SELECT C1,C2,C3,C4 FROM TABLE1
EXCEPT
SELECT C4,C3,C2,C1 FROM TABLE1
This checks all the column values; if any value does not match, that record is returned. Moreover, you can add more columns to this query to match values.

How to merge two queries having different output into one query while sequence should remain same

I have different SQL queries that I want to merge into one while keeping the same column sequence. The query is as follows:
select c1, c2, .....,
convert(varchar, t2.col1) AS col from table1 t1 inner join table2 t2 on t1.col2=t2.col1 AS col1,
....., c15 from table;
In the query above, many columns are fetched before the JOIN (c1, c2, .... c15 are the columns whose values are fetched), and many more columns come after the JOIN. I want all of this in one SQL query. I am stuck only on joining the two tables to produce that one column.

Select all the columns as usual, and write the join like this:
select c1, c2, .....,
convert(varchar, t2.col1) AS col,
...., c15 from table1 t1
inner join table2 t2 on t1.col2 = t2.col1
This merges the output you want into one query.

select->insert->delete into one sql statement: possible?

I have 3 tables, T1 T2 and T3.
Each table has the same columns, except for T3 which has one additional "code" column.
My logic is the following:
-I have to search for any rows in T1 which are also contained in T2.
-For each found row I have to move it in T3, this would mean deleting it from T1 and create it into T3, with code 100.
I know that oracle allows for an insert...from select statement, in which case I have this:
insert into T3 (code,c1,c2,c3)
select 100,c1,c2,c3 from T1 where exists (select null from T2 where
c1=T1.c1 and c2=T1.c2 and c3=T1.c3);
This solves the select/insert problem, but would it be possible to add a delete from T1 without having to repeat the select statement?

You are probably looking for MERGE
This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements.
Example
This example uses all three DML operations (INSERT, UPDATE, and DELETE):
MERGE INTO bonuses D
USING (SELECT employee_id, salary, department_id FROM employees
WHERE department_id = 80) S
ON (D.employee_id = S.employee_id)
WHEN MATCHED THEN UPDATE SET D.bonus = D.bonus + S.salary*.01
DELETE WHERE (S.salary > 8000)
WHEN NOT MATCHED THEN INSERT (D.employee_id, D.bonus)
VALUES (S.employee_id, S.salary*0.1)
WHERE (S.salary <= 8000);
Answer to OP
MERGE INTO tab3 D
USING (SELECT col1 FROM tab1 where col1 in(select col1 from tab2)) S
ON (D.col1 = S.col1)
WHEN NOT MATCHED THEN
INSERT (D.col1,D.code)
values(S.COL1,100);
DELETE tab1 WHERE(col1 in(select col1 from tab2) );
I have tested this and it works fine.

It may be useful to use a loop:
BEGIN
FOR rec IN
(SELECT c1, c2, c3 from T1 WHERE EXISTS (SELECT NULL FROM T2 WHERE
c1 = T1.c1 AND c2 = T1.c2 AND c3 = T1.c3))
LOOP
INSERT INTO T3 (code, c1, c2, c3) VALUES (100, rec.c1, rec.c2, rec.c3);
DELETE FROM T1 WHERE T1.c1 = rec.c1
and T1.c2 = rec.c2
AND T1.c3 = rec.c3;
END LOOP;
END;
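The same row-by-row move can be sketched outside PL/SQL. The version below uses Python's sqlite3 module with an in-memory database as a stand-in for Oracle. The sample rows are invented; the matching rows are selected once, and each one is then inserted into T3 with code 100 and deleted from T1 inside a single transaction.

```python
import sqlite3

# Hypothetical sample data; T3 has the extra "code" column from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (c1 INTEGER, c2 INTEGER, c3 INTEGER);
    CREATE TABLE T2 (c1 INTEGER, c2 INTEGER, c3 INTEGER);
    CREATE TABLE T3 (code INTEGER, c1 INTEGER, c2 INTEGER, c3 INTEGER);
    INSERT INTO T1 VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);
    INSERT INTO T2 VALUES (1, 2, 3), (7, 8, 9), (0, 0, 0);
""")

# Run the select only once and keep the matching rows in memory.
matches = conn.execute("""
    SELECT c1, c2, c3 FROM T1
    WHERE EXISTS (SELECT 1 FROM T2
                  WHERE T2.c1 = T1.c1 AND T2.c2 = T1.c2 AND T2.c3 = T1.c3)
""").fetchall()

# "with conn" commits if the block succeeds and rolls back if it raises,
# so the insert and delete for all rows succeed or fail together.
with conn:
    for c1, c2, c3 in matches:
        conn.execute(
            "INSERT INTO T3 (code, c1, c2, c3) VALUES (100, ?, ?, ?)", (c1, c2, c3)
        )
        conn.execute(
            "DELETE FROM T1 WHERE c1 = ? AND c2 = ? AND c3 = ?", (c1, c2, c3)
        )

t1_left = conn.execute("SELECT c1, c2, c3 FROM T1").fetchall()
t3_rows = conn.execute("SELECT code, c1, c2, c3 FROM T3 ORDER BY c1").fetchall()
print(t1_left, t3_rows)
```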

SQL Server 2008 EXCEPT statement

Here is my example script:
SELECT c2, c3, c4 FROM Table1
EXCEPT
SELECT c2, c3, c4 FROM Table2
I'm successfully returning unique records from the left table that do not also exist in the right table. Both tables have identical schemas and for the most part identical data. The problem is that the unique id (let's call it column c1) does not match, so I need to exclude it in the EXCEPT query above. How can I return the same set of records, but with the unique IDs included?
I was thinking of using temporary tables, cursors and long WHERE statements inside the cursor, but that doesn't seem like a very elegant solution. Is there another way to accomplish this seemingly simple task?

Can you take your supplied query, and simply inner join it with table 1 to get your 'c1' column?
SELECT T1.* FROM Table1 T1 INNER JOIN(
SELECT c2, c3, c4 FROM Table1
EXCEPT
SELECT c2, c3, c4 FROM Table2
) a on a.c2=T1.c2 and a.c3=T1.c3 and a.c4=T1.c4

Try this
SELECT A.c1, A.c2, A.c3, A.c4
FROM Table1 A
LEFT OUTER JOIN Table2 B ON A.c2 = B.C2 AND A.c3 = B.C3 AND A.c4 = B.C4
WHERE B.c1 IS NULL;

You probably can accomplish it using "NOT EXISTS" rather than "EXCEPT" since with "NOT EXISTS" you can specify conditions. Here's a thread that points this out: EXCEPT vs NOT EXISTS.
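A minimal sketch of the NOT EXISTS form, compared against the EXCEPT-plus-join form, using Python's sqlite3 module with an in-memory database. SQLite stands in for SQL Server 2008 here, and the sample rows are invented; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data: the c1 ids differ between the tables on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (c1 INTEGER, c2 TEXT, c3 TEXT, c4 TEXT)")
conn.execute("CREATE TABLE Table2 (c1 INTEGER, c2 TEXT, c3 TEXT, c4 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "a", "b", "c"), (2, "d", "e", "f"), (3, "g", "h", "i")],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?, ?)",
    [(10, "a", "b", "c"), (11, "g", "h", "i")],
)

# NOT EXISTS: keep Table1 rows with no Table2 row matching on c2, c3, c4.
not_exists_rows = conn.execute("""
    SELECT t1.c1, t1.c2, t1.c3, t1.c4
    FROM Table1 AS t1
    WHERE NOT EXISTS (SELECT 1 FROM Table2 AS t2
                      WHERE t2.c2 = t1.c2 AND t2.c3 = t1.c3 AND t2.c4 = t1.c4)
    ORDER BY t1.c1
""").fetchall()

# EXCEPT joined back to Table1 to recover c1, for comparison.
except_rows = conn.execute("""
    SELECT t1.c1, t1.c2, t1.c3, t1.c4
    FROM Table1 AS t1
    INNER JOIN (SELECT c2, c3, c4 FROM Table1
                EXCEPT
                SELECT c2, c3, c4 FROM Table2) AS a
        ON a.c2 = t1.c2 AND a.c3 = t1.c3 AND a.c4 = t1.c4
    ORDER BY t1.c1
""").fetchall()

print(not_exists_rows, except_rows)
```

The two forms agree on this data. They can differ when the compared columns contain NULLs, because EXCEPT treats two NULLs as equal while the `=` comparisons do not.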

This is kind of ugly and, on large tables lacking "useful" indexes, might perform very poorly, but it will do the work:
SELECT t1.c1, t1.c2, t1.c3, t1.c4
from Table1 t1
inner join (-- Unique tuples
SELECT c2, c3, c4 FROM Table1
EXCEPT
SELECT c2, c3, c4 FROM Table2
) xx
on xx.c2 = t1.c2
and xx.c3 = t1.c3
and xx.c4 = t1.c4
