using partition by clause in delete statement postgresql - sql

I am trying to debug the below code. It throws me an error saying ERROR: syntax error at or near "(" .
My aim is to delete duplicate records in the table:
delete FROM (SELECT *,
ROW_NUMBER() OVER (partition BY snapshot,col1,col2,col3,col4,col5) AS rnum
FROM table where snapshot='2019-08-31') as t
WHERE t.rnum > 1;

Try the following, which keeps the row with the smallest physical row id (ctid) in each duplicate group:
DELETE FROM table a
WHERE a.ctid <> (SELECT min(b.ctid)
FROM table b
WHERE a.snapshot = b.snapshot
and a.col1=b.col1 and a.col2=b.col2
and a.col3=b.col3 and a.col4=b.col4 and a.col5=b.col5);

Postgres does not allow deleting from subqueries. You can join in other tables. But in this case, I think a correlated subquery is sufficient, assuming you have a unique id of some sort:
delete from t
where snapshot = '2019-08-31' and
id > (select min(id)
from t t2
where t2.snapshot = t.snapshot and
t2.col1 = t.col1 and
t2.col2 = t.col2 and
t2.col3 = t.col3 and
t2.col4 = t.col4 and
t2.col5 = t.col5
);
Note: This also assumes that the columns are not NULL. You can replace = with is not distinct from if NULLs are a possibility.
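As a rough illustration of the keep-the-minimum-id pattern, here is a minimal Python/sqlite3 sketch (sqlite3 stands in for Postgres here; the table and column names are made up for the example):

```python
import sqlite3

# In-memory stand-in for the Postgres table; "id" plays the role of the
# unique identifier used to decide which duplicate survives.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, snapshot TEXT, col1 TEXT, col2 TEXT)")
con.executemany(
    "INSERT INTO t (snapshot, col1, col2) VALUES (?, ?, ?)",
    [("2019-08-31", "a", "x"),
     ("2019-08-31", "a", "x"),   # duplicate of the first row
     ("2019-08-31", "b", "y"),
     ("2019-09-30", "a", "x")],  # different snapshot, untouched
)

# Delete every row whose id is larger than the smallest id in its group.
con.execute("""
    DELETE FROM t
    WHERE snapshot = '2019-08-31'
      AND id > (SELECT min(id) FROM t t2
                WHERE t2.snapshot = t.snapshot
                  AND t2.col1 = t.col1
                  AND t2.col2 = t.col2)
""")
rows = con.execute("SELECT snapshot, col1, col2 FROM t ORDER BY id").fetchall()
print(rows)
```

The same statement should run unchanged in Postgres, given a unique id column.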
If you have lots of duplicates and no identity column, you might find it simpler to remove and re-insert the data:
create table temp_snapshot as
select distinct on (col1, col2, col3, col4, col5) t.*
from t
where snapshot = '2019-08-31'
order by col1, col2, col3, col4, col5;
delete from t
where snapshot = '2019-08-31';
insert into t
select *
from temp_snapshot;
If your table is partitioned by snapshot (possibly a very good idea), then you can drop the partition instead and then add the data back in. That process is typically faster than deleting records.

Related

Get Rows from table where column one has same value and column 2 has a different value

I need to find all the rows where col2 has the same value but col3 has a different value. From the table above, it should return Pk1, Pk3 and Pk4. I tried the following self join, but I see duplicate records.
SELECT T1.COL1,T1.COL2,T1.COl3
FROM Tab T1, Tab T2
WHERE T1.Col2=T2.Col1
AND T1.Col3 <> T2.Col3
;
I would use exists:
select t.*
from t
where exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 <> t.col3);
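To see the exists approach in action, here is a small Python/sqlite3 sketch with invented sample data matching the Pk1/Pk3/Pk4 example from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (col1 TEXT, col2 INTEGER, col3 TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?)", [
    ("Pk1", 5, "x"),
    ("Pk2", 8, "y"),
    ("Pk3", 5, "z"),   # same col2 as Pk1/Pk4 but a different col3
    ("Pk4", 5, "x"),
    ("Pk5", 8, "y"),   # same col2 AND col3 as Pk2 -> excluded
])

# A row qualifies when some other row shares its col2 but not its col3.
rows = con.execute("""
    SELECT t.col1 FROM tab t
    WHERE EXISTS (SELECT 1 FROM tab t2
                  WHERE t2.col2 = t.col2 AND t2.col3 <> t.col3)
    ORDER BY t.col1
""").fetchall()
print([r[0] for r in rows])  # ['Pk1', 'Pk3', 'Pk4']
```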
Analytic functions are better for this kind of job - they avoid all joins. For example:
select col1, col2, col3
from (
select t.*,
case when min(col3) over (partition by col2) !=
max(col3) over (partition by col2) then 0 end as flag
from tab t
)
where flag = 0;
It is not entirely clear how you want to handle null in col3 - does that count as a "different" value? What if you have null more than once (for the same value in col2)? Also - what if col2 can be null?
Try this:
SELECT COL1,COL2,COL3 FROM
(SELECT COL1,COL2,COL3, COUNT(DISTINCT COL3) OVER (PARTITION BY COL2) CNT
FROM TEST)
WHERE CNT > 1
Cheers!!

How to find records with duplicate values for some specific columns only in oracle PL/SQL

I work in PL/SQL Developer with Oracle.
I have this simple SQL query below:
select
col1,
col2,
col3,
col4,
col5
from table t1
(condition required)
and col1=X or col=X or...
and I want to select all different records having col2 and col3 with identical values.
For example:
Record 1: col2=5 col3=orange
Record 2: col2=5 col3=orange
Record 3: col2=8 col3=apple
Record 4: col2=8 col3=apple
Use analytic functions:
select t.*
from (select t.*, count(*) over (partition by col2, col3) as cnt
from t
) t
where cnt > 1
order by col2, col3;
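The count-per-group query can be tried out with Python's sqlite3 module (window functions require SQLite 3.25+; the data is invented to match the orange/apple example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER, col3 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("r1", 5, "orange"),
    ("r2", 5, "orange"),
    ("r3", 8, "apple"),
    ("r4", 8, "apple"),
    ("r5", 9, "pear"),   # unique on (col2, col3) -> filtered out
])

# Count rows per (col2, col3) group and keep only groups with duplicates.
rows = con.execute("""
    SELECT col1, col2, col3 FROM (
        SELECT t.*, COUNT(*) OVER (PARTITION BY col2, col3) AS cnt
        FROM t
    )
    WHERE cnt > 1
    ORDER BY col2, col3
""").fetchall()
print(sorted(rows))
```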
select
t1.col1,
t1.col2,
t1.col3,
t1.col4,
t1.col5
from table t1
join table t2 on t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.rowid <> t2.rowid
where ...
;
If you have a primary key column on the table, use that instead of rowid.

Updating records in a specific order

I am trying to update all records in my table. As I read through the records I need to update a column in the current record with a value from the NEXT record in the set. The catch is the updates need to be done in a specified order.
I was thinking of something like this ...
Update t1
Set col1 = (select LEAD(col2,1) OVER (ORDER BY col3, col4, col5)
from t1);
This doesn't compile but you see what I'm driving at ... any ideas ?
... update
This piece does run successfully but writes only NULLs:
Update t1 A
Set t1.col1 = (select LEAD(col2,1) OVER (ORDER BY col3, col4, col5)
from t1 B
where A.col3 = B.col3 AND
A.col4 = B.col4 AND
A.col5 = B.col5);
This should do it:
merge into t1
using
(
select rowid as rid,
LEAD(col2,1) OVER (ORDER BY col3, col4, col5) as ld
from t1
) lv on ( lv.rid = t1.rowid )
when matched then
update set col1 = lv.ld;
Not 100% sure if I got the syntax completely right, but as you didn't supply any test data, I'll leave potential syntax errors for you to fix.
You can also replace the usage of rowid with the real primary key columns of your table.
Why don't you use a cursor? You can perform the update inside a cursor loop that fetches the rows in the specified order.
You can do this using the with statement:
with toupdate as (
select t1.*,
lead(col2, 1) over (order by col3, col4, col5) as nextval
from t1
)
Update toupdate
Set col1 = nextval
By the way, this does not guarantee the ordering of the updates. However, the lead values are all computed in the CTE before any row is modified, so the update order does not matter.
The above syntax works in SQL Server, but not in Oracle. The original question did not specify the database (and lead() is a valid function in SQL Server 2012). In Oracle, the merge statement is the way to apply the values from the subquery.
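SQLite has neither an updatable CTE nor MERGE, but the same effect can be sketched in Python/sqlite3 by materializing the lead values first and applying them with a correlated-subquery update (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (col1 TEXT, col2 TEXT, col3 INTEGER)")
con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (None, "a", 1),
    (None, "b", 2),
    (None, "c", 3),
])

# Precompute each row's "next col2 in order" keyed by rowid, then apply
# it with a correlated-subquery UPDATE.
con.execute("""
    CREATE TEMP TABLE leads AS
    SELECT rowid AS rid,
           LEAD(col2, 1) OVER (ORDER BY col3) AS nextval
    FROM t1
""")
con.execute("""
    UPDATE t1
    SET col1 = (SELECT nextval FROM leads WHERE leads.rid = t1.rowid)
""")
rows = con.execute("SELECT col1, col2 FROM t1 ORDER BY col3").fetchall()
print(rows)  # [('b', 'a'), ('c', 'b'), (None, 'c')]
```

The last row gets NULL, exactly as lead() with no following row would produce.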

sql insert into table from select without duplicates (need more then a DISTINCT)

I am selecting multiple rows and inserting them into another table. I want to make sure a row doesn't already exist in the table I am inserting into.
DISTINCT works when there are duplicate rows in the select, but not when comparing against the data already in the table you're inserting into.
If I selected one row at a time I could do an IF EXISTS check, but since it's multiple rows (sometimes 10+), that doesn't seem feasible.
INSERT INTO target_table (col1, col2, col3)
SELECT DISTINCT st.col1, st.col2, st.col3
FROM source_table st
WHERE NOT EXISTS (SELECT 1
FROM target_table t2
WHERE t2.col1 = st.col1
AND t2.col2 = st.col2
AND t2.col3 = st.col3)
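A quick way to sanity-check the NOT EXISTS insert is a Python/sqlite3 sketch with made-up data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source_table (col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE target_table (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO source_table VALUES ('a', 'b', 'c'), ('a', 'b', 'c'),
                                    ('d', 'e', 'f');
    INSERT INTO target_table VALUES ('a', 'b', 'c');  -- already present
""")

# DISTINCT removes duplicates inside the select; NOT EXISTS skips rows
# already present in the target.
con.execute("""
    INSERT INTO target_table (col1, col2, col3)
    SELECT DISTINCT st.col1, st.col2, st.col3
    FROM source_table st
    WHERE NOT EXISTS (SELECT 1 FROM target_table t2
                      WHERE t2.col1 = st.col1
                        AND t2.col2 = st.col2
                        AND t2.col3 = st.col3)
""")
rows = con.execute("SELECT * FROM target_table ORDER BY col1").fetchall()
print(rows)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```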
If the distinct should only apply to certain columns (e.g. col1, col2) but you need to insert all columns, you will probably need a derived table (ANSI SQL):
INSERT INTO target_table (col1, col2, col3)
SELECT st.col1, st.col2, st.col3
FROM (
SELECT col1,
col2,
col3,
row_number() over (partition by col1, col2 order by col1, col2) as rn
FROM source_table
) st
WHERE st.rn = 1
AND NOT EXISTS (SELECT 1
FROM target_table t2
WHERE t2.col1 = st.col1
AND t2.col2 = st.col2)
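The derived-table variant can be sketched the same way in Python/sqlite3; note that the order by col3 below is an added assumption, just to make the surviving row of each group deterministic:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source_table (col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE target_table (col1 TEXT, col2 TEXT, col3 TEXT);
    -- two source rows share (col1, col2) but differ in col3
    INSERT INTO source_table VALUES ('a', 'b', 'c1'), ('a', 'b', 'c2'),
                                    ('d', 'e', 'f');
""")

# row_number() keeps one row per (col1, col2) group; the NOT EXISTS
# guard then protects against rows already in the target.
con.execute("""
    INSERT INTO target_table (col1, col2, col3)
    SELECT st.col1, st.col2, st.col3
    FROM (SELECT col1, col2, col3,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2
                                    ORDER BY col3) AS rn
          FROM source_table) st
    WHERE st.rn = 1
      AND NOT EXISTS (SELECT 1 FROM target_table t2
                      WHERE t2.col1 = st.col1 AND t2.col2 = st.col2)
""")
rows = con.execute("SELECT * FROM target_table ORDER BY col1").fetchall()
print(rows)  # [('a', 'b', 'c1'), ('d', 'e', 'f')]
```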
If you already have a unique index on whatever fields need to be unique in the destination table, you can just use MySQL's INSERT IGNORE and have MySQL throw away the duplicates for you.
Hope this helps!
So you're looking to retrieve all unique rows from source table which do not already exist in target table?
SELECT DISTINCT * FROM source
WHERE primaryKey NOT IN (SELECT primaryKey FROM target)
That's assuming you have a primary key which you can base the uniqueness on... otherwise, you'll have to check each column for uniqueness.
Pseudocode for what might work:
insert into <target_table> select col1 etc
from <source_table>
where <source_table>.keycol not in
(select keycol from <target_table>)
There are a few MSDN articles out there about this, but by far this one is the best:
http://msdn.microsoft.com/en-us/library/ms162773.aspx

Best way to update/insert into a table based on a remote table

I have two very large enterprise tables in an Oracle 10g database. One table keeps the historical information of the other. The problem is, I'm getting to the point where there are just too many records: my insert/update takes too long and my session gets killed by the governor.
Here's a pseudocode of my update process:
sqlsel := 'SELECT col1, col2, col3, col4, sysdate
FROM table2@remote_location dpi
WHERE (col1, col2, col3) IN
(
SELECT col1, col2, col3
FROM table2@remote_location
MINUS
SELECT DISTINCT col1, col2, col3
FROM table1 mpc
WHERE facility = '''||load_facility||'''
)';
EXECUTE IMMEDIATE sqlsel BULK COLLECT
INTO table1;
I've tried the MERGE statement:
MERGE INTO table1 t1
USING (
SELECT col1, col2, col3 FROM table2@remote_location
) t2
ON (
t1.col1 = t2.col1 AND
t1.col2 = t2.col2 AND
t1.col3 = t2.col3
)
WHEN NOT MATCHED THEN
INSERT (t1.col1, t1.col2, t1.col3, t1.update_dttm )
VALUES (t2.col1, t2.col2, t2.col3, sysdate )
But there seems to be a confirmed bug in versions prior to Oracle 10.2.0.4 in the MERGE statement when merging over a database link. The chance of getting an enterprise upgrade is slim, so is there a way to further optimize my first query, or to write it differently, so that it performs best?
Thanks.
Have you looked at Materialized Views to perform your sync? A pretty good intro can be found at Ask Anantha. This Oracle white paper is good, too.
If there are duplicate col1/col2/col3 entries in table2@remote_location, then your query will return them. If they are not needed, then you could do a
SELECT col1, col2, col3, sysdate
FROM (
SELECT col1, col2, col3
FROM table2@remote_location
MINUS
SELECT col1, col2, col3
FROM table1 mpc
WHERE facility = '''||load_facility||'''
)
)
You can get rid of the DISTINCT too. MINUS is a set operation and so it is unnecessary.
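As a sanity check of the set-operation approach, here is a Python/sqlite3 sketch using EXCEPT (SQLite's spelling of Oracle's MINUS); the "remote" table is simulated locally and all names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE remote_t (col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE local_t  (col1 TEXT, col2 TEXT, col3 TEXT, update_dttm TEXT);
    INSERT INTO remote_t VALUES ('a', 'b', 'c'), ('a', 'b', 'c'),  -- remote dup
                                ('d', 'e', 'f');
    INSERT INTO local_t  VALUES ('a', 'b', 'c', '2019-01-01');
""")

# EXCEPT is a set operation, so the remote duplicates and the
# already-loaded row both disappear without any DISTINCT.
con.execute("""
    INSERT INTO local_t (col1, col2, col3, update_dttm)
    SELECT col1, col2, col3, date('now') FROM (
        SELECT col1, col2, col3 FROM remote_t
        EXCEPT
        SELECT col1, col2, col3 FROM local_t
    )
""")
rows = con.execute("SELECT col1, col2, col3 FROM local_t ORDER BY col1").fetchall()
print(rows)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```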