I am trying to delete duplicate records from a Netezza table, but a few columns contain NULL values, so the code below does not work.
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE a.COL1 = b.COL1
AND a.COL2 = b.COL2
AND a.COL3 = b.COL3);
Sample data:
COL1 COL2 COL3
X NULL Y
A NULL B
X NULL Y
X NULL Y
E VAL F
Expected result:
COL1 COL2 COL3
X NULL Y
A NULL B
E VAL F
Note: the COL2 column contains NULL values.
The table has 30 columns in total, and 6 of them contain NULL values in the duplicate records.
Can anyone please help me with this issue?
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE nvl(a.COL1,0) = nvl(b.COL1,0)
AND nvl(a.COL2,0) = nvl(b.COL2,0)
and nvl(a.COL3,0) = nvl(b.COL3,0));
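Replace the NULL values with 0 using the NVL function. This only works cleanly if 0 is compatible with the column's data type and never occurs as a real value; for character columns a string placeholder is the safer choice.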
You can use the NVL function to translate nulls to something you can compare.
Edit: you commented that NVL doesn't work. Alternatively, you can rewrite the query to handle NULL explicitly.
For instance:
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE((a.COL1 = b.COL1) or (a.COL1 is null and b.COL1 is null))
AND ((a.COL2 = b.COL2) or (a.COL2 is null and b.COL2 is null))
AND ((a.COL3 = b.COL3) or (a.COL3 is null and b.COL3 is null)));
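For background on why the original join misses these rows: in SQL, NULL = NULL evaluates to unknown rather than true, so the correlated subquery never matches a row against its duplicates when any compared column is NULL. The sketch below illustrates this with Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, since the question targets Netezza), but this three-valued comparison behaviour is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain equality: NULL = NULL yields NULL (unknown), which a WHERE clause rejects.
plain = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Explicit handling, the same shape as the predicate in the answer above.
explicit = conn.execute(
    "SELECT (a = b) OR (a IS NULL AND b IS NULL) "
    "FROM (SELECT NULL AS a, NULL AS b)"
).fetchone()[0]

print(plain, explicit)
```

The first value comes back as Python's None (SQL NULL), while the explicit predicate comes back as true.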
Try using the /=/ operator instead of =
It usually works for me in these situations
For context, what are the distribution columns for the table, how many rows are in your table, and what percentage of those are you expecting to be duplicates? Depending on the scale a CTAS approach might be a better fit than a DELETE.
That being said, here's an approach that gets the delete logic right, but might not be the best performer.
TESTDB.ADMIN(ADMIN)=> select * from table_name;
COL1 | COL2 | COL3
------+------+------
X | | Y
X | | Y
E | VAL | F
A | | B
X | | Y
(5 rows)
delete from table_name
where rowid in (
    select rowid
    from (
        select rowid,
               row_number() over (
                   partition by col1, col2, col3
                   order by col1
               ) rn
        from table_name
    ) foo
    where rn > 1
);
DELETE 2
TESTDB.ADMIN(ADMIN)=> select * from table_name;
COL1 | COL2 | COL3
------+------+------
A | | B
X | | Y
E | VAL | F
(3 rows)
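The same delete can be tried out locally. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in for Netezza; it relies on SQLite's implicit rowid and needs SQLite 3.25 or later for window functions. The point it illustrates is that PARTITION BY places NULLs in the same group, so no NVL or IS NULL handling is needed. The alias rid and the ORDER BY rowid tie-break are choices made for this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [("X", None, "Y"), ("A", None, "B"), ("X", None, "Y"),
     ("X", None, "Y"), ("E", "VAL", "F")],
)

# Number the rows inside each (col1, col2, col3) group and delete
# everything except the first row of each group.
deleted = conn.execute("""
    DELETE FROM table_name
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3
                       ORDER BY rowid
                   ) AS rn
            FROM table_name
        )
        WHERE rn > 1
    )
""").rowcount

remaining = conn.execute(
    "SELECT col1, col2, col3 FROM table_name ORDER BY col1"
).fetchall()
print(deleted, remaining)
```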
Related
I have a table like
A | All,
B | X,
C | Y,
D | Z
I have to create a view that replaces 'All' with each of the other values in the column.
So my desired output will be
A | X,
A | Y,
A | Z,
B | X,
C | Y,
D | Z
Thanks in advance!
Here is one way
SELECT col1,
col2
FROM (SELECT DISTINCT col2
FROM Yourtable
WHERE col2 <> 'All') a
CROSS JOIN (SELECT col1
FROM Yourtable
WHERE col2 = 'All') b
UNION ALL
SELECT col1,
col2
FROM Yourtable
WHERE col2 <> 'All'
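To check the query against the sample data, here is a small sketch using Python's sqlite3 module. SQLite and the table definition are assumptions made for the test; the question does not say which database is in use. The cross join pairs every 'All' row with each distinct non-'All' value, and the UNION ALL adds the ordinary rows back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Yourtable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO Yourtable VALUES (?, ?)",
    [("A", "All"), ("B", "X"), ("C", "Y"), ("D", "Z")],
)

rows = sorted(conn.execute("""
    SELECT col1, col2
    FROM (SELECT DISTINCT col2 FROM Yourtable WHERE col2 <> 'All') a
    CROSS JOIN (SELECT col1 FROM Yourtable WHERE col2 = 'All') b
    UNION ALL
    SELECT col1, col2
    FROM Yourtable
    WHERE col2 <> 'All'
""").fetchall())

for row in rows:
    print(row)
```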
I have a scenario where I need the previous value of a column, but it must not be the same as the current value.
Table A:
+------+------+-------------+
| Col1 | Col2 | Lead_Col2 |
+------+------+-------------+
| 1 | A | NULL |
| 2 | B | A |
| 3 | B | A |
| 4 | C | B |
| 5 | C | B |
| 6 | C | B |
| 7 | D | C |
+------+------+-------------+
As shown above, I need the previous value of Col2 that is not the same as the current value.
Try:
select *
from (select col1,
col2,
lag(col2, 1) over(order by col1) as prev_col2
from table_a)
where col2 <> prev_col2
The name lead_col2 is misleading, because you really want a lag.
Here is a brute force method that uses a correlated subquery to get the index of the value and then joins the value in:
select aa.col1, aa.col2, a.col2 as prev_col2
from (select col1, col2,
-- largest earlier col1 whose col2 differs from the current row's col2
(select max(a2.col1)
from a a2
where a2.col1 < a.col1 and a2.col2 <> a.col2
) as prev_col1
from a
) aa left join
a
on aa.prev_col1 = a.col1
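The correlated-subquery idea can be tried against the question's sample data. This is a sketch that assumes SQLite through Python's sqlite3 module and a table named a with col1 as the ordering column; the aliases cur and prev are introduced here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "C"), (6, "C"), (7, "D")],
)

# For each row, find the latest earlier row with a different col2,
# then join back to fetch that row's col2.
rows = conn.execute("""
    SELECT cur.col1, cur.col2, prev.col2 AS prev_col2
    FROM (SELECT a.col1, a.col2,
                 (SELECT MAX(a2.col1)
                  FROM a AS a2
                  WHERE a2.col1 < a.col1 AND a2.col2 <> a.col2) AS prev_col1
          FROM a) AS cur
    LEFT JOIN a AS prev ON cur.prev_col1 = prev.col1
    ORDER BY cur.col1
""").fetchall()

for row in rows:
    print(row)
```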
EDIT:
You can also use logic with lead() and ignore nulls. If a value is the last in its sequence, keep that value; otherwise set it to NULL. Then use lag() with ignore nulls:
select col1, col2,
lag(col3) over (order by col1 ignore nulls)
from (select col1, col2,
(case when col2 <> lead(col2) over (order by col1) then col2
end) as col3
from a
) a;
Try this:
select t.col1
,t.col2
,first_value(lag_col2) over (partition by col2 order by ord) lag_col2
from (select t.*
,case when lag_col2 = col2 then 1 else 0 end ord
from (select t.*
,lag (col2) over (order by col1) lag_col2
from table1 t
)t
)t
order by col1
Please help me with the task below. I have a table with four columns: col1, col2, col3 and col4.
I want to retrieve the values from these columns with the NULLs removed.
So, if my table has
col1 | col2 | col3 | col4
-----+------+------+-----
A | B | NULL| NULL
C | D | NULL| NULL
NULL | NULL | E | F
NULL | NULL | G | H
I want result to be
col1 | col2 | col3 | col4
-----+------+------+-----
A | B | E | F
C | D | G | H
Here is a solution. I have used the analytic ROW_NUMBER() to synthesize a key for joining the rows. The join is full outer in order to cater for unequal assignments of nulls and values.
with cte as (select * from t23)
, a as ( select col1, row_number() over (order by col1) as rn
from cte
where col1 is not null )
, b as ( select col2, row_number() over (order by col2) as rn
from cte
where col2 is not null )
, c as ( select col3, row_number() over (order by col3) as rn
from cte
where col3 is not null )
, d as ( select col4, row_number() over (order by col4) as rn
from cte
where col4 is not null )
select a.col1
, b.col2
, c.col3
, d.col4
from a
full outer join b
on a.rn = b.rn
full outer join c
on a.rn = c.rn
full outer join d
on a.rn = d.rn
/
The SQL Fiddle is for Oracle, but this solution will work for any flavour of database which supports a ranking analytic function. The common table expression is optional, it just makes the other sub-queries easier to write.
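The idea behind the query (compact each column on its own, number the surviving values, then line them up by that number) can be illustrated outside the database. The following is a plain Python sketch of the same logic, not the SQL itself: sorting stands in for ROW_NUMBER() OVER (ORDER BY ...), and zip_longest plays the role of the full outer join by padding shorter columns with None.

```python
from itertools import zip_longest

rows = [
    ("A", "B", None, None),
    ("C", "D", None, None),
    (None, None, "E", "F"),
    (None, None, "G", "H"),
]

# Transpose to columns, drop the NULLs in each column and sort what is left;
# a value's position in its list then acts as its row number.
columns = [sorted(v for v in col if v is not None) for col in zip(*rows)]

# Line the columns up by position; shorter columns are padded with None.
result = list(zip_longest(*columns))
print(result)
```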
I'm new to SQL and I'm wondering how to pivot a table like:
Col1 Col2 Col3
1 a w
2 a x
1 b y
2 b z
Into
Col1 a b
1 w y
2 x z
I was playing with GROUP BY, but I can't seem to turn unique rows into columns.
This can be done using an aggregate function with a CASE expression:
select col1,
max(case when col2 = 'a' then col3 end) a,
max(case when col2 = 'b' then col3 end) b
from yourtable
group by col1
If you are using an RDBMS with a PIVOT function (SQL Server 2005+ / Oracle 11g+), then your query would be similar to this (Note: Oracle syntax below):
select *
from
(
select col1, col2, col3
from yourtable
)
pivot
(
max(col3)
for col2 in ('a', 'b')
)
The last way that you can do this is by using multiple joins on the same table:
select t1.col1,
t1.col3 a,
t2.col3 b
from yourtable t1
left join yourtable t2
on t1.col1 = t2.col1
and t2.col2 = 'b'
where t1.col2 = 'a'
All give the result:
| COL1 | 'A' | 'B' |
--------------------
| 1 | w | y |
| 2 | x | z |
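The aggregate-with-CASE version is the most portable of the three and can be checked with a quick sketch. This one assumes SQLite through Python's sqlite3 module, which is not one of the databases named above. The CASE expression returns NULL for rows that do not match, and MAX ignores NULLs, so each group keeps the one matching col3 value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "a", "w"), (2, "a", "x"), (1, "b", "y"), (2, "b", "z")],
)

# One output column per pivoted value; non-matching rows contribute NULL.
rows = conn.execute("""
    SELECT col1,
           MAX(CASE WHEN col2 = 'a' THEN col3 END) AS a,
           MAX(CASE WHEN col2 = 'b' THEN col3 END) AS b
    FROM yourtable
    GROUP BY col1
    ORDER BY col1
""").fetchall()
print(rows)
```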
If the distinct values in Col2 must be able to change without forcing changes to your query definition, you may be looking for an OLAP structure like SQL Server Analysis Services.
You should try something like
select * from
(select Col1, Col2, Col3 from TableName)
pivot xml (max(Col3)
for Col2 in (any) )
I'm on a mobile so I can't test if it's working right now.
See the below table:
col1 col2
---- ----
1 | a
2 | b
3 | c
4 | a
5 | d
6 | b
7 | e
Now I want to show only the non-duplicate records, that is, rows 3, 5 and 7.
How do I write a query to get this result?
-- each remaining group has exactly one row, so MIN(col1) is that row's col1
SELECT MIN(col1) AS col1, col2
FROM table
GROUP BY col2
HAVING COUNT(*) = 1;
SELECT B.*
FROM
(
SELECT col2
FROM YOURTABLE
GROUP BY col2
HAVING COUNT(*)=1
) A,
YOURTABLE B
WHERE A.col2 = B.col2
SELECT COUNT(*) AS cnt, MIN(col1) AS col1, col2
FROM table
GROUP BY col2
HAVING COUNT(*) = 1;
I believe this is clear and correct enough:
SELECT *
FROM table
WHERE
col2 IN (SELECT col2 FROM table GROUP BY col2 HAVING COUNT(*) = 1)
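A quick way to confirm the IN-subquery form against the sample data is the sketch below. It assumes SQLite through Python's sqlite3 module and uses a hypothetical table name t, since table itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "d"), (6, "b"), (7, "e")],
)

# Keep only the rows whose col2 value occurs exactly once in the table.
rows = conn.execute("""
    SELECT col1, col2
    FROM t
    WHERE col2 IN (SELECT col2 FROM t GROUP BY col2 HAVING COUNT(*) = 1)
    ORDER BY col1
""").fetchall()
print(rows)
```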