How to remove null values and get rows of values from a table - sql

Please help me with the task below. I have table a with four columns:
col1, col2, col3 and col4. I want to retrieve the values from these columns with the nulls removed.
So, if my table has
col1 | col2 | col3 | col4
-----+------+------+-----
A | B | NULL| NULL
C | D | NULL| NULL
NULL | NULL | E | F
NULL | NULL | G | H
I want result to be
col1 | col2 | col3 | col4
-----+------+------+-----
A | B | E | F
C | D | G | H

Here is a solution. I have used the analytic ROW_NUMBER() to synthesize a key for joining the rows. The join is full outer in order to cater for unequal assignments of nulls and values.
with cte as (select * from t23)
, a as ( select col1, row_number() over (order by col1) as rn
from cte
where col1 is not null )
, b as ( select col2, row_number() over (order by col2) as rn
from cte
where col2 is not null )
, c as ( select col3, row_number() over (order by col3) as rn
from cte
where col3 is not null )
, d as ( select col4, row_number() over (order by col4) as rn
from cte
where col4 is not null )
select a.col1
, b.col2
, c.col3
, d.col4
from a
full outer join b
on a.rn = b.rn
full outer join c
on coalesce(a.rn, b.rn) = c.rn
full outer join d
on coalesce(a.rn, b.rn, c.rn) = d.rn
/
The SQL Fiddle is for Oracle, but this solution will work for any flavour of database which supports a ranking analytic function and full outer joins. The common table expression is optional; it just makes the other sub-queries easier to write.
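For experimenting without an Oracle instance, here is a minimal sketch of the same idea using Python's built-in sqlite3 module. It is a variant, not the query above: it keeps the per-column ROW_NUMBER() step but swaps the full outer joins for a UNION ALL plus MAX(CASE ...) per row number, because older SQLite builds lack FULL OUTER JOIN. The in-memory database and the `pos` and `val` names are assumptions made for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table t23 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t23 (col1 text, col2 text, col3 text, col4 text)")
conn.executemany(
    "insert into t23 values (?, ?, ?, ?)",
    [("A", "B", None, None), ("C", "D", None, None),
     (None, None, "E", "F"), (None, None, "G", "H")],
)

# Number the non-null values of each column separately, stack them with
# UNION ALL, then fold each row number back into one output row.
query = """
with numbered as (
    select 1 as pos, col1 as val, row_number() over (order by col1) as rn
    from t23 where col1 is not null
    union all
    select 2, col2, row_number() over (order by col2)
    from t23 where col2 is not null
    union all
    select 3, col3, row_number() over (order by col3)
    from t23 where col3 is not null
    union all
    select 4, col4, row_number() over (order by col4)
    from t23 where col4 is not null
)
select max(case when pos = 1 then val end) as col1,
       max(case when pos = 2 then val end) as col2,
       max(case when pos = 3 then val end) as col3,
       max(case when pos = 4 then val end) as col4
from numbered
group by rn
order by rn
"""
compacted = conn.execute(query).fetchall()
print(compacted)
```

Because the grouping key is the row number itself, columns with different counts of non-null values simply leave NULL in the rows they cannot fill.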

Related

Unable to delete duplicate data from Netezza table

I am trying to delete duplicate records from a Netezza table, but a few columns contain null values, so the code below is not working.
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE a.COL1 = b.COL1
AND a.COL2 = b.COL2
AND a.COL3 = b.COL3);
Sample Data:-
COL1 COL2 COL3
X NULL Y
A NULL B
X NULL Y
X NULL Y
E VAL F
Expected result:
COL1 COL2 COL3
X NULL Y
A NULL B
E VAL F
Note: COL2 contains null values.
We have 30 columns in total in this table, and 6 of them contain null values in the duplicate records.
Can anyone please help me with this issue?
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE nvl(a.COL1,0) = nvl(b.COL1,0)
AND nvl(a.COL2,0) = nvl(b.COL2,0)
and nvl(a.COL3,0) = nvl(b.COL3,0));
This replaces null values with 0 using the NVL function.
You can use the NVL function to translate nulls to something you can compare.
Edit: you commented that NVL doesn't work. Alternatively, you can rewrite the query to handle NULL explicitly, for instance:
DELETE FROM TABLE_NAME a
WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
FROM TABLE_NAME b
WHERE((a.COL1 = b.COL1) or (a.COL1 is null and b.COL1 is null))
AND ((a.COL2 = b.COL2) or (a.COL2 is null and b.COL2 is null))
AND ((a.COL3 = b.COL3) or (a.COL3 is null and b.COL3 is null)));
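The reason the plain `=` comparison misses the duplicates is that NULL = NULL evaluates to NULL rather than true. A small sketch of that behaviour, and of the explicit "equal, or both null" predicate, using Python's sqlite3 module as a stand-in for Netezza (an assumption; the helper name `null_safe_equal` is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL yields NULL (unknown), which a WHERE clause treats as false.
plain_equal = conn.execute("select null = null").fetchone()[0]


def null_safe_equal(a, b):
    """Return True when a and b are equal or both NULL (hypothetical helper)."""
    count = conn.execute(
        "select count(*) from (select 1) "
        "where (:a = :b) or (:a is null and :b is null)",
        {"a": a, "b": b},
    ).fetchone()[0]
    return count == 1


print(plain_equal)
print(null_safe_equal(None, None), null_safe_equal("X", "X"),
      null_safe_equal("X", None))
```

The same predicate has to be repeated for every nullable column that takes part in the duplicate check.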
Try using the /=/ operator instead of =
It usually works for me in these situations
For context, what are the distribution columns for the table, how many rows are in your table, and what percentage of those are you expecting to be duplicates? Depending on the scale, a CTAS approach might be a better fit than a DELETE.
That being said, here's an approach that gets the delete logic right, but might not be the best performer.
TESTDB.ADMIN(ADMIN)=> select * from table_name;
COL1 | COL2 | COL3
------+------+------
X | | Y
X | | Y
E | VAL | F
A | | B
X | | Y
(5 rows)
delete
from
table_name
where rowid in
( select
rowid
from
( select
rowid,
row_number() over (
partition by col1,
col2 ,
col3
order by
col1) rn
from
table_name
) foo
where rn > 1
) ;
DELETE 2
TESTDB.ADMIN(ADMIN)=> select * from table_name;
COL1 | COL2 | COL3
------+------+------
A | | B
X | | Y
E | VAL | F
(3 rows)
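The delete above works because PARTITION BY places NULLs in the same group, so no NVL or IS NULL handling is needed. A minimal sketch that reproduces it with Python's sqlite3 module, assuming SQLite's implicit rowid as a stand-in for the Netezza rowid (the `rid` alias is an assumption added to avoid ambiguity; window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (col1 text, col2 text, col3 text)")
conn.executemany(
    "insert into table_name values (?, ?, ?)",
    [("X", None, "Y"), ("A", None, "B"), ("X", None, "Y"),
     ("X", None, "Y"), ("E", "VAL", "F")],
)

# Number the rows inside each (col1, col2, col3) group and delete every
# row after the first. NULLs fall into the same partition.
deleted = conn.execute("""
    delete from table_name
    where rowid in (
        select rid
        from (
            select rowid as rid,
                   row_number() over (
                       partition by col1, col2, col3
                       order by col1) as rn
            from table_name
        ) foo
        where rn > 1
    )
""").rowcount

remaining = conn.execute(
    "select col1, col2, col3 from table_name order by col1"
).fetchall()
print(deleted, remaining)
```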

SQL - Left Join many-to-many only once

I have two tables that are set up like the following examples
tablea
ID | Name
1 | val1
1 | val2
1 | val3
2 | other1
3 | other
tableb
ID | Amount
1 | $100
2 | $50
My desired output would be to left join tableb to tablea, but only join tableb once for each ID value. ID is the only relationship between the tables.
tablea.ID | tablea.Name | tableb.id | tableb.amount
1 | val1 | 1 | $100
1 | val2
1 | val3
2 | other1 | 2 | $50
3 | other
Microsoft SQL
You can do the following:
select ROW_NUMBER() OVER(ORDER BY ID, Name) as RowNum, ID as RowID, Name
from tablea
which gives you :
RowNum | RowID | Name
1 | 1 | val1
2 | 1 | val2
3 | 1 | val3
4 | 2 | other1
5 | 3 | other
You then get the minimum row number for each RowID:
Select RowID, min(RowNum) as RowNumMin
From (
select ROW_NUMBER() OVER(ORDER BY ID, Name) as RowNum, ID as RowID, Name
from tablea ) numbered
Group By RowID
Once you have this you can then join tableb onto tablea only where the RowId is the minimum
WITH cteTableA As (
select ROW_NUMBER() OVER(ORDER BY ID, Name) as RowNum, ID as RowID, Name
from tablea ),
cteTableAMin As (
Select RowID, min(RowNum) as RowNumMin
From cteTableA
Group By RowID
)
Select a.RowID, a.Name, b.Amount
From cteTableA a
Left join cteTableAMin amin on a.RowNum = amin.RowNumMin
and a.RowID = amin.RowID
Left join tableb b on amin.RowID = b.ID
This can be tidied up, but it helps to show what's going on.
Then you MUST specify which row in tableA you wish to join to. If there is more than one row in the other table, how can the query processor know which one you want?
If you want the one with the lowest value of name, then you might do this:
Select * from tableB b
join tableA a
on a.id = b.Id
and a.name =
(Select min(name) from tableA
where id = b.id)
but even that won't work if there are multiple rows with the same values for both id AND name. What you might really need is a primary key on tableA.
Use:
select
a.id,
a.name,
b.amount
from
(select
id,
name,
row_number() over (partition by id order by name) as rn
from tablea) a
left join (
select
id,
amount,
row_number() over (partition by id order by amount) as rn
from tableb) b
on a.id = b.id
and a.rn = b.rn
order by a.id, a.name
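A quick way to check the query above against the sample data is to run it in SQLite through Python's sqlite3 module. This is only a sketch: the in-memory database stands in for SQL Server, and amounts are stored as text to match the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablea (id integer, name text)")
conn.execute("create table tableb (id integer, amount text)")
conn.executemany(
    "insert into tablea values (?, ?)",
    [(1, "val1"), (1, "val2"), (1, "val3"), (2, "other1"), (3, "other")],
)
conn.executemany(
    "insert into tableb values (?, ?)", [(1, "$100"), (2, "$50")]
)

# Number the rows per id on both sides, so that the n-th tableb row of an
# id can only pair with the n-th tablea row of that id.
rows = conn.execute("""
    select a.id, a.name, b.amount
    from (select id, name,
                 row_number() over (partition by id order by name) as rn
          from tablea) a
    left join (select id, amount,
                      row_number() over (partition by id order by amount) as rn
               from tableb) b
      on a.id = b.id
     and a.rn = b.rn
    order by a.id, a.name
""").fetchall()

for row in rows:
    print(row)
```

With one tableb row per id, only the first tablea row of each id (by name) receives an amount; the others get NULL.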

Query to get previous value

I have a scenario where I need the previous column value, but it should not be the same as the current column value.
Table A:
+------+------+-------------+
| Col1 | Col2 | Lead_Col2 |
+------+------+-------------+
| 1 | A | NULL |
| 2 | B | A |
| 3 | B | A |
| 4 | C | B |
| 5 | C | B |
| 6 | C | B |
| 7 | D | C |
+------+------+-------------+
As shown above, I need the previous value of Col2 that is not the same as the current value.
Try:
select *
from (select col1,
col2,
lag(col2, 1) over(order by col1) as prev_col2
from table_a)
where col2 <> prev_col2
The name lead_col2 is misleading, because you really want a lag.
Here is a brute force method that uses a correlated subquery to get the index of the value and then joins the value in:
select aa.col1, aa.col2, a.col2 as prev_col2
from (select col1, col2,
(select max(col1)
from a a2
where a2.col1 < a.col1 and a2.col2 <> a.col2
) as prev_col1
from a
) aa left join
a
on aa.prev_col1 = a.col1
EDIT:
You can also use logic with lead() and ignore nulls. If a value is the last in its sequence, then use that value; otherwise set it to NULL. Then use lag() with ignore nulls:
select col1, col2,
lag(col3) ignore nulls over (order by col1) as prev_col2
from (select col1, col2,
(case when col2 <> lead(col2) over (order by col1) then col2
end) as col3
from a
) a;
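SQLite has no IGNORE NULLS option, so the lag() variant cannot be tried there, but the correlated-subquery idea can. A minimal sketch with Python's sqlite3 module, assuming the table is called table_a and that Col1 defines the row order (the `cur`, `p` and `a2` aliases are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_a (col1 integer, col2 text)")
conn.executemany(
    "insert into table_a values (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "C"), (6, "C"), (7, "D")],
)

# For each row, find the latest earlier row whose col2 differs from the
# current one, then return that row's col2.
rows = conn.execute("""
    select cur.col1,
           cur.col2,
           (select p.col2
            from table_a p
            where p.col1 = (select max(a2.col1)
                            from table_a a2
                            where a2.col1 < cur.col1
                              and a2.col2 <> cur.col2)) as prev_col2
    from table_a cur
    order by cur.col1
""").fetchall()

prev_values = [row[2] for row in rows]
print(prev_values)
```

This is quadratic in the worst case, so it is better suited to checking results on small tables than to large data sets.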
Try this:
select t.col1
,t.col2
,first_value(lag_col2) over (partition by col2 order by ord) lag_col2
from (select t.*
,case when lag_col2 = col2 then 1 else 0 end ord
from (select t.*
,lag (col2) over (order by col1) lag_col2
from table1 t
)t
)t
order by col1
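The lag() plus first_value() query above can be checked against the sample data in SQLite, which supports both functions from version 3.25. A sketch using Python's sqlite3 module, assuming the table name table1 from the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (col1 integer, col2 text)")
conn.executemany(
    "insert into table1 values (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "C"), (6, "C"), (7, "D")],
)

# ord is 0 on the first row of each run (where the lagged value differs),
# so first_value() ordered by ord picks that row's lagged value.
rows = conn.execute("""
    select t.col1,
           t.col2,
           first_value(lag_col2) over (partition by col2 order by ord) as lag_col2
    from (select t.*,
                 case when lag_col2 = col2 then 1 else 0 end as ord
          from (select t.*,
                       lag(col2) over (order by col1) as lag_col2
                from table1 t
               ) t
         ) t
    order by col1
""").fetchall()

for row in rows:
    print(row)
```

Note that the outer window partitions by col2 alone, so this relies on each value forming a single contiguous run, as in the sample; a value that reappears later would share a partition with its earlier run.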

PIVOT entire column set by group

For a table like:
COL1 COL2 COL3 COL4
item1 7/29/13 cat blue
item3 7/29/13 fish purple
item1 7/30/13 rat green
item2 7/30/13 bat grey
item3 7/30/13 bird orange
How would you PIVOT to get rows by COL2, all other columns repeated across as blocks by COL1 values?
COL2 COL1 COL3 COL4 COL1 COL3 COL4 COL1 COL3 COL4
7/29/13 item1 cat blue item2 NULL NULL item3 fish purple
7/30/13 item1 rat green item2 bat grey item3 bird orange
In order to get this result you will need to do a few things:
1. get a distinct list of values from col1 and col2
2. unpivot the data in your columns col1, col3 and col4
3. pivot the result from the unpivot
To get the distinct list of dates and items (col1 and col2) along with the values from your existing table you will need to use something similar to the following:
select t.col1, t.col2,
t2.col3, t2.col4,
row_number() over(partition by t.col2
order by t.col1) seq
from
(
select distinct t.col1, c.col2
from yourtable t
cross join
(
select distinct col2
from yourtable
) c
) t
left join yourtable t2
on t.col1 = t2.col1
and t.col2 = t2.col2;
Once you have this list, you will need to unpivot the data. There are several ways you can do this, such as using the UNPIVOT function or using CROSS APPLY:
select d.col2,
col = col+'_'+cast(seq as varchar(10)),
value
from
(
select t.col1, t.col2,
t2.col3, t2.col4,
row_number() over(partition by t.col2
order by t.col1) seq
from
(
select distinct t.col1, c.col2
from yourtable t
cross join
(
select distinct col2
from yourtable
) c
) t
left join yourtable t2
on t.col1 = t2.col1
and t.col2 = t2.col2
) d
cross apply
(
select 'col1', col1 union all
select 'col3', col3 union all
select 'col4', col4
) c (col, value);
This will give you data that looks like:
| COL2 | COL | VALUE |
-------------------------------------------------
| July, 29 2013 00:00:00+0000 | col1_1 | item1 |
| July, 29 2013 00:00:00+0000 | col3_1 | cat |
| July, 29 2013 00:00:00+0000 | col4_1 | blue |
| July, 29 2013 00:00:00+0000 | col1_2 | item2 |
| July, 29 2013 00:00:00+0000 | col3_2 | (null) |
| July, 29 2013 00:00:00+0000 | col4_2 | (null) |
Finally, you will apply the PIVOT function to the items in the col columns:
select col2,
col1_1, col3_1, col4_1,
col1_2, col3_2, col4_2,
col1_3, col3_3, col4_3
from
(
select d.col2,
col = col+'_'+cast(seq as varchar(10)),
value
from
(
select t.col1, t.col2,
t2.col3, t2.col4,
row_number() over(partition by t.col2
order by t.col1) seq
from
(
select distinct t.col1, c.col2
from yourtable t
cross join
(
select distinct col2
from yourtable
) c
) t
left join yourtable t2
on t.col1 = t2.col1
and t.col2 = t2.col2
) d
cross apply
(
select 'col1', col1 union all
select 'col3', col3 union all
select 'col4', col4
) c (col, value)
) src
pivot
(
max(value)
for col in (col1_1, col3_1, col4_1,
col1_2, col3_2, col4_2,
col1_3, col3_3, col4_3)
)piv;
If you have an unknown number of values, then you can use dynamic SQL to get the result:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(col+'_'+cast(seq as varchar(10)))
from
(
select row_number() over(partition by col2
order by col1) seq
from yourtable
) t
cross apply
(
select 'col1', 1 union all
select 'col3', 2 union all
select 'col4', 3
) c (col, so)
group by col, seq, so
order by seq, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT col2, ' + @cols + '
from
(
select d.col2,
col = col+''_''+cast(seq as varchar(10)),
value
from
(
select t.col1, t.col2,
t2.col3, t2.col4,
row_number() over(partition by t.col2
order by t.col1) seq
from
(
select distinct t.col1, c.col2
from yourtable t
cross join
(
select distinct col2
from yourtable
) c
) t
left join yourtable t2
on t.col1 = t2.col1
and t.col2 = t2.col2
) d
cross apply
(
select ''col1'', col1 union all
select ''col3'', col3 union all
select ''col4'', col4
) c (col, value)
) x
pivot
(
max(value)
for col in (' + @cols + ')
) p '
execute sp_executesql @query;
All versions will give a result:
| COL2 | COL1_1 | COL3_1 | COL4_1 | COL1_2 | COL3_2 | COL4_2 | COL1_3 | COL3_3 | COL4_3 |
----------------------------------------------------------------------------------------------------------------
| July, 29 2013 00:00:00+0000 | item1 | cat | blue | item2 | (null) | (null) | item3 | fish | purple |
| July, 30 2013 00:00:00+0000 | item1 | rat | green | item2 | bat | grey | item3 | bird | orange |
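On engines without PIVOT or CROSS APPLY, the same shape can be produced with conditional aggregation. The following is a sketch of that swapped-in technique, not the T-SQL above, run in SQLite through Python's sqlite3 module. Assumptions: dates are stored as ISO text, the `grid` and `numbered` CTE names are illustrative, and the column list is built in Python as a stand-in for the dynamic SQL step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (col1 text, col2 text, col3 text, col4 text)")
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?)",
    [("item1", "2013-07-29", "cat", "blue"),
     ("item3", "2013-07-29", "fish", "purple"),
     ("item1", "2013-07-30", "rat", "green"),
     ("item2", "2013-07-30", "bat", "grey"),
     ("item3", "2013-07-30", "bird", "orange")],
)

# One (col1, col3, col4) block per distinct item. The generated text only
# contains fixed column names and integers, so no user input reaches the SQL.
item_count = conn.execute(
    "select count(distinct col1) from yourtable"
).fetchone()[0]
blocks = ",\n       ".join(
    f"max(case when seq = {n} then {c} end) as {c}_{n}"
    for n in range(1, item_count + 1)
    for c in ("col1", "col3", "col4")
)

query = f"""
with grid as (
    -- every item paired with every date, so missing combinations show as NULL
    select i.col1, d.col2
    from (select distinct col1 from yourtable) i
    cross join (select distinct col2 from yourtable) d
),
numbered as (
    select g.col1, g.col2, t.col3, t.col4,
           row_number() over (partition by g.col2 order by g.col1) as seq
    from grid g
    left join yourtable t
      on t.col1 = g.col1
     and t.col2 = g.col2
)
select col2,
       {blocks}
from numbered
group by col2
order by col2
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Each MAX(CASE ...) picks the single value for one sequence position within a date, which is the role PIVOT's max(value) plays in the T-SQL version.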
The dynamic UNPIVOT+PIVOT method is always cool. When doing this sort of thing for a known and limited set of values, subsequent JOINs work nicely too (being lazy on the SELECT list):
WITH cte AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY COL2 ORDER BY COL1)'RowRank'
FROM #Table1)
SELECT *
FROM cte a
LEFT JOIN cte b
ON a.COL2 = b.COL2
AND a.RowRank = b.RowRank - 1
LEFT JOIN cte c
ON b.COL2 = c.COL2
AND b.RowRank = c.RowRank - 1
WHERE a.RowRank = 1
Or if the order of the fields is to be maintained:
WITH cte AS (SELECT a.*,b.RowRank
FROM #Table1 a
JOIN (SELECT Col1,ROW_NUMBER() OVER (ORDER BY Col1)'RowRank'
FROM #Table1
GROUP BY COL1) b
ON a.Col1 = b.Col1)
SELECT *
FROM cte a
LEFT JOIN cte b
ON a.COL2 = b.COL2
AND a.RowRank = b.RowRank - 1
LEFT JOIN cte c
ON a.COL2 = c.COL2
AND a.RowRank = c.RowRank - 2
WHERE a.RowRank = 1
But this falls apart without an 'anchor' value, i.e. if no record had item1 for a given date, that date wouldn't be included.

How to eliminate and show only non-duplicate records

See the below table:
col1 col2
---- ----
1 | a
2 | b
3 | c
4 | a
5 | d
6 | b
7 | e
Now I want to show only the non-duplicate records, which means 3, 5 and 7.
How to write a query to get the result?
SELECT MIN(col1) AS col1, col2
FROM table
GROUP BY col2
HAVING COUNT(*) = 1;
SELECT B.*
FROM
(
SELECT col2
FROM YOURTABLE
GROUP BY col2
HAVING COUNT(*)=1
) A,
YOURTABLE B
WHERE A.col2 = B.col2
SELECT COUNT(*) AS cnt, MIN(col1) AS col1, col2
FROM table
GROUP BY col2
HAVING COUNT(*) = 1;
I believe this is clear and correct enough:
SELECT *
FROM table
WHERE
col2 IN (SELECT col2 FROM table GROUP BY col2 HAVING COUNT(*) = 1)
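The IN-subquery form can be checked against the sample data with Python's sqlite3 module. In this sketch the table is named t, an assumption, since table is a reserved word in most databases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 integer, col2 text)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "d"), (6, "b"), (7, "e")],
)

# Keep only the rows whose col2 value occurs exactly once in the table.
unique_rows = conn.execute("""
    select col1, col2
    from t
    where col2 in (select col2 from t group by col2 having count(*) = 1)
    order by col1
""").fetchall()
print(unique_rows)
```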