delete duplicate rows has null - sql

Here is the data
Id Name Value col1 col2 col3
1 test1 1 null null null
2 test1 1 x null null
3 test1 1 x y null
4 test2 2 x y z
5 test2 2 x y null
Find duplicate based on "Name" and "Value" column and delete the one which has null values in more columns.
I managed to delete duplicates by following http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm#null but dint know what should be done to achieve this in SQL
Expected result
ID Name Value col1 Col2 Col3
3 test1 1 X y null
4 test2 2 x y z

Oracle Setup:
CREATE TABLE table_name ( Id, Name, Value, col1, col2, col3 ) AS
SELECT 1, 'test1', 1, null, null, null FROM DUAL UNION ALL
SELECT 2, 'test1', 1, 'x', null, null FROM DUAL UNION ALL
SELECT 3, 'test1', 1, 'x', 'y', null FROM DUAL UNION ALL
SELECT 4, 'test2', 2, 'x', 'y', 'z' FROM DUAL UNION ALL
SELECT 5, 'test2', 2, 'x', 'y', null FROM DUAL;
Query:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (
PARTITION BY name, value
ORDER BY DECODE( col1, NULL, 0, 1 )
+ DECODE( col2, NULL, 0, 1 )
+ DECODE( col3, NULL, 0, 1 ) DESC,
col1, col2, col3
) AS rn
FROM table_name t
)
WHERE rn = 1;
Output:
ID NAME VALUE C C C RN
---------- ----- ---------- - - - ----------
3 test1 1 x y 1
4 test2 2 x y z 1

Related

Connect by lead incremental values Oracle

I have this table
COL1 COL2
---------
A 1
B 5
C 12
D 14
And I would like to obtain this other one. This is, until the next col2 for each col1 is reached, a row with the COL1 and incremental values.
COL1 COL2
---------
A 1
A 2
A 3
A 4
B 5
B 6
B 7
B 8
B 9
B 10
B 11
C 12
C 13
D 14
EDIT: this is what I've tried so far. It seems I'm not far away from the solution but struggling to progress further than this.
WITH aux (
col1,
col2
) AS (
SELECT
'A',
1
FROM
dual
UNION ALL
SELECT
'B',
5
FROM
dual
UNION ALL
SELECT
'C',
12
FROM
dual
UNION ALL
SELECT
'D',
14
FROM
dual
), aux1 AS (
SELECT
a.*,
nvl(LEAD(a.col2) OVER(
ORDER BY
a.col2
), a.col2) h
FROM
aux a
)
SELECT
*
FROM
aux1
CONNECT BY level >= col2
AND level <= h;
testseq is the table containing your initial 4 rows. Use lead to find the stop value for col2 for each col1, and recursion to iterate and create the additional rows.
WITH xrows (col1, col2, lastcol2) AS (
SELECT t.*, LEAD(col2) OVER (ORDER BY col1) - 1
FROM testseq t
UNION ALL
SELECT col1, col2+1, lastcol2
FROM xrows t
WHERE col2 < lastcol2
)
SELECT col1, col2
FROM xrows
ORDER BY col1, col2
;
First you need to find the "next" number (whatever ordering you prefer) and then generate such number of rows with recursive subquery:
with a(code, num) as(
select 'A', 1 from dual union all
select 'B', 5 from dual union all
select 'C', 12 from dual union all
select 'D', 14 from dual
)
, b as (
select
a.*
, lead(num - 1, 1, num) over(order by code asc) as next_num
from a
)
select
b.code
, gen.val
from b
cross join lateral(
select num + level - 1 as val
from dual
connect by num + level - 1 <= next_num
) gen
order by 2 asc
Or if you prefer recursive CTE:
with a(code, num) as(
select 'A', 1 from dual union all
select 'B', 5 from dual union all
select 'C', 12 from dual union all
select 'D', 14 from dual
)
, b(code, next_num, val) as (
select
a.code
, lead(num - 1, 1, num) over(order by code asc) as next_num
, num
from a
union all
select
code
, next_num
, val + 1
from b
where val < next_num
)
select
b.code
, val
from b
order by 2 asc
CODE
VAL
A
1
A
2
A
3
A
4
B
5
B
6
B
7
B
8
B
9
B
10
B
11
C
12
C
13
D
14
livesql demo

order columns by their value

I've got a table A with 3 columns that contains the same data, for exemple:
TABLE A
KEY COL1 COL2 COL3
1 A B C
2 B C null
3 A null null
4 D E F
5 null C B
6 B C A
7 D E F
As a result I expect the distinct values of this table and the order doesn't matter. So key 1 and 6 are the same and 2 and 5 also and 4 and 7. The rest is different.
Ofcourse, I can't use a distinct in my select that will only filter 4 and 7.
I could use a very complex case statement, or a select in a select with an order by. But this needs to be used in a conversion, so performance is an issue here.
Does anyone have a good performant way to do this?
The result I expect
COL1 COL2 COL3
A B C
B C null
A null null
D E F
If you can have many columns then you can UNPIVOT then order the values and then PIVOT and take the DISTINCT rows:
Oracle Setup:
CREATE TABLE table_name ( KEY, COL1, COL2, COL3 ) AS
SELECT 1, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 2, 'B', 'C', null FROM DUAL UNION ALL
SELECT 3, 'A', null, null FROM DUAL UNION ALL
SELECT 4, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 5, null, 'C', 'B' FROM DUAL UNION ALL
SELECT 6, 'B', 'C', 'A' FROM DUAL UNION ALL
SELECT 7, 'D', 'E', 'F' FROM DUAL
Query:
SELECT DISTINCT
COL1, COL2, COL3
FROM (
SELECT key,
value,
ROW_NUMBER() OVER ( PARTITION BY key ORDER BY value ) AS rn
FROM table_name
UNPIVOT ( value FOR name IN ( COL1, COL2, COL3 ) ) u
)
PIVOT ( MAX( value ) FOR rn IN (
1 AS COL1,
2 AS COL2,
3 AS COL3
) )
Output:
COL1 | COL2 | COL3
:--- | :--- | :---
A | B | C
B | C | null
D | E | F
A | null | null
db<>fiddle here
The complicated case expression is going to have the best performance. But the simplest method is going to be conditional aggregation:
select key,
max(case when seqnum = 1 then col end) as col1,
max(case when seqnum = 2 then col end) as col2,
max(case when seqnum = 3 then col end) as col3
from (select key,col,
row_number() over (partition by key order by col asc) as seqnum
from ((select key, col1 as col from t) union all
(select key, col2 as col from t) union all
(select key, col3 as col from t)
) kc
where col is not null
) kc
group by key;

Aggregate multiple columns into an array only when the columns have non null value in Bigquery

I have a table that looks like this:
+----+------+------+------+------+------+
| id | col1 | col2 | col3 | col4 | col5 |
+----+------+------+------+------+------+
| a | 1 | null | null | null | null |
| b | 1 | 2 | 3 | 4 | null |
| c | 1 | 2 | 3 | 4 | 5 |
| d | 2 | 1 | 7 | null | 4 |
+----+------+------+------+------+------+
I want to create an aggregated table where for each id I want an array that contains non null value from all the other columns. The output should look like this:
+-----+-------------+
| id | agg_col |
+-----+-------------+
| a | [1] |
| b | [1,2,3,4] |
| c | [1,2,3,4,5] |
| d | [2,1,7,4] |
+-----+-------------+
Is it possible to produce the output using bigquery standard sql?
Below is not super generic solution, but works for your specific example that you provided - id is presented with alphanumeric (not starting with digit) and rest of columns are numbers - integers
#standardSQL
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
You can test, play with above using sample data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
with result as
Row id agg_col_as_array agg_col_as_string
1 a 1 [1]
2 b 1 [1,2,3,4]
2
3
4
3 c 1 [1,2,3,4,5]
2
3
4
5
4 d 2 [2,1,7,4]
1
7
4
Do you think it is possible to do this by mentioning specific columns and then binding them into an array?
Sure, it is doable - see below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT CAST(col AS STRING)
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
BUT ... this is not the best option you have as you need to manage and adjust number and names of columns in each case for different uses
Below solution is adjusted version of my original answer to address your latest comment - Actually the sample was too simple. Both of my id and other columns have alphanumeric and special characters.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
both with same result as before
Row id agg_col_as_array agg_col_as_string
1 a 1 [1]
2 b 1 [1,2,3,4]
2
3
4
3 c 1 [1,2,3,4,5]
2
3
4
5
4 d 2 [2,1,7,4]
1
7
4

Get Top N row from each set from table with 4 column in SQL Server

Assume I have a table with 4 columns:
Col1 Col2 Col3 Col4
My initial query is :
SELECT Col1, Col2, Col3, Col4
FROM myTable
ORDER BY Col1, Col2, Col3 DESC, Col4
My desired result is all 4 columns, but with this condition that Top N Col3 different row when Col1, Col2 is equal.
Example with N=2 :
Table sample data:
Col1 Col2 Col3 Col4
---------------------
1 a 2000 s
1 a 2002 c
1 a 2001 b
2 b 1998 s
2 b 2002 c
2 b 2000 b
3 c 2000 b
1 f 1998 n
1 g 1999 e
Desired result:
1 a 2002 c
1 a 2001 b
1 f 1998 n
1 g 1999 e
2 b 2002 c
2 b 2000 b
3 c 2000 b
In another description, when (col1, col2) is repeated in multiple records, just export top N rows of those records when order by Col3 descending.
Can I do this with SQL script, without hard coding?
declare #t table (Col1 int, Col2 char, Col3 int, Col4 char);
insert into #t values
(1, 'a', 2000, 's'),
(1, 'a', 2002, 'c'),
(1, 'a', 2001, 'b'),
(2, 'b', 1998, 's'),
(2, 'b', 2002, 'c'),
(2, 'b', 2000, 'b'),
(3, 'c', 2000, 'b'),
(1, 'f', 1998, 'n'),
(1, 'g', 1999, 'e');
declare #N int = 2; -- number per "top"
with cte as
(
select *,
row_number() over(partition by col1, col2 order by col3 desc) as rn
from #t
)
select *
from cte c
where rn <= #N;
I think below code was as expected
declare #tab table (Col1 int, Col2 char(1), Col3 int, Col4 char(1))
declare #N int
insert into #tab
select 1, 'a' , 2000, 's'
union all
select 1 , 'a' , 2002 , 'c'
union all
select 1 , 'a' , 2001 , 'b'
union all
select 2 , 'b' , 1998 , 's'
union all
select 2 , 'b' , 2002 ,'c'
union all
select 2 , 'b' , 2000 ,'b'
union all
select 3 , 'c' , 2000 ,'b'
union all
select 1 , 'f' , 1998 ,'n'
union all
select 1 , 'g' , 1999 ,'e'
;with tab as
(
select ROW_NUMBER() over(partition by t.col1,t.col2 order by t.col3 desc) as row,t.*
from #tab t
)
select Col1,Col2,Col3,Col4
from tab
where row < 3
output
Col1 Col2 Col3 Col4
1 a 2002 c
1 a 2001 b
1 f 1998 n
1 g 1999 e
2 b 2002 c
2 b 2000 b
3 c 2000 b
METHOD 1- FOR MSSQL
http://sqlfiddle.com/#!6/4bda39/6
with a as (
select ROW_NUMBER() over(partition by t.col1,t.col2 order by t.col3 desc) as row,t.*
from myTable as t)
select * from a where a.row <= 2
Replace a.row <= 2 (2 with your N)
METHOD 2- FOR MYSQL
http://sqlfiddle.com/#!9/79e81a/63
SELECT myTable.Col1, myTable.Col2, myTable.Col3, myTable.Col4
FROM (
Select Col1 as Col1, Col2 as Col2, count(Col1) as cc, AVG(Col3) as aa
From myTable
group by Col1, Col2) as tt
join myTable on myTable.Col1 = tt.Col1 and myTable.Col2 = tt.Col2
where myTable.Col3 >= tt.aa
Order by Col1 ,Col2 ,Col3 Desc,Col4
METHOD 3- FOR MYSQL
http://sqlfiddle.com/#!9/79e81a/79
SELECT * FROM (
SELECT CASE Col1
WHEN #Col1 THEN
CASE Col2
WHEN #Col2 THEN #curRow := #curRow + 1
ELSE #curRow := 1
END
ELSE #curRow :=1
END AS rank,
#Col1 := Col1 AS Col1,
#Col2 := Col2 AS Col2,
Col3, Col4
FROM myTable p
JOIN (SELECT #curRow := 0, #Col1 := 0, #Col2 := '') r
ORDER BY Col1, Col2, Col3 DESC) as tt
WHERE tt.rank <= 2
Replace tt.rank <= 2 replace 2 by your desired index

Sorting variables in a record

I have an Oracle table with multiple columns some populated with a variable, there a large number of possible variables, the example below is not exhaustive.
ID Col1 Col2 Col3
--------------------
1 A B
2 B A D
3 B C
4 C B
5 B B
6 E D
7 B A C
I need to create a query that resorts the variables in each row:
ID Col1 Col2 Col3
--------------------
1 A B
2 A B D
3 B C
4 B C
5 B B
6 D E
7 A B C
I am looking for an elegant solution as the real world problem has 20 columns with up to 40 different variables (up to four characters in length each) and several million records.
Below is a variant for Oracle 10g and later. Because there are a big number of rows in original dataset I'll tried to avoid solutions which involves grouping and analytic functions atop of a full result set.
Base table for this example:
create table tab1 (
ID number,
col1 varchar2(4),
col2 varchar2(4),
col3 varchar2(4),
col4 varchar2(4)
)
First, collect all columns for each ID to sorted collection:
select
tab1.ID,
cast( multiset(
select
decode(level,
1, tab1.col1,
2, tab1.col2,
3, tab1.col3,
4, tab1.col4,
null
)
from dual
connect by level <= 4
order by
decode(level,
1, tab1.col1,
2, tab1.col2,
3, tab1.col3,
4, tab1.col4,
null
)
nulls last
) as sys.ODCIVarchar2List) sorted_values
from tab1;
Having such a dataset it's possible to decode values back to columns while maintaining specified order:
select
ID,
(
select column_value
from table(data_list.sorted_values)
where rownum = 1
) as col1,
(
select max(decode(rownum, 2, column_value, null))
from table(data_list.sorted_values)
) as col2,
(
select max(decode(rownum, 3, column_value, null))
from table(data_list.sorted_values)
) as col3,
(
select max(decode(rownum, 4, column_value, null))
from table(data_list.sorted_values)
) as col4
from (
select
rownum, -- this needed as workaround for Oracle bug
tab1.ID,
cast( multiset(
select
decode(level,
1, tab1.col1,
2, tab1.col2,
3, tab1.col3,
4, tab1.col4,
null
)
from dual
connect by level <= 4
order by
decode(level,
1, tab1.col1,
2, tab1.col2,
3, tab1.col3,
4, tab1.col4,
null
)
nulls last
) as sys.ODCIVarchar2List) sorted_values
from tab1
)
data_list
SQLFiddle test
Note that rownum must be present in inner select clause as a workaround for this Oracle error.