Group by with having and merge back to original table - sql

I have a table like this
A_Count | B_Count | A    | B    | C
------- | ------- | ---- | ---- | --
1       | 0       | A    | NULL | C1
0       | 1       | NULL | B    | C1
1       | 1       | A    | B    | C2
1       | 1       | A    | B    | C2
and I want to have a result table (only columns A and B need to be shown) like:
A_Count | B_Count | A | B | C
------- | ------- | - | - | --
1       | 1       | A | B | C1
1       | 1       | A | B | C2
1       | 1       | A | B | C2
So my goal is to merge two rows when the following condition holds:
both rows belong to the same group C, and they are merged only when one row has A null and the other has B null.
So it's something like:
group by C
having sum(A_COUNT) = 1 AND sum(B_COUNT) = 1
But the problem is that I also want to keep the rows that are not merged (rows 3 & 4). Can someone tell me how to do that? Many thanks!

You can use a conditional analytic function and GROUP BY as follows:
Select max(a) as a, max(b) as b, c
  from (Select a, b, c,
               case when nulla = 1 and nullb = 1 and (a is null or b is null)
                    then 0
                    else row_number() over (partition by c order by 1)
               end as rn
          from (Select a, b, c,
                       count(case when a is null then 1 end) over (partition by c) as nulla,
                       count(case when b is null then 1 end) over (partition by c) as nullb
                  From your_table t
               )
       )
 Group by c, rn
DB<>Fiddle here - thanks to MT0; the sample data is taken from MT0's fiddle.

If you were using Oracle 12 then you could use MATCH_RECOGNIZE:
SELECT a_count, b_count, a, b, c
FROM   (
  SELECT t.*,
         NVL2(
           A,
           ROW_NUMBER() OVER ( PARTITION BY C ORDER BY NVL2( B, 1, 0 ) DESC, ROWNUM ),
           ROW_NUMBER() OVER ( PARTITION BY C ORDER BY NVL2( A, 1, 0 ) DESC, ROWNUM )
         ) AS rn
  FROM   table_name t
)
MATCH_RECOGNIZE (
  PARTITION BY C
  ORDER BY rn, A NULLS LAST
  MEASURES
    FIRST( a_count ) AS a_count,
    LAST( b_count )  AS b_count,
    FIRST( a )       AS a,
    LAST( b )        AS b
  PATTERN ( a b? )
  DEFINE
    a AS a.a IS NOT NULL,
    b AS a.b IS NULL AND b.a IS NULL AND b.b IS NOT NULL
)
Before that Oracle version, you can get a similar effect using analytic functions to determine which rows to aggregate:
SELECT SUM( a_count ) AS a_count,
       SUM( b_count ) AS b_count,
       MAX( a ) AS a,
       MAX( b ) AS b,
       c
FROM   (
  SELECT t.*,
         NVL2(
           A,
           ROW_NUMBER() OVER ( PARTITION BY C ORDER BY NVL2( B, 1, 0 ) DESC, ROWNUM ),
           ROW_NUMBER() OVER ( PARTITION BY C ORDER BY NVL2( A, 1, 0 ) DESC, ROWNUM )
         ) AS rn
  FROM   table_name t
)
GROUP BY c, rn
Which, for the sample data (in an unordered state, with additional rows to demonstrate grouping additional pairs of rows):
CREATE TABLE table_name ( A_Count, B_Count, A, B, C ) AS
SELECT 1, 0, 'A', NULL, 'C1' FROM DUAL UNION ALL
SELECT 0, 1, NULL, 'B', 'C1' FROM DUAL UNION ALL
SELECT 1, 1, 'A', 'B', 'C2' FROM DUAL UNION ALL
SELECT 0, 1, NULL, 'B', 'C2' FROM DUAL UNION ALL -- Added row
SELECT 1, 0, 'A', NULL, 'C2' FROM DUAL UNION ALL -- Added row
SELECT 1, 0, 'A', NULL, 'C2' FROM DUAL UNION ALL -- Added row
SELECT 1, 1, 'A', 'B', 'C2' FROM DUAL UNION ALL
SELECT 0, 1, NULL, 'B', 'C2' FROM DUAL -- Added row
Both output:
A_COUNT | B_COUNT | A | B | C
------: | ------: | :- | :- | :-
1 | 1 | A | B | C1
1 | 1 | A | B | C2
1 | 1 | A | B | C2
1 | 1 | A | B | C2
1 | 1 | A | B | C2
db<>fiddle here

You can do this with a join:
select (t1.a_count + coalesce(t2.a_count, 0)) as a_count,
(t1.b_count + coalesce(t2.b_count, 0)) as b_count,
coalesce(t1.a, t2.a) as a,
coalesce(t1.b, t2.b) as b,
t1.c
from t t1 left join
t t2
on t1.c = t2.c and
t1.a is not null and t2.a is null and
t1.b is null and t2.b is not null
where t1.a is not null;
As you've described the problem, aggregation doesn't seem necessary.
Here is a db<>fiddle with your original data.

Related

Count cumulative distinct

I have the below table. Is it possible to do a cumulative distinct count? For example, if A1 has 3 distinct values, then the count for it will be 3. Afterwards, check A1 and A2 together: if A1 and A2 together have 5 distinct values, the count is 5. Repeat until A1 + A2 + ... + An, counting the distinct values.
A  | V
-- | --
A1 | V1
A1 | V2
A1 | V2
A2 | V1
A2 | V2
A2 | V3
My expected output would be:
A  | C
-- | --
A1 | 2
A2 | 3
This answers the original version of the question.
You can aggregate twice: once to keep the first occurrence of v, and a second time to aggregate again:
select a, count(*) as new_cs
from (select v, min(a) as a
from t
group by v
) v
group by a;
Note: the above only shows the values of a that introduce new values. If you want all values of a, then window functions are a better approach:
select a, sum(case when seqnum = 1 then 1 else 0 end) as c
from (select t.*, row_number() over (partition by v order by a) as seqnum
from t
) t
group by a
order by a;
Here is a db<>fiddle.
You can use the ROW_NUMBER() window function to find the 1st occurrence of each V and then the COUNT() window function to count only these 1st occurrences:
SELECT DISTINCT A,
COUNT(CASE WHEN rn = 1 THEN 1 END) OVER (ORDER BY A) C
FROM (
SELECT A, ROW_NUMBER() OVER (PARTITION BY V ORDER BY A) rn
FROM tablename
) t
ORDER BY A
See the demo.
You can use a partitioned outer join to ensure that all V values are counted for all A values and then use the FIRST_VALUE analytic function to find whether a value exists in the current or preceding A values for the V:
SELECT a,
       COUNT( DISTINCT fv ) AS c
FROM   (
  SELECT t.a,
         FIRST_VALUE( t.v ) IGNORE NULLS OVER (
           PARTITION BY v.v
           ORDER BY t.a
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS fv
  FROM   ( SELECT DISTINCT v FROM table_name ) v
         LEFT OUTER JOIN table_name t
         PARTITION BY ( t.a )
         ON ( t.v = v.v )
)
GROUP BY a
ORDER BY a
Which, for the sample data:
CREATE TABLE table_name ( A, V ) AS
SELECT 'A1', 'V1' FROM DUAL UNION ALL
SELECT 'A1', 'V2' FROM DUAL UNION ALL
SELECT 'A1', 'V3' FROM DUAL UNION ALL
SELECT 'A2', 'V1' FROM DUAL UNION ALL
SELECT 'A2', 'V3' FROM DUAL UNION ALL
SELECT 'A2', 'V4' FROM DUAL UNION ALL
SELECT 'A3', 'V2' FROM DUAL UNION ALL
SELECT 'A3', 'V3' FROM DUAL UNION ALL
SELECT 'A4', 'V1' FROM DUAL UNION ALL
SELECT 'A4', 'V5' FROM DUAL;
Outputs:
A  | C
-- | --
A1 | 3
A2 | 4
A3 | 4
A4 | 5
db<>fiddle here

Get Distinct values without null

I have a table like this:
--Table_Name--
A | B | C
-----------------
A1 NULL NULL
A1 NULL NULL
A2 NULL NULL
NULL B1 NULL
NULL B2 NULL
NULL B3 NULL
NULL NULL C1
I want to get this:
--Table_Name--
A | B | C
-----------------
A1 B1 C1
A2 B2 NULL
NULL B3 NULL
How should I do that?
Here's one option:
sample data is in lines #1 - 9
the CTEs that follow (lines #11 - 13) fetch ranked distinct non-null values from each column
the final query (line #15 onwards) returns the desired result by outer joining the previous CTEs on the ranked value
SQL> with test (a, b, c) as
2 (select 'A1', null, null from dual union all
3 select 'A1', null, null from dual union all
4 select 'A2', null, null from dual union all
5 select null, 'B1', null from dual union all
6 select null, 'B2', null from dual union all
7 select null, 'B3', null from dual union all
8 select null, null, 'C1' from dual
9 ),
10 --
11 ta as (select distinct a, dense_rank() over (order by a) rn from test where a is not null),
12 tb as (select distinct b, dense_rank() over (order by b) rn from test where b is not null),
13 tc as (select distinct c, dense_rank() over (order by c) rn from test where c is not null)
14 --
15 select ta.a, tb.b, tc.c
16 from ta full outer join tb on ta.rn = tb.rn
17 full outer join tc on ta.rn = tc.rn
18 order by a, b, c
19 /
A B C
-- -- --
A1 B1 C1
A2 B2
B3
SQL>
If you have only one value per column, then I think a simpler solution is to enumerate the values and aggregate:
select max(a) as a, max(b) as b, max(c) as c
from (select t.*,
dense_rank() over (partition by (case when a is null then 1 else 2 end),
(case when b is null then 1 else 2 end),
(case when c is null then 1 else 2 end)
order by a, b, c
) as seqnum
from t
) t
group by seqnum;
This only "aggregates" once and only uses one window function, so I think it should have better performance than handling each column individually.
Another approach is to use lateral joins which are available in Oracle 12C -- but this assumes that the types are compatible:
select max(case when which = 'a' then val end) as a,
max(case when which = 'b' then val end) as b,
max(case when which = 'c' then val end) as c
from (select which, val,
dense_rank() over (partition by which order by val) as seqnum
from t cross join lateral
(select 'a' as which, a as val from dual union all
select 'b', b from dual union all
select 'c', c from dual
) x
where val is not null
) t
group by seqnum;
The performance may be comparable, because the subquery removes so many rows.

order columns by their value

I've got a table A with 3 columns that contain the same kind of data, for example:
TABLE A
KEY  COL1  COL2  COL3
1    A     B     C
2    B     C     null
3    A     null  null
4    D     E     F
5    null  C     B
6    B     C     A
7    D     E     F
As a result I expect the distinct rows of this table, where the order of the values doesn't matter. So keys 1 and 6 are the same, 2 and 5 as well, and 4 and 7. The rest are different.
Of course, I can't just use DISTINCT in my select - that would only filter out 4 and 7.
I could use a very complex case statement, or a select within a select with an order by. But this needs to be used in a conversion, so performance is an issue here.
Does anyone have a good, performant way to do this?
The result I expect:
COL1  COL2  COL3
A     B     C
B     C     null
A     null  null
D     E     F
If you can have many columns then you can UNPIVOT, order the values, then PIVOT and take the DISTINCT rows:
Oracle Setup:
CREATE TABLE table_name ( KEY, COL1, COL2, COL3 ) AS
SELECT 1, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 2, 'B', 'C', null FROM DUAL UNION ALL
SELECT 3, 'A', null, null FROM DUAL UNION ALL
SELECT 4, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 5, null, 'C', 'B' FROM DUAL UNION ALL
SELECT 6, 'B', 'C', 'A' FROM DUAL UNION ALL
SELECT 7, 'D', 'E', 'F' FROM DUAL
Query:
SELECT DISTINCT
COL1, COL2, COL3
FROM (
SELECT key,
value,
ROW_NUMBER() OVER ( PARTITION BY key ORDER BY value ) AS rn
FROM table_name
UNPIVOT ( value FOR name IN ( COL1, COL2, COL3 ) ) u
)
PIVOT ( MAX( value ) FOR rn IN (
1 AS COL1,
2 AS COL2,
3 AS COL3
) )
Output:
COL1 | COL2 | COL3
:--- | :--- | :---
A | B | C
B | C | null
D | E | F
A | null | null
db<>fiddle here
The complicated case expression is going to have the best performance. But the simplest method is going to be conditional aggregation:
select key,
max(case when seqnum = 1 then col end) as col1,
max(case when seqnum = 2 then col end) as col2,
max(case when seqnum = 3 then col end) as col3
from (select key,col,
row_number() over (partition by key order by col asc) as seqnum
from ((select key, col1 as col from t) union all
(select key, col2 as col from t) union all
(select key, col3 as col from t)
) kc
where col is not null
) kc
group by key;
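For completeness, here is a rough sketch of what that "complicated case expression" could look like when there are exactly three columns (written against the table_name sample from the Oracle setup above; treat it as an illustration rather than a drop-in solution). It puts each row's values in ascending order with NULLs last and then takes the distinct rows:
SELECT DISTINCT
       -- smallest non-null value (NULL only when all three columns are NULL)
       CASE
         WHEN col1 IS NOT NULL
              AND ( col2 IS NULL OR col1 <= col2 )
              AND ( col3 IS NULL OR col1 <= col3 ) THEN col1
         WHEN col2 IS NOT NULL
              AND ( col3 IS NULL OR col2 <= col3 ) THEN col2
         ELSE col3
       END AS col1,
       -- second smallest non-null value (NULL when fewer than two columns are non-null)
       CASE
         WHEN col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL THEN
           CASE
             WHEN ( col1 BETWEEN col2 AND col3 ) OR ( col1 BETWEEN col3 AND col2 ) THEN col1
             WHEN ( col2 BETWEEN col1 AND col3 ) OR ( col2 BETWEEN col3 AND col1 ) THEN col2
             ELSE col3
           END
         WHEN col1 IS NOT NULL AND col2 IS NOT NULL THEN GREATEST( col1, col2 )
         WHEN col1 IS NOT NULL AND col3 IS NOT NULL THEN GREATEST( col1, col3 )
         WHEN col2 IS NOT NULL AND col3 IS NOT NULL THEN GREATEST( col2, col3 )
       END AS col2,
       -- largest value (only filled when all three columns are non-null)
       CASE
         WHEN col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL
         THEN GREATEST( col1, col2, col3 )
       END AS col3
FROM   table_name
This avoids the UNPIVOT and the extra aggregation pass, but it clearly does not scale to an arbitrary number of columns, which is why the UNPIVOT/PIVOT and conditional-aggregation approaches above are usually preferred.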

Getting all the values in one query that aren't in another with a group by

Given that I am using Redshift, how would I get the counts for a query that asks:
Given table A and table B, give me, for each grouping, the count of values in table A that aren't in table B.
So if tables A and B look like:
Table A
Id | Value
==========
1 | "A"
1 | "B"
2 | "C"
And table B:
Id | Value
==========
1 | "A"
1 | "D"
2 | "C"
I would want:
Id | Count
==========
1 | 1
2 | 0
You can use left join and group by:
select a.id, sum( (b.id is null)::int )
from a left join
b
on a.id = b.id and a.value = b.value
group by a.id;
Use EXCEPT and a subquery:
with a as
(
select 1 as id, 'A' as v
union all
select 1,'B'
union all
select 2,'C'
),b as
(
select 1 as id, 'A' as v
union all
select 1,'D'
union all
select 2,'C'
), c as
(
select id,v from a except select id,v from b
)
select id,sum ( (select count(*) from c where c.id=a.id and c.v=a.v))
from a group by id
Output:
id | cnt
---------
1  | 1
2  | 0
Online demo, which will work in Redshift.

Filter unique records from a database while removing double not-null values

This is kind of hard to explain in words but here is an example of what I am trying to do in SQL. I have a query which returns the following records:
ID Z
--- ---
1 A
1 <null>
2 B
2 E
3 D
4 <null>
4 F
5 <null>
I need to filter this query so that each unique record (based on ID) appears only once in the output, and if there are multiple records for the same ID, the output should contain the record(s) with a non-null value in column Z. If there is only a single record for a given ID and it has a null value for column Z, the output should still return that record. So the output from the above query should look like this:
ID Z
--- ---
1 A
2 B
2 E
3 D
4 F
5 <null>
How would you do this in SQL?
You can use GROUP BY for that:
SELECT
ID, MAX(Z) -- Could be MIN(Z)
FROM MyTable
GROUP BY ID
Aggregate functions ignore NULLs, returning NULL only when all values in the group are NULL.
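As a quick illustration (a minimal sketch with made-up inline rows, not the table from the question; the VALUES row constructor works in SQL Server and PostgreSQL, among others), MAX() skips the NULL for ID 1 but still returns NULL for ID 5 because every Z in that group is NULL:
SELECT ID, MAX(Z) AS Z
FROM (VALUES (1, 'A'),
             (1, NULL),
             (5, NULL)) AS Demo (ID, Z)
GROUP BY ID
-- returns (1, 'A') and (5, NULL)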
If you need to return both 2-B and 2-E rows:
SELECT *
FROM YourTable t1
WHERE Z IS NOT NULL
OR NOT EXISTS
(SELECT * FROM YourTable t2
WHERE T2.ID = T1.id AND T2.z IS NOT NULL)
SELECT ID
,Z
FROM YourTable
WHERE Z IS NOT NULL
CREATE TABLE #T ( ID INT, Z CHAR(1) )

INSERT INTO #T ( ID, Z )
VALUES ( 1, 'A' ),
       ( 1, NULL ),
       ( 2, 'B' ),
       ( 2, 'E' ),
       ( 3, 'D' ),
       ( 4, NULL ),
       ( 4, 'F' ),
       ( 5, NULL )

SELECT *
FROM #T
; WITH c AS (SELECT ID, r=COUNT(*) FROM #T GROUP BY ID)
SELECT t.ID, Z
FROM #T t JOIN c ON t.ID = c.ID
WHERE c.r =1
UNION ALL
SELECT t.ID, Z
FROM #T t JOIN c ON t.ID = c.ID
WHERE c.r >=2
AND z IS NOT NULL
This example assumes you want two rows returned for ID = 2.
with tmp (id, cnt_val) as
(select id,
sum(case when z is not null then 1 else 0 end)
from t
group by id)
select t.id, t.z
from t
inner join tmp on t.id = tmp.id
where (tmp.cnt_val > 0 and t.z is not null)
   or (tmp.cnt_val = 0 and t.z is null)
WITH CTE
AS (
SELECT id
,z
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY coalesce(z, '') DESC
) rn
FROM #T
)
SELECT id
,z
FROM CTE
WHERE rn = 1