Count groups of NULL values - partition or window? - sql

n | g
---------
1 | 1
2 | NULL
3 | 1
4 | 1
5 | 1
6 | 1
7 | NULL
8 | NULL
9 | NULL
10 | 1
11 | 1
12 | 1
13 | 1
14 | 1
15 | 1
16 | 1
17 | NULL
18 | 1
19 | 1
20 | 1
21 | NULL
22 | 1
23 | 1
24 | 1
25 | 1
26 | NULL
27 | NULL
28 | 1
29 | 1
30 | NULL
31 | 1
From the above column g I should get this result:
x|y
---
1|4
2|1
3|1
where
x stands for the count of contiguous NULLs and
y stands for the times a single group of NULLs occurs.
I.e., there is ...
4 groups of only 1 NULL,
1 group of 2 NULLs and
1 group of 3 NULLs

Compute a running count of not-null values with a window function to form groups, then 2 two nested counts ...
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) FILTER (WHERE g IS NULL) AS x
FROM (
SELECT g, count(g) OVER (ORDER BY n) AS grp
FROM tbl
) sub1
WHERE g IS NULL
GROUP BY grp
) sub2
GROUP BY 1
ORDER BY 1;
count() only counts not null values.
This includes the preceding row with a not null g in the following group (grp) of NULL values - which has to be removed from the count.
I replaced the HAVING clause I had for that in my initial query with WHERE g IS NULL, like #klin uses in his answer), that's simpler.
Related:
Find ā€œnā€ consecutive free numbers from table
Select longest continuous sequence
If n is a gapless sequence of integer numbers, you can simplify further:
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) AS x
FROM (
SELECT n - row_number() OVER (ORDER BY n) AS grp
FROM tbl
WHERE g IS NULL
) sub1
GROUP BY 1
) sub2
GROUP BY 1
ORDER BY 1;
Eliminate not null values immediately and deduct the row number from n, thereby arriving at (meaningless) group numbers directly ...
While the only possible value in g is 1, sum() is a smart trick (like #klin provided). But that should be a boolean column then, wouldn't make sense as numeric type. So I assume that's just a simplification of the actual problem in the question.

select x, count(x) y
from (
select s, count(s) x
from (
select *, sum(g) over (order by i) as s
from example
) s
where g isnull
group by 1
) s
group by 1
order by 1;
Test it here.

Related

Sql assign unique key to groups having particular pattern

Hi I was trying to group data based on a particular pattern.
I have a table with two column as below,
Name rollingsum
A 5
A 10
A 0
A 5
A 0
B 6
B 0
I need to generate a key column that increment only after rollingsum equals 0 is encountered.As given below
Name rollingsum key
A 5 1
A 10 1
A 0 1
A 5 2
A 0 2
B 6 3
B 0 3
I am using postgres, I tried to increment variable in case statement as below
Declare a int;
a:=1;
........etc
Case when rolling sum =0 then a:=a+1 else a end as key
But I am getting an error near :
Thanks in advance for all help
You need an ordering columns because the results depend on the ordering of the rows -- and SQL tables represent unordered sets.
Then do a cumulative sum of the 0 counts from the end of the data. That is in reverse order, so subtract that from the total:
select t.*,
(1 + sum( (rolling_sum = 0)::int ) over () -
sum( (rolling_sum = 0)::int ) over (order by ordercol desc)
) as key
from t;
Assuming that you have a column called id to order the rows, here is one option using a cumulative count and a window frame:
select name, rollingsum,
1 + count(*) filter(where rollingsum = 0) over(
order by id
rows between unbounded preceding and 1 preceding
) as key
from mytable
Demo on DB Fiddle:
name | rollingsum | key
:--- | ---------: | --:
A | 5 | 1
A | 10 | 1
A | 0 | 1
A | 5 | 2
A | 0 | 2
B | 6 | 3
B | 0 | 3

get the nth-lowest value in a `group by` clause

Here's a tough one: I have data coming back in a temporary table foo in this form:
id n v
-- - -
1 3 1
1 3 10
1 3 100
1 3 201
1 3 300
2 1 13
2 1 21
2 1 300
4 2 1
4 2 7
4 2 19
4 2 21
4 2 300
8 1 11
Grouping by id, I need to get the row with the nth-lowest value for v based on the value in n. For example, for the group with an ID of 1, I need to get the row which has v equal to 100, since 100 is the third-lowest value for v.
Here's what the final results need to look like:
id n v
-- - -
1 3 100
2 1 13
4 2 7
8 1 11
Some notes about the data:
the number of rows for each ID may vary
n will always be the same for every row with a given ID
n for a given ID will never be greater than the number of rows with that ID
the data will already be sorted by id, then v
Bonus points if you can do it in generic SQL instead of oracle-specific stuff, but that's not a requirement (I suspect that rownum may factor prominently in any solutions). It has in my attempts, but I wind up confusing myself before I get a working solution.
I would use row_number function make row number the compare with n column value in CTE, do another CTE to make row number order by v desc.
get rn = 1 which is mean max value in the n number group.
CREATE TABLE foo(
id int,
n int,
v int
);
insert into foo values (1,3,1);
insert into foo values (1,3,10);
insert into foo values (1,3,100);
insert into foo values (1,3,201);
insert into foo values (1,3,300);
insert into foo values (2,1,13);
insert into foo values (2,1,21);
insert into foo values (2,1,300);
insert into foo values (4,2,1);
insert into foo values (4,2,7);
insert into foo values (4,2,19);
insert into foo values (4,2,21);
insert into foo values (4,2,300);
insert into foo values (8,1,11);
Query 1:
with cte as(
select id,n,v
from
(
select t.*, row_number() over(partition by id ,n order by n) as rn
from foo t
) t1
where rn <= n
), maxcte as (
select id,n,v, row_number() over(partition by id ,n order by v desc) rn
from cte
)
select id,n,v
from maxcte
where rn = 1
Results:
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
use window function
select * from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where t1.rn=t1.n
as ops sample output just need 3rd highest value so i put where condition t1.rn=3 though accodring to description it would be t1.rn=t1.n
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=65abf8d4101d2d1802c1a05ed82c9064
If your database is version 12.1 or higher then there is a much simpler solution:
SELECT DISTINCT ID, n, NTH_VALUE(v,n) OVER (PARTITION BY ID) AS v
FROM foo
ORDER BY ID;
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Depending on your real data you may have to add an ORDER BY n clause and/or windowing_clause as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, see NTH_VALUE

Unable to use lag function correctly in sql

I have created a table from multiple tables like this:
Week | Cid | CustId | L1
10 | 1 | 1 | 2
10 | 2 | 1 | 2
10 | 5 | 1 | 2
10 | 4 | 1 | 1
10 | 3 | 2 | 1
4 | 6 | 1 | 2
4 | 7 | 1 | 2
I want the output as:
Repeat
0
1
1
0
0
0
1
So, basically what I want is for each week, if a person (custid) comes in again with the same L1, then the value in the column Repeat should become 1, otherwise 0 ( so like, here, in row 2 & 3, custid 1, came with L1=2 again, so it will get 1 in column "Repeat", however in row 4, custid 1 came with L1=1, so it will get value as ).
By the way, the table isn't ordered (as I've shown).
I'm trying to do it as follows:
select t.*,
lag(0, 1, 0) over (partition by week, custid, L1 order by cid) as repeat
from
table;
But this is not giving the output and is giving empty result.
I think you need a case, but I would use row_number() for this:
select t.*,
(case when row_number() over (partition by week, custid, l1 order by cid) = 1
then 0 else 1
end) as repeat
from table;
This can also be computed without Window functions but by a self-join in the following way:
SELECT a.week, a.cid, a.custid, a.l1,
CASE WHEN b IS NULL THEN 1 ELSE 0 END AS repeat
FROM mytable a NATURAL LEFT JOIN
(SELECT week, min(cid) AS cid, custid, l1 FROM mytable
GROUP BY week,custid,l1) b
ORDER BY week DESC, custid, l1 DESC, cid;
It can be done simply by using an count(*) as analytic function. No case expression or self join needed. The query is even portable across databases that support analytic functions:
SELECT cust.*, least(count(*)
OVER (PARTITION BY Week, CustId, L1 ORDER BY Cid
ROWS UNBOUNDED PRECEDING) - 1, 1) repeat
FROM cust ORDER BY Week DESC, custId, L1 DESC;
Executing the query on your data results in the following output (last row is the repeat row):
Week | Cid | CustId | L1 | repeat
10 1 1 2 0
10 2 1 2 1
10 5 1 2 1
10 4 1 1 0
10 3 2 1 0
4 6 1 2 0
4 7 1 2 1
Tested on Oracle 11g and PostgreSQL 9.4. Note that the second ORDER BY is optional. See Oracle Language Reference, Analytic Functions for more details.

Find out the sequence number of a pattern

Consider below table
Table
ActivityId Flag Type
---------- ----- -----
1 N
2 N
3 Y EXT
4 Y
5 Y
6 N
7 Y INT
8 Y
9 N
10 N
11 N
12 Y EXT
13 N
14 N
15 N
16 Y EXT
17 Y
18 Y INT
19 Y
20 Y EXT
21 Y
22 N
23 N
First record has always Flag = N and then any sequence of Flag = Y or Flag = N can exist for next records. Every time flag changes from N to Y, the Type field is either EXT or INT. Next Y records (before next N) might have Type = EXT or INT or NULL and this is not important.
I want to calculate Cycle No for this sequence of N/Y. First cycle starts when Flag = N (always first record has flag = N) and cycle ends when flag changes to Y and Type = EXT. Then, next cycle starts when flag changes to N and ends when flag becomes Y and type = EXT. this is repeated until all records are processed. The result for above table is:
Result
ActivityId Flag Type Cycle No
---------- ----- ----- --------
1 N 1
2 N 1
3 Y EXT 1
4 Y
5 Y
6 N 2
7 Y INT 2
8 Y 2
9 N 2
10 N 2
11 N 2
12 Y EXT 2
13 N 3
14 N 3
15 N 3
16 Y EXT 3
17 Y
18 Y INT
19 Y
20 Y EXT
21 Y
22 N 4
23 N 4
I am using SQL Server 2008 R2 (no LAG/LEAD).
Can you please help me find the SQL query to calculate Cycle No?
I've got a solution, its not pretty, but through stepwise refinement I get to your result:
The solution is in three steps
Isolate the activityid for all the possible start cycle and end
cycle rows.
Filter out all the dud start events
Number the cycles, and find the interval of activityid for each
cycle
First i select out all start and end cycle events:
with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))
,CTE1 as
( select
ROW_NUMBER() over (order by t1.ActivityId) rn
,t1.ActivityId
,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
select * from cte1
This returns
rn ActivityId Cycle
1 1 Start
2 2 Start
3 3 End
4 6 Start
5 9 Start
6 10 Start
7 11 Start
8 12 End
9 13 Start
10 14 Start
11 15 Start
12 16 End
13 20 End
14 22 Start
15 23 Start
The problem is now that while we are sure of when the cycle ends, that is when Flag is N and Type is Ext, we are not sure when the cycle starts. Row 1 and 2 both denote a possible start event. But luckily we can see that only a start event following an End event is to be counted. Since we do not have lag or lead we must join the CTE with itself:
,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
,a1.ActivityId
,a1.Cycle
,a2.Cycle PrevCycle
from CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1
where
a2.Cycle is null -- First Cycle
or
( a2.Cycle is not null
and
(
(a1.Cycle='End' and a2.Cycle='Start') -- End of cycle
or
(a1.Cycle='Start'
and a2.Cycle='End') -- next cycles
)
)
)
select * from cte2
This returns
rn ActivityId Cycle PrevCycle
1 1 Start NULL
2 3 End Start
3 6 Start End
4 12 End Start
5 13 Start End
6 16 End Start
7 22 Start End
I Select the first start event - since we always start with one, and then keep the END events that follow a start event.
Finally we only keep the rest of the start events if the previous event was an End event.
Now we can find the start and end of each cycle, and number them:
,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)
select * from cte3
Which gives us what we need:
CycleNumber StartId EndId
1 1 3
2 6 12
3 13 16
4 22 NULL
Now we just need to join this back on our table:
select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)
This gives the result you were looking for.
This is just a quick and dirty solution, perhaps with a little TLC you can pretty it up, and reduce the number of steps.
The full solution is here:
with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))
,CTE1 as
( select
ROW_NUMBER() over (order by t1.ActivityId) rn
,t1.ActivityId
,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
,a1.ActivityId
,a1.Cycle
,a2.Cycle PrevCycle
from CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1
where
a2.Cycle is null -- First Cycle
or
( a2.Cycle is not null
and
(
a1.Cycle='End' -- End of cycle
or
(a1.Cycle='Start'
and a2.Cycle='End') -- next cycles
)
)
)
,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)
select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)
If you are happy with recursion, this can be achieved rather simply with a bit of comparison logic to the preceding row when ordered by your ActivityId:
declare #t table(ActivityId int,Flag nvarchar(1),TypeValue nvarchar(3));
insert into #t values(1 ,'N',null),(2 ,'N',null),(3 ,'Y','EXT'),(4 ,'Y',null),(5 ,'Y',null),(6 ,'N',null),(7 ,'Y','INT'),(8 ,'Y',null),(9 ,'N',null),(10,'N',null),(11,'N',null),(12,'Y','EXT'),(13,'N',null),(14,'N',null),(15,'N',null),(16,'Y','EXT'),(17,'Y',null),(18,'Y','INT'),(19,'Y',null),(20,'Y','EXT'),(21,'Y',null),(22,'N',null),(23,'N',null);
with rn as -- Derived table purely to guarantee incremental row number. If you can guarantee your ActivityId values are incremental start to finish, this isn't required.
( select row_number() over (order by ActivityId) as rn
,ActivityId
,Flag
,TypeValue
from #t
),d as
( select rn -- Recursive CTE that compares the current row to the one previous.
,ActivityId
,Flag
,TypeValue
,cast(1 as decimal(10,5)) as CycleNo
from rn
where rn = 1
union all
select rn.rn
,rn.ActivityId
,rn.Flag
,rn.TypeValue
,cast(
case when d.Flag = 'Y' and d.TypeValue = 'EXT' and d.CycleNo >= 1
then case when rn.Flag = 'N'
then d.CycleNo + 1
else (d.CycleNo + 1) * 0.0001 -- This part keeps track of the cycle number in fractional values, which can be removed by converting the final result to INT.
end
else case when rn.Flag = 'N' and d.CycleNo < 1
then d.CycleNo * 10000
else d.CycleNo
end
end
as decimal(10,5)) as CycleNo
from rn
inner join d
on d.rn = rn.rn - 1
)
select ActivityId
,Flag
,TypeValue
,cast(CycleNo as int) as CycleNo
from d
order by ActivityId;
Output:
+------------+------+-----------+---------+
| ActivityId | Flag | TypeValue | CycleNo |
+------------+------+-----------+---------+
| 1 | N | NULL | 1 |
| 2 | N | NULL | 1 |
| 3 | Y | EXT | 1 |
| 4 | Y | NULL | 0 |
| 5 | Y | NULL | 0 |
| 6 | N | NULL | 2 |
| 7 | Y | INT | 2 |
| 8 | Y | NULL | 2 |
| 9 | N | NULL | 2 |
| 10 | N | NULL | 2 |
| 11 | N | NULL | 2 |
| 12 | Y | EXT | 2 |
| 13 | N | NULL | 3 |
| 14 | N | NULL | 3 |
| 15 | N | NULL | 3 |
| 16 | Y | EXT | 3 |
| 17 | Y | NULL | 0 |
| 18 | Y | INT | 0 |
| 19 | Y | NULL | 0 |
| 20 | Y | EXT | 0 |
| 21 | Y | NULL | 0 |
| 22 | N | NULL | 4 |
| 23 | N | NULL | 4 |
+------------+------+-----------+---------+

Count rows in each 'partition' of a table

Disclaimer: I don't mean partition in the window function sense, nor table partitioning; I mean it in the more general sense, i.e. to divide up.
Here's a table:
id | y
----+------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | null
7 | 2
8 | 2
9 | null
10 | null
I'd like to partition by checking equality on y, such that I end up with counts of the number of times each value of y appears contiguously, when sorted on id (i.e. in the order shown).
Here's the output I'm looking for:
y | count
-----+----------
1 | 3
2 | 2
null | 1
2 | 2
null | 2
So reading down the rows in that output we have:
The first partition of three 1's
The first partition of two 2's
The first partition of a null
The second partition of two 2's
The second partition of two nulls
Try:
SELECT y, count(*)
FROM (
SELECT y,
sum( xyz ) OVER (
ORDER BY id
rows between unbounded preceding
and current row
) qwe
FROM (
SELECT *,
case
when y is null and
lag(y) OVER ( ORDER BY id ) is null
then 0
when y = lag(y) OVER ( ORDER BY id )
then 0
else 1 end xyz
FROM table1
) alias
) alias
GROUP BY qwe, y
ORDER BY qwe;
demo: http://sqlfiddle.com/#!15/b1794/12