TSQL OVER clause: COUNT(*) OVER (ORDER BY a)

TSQL OVER clause: COUNT(*) OVER (ORDER BY a) - sql

This is my code:
USE [tempdb];
GO
IF OBJECT_ID(N'dbo.t') IS NOT NULL
BEGIN
DROP TABLE dbo.t
END
GO
CREATE TABLE dbo.t
(
a NVARCHAR(8),
b NVARCHAR(8)
);
GO
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('e', NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
GO
SELECT a, b,
COUNT(*) OVER (ORDER BY a)
FROM t;
On this page of BOL, Microsoft says that:
If PARTITION BY is not specified, the function treats all rows of the
query result set as a single group.
So based on my understanding, the last SELECT statement will give me the following result. Since all records are considered as in one single group, right?
a b
-------- -------- -----------
NULL NULL 12
NULL NULL 12
NULL NULL 12
NULL NULL 12
a b 12
a b 12
a b 12
c d 12
c d 12
c d 12
c d 12
e NULL 12
But the actual result is:
a b
-------- -------- -----------
NULL NULL 4
NULL NULL 4
NULL NULL 4
NULL NULL 4
a b 7
a b 7
a b 7
c d 11
c d 11
c d 11
c d 11
e NULL 12
Anyone can help to explain why? Thanks.

It gives a running total (this functionality was not implemented in SQL Server until version 2012.)
The ORDER BY defines the window to be aggregated with UNBOUNDED PRECEDING and CURRENT ROW as the default when not specified. SQL Server defaults to the less well performing RANGE option rather than ROWS.
They have different semantics in the case of ties in that the window for the RANGE version includes not just the current row (and preceding rows) but also any additional tied rows with the same value of a as the current row. This can be seen in the number of rows counted by each in the results below.
SELECT a,
b,
COUNT(*) OVER (ORDER BY a
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Rows],
COUNT(*) OVER (ORDER BY a
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Range],
COUNT(*) OVER() AS [Over()]
FROM t;
Returns
a b Rows Range Over()
-------- -------- ----------- ----------- -----------
NULL NULL 1 4 12
NULL NULL 2 4 12
NULL NULL 3 4 12
NULL NULL 4 4 12
a b 5 7 12
a b 6 7 12
a b 7 7 12
c d 8 11 12
c d 9 11 12
c d 10 11 12
c d 11 11 12
e NULL 12 12 12
To achieve the result that you were expecting to get omit both the PARTITION BY and ORDER BY and use an empty OVER() clause (also shown above).

If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as the default for window frame
So what does that mean, let's focus on "UNBOUNDED PRECEDING AND CURRENT ROW". This gives a running total from the starting row to the current row.
But in case if you want to have an overall count then you can also specify
"UNBOUNDED PRECEDING AND UNBOUNDED Following"
This considers entire data set and Over() is just a shortcut of this
select a,b,
count(*) over(order by a) as [count],
COUNT(*) OVER (ORDER BY a
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Range],
COUNT(*) OVER (ORDER BY a
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Rows],
COUNT(*) OVER (ORDER BY a
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED Following) AS [Range_Unbounded_following],
COUNT(*) OVER (ORDER BY a
ROWs BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED Following) AS [Row_Unbounded_following]
,COUNT(*) OVER () AS [Plain_over]
from t
order by [count]
Result is
a b count Range Rows Range_Unbounded_following Row_Unbounded_following Plain_over
-------- -------- ----------- ----------- ----------- ------------------------- ----------------------- -----------
NULL NULL 4 4 1 12 12 12
NULL NULL 4 4 2 12 12 12
NULL NULL 4 4 3 12 12 12
NULL NULL 4 4 4 12 12 12
a b 7 7 5 12 12 12
a b 7 7 6 12 12 12
a b 7 7 7 12 12 12
c d 11 11 8 12 12 12
c d 11 11 9 12 12 12
c d 11 11 10 12 12 12
c d 11 11 11 12 12 12
e NULL 12 12 12 12 12 12

Related

SQL Gap Fill by ID

I am attempting to gap fill in two scenarios. I can do it with one group but am uncertain with multiple
the data:
Order ID Amount
1 NULL NULL
2 A 500
3 NULL NULL
4 A 700
1 B 1000
2 NULL NULL
3 NULL NULL
4 B 1500
Target Result
Order ID Amount
1 A 500
2 A 500
3 A 700
4 A 700
1 B 1000
2 B 1500
3 B 1500
4 B 1500

Consider below approach
select * except(amount),
first_value(amount ignore nulls) over win as amount
from (select distinct `order` from your_table where not `order` is null),
(select distinct id from your_table where not id is null)
left join your_table using(`order`, id)
window win as (partition by id order by `order` rows between current row and unbounded following)
if applied to sample data in your question - output is

Distance from maximum value for each distance

I would like to calculate the distance to maximum value for each possible distance. As an example:
Row Distance Value
1 1 2 --> 1 (Distance from Row 1)
2 2 3 --> 2 (Distance from Row 2)
3 3 3 --> 2 (Distance from Row 2)
4 4 1 --> 2 (Distance from Row 2)
5 5 5 --> 5 (Distance from Row 5)
6 6 1 --> 5 (Distance from Row 5)
Explanation: Row 6 has value of 5 because the first occurrence of maximum value between rows 1 through 6 was at distance 5.
I have tried to use some windows functions but cannot figure out how to put it together.
Sample data:
--drop table tmp_maxval;
create table tmp_maxval (dst number, val number);
insert into tmp_maxval values(1, 3);
insert into tmp_maxval values(2, 2);
insert into tmp_maxval values(3, 1);
insert into tmp_maxval values(4, 2);
insert into tmp_maxval values(5, 4);
insert into tmp_maxval values(6, 2);
insert into tmp_maxval values(7, 2);
insert into tmp_maxval values(8, 5);
insert into tmp_maxval values(9, 5);
insert into tmp_maxval values(10,1);
commit;
Functions I think can be useful in solving this:
select t.*,
max(val) over(order by dst),
case when val >= max(val) over(order by dst) then 1 else 0 end ,
case when row_number() over(partition by val order by dst) = 1 then 1 else 0 end as first_occurence
from
ap_risk.tmp_maxval t

select dst, val,
max(case when flag is null then dst end) over (order by dst)
as first_occurrence
from (
select dst, val,
case when val <= max(val) over (order by dst
rows between unbounded preceding and 1 preceding)
then 1 end as flag
from tmp_maxval
)
order by dst
;
DST VAL FIRST_OCCURRENCE
---------- ---------- ----------------
1 3 1
2 2 1
3 1 1
4 2 1
5 4 5
6 2 5
7 2 5
8 5 8
9 5 8
10 1 8
Or, if you are on Oracle version 12.1 or higher, MATCH_RECOGNIZE can do quick work of this assignment:
select dst, val, first_occurrence
from tmp_maxval t
match_recognize(
order by dst
measures a.dst as first_occurrence
all rows per match
pattern (a x*)
define x as val <= a.val
)
order by dst
;

You can get the maximum value using a cumulative max:
select mv.*, max(mv.value) over (order by mv.distance) as max_value
from ap_risk.tmp_maxval mv;
I think this answers your question. If you want the distance itself:
select mv.*,
min(case when max_value = value then distance end) over (order by distance) as first_distance_at_max_value
from (select mv.*, max(mv.value) over (order by mv.distance) as max_value
from ap_risk.tmp_maxval mv
) mv;

You could use either max() or min() combined with case when:
select t.*,
min(case when val = mv then dst end) over (partition by mv order by dst) v1,
max(case when val = mv then dst end) over (partition by mv order by dst) v2
from (select t.*, max(val) over (order by dst) mv from tmp_maxval t) t
order by dst
Result:
DST VAL MV V1 V2
---------- ---------- ---------- ---------- ----------
1 3 3 1 1
2 2 3 1 1
3 1 3 1 1
4 2 3 1 1
5 4 4 5 5
6 2 4 5 5
7 2 4 5 5
8 5 5 8 8
9 5 5 8 9
10 1 5 8 9
Explained logic and words first occurence suggest that you need min(), but third row in your example suggest max() ;-) In data which you provided you can observe difference in rows 9-10. Choose what you want.

skip consecutive rows after specific value

Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception:
skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
declare #req_data table (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by session_id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?

The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying 2 different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermittent steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8 byte hex value for all non-[value] = 1 rows and NULLS where [value] = 1... The 1st 4 bytes are the Id (used for ordering) and the 2nd 4 bytes are [value] (used to set the "previous non-1 value" or set the whole thing to NULL.
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.

SELECT CRow.id
FROM #req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM #req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7

You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to detect all 7 islands. If you wrap this in a second cte, then you can finally get the desired result by filtering out all 7 islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1

SQL to get next not null value in column

How can I get next not null value in column? I have MSSQL 2012 and table with only one column. Like this:
rownum Orig
------ ----
1 NULL
2 NULL
3 9
4 NULL
5 7
6 4
7 NULL
8 9
and I need this data:
Rownum Orig New
------ ---- ----
1 NULL 9
2 NULL 9
3 9 9
4 NULL 7
5 7 7
6 4 4
7 NULL 5
8 9 5
Code to start:
declare #t table (rownum int, orig int);
insert into #t values (1,NULL),(2,NULL),(3,9),(4,NULL),(5,7),(6,4),(7,NULL),(8,9);
select rownum, orig from #t;

One method is to use outer apply:
select t.*, t2.orig as newval
from #t t outer apply
(select top 1 t2.*
from #t t2
where t2.id >= t.id and t2.orig is not null
order by t2.id
) t2;
One way you can do this with window functions (in SQL Server 2012+) is to use a cumulative max on id, in inverse order:
select t.*, max(orig) over (partition by nextid) as newval
from (select t.*,
min(case when orig is not null then id end) over (order by id desc) as nextid
from #t
) t;
The subquery gets the value of the next non-NULL id. The outer query then spreads the orig value over all the rows with the same id (remember, in a group of rows with the same nextid, only one will have a non-NULL value for orig).

Sequencing and re-setting in SQL Server 2008

I am actually new to SQL server 2008, and I am trying to sequence and re-set a number in a table. The source is something like:
Row Refrec FLAG
1 5 NULL
2 4 X
3 3 NULL
4 2 NULL
5 1 Y
6 5 A
7 4 B
8 3 NULL
9 2 NULL
10 1 NULL
The result should look like:
Row Refrec FLAG SEQUENCE
1 5 NULL NULL
2 4 X 0
3 3 NULL 1
4 2 NULL 2
5 1 Y 0
6 5 A 0
7 4 B 0
8 3 NULL 1
9 2 NULL 2
10 1 NULL 3
Thanks!

It looks like you want to enumerate the sequence values for NULL values, setting all the other values to 0. I'm not sure why the first value is NULL, but that is easily fixed.
The following may do what you want:
select t.*,
(case when flag is not null then 0
else row_number() over (partition by seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
);
If you really care about the first value:
select t.*,
(case when row = 1 then NULL
when flag is not null then 0
else row_number() over (partition by seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

TSQL OVER clause: COUNT(*) OVER (ORDER BY a) - sql

Related

SQL Gap Fill by ID

Distance from maximum value for each distance

skip consecutive rows after specific value

SQL to get next not null value in column

Sequencing and re-setting in SQL Server 2008

Categories

Resources