Explaining window function frames - sql

Imagine a table with two columns: OrderNo and Value.
;with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select
*
,first_value(Value) over (
order by
case when Value is not null then 0 else 1 end
, OrderNo
rows between current row and unbounded following
) as X
from SourceTable
order by OrderNo
The issue is that it returns exactly the same result set as SourceTable, and I don't understand why. For example, when the first row is processed (OrderNo = 1), I'd expect column X to return 5, because the frame should include all rows (current row and unbounded following) and it is ordered by Value (non-nulls first), then by OrderNo. So the first row in the frame should be OrderNo = 2. Obviously it doesn't work like that, but I don't get why.
I'd much appreciate it if someone could explain how the first frame is constructed.
Many thanks

Here is how I modified your query to investigate. I explicitly added the CASE expression as a result column, and then sorted the entire result set in the same order as your window:
;with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select
*
,case when Value is not null then 0 else 1 end AS CaseSort
,Value
,first_value(Value) over (
order by
case when Value is not null then 0 else 1 end
, OrderNo
rows between current row and unbounded following
) as X
from SourceTable
order by 3,OrderNo
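Here you can see that first_value for the window matches the Value in each of the result rows. The frame is built in the window's own sort order, not in OrderNo order, so a frame that starts at "current row" always has the current row as its first row.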
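If the goal was to show the first non-null Value (in OrderNo order) on every row, one option is to let the frame cover the whole window instead of starting at the current row. This is only a sketch on the question's sample data; the alias FirstNonNull is an assumption made for illustration and is not part of the original query.

```sql
-- Same sample data as in the question.
with SourceTable as (
    select *
    from (values
        (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
    ) as T(OrderNo, Value)
)
select
    OrderNo
    ,Value
    ,first_value(Value) over (
        order by
            case when Value is not null then 0 else 1 end
            ,OrderNo
        -- The frame spans the entire window, so every row sees the same
        -- first row: the non-null Value with the lowest OrderNo.
        rows between unbounded preceding and unbounded following
    ) as FirstNonNull   -- hypothetical alias
from SourceTable
order by OrderNo;
```

With this frame, FirstNonNull is 5 on every row of the sample data, because OrderNo = 2 sorts first in the window.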

Related

How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that have subsequent records in the partition with some non-zero value should stay.
create table #x (_id int identity, _category int, _date date, _value int)
insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)
insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)
insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)
I see a few ways to do it. The brute force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if it holds until the category changes. I'm trying to avoid T-SQL though if I can do it in a query.
To that end, I thought I could apply some gaps and islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it useful in identifying partitions that have the criteria, not so much in helping me get the ID's of the records to delete.
The desired result is the deletion of records 10, 11, 12 and 17.
Appreciate any help.
I'm not sure that your requirement requires a gaps and islands approach. Simple exists logic should work.
SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
EXISTS (
SELECT 1
FROM #x x2
WHERE x2._category = x1._category AND
x2._date > x1._date AND
x2._value <> 0
);
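The query above selects the rows to keep. Inverting its condition gives a DELETE for the trailing zero rows. The following is a sketch only: the setup is condensed from the question into a single INSERT, and it assumes `drop table if exists` is available (SQL Server 2016 or later).

```sql
-- Setup condensed from the question.
drop table if exists #x;
create table #x (_id int identity, _category int, _date date, _value int);
insert into #x (_category, _date, _value) values
    (1, '2022-10-01', 12), (1, '2022-10-03', 0), (1, '2022-10-04', 10),
    (1, '2022-10-06', 11), (1, '2022-10-07', 10),
    (2, '2022-10-01', 1),  (2, '2022-10-02', 0), (2, '2022-10-05', 19),
    (2, '2022-10-10', 18), (2, '2022-10-12', 0), (2, '2022-10-13', 0),
    (2, '2022-10-15', 0),
    (3, '2022-10-02', 10), (3, '2022-10-03', 0), (3, '2022-10-05', 0),
    (3, '2022-10-06', 12), (3, '2022-10-08', 0);

-- Delete zero-valued rows that have no later non-zero row
-- in the same category (the negation of the keep condition).
delete x1
from #x x1
where x1._value = 0
  and not exists (
      select 1
      from #x x2
      where x2._category = x1._category
        and x2._date > x1._date
        and x2._value <> 0
  );

select * from #x order by _category, _date;
```

On the sample data this removes the three trailing zero rows of category 2 and the last row of category 3, which are the rows the question asks to delete.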
Assuming that all _values are greater than or equal to 0 you can use MAX() window function in an updatable CTE:
WITH cte AS (
SELECT *,
MAX(_value) OVER (
PARTITION BY _category
ORDER BY _date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) max
FROM #x
)
DELETE FROM cte
WHERE max = 0;
If there are negative _values use MAX(ABS(_value)) instead of MAX(_value).
See the demo.
Using common table expressions, you can use:
WITH CTE_NumberedRows AS (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
FROM #x
),
CTE_Keepers AS (
SELECT _category, rnLastKeeper = MAX(rn)
FROM CTE_NumberedRows
WHERE _value <> 0
GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)
See this db<>fiddle for a working demo.
EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks. (The original can still be found here.)
Tim Biegeleisen's post may be the simpler approach.

T-SQL Calculate the max value from an Alpha Numeric key

I have a customers table which has an Alphanumeric key consisting of 5 letters and 3 numbers.
I'm trying to calculate the next 3 digit number in sequence for each 5 letters for example:
Example Key
ALPHA001
ALPHA002
NUMBE001
NUMBE002
NUMBE003
PREST001
PREST002
PREST003
PREST004
PREST005
From the list of keys above, I'd like to return the maximum for each unique 5-letter key, i.e.
Returned Values
ALPHA002
NUMBE003
PREST005
First of all: Do not store more than one value within one column. You should store the key and the running number in separate columns and combine them just for display purposes...
Try this
DECLARE @mockupTable TABLE(ID INT IDENTITY,YourKey VARCHAR(100));
INSERT INTO @mockupTable VALUES
('ALPHA001')
,('ALPHA002')
,('NUMBE001')
,('NUMBE002')
,('NUMBE003')
,('PREST001')
,('PREST002')
,('PREST003')
,('PREST004')
,('PREST005');
WITH cte AS
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY LEFT(YourKey,5) ORDER BY CAST(RIGHT(YourKey,3) AS INT) DESC) AS PartitionedRowNumber
FROM @mockupTable
)
SELECT *
FROM cte
WHERE PartitionedRowNumber =1;
The result
ID Key
2 ALPHA002
5 NUMBE003
10 PREST005
You can use row_number():
select top (1) with ties t.*
from table t
order by row_number() over (partition by left(col, patindex('%[0-9]%', col) - 1) order by col desc);
If the letters are fixed then just use left() :
order by row_number() over (partition by left(col, 5) order by col desc);
I'm trying to calculate the next 3 digit number in sequence for each 5
letters
This should do it:
SELECT CONCAT(LEFT(k, 5), FORMAT(MAX(RIGHT(k, 3)) + 1, '000'))
FROM (VALUES
('ALPHA001'),
('ALPHA002'),
('NUMBE001'),
('NUMBE002'),
('NUMBE003'),
('PREST001'),
('PREST002'),
('PREST003'),
('PREST004'),
('PREST005')
) tests(k)
GROUP BY LEFT(k, 5)
You can do this with GROUP BY and MAX:
SELECT KeyPrefix = LEFT(ExampleKey, 5),
NextKey = CONCAT(LEFT(ExampleKey, 5),
RIGHT(CONCAT('000', MAX(CONVERT(INT, RIGHT(ExampleKey, 3))) + 1), 3))
FROM (VALUES
('ALPHA001'), ('ALPHA002'), ('NUMBE001'), ('NUMBE002'), ('NUMBE003'),
('PREST001'), ('PREST002'), ('PREST003'), ('PREST004'), ('PREST005')
) t (ExampleKey)
GROUP BY LEFT(ExampleKey, 5);
The key operations are:
1. Get the number part of the key: RIGHT(ExampleKey, 3)
2. Convert this to an integer: CONVERT(INT, <output from 1>)
3. Find the max for the key type and add 1: MAX(<output from 2>) + 1
4. Pad this with zeros: RIGHT(CONCAT('000', <output from 3>), 3)
5. Concatenate with the original prefix: CONCAT(LEFT(ExampleKey, 5), <output from 4>)
I would, however, highly recommend storing this in two columns and using a computed column to combine them:
CREATE TABLE dbo.T
(
KeyPrefix CHAR(5) NOT NULL,
KeySequence INT NOT NULL,
TKey AS CONCAT(KeyPrefix, RIGHT(CONCAT('000', KeySequence), 3))
);
Then your query becomes much simpler:
SELECT KeyPrefix,
KeySequence = MAX(KeySequence) + 1,
TKey = CONCAT(KeyPrefix, RIGHT(CONCAT('000', MAX(KeySequence) + 1), 3))
FROM (VALUES
('ALPHA', 1), ('ALPHA', 2), ('NUMBE', 1), ('NUMBE', 2), ('NUMBE', 3),
('PREST', 1), ('PREST', 2), ('PREST', 3), ('PREST', 4), ('PREST', 5)
) t (KeyPrefix, KeySequence)
GROUP BY KeyPrefix;
Although worth noting that you would never actually need to reconstruct the key as I have done above in the column TKey, you just need the max keysequence.
Use this query.
GO
;WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY LEFT(YourKey,patindex('%[0-9]%', YourKey) - 1) ORDER BY CAST(SUBSTRING(YourKey,patindex('%[0-9]%', YourKey),LEN(YourKey)) AS INT) DESC) AS rr , YourKey FROM #mockupTable
)
SELECT YourKey FROM cte WHERE rr =1;
GO

How to Assign Numbers to a Set of Rows Partitioning Based on a Gap in Consecutive Numbers SQL

Hello Development Community,
Is there a way to assign a number to a group of rows partitioning based on a gap in a field of consecutive numbers in SQL? I've been searching/trying various things for an answer to this for a few days and have come up empty. Please consider the example:
CREATE TABLE #example
(ID int, Service_Date date, Item_Num int, Desired_Column int)
INSERT INTO #example VALUES
('1111', GetDate(), 4, 1),
('1111', GetDate(), 5, 1),
('1111', GetDate(), 7, 2),
('1111', GetDate(), 8, 2),
('1111', GetDate(), 9, 2),
('1111', GetDate(), 11, 3),
('1111', GetDate(), 12, 3),
('1111', GetDate(), 13, 3)
I am trying to assign the values in Desired_Column but am failing. A new number should be assigned each time there is a gap in consecutive Item_Num values. I've tried multiple approaches using DENSE_RANK(), PARTITION BY, NTILE(), and finding the difference between the first/next row item number, but I just can't get this working. Is this even possible?
Thanks for taking the time, it is appreciated.
This is a gaps & islands problem, a common solution applies nested Analytical Functions. First you calculate a flag based on a condition (here: there's a gap > 1 between the current and the previous row) and then you do a Cumulative Sum over that flag:
with cte as
(
select ...,
case when lag(Item_Num) over (partition by ID order by Item_Num) + 1
= Item_Num
then 0 -- gap = 1 -> part of the previous group
else 1 -- gap > 1 ->new group
end as flag
from #example
)
select ...,
sum(flag) over (partition by ID order by Item_Num)
from cte
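The flag-then-cumulative-sum pattern above can be sketched end to end on the question's Item_Num values. The sample rows are inlined with VALUES so the block runs on its own; the CTE names `example` and `flagged` and the alias Group_Num are assumptions made for illustration.

```sql
-- Item_Num values from the question, inlined for a self-contained run.
with example as (
    select *
    from (values
        (1111, 4), (1111, 5), (1111, 7), (1111, 8),
        (1111, 9), (1111, 11), (1111, 12), (1111, 13)
    ) as t(ID, Item_Num)
),
flagged as (
    select
        ID
        ,Item_Num
        -- 0 when the previous Item_Num is exactly one less, otherwise 1.
        -- The first row of each ID has no previous row (LAG is NULL),
        -- so it falls into the ELSE branch and starts group 1.
        ,case when lag(Item_Num) over (partition by ID order by Item_Num) + 1
                   = Item_Num
              then 0
              else 1
         end as flag
    from example
)
select
    ID
    ,Item_Num
    -- Running total of the flags numbers the islands 1, 2, 3, ...
    ,sum(flag) over (partition by ID order by Item_Num
                     rows unbounded preceding) as Group_Num   -- hypothetical alias
from flagged
order by ID, Item_Num;
```

For the sample rows this yields Group_Num 1 for Item_Num 4 and 5, 2 for 7 through 9, and 3 for 11 through 13, matching Desired_Column in the question.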

How to get AVG in CASE with condition?

I have a table with integer values.
They could be negative, 0, positive and NULL.
I need treat NULL as 0, calculate average for a given date and if average value is less than 0 then put 0 there.
My query is the following:
select
Id,
ValueDate,
case
when avg(isnull(Value, 0)) > 0 then avg(isnull(Value, 0))
else 0
end AvgValue
from SomeTable
where ValueDate = @givenDate
group by Id, ValueDate
How to avoid double aggregate function definition in case statement (aggregate statement could be much more complex)?
I think the GREATEST function could help you (it is available in SQL Server 2022 and later, and in Azure SQL Database):
select
Id,
ValueDate,
greatest(avg(isnull(Value, 0)),0) AvgValue
from SomeTable
where ValueDate = @givenDate
group by Id, ValueDate
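Another way to write the aggregate only once, which does not depend on GREATEST being available, is to compute it in a derived table and apply the CASE in the outer query. The sketch below uses a table variable and a date that are made up for illustration (`@SomeTable`, `@givenDate` and the sample rows are assumptions, not data from the question).

```sql
-- Hypothetical sample data standing in for SomeTable.
declare @givenDate date = '2024-01-01';
declare @SomeTable table (Id int, ValueDate date, Value int);
insert into @SomeTable (Id, ValueDate, Value) values
    (1, '2024-01-01', 4), (1, '2024-01-01', null),
    (2, '2024-01-01', -6), (2, '2024-01-01', 1);

select
    a.Id
    ,a.ValueDate
    -- The aggregate is referenced by name here, so it is written only once.
    ,case when a.AvgValue > 0 then a.AvgValue else 0 end as AvgValue
from (
    select
        Id
        ,ValueDate
        ,avg(isnull(Value, 0)) as AvgValue   -- NULL is treated as 0
    from @SomeTable
    where ValueDate = @givenDate
    group by Id, ValueDate
) as a;
```

With these sample rows, Id 1 averages to a positive value and keeps it, while Id 2 averages below zero and is reported as 0.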
This is a solution that does not require implementing any non-built-in functions. I know your real query will be more complex, but this is just an idea:
CREATE TABLE DataSource
(
[ID] TINYINT
,[Value] INT
)
INSERT INTO DataSource ([ID], [Value])
VALUES (1, 2)
,(1, 0)
,(1, NULL)
,(1, 98)
,(1, NULL)
,(2, -4)
,(2, 0)
,(2, 0)
,(2, NULL)
SELECT [ID]
,MAX([Value])
FROM
(
SELECT [ID]
,AVG(COALESCE([Value],0))
FROM DataSource
GROUP BY [ID]
UNION ALL
SELECT DISTINCT [ID]
,0
FROM DataSource
) Data([ID],[Value])
GROUP BY [ID]
Here is the fiddle - http://sqlfiddle.com/#!6/3d223/14

How can I get a NULL column after UNPIVOT?

I have got the following query:
WITH data AS(
SELECT * FROM partstat WHERE id=4
)
SELECT id, AVG(Value) AS Average
FROM (
SELECT id,
AVG(column_1) as column_1,
AVG(column_2) as column_2,
AVG(column_3) as column_3
FROM data
GROUP BY id
) as pvt
UNPIVOT (Value FOR V IN (column_1,column_2,column_3)) AS u
GROUP BY id
If column_1, column_2 and column_3 (or at least one of these columns) have values, then I get the following result:
id, Average
4, 5.12631578947368
If column_1, column_2 and column_3 all have NULL values, then the query does not return any rows:
id, Average
My question is: how can I get the following result if the columns contain only NULL values?
id, Average
4, NULL
Have you tried using COALESCE or ISNULL?
e.g.
ISNULL(AVG(column_1), 0) as column_1,
This does mean that you will get 0 as the result instead of 'NULL' though - do you need null when they are all NULL?
Edit:
Also, is there any need for an unpivot? Since you are specifying all 3 columns, why not just do:
SELECT BankID, (column_1 + column_2 + column_3) / 3 FROM partstat
WHERE bankid = 4
This gives you the same results but with the NULL
Of course this is assuming you have 1 row per bankid
Edit:
UNPIVOT isn't supposed to be used like this as far as I can see - I'd unpivot first then try the AVG... let me have a go...
Edit:
Ah I take that back, it is just a problem with NULLs - other posts suggest ISNULL or COALESCE to eliminate the nulls, you could use a placeholder value like -1 which could work e.g.
SELECT bankid, AVG(CASE WHEN value = -1 THEN NULL ELSE value END) AS Average
FROM (
SELECT bankid,
isnull(AVG(column_1), -1) as column_1 ,
AVG(Column_2) as column_2 ,
Avg(column_3) as column_3
FROM data
group by bankid
) as pvt
UNPIVOT (Value FOR o in (column_1, column_2, column_3)) as u
GROUP BY bankid
You need to ensure this will work though as if you have a value in column2/3 then column_1 will no longer = -1. It might be worth doing a case to see if they are all NULL in which case replacing the 1st null with -1
Here is an example without UNPIVOT:
DECLARE @partstat TABLE (id INT, column_1 DECIMAL(18, 2), column_2 DECIMAL(18, 2), column_3 DECIMAL(18, 2))
INSERT @partstat VALUES
(5, 12.3, 1, 2)
,(5, 2, 5, 5)
,(5, 2, 2, 2)
,(4, 2, 2, 2)
,(4, 4, 4, 4)
,(4, 21, NULL, NULL)
,(6, 1, NULL, NULL)
,(6, 1, NULL, NULL)
,(7, NULL, NULL, NULL)
,(7, NULL, NULL, NULL)
,(7, NULL, NULL, NULL)
,(7, NULL, NULL, NULL)
,(7, NULL, NULL, NULL)
;WITH data AS(
SELECT * FROM @partstat
)
SELECT
pvt.id,
(ISNULL(pvt.column_1, 0) + ISNULL(pvt.column_2, 0) + ISNULL(pvt.column_3, 0))/
NULLIF(
CASE WHEN pvt.column_1 IS NULL THEN 0 ELSE 1 END +
CASE WHEN pvt.column_2 IS NULL THEN 0 ELSE 1 END +
CASE WHEN pvt.column_3 IS NULL THEN 0 ELSE 1 END
, 0)
AS Average
FROM (
SELECT id,
AVG(column_1) as column_1,
AVG(column_2) as column_2,
AVG(column_3) as column_3
FROM data
GROUP BY id
) as pvt
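UNPIVOT discards rows whose value is NULL, which is why an id with only NULLs disappears from the result. One alternative that keeps those rows is to unpivot with CROSS APPLY (VALUES ...) instead. The sketch below uses a reduced version of the sample data above; the alias `v` is an assumption made for illustration.

```sql
-- Reduced sample: id 4 has values, id 7 has only NULLs.
declare @partstat table (id int, column_1 decimal(18, 2), column_2 decimal(18, 2), column_3 decimal(18, 2));
insert @partstat values
    (4, 2, 2, 2), (4, 4, 4, 4), (4, 21, null, null),
    (7, null, null, null), (7, null, null, null);

select
    pvt.id
    -- AVG skips NULLs; when every input is NULL it returns NULL.
    ,avg(v.Value) as Average
from (
    select
        id
        ,avg(column_1) as column_1
        ,avg(column_2) as column_2
        ,avg(column_3) as column_3
    from @partstat
    group by id
) as pvt
-- Unlike UNPIVOT, this produces one row per column even when it is NULL,
-- so ids with only NULLs still reach the outer GROUP BY.
cross apply (values (pvt.column_1), (pvt.column_2), (pvt.column_3)) as v(Value)
group by pvt.id;
```

Here id 7 is returned with a NULL Average instead of being dropped.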