I am attempting to gap fill in two scenarios. I can do it with one group, but I am unsure how to do it with multiple groups.
The data:
Order ID Amount
1 NULL NULL
2 A 500
3 NULL NULL
4 A 700
1 B 1000
2 NULL NULL
3 NULL NULL
4 B 1500
Target Result
Order ID Amount
1 A 500
2 A 500
3 A 700
4 A 700
1 B 1000
2 B 1500
3 B 1500
4 B 1500
Consider below approach
select * except(amount),
first_value(amount ignore nulls) over win as amount
from (select distinct `order` from your_table where not `order` is null),
(select distinct id from your_table where not id is null)
left join your_table using(`order`, id)
window win as (partition by id order by `order` rows between current row and unbounded following)
Applied to the sample data in your question, the output matches your target result.
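As a cross-check on the logic rather than on the BigQuery syntax, the same backward fill can be sketched in plain Python. The function name `backfill_by_id` and the tuple layout are assumptions made for this illustration. Rows with a NULL id are dropped and the order/id grid is rebuilt from the distinct values, mirroring the two `select distinct` subqueries.

```python
def backfill_by_id(rows):
    """rows: (order, id, amount) tuples; returns the gap-filled grid."""
    # Distinct non-NULL orders and ids, like the two "select distinct" subqueries.
    orders = sorted({o for o, _, _ in rows if o is not None})
    ids = sorted({i for _, i, _ in rows if i is not None})
    # Known amounts keyed by (order, id), like the left join.
    known = {(o, i): a for o, i, a in rows if i is not None}

    result = []
    for i in ids:
        next_amount = None
        filled = []
        # Walk backwards so each row sees the next non-NULL amount at or after it.
        for o in reversed(orders):
            amount = known.get((o, i))
            if amount is not None:
                next_amount = amount
            filled.append((o, i, next_amount))
        result.extend(reversed(filled))
    return result


rows = [(1, None, None), (2, "A", 500), (3, None, None), (4, "A", 700),
        (1, "B", 1000), (2, None, None), (3, None, None), (4, "B", 1500)]
filled = backfill_by_id(rows)
for row in filled:
    print(row)
```

Walking each id's orders in reverse plays the role of the `rows between current row and unbounded following` frame.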
I would like to calculate the distance to maximum value for each possible distance. As an example:
Row Distance Value
1 1 2 --> 1 (Distance from Row 1)
2 2 3 --> 2 (Distance from Row 2)
3 3 3 --> 2 (Distance from Row 2)
4 4 1 --> 2 (Distance from Row 2)
5 5 5 --> 5 (Distance from Row 5)
6 6 1 --> 5 (Distance from Row 5)
Explanation: Row 6 has value of 5 because the first occurrence of maximum value between rows 1 through 6 was at distance 5.
I have tried to use some windows functions but cannot figure out how to put it together.
Sample data:
--drop table tmp_maxval;
create table tmp_maxval (dst number, val number);
insert into tmp_maxval values(1, 3);
insert into tmp_maxval values(2, 2);
insert into tmp_maxval values(3, 1);
insert into tmp_maxval values(4, 2);
insert into tmp_maxval values(5, 4);
insert into tmp_maxval values(6, 2);
insert into tmp_maxval values(7, 2);
insert into tmp_maxval values(8, 5);
insert into tmp_maxval values(9, 5);
insert into tmp_maxval values(10,1);
commit;
Functions I think can be useful in solving this:
select t.*,
max(val) over(order by dst),
case when val >= max(val) over(order by dst) then 1 else 0 end ,
case when row_number() over(partition by val order by dst) = 1 then 1 else 0 end as first_occurrence
from
ap_risk.tmp_maxval t
select dst, val,
max(case when flag is null then dst end) over (order by dst)
as first_occurrence
from (
select dst, val,
case when val <= max(val) over (order by dst
rows between unbounded preceding and 1 preceding)
then 1 end as flag
from tmp_maxval
)
order by dst
;
DST VAL FIRST_OCCURRENCE
---------- ---------- ----------------
1 3 1
2 2 1
3 1 1
4 2 1
5 4 5
6 2 5
7 2 5
8 5 8
9 5 8
10 1 8
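To see why the two window passes produce this output, here is a rough Python sketch of the same logic. The function name `first_occurrence_two_pass` is an assumption for illustration. Pass 1 corresponds to the inner `case` expression (flag rows that do not exceed the maximum of all earlier rows), and pass 2 corresponds to the outer running `max` over unflagged `dst` values.

```python
def first_occurrence_two_pass(pairs):
    """pairs: (dst, val) tuples sorted by dst."""
    # Pass 1: flag rows whose val does not exceed the max of all earlier rows.
    flags = []
    prev_max = None
    for dst, val in pairs:
        flags.append(prev_max is not None and val <= prev_max)
        prev_max = val if prev_max is None else max(prev_max, val)

    # Pass 2: running max of dst, taken over unflagged rows only.
    out = []
    last_unflagged = None
    for (dst, val), flagged in zip(pairs, flags):
        if not flagged:
            last_unflagged = dst if last_unflagged is None else max(last_unflagged, dst)
        out.append((dst, val, last_unflagged))
    return out


sample = [(1, 3), (2, 2), (3, 1), (4, 2), (5, 4),
          (6, 2), (7, 2), (8, 5), (9, 5), (10, 1)]
two_pass = first_occurrence_two_pass(sample)
for row in two_pass:
    print(row)
```

Because the comparison in pass 1 is `<=`, a later row that merely ties the current maximum (dst 9 here) stays flagged, so the first occurrence is kept.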
Or, if you are on Oracle version 12.1 or higher, MATCH_RECOGNIZE can make quick work of this assignment:
select dst, val, first_occurrence
from tmp_maxval t
match_recognize(
order by dst
measures a.dst as first_occurrence
all rows per match
pattern (a x*)
define x as val <= a.val
)
order by dst
;
You can get the maximum value using a cumulative max:
select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv;
I think this answers your question. If you want the distance itself:
select mv.*,
min(case when max_value = val then dst end) over (partition by max_value order by dst) as first_distance_at_max_value
from (select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv
) mv;
You could use either max() or min() combined with case when:
select t.*,
min(case when val = mv then dst end) over (partition by mv order by dst) v1,
max(case when val = mv then dst end) over (partition by mv order by dst) v2
from (select t.*, max(val) over (order by dst) mv from tmp_maxval t) t
order by dst
Result:
DST VAL MV V1 V2
---------- ---------- ---------- ---------- ----------
1 3 3 1 1
2 2 3 1 1
3 1 3 1 1
4 2 3 1 1
5 4 4 5 5
6 2 4 5 5
7 2 4 5 5
8 5 5 8 8
9 5 5 8 9
10 1 5 8 9
The logic you describe and the words "first occurrence" suggest that you need min(), but the third row in your example suggests max() ;-) In the data you provided, you can see the difference in rows 9-10. Choose whichever you want.
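The only difference between the two columns is how a tie with the current running maximum is handled. A small Python sketch makes that explicit. The function name `dst_of_running_max` and its `keep` parameter are assumptions introduced for this illustration, not part of any library.

```python
def dst_of_running_max(pairs, keep="first"):
    """pairs: (dst, val) sorted by dst. keep="first" mimics min(), keep="last" mimics max()."""
    out = []
    best_val = None
    best_dst = None
    for dst, val in pairs:
        is_new_max = best_val is None or val > best_val
        is_tie = best_val is not None and val == best_val
        # A strictly larger value always moves the marker; a tie only moves it for "last".
        if is_new_max or (keep == "last" and is_tie):
            best_val, best_dst = val, dst
        out.append(best_dst)
    return out


sample = [(1, 3), (2, 2), (3, 1), (4, 2), (5, 4),
          (6, 2), (7, 2), (8, 5), (9, 5), (10, 1)]
v1 = dst_of_running_max(sample, keep="first")
v2 = dst_of_running_max(sample, keep="last")
print(v1)
print(v2)
```

On the sample data the two lists differ only for dst 9 and 10, where the value 5 appears a second time.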
Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the ids of all rows with value 1, with one exception:
skip groups of value 1 that directly follow a value 7 within the same session_id.
Basically, I look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output, showing the ids:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
create table #req_data (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying on two different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermediate steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8-byte hex value for every row where [value] <> 1 and NULL where [value] = 1. The first 4 bytes are the id (used for ordering) and the last 4 bytes are [value] (used to record the "previous non-1 value", or to set the whole thing to NULL).
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.
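The three steps can be mimicked outside SQL Server to check the reasoning. The Python sketch below is an illustration only: the function name `control_values` is an assumption, and it relies on ids and values being non-negative so that big-endian byte strings compare in the same order as the integers they encode.

```python
import struct


def control_values(rows, default=999):
    """rows: (id, session_id, value) sorted by id; returns {id: control value}."""
    smeared = {}  # session_id -> running max of the packed 8-byte keys
    result = {}
    for row_id, session_id, value in rows:
        if value != 1:
            # Step 1: pack id and value as two big-endian 4-byte integers.
            key = struct.pack(">ii", row_id, value)
            prev = smeared.get(session_id)
            # Step 2: running max; value = 1 rows contribute nothing, like NULL.
            smeared[session_id] = key if prev is None else max(prev, key)
        current = smeared.get(session_id)
        # Step 3: decode the last 4 bytes, or fall back to the default.
        if current is None:
            result[row_id] = default
        else:
            result[row_id] = struct.unpack(">i", current[4:])[0]
    return result


rows = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]
cv = control_values(rows)
kept = [i for i, _, v in rows if v == 1 and cv[i] != 7]
print(kept)
```

Because the id occupies the leading bytes, the running max always picks the most recent non-1 row, whatever its value.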
SELECT CRow.id
FROM #req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM #req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to detect every island that contains a 7. If you wrap this in a second CTE, you can then get the desired result by filtering out those islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1
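For readers who want to trace the island logic by hand, here is a hedged Python sketch of the same two steps. The function name `ids_outside_7_islands` is an assumption for this example. The running counter plays the role of `grp`, and the `any(...)` check plays the role of `cnt_7`.

```python
from collections import defaultdict


def ids_outside_7_islands(rows):
    """rows: (id, session_id, value) sorted by id."""
    counter = defaultdict(int)   # running count of non-1 rows per session (grp)
    islands = defaultdict(list)  # (session_id, grp) -> [(id, value), ...]
    for row_id, session_id, value in rows:
        if value != 1:
            counter[session_id] += 1
        islands[(session_id, counter[session_id])].append((row_id, value))

    keep = []
    for members in islands.values():
        # Skip every island that contains a 7 (cnt_7 > 0 in the query).
        if any(v == 7 for _, v in members):
            continue
        keep.extend(i for i, v in members if v == 1)
    return sorted(keep)


rows = [(1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1),
        (7, 5, 1), (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1), (12, 7, 7),
        (13, 8, 1), (14, 8, 2), (15, 8, 3)]
remaining = ids_outside_7_islands(rows)
print(remaining)
```

Each island holds at most one non-1 row (the one that opened it), so "contains a 7" and "starts with a 7" mean the same thing here.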
How can I get the next non-NULL value in a column? I have MSSQL 2012 and a table with only one data column, like this:
rownum Orig
------ ----
1 NULL
2 NULL
3 9
4 NULL
5 7
6 4
7 NULL
8 9
and I need this data:
Rownum Orig New
------ ---- ----
1 NULL 9
2 NULL 9
3 9 9
4 NULL 7
5 7 7
6 4 4
7 NULL 9
8 9 9
Code to start:
create table #t (rownum int, orig int);
insert into #t values (1,NULL),(2,NULL),(3,9),(4,NULL),(5,7),(6,4),(7,NULL),(8,9);
select rownum, orig from #t;
One method is to use outer apply:
select t.*, t2.orig as newval
from #t t outer apply
(select top 1 t2.*
from #t t2
where t2.rownum >= t.rownum and t2.orig is not null
order by t2.rownum
) t2;
One way to do this with window functions (in SQL Server 2012+) is to use a cumulative min on rownum, in reverse order:
select t.*, max(orig) over (partition by nextid) as newval
from (select t.*,
min(case when orig is not null then rownum end) over (order by rownum desc) as nextid
from #t t
) t;
The subquery gets the rownum of the next non-NULL value. The outer query then spreads the orig value over all the rows with the same nextid (remember, in a group of rows with the same nextid, only one will have a non-NULL value for orig).
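The same idea can be sketched in Python as a sanity check. The function name `fill_with_next_non_null` is an assumption for illustration. The backward scan stands in for the descending window, and the dictionary lookup stands in for spreading orig across rows that share a nextid.

```python
def fill_with_next_non_null(rows):
    """rows: (rownum, orig) tuples sorted by rownum; returns (rownum, orig, newval)."""
    by_rownum = dict(rows)
    next_id = None  # rownum of the nearest non-NULL row at or after the current one
    out = []
    for rownum, orig in reversed(rows):
        if orig is not None:
            next_id = rownum
        newval = by_rownum[next_id] if next_id is not None else None
        out.append((rownum, orig, newval))
    out.reverse()
    return out


data = [(1, None), (2, None), (3, 9), (4, None),
        (5, 7), (6, 4), (7, None), (8, 9)]
filled_next = fill_with_next_non_null(data)
for row in filled_next:
    print(row)
```

Trailing NULLs with no later non-NULL value stay NULL, which is also what both queries return.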
I am new to SQL Server 2008, and I am trying to number rows in a table, resetting the sequence at certain rows. The source is something like:
Row Refrec FLAG
1 5 NULL
2 4 X
3 3 NULL
4 2 NULL
5 1 Y
6 5 A
7 4 B
8 3 NULL
9 2 NULL
10 1 NULL
The result should look like:
Row Refrec FLAG SEQUENCE
1 5 NULL NULL
2 4 X 0
3 3 NULL 1
4 2 NULL 2
5 1 Y 0
6 5 A 0
7 4 B 0
8 3 NULL 1
9 2 NULL 2
10 1 NULL 3
Thanks!
It looks like you want to number each run of rows where FLAG is NULL, setting all the other rows to 0. I'm not sure why the first value is NULL, but that is easily fixed.
The following may do what you want:
select t.*,
(case when flag is not null then 0
-- flag is part of the partition so that flagged rows cannot share a group with NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
) t;
If you really care about the first value:
select t.*,
(case when row = 1 then NULL
when flag is not null then 0
-- flag is part of the partition so that flagged rows cannot share a group with NULL rows
else row_number() over (partition by flag, seqnum - row order by row)
end) as Sequence
from (select t.*, row_number() over (partition by flag order by row) as seqnum
from table t
) t;
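To check a query's output against the expected SEQUENCE column, a plain running counter in Python is a handy reference. Note that this is a different technique from the row_number() difference used above, and the function name `sequence_after_flags` is an assumption for this sketch. It also yields NULL for any rows before the first flagged row, without special-casing row 1.

```python
def sequence_after_flags(flags):
    """flags: FLAG values in Row order; returns the SEQUENCE column as a list."""
    out = []
    counter = None  # stays None until the first flagged row has been seen
    for flag in flags:
        if flag is not None:
            counter = 0          # a flagged row resets the sequence
        elif counter is not None:
            counter += 1         # an unflagged row continues the current run
        out.append(counter)
    return out


flags = [None, "X", None, None, "Y", "A", "B", None, None, None]
sequence = sequence_after_flags(flags)
print(sequence)
```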