For each distance, I would like to find the distance at which the maximum value so far first occurred. As an example:
Row Distance Value
1 1 2 --> 1 (Distance from Row 1)
2 2 3 --> 2 (Distance from Row 2)
3 3 3 --> 2 (Distance from Row 2)
4 4 1 --> 2 (Distance from Row 2)
5 5 5 --> 5 (Distance from Row 5)
6 6 1 --> 5 (Distance from Row 5)
Explanation: Row 6 has a value of 5 because the first occurrence of the maximum value between rows 1 through 6 was at distance 5.
I have tried some window functions but cannot figure out how to put them together.
Sample data:
--drop table tmp_maxval;
create table tmp_maxval (dst number, val number);
insert into tmp_maxval values(1, 3);
insert into tmp_maxval values(2, 2);
insert into tmp_maxval values(3, 1);
insert into tmp_maxval values(4, 2);
insert into tmp_maxval values(5, 4);
insert into tmp_maxval values(6, 2);
insert into tmp_maxval values(7, 2);
insert into tmp_maxval values(8, 5);
insert into tmp_maxval values(9, 5);
insert into tmp_maxval values(10,1);
commit;
Functions I think can be useful in solving this:
select t.*,
max(val) over(order by dst),
case when val >= max(val) over(order by dst) then 1 else 0 end ,
case when row_number() over(partition by val order by dst) = 1 then 1 else 0 end as first_occurrence
from
ap_risk.tmp_maxval t
select dst, val,
max(case when flag is null then dst end) over (order by dst)
as first_occurrence
from (
select dst, val,
case when val <= max(val) over (order by dst
rows between unbounded preceding and 1 preceding)
then 1 end as flag
from tmp_maxval
)
order by dst
;
DST VAL FIRST_OCCURRENCE
---------- ---------- ----------------
1 3 1
2 2 1
3 1 1
4 2 1
5 4 5
6 2 5
7 2 5
8 5 8
9 5 8
10 1 8
Or, if you are on Oracle version 12.1 or higher, MATCH_RECOGNIZE can make quick work of this assignment:
select dst, val, first_occurrence
from tmp_maxval t
match_recognize(
order by dst
measures a.dst as first_occurrence
all rows per match
pattern (a x*)
define x as val <= a.val
)
order by dst
;
You can get the maximum value using a cumulative max:
select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv;
I think this answers your question. If you want the distance itself, take the first distance within each run of the same cumulative max:
select mv.*,
min(case when max_value = val then dst end) over (partition by max_value order by dst) as first_distance_at_max_value
from (select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv
) mv;
You could use either max() or min() combined with case when:
select t.*,
min(case when val = mv then dst end) over (partition by mv order by dst) v1,
max(case when val = mv then dst end) over (partition by mv order by dst) v2
from (select t.*, max(val) over (order by dst) mv from tmp_maxval t) t
order by dst
Result:
DST VAL MV V1 V2
---------- ---------- ---------- ---------- ----------
1 3 3 1 1
2 2 3 1 1
3 1 3 1 1
4 2 3 1 1
5 4 4 5 5
6 2 4 5 5
7 2 4 5 5
8 5 5 8 8
9 5 5 8 9
10 1 5 8 9
The logic you describe and the words "first occurrence" suggest that you need min(), but the third row in your example suggests max() ;-) In the data you provided, the difference shows up in rows 9-10. Choose whichever you want.
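If you want to check the min()/max() difference outside the database, here is a minimal Python sketch of the same logic on the sample data. The function name first_and_last_at_running_max is made up for illustration and is not part of any answer above; it assumes the rows are already sorted by dst.

```python
def first_and_last_at_running_max(rows):
    """rows: list of (dst, val) sorted by dst.
    Returns a list of (dst, val, running_max, first_dst, last_dst)."""
    out = []
    running_max = None
    first_dst = last_dst = None
    for dst, val in rows:
        if running_max is None or val > running_max:
            # A new maximum starts a new "partition by mv".
            running_max = val
            first_dst = last_dst = dst
        elif val == running_max:
            # Same maximum seen again: only the max() variant moves.
            last_dst = dst
        out.append((dst, val, running_max, first_dst, last_dst))
    return out

sample = [(1, 3), (2, 2), (3, 1), (4, 2), (5, 4),
          (6, 2), (7, 2), (8, 5), (9, 5), (10, 1)]
result = first_and_last_at_running_max(sample)
for row in result:
    print(row)
```

The fourth and fifth fields correspond to V1 and V2 in the result above; they only differ once the maximum repeats (dst 9 and 10).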
Here is my table. I am using Snowflake
CREATE TABLE testx
(
c_order int,
c_condition varchar(10)
);
INSERT INTO testx
VALUES (1, 'and'), (2, 'and'), (3, 'or'), (4, 'and'), (5, 'or');
SELECT * FROM testx
c_order c_condition
--------------------
1 and
2 and
3 or
4 and
5 or
I am trying to write a query that assigns group numbers so that consecutive 'and' rows share the same group number. When an 'or' comes, the group number should increase. The c_order must also be maintained.
Here is the expected result:
c_order c_condition group_no
-------------------------------
1 and 1
2 and 1
3 or 2
4 and 2
5 or 3
I have tried using dense_rank like this:
SELECT
*,
DENSE_RANK() OVER (ORDER BY c_condition)
FROM testx
But it doesn't return exactly what I want. Can somebody please help?
The idea is to give a row the group number of the preceding OR row whenever its C_ORDER is greater than that OR's c_order.
In the CTE we select only the rows with OR and assign them a
group number using ROW_NUMBER().
Main query:
with temp_cte as
(
select c_order,
case -- to check if 'or' is the first row or not
when (select min(c_order) from testx where c_condition='or') =
(select min(c_order) from testx)
then row_number() over (order by c_order)
else
row_number() over (order by c_order)+1
end rn
from testx, table(generator(rowcount=>1))
where c_condition='or'
)
select x.c_order, x.c_condition,
case
when x.c_order = w.c_order
then w.rn
when x.c_order > (select min(c_order) from temp_cte)
then (select max(rn) from temp_cte where c_order < x.c_order)
else 1
end seq1
from testx x left join temp_cte w
on x.c_order = w.c_order
order by x.c_order;
Output -
C_ORDER C_CONDITION SEQ1
--------------------------
1 and 1
2 and 1
3 or 2
4 or 3
5 and 3
6 and 3
7 or 4
8 and 4
9 or 5
For data-set
select * from testx;
C_ORDER C_CONDITION
--------------------
1 and
2 and
3 or
4 or
5 and
6 and
7 or
8 and
9 or
Or just use CONDITIONAL_TRUE_EVENT:
with data(C_ORDER,C_CONDITION) as(
select * from values
(1,'and'),
(2,'and'),
(3,'or'),
(4,'or'),
(5,'and'),
(6,'and'),
(7,'or'),
(8,'and'),
(9,'or')
)select c_order, c_condition,
conditional_true_event(c_condition='or') over (order by c_order) grp
from data;
C_ORDER C_CONDITION GRP
-------------------------
1 and 0
2 and 0
3 or 1
4 or 2
5 and 2
6 and 2
7 or 3
8 and 3
9 or 4
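Note that GRP starts at 0 here, while the expected group_no in the question starts at 1. Below is a minimal Python sketch of the same running count of 'or' rows. The helper name group_numbers and the start offset are my own assumptions, added only to line the numbers up with the expected output.

```python
from itertools import accumulate

def group_numbers(conditions, start=1):
    """Running count of 'or' rows, offset by `start`.
    `conditions` must already be sorted by c_order."""
    # 1 for each 'or', 0 otherwise, then a cumulative sum.
    flags = (1 if c == 'or' else 0 for c in conditions)
    return [start + n for n in accumulate(flags)]

question_data = ['and', 'and', 'or', 'and', 'or']
answer_data = ['and', 'and', 'or', 'or', 'and', 'and', 'or', 'and', 'or']

print(group_numbers(question_data))          # offset 1, as in the question
print(group_numbers(answer_data, start=0))   # offset 0, as in the GRP column
```

In SQL the same offset would presumably be a matter of adding 1 to the conditional_true_event(...) expression.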
I wrote these 2 queries; the first one keeps duplicates and the second one drops them.
Does anyone know a more efficient way to achieve this?
The queries are for MSSQL and return the top 3 values per entity_id.
1-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 3, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value
FROM mytable TAB
) TMP
WHERE TMP.entity_id <> TMP.next_id
2-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TMX.entity_id, LEAD(TMX.entity_id, 3, 0) OVER(ORDER BY TMX.entity_id, TMX.value) AS next_id, TMX.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value, LEAD(TAB.value, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_value
FROM mytable TAB
) TMX
WHERE TMX.entity_id <> TMX.next_id OR TMX.value <> TMX.next_value
) TMP
WHERE TMP.entity_id <> TMP.next_id
Example:
Table:
entity_id value
--------- -----
1 9
1 11
1 12
1 3
2 25
2 25
2 5
2 37
3 24
3 9
3 2
3 15
Result Query 1 (25 appears twice for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 25
2 25
2 37
3 9
3 15
3 24
Result Query 2 (25 appears only once for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 5
2 25
2 37
3 9
3 15
3 24
You can use ROW_NUMBER, which keeps duplicates, as follows:
select entity_id, value from
(select t.*, row_number() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
You can use DENSE_RANK together with DISTINCT to drop the duplicates as follows:
select distinct entity_id, value from
(select t.*, dense_rank() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
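To make the difference between the two variants concrete, here is a small Python sketch on the example table from the question. The function top_n and its distinct flag are hypothetical names used only for this illustration; the output is sorted by entity_id and then value to match the result listings in the question.

```python
from collections import defaultdict

def top_n(rows, n=3, distinct=False):
    """rows: iterable of (entity_id, value).
    Returns the n largest values per entity, sorted by entity_id, value."""
    by_entity = defaultdict(list)
    for entity_id, value in rows:
        by_entity[entity_id].append(value)
    out = []
    for entity_id in sorted(by_entity):
        values = by_entity[entity_id]
        if distinct:
            # Ties collapse to a single value before the top n is taken.
            values = set(values)
        top = sorted(values, reverse=True)[:n]
        out.extend((entity_id, v) for v in sorted(top))
    return out

table = [(1, 9), (1, 11), (1, 12), (1, 3),
         (2, 25), (2, 25), (2, 5), (2, 37),
         (3, 24), (3, 9), (3, 2), (3, 15)]

with_duplicates = top_n(table)
without_duplicates = top_n(table, distinct=True)
print(with_duplicates)
print(without_duplicates)
```

For entity_id 2 the first call keeps both 25s, while the second one drops the duplicate and lets 5 into the top 3.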
Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception:
skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
create table #req_data (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying on 2 different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermediate steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8-byte hex value for all rows where [value] <> 1 and NULLs where [value] = 1. The first 4 bytes are the id (used for ordering) and the second 4 bytes are [value] (used to carry the "previous non-1 value"); for [value] = 1 rows the whole thing is set to NULL.
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.
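The three steps can also be sketched in Python, which may make the byte-level trick easier to follow. This is only an illustration under my own assumptions: control_values is a made-up name, struct.pack(">ii", ...) stands in for the two CAST(... AS BINARY(4)) calls, and ids and values are assumed to be non-negative so that byte order matches integer order.

```python
import struct

def control_values(rows, default=999):
    """rows: list of (id, session_id, value) sorted by id.
    Returns {id: last non-1 value seen so far in the same session, or default}."""
    smeared = {}   # session_id -> running MAX of the packed bytes
    result = {}
    for row_id, session_id, value in rows:
        # NULLIF(value, 1): rows with value 1 contribute nothing (NULL).
        if value != 1:
            # Big-endian packing, so bytes compare in the same order as the ints.
            packed = struct.pack(">ii", row_id, value)
            prev = smeared.get(session_id)
            smeared[session_id] = packed if prev is None else max(prev, packed)
        best = smeared.get(session_id)
        # The last 4 bytes hold the value; a missing entry becomes the default.
        result[row_id] = default if best is None else struct.unpack(">i", best[4:])[0]
    return result

data = [(5, 7), (5, 1), (5, 1), (5, 12), (5, 1), (5, 1), (5, 1),
        (6, 7), (6, 1), (6, 3), (6, 1), (7, 7), (8, 1), (8, 2), (8, 3)]
rows = [(i, s, v) for i, (s, v) in enumerate(data, start=1)]

control = control_values(rows)
kept_ids = [i for i, _, v in rows if v == 1 and control[i] != 7]
print(kept_ids)
```

Because the id occupies the leading bytes, the running maximum is always the most recent non-1 row, which is what makes MAX usable as a "last non-NULL" operator.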
SELECT CRow.id
FROM #req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM #req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to detect all 7 islands. If you wrap this in a second cte, then you can finally get the desired result by filtering out all 7 islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1
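For readers who prefer to trace the two CTEs step by step, here is a rough Python equivalent. The function name ids_outside_7_islands is hypothetical, and the rows are assumed to be sorted by id, as in the sample data.

```python
from collections import Counter

def ids_outside_7_islands(rows):
    """rows: list of (id, session_id, value) sorted by id."""
    running = Counter()   # session_id -> number of non-1 rows so far (grp)
    sevens = Counter()    # (session_id, grp) -> number of 7s in that island
    island_of = {}        # id -> (session_id, grp)
    for row_id, session_id, value in rows:
        if value != 1:
            running[session_id] += 1     # a non-1 row opens a new island
        key = (session_id, running[session_id])
        island_of[row_id] = key
        if value == 7:
            sevens[key] += 1
    # Keep value 1 rows whose island contains no 7 (cnt_7 = 0).
    return [row_id for row_id, _, value in rows
            if value == 1 and sevens[island_of[row_id]] == 0]

data = [(5, 7), (5, 1), (5, 1), (5, 12), (5, 1), (5, 1), (5, 1),
        (6, 7), (6, 1), (6, 3), (6, 1), (7, 7), (8, 1), (8, 2), (8, 3)]
rows = [(i, s, v) for i, (s, v) in enumerate(data, start=1)]

kept_ids = ids_outside_7_islands(rows)
print(kept_ids)
```

Each island holds at most one non-1 row (the one that opened it), so counting 7s per island is the same as asking whether the island was opened by a 7.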
I have a table:
Trip Stop Time
-----------------
1 A 1:10
1 B 1:16
1 B 1:20
1 B 1:25
1 C 1:31
1 B 1:40
2 A 2:10
2 B 2:17
2 C 2:20
2 B 2:25
I want to add one more column to my query output:
Trip Stop Time Sequence
-------------------------
1 A 1:10 1
1 B 1:16 2
1 B 1:20 2
1 B 1:25 2
1 C 1:31 3
1 B 1:40 4
2 A 2:10 1
2 B 2:17 2
2 C 2:20 3
2 B 2:25 4
The hard part is B: if B rows are next to each other, I want them to share the same sequence number; if not, the later B counts as a new sequence.
I know
row_number() over (partition by trip order by time)
row_number() over (partition by trip, stop order by time)
Neither of them meets the condition I want. Is there a way to query this?
create table test
(trip number
,stp varchar2(1)
,tm varchar2(10)
,seq number);
insert into test values (1, 'A', '1:10', 1);
insert into test values (1, 'B', '1:16', 2);
insert into test values (1, 'B', '1:20', 2);
insert into test values (1 , 'B', '1:25', 2);
insert into test values (1 , 'C', '1:31', 3);
insert into test values (1, 'B', '1:40', 4);
insert into test values (2, 'A', '2:10', 1);
insert into test values (2, 'B', '2:17', 2);
insert into test values (2, 'C', '2:20', 3);
insert into test values (2, 'B', '2:25', 4);
select t1.*
,sum(decode(t1.stp,t1.prev_stp,0,1)) over (partition by trip order by tm) new_seq
from
-- partition by trip so that lag() never looks across trips
(select t.*
,lag(stp) over (partition by t.trip order by t.tm) prev_stp
from test t
order by tm) t1
;
TRIP S TM SEQ P NEW_SEQ
------ - ---------- ---------- - ----------
1 A 1:10 1 1
1 B 1:16 2 A 2
1 B 1:20 2 B 2
1 B 1:25 2 B 2
1 C 1:31 3 B 3
1 B 1:40 4 C 4
2 A 2:10 1 1
2 B 2:17 2 A 2
2 C 2:20 3 B 3
2 B 2:25 4 C 4
10 rows selected
You want to see if the stop changes between one row and the next. If it does, you want to increment the sequence. So use lag to get the previous stop into the current row.
I used DECODE because of the way it handles NULLs and it is more concise than CASE, but if you are following the text book, you should probably use CASE.
Using SUM as an analytic function with an ORDER BY clause will give the answer you are looking for.
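The same "compare with the previous row, then keep a running sum" idea looks like this in Python. It is only a sketch: sequence_numbers is a made-up name, the rows are assumed to be sorted by trip and then time, and each trip is treated as its own partition.

```python
def sequence_numbers(rows):
    """rows: list of (trip, stop, time) sorted by trip, then time.
    Returns a list of (trip, stop, time, seq)."""
    out = []
    prev_trip = prev_stop = None
    seq = 0
    for trip, stop, time in rows:
        if trip != prev_trip:
            seq = 0            # new partition: restart the running sum
            prev_stop = None   # lag() has no previous row here
        if stop != prev_stop:
            seq += 1           # the stop changed, so add 1 to the sum
        out.append((trip, stop, time, seq))
        prev_trip, prev_stop = trip, stop
    return out

trips = [(1, 'A', '1:10'), (1, 'B', '1:16'), (1, 'B', '1:20'),
         (1, 'B', '1:25'), (1, 'C', '1:31'), (1, 'B', '1:40'),
         (2, 'A', '2:10'), (2, 'B', '2:17'), (2, 'C', '2:20'),
         (2, 'B', '2:25')]

numbered = sequence_numbers(trips)
for row in numbered:
    print(row)
```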
select *, dense_rank() over(partition by trip, stop order by time) as sqnc
from yourtable;
Use dense_rank so you get all the numbers consecutively, with no skipped numbers in between.
I think this is more complicated than a simple row_number(). You need to identify groups of adjacent stops and then enumerate them.
You can identify the groups using a difference of row numbers. Then, a dense_rank() on the difference does what you want if there are no repeated stops on a trip:
select t.*,
dense_rank() over (partition by trip order by grp, stop)
from (select t.*,
(row_number() over (partition by trip order by time) -
row_number() over (partition by trip, stop order by time)
) as grp
from table t
) t;
If there are repeated stops on a trip:
select t.*, dense_rank() over (partition by trip order by mintime)
from (select t.*,
min(time) over (partition by trip, grp, stop) as mintime
from (select t.*,
(row_number() over (partition by trip order by time) -
row_number() over (partition by trip, stop order by time)
) as grp
from table t
) t
) t;
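Here is a Python sketch of the second query (difference of row numbers, then a dense rank on the earliest time of each group), in case it helps to see the intermediate values. The function name is hypothetical, and the times are written zero-padded ("01:10") so that plain string comparison orders them correctly; that padding is my assumption, not part of the question.

```python
from collections import defaultdict

def sequence_by_row_number_difference(rows):
    """rows: list of (trip, stop, time). Returns (trip, stop, time, seq) tuples."""
    by_trip = defaultdict(list)
    for trip, stop, time in rows:
        by_trip[trip].append((time, stop))
    out = []
    for trip in sorted(by_trip):
        ordered = sorted(by_trip[trip])          # order by time within the trip
        seen_per_stop = defaultdict(int)
        mintime = {}                             # (grp, stop) -> earliest time
        keys = []
        for rn, (time, stop) in enumerate(ordered, start=1):
            seen_per_stop[stop] += 1
            # grp = row_number per trip minus row_number per (trip, stop)
            key = (rn - seen_per_stop[stop], stop)
            keys.append(key)
            mintime.setdefault(key, time)        # rows arrive in time order
        # dense_rank() over (order by mintime)
        rank = {t: i for i, t in enumerate(sorted(set(mintime.values())), start=1)}
        for (time, stop), key in zip(ordered, keys):
            out.append((trip, stop, time, rank[mintime[key]]))
    return out

trips = [(1, 'A', '01:10'), (1, 'B', '01:16'), (1, 'B', '01:20'),
         (1, 'B', '01:25'), (1, 'C', '01:31'), (1, 'B', '01:40'),
         (2, 'A', '02:10'), (2, 'B', '02:17'), (2, 'C', '02:20'),
         (2, 'B', '02:25')]

ranked = sequence_by_row_number_difference(trips)
for row in ranked:
    print(row)
```

Ranking on the earliest time of each group, rather than on grp itself, is what keeps the later B on trip 1 after C.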
How can I select change points from this data set
1 0
2 0
3 0
4 100
5 100
6 100
7 100
8 0
9 0
10 0
11 100
12 100
13 0
14 0
15 0
I want this result
4 7 100
11 12 100
This query based on analytic functions lag() and lead() gives expected output:
select id, nid, point
from (
select id, point, p1, lead(id) over (order by id) nid
from (
select id, point,
decode(lag(point) over (order by id), point, 0, 1) p1,
decode(lead(point) over (order by id), point, 0, 2) p2
from test)
where p1<>0 or p2<>0)
where p1=1 and point<>0
Edit: You may want to change line 3 in case a change point consists of only one row:
...
select id, point, p1,
case when p1=1 and p2=2 then id else lead(id) over (order by id) end nid
...
It would be simpler to use the ROW_NUMBER analytic function together with MIN and MAX.
This is a frequently asked question about finding intervals/series of values and skipping the gaps. I like the name Aketi Jyuuzou gave to this approach: the Tabibitosan method.
For example,
SELECT MIN(a),
MAX(a),
b
FROM
( SELECT a, b, a - ROW_NUMBER() OVER (ORDER BY a) AS rn FROM t WHERE b <> 0
)
GROUP BY rn,
b
ORDER BY MIN(a);
MIN(A) MAX(A) B
---------- ---------- ----------
4 7 100
11 12 100
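The reason this works is that a minus the row number stays constant as long as the a values are consecutive, and jumps whenever rows were filtered out in between. A minimal Python sketch on the data from the question follows; the function name nonzero_runs is made up, and the column names a and b follow the query above.

```python
def nonzero_runs(rows):
    """rows: list of (a, b) sorted by a. Returns (min_a, max_a, b) per run."""
    kept = [(a, b) for a, b in rows if b != 0]          # WHERE b <> 0
    groups = {}
    for rn, (a, b) in enumerate(kept, start=1):
        # a - row_number is the same for every row of one unbroken run.
        groups.setdefault((a - rn, b), []).append(a)    # GROUP BY rn, b
    return sorted((min(v), max(v), b) for (_, b), v in groups.items())

points = [(1, 0), (2, 0), (3, 0), (4, 100), (5, 100), (6, 100), (7, 100),
          (8, 0), (9, 0), (10, 0), (11, 100), (12, 100),
          (13, 0), (14, 0), (15, 0)]

runs = nonzero_runs(points)
print(runs)
```

On this data the group key is 3 for rows 4 to 7 and 6 for rows 11 and 12, which gives the two expected intervals.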