In Oracle, how to do Rank() (or any other ways that work) down to group level instead of row level? - sql

Let's say we have this set of data:
NAM T F
A 10 Y
A 11 N
A 12 N
A 13 Y
B 10 Y
B 11 Y
How can we use Rank() (or any other ways that work) to transform the above to:
NAM F ID MNT CNT
A Y 1 10 1
A N 2 11 2
A Y 3 13 1
B Y 1 10 2
(NAM and F are grouped, but rows with the same F can be grouped only when they are next to each other. In other words, for rows to be grouped together, their T values must be consecutive, such as 0, 1, 2, 3, 4, ..., with a difference of exactly 1 between neighbouring T values.)
The new columns are ID and CNT. The main point is column ID: the 2nd and 3rd source rows must fall into the same rank (ID=2) because both rows have the F flag set to 'N'.
The source data can be generated with:
select 'A' NAM, 10 t, 'Y' f FROM dual union all
select 'A' NAM, 11 t, 'N' f FROM dual union all
select 'A' NAM, 12 t, 'N' f FROM dual union all
select 'A' NAM, 13 t, 'Y' f FROM dual union all
select 'B' NAM, 10 t, 'Y' f FROM dual union all
select 'B' NAM, 11 t, 'Y' f FROM dual
The order of the time field T has to be respected. In other words, the following result should not be produced:
NAM F ID MNT CNT
A Y 1 10 2
A N 2 11 2
B Y 1 10 2
One more example:
NAM T F
A 10 Y
A 11 N
A 12 Y
A 13 Y
A 14 N
A 15 N
A 16 N
A 17 Y
B 10 Y
B 11 Y
Result should be:
NAM F ID MNT CNT
A Y 1 10 1
A N 2 11 1
A Y 3 12 2
A N 4 14 3
A Y 5 17 1
B Y 1 10 2
The source data set:
select 'A' NAM, 0 t, 'Y' f FROM dual union all
select 'A' NAM, 1 t, 'N' f FROM dual union all
select 'A' NAM, 2 t, 'Y' f FROM dual union all
select 'A' NAM, 3 t, 'Y' f FROM dual union all
select 'A' NAM, 4 t, 'N' f FROM dual union all
select 'A' NAM, 5 t, 'N' f FROM dual union all
select 'A' NAM, 6 t, 'N' f FROM dual union all
select 'A' NAM, 7 t, 'Y' f FROM dual union all
select 'B' NAM, 0 t, 'Y' f FROM dual union all
select 'B' NAM, 1 t, 'Y' f FROM dual

If you need to count consecutive rows in partitions by column A you could use this technique:
select a, min(f) f, rank() over (partition by a order by diff) i, count(1) cnt
from (
select test.*,
row_number() over (partition by a order by t)
- count(f) over (partition by a, f order by t) diff
from test)
group by a, diff order by a, diff
Edit: for the updated part of the question, use these modifications:
select nam, f,
row_number() over (partition by nam order by mnt) id,
mnt, cnt
from (
select nam, f, min(t) mnt, count(1) cnt
from (
select nam, t, f,
row_number() over (partition by nam order by t)
- count(1) over (partition by nam, f order by t) diff
from test )
group by nam, diff, f )
order by nam, id
This query gave me the expected result; please test it.
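The row_number-minus-running-count idea used above can be checked outside Oracle. The following is only a sketch under stated assumptions: it loads the question's second data set into an in-memory SQLite table through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions), and the CTE names d and g are made up for illustration. The SQL itself uses only row_number(), count() over and group by, so it should carry over to Oracle largely unchanged.

```python
import sqlite3

# Second sample data set from the question: (NAM, T, F).
ROWS = [("A", 0, "Y"), ("A", 1, "N"), ("A", 2, "Y"), ("A", 3, "Y"),
        ("A", 4, "N"), ("A", 5, "N"), ("A", 6, "N"), ("A", 7, "Y"),
        ("B", 0, "Y"), ("B", 1, "Y")]

ISLANDS_SQL = """
with d as (
    -- diff = position within NAM minus running count of the same F within NAM.
    -- It stays constant while F repeats, so (nam, f, diff) identifies one run.
    select nam, t, f,
           row_number() over (partition by nam order by t)
         - count(*)     over (partition by nam, f order by t) as diff
    from test
), g as (
    -- one row per run: first T and number of rows
    select nam, f, min(t) as mnt, count(*) as cnt
    from d
    group by nam, f, diff
)
-- number the runs inside each NAM by their first T
select nam, f,
       row_number() over (partition by nam order by mnt) as id,
       mnt, cnt
from g
order by nam, id
"""

def islands(rows):
    """Load rows into an in-memory table named test and return one tuple per run."""
    con = sqlite3.connect(":memory:")
    con.execute("create table test (nam text, t integer, f text)")
    con.executemany("insert into test values (?, ?, ?)", rows)
    return con.execute(ISLANDS_SQL).fetchall()

for row in islands(ROWS):
    print(row)
```

Note that this variant groups purely on changes of F in T order; it does not check that neighbouring T values differ by exactly 1.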

This can be solved with "stacked" analytics:
Identify the "leader" of each group to be aggregated.
Propagate the "leader" to the rest of the elements in the group.
Aggregate the data via the "leader".
The query would be:
with source_data$ as (
/*
... your source data here ...
*/
),
find_the_leader$ as (
select X.*,
case when lag(t) over (partition by nam, f order by t asc) is null
or t != 1 + lag(t) over (partition by nam, f order by t asc)
then t
end as mnt_leader
from source_data$ X
),
propagate_the_leader$ as (
select X.*,
last_value(mnt_leader) ignore nulls over (partition by nam, f order by t asc) as mnt
from find_the_leader$ X
)
select nam, f,
row_number() over (partition by nam order by mnt asc) as id,
mnt, count(1) as cnt
from propagate_the_leader$
group by nam, f, mnt
order by nam, id, f
;
On my PC, with your source data no.1 it yields:
NAM F ID MNT CNT
A Y 1 10 1
A N 2 11 2
A Y 3 13 1
B Y 1 10 2
And with your source data no.2 (with values of t increased by 10 in the "union-all-from-dual" select) it yields:
NAM F ID MNT CNT
A Y 1 10 1
A N 2 11 1
A Y 3 12 2
A N 4 14 3
A Y 5 17 1
B Y 1 10 2
I hope that you don't have any additional constraints on what your results should look like, because I don't have more time to adjust the query to answer a different problem.
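The leader idea above can also be written with a running sum of a "starts a new group" flag in place of last_value ... ignore nulls. This is a swapped-in variant, not the query from the answer, and it is only a sketch: it runs against an in-memory SQLite table through Python's sqlite3 module (SQLite 3.25 or later assumed), and the names src, flagged, numbered, grouped and is_leader are hypothetical. A row is flagged as a leader when the previous row with the same NAM and F does not have T exactly one lower.

```python
import sqlite3

# First sample data set from the question: (NAM, T, F).
ROWS = [("A", 10, "Y"), ("A", 11, "N"), ("A", 12, "N"),
        ("A", 13, "Y"), ("B", 10, "Y"), ("B", 11, "Y")]

LEADER_SQL = """
with flagged as (
    -- 1 when the row starts a new group: no previous row for this (nam, f),
    -- or the previous T is not exactly t - 1 (a NULL lag falls into ELSE)
    select nam, t, f,
           case when lag(t) over (partition by nam, f order by t) = t - 1
                then 0 else 1 end as is_leader
    from src
), numbered as (
    -- the running sum of the flags gives every row of a group the same number
    select nam, t, f,
           sum(is_leader) over (partition by nam, f order by t) as grp
    from flagged
), grouped as (
    select nam, f, min(t) as mnt, count(*) as cnt
    from numbered
    group by nam, f, grp
)
select nam, f,
       row_number() over (partition by nam order by mnt) as id,
       mnt, cnt
from grouped
order by nam, id
"""

def leader_islands(rows):
    """Return (nam, f, id, mnt, cnt) tuples for runs of consecutive T."""
    con = sqlite3.connect(":memory:")
    con.execute("create table src (nam text, t integer, f text)")
    con.executemany("insert into src values (?, ?, ?)", rows)
    return con.execute(LEADER_SQL).fetchall()

for row in leader_islands(ROWS):
    print(row)
```

Because the flag compares T values directly, a gap in T splits a group even when F does not change.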

Using the well-known two-step approach to identify contiguous groups (LAG / LAST_VALUE ignore nulls), here is the query (updated for the new setup):
with tab1 as (
select nam,t,f,
nvl(lag(f) over (partition by nam order by t),-1) lag_f,
case when (nvl(lag(f) over (partition by nam order by t),-1) <> f) then
row_number() over (partition by nam order by t) end grp
from tst
), tab2 as (
select nam,t,f,
last_value(grp ignore nulls) over (partition by nam order by t) as grp2
from tab1
), tab3 as (
select
nam, f, count(*) cnt, min(t) mnt
from tab2
group by nam, f, grp2
)
select nam,f,
rank() over (partition by nam order by mnt) r,
mnt, cnt
from tab3
order by nam,4;
gives
NAM F R MNT CNT
A Y 1 0 1
A N 2 1 1
A Y 3 2 2
A N 4 4 3
A Y 5 7 1
B Y 1 0 2

Related

Gradually aggregating a string column in Oracle SQL

I would like to gradually aggregate a string column in Oracle SQL.
From this table:
col_1 | col_2
-------------
1 | A
1 | B
1 | C
2 | C
2 | D
to:
col_1 | col_2
-------------
1 | A
1 | A,B
1 | A,B,C
2 | C
2 | C,D
I tried LISTAGG but it won't return all rows due to group by. I have about 2 million rows in the table.
Oracle doesn't support accumulating string concatenation with a single listagg() expression. However, you can use a subquery.
Just one note: SQL tables represent unordered sets. You seem to have an ordering in mind. The following code adds an ordering column:
with t as (
select 1 as id, 1 as x, 'A' as y from dual union all
select 2, 1 as x, 'B' as y from dual union all
select 3, 1 as x, 'C' as y from dual union all
select 4, 2 as x, 'C' as y from dual union all
select 5, 2 as x, 'D' as y from dual
)
select t.*,
(select listagg(t2.y, ',') within group (order by t2.id)
from t t2
where t2.x = t.x and t2.id <= t.id
)
from t;
A hierarchical query option looks like this:
with t as (
select 1 as id, 1 as x, 'A' as y from dual union all
select 2, 1 as x, 'B' as y from dual union all
select 3, 1 as x, 'C' as y from dual union all
select 4, 2 as x, 'C' as y from dual union all
select 5, 2 as x, 'D' as y from dual
)
select x,
ltrim(sys_connect_by_path(y, ','), ',') result
from (select x,
y,
row_number() over (partition by x order by y) rn
from t
)
start with rn = 1
connect by prior rn = rn - 1 and prior x = x;
X RESULT
---------- --------------------
1 A
1 A,B
1 A,B,C
2 C
2 C,D

Oracle: Analytical functions Sub totals after each change in value

I have the following data (order of records as in the example):
A B
1 10
1 20
1 30
1 40
2 50
2 65
2 75
1 89
1 100
from SQL:
with x as (
select A, B
from (
select 1 as A, 10 as B from dual
union all
select 1 as A, 20 as B from dual
union all
select 1 as A, 30 as B from dual
union all
select 1 as A, 40 as B from dual
union all
select 2 as A, 50 as B from dual
union all
select 2 as A, 65 as B from dual
union all
select 2 as A, 75 as B from dual
union all
select 1 as A, 89 as B from dual
union all
select 1 as A, 100 as B from dual
)
)
select A, B
from X
I want to group the data for each change of value in column A,
I want to get the following result:
A MIN(B) MAX(B)
1 10 40
2 50 75
1 89 100
How can I get such a result in Oracle 11? I would expect a simple implementation...
This is a gaps-and-islands problem, which can be solved using the row_number analytic function:
SELECT a,
MIN(b),
MAX(b)
FROM (
SELECT x.*,
ROW_NUMBER() OVER(
ORDER BY b
) - ROW_NUMBER() OVER(
PARTITION BY a
ORDER BY b
) AS seq
FROM x
)
GROUP BY a,
seq;

Oracle get difference in Average of Current and Previous group (partition)

I am using Oracle 12.1.0.2.0
I want difference in average of current group(partition) - average of previous group(partition)
My code to get current group Average is
with rws as (
select rownum x, mod(rownum, 2) y from dual connect by level <= 10
), avgs as (
select x, y, avg(x) over (partition by y) mean from rws
)
select x, y, mean
from avgs;
Now I want something like :
X Y MEAN PREV_MEAN MEAN_DIFF
4 0 6
8 0 6
2 0 6
6 0 6
10 0 6
9 1 5 6 -1
7 1 5
3 1 5
1 1 5
5 1 5
2 2 3 5 -3
3 2 3
5 2 3
1 2 3
4 2 3
AVG( this partitioned group) - Avg( previous partition group)
In this case I need ( 5 - 6 ) to compute in GROUP_MEAN_DIFFERENCE column.
Also, how can I get the mean difference always with respect to the first group?
In the example above I need (5 - 6) and (3 - 6)
Can you please assist?
Use the function lag() with ignore nulls clause:
select id, val, av, av - lag(av ignore nulls) over (order by id) diff
from (select id, val,
case when row_number() over (partition by id order by null) = 1
then avg(val) over (partition by id) end av
from t)
order by id
Test:
with t (id, val) as (select 1, 44.520 from dual union all
select 1, 47.760 from dual union all
select 1, 50.107 from dual union all
select 1, 48.353 from dual union all
select 1, 47.640 from dual union all
select 2, 48.353 from dual union all
select 2, 50.447 from dual union all
select 2, 51.967 from dual union all
select 2, 45.800 from dual union all
select 2, 46.913 from dual )
select id, val, av, av - lag(av ignore nulls) over (order by id) diff
from (select id, val,
case when row_number() over (partition by id order by null) = 1
then avg(val) over (partition by id) end av
from t)
order by id
Output:
ID VAL AV DIFF
--- ------- ------- -------
1 44.520 47.676
1 47.760
1 50.107
1 48.353
1 47.640
2 48.353 48.696 1.02
2 50.447
2 51.967
2 45.800
2 46.913
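An alternative worth sketching, which is not the method from the answer above: aggregate to one row per group first, then compare group means with lag() (previous group) and first_value() (first group). The latter covers the question's second request. This is a minimal sketch under assumptions: it runs on an in-memory SQLite table through Python's sqlite3 module (SQLite 3.25 or later assumed), the table name rws and the columns x and y follow the question, and the sample values are taken from the question's desired-output table. The names means, diff_prev and diff_first are made up for illustration.

```python
import sqlite3

# (x, y) pairs from the question's desired-output table; group means are 6, 5, 3.
ROWS = [(4, 0), (8, 0), (2, 0), (6, 0), (10, 0),
        (9, 1), (7, 1), (3, 1), (1, 1), (5, 1),
        (2, 2), (3, 2), (5, 2), (1, 2), (4, 2)]

MEAN_DIFF_SQL = """
with means as (
    -- one row per group with its average
    select y, avg(x) as mean
    from rws
    group by y
)
select y, mean,
       mean - lag(mean)         over (order by y) as diff_prev,   -- vs previous group
       mean - first_value(mean) over (order by y) as diff_first   -- vs first group
from means
order by y
"""

def mean_diffs(rows):
    """Return (y, mean, diff_prev, diff_first) per group, ordered by y."""
    con = sqlite3.connect(":memory:")
    con.execute("create table rws (x integer, y integer)")
    con.executemany("insert into rws values (?, ?)", rows)
    return con.execute(MEAN_DIFF_SQL).fetchall()

for row in mean_diffs(ROWS):
    print(row)
```

If the per-row layout from the question is needed, the result can be joined back to the detail rows on y. For the first group, diff_prev is NULL and diff_first is 0.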

Create one list from two columns

I need help with Oracle SQL.
I have a table with
from to
F B
B R
R D
E X
X Q
and I need the list
F
B
R
D
E
X
Q
so my problem is the jump from R-->D to E-->X
Edit: It's a big list with from and to, separated by another column as a criterion. Normally every from value also appears in the to column, so I used
SELECT from,snr as Nr FROM list where StrAbsNr = 1
union all
SELECT to,snr + 1 as Nr FROM list
to create an ordered list. But there are gaps in some parts; in the example, D-->E is missing.
Does anybody have an idea?
For your example this works:
WITH ft AS
(SELECT 'f' vfrom, 'b' AS vto FROM dual UNION ALL
SELECT 'b' , 'r' FROM dual UNION ALL
SELECT 'r','d' FROM dual UNION ALL
SELECT 'e','x' FROM dual UNION ALL
SELECT 'x','q' FROM dual )
SELECT a.a, MAX(rn), MIN(ob)
FROM
( SELECT vfrom a , rownum rn, 1 ob FROM ft
UNION ALL
SELECT vto , rownum rn, 2 ob FROM ft
) a
GROUP BY a
ORDER BY MAX(rn), MIN(ob)
A MAX(RN) MIN(OB)
- ---------- ----------
f 1 1
b 2 1
r 3 1
d 3 2
e 4 1
x 5 1
q 5 2
7 rows selected
Or with the analytic function row_number:
SELECT *
FROM
(SELECT a.a,
row_number() over (partition BY a order by rn, ob) rna,
ob,
rn
FROM
( SELECT vfrom a, rownum rn, 1 ob FROM ft
UNION ALL
SELECT vto , rownum rn, 2 ob FROM ft
) a
)
WHERE rna=1
ORDER BY rn,
ob
A RNA OB RN
- ---------- ---------- ----------
f 1 1 1
b 1 2 1
r 1 2 2
d 1 2 3
e 1 1 4
x 1 2 4
q 1 2 5
7 rows selected
select "from" as val from table
union
select "to" from table
And if you want to keep the order:
select val
from (select "from" as val, rownum as rn, 1 as valOrder from table
union all
select "to", rownum, 2 from table)
group by val
order by max(rn), min(valOrder)

Finding where a running sum of a time series is above given threshold

I have some time series data. For example, look at the following values (let's assume time here is in minutes):
User Time Value
a 0 10
b 1 100
c 2 200
a 3 5
e 4 7
a 5 999
a 6 8
b 7 10
a 8 10
a 9 10
a 10 10
a 11 10
a 12 100
Now I want to find out if within any given 5 minute intervals a total SUM of more than 1000 is achieved.
For the data above, I should get output such as user a, minutes 5, 6, 8, 9.
That's an easy task for a window function:
select *
from
(
select t.*
,sum("Value") -- cumulative sum over the previous five minutes
over (partition by "User"
order by "Time"
range 4 preceding) as sum_5_minutes
from Table1 t
) dt
where sum_5_minutes > 1000
Edit: you can also search the next 5 minutes.
Edit 2: if the datatype is a TIMESTAMP or DATE, you must use intervals instead of integers:
select *
from
(
select t.*
,sum("Value")
over (partition by "User"
order by "Time"
range interval '4' minute preceding) as sum_prev5_minutes
,sum("Value")
over (partition by "User"
order by "Time"
range between interval '0' minute preceding -- or "current row" if there are no duplicate timestamps
and interval '4' minute following) as sum_next5_minutes
from Table1 t
) dt
where sum_prev5_minutes > 1000
or sum_next5_minutes > 1000
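The integer-minute version of the windowed sum can be checked against the question's sample data. The sketch below is an illustration under assumptions, not the answerer's fiddle: it runs on an in-memory SQLite table through Python's sqlite3 module (SQLite 3.28 or later is assumed for RANGE frames with a numeric offset), and the table name series and the unquoted column names usr, t and v are hypothetical stand-ins for "User", "Time" and "Value".

```python
import sqlite3

# Sample data from the question: (user, minute, value).
ROWS = [("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
        ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
        ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
        ("a", 12, 100)]

WINDOW_SQL = """
select usr, t, sum_5_minutes
from (
    select usr, t,
           -- RANGE works on the values of t, not on row counts:
           -- the frame is every row of this user with t in [t - 4, t]
           sum(v) over (partition by usr
                        order by t
                        range between 4 preceding and current row) as sum_5_minutes
    from series
) dt
where sum_5_minutes > 1000
order by usr, t
"""

def over_threshold(rows):
    """Return (user, minute, 5-minute sum) for windows summing to more than 1000."""
    con = sqlite3.connect(":memory:")
    con.execute("create table series (usr text, t integer, v integer)")
    con.executemany("insert into series values (?, ?, ?)", rows)
    return con.execute(WINDOW_SQL).fetchall()

for row in over_threshold(ROWS):
    print(row)
```

A RANGE frame is used here instead of ROWS because the minutes have gaps; ROWS 4 PRECEDING would sum the previous four rows regardless of how far back in time they are.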
To illustrate my comment on dnoeth's post (please don't accept my answer as the correct one; he did the heavy lifting and deserves the green checkmark), the following shows how you can set the range at runtime:
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
-- imagine passing a variable in to this second query, setting it in a config table, or whatever.
-- This is just showing that you don't have to hard-code it into the actual select clause, and that the value can be determined at runtime.
, wind as (select 5 rng from dual)
select d.*
,sum(v) -- cumulative sum over the previous five minutes
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
from dat d
join wind w on 1=1
order by u,t;
I also note that lad2025 is correct that this windowing WILL miss some rows in the set. To correct that, you need to bring back all of a user's rows inside any range whose sum over the preceding minutes exceeds 1000. This works correctly for user z below, whereas the query as originally coded would have brought back only the second row.
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
-- two Z rows added. In the initial version only the second row would be caught.
SELECT 'z' u, 10 t, 999 v from dual union all
SELECT 'z' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
, wind as (select 3 rng from dual)
SELECT dd.*, sum_5_minutes
from dat dd
JOIN (
SELECT * FROM (
select d.*
,sum(v) -- cumulative sum over the previous five minutes
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
,min(t) -- start point of the range that we are covering
over (partition by u order by t
range w.rng preceding) as rng_5_minutes
from dat d
join wind w on 1=1
) WHERE sum_5_minutes > 1000 ) fails
on dd.u = fails.u
and dd.t >= fails.rng_5_minutes
and dd.t <= fails.t
order by dd.u, dd.t;
Here is my attempt at this:
select
s1."user", s1."time", sum (s2."value") as five_minute_value
from
sample s1
left join sample s2 on
s1."user" = s2."user" and
s1."time" between s2."time" and s2."time" + 4
group by
s1."user", s1."time"
having
sum (s2."value") > 1000
Output on your data:
a 8 1017
a 9 1027
a 6 1012
a 5 1004