Count distinct in rolling window function: Oracle error ORA-30487 [duplicate]

My task is to determine whether events on three different accounts occur within a 1-hour window.
A solution could look like
count(distinct account_id) over (order by time_key range between 20 PRECEDING and CURRENT ROW)
followed by a check that count() >= 3.
But Oracle does not allow DISTINCT in an analytic function together with an ORDER BY clause:
ORA-30487: ORDER BY not allowed here
I have the solution below, but it seems cumbersome:
with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
)
select *
from (
select account_id,
time_key,
max(
case
when account_id = 1 then 1
else 0
end
) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m1,
max(
case
when account_id = 2 then 1
else 0
end
) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m2,
max(
case
when account_id = 3 then 1
else 0
end
) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m3
from t_data
)
where m1 = 1 and m2 = 1 and m3 = 1
What is a simpler way to determine the number of distinct accounts in a sliding window?

It is not immediately obvious to me how you do this with window functions. You can use a correlated subquery:
select t.*,
(select count(distinct t2.account_id)
from t_data t2
where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
)
from t_data t;
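To sanity-check what such a query should return, the window logic can be replayed outside the database. This is only a minimal Python sketch, not Oracle code: the function name `distinct_in_window` and the tuple layout are made up for illustration, and the rows are the sample t_data from the question.

```python
# Sample rows from the question: (account_id, time_key).
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

def distinct_in_window(rows, width=20):
    """For each row, count the distinct account_ids whose time_key lies in
    [time_key - width, time_key], the same range as
    RANGE BETWEEN width PRECEDING AND CURRENT ROW."""
    result = []
    for account_id, time_key in rows:
        seen = {a for a, t in rows if time_key - width <= t <= time_key}
        result.append((account_id, time_key, len(seen)))
    return result

counts = distinct_in_window(T_DATA)
# Keep only the rows whose window holds at least three distinct accounts.
hits = [(a, t) for a, t, n in counts if n >= 3]
print(hits)
```

The scan is quadratic, like the correlated subquery itself, so it is meant for checking small samples rather than for production volumes.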
Another method, which could conceivably perform better, is to treat this as a gaps-and-islands problem. The following version returns the number of simultaneous distinct accounts at each time key:
with t as (
select account_id, min(time_key) as min_time_key, max(time_key + 20) as max_time_key
from (select t.*,
-- running count of island starts per account; a gap of more than 20 starts a new island
sum(case when time_key - prev_time_key <= 20 then 0 else 1 end) over (partition by account_id order by time_key) as grp
from (select t.*, lag(time_key) over (partition by account_id order by time_key) as prev_time_key
from t_data t
) t
) t
group by account_id, grp
)
select td.account_id, td.time_key, count(distinct t.account_id) as num_distinct
from t_data td join
t
on td.time_key between t.min_time_key and t.max_time_key
group by td.account_id, td.time_key;
Finally, if you have only 3 (or 2) account ids that you want to find and you only care about getting some examples where the max is hit, then you can do the following:
select t.*
from (select t.*,
min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
from t_data t
) t
where min_account_id <> max_account_id and
account_id <> min_account_id and
account_id <> max_account_id;
This gets the max and min account ids from the window covering the 20 time units before the current row (a RANGE window, not 20 rows), excluding the current row itself. If these differ from each other and from the current value, then you have three different values.
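The min/max test can be replayed on the sample data to see which rows it flags. The following is a hedged Python sketch of the same comparison; `third_account_rows` is a hypothetical helper name, and an empty window is skipped because in SQL the NULL min/max would make every comparison fail.

```python
# Sample rows from the question: (account_id, time_key).
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

def third_account_rows(rows, width=20):
    """Flag rows whose account differs from both the min and the max account
    seen in [time_key - width, time_key - 1]."""
    flagged = []
    for account_id, time_key in rows:
        earlier = [a for a, t in rows
                   if time_key - width <= t <= time_key - 1]
        if not earlier:
            continue  # empty window: min/max would be NULL in SQL
        lo, hi = min(earlier), max(earlier)
        if lo != hi and account_id not in (lo, hi):
            flagged.append((account_id, time_key))
    return flagged

flagged = third_account_rows(T_DATA)
print(flagged)
```

Note that this only proves "at least three"; it does not count how many distinct accounts are in the window.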

Here is a super-simple way to do it. We can work on the performance if you post some details about the size of your table.
select t1.account_id, t1.time_key, count(distinct t2.account_id) cnt
from t_data t1 cross join t_data t2
where t2.time_key between t1.time_key - 20 and t1.time_key
group by t1.account_id, t1.time_key
having count(distinct t2.account_id) >= 3;

If you are really hell-bent on using only a single windowing clause, here is a way:
with product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
over ( order by time_key range between 20 preceding
and current row ))) product from t_data t
)
select account_id, time_key from product_of_primes
where mod(product,2*3*5) = 0;
Explanation:
Convert each distinct account_id into a prime number. So, the 1st account_id gets 2, the next gets 3, the next gets 5.
Take the natural log of that number
Sum the natural logs for all the events in the last hour (i.e., in our window), remembering that ln(a)+ln(b) = ln(a*b)
Take e to the power of the sum
(So far, this is just a long winded way to multiply all the prime numbers we mapped our account_ids to)
Any row where this result is evenly divisible by all three prime numbers we used (2, 3 and 5, so divisible by 30) has all three distinct account_ids in its window.
If you were on my team and you wrote this, I would kill you.
Full example with data:
with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
),
product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
over ( order by time_key range between 20 preceding
and current row ))) product from t_data t
)
select account_id, time_key, product from product_of_primes
where mod(product,2*3*5) = 0;
Results:
+------------+----------+---------+
| ACCOUNT_ID | TIME_KEY | PRODUCT |
+------------+----------+---------+
| 3 | 1050 | 30 |
+------------+----------+---------+
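The arithmetic behind that result can be checked outside the database. Below is a small Python sketch under the same assumptions as the query (accounts 1, 2, 3 mapped to the primes 2, 3, 5, window of 20 preceding); the names `PRIMES` and `window_products` are illustrative, not part of the original answer.

```python
import math

# Sample rows from the question: (account_id, time_key).
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

# Same mapping as decode(account_id,1,2,2,3,3,5).
PRIMES = {1: 2, 2: 3, 3: 5}

def window_products(rows, width=20):
    """Multiply the primes of all rows in [time_key - width, time_key]
    by summing natural logs and exponentiating, as the SQL does."""
    out = []
    for account_id, time_key in rows:
        log_sum = sum(math.log(PRIMES[a]) for a, t in rows
                      if time_key - width <= t <= time_key)
        out.append((account_id, time_key, round(math.exp(log_sum))))
    return out

products = window_products(T_DATA)
# A product divisible by 2*3*5 means all three accounts are in the window.
matches = [row for row in products if row[2] % 30 == 0]
print(matches)
```

Because the product grows exponentially with the number of rows in the window, the rounded floating-point value stops being exact for large windows, which is one more reason to treat this as a curiosity.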

Related

Oracle SQL row concatenation by periods: maximum period

I have the below table:
LAUFD | ID | NEXDT | ORDER_ROW
------------------------------------
20140305 | C1 | 20140310 | 14
20140226 | C1 | 20140305 | 13
20131125 | C1 | 20131126 | 12
20131021 | C1 | 20131022 | 11
20130821 | C1 | 20130828 | 10
20130814 | C1 | 20130821 | 9
20130807 | C1 | 20130814 | 8
20130731 | C1 | 20130807 | 7
20130724 | C1 | 20130731 | 6
20130710 | C1 | 20130724 | 5
20130708 | C1 | 20130709 | 4
20130624 | C1 | 20130707 | 3
20130603 | C1 | 20130608 | 2
20130527 | C1 | 20130603 | 1
I would like to have the below output:
ID | START | END
-------------------------
C1 | 20140226 | 20140310
The logic is: if, ordering ID by order_row, the field NEXDT is equal or equal+1 or equal+2 to the field LAUFD of the next order_row, then continue with the next entry. If not, generate an entry in the output table with the start (earliest LAUFD) and end (latest NEXDT).
Basically, it's the same question as in Oracle SQL row concatenation by periods but I'd like just the latest period as an output.
Looks like this is what you need:
with t (LAUFD, ID, NEXDT, ORDER_ROW) as (
select 20140305,'C1', 20140310, 14 from dual union all
select 20140226,'C1', 20140305, 13 from dual union all
select 20131125,'C1', 20131126, 12 from dual union all
select 20131021,'C1', 20131022, 11 from dual union all
select 20130821,'C1', 20130828, 10 from dual union all
select 20130814,'C1', 20130821, 9 from dual union all
select 20130807,'C1', 20130814, 8 from dual union all
select 20130731,'C1', 20130807, 7 from dual union all
select 20130724,'C1', 20130731, 6 from dual union all
select 20130710,'C1', 20130724, 5 from dual union all
select 20130708,'C1', 20130709, 4 from dual union all
select 20130624,'C1', 20130707, 3 from dual union all
select 20130603,'C1', 20130608, 2 from dual union all
select 20130527,'C1', 20130603, 1 from dual
)
,t1 as (select id, order_row, to_date(laufd,'yyyymmdd') as laufd_dt, to_date(nexdt,'yyyymmdd') as nexdt_dt from t)
select *
from t1
match_recognize (
partition by id
order by order_row desc
measures
min(x.laufd_dt) as dt_start,
max(a.nexdt_dt) as dt_end,
x.laufd_dt-next(x.nexdt_dt) as dates_diff
one row per match
pattern(a x+ y* z*)
define
x as x.order_row=prev(order_row)-1 and prev(laufd_dt)-nexdt_dt<=3
,y as x.order_row=prev(order_row)-1
);
For just the latest period, you could use the previous solution. But instead, look for the last "break" and then use only the rows since that break:
select id, min(laufd), max(nexdt),
row_number() over (partition by id order by min(laufd)) as period
from (select t.*,
-- running count of breaks up to each row
sum(case when prev_nexdt >= laufd - interval '2' day then 0 else 1 end) over
(partition by id order by order_row) as grp,
-- total number of breaks for the id
sum(case when prev_nexdt >= laufd - interval '2' day then 0 else 1 end) over (partition by id) as num_grps
from (select t.id, t.order_row, -- any other columns you need
to_date(laufd, 'YYYYMMDD') as laufd,
to_date(nexdt, 'YYYYMMDD') as nexdt,
lag(to_date(nexdt, 'YYYYMMDD')) over (partition by id order by order_row) as prev_nexdt
from t
) t
) t
where num_grps = grp
group by id;
This is basically the same logic. It just keeps the latest group.
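For comparison, the "walk back until the first break" idea can be written procedurally. This Python sketch makes some assumptions that are not in the original answers: a break is any gap outside 0 to 2 days between a row's NEXDT and the following row's LAUFD, the end of the period is the NEXDT of the newest row, and only the four newest sample rows are listed. `latest_period` and `to_date` are hypothetical helper names.

```python
from datetime import datetime

# Newest four sample rows for ID C1: (laufd, nexdt, order_row).
ROWS = [
    (20140305, 20140310, 14),
    (20140226, 20140305, 13),
    (20131125, 20131126, 12),
    (20131021, 20131022, 11),
]

def to_date(yyyymmdd):
    """Convert a number such as 20140305 to a date."""
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()

def latest_period(rows, max_gap_days=2):
    """Start from the newest row and extend backwards while each older
    row's NEXDT is within max_gap_days of the current period start."""
    newest_first = sorted(rows, key=lambda r: r[2], reverse=True)
    start = to_date(newest_first[0][0])
    end = to_date(newest_first[0][1])
    for laufd, nexdt, _order_row in newest_first[1:]:
        gap = (start - to_date(nexdt)).days
        if not 0 <= gap <= max_gap_days:
            break  # first break found: the latest period ends here
        start = to_date(laufd)
    return start, end

period = latest_period(ROWS)
print(period)
```

On the sample rows this yields the period from 2014-02-26 to 2014-03-10, which matches the output requested in the question.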

Running count but reset on some column value in select query

I want to compute a running count that resets whenever a specific column value appears.
Here is my select statement:
with tbl(emp,salary,ord) as
(
select 'A',1000,1 from dual union all
select 'B',1000,2 from dual union all
select 'K',1000,3 from dual union all
select 'A',1000,4 from dual union all
select 'B',1000,5 from dual union all
select 'D',1000,6 from dual union all
select 'B',1000,7 from dual
)
select * from tbl
I want the count to reset on emp B: after each row where the value is B, the count goes back to 0 and then increments by 1 again:
emp salary ord running_count
A 1000 1 0
B 1000 2 1
K 1000 3 0
A 1000 4 1
B 1000 5 2
D 1000 6 0
B 1000 7 1
Here order column is ord.
I want to achieve the whole thing by select statement, not using the cursor.
You want to define groups where the counting takes place. Within a group, the solution is row_number().
You can define the group by doing a cumulative sum of B values. Because B ends the group, you want to count the number of B values at or after each record, which is a cumulative sum in descending ord order.
This results in:
select t.*,
row_number() over (partition by grp order by ord) - 1 as running_count
from (select t.*,
sum(case when emp = 'B' then 1 else 0 end) over (order by ord desc) as grp
from tbl t
) t;
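To see why the descending cumulative sum produces the right groups, here is a minimal Python sketch of the same two steps on the sample rows. The variable names are illustrative; `itertools.accumulate` stands in for the analytic sum() and a per-group counter stands in for row_number().

```python
from itertools import accumulate

# Sample rows from the question: (emp, salary, ord).
TBL = [
    ("A", 1000, 1), ("B", 1000, 2), ("K", 1000, 3), ("A", 1000, 4),
    ("B", 1000, 5), ("D", 1000, 6), ("B", 1000, 7),
]

ordered = sorted(TBL, key=lambda r: r[2])
flags = [1 if emp == "B" else 0 for emp, _, _ in ordered]

# Number of B rows at or after each row: cumulative sum taken from the end,
# like sum(...) over (order by ord desc).
grp = list(accumulate(reversed(flags)))[::-1]

# row_number() over (partition by grp order by ord) - 1
seen = {}
running = []
for (emp, _salary, ord_), g in zip(ordered, grp):
    position = seen.get(g, 0)
    running.append((emp, ord_, position))
    seen[g] = position + 1

print(grp)
print(running)
```

Each B row closes its own group, so the row after it gets a smaller grp value and starts again at 0.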

select rows between two character values of a column

I have a table which shows as below:
S.No | Action
1 | New
2 | Dependent
3 | Dependent
4 | Dependent
5 | New
6 | Dependent
7 | Dependent
8 | New
9 | Dependent
10 | Dependent
I want to select the rows between the first two 'New' values in the Action column, including the first row with the 'New' action, that is, the half-open range [New, New).
For example:
In this case, I want to select rows 1,2,3,4.
Please let me know how to do this.
Hmmm. Let's count up the cumulative number of times that New appears as a value and use that:
select t.*
from (select t.*,
sum(case when action = 'New' then 1 else 0 end) over (order by s_no) as cume_new
from t
) t
where cume_new = 1;
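The cumulative count is easy to picture on the sample table. A short Python sketch of the same idea, with `itertools.accumulate` standing in for the analytic sum (the list names are illustrative):

```python
from itertools import accumulate

# Sample rows from the question: (s_no, action).
T = [
    (1, "New"), (2, "Dependent"), (3, "Dependent"), (4, "Dependent"),
    (5, "New"), (6, "Dependent"), (7, "Dependent"),
    (8, "New"), (9, "Dependent"), (10, "Dependent"),
]

ordered = sorted(T)
# Running number of 'New' rows seen so far, ordered by s_no.
cume_new = list(accumulate(1 if action == "New" else 0
                           for _, action in ordered))

# Rows from the first 'New' up to, but not including, the second 'New'.
selected = [s_no for (s_no, _), c in zip(ordered, cume_new) if c == 1]
print(selected)
```

Changing the filter to `c == 2` would return the block that starts at the second 'New'.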
You can do some magic with analytic functions:
1 number the 'New' rows and keep the first two
2 use lead to pair the sno of the first 'New' row with the sno of the second
3 select the rows whose sno lies between those two values (first inclusive, second exclusive)
with t as (
select 1 sno, 'New' action from dual union
select 2,'Dependent' from dual union
select 3,'Dependent' from dual union
select 4,'Dependent' from dual union
select 5,'New' from dual union
select 6,'Dependent' from dual union
select 7,'Dependent' from dual union
select 8,'New' from dual union
select 9,'Dependent' from dual union
select 10,'Dependent' from dual
)
select *
from (select *
from (select sno, lead(sno) over (order by sno) a
from ( select row_number() over (partition by action order by Sno) t,
t.sno
from t
where t.action = 'New'
) a
where t <=2 )
where a is not null) a, t
where t.sno >= a.sno and t.sno < a.a

Use SUM function in oracle

I have a table in Oracle which contains :
id | month | payment | rev
----------------------------
A | 1 | 10 | 0
A | 2 | 20 | 0
A | 2 | 30 | 1
A | 3 | 40 | 0
A | 4 | 50 | 0
A | 4 | 60 | 1
A | 4 | 70 | 2
I want to calculate the payment column (SUM(payment)). For (id=A month=2) and (id=A month=4), I just want to take the greatest value from REV column. So that the sum is (10+30+40+70)=150. How to do it?
You can also use the query below.
select id,sum(payment) as value
from
(
select id,month,max(payment) as payment from table1
group by id,month
)
group by id
Edit: to pick the payment with the greatest rev value instead:
select id,sum(payment) as value
from (
select id,month,rev,payment ,row_number() over (partition by id,month order by rev desc) as rno from table1
) where rno=1
group by id
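As a cross-check of the expected total, the "keep the row with the greatest rev per id and month, then sum" logic can be sketched in a few lines of Python. The dictionary names are illustrative and the rows are the sample table from the question.

```python
# Sample rows from the question: (id, month, payment, rev).
ROWS = [
    ("A", 1, 10, 0),
    ("A", 2, 20, 0), ("A", 2, 30, 1),
    ("A", 3, 40, 0),
    ("A", 4, 50, 0), ("A", 4, 60, 1), ("A", 4, 70, 2),
]

# Keep, per (id, month), the payment that belongs to the greatest rev.
latest = {}
for id_, month, payment, rev in ROWS:
    key = (id_, month)
    if key not in latest or rev > latest[key][0]:
        latest[key] = (rev, payment)

# Sum the surviving payments per id.
totals = {}
for (id_, _month), (_rev, payment) in latest.items():
    totals[id_] = totals.get(id_, 0) + payment

print(totals)
```

This gives 10 + 30 + 40 + 70 = 150 for id A, the figure the question asks for.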
This presupposes you don't have more than one value per rev. If that's not the case, then you probably want a row_number analytic instead of max.
with latest as (
select
id, month, payment, rev,
max (rev) over (partition by id, month) as max_rev
from table1
)
select sum (payment)
from latest
where rev = max_rev
Or there's this, if I've understood the requirement right:
with demo as (
select 'A'as id, 1 as month, 10 as payment, 0 as rev from dual
union all select 'A',2,20,0 from dual
union all select 'A',2,30,1 from dual
union all select 'A',3,40,0 from dual
union all select 'A',4,50,0 from dual
union all select 'A',4,60,1 from dual
union all select 'A',4,70,2 from dual
)
select sum(payment) keep (dense_rank last order by rev)
from demo;
You can check the breakdown by including the key columns:
with demo as (
select 'A'as id, 1 as month, 10 as payment, 0 as rev from dual
union all select 'A',2,20,0 from dual
union all select 'A',2,30,1 from dual
union all select 'A',3,40,0 from dual
union all select 'A',4,50,0 from dual
union all select 'A',4,60,1 from dual
union all select 'A',4,70,2 from dual
)
select id, month, max(rev)
, sum(payment) keep (dense_rank last order by rev)
from demo
group by id, month;
select sum(payment) from tableName where id='A' and (month=2 or month=4);

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've cobbled together so far. I know that this is wrong, but I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group which is the difference of two row_number() values (using the column names from your query, since group is a reserved word):
select grp, sum(val)
from (select t.*,
(row_number() over (order by tmstamp) -
row_number() over (partition by grp order by tmstamp)
) as seq_grp
from my_table t
) t
group by grp, seq_grp
order by min(tmstamp);
The difference of the two row numbers is constant across consecutive rows that share the same group value.
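The constant-difference property is easiest to trust after seeing the numbers. Here is a hedged Python sketch on the sample data from the question; the names `ROWS`, `diffs` and `result` are illustrative, and two counters play the role of the two row_number() calls.

```python
from collections import defaultdict

# Sample rows from the question: (timestamp, group, value).
ROWS = [
    (1, "A", 10), (2, "A", 20), (3, "B", 15), (4, "B", 25),
    (5, "C", 5), (6, "A", 5), (7, "A", 10),
]

per_group_rn = defaultdict(int)  # row_number() partitioned by group
diffs = []
sums = {}                        # (group, diff) -> [first timestamp, total]

for overall_rn, (ts, grp, val) in enumerate(sorted(ROWS), start=1):
    per_group_rn[grp] += 1
    diff = overall_rn - per_group_rn[grp]
    diffs.append(diff)
    entry = sums.setdefault((grp, diff), [ts, 0])
    entry[1] += val

# Order the islands by their first timestamp, like order by min(timestamp).
result = [(grp, total)
          for (grp, _diff), (_first_ts, total)
          in sorted(sums.items(), key=lambda item: item[1][0])]
print(diffs)
print(result)
```

The second run of A rows gets a different difference than the first run, which is what keeps the two A islands apart.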
A solution using LAG and windowed analytic functions:
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is a typical "start_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;
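The start_of_group flag followed by a running sum can also be traced by hand. The following Python sketch mirrors the two nested steps of the query on the same sample rows; the variable names are illustrative, and `itertools.accumulate` stands in for the analytic sum().

```python
from itertools import accumulate

# Sample rows from the answer: (timestamp, grp, value).
ROWS = [
    (1, "A", 10), (2, "A", 20), (3, "B", 15), (4, "B", 25),
    (5, "C", 5), (6, "A", 5), (7, "A", 10),
]

ordered = sorted(ROWS)

# 1 when grp differs from the previous row (or there is no previous row),
# like the CASE around lag(grp).
start_of_group = [
    1 if i == 0 or ordered[i][1] != ordered[i - 1][1] else 0
    for i in range(len(ordered))
]

# Running sum of the flags gives one id per consecutive run.
grp_id = list(accumulate(start_of_group))

# Aggregate per run: (min timestamp, grp, sum of value).
totals = {}
for (ts, grp, val), gid in zip(ordered, grp_id):
    first_ts, name, total = totals.get(gid, (ts, grp, 0))
    totals[gid] = (first_ts, name, total + val)

result = [totals[gid] for gid in sorted(totals)]
print(result)
```

Unlike the row-number difference, grp_id is unique per run on its own, so grouping by it alone would already separate the islands.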