Finding where a running sum of a time series is above a given threshold - sql

I have some time series data. For example, look at the following values (let's assume time here is in minutes):
User Time Value
a 0 10
b 1 100
c 2 200
a 3 5
e 4 7
a 5 999
a 6 8
b 7 10
a 8 10
a 9 10
a 10 10
a 11 10
a 12 100
Now I want to find out whether, within any 5-minute interval, a total SUM of more than 1000 is reached.
For the data above, I should get an output such as user a, minutes 5, 6, 8, 9.

That's an easy task for a window function:
select *
from
(
select t.*
,sum("Value") -- running sum over the current minute and the four before it
over (partition by "User"
order by "Time"
range 4 preceding) as sum_5_minutes
from Table1 t
) dt
where sum_5_minutes > 1000
Edit: SQLFiddle is offline again, but you can also search the next 5 minutes.
Edit 2: If the data type is a TIMESTAMP or DATE, you must use intervals instead of integers:
select *
from
(
select t.*
,sum("Value")
over (partition by "User"
order by "Time"
range interval '4' minute preceding) as sum_prev5_minutes
,sum("Value")
over (partition by "User"
order by "Time"
range between interval '0' minute preceding -- or "current row" if there are no duplicate timestamps
and interval '4' minute following) as sum_next5_minutes
from Table1 t
) dt
where sum_prev5_minutes > 1000
or sum_next5_minutes > 1000
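For readers who want to check the window logic outside the database, here is a minimal Python sketch of what `range 4 preceding` computes per user on the sample data. The function name `rows_over_threshold` and its parameters are made up for illustration; it is a brute-force restatement of the query, not a replacement for it.

```python
from collections import defaultdict

# Sample rows from the question: (user, time in minutes, value).
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]


def rows_over_threshold(rows, width=4, threshold=1000):
    """Return (user, time, window_sum) for every row whose sum over
    [time - width, time] within the same user exceeds threshold."""
    by_user = defaultdict(list)
    for user, time, value in rows:
        by_user[user].append((time, value))

    hits = []
    for user, series in sorted(by_user.items()):
        series.sort()
        for time, _ in series:
            # Same frame as RANGE <width> PRECEDING: all rows of this user
            # whose time lies between time - width and time, inclusive.
            window_sum = sum(v for t, v in series if time - width <= t <= time)
            if window_sum > threshold:
                hits.append((user, time, window_sum))
    return hits


hits = rows_over_threshold(ROWS)
print(hits)
```

On the sample data this flags user a at minutes 5, 6, 8 and 9, which matches the output the question asks for.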

To illustrate my comment on dnoeth's post (please don't accept this answer; he did the heavy lifting and deserves the green checkmark), the following shows how you can set the range at runtime:
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
-- imagine passing a variable into this second query, setting it in a config table, or whatever.
-- This is just showing that you don't have to hard-code it into the actual select clause, and that the value can be determined at runtime.
, wind as (select 5 rng from dual)
select d.*
,sum(v) -- running sum over the window [t - rng, t]
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
from dat d
join wind w on 1=1
order by u,t;
I also note that lad2025 is correct that this windowing WILL miss some rows in the set. To correct that, you need to bring back all rows for a user that fall inside any window whose sum exceeds 1000. This works correctly for user z below, whereas the query as originally coded would have brought back only the second row.
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
-- two Z rows added. In the initial version only the second row would be caught.
SELECT 'z' u, 10 t, 999 v from dual union all
SELECT 'z' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
, wind as (select 3 rng from dual)
SELECT dd.*, sum_5_minutes
from dat dd
JOIN (
SELECT * FROM (
select d.*
,sum(v) -- running sum over the window [t - rng, t]
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
,min(t) -- start point of the range that we are covering
over (partition by u order by t
range w.rng preceding) as rng_5_minutes
from dat d
join wind w on 1=1
) WHERE sum_5_minutes > 1000 ) fails
on dd.u = fails.u
and dd.t >= fails.rng_5_minutes
and dd.t <= fails.t
order by dd.u, dd.t;
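The idea of the corrected query (flag each window whose sum is too high, then return every row inside a flagged window) can be sketched in Python as follows. The function name `rows_in_flagged_windows` is hypothetical, and unlike the SQL join this sketch collects rows in a set, so a row that belongs to several flagged windows is reported only once.

```python
from collections import defaultdict

# Same sample rows as the query above, including the two added z rows.
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("z", 10, 999), ("z", 11, 10), ("a", 12, 100),
]


def rows_in_flagged_windows(rows, width=3, threshold=1000):
    """Return every (user, time, value) row that lies inside at least one
    window [t - width, t] whose per-user sum exceeds threshold."""
    by_user = defaultdict(list)
    for user, time, value in rows:
        by_user[user].append((time, value))

    flagged = set()
    for user, series in by_user.items():
        for time, _ in series:
            window = [(t, v) for t, v in series if time - width <= t <= time]
            if sum(v for _, v in window) > threshold:
                # Keep all rows of the offending window, not just its last row.
                flagged.update((user, t, v) for t, v in window)
    return sorted(flagged)


result = rows_in_flagged_windows(ROWS)
print(result)
```

With a width of 3 both z rows are returned, including the first one that the single-row filter would have missed.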

Here is my attempt at this:
select
s1."user", s1."time", sum (s2."value") as five_minute_value
from
sample s1
left join sample s2 on
s1."user" = s2."user" and
s1."time" between s2."time" and s2."time" + 4
group by
s1."user", s1."time"
having
sum (s2."value") > 1000
Output on your data:
user time five_minute_value
a 8 1017
a 9 1027
a 6 1012
a 5 1004

Related

Oracle: Analytical functions Sub totals after each change in value

I have the following data (order of records as in the example):
A B
1 10
1 20
1 30
1 40
2 50
2 65
2 75
1 89
1 100
generated by this SQL:
with x as (
select A, B
from (
select 1 as A, 10 as B from dual
union all
select 1 as A, 20 as B from dual
union all
select 1 as A, 30 as B from dual
union all
select 1 as A, 40 as B from dual
union all
select 2 as A, 50 as B from dual
union all
select 2 as A, 65 as B from dual
union all
select 2 as A, 75 as B from dual
union all
select 1 as A, 89 as B from dual
union all
select 1 as A, 100 as B from dual
)
)
select A, B
from X
I want to group the data at each change of value in column A, and get the following result:
A MIN(B) MAX(B)
1 10 40
2 50 75
1 89 100
How can I get such a result in Oracle 11? I would expect a simple implementation...
This is a gaps-and-islands problem, which can be solved using the ROW_NUMBER analytic function:
SELECT a,
MIN(b),
MAX(b)
FROM (
SELECT x.*,
ROW_NUMBER() OVER(
ORDER BY b
) - ROW_NUMBER() OVER(
PARTITION BY a
ORDER BY b
) AS seq
FROM x
)
GROUP BY a,
seq
ORDER BY MIN(b);
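To see why the difference of the two row numbers identifies each island, here is a small Python sketch that recomputes it by hand on the sample data. The names `islands`, `overall_rn` and `per_a_rn` are only for illustration; the sketch assumes, as the query does, that ordering by B reproduces the original row order.

```python
from collections import defaultdict

# (A, B) pairs from the question, in the original order.
ROWS = [(1, 10), (1, 20), (1, 30), (1, 40), (2, 50),
        (2, 65), (2, 75), (1, 89), (1, 100)]


def islands(rows):
    """Group consecutive runs of equal A (ordered by B) and return
    (A, min B, max B) per run, using the row-number-difference trick."""
    ordered = sorted(rows, key=lambda row: row[1])   # ORDER BY b
    per_a_rn = defaultdict(int)                      # ROW_NUMBER() per value of A
    groups = {}
    for overall_rn, (a, b) in enumerate(ordered, start=1):
        per_a_rn[a] += 1
        # The difference stays constant while A does not change and
        # jumps as soon as rows with another A have been in between.
        seq = overall_rn - per_a_rn[a]
        low, high = groups.get((a, seq), (b, b))
        groups[(a, seq)] = (min(low, b), max(high, b))
    return sorted(((a, low, high) for (a, _), (low, high) in groups.items()),
                  key=lambda group: group[1])


result = islands(ROWS)
print(result)
```

For the sample rows the difference is 0 for the first run of A = 1, 4 for the run of A = 2 and 3 for the second run of A = 1, so grouping by (A, difference) yields the three expected rows.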

BigQuery SQL how to get total count when using LIMIT

If I use LIMIT 10 in a SQL query (using BigQuery), is there a way to also return the total count?
For example, 100 rows exist. How can I query to return the first 10 but also display to users how many rows are available in total without doing a separate count(id) aggregate query?
To add to Mikhail's answer, you may want to do this to see the count of the unique values in a grouped query. In the following example, there are 10 unique values of r, but you only want to see the first 4, along with the count of the unique rows. I also added the count for each group and the overall count of all rows. (Standard SQL below)
WITH YourTable AS (
SELECT 1 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 10 AS r
)
SELECT
r,
SUM(1) OVER (ORDER BY r ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CountOfAllUniqueRows,
COUNT(r) AS CountOfEachR,
SUM(COUNT(R)) OVER (ORDER BY r ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CountOfAllRows
FROM YourTable
GROUP BY r
ORDER BY r
LIMIT 4
And giving the results of:
r CountOfAllUniqueRows CountOfEachR CountOfAllRows
1 10 8 68
2 10 6 68
3 10 7 68
4 10 6 68
I don't know why you would want to do this (maybe because of cost, to avoid a second scan), but the "trick" below might work for you.
While getting only as many rows as you wish, you also get the total row count, repeated in every output row, so you need to handle this yourself when displaying it to the user.
With BigQuery Legacy SQL:
SELECT
r, cnt
FROM (
SELECT
r,
COUNT(r) OVER() AS cnt,
ROW_NUMBER() OVER() AS line
FROM
(SELECT 1 AS r),
(SELECT 2 AS r),
(SELECT 3 AS r),
(SELECT 4 AS r),
(SELECT 5 AS r),
(SELECT 6 AS r),
(SELECT 7 AS r),
(SELECT 8 AS r),
(SELECT 9 AS r),
(SELECT 10 AS r)
)
WHERE line <= 4
or
SELECT
r,
cnt
FROM (
SELECT r
FROM
(SELECT 1 AS r),
(SELECT 2 AS r),
(SELECT 3 AS r),
(SELECT 4 AS r),
(SELECT 5 AS r),
(SELECT 6 AS r),
(SELECT 7 AS r),
(SELECT 8 AS r),
(SELECT 9 AS r),
(SELECT 10 AS r)
) AS YourTable
CROSS JOIN (
SELECT COUNT(1) AS cnt
FROM
(SELECT 1 AS r),
(SELECT 2 AS r),
(SELECT 3 AS r),
(SELECT 4 AS r),
(SELECT 5 AS r),
(SELECT 6 AS r),
(SELECT 7 AS r),
(SELECT 8 AS r),
(SELECT 9 AS r),
(SELECT 10 AS r)
) rows
LIMIT 4
With BigQuery Standard SQL:
Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.
WITH YourTable AS (
SELECT 1 AS r UNION ALL
SELECT 2 AS r UNION ALL
SELECT 3 AS r UNION ALL
SELECT 4 AS r UNION ALL
SELECT 5 AS r UNION ALL
SELECT 6 AS r UNION ALL
SELECT 7 AS r UNION ALL
SELECT 8 AS r UNION ALL
SELECT 9 AS r UNION ALL
SELECT 10 AS r
)
SELECT
r,
(SELECT COUNT(1) FROM YourTable) AS cnt
FROM YourTable
LIMIT 4
In all cases the result is:
r cnt
1 10
2 10
3 10
4 10
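The shape of that result (a limited page of rows, each carrying the total count) can be illustrated with a tiny Python sketch. The helper name `first_n_with_total` is made up, and the sketch says nothing about how BigQuery bills or scans data; it only shows what the consumer of the result has to deal with.

```python
def first_n_with_total(rows, n):
    """Return the first n rows, each paired with the total row count,
    mirroring the (r, cnt) shape of the queries above."""
    total = len(rows)
    return [(r, total) for r in rows[:n]]


rows = list(range(1, 11))          # the ten sample values of r
page = first_n_with_total(rows, 4)
print(page)

# The total is repeated on every row, so a client reads it once,
# for example from the first row, when displaying "4 of 10".
total_available = page[0][1] if page else 0
print(total_available)
```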

How can I find unoccupied id numbers in a table?

In my table I want to see a list of unoccupied ID numbers in a certain range.
For example, there are 10 records in my table with IDs "2,3,4,5,10,12,16,18,21,22", and say I want to see the available ones between 1 and 25. So I want to see a list like:
1,6,7,8,9,11,13,14,15,17,19,20,23,24,25
How should I write my SQL query?
Select the numbers from 1 to 25 and show only those that are not in your table:
select n from
( select rownum n from dual connect by level <= 25)
where n not in (select id from your_table); -- your_table is a placeholder for the actual table name
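As a cross-check of what such a query should return, here is the same set difference in a few lines of Python, using the IDs from the question. The function name `missing_ids` is just for illustration.

```python
def missing_ids(used, low, high):
    """Return the numbers in [low, high] that do not occur in used."""
    taken = set(used)
    return [n for n in range(low, high + 1) if n not in taken]


used_ids = [2, 3, 4, 5, 10, 12, 16, 18, 21, 22]
free = missing_ids(used_ids, 1, 25)
print(",".join(map(str, free)))
```

This prints 1,6,7,8,9,11,13,14,15,17,19,20,23,24,25 for the sample IDs.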
Let's say you have a #numbers table with three numbers:
CREATE TABLE #numbers (num INT)
INSERT INTO #numbers (num)
SELECT 1
UNION
SELECT 3
UNION
SELECT 6
Now you can use a CTE to generate the numbers 1 to 25 recursively and exclude those that are in your #numbers table in the WHERE clause:
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 25
)
SELECT n FROM n
WHERE n NOT IN (select num from #numbers)
ORDER BY n
OPTION (MAXRECURSION 25);
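You can try an anti-join (LEFT OUTER JOIN ... IS NULL). Note that it returns only the first free id after each occupied one, not every free id in the gap:
select
u1.id + 1 as start
from users as u1
left outer join users as u2 on u1.id + 1 = u2.id
where
u2.id is null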
see also SQL query to find Missing sequence numbers
You need LISTAGG to get the output in a single row.
WITH data1 AS(
SELECT LEVEL rn FROM dual CONNECT BY LEVEL <=25
),
data2 AS(
SELECT 2 num FROM dual UNION ALL
SELECT 3 FROM dual UNION ALL
SELECT 4 FROM dual UNION ALL
SELECT 5 FROM dual UNION ALL
SELECT 10 FROM dual UNION ALL
SELECT 12 FROM dual UNION ALL
SELECT 16 FROM dual UNION ALL
SELECT 18 FROM dual UNION ALL
SELECT 21 FROM dual UNION ALL
SELECT 22 FROM dual)
SELECT listagg(rn, ',')
WITHIN GROUP (ORDER BY rn) num_list FROM data1
WHERE rn NOT IN(SELECT num FROM data2);
NUM_LIST
----------------------------------------------------
1,6,7,8,9,11,13,14,15,17,19,20,23,24,25

Calculating data point which have Precision of 99%

We have a table with millions of entries. The table has two columns, X and Y, and there is a correlation between them: when X is beyond a certain value, Y tends to be 'B' (this is not always true; it is a trend, not a certainty).
Here I want to find the threshold value for X (call it X1) such that at least 99% of the values less than X1 are 'B'.
This can be done easily in code, but is there a SQL query that can do the computation?
For the dataset below, the expected answer is 6, because below 6 more than 99% of the values are 'B' and there is no bigger value of X for which more than 99% are 'B'. However, if I change the precision to 90%, the answer becomes 12, because for X < 12 more than 90% of the values are 'B' and there is no bigger value of X for which this holds.
So we need to find the biggest value X1 such that at least 99% of the values less than X1 are 'B'.
X Y
------
2 B
3 B
3 B
4 B
5 B
5 B
5 B
6 G
7 B
7 B
7 B
8 B
8 B
8 B
12 G
12 G
12 G
12 G
12 G
12 G
12 G
12 G
13 G
13 G
13 B
13 G
13 G
13 G
13 G
13 G
14 B
14 G
14 G
Ok, I think this accomplishes what you want to do, but it will not work for the data volume you are mentioning. I'm posting it anyway in case it can help someone else provide an answer.
This may be one of those cases where the most efficient way is to use a cursor with sorted data.
Oracle has some built-in functions for correlation analysis, but I've never worked with them, so I don't know how they work.
select max(x)
from (select x
,y
,num_less
,num_b
,num_b / nullif(num_less,0) as percent_b
from (select x
,y
,(select count(*) from table_name b where b.x<a.x) as num_less
,(select count(*) from table_name b where b.x<a.x and b.y = 'B') as num_b
from table_name a
)
where num_b / nullif(num_less,0) >= 0.99
);
The inner SELECT does the following for every value of X:
it counts the number of rows with a smaller X (num_less)
it counts how many of those rows are 'B' (num_b)
The next SELECT computes the ratio of B's and keeps only the rows where the ratio is at or above the threshold. The outer SELECT just picks the max(x) from the remaining rows.
Edit:
The non-scalable part in the above query is the semi-cartesian self-joins.
This is mostly inspired by the previous answer, which had some flaws.
select max(next_x) from
(
select
count(case when y='B' then 1 end) over (order by x) correct,
count(case when y='G' then 1 end) over (order by x) wrong,
lead(x) over (order by x) next_x
from table_name
)
where correct/(correct + wrong) > 0.99
Sample data:
create table table_name(x number, y varchar2(1));
insert into table_name
select 2, 'B' from dual union all
select 3, 'B' from dual union all
select 3, 'B' from dual union all
select 4, 'B' from dual union all
select 5, 'B' from dual union all
select 5, 'B' from dual union all
select 5, 'B' from dual union all
select 6, 'G' from dual union all
select 7, 'B' from dual union all
select 7, 'B' from dual union all
select 7, 'B' from dual union all
select 8, 'B' from dual union all
select 8, 'B' from dual union all
select 8, 'B' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'B' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 14, 'B' from dual union all
select 14, 'G' from dual union all
select 14, 'G' from dual;
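To make the running-count idea in the query above easier to follow, here is a rough Python equivalent run on the same sample data. The function name `largest_threshold` is hypothetical, and the sketch assumes Y only takes the values 'B' and 'G'. Like the query, it uses a strict comparison against the precision.

```python
from itertools import groupby

# Sample data from the question as (x, y) pairs.
DATA = (
    [(2, "B"), (3, "B"), (3, "B"), (4, "B"), (5, "B"), (5, "B"), (5, "B"), (6, "G")]
    + [(7, "B")] * 3 + [(8, "B")] * 3
    + [(12, "G")] * 8
    + [(13, "G"), (13, "G"), (13, "B")] + [(13, "G")] * 5
    + [(14, "B"), (14, "G"), (14, "G")]
)


def largest_threshold(data, precision):
    """Return the largest x1 such that the share of 'B' among rows with
    x < x1 is above precision, or None if no such x1 exists."""
    ordered = sorted(data, key=lambda row: row[0])
    grouped = [(x, [y for _, y in rows])
               for x, rows in groupby(ordered, key=lambda row: row[0])]
    correct = total = 0
    best = None
    for i, (x, labels) in enumerate(grouped):
        # Running counts over all rows with a value <= x, like
        # COUNT(...) OVER (ORDER BY x) with its default frame.
        correct += labels.count("B")
        total += len(labels)
        # The candidate threshold is the next distinct x, like LEAD(x).
        if i + 1 < len(grouped) and correct / total > precision:
            best = grouped[i + 1][0]
    return best


print(largest_threshold(DATA, 0.99))
print(largest_threshold(DATA, 0.90))
```

On the sample data this gives 6 for a precision of 99% and 12 for 90%, the values the question expects. Because it makes a single pass over sorted data, it avoids the self-joins that made the first query hard to scale.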
Give this a try and share the results, assuming the table is named table_name and the columns are x and y:
with TAB AS (
select (count(x) over (PARTITION BY Y order by x rows between unbounded preceding and current row))/
(COUNT(case when y='B' then 1 end) OVER (PARTITION BY Y)) * 100 CC, x, y
from table_name)
select x,y from (SELECT min(cc) over (partition by y) min_cc, x, cc, y
FROM TAB
where cc >= 99)
where min_cc = cc

Oracle - Convert value from rows into ranges

Are there any techniques that would allow a row set like this
WITH
base AS
(
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
SELECT a.N
FROM base a
to yield results
1 3
6 7
17 19
21 21
It is in effect a rows to ranges operation.
I'm playing in Oracle Land, and would appreciate any suggestions.
I feel like this can probably be improved on, but it works:
WITH base AS (
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
, lagged AS
(
SELECT n, LAG(n) OVER (ORDER BY n) lag_n FROM base
)
, groups AS
(
SELECT n, row_number() OVER (ORDER BY n) groupnum
FROM lagged
WHERE lag_n IS NULL OR lag_n < n-1
)
, grouped AS
(
SELECT n, (SELECT MAX(groupnum) FROM groups
WHERE groups.n <= base.n
) groupnum
FROM base
)
SELECT groupnum, MIN(n), MAX(n)
FROM grouped
GROUP BY groupnum
ORDER BY groupnum
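The same "start a new group whenever the previous value is not n - 1" logic is easy to check outside the database. Below is a minimal Python sketch on the sample numbers; the function name `to_ranges` is only illustrative.

```python
def to_ranges(numbers):
    """Collapse integers into (start, end) ranges of consecutive values."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n <= ranges[-1][1] + 1:
            # n continues the current range (or repeats its last value).
            ranges[-1][1] = n
        else:
            # Gap before n: this is where the SQL assigns a new groupnum.
            ranges.append([n, n])
    return [tuple(r) for r in ranges]


result = to_ranges([1, 2, 3, 6, 7, 17, 18, 19, 21])
print(result)
```

For the sample rows this yields (1, 3), (6, 7), (17, 19) and (21, 21), the ranges asked for in the question.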
Another way:
WITH base AS
(
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
select min(n), max(n) from
(
select n, connect_by_root n root from base
connect by prior n = n-1
start with n not in (select n from base b
where exists (select 1 from base b1 where b1.n = b.n-1)
)
)
group by root
order by root
Yet another way:
with base as (
select 1 n from dual union all
select 2 n from dual union all
select 3 n from dual union all
select 6 n from dual union all
select 7 n from dual union all
select 17 n from dual union all
select 18 n from dual union all
select 19 n from dual union all
select 21 n from dual)
select a,b
from (select a
,case when b is not null and a is not null
then b
else lead(n) over (order by n)
end b
from (select n
,a
,b
from (select n
,case n-1 when lag (n) over (order by n) then null else n end a
,case n+1 when lead (n) over (order by n) then null else n end b
from base)
where a is not null
or b is not null))
where a is not null
order by a