How to select most dense 1 min in Oracle - sql

I have table with time stamp column tmstmp, this table contains log of certain events. I need to find out the max number events which occurred within 1 min interval.
Please read carefully! I do NOT want to extract the time stamps minute fraction and sum like this:
select count(*), TO_CHAR(tmstmp,'MI')
from log_table
group by TO_CHAR(tmstmp,'MI')
order by TO_CHAR(tmstmp,'MI');
It needs to take 1st record and then look ahead until it selects all records within 1 min from the 1st and sum number of records, then take 2nd and do the same etc..
And as the result there must be a recordset of (sum, starting timestamp).
Anyone has a snippet of code somewhere and care to share please?

Analytic function with a logical window can provide this information directly:
select l.tmstmp,
count(*) over (order by tmstmp range between current row and interval '59.999999' second following) cnt
from log_table l
order by 1
;
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4
01.01.16 00:00:15,000000000 3
01.01.16 00:00:20,000000000 2
01.01.16 00:01:00,000000000 3
01.01.16 00:01:40,000000000 2
01.01.16 00:01:50,000000000 1
Please adjust the interval length for your precision. It must be the highest possible value below 1 minute.
To get the maximal minute use the subquery (and don't forget you may receive more that one record - with the MAX count):
with tst as (
select l.tmstmp,
count(*) over (order by tmstmp range between current row and interval '59.999999' second following) cnt
from log_table l)
select * from tst where cnt = (select max(cnt) from tst);
TMSTMP CNT
--------------------------- ----------
01.01.16 00:00:00,000000000 4
01.01.16 00:00:10,000000000 4

I think you can achieve your goal using a subquery in SELECT statement, as follow:
SELECT tmstmp, (
SELECT COUNT(*)
FROM log_table t2
WHERE t2.tmstmp >= t.tmstmp AND t2.tmstmp < t.tmstmp + 1 / (24*60)
) AS events
FROM log_table t;

One method uses a join and aggregation:
select t.*
from (select l.tmstmp, count(*)
from log_table l join
log_table l2
on l2.tmstmp >= l.tmstmp and
l2.tmstmp < l.tmstmp + interval '1' minute
group by l.tmpstmp
order by count(*) desc
) t
where rownum = 1;
Note: This assumes that tmstmp is unique on each row. If this is not true, then the subquery should be aggregating by some column that is unique.
EDIT:
For large data, there is a more efficient way that makes use of cumulative sums:
select tmstamp - interval 1 minute as starttm, tmstamp as endtm, cumulative
from (select tmstamp, sum(inc) over (order by tmstamp) as cumulative
from (select tmstamp, 1 as inc from log_table union all
select tmstamp + interval '1' day, -1 as inc from log_table
) t
order by sum(inc) over (order by tmstamp) desc
) t
where rownum = 1;

Related

Get the row where the sum of a value matches a condition

I have a table with the columns:
date (timestamp)
num (integer)
Looks like this in CSV:
"date","num"
"2018-02-07 00:00:00","1"
"2018-02-16 00:00:00","1"
"2018-03-02 00:00:00","4"
"2018-04-04 00:00:00","6"
"2018-06-07 00:00:00","1"
I want different queries to figure out the following:
A: The earliest date that the sum of num is >= 1
B: The earliest date that the sum of num is >= 2
In the sample data the output would be A: 2018-02-07 and B: 2018-02-16.
Note that if the first date in the data had a num higher than 1 then A and B would both equal the same date.
Grouping and using MIN(date) would be good enough to satisfy A but I can't figure out how to get B to work if there are two days with num = 1 right after another. Any ideas are appreciated.
Use a cumulative sum. For a single number:
select t.*
from (select t.*, sum(num) over (order by date) as running
from t
) t
where running >= 1 and running - num < 1
order by date
limit 1;
If you wanted multiple thresholds at the same time:
select min(date) filter (where running >= 1) as date_1,
min(date) filter (where running >= 2) as date_2
from (select t.*, sum(num) over (order by date) as running
from t
) t;
Or, if you want them on separate rows:
select distinct on (threshold) v.threshold, t.*
from (select t.*, sum(num) over (order by date) as running
from t
) t cross join
(values (1), (2)) v(threshold)
where running >= threshold and running - num < threshold
order by threshold, date

Is there a way to change this BigQuery self-join to use a window function?

Let's say I have a BigQuery table "events" (in reality this is a slow sub-query) that stores the count of events per day, by event type. There are many types of events and most of them don't occur on most days, so there is only a row for day/event type combinations with a non-zero count.
I have a query that returns the count for each event type and day and the count for that event from N days ago, which looks like this:
WITH events AS (
SELECT DATE('2019-06-08') AS day, 'a' AS type, 1 AS count
UNION ALL SELECT '2019-06-09', 'a', 2
UNION ALL SELECT '2019-06-10', 'a', 3
UNION ALL SELECT '2019-06-07', 'b', 4
UNION ALL SELECT '2019-06-09', 'b', 5
)
SELECT e1.type, e1.day, e1.count, COALESCE(e2.count, 0) AS prev_count
FROM events e1
LEFT JOIN events e2 ON e1.type = e2.type AND e1.day = DATE_ADD(e2.day, INTERVAL 2 DAY) -- LEFT JOIN, because the event may not have occurred at all 2 days ago
ORDER BY 1, 2
The query is slow. BigQuery best practices recommend using window functions instead of self-joins. Is there a way to do this here? I could use the LAG function if there was a row for each day, but there isn't. Can I "pad" it somehow? (There isn't a short list of possible event types. I could of course join to SELECT DISTINCT type FROM events, but that probably won't be faster than the self-join.)
A brute force method is:
select e.*,
(case when lag(day) over (partition by type order by date) = dateadd(e.day, interval -2 day)
then lag(cnt) over (partition by type order by date)
when lag(day, 2) over (partition by type order by date) = dateadd(e.day, interval -2 day)
then lag(cnt, 2) over (partition by type order by date)
end) as prev_day2_count
from events e;
This works fine for a two day lag. It gets more cumbersome for longer lags.
EDIT:
A more general form uses window frames. Unfortunately, these have to be numeric so there is an additional step:
select e.*,
(case when min(day) over (partition by type order by diff range between 2 preceding and current day) = date_add(day, interval -2 day)
then first_value(cnt) over (partition by type order by diff range between 2 preceding and current day)
end)
from (select e.*,
date_diff(day, max(day) over (partition by type), day) as diff -- day is a bad name for a column because it is a date part
from events e
) e;
And duh! The case expression is not necessary:
select e.*,
first_value(cnt) over (partition by type order by diff range between 2 preceding and 2 preceding)
from (select e.*,
date_diff(day, max(day) over (partition by type), day) as diff -- day is a bad name for a column because it is a date part
from events e
) e;
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, IFNULL(FIRST_VALUE(count) OVER (win), 0) prev_count
FROM `project.dataset.events`
WINDOW win AS (PARTITION BY type ORDER BY UNIX_DATE(day) RANGE BETWEEN 2 PRECEDING AND 2 PRECEDING)
If t apply to sample data from your question - result is:
Row day type count prev_count
1 2019-06-08 a 1 0
2 2019-06-09 a 2 0
3 2019-06-10 a 3 1
4 2019-06-07 b 4 0
5 2019-06-09 b 5 4

Vertica Analytic function to count instances in a window

Let's say I have a dataset with two columns: ID and timestamp. My goal is to count return IDs that have at least n timestamps in any 30 day window.
Here is an example:
ID Timestamp
1 '2019-01-01'
2 '2019-02-01'
3 '2019-03-01'
1 '2019-01-02'
1 '2019-01-04'
1 '2019-01-17'
So, let's say I want to return a list of IDs that have 3 timestamps in any 30 day window.
Given above, my resultset would just be ID = 1. I'm thinking some kind of windowing function would accomplish this, but I'm not positive.
Any chance you could help me write a query that accomplishes this?
A relatively simple way to do this involves lag()/lead():
select t.*
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
The lag() looks at the third timestamp in a series. The where checks if this is within 30 days of the original one. The result is rows where this occurs.
If you just want the ids, then:
select distinct id
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;

SQL query to calculate sum of row with previous row

This is my table
date count of subscription per date
---- ----------------------------
21-03-2016 10
22-03-2016 30
23-03-2016 40
Please need your help, I need to get the result like below table, summation second row with first row, same thing for another rows:
date count of subscription per date
---- ----------------------------
21-03-2016 10
22-03-2016 40
23-03-2016 80
You can do a cumulative sum using the ANSI standard analytic SUM() function:
select date, sum(numsubs) over (order by date) as cume_numsubs
from t;
Select sum(col1) over(order by date rows between unbounded preceding and current row) cnt from mytable;
SELECT t.date, (
SELECT SUM(numsubs)
FROM mytable t2
WHERE t2.date <= t.date
) AS cnt
FROM mytable t

Counting an already counted column in SQL (db2)

I'm pretty new to SQL and have this problem:
I have a filled table with a date column and other not interesting columns.
date | name | name2
2015-03-20 | peter | pan
2015-03-20 | john | wick
2015-03-18 | harry | potter
What im doing right now is counting everything for a date
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
what i want to do now is counting the resulting lines and only returning them if there are less then 10 resulting lines.
What i tried so far is surrounding the whole query with a temp table and the counting everything which gives me the number of resulting lines (yeah)
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select count(*)
from temp_count
What is still missing the check if the number is smaller then 10.
I was searching in this Forum and came across some "having" structs to use, but that forced me to use a "group by", which i can't.
I was thinking about something like this :
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
having count(*) < 10
maybe im too tired to think of an easy solution, but i can't solve this so far
Edit: A picture for clarification since my english is horrible
http://imgur.com/1O6zwoh
I want to see the 2 columned results ONLY IF there are less then 10 rows overall
I think you just need to move your having clause to the inner query so that it is paired with the GROUP BY:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
having count(*) < 10
)
select *
from temp_count
If what you want is to know whether the total # of records (after grouping), are returned, then you could do this:
with temp_count (date, counter) as
(
select date, counter=count(*)
from testtable
where date >= current date - 10 days
group by date
)
select date, counter
from (
select date, counter, rseq=row_number() over (order by date)
from temp_count
) x
group by date, counter
having max(rseq) >= 10
This will return 0 rows if there are less than 10 total, and will deliver ALL the results if there are 10 or more (you can just get the first 10 rows if needed with this also).
In your temp_count table, you can filter results with the WHERE clause:
with temp_count (date, counter) as
(
select date, count(distinct date)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
where counter < 10
Something like:
with t(dt, rn, cnt) as (
select dt, row_number() over (order by dt) as rn
, count(1) as cnt
from testtable
where dt >= current date - 10 days
group by dt
)
select dt, cnt
from t where 10 >= (select max(rn) from t);
will do what you want (I think)