Running a Count on an Interval - sql

I'm trying to do an alert of sorts for customers joining. The alert needs to run on an interval of one hour, which is possible with an integration we have.
The sample data is this:
Name   Time
John   2022-04-21T13:49:51
Mary   2022-04-23T13:49:51
Dave   2022-04-25T13:49:51
Gregg  2022-04-27T13:49:51
The problem with the query below is that it only counts the rows from the last hour, so it yields no results. What I'm trying to determine is the moment (well, within the hour) when the total count crosses above 3. Is there something I'm missing?
SELECT COUNT (name)
FROM Table
WHERE Time >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 MINUTE)
HAVING COUNT (NAME) > 3
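One possible reading of the requirement, sketched in Python rather than SQL: compare the total count now with the total count one hour ago, and alert only when the threshold sits between the two. The function name crossed_threshold and its parameters are made up for illustration, and the check times are assumptions; only the four sample rows come from the question.

```python
from datetime import datetime, timedelta

def crossed_threshold(join_times, now, threshold=3, window=timedelta(hours=1)):
    """Return True if the total count rose above `threshold` within the last `window`."""
    cutoff = now - window
    # Total customers who have joined up to `now`.
    total = sum(1 for t in join_times if t <= now)
    # Customers who had already joined before the window started.
    before = sum(1 for t in join_times if t < cutoff)
    # Alert only when the crossing itself happened inside the window.
    return before <= threshold < total

# Sample data from the question.
joins = [
    datetime(2022, 4, 21, 13, 49, 51),  # John
    datetime(2022, 4, 23, 13, 49, 51),  # Mary
    datetime(2022, 4, 25, 13, 49, 51),  # Dave
    datetime(2022, 4, 27, 13, 49, 51),  # Gregg
]

# Hypothetical run times of the hourly job.
print(crossed_threshold(joins, datetime(2022, 4, 27, 14, 0, 0)))  # Gregg joined within the last hour
print(crossed_threshold(joins, datetime(2022, 4, 27, 15, 0, 0)))  # the crossing is older than one hour
```

The same idea should translate to SQL as two conditional counts over the whole table (all rows, and rows older than one hour) instead of a WHERE filter on the last hour.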

Related

How to aggregate unique users by hour in Amazon Redshift?

With Amazon Redshift I want to count every unique visitor.
A visit counts as unique if the same visitor did not visit within the previous hour.
So for the following rows of users and timestamps we'd get a total count of 4 unique visitors, with user1 and user2 counting as 2 each.
Please note that I do not want to aggregate by the hours of a 24-hour day. I want to aggregate by the hour that follows the timestamp of the user's visit.
I'm guessing a straight up SQL expression won't do it.
user1,"2015-07-13 08:28:45.247000"
user1,"2015-07-13 08:30:17.247000"
user1,"2015-07-13 09:35:00.030000"
user1,"2015-07-13 09:54:00.652000"
user2,"2015-07-13 08:28:45.247000"
user2,"2015-07-13 08:30:17.247000"
user2,"2015-07-13 09:35:00.030000"
user2,"2015-07-13 09:54:00.652000"
So user1 arrives at 8:28, which counts as one hit. He comes back at 8:30, which counts as zero. He then comes back at 9:35, which is more than an hour after 8:30, so he gets another hit. Then he comes back at 9:54, which is only 19 minutes after the last visit at 9:35, so this counts as zero. The total is 2 hits for user1. The same thing happens for user2, meaning two hits each, which brings the final total to 4.
You can use lag to accomplish this. However, you will also have to handle the end of day by partitioning on day as well. The query below would be a starting point.
with prev as (
select user_id,
datecol,
-- previous visit by the same user; null for the user's first visit
lag(datecol) over (partition by user_id order by datecol) as prev_visit
from tablename
)
select user_id,
-- a visit counts when it is the first one or comes at least 60 minutes after the previous one
sum(case when prev_visit is null or datediff(minutes, prev_visit, datecol) >= 60 then 1 else 0 end) as totalvisits
from prev
group by user_id
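To check the logic against the sample rows, here is a rough Python sketch of the same "compare with the previous visit" idea. The helper count_visits is hypothetical, and it compares exact elapsed time rather than calling a database function, so treat it as an illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def count_visits(rows, gap=timedelta(hours=1)):
    """Count visits per user that are the first one or come `gap` or more after the previous one."""
    by_user = defaultdict(list)
    for user, ts in rows:
        by_user[user].append(ts)

    totals = {}
    for user, stamps in by_user.items():
        stamps.sort()
        hits = 0
        prev = None  # plays the role of lag(): the previous visit of this user
        for ts in stamps:
            if prev is None or ts - prev >= gap:
                hits += 1
            prev = ts
        totals[user] = hits
    return totals

# Sample rows from the question.
fmt = "%Y-%m-%d %H:%M:%S.%f"
times = [
    "2015-07-13 08:28:45.247000",
    "2015-07-13 08:30:17.247000",
    "2015-07-13 09:35:00.030000",
    "2015-07-13 09:54:00.652000",
]
rows = [(user, datetime.strptime(t, fmt)) for user in ("user1", "user2") for t in times]

totals = count_visits(rows)
print(totals, sum(totals.values()))
```

Note that this compares each visit with the immediately preceding one, which matches the worked example in the question.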

Plot a graph of the number of people online between a time range

I have a database model that stores
visit time
last seen time
how many seconds online (derived value, calculated by subtracting visit time from last seen time)
I need to build a graph of online people for a time range (say 8pm to 9pm). I'm thinking of the x-axis as the time with the y-axis as the number of people. The granularity is 1 minute for the graph, but I have data granular to 5 seconds.
I can't just sum the seconds online value because people visit before or after 8pm.
I was thinking of just loading up all records found in a particular day and doing calculations in memory (which I would probably do for now, then just cache the derived values for later) but I wanted to know if there's a more efficient way?
I wonder if there's a special sql query group by thing I can do to make this work.
Edit: Here's a graphical representation I am stealing from another question (Count Grouped Gaps In Time For Time Range) :P
|.=========]
|.=============]
|=========.======]
|===.=================.====]
|.=================.==========]
T 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
The bars represent the data I've stored (visit time, last seen time, and seconds online), and I need to know how many people are online at a particular point. In this example, for T=0 the number is 3 and for T=9 the number is 4.
Q: I can't understand what you mean with "but I have data granular to 5 seconds", how many records do you store per visit? Can you add some example data?
A: There's only one record per visit. Granular to 5 seconds means I'm storing up to 5 seconds worth of accurate data.
Sample data as requested:
id visit_time last_seen_time seconds_online
1 00:00:00 00:00:12 10
2 00:12:41 00:12:47 5
3 00:01:20 00:01:22 0
4 00:01:22 00:01:27 5
In this particular case, if I graph the people online at 00:00:00 there would be one person, until 00:00:15 where there would be 0 people:
4|
3|
2| *
1|* *
-*****-******************-
A very interesting and hard question. If we suppose that the graph interval is one hour (for example from 8:00 to 8:59) and the granularity is one minute, we can simplify the problem by extracting these date parts (in PostgreSQL the function to use is EXTRACT). I also suggest using Common Table Expressions (CTEs).
We can then build a CTE to have first minute and last minute of each visit in the target hour, like:
SELECT CASE WHEN EXTRACT(hour FROM visit_time) = 8
THEN EXTRACT(minute FROM visit_time)
ELSE 0 END AS first_minute,
CASE WHEN EXTRACT(hour FROM last_seen_time) = 8
THEN EXTRACT(minute FROM last_seen_time)
ELSE 59 END AS last_minute
FROM visit_table
WHERE EXTRACT(hour FROM visit_time) <= 8 AND EXTRACT(hour FROM last_seen_time) >= 8
The number of visitors changes when a visit begins or ends, so we can build a second CTE from the first that lists all the minutes where the visitor count changes. Let's name the first CTE target; the second can then be defined as:
SELECT first_minute AS minute
FROM target
UNION
SELECT last_minute AS minute
FROM target
The UNION will also eliminate duplicates.
Finally we can join the two tables and count the visitors:
WITH target AS (
SELECT CASE WHEN EXTRACT(hour FROM visit_time) = 8
THEN EXTRACT(minute FROM visit_time)
ELSE 0 END AS first_minute,
CASE WHEN EXTRACT(hour FROM last_seen_time) = 8
THEN EXTRACT(minute FROM last_seen_time)
ELSE 59 END AS last_minute
FROM visit_table
WHERE EXTRACT(hour FROM visit_time) <= 8
AND EXTRACT(hour FROM last_seen_time) >= 8
), time_table AS (
SELECT first_minute AS minute
FROM target
UNION
SELECT last_minute AS minute
FROM target
)
SELECT time_table.minute, COUNT(*) AS Users
FROM target INNER JOIN
time_table ON time_table.minute BETWEEN target.first_minute
AND target.last_minute
GROUP BY time_table.minute
ORDER BY time_table.minute
You should obtain a table whose first record holds the first minute within the target hour with at least one online visitor, together with the number of people online. After that there is one record for each change in the number of people online, giving the minute of the change and the new count. You can easily build your graph from this.
Sorry that I couldn't test this solution, but I hope it helps anyway.
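For comparison, the in-memory calculation mentioned in the question can be sketched in a few lines of Python. The function online_per_minute is a made-up name, the calendar date is an arbitrary assumption (the sample data only has times of day), and a visit is treated as online in a minute if it overlaps that minute at all.

```python
from datetime import datetime, timedelta

def online_per_minute(visits, start, minutes):
    """Return a list with the number of visits overlapping each one-minute bucket."""
    counts = []
    for i in range(minutes):
        bucket_start = start + timedelta(minutes=i)
        bucket_end = bucket_start + timedelta(minutes=1)
        # A visit overlaps the bucket if it starts before the bucket ends
        # and was last seen at or after the bucket start.
        n = sum(1 for first, last in visits if first < bucket_end and last >= bucket_start)
        counts.append(n)
    return counts

# Sample data from the question, placed on an arbitrary date (assumption).
day = datetime(2020, 1, 1)

def at(h, m, s):
    return day + timedelta(hours=h, minutes=m, seconds=s)

visits = [
    (at(0, 0, 0), at(0, 0, 12)),
    (at(0, 12, 41), at(0, 12, 47)),
    (at(0, 1, 20), at(0, 1, 22)),
    (at(0, 1, 22), at(0, 1, 27)),
]

counts = online_per_minute(visits, day, 15)
print(counts)
```

This loops over every visit for every minute, so it is only reasonable for small ranges such as a single hour.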

Query - find empty interval in series of timestamps

I have a table that stores historical data. A row is inserted into this table every 30 seconds from different types of sources, and each row has a timestamp associated with it.
Let's set my disservice threshold to 1 hour.
Since I charge for my services based on time, I need to know whether, for example, in a specific month there is a period in which the gap between rows equals or exceeds my 1 hour interval.
A simplified structure of the table would be like:
tid serial primary key,
tunitd id int,
tts timestamp default now(),
tdescr text
I don't want to write a function that loops through all the records comparing them one by one as I suppose it is time and memory consuming.
Is there any way to do this directly from SQL maybe using the interval type in PostgreSQL?
Thanks.
This small SQL query will display all gaps that last more than one hour:
select tts, next_tts, next_tts-tts as diff from
(select a.tts, min(b.tts) as next_tts
from test1 a
inner join test1 b ON a.tts < b.tts
GROUP BY a.tts) as c
where next_tts - tts > INTERVAL '1 hour'
order by tts;
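The query pairs each timestamp with the next later one and keeps the pairs that are more than an hour apart. A minimal Python sketch of that idea, with a hypothetical find_gaps helper and invented sample readings:

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap=timedelta(hours=1)):
    """Return (tts, next_tts, diff) for consecutive timestamps more than `min_gap` apart."""
    ordered = sorted(timestamps)
    # zip pairs each timestamp with the one that follows it.
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]

# Invented readings: two close together, then a long silence, then two more.
readings = [
    datetime(2024, 1, 1, 0, 0, 0),
    datetime(2024, 1, 1, 0, 0, 30),
    datetime(2024, 1, 1, 2, 0, 0),
    datetime(2024, 1, 1, 2, 0, 30),
]

gaps = find_gaps(readings)
for tts, next_tts, diff in gaps:
    print(tts, next_tts, diff)
```

Like the query, this uses a strict comparison, so a gap of exactly one hour is not reported; use >= if it should be.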

Oracle timestamp difference greater than X hours/days/months

I am trying to write a query to run on Oracle database. The table ActionTable contains actionStartTime and actionEndTime columns. I need to find out which action took longer than 1 hour to complete.
actionStartTime and actionEndTime are of timestamp type
I have a query which gives me the time taken for each action:
select (actionEndTime - actionStartTime) actionDuration from ActionTable
What would be my where clause that would return only actions that took longer than 1 hour to finish?
Subtracting two timestamps returns an interval. So you'd want something like
SELECT (actionEndTime - actionStartTime) actionDuration
FROM ActionTable
WHERE actionEndTime - actionStartTime > interval '1' hour

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine whether it is possible, using only SQL for Postgres, to select a range of time-ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records, then looping through the results and pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes, you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that adding a computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. It may not be very selective in this case, but it is good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. It is hard to find in bookstores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
This works if the intervals are not time based and you just want every 5th row, or if the times are regular and you always have one record per minute.
The query below gives you one record out of every 5:
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudocode:
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select tbl.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from tbl -- tbl is a placeholder for your table name
group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or if your readings skip a minute. Even better than min would be one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
-- keep rows whose minute is a multiple of 5 (0, 5, 10, ..., 55)
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
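The bucketing idea behind the safer version can also be sketched in Python. The helper first_per_bucket is hypothetical, and it assumes the bucket size divides 60. Unlike grouping on the minute alone, this sketch keys each bucket on the full timestamp, so readings from different hours stay separate. The first two readings are the ones from the example above; the rest are invented.

```python
from datetime import datetime

def first_per_bucket(timestamps, minutes=5):
    """Return the earliest timestamp in each `minutes`-wide bucket, in time order."""
    earliest = {}
    for ts in timestamps:
        # Floor the timestamp to the start of its bucket, e.g. 07:03:30 -> 07:00:00.
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        if bucket not in earliest or ts < earliest[bucket]:
            earliest[bucket] = ts
    return [earliest[b] for b in sorted(earliest)]

readings = [
    datetime(2011, 1, 1, 7, 0, 2),
    datetime(2011, 1, 1, 7, 0, 59),
    datetime(2011, 1, 1, 7, 3, 30),   # invented
    datetime(2011, 1, 1, 7, 5, 10),   # invented
    datetime(2011, 1, 1, 7, 11, 0),   # invented
]

picked = first_per_bucket(readings)
for ts in picked:
    print(ts)
```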