I am trying to get the count of items within an interval whose start and stop times are not specified in advance. I would imagine you could do it with window functions, but I am not too sure how to go about it.
The problem is as follows: I would like to get the number of times people log in to a website within an arbitrary interval, say 20 minutes.
Example A
1. 2015-06-24 23:00:00
2. 2015-06-24 23:45:00
3. 2015-06-25 00:00:00
4. 2015-06-25 00:15:00
5. 2015-06-25 00:17:00
6. 2015-06-25 00:21:00
In the above example I would highlight items (2,3), (3,4,5), (4,5,6), and (5,6). The output I would like is:
start,end,count
2015-06-24 23:45:00,2015-06-25 00:00:00,2
2015-06-25 00:00:00,2015-06-25 00:17:00,3
2015-06-25 00:15:00,2015-06-25 00:21:00,3
Also, only keep the data where count >= 2; otherwise every single row would be a valid grouping on its own.
Now, is a window function the way I should go, a CTE, or is there another practice to adopt?
Try this query with a self join:
-- For each login, count every login (including itself) within the following 20 minutes
select a.id, a.log_at as start_at, max(b.log_at) as end_at, count(*) as cnt
from logs a
join logs b on b.log_at >= a.log_at
           and b.log_at <= a.log_at + '20 minutes'::interval
group by 1, 2
having count(*) > 1
order by 1;
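If you do want to go the window-function route the question mentions, the same grouping can be expressed with a RANGE frame. This is a minimal sketch, assuming PostgreSQL 11 or later and the same logs(log_at) table as the self join above:
-- Sketch: count logins in the 20 minutes starting at each row (PostgreSQL 11+).
SELECT *
FROM (
      SELECT log_at             AS start_at,
             MAX(log_at) OVER w AS end_at,
             COUNT(*)    OVER w AS cnt
      FROM logs
      WINDOW w AS (ORDER BY log_at
                   RANGE BETWEEN CURRENT ROW AND INTERVAL '20 minutes' FOLLOWING)
     ) s
WHERE cnt >= 2   -- window results cannot be filtered in WHERE/HAVING directly, hence the subquery
ORDER BY start_at;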
You can get per-day groups with counts using a query like:
SELECT MIN(last_seen_at), MAX(last_seen_at), COUNT(*)
FROM user_kinds
GROUP BY DATE(last_seen_at)
ORDER BY DATE(last_seen_at) DESC LIMIT 5;
Which on my sample data set yields a result like:
2015-06-26 00:12:30.476548 | 2015-06-26 22:06:25.134322 | 69
2015-06-25 00:46:03.392651 | 2015-06-25 23:49:46.616964 | 14
2015-06-24 14:22:33.578176 | 2015-06-24 23:39:01.32241 | 10
2015-06-23 01:42:53.438663 | 2015-06-23 20:12:21.864601 | 2
(5 rows)
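The same idea can be applied to fixed 20-minute buckets instead of calendar days. A sketch, assuming PostgreSQL and the user_kinds table from the query above (note these are fixed buckets, not windows starting at each login like the self join earlier):
-- Sketch: group logins into fixed 20-minute (1200-second) buckets.
SELECT to_timestamp(floor(extract(epoch FROM last_seen_at) / 1200) * 1200) AS bucket_start,
       MIN(last_seen_at),
       MAX(last_seen_at),
       COUNT(*)
FROM user_kinds
GROUP BY bucket_start
HAVING COUNT(*) >= 2
ORDER BY bucket_start DESC;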
Related
Is there a way to find the solution I need? For 2 days there are 2 UDs, because June 24 appears 2 times, and for the rest there are single days.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is: there are 4 users who used the application on 1 day, and there are 2 users who used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;
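If the intended logic is instead to count users by how many distinct days each one was active (as the last paragraph of the question suggests), the inner query can group by the user column. A sketch, assuming the UD and Date column names from the sample data:
-- Sketch: number of users per count of distinct active days.
select days, count(*) as no_of_users
from (select UD, count(distinct Date) as days
      from t
      group by UD
     ) d
group by days
order by days desc;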
I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (see https://stackoverflow.com/a/43055820), I currently have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However, I need the difference to reset when a new group is reached, as shown below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
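If you do want the hh:mm:ss representation shown in the question, one way is to add the seconds back to a zero datetime and convert to a time. A minimal sketch, assuming SQL Server and that the column really is named group (hence the brackets):
-- Sketch: same LAG/DATEDIFF, with the seconds difference converted back to a time value.
select t.*,
       convert(time(0),
               dateadd(second,
                       datediff(second,
                                lag(time) over (partition by [group] order by id),
                                time),
                       0)) as difference
from t;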
I am trying to work out how to apply a datediff between rows where a rank is applied to the USER ID.
Example of the data below:
UserID   Order Number   ScanDateStart   ScanDateEnd   Minute Difference   Rank | Minute Difference Rank vs Rank+1
User1    10-24          10:20:00        10:40:00      20                  1    | 5
User1    10-25          10:45:00        10:50:00      5                   2    | 33
User1    10-26          11:12:00        11:45:00      33                  3    | NULL
User2    10-10          00:09:00        00:09:20      20                  1    | 4
User2    10-11          00:09:24        00:09:25      1                   2    | 15
User2    10-12          00:09:40        00:10:12      32                  3    | 3
User2    10-13          00:10:15        00:10:35      20                  4    | NULL
What I'm looking for is how to code the final column of this table.
The rank is applied per UserID, ordered by ScanDateStart.
Basically, I want to know the time between the ScanDateEnd of Rank 1 and the ScanDateStart of Rank 2, and so on, for each user (calculating the time between order processing, etc.).
Appreciate the help
This can be achieved by performing a LEFT JOIN to the same table on the UserID column and the Rank column, plus 1.
The following (simplified) pseudo-code should illustrate how to achieve this:
SELECT R.UserID,
R.Rank,
R1.Diff
FROM Rank R
LEFT JOIN Rank R1 ON R1.UserID = R.UserID AND R1.Rank = R.Rank + 1
Effectively, you are showing the UserID and Rank from the current row, but the Difference from the row of the same UserID with the Rank + 1.
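Fleshed out with the actual difference calculation, it might look like the following. This is a sketch only, assuming SQL Server's DATEDIFF, the Rank table/column names from the pseudo-code, and the ScanDateStart/ScanDateEnd columns from the sample data; it measures the gap from one scan's end to the next scan's start, per user:
-- Sketch: minutes between ScanDateEnd of rank N and ScanDateStart of rank N+1.
SELECT R.UserID,
       R.Rank,
       DATEDIFF(minute, R.ScanDateEnd, R1.ScanDateStart) AS MinutesToNextScan
FROM Rank R
LEFT JOIN Rank R1
  ON R1.UserID = R.UserID
 AND R1.Rank = R.Rank + 1;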
I have a sliding window problem. Specifically, I do not know where my window should start and where it should end. I do know the size of my interval/window.
I need to find the start/end of the window that delivers the best (or worst, depending on how you look at it) case scenario.
Here is an example dataset:
value | tstamp
100 | 2013-02-20 00:01:00
200 | 2013-02-20 00:02:00
300 | 2013-02-20 00:03:00
400 | 2013-02-20 00:04:00
500 | 2013-02-20 00:05:00
600 | 2013-02-20 00:06:00
500 | 2013-02-20 00:07:00
400 | 2013-02-20 00:08:00
300 | 2013-02-20 00:09:00
200 | 2013-02-20 00:10:00
100 | 2013-02-20 00:11:00
Let's say I know that my interval needs to be 5 minutes. So, I need to know the value and timestamps included in the 5-minute interval where the sum of 'value' is the highest. In my above example, the rows from '2013-02-20 00:04:00' to '2013-02-20 00:08:00' would give me a sum of 400+500+600+500+400 = 2400, which is the highest value over 5 minutes in that table.
I'm not opposed to using multiple tables if needed, but I'm trying to find a "best case scenario" interval. Results can go either way, as long as they net the interval. If I get all data points over that interval, it still works. If I get the start and end points, I can use those as well.
I've found several sliding window problems for SQL, but haven't found any where the window size is the known factor and the starting point is unknown.
SELECT *,
(
-- sum of value over the 5 minutes ending at this row's timestamp
SELECT SUM(value)
FROM mytable mi
WHERE mi.tstamp BETWEEN m.tstamp - '5 minute'::INTERVAL AND m.tstamp
) AS maxvalue
FROM mytable m
ORDER BY
maxvalue DESC
LIMIT 1
In PostgreSQL 11 and above:
SELECT SUM(value) OVER (ORDER BY tstamp RANGE INTERVAL '5 minute' PRECEDING) AS maxvalue,
*
FROM mytable m
ORDER BY
maxvalue DESC
LIMIT 1
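Either query identifies the best window by its end timestamp; if you also want the start explicitly, it can be derived by subtracting the interval. A sketch based on the first (subquery) version:
-- Sketch: return the start, end, and sum of the best 5-minute window.
SELECT m.tstamp - '5 minute'::INTERVAL AS window_start,
       m.tstamp                        AS window_end,
       (
       SELECT SUM(value)
       FROM mytable mi
       WHERE mi.tstamp BETWEEN m.tstamp - '5 minute'::INTERVAL AND m.tstamp
       ) AS window_sum
FROM mytable m
ORDER BY window_sum DESC
LIMIT 1;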
I have a table containing a datetime column and some miscellaneous other columns. The datetime column represents an event happening: it can either contain a time (the event happened at that time) or NULL (the event didn't happen).
I now want to count the number of records falling into specific intervals (15 minutes each), but I do not know how to do that.
example:
id | time | foreign_key
1 | 2012-01-01 00:00:01 | 2
2 | 2012-01-01 00:02:01 | 4
3 | 2012-01-01 00:16:00 | 1
4 | 2012-01-01 00:17:00 | 9
5 | 2012-01-01 00:31:00 | 6
I now want to create a query that creates a result set similar to:
interval | COUNT(id)
2012-01-01 00:00:00 | 2
2012-01-01 00:15:00 | 2
2012-01-01 00:30:00 | 1
Is this possible in SQL or can anyone advise what other tools I could use? (e.g. exporting the data to a spreadsheet program would not be a problem)
Give this a try:
-- Truncate each timestamp to its 900-second (15-minute) bucket, then count per bucket
select datetime((cast(strftime('%s', time) as integer) / 900) * 900, 'unixepoch') interval,
       count(*) cnt
from t
group by interval
order by interval
I have limited SQLite background (and no practice instance), but I'd try grabbing the minutes using
strftime( FORMAT, TIMESTRING, MOD, MOD, ...)
with the %M format specifier (http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html).
Then divide that by 15 and take the floor of the quotient to figure out which quarter-hour you're in (e.g., 0, 1, 2, or 3); in SQLite the floor can be taken with
cast(x as int)
(see "Getting the floor value of a number in SQLite?" on Stack Overflow).
Strung together it might look something like:
select cast(strftime('%M', your_time_field) as int) / 15 from your_table
(the cast is needed before dividing by 15, since strftime returns a string)
Then group by the quarter-hour.
Sorry I don't have exact syntax for you, but that approach should enable you to get the functional groupings, after which you can massage the output to make it look how you want.
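Putting those steps together, a sketch of the whole query (SQLite; the table and column names are taken from the example data, and interval_start is just an assumed label):
-- Sketch: label each row with the start of its quarter-hour, then count per label.
select strftime('%Y-%m-%d %H:', time) ||
       printf('%02d', (cast(strftime('%M', time) as int) / 15) * 15) ||
       ':00' as interval_start,
       count(id) as cnt
from t
group by interval_start
order by interval_start;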