How can I find the time between a user's sessions using SQL?

Using SQL, how can you find the time elapsed between each user's sessions? For instance, user_id 1234 had one session on 2017-01-01 00:00:00 and another session on 2017-01-02 (see table below). How can I find the time from one session_end to the start of the same user's next session (session_start)?
user_id | session_start       | session_end
1234    | 2017-01-01 00:00:00 | 2017-01-01 00:30:30
1236    | 2017-01-01 01:00:00 | 2017-01-01 01:05:30
1234    | 2017-01-02 12:00:09 | 2017-01-02 12:00:30
1234    | 2017-01-01 02:00:00 | 2017-01-01 03:30:30
1236    | 2017-01-01 00:00:00 | 2017-01-01 00:30:30
Thanks.

This can easily be done using window functions:
select user_id, session_start, session_end,
session_start - lag(session_end) over (partition by user_id order by session_start) as time_diff
from the_table
order by user_id, session_start;
Online example: http://rextester.com/NTVH38963
Subtracting one timestamp from another returns an interval. To convert that to minutes, extract the number of seconds the interval represents and divide it by 60:
select user_id, session_start, session_end,
extract(epoch from
session_start - lag(session_end) over (partition by user_id order by session_start)
) / 60 as minutes
from the_table
order by user_id, session_start;
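The same lag() idea can be tried locally without a PostgreSQL server. The following is only a sketch: it assumes Python's bundled sqlite3 module with SQLite 3.25 or later (needed for window functions), and because SQLite has no interval type it converts the timestamps to epoch seconds with strftime('%s', ...) instead of using extract(epoch from ...). The function name session_gaps is made up for this illustration; the rows are the ones from the question.

```python
import sqlite3

# Sample rows copied from the question; the table name the_table follows the answer.
ROWS = [
    (1234, "2017-01-01 00:00:00", "2017-01-01 00:30:30"),
    (1236, "2017-01-01 01:00:00", "2017-01-01 01:05:30"),
    (1234, "2017-01-02 12:00:09", "2017-01-02 12:00:30"),
    (1234, "2017-01-01 02:00:00", "2017-01-01 03:30:30"),
    (1236, "2017-01-01 00:00:00", "2017-01-01 00:30:30"),
]


def session_gaps(rows):
    """Return (user_id, session_start, minutes since that user's previous session_end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (user_id INTEGER, session_start TEXT, session_end TEXT)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    # SQLite dialect (an assumption of this sketch): epoch seconds via strftime('%s').
    query = """
        SELECT user_id,
               session_start,
               (CAST(strftime('%s', session_start) AS INTEGER)
                - CAST(strftime('%s', lag(session_end) OVER (
                      PARTITION BY user_id ORDER BY session_start)) AS INTEGER)
               ) / 60.0 AS minutes
        FROM the_table
        ORDER BY user_id, session_start
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


gaps = session_gaps(ROWS)
for row in gaps:
    print(row)
```

The first session of each user has no previous session_end, so its minutes value comes back as None (NULL), just as in the PostgreSQL query.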

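Here's one way to do it with a subquery (DATEDIFF is written here with SQL Server-style arguments). Note that it returns only the gap before each user's most recent session: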
SELECT dT.user_ID
,dT.max_session_start
,DATEDIFF(minute, (SELECT MAX(session_end)
FROM tablename T
WHERE T.user_ID = dT.user_ID
AND T.session_end < dT.max_session_start)
, dT.max_session_start
) AS minutes
FROM (
SELECT user_ID
,MAX(session_start) AS max_session_start
FROM tablename
GROUP BY user_ID
) AS dT

Related

Finding total session time of a user in postgres

I am trying to create a query that will give me a column of total time logged in for each month for each user.
username | auth_event_type | time | credential_id
Joe | 1 | 2021-11-01 09:00:00 | 44
Joe | 2 | 2021-11-01 10:00:00 | 44
Jeff | 1 | 2021-11-01 11:00:00 | 45
Jeff | 2 | 2021-11-01 12:00:00 | 45
Joe | 1 | 2021-11-01 12:00:00 | 46
Joe | 2 | 2021-11-01 12:30:00 | 46
Joe | 1 | 2021-12-06 14:30:00 | 47
Joe | 2 | 2021-12-06 15:30:00 | 47
The auth_event_type column specifies whether the event was a login (1) or logout (2) and the credential_id indicates the session.
I'm trying to create a query that would have an output like this:
username | year_month | total_time
Joe | 2021-11 | 1:30
Jeff | 2021-11 | 1:00
Joe | 2021-12 | 1:00
How would I go about doing this in postgres? I am thinking it would involve a window function? If someone could point me in the right direction that would be great. Thank you.
Solution 1 partially working
Not sure that window functions will help you in your case, but aggregate functions will :
WITH list AS
(
SELECT username
, date_trunc('month', time) AS year_month
, max(time ORDER BY time) - min(time ORDER BY time) AS session_duration
FROM your_table
GROUP BY username, date_trunc('month', time), credential_id
)
SELECT username
, to_char (year_month, 'YYYY-MM') AS year_month
, sum(session_duration) AS total_time
FROM list
GROUP BY username, year_month
The first part of the query aggregates the login/logout times per username and credential_id; the second part sums, per year_month, the differences between the login and logout times. This query works well as long as the login time and logout time are in the same month, but it fails when they aren't.
Solution 2 fully working
In order to calculate the total_time per username and per month whatever the login time and logout time are, we can use a time range approach which intersects the session ranges [login_time, logout_time) with the monthly ranges [monthly_start_time, monthly_end_time) :
WITH monthly_range AS
(
SELECT to_char(m.month_start_date, 'YYYY-MM') AS month
, tsrange(m.month_start_date, m.month_start_date+ interval '1 month' ) AS monthly_range
FROM
( SELECT generate_series(min(date_trunc('month', time)), max(date_trunc('month', time)), '1 month') AS month_start_date
FROM your_table
) AS m
), session_range AS
(
SELECT username
, tsrange(min(time ORDER BY auth_event_type), max(time ORDER BY auth_event_type)) AS session_range
FROM your_table
GROUP BY username, credential_id
)
SELECT s.username
, m.month
, sum(upper(p.period) - lower(p.period)) AS total_time
FROM monthly_range AS m
INNER JOIN session_range AS s
ON s.session_range && m.monthly_range
CROSS JOIN LATERAL (SELECT s.session_range * m.monthly_range AS period) AS p
GROUP BY s.username, m.month
see the result in dbfiddle
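To make the range-intersection idea concrete, here is a rough sketch of the same logic outside the database. It assumes the login/logout rows have already been paired per credential_id into (username, login, logout) tuples; the names time_per_month and next_month_start, and the cross-month session for "Ann", are invented for the illustration and are not part of the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sessions from the question, already paired by credential_id.
SESSIONS = [
    ("Joe", datetime(2021, 11, 1, 9, 0), datetime(2021, 11, 1, 10, 0)),
    ("Jeff", datetime(2021, 11, 1, 11, 0), datetime(2021, 11, 1, 12, 0)),
    ("Joe", datetime(2021, 11, 1, 12, 0), datetime(2021, 11, 1, 12, 30)),
    ("Joe", datetime(2021, 12, 6, 14, 30), datetime(2021, 12, 6, 15, 30)),
]


def next_month_start(d):
    """First instant of the month following d."""
    return datetime(d.year + (d.month == 12), d.month % 12 + 1, 1)


def time_per_month(sessions):
    """Sum, per (username, 'YYYY-MM'), the part of each session inside that month."""
    totals = defaultdict(timedelta)
    for username, login, logout in sessions:
        month_start = datetime(login.year, login.month, 1)
        while month_start < logout:
            month_end = next_month_start(month_start)
            # Intersection of [login, logout) with [month_start, month_end).
            lower = max(login, month_start)
            upper = min(logout, month_end)
            if upper > lower:
                totals[(username, month_start.strftime("%Y-%m"))] += upper - lower
            month_start = month_end
    return dict(totals)


totals = time_per_month(SESSIONS)

# Hypothetical session that crosses a month boundary (not in the question's data).
cross_month = time_per_month(
    [("Ann", datetime(2021, 11, 30, 23, 0), datetime(2021, 12, 1, 1, 30))]
)
```

For the hypothetical "Ann" session, one hour is attributed to 2021-11 and an hour and a half to 2021-12, which is the case the first solution cannot handle.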
Use the window function lag() with a partition by credential_id, ordered by time, e.g.
WITH j AS (
SELECT username, time, age(time, LAG(time) OVER w)
FROM t
WINDOW w AS (PARTITION BY credential_id ORDER BY time
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
SELECT username, to_char(time,'yyyy-mm'),sum(age) FROM j
GROUP BY 1,2;
Note: the frame ROWS BETWEEN 1 PRECEDING AND CURRENT ROW is pretty much optional in this case, but it is considered a good practice to keep window functions as explicit as possible, so that in the future you don't have to read the docs to figure out what your query is doing.
Demo: db<>fiddle

How to get the row_numbers in sql where condition is a specific date difference?

I am stuck with a portion of my query to extract the row_numbers that have a date difference of at least three months. In the example below I would like to extract row_number 1 (always the first one), 5 and 6. So after row_number 1, I want the row_numbers with a date_diff > 3 months, and after the second extracted row_number the condition should be applied again, and so on until no rows remain. Is there any function or way within SQL that allows such a condition?
table_name: users
id | row_number | User | date
---|------------|------|--------------------
1  | 1          | Usr1 | 2017-10-01 12:35:00
2  | 2          | Usr1 | 2017-10-01 12:35:00
3  | 3          | Usr1 | 2017-12-03 07:47:00
4  | 4          | Usr1 | 2018-01-10 07:47:00
5  | 5          | Usr1 | 2018-02-10 07:47:00
6  | 6          | Usr1 | 2018-04-10 07:47:00
You can use the lag() function to calculate the difference:
select *
from (
select id, row_number, "User", date,
date - lag(date) over (order by id) as diff
from users
) t
where diff is null -- first row
or diff > interval '3 month';
I'm not sure you want to compare intervals as months. I think they are normally represented as a number of days.
So, I would phrase this as:
select u.*
from (select u.*, lag(date) over (order by id) as prev_date
from users u
) u
where prev_date is null or prev_date < date - interval '3 month';
If the or bothers you, you can remove it by using default values in the lag():
select u.*
from (select u.*, lag(date, 1, date - interval '100 year') over (order by id) as prev_date
from users u
) u
where prev_date < date - interval '3 month';

Oracle: Ordering/sorting records based on the difference value of a column data in different rows of same table

I have a table error_event_table, with 3 columns:
eventtime timestamp;
url varchar2(1024);
errorcount number(30);
The table has more than 3 million rows. I need to find the top n (100) URLs based on the difference in the errorcount column value between a given start time and end time.
For example, with table data as below:
eventtime | url |errorcount
2018-01-29 10:20:00 | url1.com | 950
2018-01-29 10:25:00 | url1.com | 1000
2018-01-29 10:20:00 | url2.com | 100
2018-01-29 10:25:00 | url2.com | 400
2018-01-29 10:25:00 | url3.com | 500
2018-01-29 10:10:00 | url35.com | 500
when startTime=2018-01-29 10:20:00 and endTime= 2018-01-29 10:25:00 are passed as inputs to the query, the expected output is:
eventtime | url |errorcount
2018-01-29 10:25:00 | url3.com | 500
2018-01-29 10:25:00 | url2.com | 400
2018-01-29 10:20:00 | url2.com | 100
2018-01-29 10:25:00 | url1.com | 1000
2018-01-29 10:20:00 | url1.com | 950
The query should order the records by the difference in errorcount between the given start time and end time (the query inputs), descending, and limit the results to the top 100. In other words, the query should find the top 100 URLs with the largest difference between end time and start time, and return those URLs' records at both the start time and the end time.
It is possible that a URL exists only at the end time and not at the start time; in that case the start time errorcount should be taken as 0. Similarly, a URL might exist only at the start time, in which case the diff will be negative, and I don't want these negative-diff records in my results.
I have tried two approaches, but neither gets me far enough to proceed further.
Approach 1: Using Group By
SELECT url,
Max(eventtime),
Max(errorcount)
FROM error_event_table
WHERE eventtime IN ( To_date(:startTime, 'yyyymmddHH24MISS'),
To_date(:endTime, 'yyyymmddHH24MISS')
)
GROUP BY url
ORDER BY Max(errorcount)DESC;
Approach 2: Using Self Join
SELECT t2.url eurl,
t1.url surl,
t2.eventtime endtime,
t1.eventtime starttime,
( t2.errorcount - t1.errorcount ) diff
FROM error_event_table t1,
error_event_table t2
WHERE ( t1.eventtime = To_date(:startTime, 'yyyymmddHH24MISS')
OR t2.eventtime = To_date(:endTime, 'yyyymmddHH24MISS') )
AND t2.url (+) = t1.url
ORDER BY ( t2.errorcount - t1.errorcount ) DESC
Please provide input on how to approach this problem.
If I understand correctly, you want to use lag():
select t.*,
(errorcount -
lag(errorcount, 1, 0) over (partition by url order by eventtime)
) as diff
from t
where <date conditions here>
order by diff desc;
EDIT:
If you just want the URLs with the maximums, then:
select t.*
from (select t.*, row_number() over (partition by url order by diff desc) as seqnum
from (select t.*,
(errorcount -
lag(errorcount, 1, 0) over (partition by url order by eventtime)
) as diff
from t
where <date conditions here>
) t
) t
where seqnum <= 100
order by diff desc;
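As a cross-check for whatever query ends up being used, the requirement as stated in the question (end count minus start count, a missing start count treated as 0, negative differences dropped, top n by difference) can be sketched in a few lines outside the database. This is only an illustration of the intended result, not a replacement for the SQL; the function name top_urls_by_increase is invented here, and the rows are the sample data from the question.

```python
# Sample data from the question: (eventtime, url, errorcount).
ROWS = [
    ("2018-01-29 10:20:00", "url1.com", 950),
    ("2018-01-29 10:25:00", "url1.com", 1000),
    ("2018-01-29 10:20:00", "url2.com", 100),
    ("2018-01-29 10:25:00", "url2.com", 400),
    ("2018-01-29 10:25:00", "url3.com", 500),
    ("2018-01-29 10:10:00", "url35.com", 500),
]


def top_urls_by_increase(rows, start_time, end_time, n=100):
    """Return up to n (url, diff) pairs, largest errorcount increase first."""
    start_counts, end_counts = {}, {}
    for eventtime, url, errorcount in rows:
        if eventtime == start_time:
            start_counts[url] = errorcount
        elif eventtime == end_time:
            end_counts[url] = errorcount
    # Only URLs present at end_time are candidates; a missing start count is 0.
    diffs = [(url, end - start_counts.get(url, 0)) for url, end in end_counts.items()]
    # Drop negative differences, then sort by difference, descending.
    diffs = [(url, diff) for url, diff in diffs if diff >= 0]
    diffs.sort(key=lambda pair: pair[1], reverse=True)
    return diffs[:n]


top = top_urls_by_increase(ROWS, "2018-01-29 10:20:00", "2018-01-29 10:25:00")
```

With the sample data this orders the URLs url3.com, url2.com, url1.com, matching the order of the expected output in the question.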

sql query to groupby with a deduplicated column

I have the following table
create table events (
event_id,
event_name,
datetime,
email)
And I want to display the events per week, and the events per week deduplicated by emails, in a single query.
While doing:
select date_trunc('week', datetime) wdt, event_name, count(1)
from events
group by wdt, event_name;
wdt | event_name | count
---------------------+-------------+-------
2014-10-27 00:00:00 | deliver | 32
2014-11-17 00:00:00 | open | 30
2014-10-20 00:00:00 | deliver | 25
2014-10-20 00:00:00 | click | 19
2014-10-27 00:00:00 | click | 29
I can get the first column, but I don't know how to have the count_distinct column (if two clicks for the same email, on same week, it counts for one, not two).
Just specify which column to count only distinct values for, like this:
select date_trunc('week', datetime) wdt, event_name, count(distinct email)
from events
group by wdt, event_name;
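A small experiment shows the difference between the plain count and count(distinct email). This sketch makes several assumptions: it runs on SQLite through Python's sqlite3 module rather than on PostgreSQL, it replaces date_trunc('week', ...) with SQLite's date(..., 'weekday 0', '-6 days') to get the Monday of each week, and the sample events are invented because the question does not show its raw rows.

```python
import sqlite3

# Hypothetical sample events (event_name, datetime, email); not from the question.
EVENTS = [
    ("click", "2014-10-27 09:00:00", "a@example.com"),
    ("click", "2014-10-28 10:00:00", "a@example.com"),  # same email, same week
    ("click", "2014-10-29 11:00:00", "b@example.com"),
    ("click", "2014-11-03 12:00:00", "a@example.com"),  # following week
]


def weekly_counts(events):
    """Return (week start, event_name, all events, distinct emails) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE events (event_name TEXT, "datetime" TEXT, email TEXT)')
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    # SQLite stand-in for date_trunc('week', ...): move to Sunday, go back 6 days.
    query = """
        SELECT date("datetime", 'weekday 0', '-6 days') AS wdt,
               event_name,
               count(*) AS total,
               count(DISTINCT email) AS distinct_emails
        FROM events
        GROUP BY wdt, event_name
        ORDER BY wdt, event_name
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


counts = weekly_counts(EVENTS)
for row in counts:
    print(row)
```

In the week starting 2014-10-27 the three clicks come from two addresses, so the total is 3 while the deduplicated count is 2.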
I think the problem is you just need to do a distinct 1; as you pointed out.
select date_trunc('week', datetime) wdt, event_name, count(distinct 1)
from events
group by wdt, event_name;
However, without the raw data and some samples, I'm not sure how to confirm this, as I can't see why counts of 31 and 29 would occur for the same date (10/27) in wdt for the same event_name.

Time spent between log entries in a table

I have a table which tracks the user actions in a web-site. A simplified version is as follows
user_id | action_time | module_name
--------+-------------------------+------------
1 | 2014-03-02 11:13:08.775 | home
1 | 2014-03-02 11:13:08.345 | user
1 | 2014-03-02 11:13:08.428 | discussions
How much time did a user spend on each screen? So take the least action_time for a user, get the next one, find the difference.
I think this calls for a recursive query, but not able to get my head around it. One thing - I wouldn't know when to stop. After some "module" the user could have just closed the browser, without bothering to logout. So "closure" is a bit tricky.
This can be surprisingly simple with the window function lead():
SELECT *
, lead(action_time) OVER (PARTITION BY user_id ORDER BY action_time)
- action_time AS time_spent
FROM tbl;
That's all.
time_spent is NULL for the last action of a user, where no other action follows - which seems perfectly adequate.
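To try the lead() query on the three sample rows without a PostgreSQL instance, something like the following sketch can be used. It assumes SQLite 3.25 or later through Python's sqlite3 module, and since SQLite cannot subtract timestamps directly it uses julianday() differences converted to seconds, rounded to milliseconds. The function name time_spent_per_module is made up for the example.

```python
import sqlite3

# Rows from the question: (user_id, action_time, module_name).
ACTIONS = [
    (1, "2014-03-02 11:13:08.775", "home"),
    (1, "2014-03-02 11:13:08.345", "user"),
    (1, "2014-03-02 11:13:08.428", "discussions"),
]


def time_spent_per_module(actions):
    """Return (module_name, seconds until the same user's next action) in time order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (user_id INTEGER, action_time TEXT, module_name TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", actions)
    # julianday() returns fractional days, so multiply by 86400 to get seconds.
    query = """
        SELECT module_name,
               round((julianday(lead(action_time) OVER (
                          PARTITION BY user_id ORDER BY action_time))
                      - julianday(action_time)) * 86400.0, 3) AS seconds_spent
        FROM tbl
        ORDER BY user_id, action_time
    """
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


spent = time_spent_per_module(ACTIONS)
for row in spent:
    print(row)
```

The last action of the user ("home" in the sample) has no following row, so its value is None, which corresponds to the NULL described above.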
I found this example of how to make a 'range aggregate' using window functions and a lot of nested subqueries. I just adapted it to partition and group by user_id, and it seems to do what you want:
SELECT user_id, min(login_time) as login_time, max(logout_time) as logout_time
FROM (
SELECT user_id, login_time, logout_time,
max(new_start) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS left_edge
FROM (
SELECT user_id, login_time, logout_time,
CASE
WHEN login_time <= max(lag_logout_time) OVER (
PARTITION BY user_id ORDER BY login_time, logout_time
) THEN NULL
ELSE login_time
END AS new_start
FROM (
SELECT
user_id,
login_time,
logout_time,
lag(logout_time) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS lag_logout_time
FROM app_log
) AS s1
) AS s2
) AS s3
GROUP BY user_id, left_edge
ORDER BY user_id, min(login_time)
Results in:
user_id | login_time | logout_time
---------+---------------------+---------------------
1 | 2014-01-01 08:00:00 | 2014-01-01 10:49:00
1 | 2014-01-01 10:55:00 | 2014-01-01 11:00:00
2 | 2014-01-01 09:00:00 | 2014-01-01 11:49:00
2 | 2014-01-01 11:55:00 | 2014-01-01 12:00:00
(4 rows)
It works by first detecting the beginning of each new range (partitioned by user_id), then extending and grouping by the detected ranges. I found I had to read that article very carefully to understand it!
The article suggests it can be simplified with PostgreSQL >= 9.0 by removing the innermost subquery and changing the window range, but I could not get that to work.
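The "detect the start of a new range, then extend it" logic may be easier to follow in procedural form. The sketch below is not the article's method, just the same idea written as a loop; the name merge_ranges and the input rows are hypothetical (the original input table is not shown above), chosen so that user 1 produces the first two result rows listed.

```python
# Hypothetical input rows (user_id, login_time, logout_time); the real table is not shown.
APP_LOG = [
    (1, "2014-01-01 08:00:00", "2014-01-01 08:45:00"),
    (1, "2014-01-01 08:30:00", "2014-01-01 10:49:00"),  # overlaps the previous row
    (1, "2014-01-01 10:55:00", "2014-01-01 11:00:00"),  # starts a new range
    (2, "2014-01-01 09:00:00", "2014-01-01 11:49:00"),
]


def merge_ranges(rows):
    """Merge overlapping (login, logout) ranges per user, like the query above."""
    merged = []
    # Sorting tuples orders by user_id, then login_time, then logout_time.
    for user_id, login, logout in sorted(rows):
        last = merged[-1] if merged else None
        if last is not None and last[0] == user_id and login <= last[2]:
            # Still inside the current range: extend its right edge if needed.
            last[2] = max(last[2], logout)
        else:
            # login is later than every earlier logout of this user: new range.
            merged.append([user_id, login, logout])
    return [tuple(m) for m in merged]


ranges = merge_ranges(APP_LOG)
for r in ranges:
    print(r)
```

The timestamps are compared as strings, which works here only because they all share the same fixed YYYY-MM-DD HH:MM:SS format.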