SQL query with specific order of user actions - sql

I have a table that looks like this:
user_id user_action timestamp
1 action #2 2016-02-01 00:02
2 action #1 2017-10-05 15:24
3 action #3 2017-03-31 19:35
4 action #1 2017-07-09 00:24
1 action #1 2018-11-05 18:28
1 action #3 2018-02-01 13:02
2 action #2 2017-10-05 16:14
2 action #3 2017-10-05 16:34
etc
My task is to write a query that finds user sessions in which a user performs actions #1, #2, and #3 in that specific order, with less than an hour between consecutive actions. For example, user #2 has this session:
2 action #1 2017-10-05 15:24
2 action #2 2017-10-05 16:14
2 action #3 2017-10-05 16:34
Sorry for the lack of my own attempt; I am really stuck and don't know where to start.
Thanks in advance!

This can be done with the window functions lead and lag, which get the values from the next and previous rows respectively.
select distinct user_id
from (select user_id, user_action, timestamp,
             lag(user_action) over (partition by user_id order by timestamp) as prev_action,
             lead(user_action) over (partition by user_id order by timestamp) as next_action,
             datediff(minute, lag(timestamp) over (partition by user_id order by timestamp), timestamp) as time_diff_with_prev_action,
             datediff(minute, timestamp, lead(timestamp) over (partition by user_id order by timestamp)) as time_diff_with_next_action
      from tbl
     ) t
-- the literals must match the stored values exactly ('action #2', with a space)
where user_action = 'action #2' and prev_action = 'action #1' and next_action = 'action #3'
-- use < 60 instead if a gap of exactly one hour should not count
  and time_diff_with_prev_action <= 60 and time_diff_with_next_action <= 60
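To try the lag/lead approach without a SQL Server instance, here is a minimal sketch that runs an equivalent query against the sample rows using Python's built-in sqlite3 module. It rests on several assumptions: SQLite 3.25 or later (for window functions), a column named ts instead of timestamp, and strftime('%s', ...) arithmetic standing in for datediff, which SQLite does not have. The helper name find_session_users is made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (user_id, user_action, ts)
rows = [
    (1, "action #2", "2016-02-01 00:02"),
    (2, "action #1", "2017-10-05 15:24"),
    (3, "action #3", "2017-03-31 19:35"),
    (4, "action #1", "2017-07-09 00:24"),
    (1, "action #1", "2018-11-05 18:28"),
    (1, "action #3", "2018-02-01 13:02"),
    (2, "action #2", "2017-10-05 16:14"),
    (2, "action #3", "2017-10-05 16:34"),
]

# SQLite has no datediff, so minutes are derived from epoch seconds.
QUERY = """
select distinct user_id
from (select user_id, user_action,
             lag(user_action)  over w as prev_action,
             lead(user_action) over w as next_action,
             (strftime('%s', ts) - strftime('%s', lag(ts) over w)) / 60 as mins_since_prev,
             (strftime('%s', lead(ts) over w) - strftime('%s', ts)) / 60 as mins_to_next
      from tbl
      window w as (partition by user_id order by ts)
     ) t
where user_action = 'action #2'
  and prev_action = 'action #1' and next_action = 'action #3'
  and mins_since_prev <= 60 and mins_to_next <= 60
"""


def find_session_users(rows):
    """Load rows into an in-memory table and return the matching user ids."""
    con = sqlite3.connect(":memory:")
    con.execute("create table tbl (user_id integer, user_action text, ts text)")
    con.executemany("insert into tbl values (?, ?, ?)", rows)
    return [r[0] for r in con.execute(QUERY)]


session_users = find_session_users(rows)
print(session_users)
```

Only the row for action #2 is tested in the where clause, because that row can see both its predecessor and its successor; this also means the three actions must be directly adjacent for the user.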

Related

Calculate Average between columns by comparing two rows in SQL Server

I have the below table
BidID AppID AppStatus StatusTime
1 1 In Review 2019-01-02 12:00:00
1 1 Approved 2019-01-02 13:00:00
1 2 In Review 2019-01-04 13:00:00
1 2 Approved 2019-01-04 14:00:00
2 2 In Review 2019-01-07 15:00:00
2 2 Approved 2019-01-07 17:00:00
3 1 In Review 2019-01-09 13:00:00
4 1 Approved 2019-01-09 13:00:00
What I am trying to do is calculate the average StatusTime difference, in minutes, using the following logic:
First group by BidID and then by AppID, and then calculate the difference between the StatusTime of the In Review row and the StatusTime of the Approved row.
E.g., group by BidID, then by AppID, then find the In Review status and the next Approved status, and calculate the difference in minutes between the two times.
BidID  AppID  AppStatus (time difference)                             Diff     BidAverage
1      1      2019-01-02 15:48:42.000 - 2019-01-02 12:33:36.000       1 hour   1.5
1      2      2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000       2 hours
2      2      2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000       1        1
3      1      No calculation since no Approved
4      1      No calculation since no In Review before Approved
Final average for the table: (1.5 + 1) / 2 = 1.25
I have already figured out how to compute the time difference excluding Saturday, using David's suggestion in Time Difference Excluding Weekend.
I am not sure how to check that AppStatus is first In Review and then Approved, and only then calculate the time difference. If there is no Approved row, as in BidID 3, that bid should not be used in the average. The differences should then be averaged across AppID and then across BidID.
Thanks
I think you can just use min() and max() for simplicity to get the times for the bid/app pairs. The rest is just aggregation and more aggregation.
The processing you describe seems to be:
select avg(avg_bid_diff)
from (select bidid, avg(diff * 1.0) as avg_bid_diff
      from (select bidid, appid,
                   -- earliest and latest status time for the bid/app pair
                   datediff(second, min(statustime), max(statustime)) as diff
            from t
            where appstatus in ('In Review', 'Approved')
            group by bidid, appid
            -- keep only pairs that have both an In Review and an Approved row
            having count(*) = 2
           ) ba
      group by bidid
     ) b;
This makes assumptions that are consistent with the provided data -- that the statuses don't have duplicates for the bid/app pairs and that approval is always after review.
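As a rough check of the nested aggregation, the sketch below runs the same shape of query on the sample rows with Python's sqlite3 module. It is only an illustration under assumptions: SQLite replaces SQL Server, so the difference in seconds comes from strftime('%s', ...) rather than datediff, and the table name t and lowercase column names are placeholders.

```python
import sqlite3

# Sample rows from the question: (BidID, AppID, AppStatus, StatusTime)
rows = [
    (1, 1, "In Review", "2019-01-02 12:00:00"),
    (1, 1, "Approved", "2019-01-02 13:00:00"),
    (1, 2, "In Review", "2019-01-04 13:00:00"),
    (1, 2, "Approved", "2019-01-04 14:00:00"),
    (2, 2, "In Review", "2019-01-07 15:00:00"),
    (2, 2, "Approved", "2019-01-07 17:00:00"),
    (3, 1, "In Review", "2019-01-09 13:00:00"),
    (4, 1, "Approved", "2019-01-09 13:00:00"),
]

# Innermost query: seconds between first and last status per bid/app pair.
# Middle query: average per bid. Outer query: average of the bid averages.
QUERY = """
select avg(avg_bid_diff)
from (select bidid, avg(diff * 1.0) as avg_bid_diff
      from (select bidid, appid,
                   strftime('%s', max(statustime))
                     - strftime('%s', min(statustime)) as diff
            from t
            where appstatus in ('In Review', 'Approved')
            group by bidid, appid
            having count(*) = 2
           ) ba
      group by bidid
     ) b
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (bidid integer, appid integer, appstatus text, statustime text)")
con.executemany("insert into t values (?, ?, ?, ?)", rows)

avg_seconds = con.execute(QUERY).fetchone()[0]
print(avg_seconds, "seconds =", avg_seconds / 3600, "hours")
```

On these rows, bid 1 averages one hour and bid 2 averages two hours, while bids 3 and 4 are dropped by the having clause because each has a single row.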

SQL query to show user session length

I have a table that looks like this:
user_id page happened_at
2 'page3' 2017-10-05 11:31
1 'page2' 2016-02-01 00:02
2 'page1' 2017-10-05 15:24
3 'page3' 2017-03-31 19:35
4 'page1' 2017-07-09 00:24
2 'page3' 2017-10-05 15:28
1 'page3' 2018-02-01 13:02
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
etc
I have a query that identifies user sessions: pages #1, #2 and #3 opened in that particular order, each less than one hour after the previous one (page3 within an hour of page2, page2 within an hour of page1). Any pages opened in between can be ignored. Example of a session from the table above:
user_id page happened_at
2 'page1' 2017-10-05 15:24
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
My query so far looks like this and shows user_id of users, who had sessions:
select user_id
from (select user_id,page,happened_at,
lag(page) over(partition by user_id order by happened_at) as prev_page,
lead(page) over(partition by user_id order by happened_at) as next_page,
datediff(minute,lag(happened_at) over(partition by user_id order by happened_at),happened_at) as time_diff_with_prev_action,
datediff(minute,happened_at,lead(happened_at) over(partition by user_id order by happened_at)) as time_diff_with_next_action
from tbl
) t
where page='page2' and prev_page='page1' and next_page='page3'
and time_diff_with_prev_action <= 60 and time_diff_with_next_action <= 60
What I need is to edit the query to add two columns to the output: session start time and session end time, where the end time is the last action + 1 hour. Please advise how to do this. Temporary tables are forbidden, so it should be a single query. Example output:
user_id session_start session_end
2 2017-10-05 15:24 2017-10-05 17:34
Thanks for your time!

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (ie. https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
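The effect of partition by can be seen in a small runnable sketch using Python's sqlite3 module on the sample rows. The names here are assumptions made for the example: the group column is called grp (since group is reserved), the time column is called ts, and SQLite's strftime('%s', ...) stands in for datediff.

```python
import sqlite3

# Sample rows from the question: (grp, id, ts)
rows = [
    (1, 1, "16:00:00"),
    (1, 2, "16:02:00"),
    (1, 3, "16:03:00"),
    (2, 4, "16:09:00"),
    (2, 5, "16:10:00"),
    (2, 6, "16:14:00"),
]

# partition by grp restarts lag() for every group, so the first row of each
# group has no previous row and its difference is NULL.
QUERY = """
select grp, id,
       strftime('%s', ts)
         - strftime('%s', lag(ts) over (partition by grp order by id)) as diff_seconds
from t
order by id
"""

con = sqlite3.connect(":memory:")
con.execute("create table t (grp integer, id integer, ts text)")
con.executemany("insert into t values (?, ?, ?)", rows)

diffs = con.execute(QUERY).fetchall()
for row in diffs:
    print(row)
```

Moving grp from partition by into order by would make lag() look across the group boundary, which is what produces the unwanted 00:06:00 on the row with ID 4.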

Postgres count items by interval

I am trying to get the count of items within a given interval, with no start or stop times specified. I imagine it can be done with window functions, but I am not sure how to go about it.
The problem is as follows: I would like to get the number of times people log in to a website within an arbitrary interval, say 20 minutes.
Example A
1. 2015-06-24 23:00:00
2. 2015-06-24 23:45:00
3. 2015-06-25 00:00:00
4. 2015-06-25 00:15:00
5. 2015-06-25 00:17:00
6. 2015-06-25 00:21:00
In the above example I would highlight items (2,3), (3,4,5), (4,5,6), (5,6). The output I would like is:
start,end,count
2015-06-24 23:45:00,2015-06-25 00:00:00,2
2015-06-25 00:00:00,2015-06-25 00:17:00,3
2015-06-25 00:15:00,2015-06-25 00:21:00,3
Also, only keep the rows where count >= 2; otherwise every single item would be a valid grouping.
Is a window function the way I should go, or a CTE, or is there another practice to adopt?
Try this query with self join:
select a.id, a.log_at, max(b.log_at), count(1)
from logs a
join logs b
  on b.log_at >= a.log_at
 and b.log_at <= a.log_at + '20 minutes'::interval
group by 1, 2
having count(1) > 1
order by 1
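The self join can be tried on the six sample timestamps with the sketch below, which uses Python's sqlite3 module rather than Postgres. That swap is an assumption of the example: SQLite has no interval type, so datetime(a.log_at, '+20 minutes') takes the place of a.log_at + interval, and the timestamps are compared as text, which works because they share one fixed format. The table and column names follow the answer.

```python
import sqlite3

# Sample logins from the question: (id, log_at)
rows = [
    (1, "2015-06-24 23:00:00"),
    (2, "2015-06-24 23:45:00"),
    (3, "2015-06-25 00:00:00"),
    (4, "2015-06-25 00:15:00"),
    (5, "2015-06-25 00:17:00"),
    (6, "2015-06-25 00:21:00"),
]

# Each row a opens a 20-minute window; b ranges over the rows inside it.
QUERY = """
select a.id, a.log_at, max(b.log_at), count(*)
from logs a
join logs b
  on b.log_at >= a.log_at
 and b.log_at <= datetime(a.log_at, '+20 minutes')
group by a.id, a.log_at
having count(*) > 1
order by a.id
"""

con = sqlite3.connect(":memory:")
con.execute("create table logs (id integer, log_at text)")
con.executemany("insert into logs values (?, ?)", rows)

windows = con.execute(QUERY).fetchall()
for row in windows:
    print(row)
```

Every row matches itself in the join, so a count of 1 means the window holds no other login; the having clause removes those rows.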
You can get per-day groups with counts using a query like:
SELECT MIN(last_seen_at), MAX(last_seen_at), COUNT(*)
FROM user_kinds
GROUP BY DATE(last_seen_at)
ORDER BY DATE(last_seen_at) DESC LIMIT 5;
Which on my sample data set yields a result like:
2015-06-26 00:12:30.476548 | 2015-06-26 22:06:25.134322 | 69
2015-06-25 00:46:03.392651 | 2015-06-25 23:49:46.616964 | 14
2015-06-24 14:22:33.578176 | 2015-06-24 23:39:01.32241 | 10
2015-06-23 01:42:53.438663 | 2015-06-23 20:12:21.864601 | 2
(5 rows)

Detect Intervals

id_person transaction internation_in internation_out
1 456465 2015-01-01 2015-02-01
2 564564 2015-02-03 2015-04-02
3 4564654 2015-01-01 2015-01-05
4 4564646 2015-01-01 2015-02-04
4 4564656 2015-03-01 2015-04-15
4 87899465 2015-05-16 2015-05-25
5 56456456 2015-01-01 2015-01-08
5 45456546 2015-02-04 2015-03-04
I want to know how to compute, per id_person, the difference (interval in hours) between the internation_out of one transaction and the internation_in of the next transaction.
I tried lag and lead, but I can't group by id_person.
I want this result, using id_person 4 as an example:
id_person transaction Gap
4 4564646 Null
4 4564656 The result of (2015-02-04- 2015-03-01)
4 87899465 The result of (2015-04-15- 2015-05-16)
If your time periods are not overlapping (and yours are not), then there is a simple calculation for the gaps: it is the total number of days from the beginning to the end minus the total on each row. So, you don't need lead() or lag():
select id_person,
(case when count(*) > 1
then (max(internation_out) - min(internation_in) -
sum(internation_out - internation_in)
)
end) as gap_duration
from table t
group by id_person;
Note that this returns NULL if there is only one row for the person. If you want 0, then you don't need the case.
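The arithmetic can be checked on the rows for id_person 1 and 4 with a minimal sketch using Python's sqlite3 module. It is an illustration under assumptions: SQLite cannot subtract dates directly, so julianday() differences are used to get days, and the table name stays is made up because table is a reserved word. Multiply the result by 24 if hours are wanted.

```python
import sqlite3

# Rows for persons 1 and 4 from the question:
# (id_person, transaction, internation_in, internation_out)
rows = [
    (1, 456465, "2015-01-01", "2015-02-01"),
    (4, 4564646, "2015-01-01", "2015-02-04"),
    (4, 4564656, "2015-03-01", "2015-04-15"),
    (4, 87899465, "2015-05-16", "2015-05-25"),
]

# Total span (last out minus first in) minus the days covered by each stay
# leaves the days spent in the gaps between stays.
QUERY = """
select id_person,
       case when count(*) > 1 then
         cast(round(julianday(max(internation_out)) - julianday(min(internation_in))
                    - sum(julianday(internation_out) - julianday(internation_in)))
              as integer)
       end as gap_days
from stays
group by id_person
order by id_person
"""

con = sqlite3.connect(":memory:")
con.execute(
    "create table stays (id_person integer, tx integer, "
    "internation_in text, internation_out text)"
)
con.executemany("insert into stays values (?, ?, ?, ?)", rows)

gaps = con.execute(QUERY).fetchall()
print(gaps)
```

For id_person 4 the two gaps are 25 days (2015-02-04 to 2015-03-01) and 31 days (2015-04-15 to 2015-05-16), so the query reports their total rather than one value per transaction.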