How to get the maximum interim value of a parameter in a select statement in SQL Server?

Example:
I have a table, userconnection, that contains login and logout times as below:
action, time, user
Login, 2013-24-11 13:00:00, a
Login, 2013-24-11 13:30:00, b
Login, 2013-24-11 14:00:00, c
Logout, 2013-24-11 14:10:00, b
...
...
...
Can anyone help me with the query below to show the maximum number of concurrent users at any time during the day (=3 from the above example set) and the number of concurrent users at the current time of day (=2 from the above example set)?
select DateAdd(day, 0, DateDiff(day, 0, time)) calendarday,
sum(case when action = 'Login' then 1 when action = 'Logout' then -1
else 0 end) concurrentuser,
max of(concurrentuser interim values) maxconcurrentuser -- pseudocode: this is the part I can't express
from userconnection
where time > DATEADD(day, -1, GETDATE())
group by DateAdd(day, 0, DateDiff(day, 0, time))
order by calendarday
I would much appreciate any help with how to compute max of(concurrentuser interim values) maxconcurrentuser in the above query without using user-defined functions, using only inline queries.
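The "interim values" the pseudocode refers to are a running total of +1 per login and -1 per logout in time order, and the wanted figure is the maximum of that running total. A windowed SUM with ORDER BY inside OVER expresses exactly that; in SQL Server, ORDER BY in an aggregate's OVER clause is available from SQL Server 2012. This is a different technique from the self-join answer that follows. The minimal sketch below runs the idea through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions); the in-memory table is hypothetical, and the sample dates are rewritten in ISO order on the assumption that the originals are year-day-month.

```python
import sqlite3

# Hypothetical in-memory copy of the question's userconnection sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userconnection (action TEXT, time TEXT, [user] TEXT)")
conn.executemany(
    "INSERT INTO userconnection VALUES (?, ?, ?)",
    [
        ("Login", "2013-11-24 13:00:00", "a"),
        ("Login", "2013-11-24 13:30:00", "b"),
        ("Login", "2013-11-24 14:00:00", "c"),
        ("Logout", "2013-11-24 14:10:00", "b"),
    ],
)

query = """
WITH deltas AS (
    -- +1 for a login, -1 for a logout
    SELECT date(time) AS calendarday,
           time,
           CASE WHEN action = 'Login' THEN 1
                WHEN action = 'Logout' THEN -1
                ELSE 0 END AS delta
    FROM userconnection
), running AS (
    -- running total per day in time order: the "interim values"
    SELECT calendarday,
           delta,
           SUM(delta) OVER (PARTITION BY calendarday ORDER BY time) AS concurrentusers
    FROM deltas
)
SELECT calendarday,
       MAX(concurrentusers) AS maxconcurrentusers,  -- peak of the running total
       SUM(delta) AS currentusers                   -- net total at the end of the data
FROM running
GROUP BY calendarday
ORDER BY calendarday
"""
rows = conn.execute(query).fetchall()
print(rows)
```

On the four sample rows the running total goes 1, 2, 3, 2, so the peak is 3 and the current count is 2, matching the figures in the question.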

I think that this will work, but obviously you've only given us minimal sample data to work from:
;With PairedEvents as (
select a.[user],a.time as timeIn,b.time as TimeOut
from
userconnection a
left join
userconnection b
on
a.[user] = b.[user] and
a.time < b.time and
b.action = 'Logout'
left join
userconnection b_anti
on
a.[user] = b_anti.[user] and
a.time < b_anti.time and
b_anti.time < b.time and
b_anti.action = 'Logout'
where
a.action = 'Login' and
b_anti.action is null
), PossibleMaxima as (
select pe.timeIn,COUNT(*) as Cnt
from
PairedEvents pe
inner join
PairedEvents pe_all
on
pe_all.timeIn <= pe.timeIn and
(
pe_all.timeOut > pe.timeIn or
pe_all.timeOut is null
)
group by pe.timeIn
), Ranked as (
select *,RANK() OVER (ORDER BY Cnt desc) as rnk
from PossibleMaxima
)
select * from Ranked where rnk = 1
This assumes that all login events can be paired with logout events, and that you don't have stray extras (a logout without a login, or two logins in a row without a logout).
It works by building three CTEs. The first, PairedEvents, associates each login row with its matching logout row (this is where the above assumption is needed).
Then, in PossibleMaxima, we take each login event and find all PairedEvents rows that overlap that moment. The number of rows that join matches is the number of users who were online concurrently at that moment.
Finally, the Ranked CTE gives the maximum count a rank of 1. If several periods reach the maximum, each of them is ranked 1 and all of them are returned in the final result set.
If multiple users can have identical login times, PossibleMaxima may need a slight tweak, because grouping by pe.timeIn alone would merge those logins into one group and count every overlapping session once per merged login. That tweak is only needed if this can actually happen.
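To check the three-CTE approach against the sample data, here is a minimal sketch that runs the same PairedEvents / PossibleMaxima / Ranked logic in SQLite via Python's sqlite3 module (SQLite 3.25+ for RANK). The in-memory table and ISO-format dates are assumptions. Note that SQLite compares strings case-sensitively by default, so the logout literal must be written exactly as 'Logout'.

```python
import sqlite3

# Hypothetical in-memory copy of the sample rows (dates rewritten in ISO order).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userconnection (action TEXT, time TEXT, [user] TEXT)")
conn.executemany(
    "INSERT INTO userconnection VALUES (?, ?, ?)",
    [
        ("Login", "2013-11-24 13:00:00", "a"),
        ("Login", "2013-11-24 13:30:00", "b"),
        ("Login", "2013-11-24 14:00:00", "c"),
        ("Logout", "2013-11-24 14:10:00", "b"),
    ],
)

query = """
WITH PairedEvents AS (
    -- each login with the first later logout of the same user (NULL if still online)
    SELECT a.[user], a.time AS timeIn, b.time AS timeOut
    FROM userconnection a
    LEFT JOIN userconnection b
           ON a.[user] = b.[user] AND a.time < b.time AND b.action = 'Logout'
    LEFT JOIN userconnection b_anti
           ON a.[user] = b_anti.[user] AND a.time < b_anti.time
          AND b_anti.time < b.time AND b_anti.action = 'Logout'
    WHERE a.action = 'Login' AND b_anti.action IS NULL
), PossibleMaxima AS (
    -- at each login instant, count sessions that started no later and had not ended
    SELECT pe.timeIn, COUNT(*) AS Cnt
    FROM PairedEvents pe
    JOIN PairedEvents pe_all
      ON pe_all.timeIn <= pe.timeIn
     AND (pe_all.timeOut > pe.timeIn OR pe_all.timeOut IS NULL)
    GROUP BY pe.timeIn
), Ranked AS (
    SELECT *, RANK() OVER (ORDER BY Cnt DESC) AS rnk
    FROM PossibleMaxima
)
SELECT timeIn, Cnt FROM Ranked WHERE rnk = 1
"""
peak = conn.execute(query).fetchall()
print(peak)
```

The counts per login instant come out as 1, 2 and 3, so the single rank-1 row is the 14:00 login with 3 concurrent users.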

Related

Postgres: Session duration per event (row)

I'm trying to write a query that computes a session duration for each event.
The database holds events from a web app, each with a session id and a timestamp.
Each row represents one event.
I thought I could solve this with a recursive query, but every attempt runs for minutes without returning. It's driving me crazy.
This is what I have so far.
with recursive session_time as (
select
f.data->'sessionId' as session_id,
f.ts,
null::timestamp with time zone as prev_timestamp,
0 as session_duration
from arbiter_events as f
union
select
n.data->'sessionId' as session_id,
n.ts,
st.ts as prev_timestamp,
(EXTRACT(epoch from (n.ts - (
select
st.ts
from arbiter_events p
where p.ts < n.ts
order by p.ts desc
limit 1
))) + st.session_duration)::integer as session_duration
from arbiter_events as n
inner join session_time st on st.session_id = n.data->'sessionId'
)
SELECT
ae.customer,
ae.username,
ae.data->'category' as category,
ae.data->'subCategory' as subcategory,
st.session_id,
st.session_duration
from arbiter_events ae
left join session_time st on ae.data->'sessionId' = st.session_id;
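A recursive CTE is probably not needed here. The recursive member above joins every event of a session back to every row produced so far, so new durations likely keep appearing and the recursion never settles. The running duration at each event is simply that event's timestamp minus the earliest timestamp in the same session (the sum of the gaps between consecutive events), which a window function computes in one pass. In PostgreSQL that would be something along the lines of EXTRACT(epoch FROM ts - MIN(ts) OVER (PARTITION BY data->>'sessionId')), untested. The runnable sketch below swaps in SQLite through Python's sqlite3 module and assumes the session id has been pulled out into its own session_id column; the rows are made up.

```python
import sqlite3

# Hypothetical simplified events table: session id in its own column, not JSON.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arbiter_events (session_id TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO arbiter_events VALUES (?, ?)",
    [
        ("s1", "2020-01-01 10:00:00"),
        ("s1", "2020-01-01 10:00:30"),
        ("s1", "2020-01-01 10:02:00"),
        ("s2", "2020-01-01 11:00:00"),
    ],
)

query = """
SELECT session_id,
       ts,
       -- seconds elapsed since the first event of the same session
       CAST(strftime('%s', ts) AS INTEGER)
         - CAST(strftime('%s', first_ts) AS INTEGER) AS session_duration
FROM (SELECT session_id,
             ts,
             MIN(ts) OVER (PARTITION BY session_id) AS first_ts
      FROM arbiter_events) AS e
ORDER BY session_id, ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each event gets its cumulative session duration (0, 30 and 120 seconds for s1; 0 for the single-event s2), which can then be joined back to the rest of the event columns.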

How to filter Users that meet CASE criteria without nesting WHERE in SQL?

Right now I have a query that lets me know which users didn't make a purchase 12 months prior to becoming members. These users have MEM_PRE_12=0, and I want to filter out those users more natively using SQL partitions rather than always adding rudimentary WHERE criteria.
Here is the SQL I use to find the users I want/don't want.
SELECT SUM(CASE WHEN DATE <= DATEADD(month, -12, U.INSERTED_AT) THEN 1 ELSE 0 END) AS MEM_PRE_12, I.CLIENTID, I.INSTALLATIONID
FROM <<<My_Joined_Tables>>>
GROUP BY I.CLIENTID, I.INSTALLATIONID
HAVING MEM_PRE_12 != 0
ORDER BY MEM_PRE_12
After this I have to go back, filter with WHERE I.CLIENTID IN (the above query, nested), and select the actual information I want for users whose purchases are greater than their insertion date.
How can I do this without so much nesting of all these joined tables?
If you want the detail rows for customers who made a purchase at least 12 months before becoming members (MEM_PRE_12 > 0), you can use window functions:
with q as (
<whatever your query logic is>
-- q must expose DATE, INSERTED_AT, CLIENTID and INSTALLATIONID
)
select t.*
from (select q.*,
SUM(CASE WHEN DATE <= DATEADD(month, -12, INSERTED_AT) THEN 1 ELSE 0 END) over (partition by CLIENTID, INSTALLATIONID) as MEM_PRE_12
from q
) t
where t.MEM_PRE_12 > 0;
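The point of the window version is that SUM(...) OVER (PARTITION BY ...) attaches the per-client count to every detail row, so one outer WHERE replaces the GROUP BY / HAVING query plus the nested IN (...). A minimal sketch in SQLite via Python's sqlite3 module (SQLite 3.25+), assuming a hypothetical flattened table q with one row per purchase; SQLite's date(x, '-12 months') stands in for DATEADD(month, -12, x), and purchase_date stands in for the DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened result of the joined tables: one row per purchase.
conn.execute(
    "CREATE TABLE q (clientid INTEGER, installationid INTEGER, "
    "purchase_date TEXT, inserted_at TEXT)"
)
conn.executemany(
    "INSERT INTO q VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2019-01-01", "2020-06-01"),  # 17 months before joining: counts
        (1, 10, "2020-07-01", "2020-06-01"),  # after joining: does not count
        (2, 20, "2020-05-01", "2020-06-01"),  # 1 month before joining: does not count
    ],
)

query = """
SELECT t.*
FROM (SELECT q.*,
             -- per-client count, repeated on every detail row of that client
             SUM(CASE WHEN purchase_date <= date(inserted_at, '-12 months')
                      THEN 1 ELSE 0 END)
                 OVER (PARTITION BY clientid, installationid) AS mem_pre_12
      FROM q) t
WHERE t.mem_pre_12 > 0
ORDER BY t.purchase_date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Client 1 qualifies, so both of its detail rows come back (including the purchase made after joining); client 2 is filtered out without any nested query.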

Determine cluster of access time within 10min intervals per user per day in SQL Server

How can I query the sample data so that it groups (clusters) the access_time values per user per day into 10-minute intervals?
This is a complete guess, based on reading between the lines, and is untested due to a lack of consumable sample data.
It, however, looks like you are after a triangular JOIN (these can perform poorly, especially as this won't be SARGable) and a DENSE_RANK:
SELECT YT.[date],
YT.User_ID,
YT2.AccessTime,
DENSE_RANK() OVER (PARTITION BY YT.[date], YT.User_ID ORDER BY YT.AccessTime) AS Cluster --one cluster per starting access
FROM dbo.YourTable YT
JOIN dbo.YourTable YT2 ON YT.[date] = YT2.[date]
AND YT.User_ID = YT2.User_ID
AND YT.AccessTime <= YT2.AccessTime --This will join the row to itself
AND DATEADD(MINUTE,10,YT.AccessTime) >= YT2.AccessTime; --That is intentional
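A minimal sketch of this triangular join in SQLite via Python's sqlite3 module (SQLite 3.25+ for DENSE_RANK). The table and column names are hypothetical, the rows reuse the access times visible in the output of the next answer, and SQLite's datetime(x, '+10 minutes') stands in for DATEADD(MINUTE, 10, x). The ranking is ordered by the starting access (YT), which is what makes each rank a candidate cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; rows reuse the access times shown in the other answer's output.
conn.execute("CREATE TABLE your_table (date TEXT, user_id TEXT, access_time TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("2020-09-19", "AA083P", "2020-09-19 18:15:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:22:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:28:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:34:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:37:00"),
    ],
)

# Triangular self-join: each access (yt) pairs with itself and with every later
# access of the same user and day in the following 10 minutes (yt2).
query = """
SELECT yt.date, yt.user_id, yt2.access_time,
       DENSE_RANK() OVER (PARTITION BY yt.date, yt.user_id
                          ORDER BY yt.access_time) AS cluster
FROM your_table yt
JOIN your_table yt2
  ON yt.date = yt2.date
 AND yt.user_id = yt2.user_id
 AND yt.access_time <= yt2.access_time
 AND datetime(yt.access_time, '+10 minutes') >= yt2.access_time
ORDER BY yt.date, yt.user_id, cluster, yt2.access_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because every access starts its own candidate cluster, the result contains overlapping clusters and single-access clusters (for example 18:28 on its own as cluster 3), which is the trade-off of this guess compared with an approach that only keeps intervals containing at least two accesses.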
If I have understood your problem, you want to group all of a user's accesses on a given day such that all accesses in a group fall within a 10-minute interval. Single accesses are not counted, so an access more than 10 minutes away from every other access does not form a cluster.
You can identify the clusters by joining the access table with itself to get every possible 10-minute interval, and then numbering those intervals.
Finally, simply join the access table again to get the accesses in each cluster:
; with
user_clusters as (
select a1.date, a1.user_id, a1.access_time cluster_start, a2.access_time cluster_end,
ROW_NUMBER() over (partition by a1.date, a1.user_id order by a1.access_time) user_cluster_id
from ACCESS_TIMES a1
join ACCESS_TIMES a2 on a1.date = a2.date and a1.user_id = a2.user_id
and a1.access_time < a2.access_time
and datediff(minute, a1.access_time, a2.access_time)<10
)
select a.date, a.user_id, a.access_time, c.user_cluster_id
from user_clusters c
join ACCESS_TIMES a on a.date = c.date and a.user_id = c.user_id and a.access_time between c.cluster_start and c.cluster_end
order by a.date, a.user_id, c.user_cluster_id, a.access_time
output:
date user_id access_time user_cluster_id
'2020-09-19', 'AA083P', '2020-09-19 18:15:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 2
'2020-09-19', 'AA083P', '2020-09-19 18:28:00', 2
'2020-09-20', 'AB162Y', '2020-09-20 19:34:00', 1
'2020-09-20', 'AB162Y', '2020-09-20 19:37:00', 1
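The output above can be reproduced with a minimal SQLite sketch via Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER). The input rows are assumed from the access times that appear in that output, since the original sample data is not shown. SQL Server's DATEDIFF(minute, ...) counts minute boundaries crossed, whereas the sketch computes whole elapsed minutes from epoch seconds; for these whole-minute timestamps the two agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ACCESS_TIMES rows, taken from the access times in the output above.
conn.execute("CREATE TABLE access_times (date TEXT, user_id TEXT, access_time TEXT)")
conn.executemany(
    "INSERT INTO access_times VALUES (?, ?, ?)",
    [
        ("2020-09-19", "AA083P", "2020-09-19 18:15:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:22:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:28:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:34:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:37:00"),
    ],
)

query = """
WITH user_clusters AS (
    -- every pair of distinct accesses less than 10 minutes apart is one cluster
    SELECT a1.date, a1.user_id,
           a1.access_time AS cluster_start, a2.access_time AS cluster_end,
           ROW_NUMBER() OVER (PARTITION BY a1.date, a1.user_id
                              ORDER BY a1.access_time) AS user_cluster_id
    FROM access_times a1
    JOIN access_times a2
      ON a1.date = a2.date AND a1.user_id = a2.user_id
     AND a1.access_time < a2.access_time
     AND (CAST(strftime('%s', a2.access_time) AS INTEGER)
          - CAST(strftime('%s', a1.access_time) AS INTEGER)) / 60 < 10
)
-- rejoin to list the accesses that fall inside each cluster
SELECT a.date, a.user_id, a.access_time, c.user_cluster_id
FROM user_clusters c
JOIN access_times a
  ON a.date = c.date AND a.user_id = c.user_id
 AND a.access_time BETWEEN c.cluster_start AND c.cluster_end
ORDER BY a.date, a.user_id, c.user_cluster_id, a.access_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 18:15 to 18:28 pair is 13 minutes apart and is not a cluster, so AA083P ends up with two overlapping clusters sharing the 18:22 access.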

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know SQL does not support that, but I really need to do something like the query below. Basically, I want to count the number of users for each day, but only the users who have not completed an order within a 15-day window (relative to that day) and who have completed an order within a 30-day window (relative to that day). I already know this can't be solved with a regular subquery, because the subquery's reference date can't change for each date. The "id" and "state" attributes belong to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the query below so that "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want. It is difficult to test without sample data, but it should be a good enough starting point for you to amend until it gives exactly what you want.
I've commented the code to explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
-row_number() over (order by null),
dateadd(day, 1, current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 days of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
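The same date-spine idea (one row per reporting date, joined to finished orders in that date's -30 to -16 day window, minus users with a finished order in the -15 to -1 day window from the question) can be sketched in SQLite via Python's sqlite3 module, with a recursive CTE standing in for Snowflake's GENERATOR. The table contents and the user_id column name are hypothetical, and this is not tested on Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table; user_id stands in for the question's "user" column.
conn.execute(
    "CREATE TABLE data_base (id INTEGER, user_id TEXT, state TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO data_base VALUES (?, ?, ?, ?)",
    [
        (1, "u1", "finished", "2020-01-01"),
        (2, "u2", "finished", "2020-01-01"),
        (3, "u2", "finished", "2020-01-10"),  # keeps u2 active
        (4, "u3", "canceled", "2020-01-02"),  # not a finished order
    ],
)

query = """
WITH RECURSIVE date_list(date_item) AS (
    -- one row per reporting date
    SELECT :start_date
    UNION ALL
    SELECT date(date_item, '+1 day') FROM date_list WHERE date_item < :end_date
)
SELECT dl.date_item,
       COUNT(DISTINCT o30.user_id) AS churned_users
FROM date_list dl
LEFT JOIN data_base o30
       ON o30.state = 'finished'
      AND o30.created_at BETWEEN date(dl.date_item, '-30 days')
                             AND date(dl.date_item, '-16 days')
      -- drop users who also finished an order in the last 15 days
      AND NOT EXISTS (
          SELECT 1 FROM data_base o15
          WHERE o15.user_id = o30.user_id
            AND o15.state = 'finished'
            AND o15.created_at BETWEEN date(dl.date_item, '-15 days')
                                   AND date(dl.date_item, '-1 day'))
GROUP BY dl.date_item
ORDER BY dl.date_item
"""
rows = conn.execute(
    query, {"start_date": "2020-01-16", "end_date": "2020-01-20"}
).fetchall()
print(rows)
```

Putting the NOT EXISTS inside the ON clause means a date whose candidate users are all still active stays in the result with a count of 0; as a WHERE filter applied after the LEFT JOIN, such a date would drop out of the result entirely.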

Get duration of using app for user from log table

I have a table in which I'm recording logins/logouts. Schema and sample data: http://sqlfiddle.com/#!2/e1b35/1
I need to create a stored procedure that outputs the sum of (logout - login) times for a given user and date, i.e. how long the user used the application on that date. In the type column, I represents a login and O a logout.
Input parameters: #username, #date. Output: time.
You need to identify all groups of logins that are related to each other. I would do this by finding the logout associated with each login. You are dealing with log data, so don't be surprised if there are multiple logins with no logout.
select l.*,
(select min(l2.time) from logs l2 where l2.username = l.username and l2.type = 'O' and l2.time > l.time
) as logoutTime
from logs l
where l.type = 'I'
Now, you can use the username and LogoutTime as a pair for aggregation to get what you want:
select username, logoutTime, min(time) as StartTime, logouttime as EndTime,
datediff(s, min(time), logoutTime) as TimeInSeconds
from (select l.*,
(select min(l2.time) from logs l2 where l2.username = l.username and l2.type = 'O' and l2.time > l.time
) as logoutTime
from logs l
where l.type = 'I'
) l
group by username, logoutTime
Note that you specify SQL Server as a tag, but the SQL Fiddle is for MySQL; date functions tend to differ among databases.
And if you are using SQL Server 2012, there is an easier way; please specify if that is the case.
This is Gordon Linoff's solution, modified so that it omits the outermost level of aggregation and uses clearer aliases (IMO):
select username,time_in,time_out_min,
datediff(s, time_in, time_out_min) as TimeInSeconds
from (
select i.username, i.time as time_in,
(select min(o.time) as min_time_out
from logs o
where o.username = i.username and
o.type = 'O' and
o.time > i.time
) as time_out_min
from logs i
where i.type = 'I'
) d;
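Neither query yet takes the username and date inputs the question asks for. A minimal sketch of that last step, summing the login-to-next-logout gaps for one user on one day, using SQLite via Python's sqlite3 module: SQLite has no stored procedures, so a parameterised query wrapped in a hypothetical Python function usage_seconds stands in for the procedure, and the logs rows are made up. Like the second query, it counts each login separately, so repeated logins without a logout would be double-counted; the first query's GROUP BY username, logoutTime avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical logs table: type 'I' = login, 'O' = logout.
conn.execute("CREATE TABLE logs (username TEXT, type TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("alice", "I", "2013-01-01 09:00:00"),
        ("alice", "O", "2013-01-01 10:30:00"),
        ("alice", "I", "2013-01-01 13:00:00"),
        ("alice", "O", "2013-01-01 13:15:00"),
        ("bob", "I", "2013-01-01 09:00:00"),
        ("bob", "O", "2013-01-01 09:05:00"),
    ],
)

# Pair each login with the first later logout of the same user, then sum the gaps.
QUERY = """
SELECT COALESCE(SUM(CAST(strftime('%s', time_out_min) AS INTEGER)
                  - CAST(strftime('%s', time_in) AS INTEGER)), 0)
FROM (
    SELECT i.time AS time_in,
           (SELECT MIN(o.time) FROM logs o
             WHERE o.username = i.username AND o.type = 'O'
               AND o.time > i.time) AS time_out_min
    FROM logs i
    WHERE i.type = 'I' AND i.username = :username AND date(i.time) = :day
) d
WHERE time_out_min IS NOT NULL  -- ignore logins that never logged out
"""


def usage_seconds(username, day):
    """Total seconds between each login and its next logout for one user on one day."""
    return conn.execute(QUERY, {"username": username, "day": day}).fetchone()[0]


print(usage_seconds("alice", "2013-01-01"))
```

For alice the two sessions last 90 and 15 minutes, so the total is 6300 seconds; a user with no logins that day gets 0 rather than NULL thanks to COALESCE.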