Get duration of using app for user from log table - sql

I have a table in which I'm recording logins/logouts. Schema and sample data: http://sqlfiddle.com/#!2/e1b35/1
I need to create a stored procedure which will print the sum of (logout - login) times for a user and date (the duration of application use for that user on that date). In the type column, I represents a login and O a logout.
Input parameters: @username, @date. Output: time.

You need to identify all groups of logins that are related to each other. The way I would do this is by finding the logout associated with logins. You are dealing with log data, so don't be surprised if there are multiple logins with no logout.
select l.*,
(select min(l2.time) from logs l2 where l2.username = l.username and l2.type = 'O' and l2.time > l.time
) as logoutTime
from logs l
where l.type = 'I'
Now, you can use the username and LogoutTime as a pair for aggregation to get what you want:
select username, logoutTime, min(time) as StartTime, logouttime as EndTime,
datediff(s, min(time), logoutTime) as TimeInSeconds
from (select l.*,
(select min(l2.time) from logs l2 where l2.username = l.username and l2.type = 'O' and l2.time > l.time
) as logoutTime
from logs l
where l.type = 'I'
) l
group by username, logoutTime
Note that you specify SQL Server as a tag, but the SQL Fiddle is for MySQL; date functions tend to differ among databases.
And, if you are using SQL Server 2012, there is an easier way. You should specify that if it is the case.
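The pairing idea above can be tried end to end with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than SQL Server or MySQL, so the date functions differ (strftime instead of datediff), and the sample rows are made up for illustration. It pairs each login with the next logout, collapses repeated logins onto the earliest one, and sums the seconds for one user and one day.

```python
import sqlite3

# Hypothetical sample data; the column names follow the queries above.
SAMPLE_ROWS = [
    ("alice", "I", "2013-05-01 09:00:00"),
    ("alice", "I", "2013-05-01 09:05:00"),  # repeated login, no logout in between
    ("alice", "O", "2013-05-01 10:00:00"),
    ("alice", "I", "2013-05-01 13:00:00"),
    ("alice", "O", "2013-05-01 13:30:00"),
    ("bob", "I", "2013-05-01 09:00:00"),
    ("bob", "O", "2013-05-01 09:10:00"),
    ("alice", "I", "2013-05-02 08:00:00"),  # still logged in, no logout yet
]

QUERY = """
SELECT COALESCE(SUM(CAST(strftime('%s', logout_time) AS INTEGER)
                  - CAST(strftime('%s', login_time) AS INTEGER)), 0)
FROM (
    -- one row per logout, paired with the earliest login that led to it
    SELECT username, logout_time, MIN(time) AS login_time
    FROM (
        SELECT i.username, i.time,
               (SELECT MIN(o.time)
                  FROM logs o
                 WHERE o.username = i.username
                   AND o.type = 'O'
                   AND o.time > i.time) AS logout_time
        FROM logs i
        WHERE i.type = 'I'
    )
    WHERE logout_time IS NOT NULL      -- ignore logins that never logged out
    GROUP BY username, logout_time
)
WHERE username = ? AND date(login_time) = ?
"""


def seconds_used(rows, username, day):
    """Return total logged-in seconds for one user on one day (by login date)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (username TEXT, type TEXT, time TEXT)")
    con.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
    total = con.execute(QUERY, (username, day)).fetchone()[0]
    con.close()
    return total
```

In this sketch a session is attributed to the day it started on; sessions that span midnight would need extra handling.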

This is Gordon Linoff's solution, modified so that it doesn't include the outermost level of aggregation and uses clearer aliases, IMO:
select username,time_in,time_out_min,
datediff(s, time_in, time_out_min) as TimeInSeconds
from (
select i.username, i.time as time_in,
(select min(o.time) as min_time_out
from logs o
where o.username = i.username and
o.type = 'O' and
o.time > i.time
) as time_out_min
from logs i
where i.type = 'I'
) d;

Related

Retrieve data if next line of data equals a particular value

I am very new to SQL and I need some assistance with a query.
I am writing a script which reviews a log file. Basically, the query retrieves the first time a particular status occurred. This is working as expected; however, I would now like to add a condition so that a row counts only if the status immediately after it equals 'Accepted' or 'Attended'. How would I do this? I have pasted the current script below and commented in italics where I think this condition should be. Any help would be greatly appreciated!
WITH Test AS
(
Select j.jobcode, min(log.timestamp) as 'Time First Assigned'
from Job J
inner join JobLog Log
on J.JobID = Log.JobID
and log.JobStatusID = 'Assigned' *-- and record after this equals accepted or attended*
where j.CompletionDate >= @Start_date
and j.CompletionDate < @End_date
Group by j.jobcode
)
I recommend lead(), but using it in a subquery on one table:
with test as (
select j.jobcode, min(jl.timestamp) as time_first_assigned
from Job j join
(select jl.*,
lead(jl.JobStatusID) over (partition by jl.jobid order by jl.timestamp) as next_status
from JobLog jl
) jl
on j.JobID = jl.JobID
where jl.JobStatusID = 'Assigned' and
jl.next_status in ('accepted', 'attended') and
j.CompletionDate >= @Start_date and
j.CompletionDate < @End_date
group by j.jobcode
)
In particular, this enables the optimizer to use an index on JobLog(jobid, timestamp, JobStatusId) for the lead(). That said, this will not always improve performance, particularly if the filter on the CompletionDate filters out most rows.
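As a rough illustration of the lead() filter, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table is reduced to a hypothetical JobLog with a column named ts instead of timestamp, the Job join and the date filter are left out, and the sample rows are invented.

```python
import sqlite3

# Hypothetical log rows: (JobID, JobStatusID, ts)
SAMPLE_LOG = [
    (1, "Assigned", "2020-01-01 08:00:00"),
    (1, "Rejected", "2020-01-01 08:05:00"),
    (1, "Assigned", "2020-01-01 08:10:00"),
    (1, "Accepted", "2020-01-01 08:15:00"),
    (2, "Assigned", "2020-01-01 09:00:00"),
    (2, "Cancelled", "2020-01-01 09:05:00"),
    (3, "Assigned", "2020-01-01 10:00:00"),
    (3, "Attended", "2020-01-01 10:30:00"),
    (3, "Assigned", "2020-01-01 11:00:00"),
    (3, "Accepted", "2020-01-01 11:05:00"),
]

QUERY = """
SELECT JobID, MIN(ts) AS time_first_assigned
FROM (
    -- attach to every row the status of the next row for the same job
    SELECT JobID, JobStatusID, ts,
           LEAD(JobStatusID) OVER (PARTITION BY JobID ORDER BY ts) AS next_status
    FROM JobLog
)
WHERE JobStatusID = 'Assigned'
  AND next_status IN ('Accepted', 'Attended')
GROUP BY JobID
ORDER BY JobID
"""


def first_assigned_then_accepted(rows):
    """Return (JobID, first 'Assigned' time that was followed by Accepted/Attended)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE JobLog (JobID INTEGER, JobStatusID TEXT, ts TEXT)")
    con.executemany("INSERT INTO JobLog VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Job 1's first assignment is skipped because it was followed by a rejection, and job 2 drops out entirely because its only assignment was never accepted or attended.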
You can use the LEAD window function as follows:
Select jobcode, min(ts) as 'Time First Assigned' from
(select j.jobcode, log.timestamp as ts, JobStatusID ,
lead(log.JobStatusID)
over (partition by Log.JobID order by Log.timestamp) as lead_statusid
from Job J
inner join JobLog Log on J.JobID = Log.JobID
where j.CompletionDate >= @Start_date and j.CompletionDate < @End_date
) t
where JobStatusID = 'Assigned' and lead_statusid in ('accepted', 'attended')
Group by jobcode
Thank you very much.
I used Gordon's suggested code and once I changed the values to the names I used in my code I can confirm that it works.
I did look at the Lead function however I didn't know how to apply it.
Again thanks to everyone for helping with my query.

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want - difficult to test without sample data but should be a good enough starting point for you to then amend it to give you exactly what you want.
I've commented the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
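The same "one row per date, then look back from that date" idea can be sketched outside Snowflake. The example below is only an approximation under stated assumptions: it uses Python's built-in sqlite3 module, a recursive CTE in place of generator(), a hypothetical orders table with user_id, created_at (a YYYY-MM-DD string) and state, and the windows from the question (-30 to -16 days with at least one finished order, -15 to -1 days with none).

```python
import sqlite3

# Hypothetical orders: (user_id, created_at, state)
SAMPLE_ORDERS = [
    ("u1", "2020-01-01", "finished"),
    ("u2", "2020-01-01", "finished"),
    ("u2", "2020-01-10", "finished"),
    ("u3", "2020-01-01", "cancelled"),  # never finished, so never counted
]

QUERY = """
WITH RECURSIVE date_list(d) AS (
    -- one row per calendar day from the start day to the end day
    SELECT ?
    UNION ALL
    SELECT date(d, '+1 day') FROM date_list WHERE d < ?
)
SELECT dl.d,
       (SELECT COUNT(DISTINCT o30.user_id)
          FROM orders o30
         WHERE o30.state = 'finished'
           AND o30.created_at BETWEEN date(dl.d, '-30 days') AND date(dl.d, '-16 days')
           -- drop users who also finished an order in the most recent 15 days
           AND NOT EXISTS (
                 SELECT 1
                   FROM orders o15
                  WHERE o15.user_id = o30.user_id
                    AND o15.state = 'finished'
                    AND o15.created_at BETWEEN date(dl.d, '-15 days') AND date(dl.d, '-1 day')
           )
       ) AS churned_users
FROM date_list dl
ORDER BY dl.d
"""


def churned_users_per_day(orders, start_day, end_day):
    """Return {day: number of users active 30-16 days ago but not in the last 15 days}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (user_id TEXT, created_at TEXT, state TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    result = dict(con.execute(QUERY, (start_day, end_day)).fetchall())
    con.close()
    return result
```

Because the windows are evaluated per row of date_list, the reference date changes for each output day, which is what the nested aggregate in the question was trying to express.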

SUM of time spent in a State

Please consider the table below for call center agent states.
What I need is to calculate the sum of time Bryan spent in "Break" for the whole day.
This is what I'm trying to execute but it returns some inaccurate values:
select sum (CASE
WHEN State = 'Not Working' and Reason = 'Break'
THEN Datediff(SECOND, [Time_Stamp], CURRENT_TIMESTAMP)
else '' END) as Break_Overall
from MyTable
where Agent = 'Bryan'
Use lead():
select agent,
sum(datediff(second, time_stamp, next_timestamp)) as break_seconds
from (select t.*,
lead(time_stamp) over (partition by agent order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
If the agent can currently be on break, you might want a default value:
select agent,
sum(datediff(second, time_stamp, next_timestamp)) as break_seconds
from (select t.*,
lead(time_stamp, 1, current_timestamp) over (partition by agent
order by time_stamp) as next_timestamp
from mytable t
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
I'm a little uncomfortable with this logic, because current_timestamp has a date component, but your times don't.
EDIT:
In SQL Server 2008, you can do:
select agent,
sum(datediff(second, time_stamp, coalesce(next_timestamp, current_timestamp))) as break_seconds
from (select t.*, t2.time_stamp as next_timestamp
from mytable t outer apply
(select top 1 t2.*
from mytable t2
where t2.agent = t.agent and t2.time_stamp > t.time_stamp
order by t2.time_stamp
) t2
) t
where state = 'Not Working' and reason = 'Break'
group by agent;
As it is, you're getting the difference between the record's Time_Stamp and CURRENT_TIMESTAMP. That's probably not correct - you probably want to get the difference between the record's Time_Stamp and the next Time_Stamp for the same "Agent".
(Note that "Agent" will also present problems if you have multiple Agents with the same name; you probably want to store Agents in a different table and use a unique identifier as a foreign key.)
So, for Bryan, you'd expect the sum of the "total time" for the 8:30:21 record and the 11:34:58 record, which is the right idea - except that "total time" is being calculated incorrectly, so what you actually get is the sum of the time elapsed since 8:30:21 and the time elapsed since 11:34:58.
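To make the "difference to the next timestamp for the same agent" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name agent_states, the full date-time values, and the as_of parameter (standing in for CURRENT_TIMESTAMP so the result is repeatable) are all assumptions for illustration; the original table was not included in the question text.

```python
import sqlite3

# Hypothetical state changes: (agent, state, reason, time_stamp)
SAMPLE_STATES = [
    ("Bryan", "Working", None, "2015-03-02 08:00:00"),
    ("Bryan", "Not Working", "Break", "2015-03-02 08:30:21"),
    ("Bryan", "Working", None, "2015-03-02 08:45:21"),
    ("Bryan", "Not Working", "Break", "2015-03-02 11:34:58"),
    ("Bryan", "Working", None, "2015-03-02 11:44:58"),
    ("Ann", "Working", None, "2015-03-02 09:00:00"),
    ("Ann", "Not Working", "Break", "2015-03-02 12:00:00"),  # break still open
]

QUERY = """
SELECT agent,
       SUM(CAST(strftime('%s', next_ts) AS INTEGER)
         - CAST(strftime('%s', time_stamp) AS INTEGER)) AS break_seconds
FROM (
    -- each state lasts until the agent's next state change;
    -- an open-ended last state is closed at the supplied as_of time
    SELECT agent, state, reason, time_stamp,
           COALESCE(LEAD(time_stamp) OVER (PARTITION BY agent ORDER BY time_stamp), ?)
               AS next_ts
    FROM agent_states
)
WHERE state = 'Not Working' AND reason = 'Break'
GROUP BY agent
"""


def break_seconds(rows, as_of):
    """Return {agent: total seconds spent in 'Not Working' / 'Break'}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE agent_states (agent TEXT, state TEXT, reason TEXT, time_stamp TEXT)"
    )
    con.executemany("INSERT INTO agent_states VALUES (?, ?, ?, ?)", rows)
    result = dict(con.execute(QUERY, (as_of,)).fetchall())
    con.close()
    return result
```

With these rows, Bryan's two breaks last 15 and 10 minutes, and Ann's open break is measured up to the as_of time.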

How to get the maximum interim value of a parameter in a select statement in sql server?

Example:
I have a table userconnection that contains the login and logout time as below:
action, time, user
Login, 2013-24-11 13:00:00, a
Login, 2013-24-11 13:30:00, b
Login, 2013-24-11 14:00:00, c
Logout, 2013-24-11 14:10:00, b
...
...
...
Can anyone help me with the query below to show the max concurrent users at any time during the day (=3 from the above example set) and at the current time of the day (=2 from the above example set)?
[select DateAdd(day, 0, DateDiff(day, 0, time)) calanderday,
sum(case when action = 'Login' then 1 when action = 'Logout' then -1
else 0 end) concurrentuser,
max of(concurrentuser interim values) maxconcurrentuser
from userconnection
where time > sysdate - 1
group by DateAdd(day, 0, DateDiff(day, 0, time))
order by calanderday]
I would much appreciate any help with how to get
max of(concurrentuser interim values) maxconcurrentuser?? in the above query without using user defined functions etc, just using inline queries.
I think that this will work, but obviously you've only given us minimal sample data to work from:
;With PairedEvents as (
select a.[user],a.time as timeIn,b.time as TimeOut
from
userconnection a
left join
userconnection b
on
a.[user] = b.[user] and
a.time < b.time and
b.action = 'logout'
left join
userconnection b_anti
on
a.[user] = b_anti.[user] and
a.time < b_anti.time and
b_anti.time < b.time and
b_anti.action = 'logout'
where
a.action = 'Login' and
b_anti.action is null
), PossibleMaxima as (
select pe.timeIn,COUNT(*) as Cnt
from
PairedEvents pe
inner join
PairedEvents pe_all
on
pe_all.timeIn <= pe.timeIn and
(
pe_all.timeOut > pe.timeIn or
pe_all.timeOut is null
)
group by pe.timeIn
), Ranked as (
select *,RANK() OVER (ORDER BY Cnt desc) as rnk
from PossibleMaxima
)
select * from Ranked where rnk = 1
This assumes that all login events can be paired with logout events, and that you don't have stray extras (a logout without a login, or two logins in a row without a logout).
It works by generating 3 CTEs. The first, PairedEvents, associates the login rows with their associated logout rows (and needs the above assumption).
Then, in PossibleMaxima, we take each login event and try to find any PairedEvents rows that overlap that time. The number of times that that join succeeds is the number of users who were concurrently online.
Finally, we have the Ranked CTE that gives the maximum value the rank of 1. If there are multiple periods that achieve the maximum then they will each be ranked 1 and returned in the final result set.
If it's possible for multiple users to have identical login times then a slight tweak to PossibleMaxima may be required - but that's only if we need to.
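The three steps described above can be sketched in a runnable form. This is not the T-SQL from the answer: it uses Python's built-in sqlite3 module, pairs each login with its next logout through a correlated subquery instead of the anti-join, and renames the columns to user_name, action_type and event_time (hypothetical names chosen to avoid reserved words). The rows mirror the example in the question, with the dates written in ISO order.

```python
import sqlite3

# Rows from the question's example: (action_type, event_time, user_name)
SAMPLE_EVENTS = [
    ("Login", "2013-11-24 13:00:00", "a"),
    ("Login", "2013-11-24 13:30:00", "b"),
    ("Login", "2013-11-24 14:00:00", "c"),
    ("Logout", "2013-11-24 14:10:00", "b"),
]

QUERY = """
WITH paired AS (
    -- each login with the first logout of the same user after it (NULL if none)
    SELECT i.user_name,
           i.event_time AS time_in,
           (SELECT MIN(o.event_time)
              FROM userconnection o
             WHERE o.user_name = i.user_name
               AND o.action_type = 'Logout'
               AND o.event_time > i.event_time) AS time_out
    FROM userconnection i
    WHERE i.action_type = 'Login'
),
possible_maxima AS (
    -- concurrency can only peak at a login, so count open sessions at each one
    SELECT pe.time_in, COUNT(*) AS cnt
    FROM paired pe
    JOIN paired pe_all
      ON pe_all.time_in <= pe.time_in
     AND (pe_all.time_out > pe.time_in OR pe_all.time_out IS NULL)
    GROUP BY pe.time_in
)
SELECT time_in, cnt
FROM possible_maxima
WHERE cnt = (SELECT MAX(cnt) FROM possible_maxima)
ORDER BY time_in
"""


def max_concurrent(events):
    """Return every (login time, user count) at which concurrency is at its maximum."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE userconnection (action_type TEXT, event_time TEXT, user_name TEXT)"
    )
    con.executemany("INSERT INTO userconnection VALUES (?, ?, ?)", events)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

As in the answer, identical login times for different users would inflate the counts, so this sketch assumes login times are distinct.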

How to determine if two records are 1 year apart (using a timestamp)

I need to analyze some weblogs and determine if a user has visited once, taken a year break, and visited again. I want to add a flag to every row (Y/N) with a VisitId that meets the above criteria.
How would I go about creating this sql?
Here are the fields I have, that I think need to be used (by analyzing the timestamp of the first page of each visit):
VisitID - each visit has a unique Id (ie. 12356, 12345, 16459)
UserID - each user has one Id (ie. steve = 1, ted = 2, mark = 12345, etc...)
TimeStamp - looks like this: 2010-01-01 00:32:30.000
select VisitID, UserID, TimeStamp from page_view_t where pageNum = 1;
thanks - any help would be greatly appreciated.
You could rank every user's rows, then join the ranked row set to itself to compare adjacent rows:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TimeStamp)
FROM page_view_t
),
flagged AS (
SELECT
*,
IsReturnVisit = CASE
WHEN EXISTS (
SELECT *
FROM ranked
WHERE UserID = r.UserID
AND rnk = r.rnk - 1
AND TimeStamp <= DATEADD(YEAR, -1, r.TimeStamp)
)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
)
SELECT
VisitID,
UserID,
TimeStamp,
IsReturnVisit
FROM flagged
Note: the above flags only return visits.
UPDATE
To flag the first visits same as return visits, the flagged CTE could be modified as follows:
…
SELECT
*,
IsFirstOrReturnVisit = CASE
WHEN p.UserID IS NULL OR r.TimeStamp >= DATEADD(YEAR, 1, p.TimeStamp)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
LEFT JOIN ranked p ON r.UserID = p.UserID AND r.rnk = p.rnk + 1
…
References that might be useful:
WITH common_table_expression (Transact-SQL)
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
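The rank-and-compare-adjacent-rows approach can be tried with a small sketch. It uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of T-SQL, so DATEADD becomes datetime(..., '-1 year'). The column is named ts rather than TimeStamp, the table is assumed to hold one row per visit, and the sample visits are invented.

```python
import sqlite3

# Hypothetical visits: (VisitID, UserID, ts), one row per visit
SAMPLE_VISITS = [
    (101, 1, "2010-01-01 00:32:30"),
    (102, 1, "2010-06-01 00:00:00"),
    (103, 1, "2011-07-01 00:00:00"),  # more than a year after visit 102
    (201, 2, "2010-03-01 00:00:00"),
    (202, 2, "2010-09-01 00:00:00"),
]

QUERY = """
WITH ranked AS (
    -- number each user's visits in chronological order
    SELECT VisitID, UserID, ts,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY ts) AS rnk
    FROM page_view_t
)
SELECT r.VisitID,
       CASE WHEN p.ts <= datetime(r.ts, '-1 year') THEN 'Y' ELSE 'N' END
           AS IsReturnVisit
FROM ranked r
-- p is the same user's immediately preceding visit, if there is one
LEFT JOIN ranked p
       ON p.UserID = r.UserID
      AND p.rnk = r.rnk - 1
ORDER BY r.VisitID
"""


def flag_return_visits(visits):
    """Return {VisitID: 'Y' if the previous visit was at least a year earlier, else 'N'}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE page_view_t (VisitID INTEGER, UserID INTEGER, ts TEXT)")
    con.executemany("INSERT INTO page_view_t VALUES (?, ?, ?)", visits)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result
```

A first visit has no preceding row, so the comparison yields NULL and the visit is flagged 'N', matching the note that only return visits are flagged.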
The other guy was faster, but since I took the time to do it and it's a completely different approach, I might as well post it :D.
SELECT pv2.VisitID,
pv2.UserID,
pv2.TimeStamp,
CASE WHEN pv1.VisitID IS NOT NULL
AND pv3.VisitID IS NULL
THEN 'YES' ELSE 'NO' END AS IsReturnVisit
FROM page_view_t pv2
LEFT JOIN page_view_t pv1 ON pv1.UserID = pv2.UserID
AND pv1.VisitID <> pv2.VisitID
AND (pv1.TimeStamp <= DATEADD(YEAR, -1, pv2.TimeStamp)
OR pv2.TimeStamp <= DATEADD(YEAR, -1, pv1.TimeStamp))
AND pv1.pageNum = 1
LEFT JOIN page_view_t pv3 ON pv1.UserID = pv3.UserID
AND (pv3.TimeStamp BETWEEN pv1.TimeStamp AND pv2.TimeStamp
OR pv3.TimeStamp BETWEEN pv2.TimeStamp AND pv1.TimeStamp)
AND pv3.pageNum = 1
WHERE pv2.pageNum = 1
Assuming the page_view_t table stores the UserID and TimeStamp of each visit, the following query will return users who took a break of at least a year (365 days) between two consecutive visits.
select t1.UserID
from page_view_t t1
where (
select datediff(day, max(t2.[TimeStamp]), t1.[TimeStamp])
from page_view_t t2
where t2.UserID = t1.UserID and t2.[TimeStamp] < t1.[TimeStamp]
group by t2.UserID
) >= 365