create id column based on activity data


I have a table EVENTS
USER EVENT_TS EVENT_TYPE
abc 2016-01-01 08:00:00 Login
abc 2016-01-01 08:25:00 Stuff
abc 2016-01-01 10:00:00 Stuff
abc 2016-01-01 14:00:00 Login
xyz 2015-12-31 18:00:00 Login
xyz 2016-01-01 08:00:00 Logout
What I need to do is produce a session field for each period of activity for each user. In addition, if the user has been idle for a period equal to or longer than p_timeout (1 hour in this case), then a new session starts at the next activity. Users don't always log out cleanly, so the logout isn't always there...
Notes:
Logout always terminates a session
There doesn't have to be a logout or a login (because software)
Login is always a new session
Output like
USER EVENT_TS EVENT_TYPE SESSION
abc 2016-01-01 08:00:00 Login 1
abc 2016-01-01 08:25:00 Stuff 1
abc 2016-01-01 10:00:00 Stuff 2
abc 2016-01-01 14:00:00 Login 3
xyz 2015-12-31 18:00:00 Login 1
xyz 2016-01-01 08:00:00 Logout 1
Any thoughts on how to achieve this?

I think this may do what you need. I changed "user" to "usr" in the input, and "session" to "sess" in the output - I don't ever use reserved Oracle words for object names.
Note: as Boneist pointed out below, my solution will assign a session number of 0 to the first session, if it is a Logout event (or a succession of Logouts right at the top). If this situation can occur in the data, and if the desired behavior is to start session counts at 1 even in that case, then the definition of flag must be tweaked - for example, by making flag = 1 when lag(event_ts) over (partition by usr order by event_ts) is null as well.
Good luck!
with
events ( usr, event_ts, event_type ) as (
select 'abc', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'abc', to_timestamp('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
select 'abc', to_timestamp('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
select 'abc', to_timestamp('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'xyz', to_timestamp('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'xyz', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Logout' from dual
),
start_of_sess ( usr, event_ts, event_type, flag ) as (
select usr, event_ts, event_type,
case when event_type != 'Logout'
and
( event_ts >= lag(event_ts) over (partition by usr
order by event_ts) + 1/24
or event_type = 'Login'
or lag(event_type) over (partition by usr
order by event_ts) = 'Logout'
)
then 1 end
from events
)
select usr, event_ts, event_type,
count(flag) over (partition by usr order by event_ts) as sess
from start_of_sess
;
Output (timestamps use my current NLS_TIMESTAMP_FORMAT setting):
USR EVENT_TS EVENT_TYPE SESS
--- --------------------------------- ---------- ------
abc 01-JAN-2016 08.00.00.000000000 AM Login 1
abc 01-JAN-2016 08.25.00.000000000 AM Stuff 1
abc 01-JAN-2016 10.00.00.000000000 AM Stuff 2
abc 01-JAN-2016 02.00.00.000000000 PM Login 3
xyz 31-DEC-2015 06.00.00.000000000 PM Login 1
xyz 01-JAN-2016 08.00.00.000000000 AM Logout 1
6 rows selected

I think this will do the trick:
WITH EVENTS AS (SELECT 'abc' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'xyz' usr, to_date('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'xyz' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
SELECT 'def' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
SELECT 'def' usr, to_date('2016-01-01 08:15:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual)
SELECT usr,
event_ts,
event_type,
SUM(counter) OVER (PARTITION BY usr ORDER BY event_ts) session_id
FROM (SELECT usr,
event_ts,
event_type,
CASE WHEN LAG(event_type, 1, 'Logout') OVER (PARTITION BY usr ORDER BY event_ts) = 'Logout' THEN 1
WHEN event_type = 'Logout' THEN 0
WHEN event_ts - LAG(event_ts) OVER (PARTITION BY usr ORDER BY event_ts) > 1/24 THEN 1
WHEN event_type = 'login' THEN 1
ELSE 0
END counter
FROM EVENTS);
USR EVENT_TS EVENT_TYPE SESSION_ID
--- ------------------- ---------- ----------
abc 2016-01-01 08:00:00 login 1
abc 2016-01-01 08:25:00 Stuff 1
abc 2016-01-01 10:00:00 Stuff 2
abc 2016-01-01 14:00:00 login 3
def 2016-01-01 08:00:00 Logout 1
def 2016-01-01 08:15:00 Logout 2
xyz 2015-12-31 18:00:00 login 1
xyz 2016-01-01 08:00:00 Logout 1
This solution relies on the short-circuit evaluation that takes place in the CASE expression and on the fact that event_type is not null. It also assumes that multiple logouts in a row are counted as separate sessions:
If the previous row was a logout row (and if there is no previous row - i.e. for the first row in the set - treat it as if a logout row was present), we want to increase the counter by one. (Logouts terminate the session, so we always have a new session following a logout.)
If the current row is a logout, then this terminates the existing session. Therefore, the counter shouldn't be increased.
If the current row is more than an hour after the previous row, increase the counter by one. (Use >= in the comparison if an idle period of exactly one hour should also start a new session, as the question specifies.)
If the current row is a login row, then it's a new session, so increase the counter by one.
For any other case, we don't increase the counter.
Once we've done that, it's just a matter of doing a running total on the counter.
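The rules above can also be traced procedurally. What follows is only a rough Python sketch of the same counter-plus-running-total idea, not a replacement for the SQL; the function name assign_sessions and the tuple layout are made up for illustration, and the event type strings ('login', 'Logout') are copied from the sample data in this answer.

```python
from datetime import datetime, timedelta
from itertools import groupby

def assign_sessions(events, timeout=timedelta(hours=1)):
    """Return (usr, event_ts, event_type, session_id) tuples.

    events: iterable of (usr, event_ts, event_type) tuples (hypothetical layout).
    """
    result = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for usr, rows in groupby(ordered, key=lambda e: e[0]):
        session_id = 0
        # A missing previous row is treated as a Logout, like LAG(event_type, 1, 'Logout').
        prev_ts, prev_type = None, "Logout"
        for _, ts, etype in rows:
            if prev_type == "Logout":
                counter = 1      # a new session always follows a logout
            elif etype == "Logout":
                counter = 0      # a logout closes the current session
            elif ts - prev_ts > timeout:
                counter = 1      # idle for more than the timeout
            elif etype == "login":
                counter = 1      # a login always starts a new session
            else:
                counter = 0
            session_id += counter  # running total per user
            result.append((usr, ts, etype, session_id))
            prev_ts, prev_type = ts, etype
    return result

sample = [
    ("abc", datetime(2016, 1, 1, 8, 0), "login"),
    ("abc", datetime(2016, 1, 1, 8, 25), "Stuff"),
    ("abc", datetime(2016, 1, 1, 10, 0), "Stuff"),
    ("abc", datetime(2016, 1, 1, 14, 0), "login"),
    ("xyz", datetime(2015, 12, 31, 18, 0), "login"),
    ("xyz", datetime(2016, 1, 1, 8, 0), "Logout"),
]
for row in assign_sessions(sample):
    print(row)
```

The order of the branches matters in the same way as the order of the WHEN clauses: the first matching rule wins.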

For completeness (for users with Oracle 12 or above), here is a solution using MATCH_RECOGNIZE:
select usr, event_ts, event_type, sess
from events
match_recognize(
partition by usr
order by event_ts
measures match_number() as sess
all rows per match
pattern (strt follow*)
define follow as event_type = 'Logout'
or ( event_type != 'Login'
and prev(event_type) != 'Logout'
and event_ts < prev(event_ts) + 1/24
)
)
;
Here I cover an unusual case: a Logout event following another Logout event. In such cases, I assume all consecutive Logouts, no matter how many and how far apart in time, belong to the same session. (If such cases are guaranteed not to occur in the data, so much the better.)
Please see also the Note I added to my other answer (for Oracle 11 and below) regarding the possibility of the very first event for a usr being a Logout (if that is even possible in the input data).

Related

Need information in rows into columns

Currently I have a view which returns user, date, session id, activity and hostname.
When a user logs in to the system, a session id is created; the same session id is recorded for the logoff as well.
View data:
user  date              session_id  activity  hostname
X     2023-02-07T11:02  45          Login     XYZ
X     2023-02-07T11:06  45          Logout    XYZ
Y     2023-02-07T10:02  67          Login     ABC
Y     2023-02-07T10:32  67          Logout    ABC
X     2023-02-06T11:02  48          Login     XYZ
X     2023-02-06T11:06  48          Logout    XYZ
I want the data to come out as below,
user  Hostname  login             logout
X     XYZ       2023-02-07T11:02  2023-02-07T11:06
Y     ABC       2023-02-07T10:02  2023-02-07T10:32
X     XYZ       2023-02-06T11:02  2023-02-06T11:06
I have written a query using pivot
select * from ( select user, date, session_id, activity, hostname from view)
pivot ( max(date) for activity in ('login','logoff')) view
I am getting the results as expected, but I don't want session_id to appear in the results. Also, the login and logoff columns are named 'login' and 'logoff' (with the quotes); how can I rename them?

If you do not want certain columns displayed, then do not SELECT them (name the ones you do want to display rather than using SELECT *), and if you do not want the default column aliases, then explicitly provide your own:
SELECT username,
hostname,
login,
logoff
FROM (
SELECT username,
date_column,
session_id,
activity,
hostname
FROM view_name
)
PIVOT (
MAX(date_column) FOR activity IN (
'login' AS login,
'logout' AS logoff
)
);
or, if you do not want to group by the session id:
SELECT username,
hostname,
login,
logoff
FROM (
SELECT username,
date_column,
activity,
hostname
FROM view_name
)
PIVOT (
MAX(date_column) FOR activity IN (
'login' AS login,
'logout' AS logoff
)
);

Here's one option:
Sample data:
with test (cuser, datum, session_id, activity, hostname) as
  (select 'x', to_date('07.02.2023 11:02', 'dd.mm.yyyy hh24:mi'), 45, 'Login' , 'xyz' from dual union all
   select 'x', to_date('07.02.2023 11:06', 'dd.mm.yyyy hh24:mi'), 45, 'Logout', 'xyz' from dual union all
   select 'y', to_date('07.02.2023 10:02', 'dd.mm.yyyy hh24:mi'), 67, 'Login' , 'abc' from dual union all
   select 'y', to_date('07.02.2023 10:32', 'dd.mm.yyyy hh24:mi'), 67, 'Logout', 'abc' from dual union all
   select 'x', to_date('06.02.2023 11:02', 'dd.mm.yyyy hh24:mi'), 48, 'Login' , 'xyz' from dual union all
   select 'x', to_date('06.02.2023 11:06', 'dd.mm.yyyy hh24:mi'), 48, 'Logout', 'xyz' from dual
  )
Query:
select cuser, hostname,
       max(case when activity = 'Login' then datum end) login,
       max(case when activity = 'Logout' then datum end) logout
from test
group by cuser, hostname, session_id
order by cuser, login;
C HOS LOGIN            LOGOUT
- --- ---------------- ----------------
x xyz 06.02.2023 11:02 06.02.2023 11:06
x xyz 07.02.2023 11:02 07.02.2023 11:06
y abc 07.02.2023 10:02 07.02.2023 10:32
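For anyone who finds the conditional aggregation easier to follow outside SQL, here is a small Python sketch of the same idea: one output row per (user, hostname, session_id), with the latest Login and Logout timestamps kept in separate fields. The function name pivot_sessions and the use of ISO-formatted strings for the timestamps are assumptions made for the example.

```python
def pivot_sessions(rows):
    """rows: (user, ts, session_id, activity, hostname) tuples.

    Timestamps are ISO-formatted strings here, so plain string comparison
    orders them correctly (an assumption made for this sketch).
    """
    sessions = {}
    for user, ts, session_id, activity, hostname in rows:
        # One bucket per session, like GROUP BY cuser, hostname, session_id.
        rec = sessions.setdefault((user, hostname, session_id),
                                  {"Login": None, "Logout": None})
        # Equivalent of MAX(CASE WHEN activity = ... THEN datum END).
        if activity in rec and (rec[activity] is None or ts > rec[activity]):
            rec[activity] = ts
    result = [(user, host, rec["Login"], rec["Logout"])
              for (user, host, _sid), rec in sessions.items()]
    # session_id is used for grouping only and is left out of the output.
    return sorted(result, key=lambda r: (r[0], r[2] or ""))

data = [
    ("x", "2023-02-07T11:02", 45, "Login", "xyz"),
    ("x", "2023-02-07T11:06", 45, "Logout", "xyz"),
    ("y", "2023-02-07T10:02", 67, "Login", "abc"),
    ("y", "2023-02-07T10:32", 67, "Logout", "abc"),
    ("x", "2023-02-06T11:02", 48, "Login", "xyz"),
    ("x", "2023-02-06T11:06", 48, "Logout", "xyz"),
]
for row in pivot_sessions(data):
    print(row)
```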

Oracle: getting an average by week for the timespan of available data

I have some data that shows daily logins by clients on every date they logged in, stretching back a few years.
date month clientId loginCount
------------ --------- ---------- ------------
01/01/2021 01-2021 1234 234
02/01/2021 01-2021 1234 978
01/02/2021 02-2021 6547 45
01/02/2021 02-2021 345 86
....
For each client, I would like to generate the average number of times they log in per week, over however long they have corresponding date entries in the table:
clientId avgWeeklyLoginCount
---------- ---------------------
1234 125
6547 26
345 48
I understand 'IW' could be used in the TO_CHAR function to do this, e.g.
SELECT
TO_CHAR(date,'IW'),
clientId,
SUM(loginCount) as summedCount
FROM
logins
GROUP BY
TO_CHAR(date,'IW'), clientId
but I'm not sure how to get an average by client id from this. Any help will be appreciated!

You can use this as an example. The week-number expression may look unnecessarily complicated:
ceil((in_date - trunc(to_date('06.01.0001', 'dd.MM.yyyy'), 'IW'))/7)
It computes the number of the week counted from the year 1 CE. If all your dates fall within a single year, you can use TO_CHAR(date,'IW') or TO_CHAR(date,'WW') instead.
with logins(in_date, clientId, loginCount) as (
select to_date('01/01/2021 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 1234, 234 from dual union all
select to_date('02/01/2021 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 1234, 978 from dual union all
select to_date('01/02/2021 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 6547, 45 from dual union all
select to_date('01/02/2021 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 345, 86 from dual union all
select to_date('31/12/2020 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 347, 1 from dual union all
select to_date('01/01/2021 01:00:00', 'dd/MM/yyyy HH:MI:SS'), 347, 1 from dual
)
select
clientId, avg(loginCountPerWeek) avgLoginCountPerWeek
from (
select
week_number, clientId, sum(loginCount) loginCountPerWeek
from (
select
ceil((in_date - trunc(to_date('06.01.0001', 'dd.MM.yyyy'), 'IW'))/7) week_number, clientId, loginCount
from
logins
) t
group by
week_number, clientId
)
group by
clientId
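The two-step aggregation (sum per client and week, then average per client) can be sketched in Python as follows. This is only an illustration: the function name avg_weekly_logins is hypothetical, and the ISO year and week pair from date.isocalendar() is used as the week key instead of the week_number expression above, so that weeks from different years stay distinct.

```python
from collections import defaultdict
from datetime import date

def avg_weekly_logins(rows):
    """rows: (day, client_id, login_count) tuples, with day a datetime.date."""
    # Step 1: total logins per client and ISO week (the inner GROUP BY).
    weekly = defaultdict(int)
    for day, client_id, login_count in rows:
        iso = day.isocalendar()
        weekly[(client_id, iso[0], iso[1])] += login_count
    # Step 2: average of the weekly totals per client (the outer GROUP BY).
    per_client = defaultdict(list)
    for (client_id, _year, _week), total in weekly.items():
        per_client[client_id].append(total)
    return {client: sum(totals) / len(totals)
            for client, totals in per_client.items()}

logins = [
    (date(2021, 1, 1), 1234, 234),
    (date(2021, 1, 2), 1234, 978),
    (date(2021, 2, 1), 6547, 45),
    (date(2021, 2, 1), 345, 86),
    (date(2020, 12, 31), 347, 1),
    (date(2021, 1, 1), 347, 1),
]
print(avg_weekly_logins(logins))
```

Note that only weeks with at least one row count towards the average, which is also how the SQL above behaves.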

You can use an aggregation query and count(distinct):
select clientid,
sum(loginCount) / count(distinct trunc(in_date, 'WW')) as avg_per_week
from logins
group by clientid;

Weird Interleaving requirement

I have a log table that contains (to be simple), user, operation, date.
There are two operations: search and view (search may return a hundred records; the user may view zero or more).
I need to have the basic output sorted by date, but I also need to have all of the views for one search together. Something like
name operation date
john search 1/1 1pm
john view 1/1 2pm
john view 1/1 3pm
james search 1/1 230pm
james view 1/1 315pm
john search 1/1 310pm
It seems I need to use the results of a subquery to perform the query, but I'm not sure how that would look. I'm OK with SQL but I kind of hit the ceiling with JOINs and UNIONs. :-/

You can identify the groups by using a window function. And you can include the window function in the order by, so no subqueries are needed.
select *
from log_table l
order by max(case when l.operation = 'search' then l.log_date end) over (partition by l.name order by l.log_date),
l.name,
l.log_date;

You can use a conditional lag() call to find the most recent search date/time for each view row, per user; with search rows getting their own date/time:
-- CTE for sample data
with log_table (name, operation, log_date) as (
select 'john', 'search', timestamp '2019-01-01 13:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 14:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 15:00:00' from dual
union all select 'james', 'search', timestamp '2019-01-01 14:30:00' from dual
union all select 'james', 'view', timestamp '2019-01-01 15:15:00' from dual
union all select 'john', 'search', timestamp '2019-01-01 15:10:00' from dual
)
-- actual query
select name, operation, log_date,
case when operation = 'search' then log_date
else lag(case when operation = 'search' then log_date end ignore nulls)
over (partition by name order by log_date)
end as search_date
from log_table
order by log_date;
NAME OPERATION LOG_DATE SEARCH_DATE
----- --------- ------------------- -------------------
john search 2019-01-01 13:00:00 2019-01-01 13:00:00
john view 2019-01-01 14:00:00 2019-01-01 13:00:00
james search 2019-01-01 14:30:00 2019-01-01 14:30:00
john view 2019-01-01 15:00:00 2019-01-01 13:00:00
john search 2019-01-01 15:10:00 2019-01-01 15:10:00
james view 2019-01-01 15:15:00 2019-01-01 14:30:00
You can then use that as a CTE or inline view, and use the generated search_date to order first, then order the records with the same search date by their actual log date:
-- CTE for sample data
with log_table (name, operation, log_date) as (
select 'john', 'search', timestamp '2019-01-01 13:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 14:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 15:00:00' from dual
union all select 'james', 'search', timestamp '2019-01-01 14:30:00' from dual
union all select 'james', 'view', timestamp '2019-01-01 15:15:00' from dual
union all select 'john', 'search', timestamp '2019-01-01 15:10:00' from dual
)
-- actual query
select name, operation, log_date
from (
select name, operation, log_date,
case when operation = 'search' then log_date
else lag(case when operation = 'search' then log_date end ignore nulls)
over (partition by name order by log_date)
end as search_date
from log_table
)
order by search_date, log_date;
NAME OPERATION LOG_DATE
----- --------- -------------------
john search 2019-01-01 13:00:00
john view 2019-01-01 14:00:00
john view 2019-01-01 15:00:00
james search 2019-01-01 14:30:00
james view 2019-01-01 15:15:00
john search 2019-01-01 15:10:00
As you could potentially get simultaneous searches from two users, you might want to include the user in the final order-by clause too:
...
order by search_date, name, log_date;
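To see why the generated search_date produces the interleaving, here is a rough Python sketch of the same steps: walk the rows in log_date order, remember each user's most recent search, and sort by that value first. The function name interleave is made up, and putting rows without any earlier search last is an assumption that mimics Oracle's default of sorting nulls last in ascending order.

```python
from datetime import datetime

def interleave(rows):
    """rows: (name, operation, log_date) tuples; returns them reordered."""
    last_search = {}   # most recent search time seen so far, per user
    keyed = []
    for name, operation, log_date in sorted(rows, key=lambda r: r[2]):
        if operation == "search":
            last_search[name] = log_date
        search_date = last_search.get(name)  # None if no search came before
        keyed.append((search_date, name, log_date, operation))
    # Rows without a search_date go last; then search_date, name, log_date.
    keyed.sort(key=lambda k: (k[0] is None, k[0] or datetime.min, k[1], k[2]))
    return [(name, operation, log_date)
            for _search, name, log_date, operation in keyed]

log_table = [
    ("john", "search", datetime(2019, 1, 1, 13, 0)),
    ("john", "view", datetime(2019, 1, 1, 14, 0)),
    ("john", "view", datetime(2019, 1, 1, 15, 0)),
    ("james", "search", datetime(2019, 1, 1, 14, 30)),
    ("james", "view", datetime(2019, 1, 1, 15, 15)),
    ("john", "search", datetime(2019, 1, 1, 15, 10)),
]
for row in interleave(log_table):
    print(row)
```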

Find periods from timestamps in ordered table

Let's assume I have the following table CALLS, which is sorted by the column CALL of type TIMESTAMP:
CALL TYPE
--------------------- ------
31.10.2018 10:00:00 OFF
31.10.2018 11:00:00 ON
31.10.2018 12:00:00 ON
31.10.2018 13:00:00 ON
31.10.2018 14:00:00 OFF
31.10.2018 15:00:00 OFF
31.10.2018 16:00:00 ON
31.10.2018 17:00:00 ON
I want to write a view that finds the individual groups of consecutive calls with TYPE=ON and extracts their start and end dates. For the given example, the result should be two groups:
START END
--------------------- ---------------------
31.10.2018 11:00:00 31.10.2018 13:00:00
31.10.2018 16:00:00 31.10.2018 17:00:00
We should assume:
The minimum size of a group is 1, so a group can have the same start and end date
ON rows are separated by OFF rows, but the first and the last row don't have to be of type OFF
Is it possible to achieve that in Oracle 12c?

This is a gaps-and-islands problem. In this case, a difference of row numbers with aggregation does what you want:
select min(call) as start_time, max(call) as end_time
from (select t.*,
row_number() over (partition by type order by call) as seqnum_t,
row_number() over (order by call) as seqnum
from t
) t
where type = 'ON'
group by (seqnum - seqnum_t)
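The reason the row-number difference works: within an unbroken run of ON rows, both counters go up by one per row, so their difference stays constant; any OFF row in between bumps only the overall counter and therefore changes the difference for the next run. A small Python sketch of that bookkeeping, with a made-up function name on_periods:

```python
from datetime import datetime

def on_periods(calls):
    """calls: (call_ts, call_type) tuples; returns (start, end) per ON run."""
    seq_by_type = {}   # row_number() over (partition by type order by call)
    groups = {}        # (seqnum - seqnum_t) -> (start, end)
    for seqnum, (ts, call_type) in enumerate(sorted(calls), start=1):
        seq_by_type[call_type] = seq_by_type.get(call_type, 0) + 1
        if call_type == "ON":
            key = seqnum - seq_by_type[call_type]
            start, _ = groups.get(key, (ts, ts))
            groups[key] = (start, ts)   # min(call), max(call) for the group
    return sorted(groups.values())

calls = [
    (datetime(2018, 10, 31, 10), "OFF"),
    (datetime(2018, 10, 31, 11), "ON"),
    (datetime(2018, 10, 31, 12), "ON"),
    (datetime(2018, 10, 31, 13), "ON"),
    (datetime(2018, 10, 31, 14), "OFF"),
    (datetime(2018, 10, 31, 15), "OFF"),
    (datetime(2018, 10, 31, 16), "ON"),
    (datetime(2018, 10, 31, 17), "ON"),
]
for start, end in on_periods(calls):
    print(start, end)
```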

If you are running Oracle 12c or later, you can also use SQL pattern matching (MATCH_RECOGNIZE).
It would look like this:
WITH t (CALL, TYPE) AS (
SELECT TO_TIMESTAMP('31.10.2018 10:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'OFF' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 11:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'ON' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 12:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'ON' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 13:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'ON' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 14:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'OFF' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 15:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'OFF' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 16:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'ON' FROM dual UNION ALL
SELECT TO_TIMESTAMP('31.10.2018 17:00:00', 'dd.mm.yyyy hh24:mi:ss'), 'ON' FROM dual)
SELECT *
FROM t
MATCH_RECOGNIZE (
ORDER BY CALL
MEASURES
FINAL MIN(CALL) AS CALL_START,
FINAL MAX(CALL) AS CALL_END
PATTERN ( CALL_ON+ )
DEFINE
CALL_ON AS TYPE = 'ON'
);
+-----------------------------------------------------------+
| CALL_START | CALL_END |
+-----------------------------------------------------------+
| 31.10.2018 11:00:00.000 | 31.10.2018 13:00:00.000 |
| 31.10.2018 16:00:00.000 | 31.10.2018 17:00:00.000 |
+-----------------------------------------------------------+

SQL , Analytical Functions , rownumber

I need to assign the same row number (or another numeric value) in SQL to rows that match a condition, as in the following example:
If rows have the same agent name and, after partitioning by name and ordering by time, the time difference between the current row and the preceding row is less than 6 hours,
then assign the same row number; otherwise increase it.
Example of the row data and the expected rownumber output:
person date_time rownumber
A 01/04/2018 10:00 1
A 01/04/2018 13:00 1
A 01/04/2018 14:00 1
A 01/04/2018 15:00 1
A 01/04/2018 23:00 2
A 02/04/2018 03:00 2
A 02/04/2018 12:00 3
A 02/04/2018 16:00 3
B 01/04/2018 17:00 4
B 01/04/2018 20:30 4
C 01/04/2018 18:00 5
C 01/04/2018 22:00 5

You can do this with a combination of LAG and SUM analytic functions, like so:
WITH your_table AS (SELECT 'A' person, to_date('01/04/2018 10', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('01/04/2018 13', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('01/04/2018 14', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('01/04/2018 15', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('01/04/2018 23', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('02/04/2018 03', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('02/04/2018 12', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'A' person, to_date('02/04/2018 16', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'B' person, to_date('01/04/2018 17', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'B' person, to_date('01/04/2018 20', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'C' person, to_date('01/04/2018 18', 'dd/mm/yyyy hh24') date_time FROM dual UNION ALL
SELECT 'C' person, to_date('01/04/2018 22', 'dd/mm/yyyy hh24') date_time FROM dual)
SELECT person,
date_time,
SUM(period_change) OVER (ORDER BY person, date_time) rownumber
FROM (SELECT person,
date_time,
CASE WHEN date_time - LAG(date_time, 1, date_time - 7/24) OVER (PARTITION BY person ORDER BY date_time) > 6/24 THEN 1 ELSE 0 END period_change
FROM your_table);
PERSON DATE_TIME ROWNUMBER
------ ----------- ----------
A 01/04/2018 1
A 01/04/2018 1
A 01/04/2018 1
A 01/04/2018 1
A 01/04/2018 2
A 02/04/2018 2
A 02/04/2018 3
A 02/04/2018 3
B 01/04/2018 4
B 01/04/2018 4
C 01/04/2018 5
C 01/04/2018 5
This works by putting 1 in the additional column whenever a new group is triggered.
Once you have that, then you can do a running sum on that column. That means that after every group change, subsequent rows will be assigned the same number, up until the next group change.
N.B. As suggested by Peter Lang in the comments below, you might prefer to change the case statement generating the "period_change" column to:
CASE WHEN date_time - LAG(date_time) OVER (PARTITION BY person ORDER BY date_time) < 6/24 THEN 0 ELSE 1 END
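As a cross-check of the flag-then-running-sum idea, here is a rough Python sketch that follows the variant just above: a row starts a new group when it has no previous row for the same person, or when the gap to the previous row is not less than 6 hours. The function name group_numbers is hypothetical; the running total is kept across persons, like the SUM ... OVER (ORDER BY person, date_time).

```python
from datetime import datetime, timedelta

def group_numbers(rows, gap=timedelta(hours=6)):
    """rows: (person, date_time) tuples; returns (person, date_time, rownumber)."""
    result = []
    running = 0
    prev_person, prev_ts = None, None
    for person, ts in sorted(rows):
        # period_change is 1 for the first row of a person or after a long gap.
        new_group = person != prev_person or ts - prev_ts >= gap
        running += 1 if new_group else 0   # running sum over all rows
        result.append((person, ts, running))
        prev_person, prev_ts = person, ts
    return result

rows = [
    ("A", datetime(2018, 4, 1, 10)), ("A", datetime(2018, 4, 1, 13)),
    ("A", datetime(2018, 4, 1, 14)), ("A", datetime(2018, 4, 1, 15)),
    ("A", datetime(2018, 4, 1, 23)), ("A", datetime(2018, 4, 2, 3)),
    ("A", datetime(2018, 4, 2, 12)), ("A", datetime(2018, 4, 2, 16)),
    ("B", datetime(2018, 4, 1, 17)), ("B", datetime(2018, 4, 1, 20, 30)),
    ("C", datetime(2018, 4, 1, 18)), ("C", datetime(2018, 4, 1, 22)),
]
for row in group_numbers(rows):
    print(row)
```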
