Get rolling 30 day count of users logging in to site - sql

I have a table of logins to my site in the format below:
logins
+---------+-----------------------+-----------------------+
| USER_ID | LOGIN_TIMESTAMP       | LOGOUT_TIMESTAMP      |
+---------+-----------------------+-----------------------+
| 274385  | 01-JAN-20 02.56.12 PM | 02-JAN-20 10.04.40 AM |
| 32498   | 01-JAN-20 05.12.14 PM | 01-JAN-20 08.26.43 PM |
| 981231  | 01-JAN-20 04.41.04 PM | 01-JAN-20 10.51.11 PM |
+---------+-----------------------+-----------------------+
For each day, I would like to calculate the number of distinct users who logged in exactly once in the previous 30 days, to get something like the output below (note: USER_COUNT_LAST_30_DAYS counts only those users who logged in exactly once in the previous 30 days):
+-----------+-------------------------+
| DAY       | USER_COUNT_LAST_30_DAYS |
+-----------+-------------------------+
| 01-JAN-20 | 14                      |
| 02-JAN-20 | 23                      |
| 03-JAN-20 | 29                      |
+-----------+-------------------------+
My first thought was a query like the one below, but I recognise it would just count all users who logged in during the last 30 days, rather than those who logged in only once:
SELECT
CAST(LOGIN_TIMESTAMP AS DATE),
COUNT(DISTINCT USER_ID)
FROM
logins
WHERE
LOGIN_TIMESTAMP > SYSDATE - 30
GROUP BY
CAST(LOGIN_TIMESTAMP AS DATE);
Would this query give me a count of users who logged in only once in the last 30 days if I added a row-number filter partitioned by USER_ID? Or is there something else I would have to do to get a rolling 30-day count?

In Oracle, the DATE datatype still has a time component, even if the format mask doesn't show it, so CAST(LOGIN_TIMESTAMP AS DATE) alone does not group by day. You can use the TRUNC function on either a DATE or a TIMESTAMP; if you want each group to cover a whole calendar day, you need to truncate the timestamp. You should also use INTERVAL, as timestamp arithmetic and date arithmetic are not the same:
SELECT TRUNC(LOGIN_TIMESTAMP) LOGIN_DATE,
COUNT(DISTINCT USER_ID) USER_COUNT
FROM logins
WHERE TRUNC(LOGIN_TIMESTAMP) > TRUNC(SYSTIMESTAMP - INTERVAL '30' DAY)
GROUP BY TRUNC(LOGIN_TIMESTAMP)
ORDER BY TRUNC(LOGIN_TIMESTAMP) ASC;
Example:
alter session set nls_date_format='DD-MON-YY HH24.MI.SS';
SELECT
SYSTIMESTAMP raw_timestamp,
CAST(SYSTIMESTAMP AS DATE) raw_date,
TRUNC(CAST(SYSTIMESTAMP AS DATE)) trunc_date,
TRUNC(SYSTIMESTAMP) - INTERVAL '30' DAY
from dual;
RAW_TIMESTAMP RAW_DATE TRUNC_DATE TRUNC(SYSTIMESTAMP
-------------------------------------- ------------------ ------------------ ------------------
25-JUN-20 12.27.21.756299000 PM -04:00 25-JUN-20 12.27.21 25-JUN-20 00.00.00 26-MAY-20 00.00.00
To identify users who have logged in only once in the last 30 days, try this:
WITH user_logins as (
SELECT USER_ID,
COUNT(*) LOGIN_COUNT
FROM logins
WHERE TRUNC(LOGIN_TIMESTAMP) > TRUNC(SYSTIMESTAMP - INTERVAL '30' DAY)
GROUP BY USER_ID)
SELECT user_id, login_count
from user_logins
where login_count=1
order by user_id;
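The query above checks a single 30-day window ending today, while the question asks for one row per day. One way to get that is to pair every calendar day with its own window and keep the users whose login count in that window is exactly 1. Below is a minimal sketch of that idea using Python's built-in sqlite3 module, so it runs without an Oracle instance: SQLite's date() stands in for Oracle's TRUNC, a recursive CTE builds the list of days, and the table layout (ISO-formatted text timestamps), the function name rolling_single_login_counts and the window definition (the 30 days ending on and including each day) are assumptions for illustration.

```python
import sqlite3

# Hypothetical sketch: SQLite stands in for Oracle, timestamps are ISO text.
QUERY = """
WITH RECURSIVE days(d) AS (
    -- one row per calendar day between the first and last login
    SELECT MIN(date(login_ts)) FROM logins
    UNION ALL
    SELECT date(d, '+1 day') FROM days
    WHERE d < (SELECT MAX(date(login_ts)) FROM logins)
),
per_user AS (
    -- logins per user inside the window (d - N days, d]
    SELECT days.d AS day, l.user_id, COUNT(*) AS n
    FROM days
    JOIN logins AS l
      ON date(l.login_ts) > date(days.d, ?)
     AND date(l.login_ts) <= days.d
    GROUP BY days.d, l.user_id
)
SELECT days.d,
       (SELECT COUNT(*) FROM per_user AS p WHERE p.day = days.d AND p.n = 1)
FROM days
ORDER BY days.d
"""


def rolling_single_login_counts(rows, window_days=30):
    """Return [(day, users_with_exactly_one_login_in_window), ...] ordered by day."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE logins (user_id INTEGER, login_ts TEXT)")
        con.executemany("INSERT INTO logins VALUES (?, ?)", rows)
        # e.g. '-30 days': the window is the 30 days ending on (and including) d
        return con.execute(QUERY, (f"-{window_days} days",)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # hypothetical sample: user 2 logs in twice, so only user 1 counts on 2020-01-02
    sample = [(1, "2020-01-01 14:56:12"), (2, "2020-01-01 17:12:14"), (2, "2020-01-02 09:00:00")]
    print(rolling_single_login_counts(sample))  # [('2020-01-01', 2), ('2020-01-02', 1)]
```

Days with no qualifying users still appear with a count of 0, because the correlated subquery runs for every generated day.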

Assuming LOGIN_TIMESTAMP is stored as a string, please use the query below. Since the value contains the text PM, you cannot use the CAST function; instead you have to use TO_DATE with a matching format mask to convert it to a date.
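SELECT
TRUNC(to_date(LOGIN_TIMESTAMP, 'DD-MON-RR HH.MI.SS PM')) AS LOGIN_DAY, -- RR interprets the two-digit year 20 as 2020
COUNT(DISTINCT USER_ID)
FROM
logins
WHERE
to_date(LOGIN_TIMESTAMP, 'DD-MON-RR HH.MI.SS PM') >= SYSDATE - 30
GROUP BY
TRUNC(to_date(LOGIN_TIMESTAMP, 'DD-MON-RR HH.MI.SS PM'));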
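If the values really are strings, the format mask has to match them exactly. As a sketch of the same idea outside Oracle, Python's datetime.strptime can parse the question's display format and then drop the time part before counting users per day. The helper names parse_login and distinct_users_per_day are hypothetical; note that Python's %y pivot (00-68 become 2000-2068) is not the same rule as Oracle's RR, and %b assumes English month abbreviations.

```python
from datetime import datetime

# Same layout as the question's display format, e.g. '01-JAN-20 02.56.12 PM':
# %y = two-digit year (Python maps 00-68 to 2000-2068), %I = 12-hour clock,
# %p = AM/PM. Month abbreviations assume an English/C locale.
LOGIN_FORMAT = "%d-%b-%y %I.%M.%S %p"


def parse_login(text):
    """Parse one timestamp string into a datetime."""
    return datetime.strptime(text, LOGIN_FORMAT)


def distinct_users_per_day(rows):
    """rows: (user_id, timestamp_text) pairs -> {date: distinct user count}."""
    users_by_day = {}
    for user_id, text in rows:
        # .date() drops the time part, much like TRUNC does in Oracle
        users_by_day.setdefault(parse_login(text).date(), set()).add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


if __name__ == "__main__":
    print(parse_login("01-JAN-20 02.56.12 PM"))  # 2020-01-01 14:56:12
```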

Related

Finding total session time of a user in postgres

I am trying to create a query that will give me a column of total time logged in for each month for each user.
username | auth_event_type | time | credential_id
Joe | 1 | 2021-11-01 09:00:00 | 44
Joe | 2 | 2021-11-01 10:00:00 | 44
Jeff | 1 | 2021-11-01 11:00:00 | 45
Jeff | 2 | 2021-11-01 12:00:00 | 45
Joe | 1 | 2021-11-01 12:00:00 | 46
Joe | 2 | 2021-11-01 12:30:00 | 46
Joe | 1 | 2021-12-06 14:30:00 | 47
Joe | 2 | 2021-12-06 15:30:00 | 47
The auth_event_type column specifies whether the event was a login (1) or logout (2) and the credential_id indicates the session.
I'm trying to create a query that would have an output like this:
username | year_month | total_time
Joe | 2021-11 | 1:30
Jeff | 2021-11 | 1:00
Joe | 2021-12 | 1:00
How would I go about doing this in Postgres? I think it would involve a window function. If someone could point me in the right direction, that would be great. Thank you.
Solution 1 (partially working)
I'm not sure that window functions will help in your case, but aggregate functions will:
WITH list AS
(
SELECT username
, date_trunc('month', time) AS year_month
, max(time ORDER BY time) - min(time ORDER BY time) AS session_duration
FROM your_table
GROUP BY username, date_trunc('month', time), credential_id
)
SELECT username
, to_char (year_month, 'YYYY-MM') AS year_month
, sum(session_duration) AS total_time
FROM list
GROUP BY username, year_month
The first part of the query aggregates the login/logout times per username and credential_id; the second part sums, per year_month, the difference between the login and logout times. This query works well as long as the login time and logout time are in the same month, but it fails when they aren't.
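The two-step logic above (collapse each session to its login and logout time, then sum per user and month) can be reproduced in SQLite as a rough sketch. It is a close variant rather than a translation: SQLite has no interval type, so durations are julianday() differences converted to minutes, and each whole session is attributed to the month of its login, so it shares Solution 1's limitation for sessions that cross a month boundary. The table name auth and the function name monthly_minutes are assumptions; the sample rows are the question's.

```python
import sqlite3

# Sample rows copied from the question: (username, auth_event_type, time, credential_id)
EVENTS = [
    ("Joe", 1, "2021-11-01 09:00:00", 44),
    ("Joe", 2, "2021-11-01 10:00:00", 44),
    ("Jeff", 1, "2021-11-01 11:00:00", 45),
    ("Jeff", 2, "2021-11-01 12:00:00", 45),
    ("Joe", 1, "2021-11-01 12:00:00", 46),
    ("Joe", 2, "2021-11-01 12:30:00", 46),
    ("Joe", 1, "2021-12-06 14:30:00", 47),
    ("Joe", 2, "2021-12-06 15:30:00", 47),
]

QUERY = """
WITH sessions AS (
    -- step 1: one row per session (login = earliest, logout = latest event)
    SELECT username, MIN(time) AS login_time, MAX(time) AS logout_time
    FROM auth
    GROUP BY username, credential_id
)
-- step 2: sum session lengths per user and login month, in whole minutes
SELECT username,
       strftime('%Y-%m', login_time) AS year_month,
       CAST(ROUND(SUM(julianday(logout_time) - julianday(login_time)) * 1440) AS INTEGER)
FROM sessions
GROUP BY username, year_month
ORDER BY year_month, username
"""


def monthly_minutes(events):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE auth (username TEXT, auth_event_type INTEGER, "
                    "time TEXT, credential_id INTEGER)")
        con.executemany("INSERT INTO auth VALUES (?, ?, ?, ?)", events)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(monthly_minutes(EVENTS))  # [('Jeff', '2021-11', 60), ('Joe', '2021-11', 90), ('Joe', '2021-12', 60)]
```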
Solution 2 (fully working)
To calculate total_time per username and per month regardless of when the login and logout times fall, we can use a time-range approach that intersects the session ranges [login_time, logout_time) with the monthly ranges [monthly_start_time, monthly_end_time):
WITH monthly_range AS
(
SELECT to_char(m.month_start_date, 'YYYY-MM') AS month
, tsrange(m.month_start_date, m.month_start_date+ interval '1 month' ) AS monthly_range
FROM
( SELECT generate_series(min(date_trunc('month', time)), max(date_trunc('month', time)), '1 month') AS month_start_date
FROM your_table
) AS m
), session_range AS
(
SELECT username
, tsrange(min(time ORDER BY auth_event_type), max(time ORDER BY auth_event_type)) AS session_range
FROM your_table
GROUP BY username, credential_id
)
SELECT s.username
, m.month
, sum(upper(p.period) - lower(p.period)) AS total_time
FROM monthly_range AS m
INNER JOIN session_range AS s
ON s.session_range && m.monthly_range
CROSS JOIN LATERAL (SELECT s.session_range * m.monthly_range AS period) AS p
GROUP BY s.username, m.month
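The range-intersection idea in Solution 2 does not depend on Postgres range types. Here is a plain-Python sketch of the same half-open interval arithmetic: each session [login, logout) is split across the calendar months it overlaps, and only the overlapping part is added to each month. The function names next_month_start and minutes_per_month are hypothetical, and sessions are assumed to have login <= logout.

```python
from collections import defaultdict
from datetime import datetime


def next_month_start(d):
    """First instant of the calendar month after d's month."""
    return datetime(d.year + (d.month == 12), d.month % 12 + 1, 1)


def minutes_per_month(sessions):
    """sessions: (username, login, logout) with login <= logout.

    Each half-open session range [login, logout) is intersected with every
    half-open month range [month_start, next_month_start) it overlaps.
    """
    totals = defaultdict(float)
    for user, login, logout in sessions:
        month_start = datetime(login.year, login.month, 1)
        while month_start < logout:
            month_end = next_month_start(month_start)
            lo = max(login, month_start)   # lower(session * month)
            hi = min(logout, month_end)    # upper(session * month)
            if hi > lo:
                totals[(user, month_start.strftime("%Y-%m"))] += (hi - lo).total_seconds() / 60
            month_start = month_end
    return dict(totals)


if __name__ == "__main__":
    # hypothetical session crossing midnight at the end of November
    print(minutes_per_month([("Joe", datetime(2021, 11, 30, 23, 0), datetime(2021, 12, 1, 1, 0))]))
    # {('Joe', '2021-11'): 60.0, ('Joe', '2021-12'): 60.0}
```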
Use the window function lag(), partitioned by credential_id and ordered by time, e.g.:
WITH j AS (
SELECT username, time, age(time, LAG(time) OVER w)
FROM t
WINDOW w AS (PARTITION BY credential_id ORDER BY time
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
SELECT username, to_char(time,'yyyy-mm'),sum(age) FROM j
GROUP BY 1,2;
Note: the frame ROWS BETWEEN 1 PRECEDING AND CURRENT ROW is optional here, since lag() reads the preceding row of the partition regardless of the frame, but it is considered good practice to keep window functions as explicit as possible, so that in the future you don't have to read the docs to figure out what your query is doing.
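The lag() approach can also be tried in SQLite (3.25 or later, which added window functions). In this sketch each logout row carries the minutes since the preceding event of the same credential_id, so a session's duration is attributed to the month of its logout row; durations use julianday() because SQLite has no age() or interval type. The table name t follows the answer, while the function name session_minutes_with_lag is hypothetical.

```python
import sqlite3

QUERY = """
WITH j AS (
    SELECT username, time,
           -- minutes since the previous event of the same session;
           -- NULL on the first (login) row of each credential_id
           (julianday(time)
            - julianday(LAG(time) OVER (PARTITION BY credential_id ORDER BY time))) * 1440 AS minutes
    FROM t
)
SELECT username,
       strftime('%Y-%m', time) AS year_month,
       CAST(ROUND(SUM(minutes)) AS INTEGER) AS total_minutes  -- SUM skips the NULLs
FROM j
GROUP BY username, year_month
ORDER BY year_month, username
"""


def session_minutes_with_lag(events):
    """events: (username, auth_event_type, time, credential_id) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (username TEXT, auth_event_type INTEGER, "
                    "time TEXT, credential_id INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", events)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Called with the question's eight sample rows, it returns one row per user and month with 60, 90 and 60 minutes for Jeff and Joe in November and Joe in December.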

How to group sum results by date with custom start time in PostgreSQL

I am trying to group my sum results by a custom day in PostgreSQL.
A regular day starts at 00:00, but I would like mine to start at 04:00, so an entry with time 2019-01-03 02:23 would count toward '2019-01-02' instead.
Right now my code looks like this. The bottom part works perfectly for a regular 00:00-23:59 day, but I would like to group by the custom range created in the upper part; I just don't know how to connect the two parts:
with dateRange as(
SELECT
generate_series(
MIN(to_date(payments2.paymenttime,'DD Mon YYYY')) + interval '4 hour',
max(to_date(payments2.paymenttime,'DD Mon YYYY')),
'24 hour') as theday
from payments2
)
select
sum(cast(payments2.servicecharge as money)) as total,
to_date(payments2.paymenttime,'DD Mon YYYY') as date
from payments2
group by date
The result looks like this:
+------------+------------+
| total | date |
+------------+------------+
| 20 | 2019-01-01 |
+------------+------------+
| 60 | 2019-01-02 |
+------------+------------+
| 35 | 2019-01-03 |
+------------+------------+
| 21 | 2019-01-04 |
+------------+------------+
Many thanks for your help.
If I understand your question correctly, you just need to subtract 4 hours from the timestamp before casting it to a date; you don't even need the CTE.
Something like:
select
sum(cast(payments2.servicecharge as money)) as total,
(to_timestamp(payments2.paymenttime,'DD Mon YYYY HH24:MI:SS') - interval '4 hours')::date as date
from payments2
group by date
You may need to use a different format in the to_timestamp function, depending on the format of the payments2.paymenttime string.
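The shift-then-truncate trick can be checked quickly in SQLite, where date(ts, '-4 hours') plays the role of (ts - interval '4 hours')::date. A minimal sketch, assuming paymenttime is stored as ISO 'YYYY-MM-DD HH:MM:SS' text (the question's own format differs) and servicecharge as a plain number; totals_by_business_day is a hypothetical helper name.

```python
import sqlite3


def totals_by_business_day(payments, day_start_hour=4):
    """payments: (paymenttime 'YYYY-MM-DD HH:MM:SS', servicecharge) rows.

    Shifting every timestamp back by day_start_hour hours before taking the
    date makes each day run from 04:00 to 03:59:59 of the next calendar day
    (with the default of 4).
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE payments2 (paymenttime TEXT, servicecharge REAL)")
        con.executemany("INSERT INTO payments2 VALUES (?, ?)", payments)
        return con.execute(
            """
            SELECT date(paymenttime, ?) AS business_day, SUM(servicecharge) AS total
            FROM payments2
            GROUP BY business_day
            ORDER BY business_day
            """,
            (f"-{day_start_hour} hours",),
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(totals_by_business_day([("2019-01-03 02:23:00", 15.0), ("2019-01-03 04:00:00", 5.0)]))
    # [('2019-01-02', 15.0), ('2019-01-03', 5.0)]
```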

Database Query to generate a Time-based Chart

I have a logins table in the following (simplified) structure:
id | login_time
---------
1 | 2019-02-04 18:14:30.026361+00
2 | 2019-02-04 22:10:19.720065+00
3 | 2019-02-06 15:51:53.799014+00
Now I want to generate a chart like this:
https://prnt.sc/mifz6y
Basically I want to show the logins within the past 48 hours.
My current query:
SELECT count(*), date_trunc('hour', login_time) as time_trunced FROM user_logins
WHERE login_time > now() - interval '48' hour
GROUP BY time_trunced
ORDER BY time_trunced DESC
This works as long as there are entries for every hour. However, if there were no logins in some hour, no row is returned for that hour, like this:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
15:00 | 3
16:00 | 5
I need a continuous result, so that I can simply put the count values into an array:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
14:00 | 0 <-- This is missing
15:00 | 3
16:00 | 5
Based on that I can simply transform the query result into an array like [1, 2, 0, 3, 5] and pass that to my frontend.
Is this possible with PostgreSQL, or do I need to implement my own logic?
I think I would do:
select gs.h, count(ul.login_time)
from generate_series(
date_trunc('hour', now() - interval '48 hour'),
date_trunc('hour', now()),
interval '1 hour'
) gs(h) left join
user_logins ul
on ul.login_time >= gs.h and
ul.login_time < gs.h + interval '1 hour'
group by gs.h
order by gs.h;
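The same "generate the buckets, then LEFT JOIN" pattern also works in engines without generate_series(); a recursive CTE can produce the hour buckets instead. A sketch in SQLite via Python's sqlite3, assuming login_time is stored as 'YYYY-MM-DD HH:MM:SS' text in a single time zone, and that the caller passes the first and last hour (instead of using now()) so the result is reproducible; hourly_login_counts is a hypothetical name.

```python
import sqlite3

QUERY = """
WITH RECURSIVE hours(h) AS (
    -- stand-in for generate_series(): one row per hour, first to last bound value
    SELECT ?
    UNION ALL
    SELECT datetime(h, '+1 hour') FROM hours WHERE h < ?
)
SELECT hours.h, COUNT(ul.login_time)   -- COUNT(column) gives 0 for empty hours
FROM hours
LEFT JOIN user_logins AS ul
  ON ul.login_time >= hours.h
 AND ul.login_time <  datetime(hours.h, '+1 hour')
GROUP BY hours.h
ORDER BY hours.h
"""


def hourly_login_counts(login_times, first_hour, last_hour):
    """Count logins per hour from first_hour to last_hour inclusive."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE user_logins (login_time TEXT)")
        con.executemany("INSERT INTO user_logins VALUES (?)", [(t,) for t in login_times])
        return con.execute(QUERY, (first_hour, last_hour)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    rows = hourly_login_counts(
        ["2019-02-04 12:10:00", "2019-02-04 13:05:00", "2019-02-04 13:59:59", "2019-02-04 15:30:00"],
        "2019-02-04 12:00:00",
        "2019-02-04 15:00:00",
    )
    print([count for _, count in rows])  # [1, 2, 0, 1]
```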
This can almost certainly be tidied up a bit, but it should give you some ideas. Props to clamp for the generate_series() tip:
SELECT t.time_trunced,coalesce(l.login_count,0) as logins
FROM
(
-- Generate an inline view with all hours between the min & max values in user_logins table
SELECT date_trunc('hour',a.min_time)+ interval '1h' * b.hr_offset as time_trunced
FROM (select min(login_time) as min_time from user_logins) a
JOIN (select generate_series(0,(select ceil((EXTRACT(EPOCH FROM max(login_time))-EXTRACT(EPOCH FROM min(login_time)))/3600) from user_logins)::int) as hr_offset) b on true
) t
LEFT JOIN
(
-- OP's original query tweaked a bit
SELECT count(*) as login_count, date_trunc('hour', login_time) as time_trunced
FROM user_logins
GROUP BY time_trunced
) l on t.time_trunced=l.time_trunced
order BY 1 desc;

Calculate previous login for each row grouped by day relative to current row

Goal: I would like to gather logins for each user grouped by day.
Problem: I am struggling with the function to calculate the last column, which is the number of days since the last login relative to the current row of the login column (somewhat like a LAG function, but I'm not sure how to use it). The issue is that I only need to show logins for the last three months, so how would it calculate the fifth observation of the days_last_login column in the following table if I put a WHERE condition for the last three months?
Desired Output:
+----+---------------------+-----------------+
| id | login | days_last_login |
+----+---------------------+-----------------+
| 1 | 2018-12-10 05:00:00 | 5 |
| 1 | 2018-12-07 05:30:00 | 3 |
| 1 | 2018-12-01 05:30:00 | 6 |
| 2 | 2019-08-01 05:30:00 | 7 |
| 2 | 2019-01-01 05:30:00 | 365 |
+----+---------------------+-----------------+
Current Query:
SELECT id
,YEAR(login) as yr, MONTH(login) as mm, DAY(login) as dd
,CAST(login AS DATE) as logins
,FUNCTION FOR DAYS_LAST_LOGIN
FROM database.table
WHERE login > DATEADD(month,-3,getdate())
GROUP BY YEAR(login), MONTH(login), DAY(login), id
ORDER BY id, yr desc, mm desc, dd desc
Note: I omitted the yr, mm and dd columns from the table to make it clearer.
From what I can tell, the logic is the number of days from a given login date to the next, presumably with the most recent date measured up to the current date.
That suggests a query like this:
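SELECT id, CONVERT(date, login) as dte,
-- LEAD needs an ORDER BY inside OVER; MAX(login) is used because login itself is not in the GROUP BY
DATEDIFF(day, MAX(login), LEAD(MAX(login), 1, GETDATE()) OVER (PARTITION BY id ORDER BY CONVERT(date, login))) as DAYS_LAST_LOGIN
FROM database.table
WHERE login > DATEADD(month, -3, getdate())
GROUP BY id, CONVERT(date, login)
ORDER BY id, CONVERT(date, login) DESC;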
I removed the date parts because I don't find them useful, but you can of course include them.
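For comparison, the same LEAD-based logic can be tried in SQLite, which supports LEAD(expr, offset, default) since version 3.25. A sketch, assuming login is stored as ISO text and that "today" is passed in rather than read from the clock; days_to_next_login is a hypothetical name. With the question's first three rows and today taken to be 2018-12-15, it reproduces the 5, 3, 6 of the desired output.

```python
import sqlite3

QUERY = """
WITH per_day AS (
    -- one row per user and calendar day, like GROUP BY id, CONVERT(date, login)
    SELECT id, date(login) AS dte
    FROM logins
    GROUP BY id, date(login)
)
SELECT id,
       dte,
       -- days until the user's next login day; the newest day is measured to 'today'
       CAST(julianday(LEAD(dte, 1, ?) OVER (PARTITION BY id ORDER BY dte))
            - julianday(dte) AS INTEGER) AS days_last_login
FROM per_day
ORDER BY id, dte DESC
"""


def days_to_next_login(logins, today):
    """logins: (id, 'YYYY-MM-DD HH:MM:SS') rows; today: 'YYYY-MM-DD'."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE logins (id INTEGER, login TEXT)")
        con.executemany("INSERT INTO logins VALUES (?, ?)", logins)
        return con.execute(QUERY, (today,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    rows = [(1, "2018-12-10 05:00:00"), (1, "2018-12-07 05:30:00"), (1, "2018-12-01 05:30:00")]
    print(days_to_next_login(rows, today="2018-12-15"))
    # [(1, '2018-12-10', 5), (1, '2018-12-07', 3), (1, '2018-12-01', 6)]
```

Because LEAD looks forward to newer days, filtering out older rows with a WHERE clause does not change the values computed for the rows that remain.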

sql query to groupby with a deduplicated column

I have the following table
create table events (
event_id,
event_name,
datetime,
email)
And I want to display the events per week, and the events per week deduplicated by email, in a single query.
By running:
select date_trunc('week', datetime) wdt, event_name, count(1)
from events
group by wdt, event_name;
wdt | event_name | count
---------------------+-------------+-------
2014-10-27 00:00:00 | deliver | 32
2014-11-17 00:00:00 | open | 30
2014-10-20 00:00:00 | deliver | 25
2014-10-20 00:00:00 | click | 19
2014-10-27 00:00:00 | click | 29
I can get the first count, but I don't know how to add the count_distinct column (if there are two clicks from the same email in the same week, they should count as one, not two).
Just specify which column to count only distinct values for, like this:
select date_trunc('week', datetime) wdt, event_name, count(distinct email)
from events
group by wdt, event_name;
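A small SQLite sketch makes the difference between COUNT(*) and COUNT(DISTINCT email) visible side by side in one query. SQLite has no date_trunc, so the Monday of each week is computed with date(datetime, 'weekday 0', '-6 days') as a stand-in for date_trunc('week', datetime); the sample rows and the helper name weekly_counts are hypothetical.

```python
import sqlite3

QUERY = """
SELECT date(datetime, 'weekday 0', '-6 days') AS wdt,  -- Monday of the week
       event_name,
       COUNT(*) AS total,                               -- every event
       COUNT(DISTINCT email) AS unique_emails           -- each email once per week
FROM events
GROUP BY wdt, event_name
ORDER BY wdt, event_name
"""


def weekly_counts(events):
    """events: (event_id, event_name, datetime 'YYYY-MM-DD HH:MM:SS', email) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (event_id INTEGER, event_name TEXT, datetime TEXT, email TEXT)")
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

With three clicks in the week of 2014-10-27, two of them from the same address, the click row reports a total of 3 but only 2 unique emails.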
I think you just need to count distinct values, as you pointed out. Note that count(distinct 1) would always return 1, because the constant 1 has only one distinct value, so the distinct count has to be on email:
select date_trunc('week', datetime) wdt, event_name, count(distinct email)
from events
group by wdt, event_name;
However, without the raw data and some samples, I'm not sure how to confirm this, as I can't see why counts of 31 and 29 would occur for the same date (10/27) in wdt for the same event_name.