Database Query to generate a Time-based Chart - sql

I have a logins table in the following (simplified) structure:
id | login_time
---------
1 | 2019-02-04 18:14:30.026361+00
2 | 2019-02-04 22:10:19.720065+00
3 | 2019-02-06 15:51:53.799014+00
Now I want to generate a chart like this:
https://prnt.sc/mifz6y
Basically I want to show the logins within the past 48 hours.
My current query:
SELECT count(*), date_trunc('hour', login_time) as time_trunced FROM user_logins
WHERE login_time > now() - interval '48' hour
GROUP BY time_trunced
ORDER BY time_trunced DESC
This works as long as there are entries for every hour. However, if in some hour there were no logins, there will be no entry selected, like this:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
15:00 | 3
16:00 | 5
I need a continuous result with one row per hour, so that I can simply put the count values into an array:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
14:00 | 0 <-- This is missing
15:00 | 3
16:00 | 5
Based on that I can simply transform the query result into an array like [1, 2, 0, 3, 5] and pass that to my frontend.
Is this possible with PostgreSQL? Or do I need to implement my own logic?

I think I would do:
select gs.h, count(ul.login_time)
from generate_series(
date_trunc('hour', now() - interval '48 hour'),
date_trunc('hour', now()),
interval '1 hour'
) gs(h) left join
user_logins ul
on ul.login_time >= gs.h and
ul.login_time < gs.h + interval '1 hour'
group by gs.h
order by gs.h;
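
The question also asks whether the gap filling has to be done in application code. The query above avoids that, but for comparison here is a minimal Python sketch of the "own logic" route. It assumes the sparse rows arrive as (hour, count) pairs; the function name `fill_hourly_counts` and the sample values are illustrative, not part of the original post.

```python
from datetime import datetime, timedelta, timezone


def fill_hourly_counts(rows, start, end):
    """Return one count per hour from start to end (inclusive).

    rows is an iterable of (hour_start, count) pairs, such as the sparse
    result of the original GROUP BY query. Hours without a row get 0.
    """
    counts = dict(rows)
    filled = []
    hour = start
    while hour <= end:
        filled.append(counts.get(hour, 0))
        hour += timedelta(hours=1)
    return filled


def utc_hour(h):
    # Hypothetical helper: an hour on an arbitrary sample day, in UTC.
    return datetime(2019, 2, 4, h, tzinfo=timezone.utc)


sparse = [(utc_hour(12), 1), (utc_hour(13), 2), (utc_hour(15), 3), (utc_hour(16), 5)]
print(fill_hourly_counts(sparse, utc_hour(12), utc_hour(16)))
```

Doing this in the database keeps the frontend code trivial, which is why the generate_series() approach is usually preferred.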

This can almost certainly be tidied up a bit, but it should give you some ideas. Props to clamp for the generate_series() tip:
SELECT t.time_trunced,coalesce(l.login_count,0) as logins
FROM
(
-- Generate an inline view with all hours between the min & max values in user_logins table
SELECT date_trunc('hour',a.min_time)+ interval '1h' * b.hr_offset as time_trunced
FROM (select min(login_time) as min_time from user_logins) a
JOIN (select generate_series(0,(select ceil((EXTRACT(EPOCH FROM max(login_time))-EXTRACT(EPOCH FROM min(login_time)))/3600) from user_logins)::int) as hr_offset) b on true
) t
LEFT JOIN
(
-- OP's original query tweaked a bit
SELECT count(*) as login_count, date_trunc('hour', login_time) as time_trunced
FROM user_logins
GROUP BY time_trunced
) l on t.time_trunced=l.time_trunced
order BY 1 desc;

Related

Group rows with start and end dates by month in PostgreSQL

I have a database with a tbl_registration with rows that look like
ID | start_date_time | end_date_time | ...
1 | 2021-01-01 14:00:15 | 2021-01-01 14:00:15
2 | 2021-02-01 14:00:15 | null
4 | 2021-05-15 14:00:15 | 2024-01-01 14:00:15
5 | 2019--15 14:00:15 | 2024-01-01 14:00:15
end_date_time can be null.
The table contains between 500,000 and 1,000,000 records.
We want to create an overview of a year, grouped by month, that shows the number of records active in each month. A registration is counted in a month if it lies (at least partially) in that month, based on its start and end date.
I can do a query per month like this
select count(r.id)
from tbl_registration r
where
(r.end_date_time >= to_timestamp('01/01/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS') or r.end_date_time is null)
and r.start_date_time < to_timestamp('01/02/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS');
But that forces me to repeat this query 12 times.
I don't see a way to solve this in one query that returns 12 rows, one for each month.
I've been looking at the generate_series function, but I don't see how I can group on the comparison of those start and end dates.
Postgres supports generate_series(), so generate the dates you want and then construct the query. One method is:
select gs.mon, x.cnt
from generate_series('2021-01-01'::date, '2021-12-01'::date, interval '1 month') gs(mon) left join lateral
(select count(*) as cnt
from tbl_registration r
-- a registration is active in the month if it overlaps [gs.mon, gs.mon + 1 month)
where (r.end_date_time >= gs.mon or r.end_date_time is null) and
r.start_date_time < gs.mon + interval '1 month'
) x
on 1=1;
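
To make the overlap condition concrete, here is a small Python sketch that applies the same test (start before the end of the month, and end either null or on or after the start of the month) to three of the sample rows from the question. The row with the garbled start date is left out, and the function names are illustrative assumptions.

```python
from datetime import datetime


def month_bounds(year):
    """First instant of each month of the year, plus the start of the next year."""
    return [datetime(year, m, 1) for m in range(1, 13)] + [datetime(year + 1, 1, 1)]


def active_per_month(registrations, year):
    """Count registrations overlapping each month.

    registrations is a list of (start, end) pairs; end may be None,
    which is treated as still active (mirrors end_date_time is null).
    """
    bounds = month_bounds(year)
    result = []
    for month_start, next_month in zip(bounds, bounds[1:]):
        cnt = sum(
            1
            for start, end in registrations
            if start < next_month and (end is None or end >= month_start)
        )
        result.append((month_start.strftime("%Y-%m"), cnt))
    return result


sample = [
    (datetime(2021, 1, 1, 14, 0, 15), datetime(2021, 1, 1, 14, 0, 15)),
    (datetime(2021, 2, 1, 14, 0, 15), None),
    (datetime(2021, 5, 15, 14, 0, 15), datetime(2024, 1, 1, 14, 0, 15)),
]
for month, cnt in active_per_month(sample, 2021):
    print(month, cnt)
```

A row with a null end date is counted in every month from its start onwards, which is what the `or r.end_date_time is null` branch expresses in the SQL.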

Get rolling 30 day count of users logging in to site

I have a table of login to my site in the format below:
logins
+---------+--------------------------+-----------------------+
| USER_ID | LOGIN_TIMESTAMP | LOGOUT_TIMESTAMP |
+---------+--------------------------+-----------------------+
| 274385 | 01-JAN-20 02.56.12 PM | 02-JAN-20 10.04.40 AM |
| 32498 | 01-JAN-20 05.12.14 PM | 01-JAN-20 08.26.43 PM |
| 981231 | 01-JAN-20 04.41.04 PM | 01-JAN-20 10.51.11 PM |
+---------+--------------------------+-----------------------+
I would like to calculate, per day, a unique count of users who logged in only once in the previous 30 days, to get something like the table below (note: USER_COUNT_LAST_30_DAYS counts only those users who logged in exactly once in the previous 30 days):
+-----------+-------------------------+
| DAY | USER_COUNT_LAST_30_DAYS |
+-----------+-------------------------+
| 01-JAN-20 | 14 |
| 02-JAN-20 | 23 |
| 03-JAN-20 | 29 |
+-----------+-------------------------+
My first thought would be a query as below, but I recognise this would just count all users who logged in the last 30 days, rather than those who only logged in once
SELECT
CAST(LOGIN_TIMESTAMP AS DATE),
COUNT(DISTINCT USER_ID)
FROM
logins
WHERE
LOGIN_TIMESTAMP > SYSDATE - 30
GROUP BY
CAST(LOGIN_TIMESTAMP AS DATE);
Would this query work in getting me a count of users who logged in only once the last 30 days with a rownum partition filter on user id? or is there something that I would have to ensure to get a rolling 30 day count?
The date datatype still has a time component, even if the format mask doesn't show it. You can use the TRUNC function on either a date or a timestamp. If you really want your day to be limited to the day, you'll need to truncate the timestamp. You also need to use INTERVAL, as timestamp math and date math are not the same:
SELECT TRUNC(LOGIN_TIMESTAMP) LOGIN_DATE,
COUNT(DISTINCT USER_ID) USER_COUNT
FROM logins
WHERE TRUNC(LOGIN_TIMESTAMP) > TRUNC(SYSTIMESTAMP - INTERVAL '30' DAY)
GROUP BY TRUNC(LOGIN_TIMESTAMP)
ORDER BY TRUNC(LOGIN_TIMESTAMP) ASC;
Example:
alter session set nls_date_format='DD-MON-YY HH24.MI.SS';
SELECT
SYSTIMESTAMP raw_timestamp,
CAST(SYSTIMESTAMP AS DATE) raw_date,
TRUNC(CAST(SYSTIMESTAMP AS DATE)) trunc_date,
TRUNC(SYSTIMESTAMP) - INTERVAL '30' DAY
from dual;
RAW_TIMESTAMP RAW_DATE TRUNC_DATE TRUNC(SYSTIMESTAMP
-------------------------------------- ------------------ ------------------ ------------------
25-JUN-20 12.27.21.756299000 PM -04:00 25-JUN-20 12.27.21 25-JUN-20 00.00.00 26-MAY-20 00.00.00
For identifying users that have only logged in once, try this:
WITH user_logins as (
SELECT USER_ID,
COUNT(*) LOGIN_COUNT
FROM logins
WHERE TRUNC(LOGIN_TIMESTAMP) > TRUNC(SYSTIMESTAMP - INTERVAL '30' DAY)
GROUP BY USER_ID)
SELECT user_id, login_count
from user_logins
where login_count=1
order by user_id;
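
The query above looks at a single 30-day window ending now. The per-day rolling version the question asks for can be sketched in Python as follows. This is only an illustration of the logic, under the assumption that "previous 30 days" means the 30 calendar days ending on the day in question, inclusive; the function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def single_login_users_per_day(logins, days, window=30):
    """For each day, count users with exactly one login in the trailing window.

    logins is a list of (user_id, login_date) pairs. The window covers the
    `window` calendar days ending on the given day, inclusive (an assumption).
    """
    result = {}
    for day in days:
        window_start = day - timedelta(days=window - 1)
        per_user = Counter(uid for uid, d in logins if window_start <= d <= day)
        result[day] = sum(1 for n in per_user.values() if n == 1)
    return result


logins = [
    (1, date(2020, 1, 1)),
    (2, date(2020, 1, 1)),
    (1, date(2020, 1, 2)),
    (3, date(2020, 1, 3)),
]
days = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 31)]
for day, cnt in single_login_users_per_day(logins, days).items():
    print(day, cnt)
```

Note how user 1 drops out of the count on 2 January (two logins in the window) and returns on 31 January, once the first login has aged out of the window.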
Please use the query below. Since the value contains the string PM, you cannot use the cast function; instead, use to_date to convert it to a date.
SELECT
to_date(LOGIN_TIMESTAMP, 'DD-MON-YYYY hh.mi.ss PM'),
COUNT(DISTINCT USER_ID)
FROM
logins
WHERE
LOGIN_TIMESTAMP >= SYSDATE - 30
GROUP BY
to_date(LOGIN_TIMESTAMP, 'DD-MON-YYYY hh.mi.ss PM');

Get a rolling count of timestamps in SQL

I have a table (in an Oracle DB) with about 4000 records that looks something like what is shown below. This is just an example of how the table is designed. The timestamps span several years.
| Time | Action |
| 9/25/2019 4:24:32 PM | Yes |
| 9/25/2019 4:28:56 PM | No |
| 9/28/2019 7:48:16 PM | Yes |
| .... | .... |
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval. I would like this done by looking at each timestamp and getting a count of timestamps that appear within 15 minutes of that timestamp.
My goal would be to have something like
| Interval | Count |
| 9/25/2019 4:24:00 PM - 9/25/2019 4:39:00 | 2 |
| 9/25/2019 4:25:00 PM - 9/25/2019 4:40:00 | 2 |
| ..... | ..... |
| 9/25/2019 4:39:00 PM - 9/25/2019 4:54:00 | 0 |
I am not sure how I would be able to do this, if at all. Any ideas or advice would be much appreciated.
If you want any 15 minute interval in the data, then you can use:
select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t;
If you want the maximum, then use rank() on this:
select t.*
from (select t.*, rank() over (order by cnt_15 desc) as seqnum
from (select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t
) t
) t
where seqnum = 1;
This doesn't produce exactly the results you specify in the question. But it does answer the core request:
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval.
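
As a cross-check on what the window frame computes, here is a Python sketch of the same trailing count: for each timestamp, the number of timestamps from 15 minutes before it up to and including it (equal timestamps count as peers, as with RANGE frames). The function name and sample values are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def trailing_counts(times, window=timedelta(minutes=15)):
    """Pair each timestamp t with the number of timestamps in [t - window, t]."""
    ordered = sorted(times)
    return [
        (t, bisect_right(ordered, t) - bisect_left(ordered, t - window))
        for t in ordered
    ]


times = [
    datetime(2019, 9, 25, 16, 24, 32),
    datetime(2019, 9, 25, 16, 28, 56),
    datetime(2019, 9, 28, 19, 48, 16),
]
counts = trailing_counts(times)
busiest = max(cnt for _, cnt in counts)
print(counts)
print("max in any trailing 15-minute window:", busiest)
```

Because every maximal 15-minute cluster ends at some timestamp in the data, taking the maximum over these trailing counts is enough to find the busiest interval.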
You could enumerate the minutes with a recursive query, then bring in the table with a left join:
-- Oracle's recursive subquery factoring does not take the RECURSIVE keyword
with cte (start_dt, max_dt) as (
select trunc(min(time), 'mi'), max(time) from mytable
union all
select start_dt + interval '1' minute, max_dt from cte where start_dt < max_dt
)
select
c.start_dt,
c.start_dt + interval '15' minute end_dt,
count(t.time) cnt
from cte c
left join mytable t
on t.time >= c.start_dt
and t.time < c.start_dt + interval '15' minute
group by c.start_dt

SQLite: Sum of differences between two dates group by every date

I have a SQLite database with start and stop datetimes.
With the following SQL query I get the difference in hours between start and stop:
SELECT starttime, stoptime, cast((strftime('%s',stoptime)-strftime('%s',starttime)) AS real)/60/60 AS diffHours FROM tracktime;
I need a SQL query that returns the total tracked hours for each day, including whole days that fall between a start and stop time.
The result should be something like this:
2018-08-01: 12 hours
2018-08-02: 24 hours
2018-08-03: 12 hours
2018-08-04: 0 hours
2018-08-05: 1 hours
2018-08-06: 14 hours
2018-08-07: 8 hours
You can try this: use a recursive CTE to build a calendar of days between each start time and stop time, then calculate the hours that fall on each day.
Schema (SQLite v3.18)
CREATE TABLE tracktime(
id int,
starttime timestamp,
stoptime timestamp
);
insert into tracktime values
(11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
insert into tracktime values
(12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
Query #1
WITH RECURSIVE cte AS (
select id,starttime,date(starttime,'+1 day') totime,stoptime
from tracktime
UNION ALL
SELECT id,
date(starttime,'+1 day'),
date(totime,'+1 day'),
stoptime
FROM cte
WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime),(strftime('%s',CASE
WHEN totime > stoptime THEN stoptime
ELSE totime
END) -strftime('%s',starttime))/3600 diffHour
FROM cte;
| strftime('%Y-%m-%d', starttime) | diffHour |
| ------------------------------- | -------- |
| 2018-08-01 | 12 |
| 2018-09-05 | 1 |
| 2018-08-02 | 24 |
| 2018-08-03 | 12 |
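
The query above returns one row per interval and day. If several intervals touch the same day, the rows still need to be summed, as the question asks. Below is a hedged sketch of that variant, run through Python's built-in sqlite3 module so it can be tried without a database file. The SUM/GROUP BY wrapper, the function name `hours_per_day_sqlite`, and the extra row with id 13 are additions for illustration and are not part of the original answer. Days with no tracked time at all are not produced by this query.

```python
import sqlite3

# Same recursive day split as the answer, but summed per calendar day.
PER_DAY_SQL = """
WITH RECURSIVE cte AS (
    SELECT id, starttime, date(starttime, '+1 day') AS totime, stoptime
    FROM tracktime
    UNION ALL
    SELECT id, date(starttime, '+1 day'), date(totime, '+1 day'), stoptime
    FROM cte
    WHERE date(starttime, '+1 day') < stoptime
)
SELECT date(starttime) AS day,
       SUM(strftime('%s', MIN(totime, stoptime))
           - strftime('%s', starttime)) / 3600.0 AS hours
FROM cte
GROUP BY date(starttime)
ORDER BY day
"""


def hours_per_day_sqlite(rows):
    """Load (id, starttime, stoptime) rows into an in-memory DB and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracktime (id int, starttime timestamp, stoptime timestamp)"
    )
    conn.executemany("INSERT INTO tracktime VALUES (?, ?, ?)", rows)
    result = conn.execute(PER_DAY_SQL).fetchall()
    conn.close()
    return result


rows = [
    (11, "2018-08-01 12:00:00", "2018-08-03 12:00:00"),
    (12, "2018-09-05 18:00:00", "2018-09-05 19:00:00"),
    # Hypothetical extra row: a second interval on 2018-08-03.
    (13, "2018-08-03 20:00:00", "2018-08-03 22:00:00"),
]
for day, hours in hours_per_day_sqlite(rows):
    print(day, hours)
```

Dividing by 3600.0 instead of 3600 keeps fractional hours, which the integer division in the original query would truncate.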

In a table containing rows of date ranges, from each row, generate one row per day containing hours of utilization

Given a table with rows like:
+----+-------------------------+------------------------+
| ID | StartDate | EndDate |
+----+-------------------------+------------------------+
| 1 | 2016-02-05 20:00:00.000 | 2016-02-07 5:00:00.000 |
+----+-------------------------+------------------------+
I want to produce a table like this:
+----+------------+----------+
| ID | Date | Duration |
+----+------------+----------+
| 1 | 2016-02-05 | 4 |
| 1 | 2016-02-06 | 24 |
| 1 | 2016-02-07 | 5 |
+----+------------+----------+
This is an interview-style question. I am wondering how I can go about tackling this. Is it possible to do this with just standard SQL query syntax? Or is a procedural language like pl/pgSQL required to do a query like this?
The basic idea is this:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate, interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
That counts an extra hour, because generate_series() includes both endpoints, so this is more accurate:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate - interval '1 hour', interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
Technically, the lateral is not needed (and in that case, I would replace the comma with cross join). However, this is an example of a lateral join, so being explicit is good.
I should also note that the above is the simplest method. However, the group by does slow down the query. There are other methods that don't require generating a series for every hour.
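
One such alternative is to split each range at midnight boundaries and compute each day's duration arithmetically, which needs one step per day rather than one row per hour. The Python sketch below illustrates the idea only; it is not the answer's own method, and the function name `split_by_day` is an assumption.

```python
from datetime import datetime, time, timedelta


def split_by_day(start, end):
    """Split [start, end) at midnight and return (date, hours) per day."""
    pieces = []
    cursor = start
    while cursor < end:
        # Midnight at the start of the following calendar day.
        next_midnight = datetime.combine(cursor.date() + timedelta(days=1), time.min)
        segment_end = min(next_midnight, end)
        hours = (segment_end - cursor).total_seconds() / 3600
        pieces.append((cursor.date(), hours))
        cursor = segment_end
    return pieces


for day, hours in split_by_day(
    datetime(2016, 2, 5, 20, 0), datetime(2016, 2, 7, 5, 0)
):
    print(day, hours)
```

For the sample row this yields 4, 24, and 5 hours for the three days, matching the table requested in the question, and it also handles ranges that do not start or end on a full hour.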