Querying data from last 30 days and 24 hours from different tables - sql

I am currently storing data in PostgreSQL that is displayed back to the user in a chart covering the last 24 hours, 30 days, and 3 months.
To get the last 24 hours' worth of data, I just run the following query when the user requests it:
SELECT COUNT(*)
FROM page_visits
WHERE page_id = '1111'
AND created_at >= NOW() - '1 day'::INTERVAL
I run a cron job every night to aggregate data for that day and store it in a different table (table_day), which only contains aggregated data.
So when the user requests the last 30 days' worth of data, I can run a query similar to the one above against table_day to get this month's data. However, it does not include today's data, as today's data has not yet been aggregated into table_day.
So how can I run a single query that gets the last month's worth of aggregated data from table_day together with the last 24 hours of raw data from page_visits?
Or is this approach of storing data at different intervals completely wrong?
I intend to do something similar for monthly data, where a cron job runs at the end of every month to aggregate that month's data and store it in table_month.
And the same question repeats: how can I query the two previous months from table_month and this month's data from table_day in a single query?
page_visits
id  page_id  created_at
1   1111     2021-12-02T04:55:26.779Z
2   1442     2021-12-02T02:25:32.219Z
3   1111     2021-12-02T04:55:26.214Z
table_day
id  page_id  visit_count  created_at
1   1111     2001         2021-13-02T04:55:26.779Z
2   1442     103          2021-13-02T02:25:32.219Z
3   1111     4024         2021-14-02T04:55:26.214Z

If you aggregate the data into the table_day table at the end of every day, then you can use these daily aggregates to calculate the monthly data and store it in the table_month table, and then you can reuse the monthly data to calculate the quarterly data and store it in the table_quarter table.
At the end of every day:
INSERT INTO table_day
SELECT page_id, date_trunc('day', NOW()) AS day, count(*) AS count
FROM page_visits
WHERE created_at >= date_trunc('day', NOW())  -- start of the current day
GROUP BY page_id;
At the end of every calendar month:
INSERT INTO table_month
SELECT page_id, date_trunc('month', NOW()) AS month, sum(count) AS count  -- sum the daily counts, not count the rows
FROM table_day
WHERE day >= date_trunc('month', NOW())  -- days belonging to the current month
GROUP BY page_id;
At the end of every calendar quarter:
INSERT INTO table_quarter
SELECT page_id, extract('year' from NOW()) || '-' || extract('quarter' from NOW()) AS quarter, sum(count) AS count
FROM table_month
WHERE month >= date_trunc('quarter', NOW())  -- the three months of the current quarter
GROUP BY page_id;
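To combine the two tables in a single query, as the question asks, you can UNION the pre-aggregated daily rows with a live count of today's not-yet-aggregated rows. A minimal sketch, assuming the question's table_day schema with a visit_count column and a day column like the one the INSERT above writes (adjust the names to whichever schema you settle on):
-- Last 30 days for one page: 29 full days from table_day,
-- plus today's live rows from page_visits.
SELECT sum(cnt) AS visits_last_30_days
FROM (
    SELECT visit_count AS cnt
    FROM table_day
    WHERE page_id = '1111'
    AND day >= date_trunc('day', NOW()) - interval '29 days'
    UNION ALL
    SELECT count(*) AS cnt
    FROM page_visits
    WHERE page_id = '1111'
    AND created_at >= date_trunc('day', NOW())
) combined;
The same pattern extends to the 3-month chart: the two previous months from table_month, the current month's rows from table_day, and today's rows from page_visits.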

Related

Daily Rolling Count of Distinct Users on Different time periods

I am trying to find the most optimal way to run the following query, which I need to connect to Tableau and visualise. The idea is to count 7-day active users, 30-day active users and 90-day active users for each day: for today I want to know who was active within each of those timeframes, and likewise for yesterday, and so on.
To clarify, users can be active multiple times within my time frames.
A count of 7-day active users would be the distinct number of users who had a session within the period from today's date - 6 to today's date. I need to calculate this for every date within the last 6 months.
This is the query I have.
with dau as (
    select date_trunc('day', created_date) as created_at,
           count(distinct customer_id) as dau
    from sessions
    where created_date >= date_trunc('day', dateadd('month', -6, getdate()))
    group by date_trunc('day', created_date)
)
select created_at,
       dau,
       (select count(distinct customer_id)
        from sessions
        where date_trunc('day', created_date) between created_at - 6 and created_at) as wau,
       (select count(distinct customer_id)
        from sessions
        where date_trunc('day', created_date) between created_at - 29 and created_at) as mau,
       (select count(distinct customer_id)
        from sessions
        where date_trunc('day', created_date) between created_at - 89 and created_at) as three_mau
from dau
It takes 30 minutes to run, which seems crazy. Is there a better way to do it? I am also looking into the use of materialised views as a faster way to use this in a dashboard. Would this work?
The result I am looking to get would be a table where the rows are dates within the last 6 months and each column is the count of distinct users on 7, 30 and 90 periods from that date.
Thanks in advance!
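A pre-deduplication plus LATERAL approach (the same trick as the weekly answer further down) tends to be much faster than three correlated subqueries per output row. A sketch in PostgreSQL syntax; note that the question's dateadd()/getdate() suggest Redshift, which lacks LATERAL, so treat this as illustrative. It assumes a sessions(customer_id, created_date) table:
-- Reduce sessions to one row per (day, customer) first, then count
-- each rolling window from the deduplicated set.
with daily_dist as (
    select distinct date_trunc('day', created_date) as day, customer_id
    from sessions
    where created_date >= date_trunc('day', now()) - interval '9 months'
)
select d.day, ct.wau, ct.mau, ct.three_mau
from generate_series(date_trunc('day', now()) - interval '6 months',
                     date_trunc('day', now()),
                     interval '1 day') as d(day)
left join lateral (
    select count(distinct customer_id) filter (where day > d.day - interval '7 days')  as wau,
           count(distinct customer_id) filter (where day > d.day - interval '30 days') as mau,
           count(distinct customer_id)                                                 as three_mau
    from daily_dist
    where day >  d.day - interval '90 days'
    and   day <= d.day
) ct on true
order by d.day;
A materialised view over a query like this can work for a dashboard, provided you refresh it on a schedule that matches how fresh the chart needs to be.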

Getting counts for overlapping time periods

I have a table data in PostgreSQL with this structure:
created_at   customer_email   status
2020-12-31   xxx#gmail.com    opened
...
2020-12-24   yyy#gmail.com    delivered
2020-12-24   xxx#gmail.com    opened
...
2020-12-17   zzz#gmail.com    opened
2020-12-10   xxx#gmail.com    opened
2020-12-03   hhh#gmail.com    enqueued
2020-11-27   xxx#gmail.com    opened
...
2020-11-20   rrr#gmail.com    opened
2020-11-13   ttt#gmail.com    opened
There are many rows for each day.
Basically I need 2021-W01 for this week with the count of unique emails with status "opened" within the last 90 days. Likewise for every week before that.
Desired output:
period active
2021-W01 1539
2020-W53 1480
2020-W52 1630
2020-W51 1820
2020-W50 1910
2020-W49 1890
2020-W48 2000
How can I do that?
Window functions would come to mind. Alas, those don't allow DISTINCT aggregations.
Instead, get distinct counts from a LATERAL subquery:
WITH weekly_dist AS (
   SELECT DISTINCT date_trunc('week', created_at) AS wk, customer_email
   FROM   tbl
   WHERE  status = 'opened'
   )
SELECT to_char(t.wk, 'YYYY"-W"IW') AS period, ct.active
FROM  (
   SELECT generate_series(date_trunc('week', min(created_at) + interval '1 week')
                        , date_trunc('week', now()::timestamp)
                        , interval '1 week') AS wk
   FROM   tbl
   ) t
LEFT  JOIN LATERAL (
   SELECT count(DISTINCT customer_email) AS active
   FROM   weekly_dist d
   WHERE  d.wk >= t.wk - interval '91 days'
   AND    d.wk <  t.wk
   ) ct ON true;
db<>fiddle here
I operate with timestamp, not timestamptz, which might make a corner-case difference.
The CTE weekly_dist reduces the set to distinct "opened" emails. This step is strictly optional, but increases performance significantly if there can be more than a few duplicates per week.
The derived table t generates a timestamp for the beginning of each week, from the earliest entry in the table up to "now". This way I make sure no week is skipped, even if there are no rows for it. See:
PostgreSQL: running count of rows for a query 'by minute'
Generating time series between two dates in PostgreSQL
But I do skip the first week since I count active emails before each start of the week.
Then LEFT JOIN LATERAL to a subquery computing the distinct count for the 90-day time-range. To be precise, I deduct 91 days, and exclude the start of the current week. This happens to fall in line with the weekly pre-aggregated data from the CTE. Be wary of that if you shift bounds.
Finally, to_char(t.wk, 'YYYY"-W"IW') is a compact expression to get your desired format for week numbers. Details in the manual here.
You can combine the date_part() function with a group by like this:
SELECT DATE_PART('year', created_at)::varchar || '-W' || DATE_PART('week', created_at)::varchar AS period,
       SUM(CASE WHEN status = 'opened' THEN 1 ELSE 0 END) AS active
FROM your_table
GROUP BY 1
ORDER BY MIN(created_at) DESC

Extract records between days

I have an Audit table that is updated each and every day. All add/modify/delete records are stored. When any record is deleted it doesn't show up the next day. Something like below.
Date records
---- --------
15th 100
16th 102 - Pickup all records, between 15 and 16, which are not in 16th
17th 110 - Pickup all records, between 16 and 17, which are not in 17th
18th 150 - Pickup all records, between 17 and 18, which are not in 18th
.. So on..
This is an Audit table which has the deleted records present on the previous day but not today. I need to pick up all the deleted records between dates.
But I don't want to hard-code the dates; instead, it should work from a given date up to today().
How do I achieve this in a single SQL query? I tried using UNION and it works, but only with hardcoded dates. Is there any way to achieve this as a generic query which works as of today?
You can use two levels of aggregation. The first gets the maximum date for each id; the second counts those ids as deleted on the next day:
select max_date + interval 1 day, count(*)
from (select a.id, max(date) as max_date
      from audit a
      group by a.id
     ) t
group by max_date
order by max_date;
You might want a where clause to limit the maximum date to before the maximum date in the data (otherwise everything will look like it is being deleted on the following day).
An alternative method uses lead():
select date + 1, count(*)
from (select a.*,
             lead(date) over (partition by id order by date) as next_date
      from audit a
     ) t
where next_date <> date_add(date, INTERVAL 1 DAY) or next_date is null
group by date
order by date;
If records can be resurrected and you still want to count them as deleted when they disappear, this is the better method.
Here is a db<>fiddle.

How to perform a query in PostgreSQL that returns a count of rows created, grouped by month?

In PostgreSQL, how do I perform a query that returns the count of rows created in a particular table, grouped by month? I would like the result to be something like:
month: January
count: 67
month: February
count: 85
....
....
Let's suppose I have a table, users. This table has a primary key, id, and a created_at column with the time stored in ISO 8601 format. Last year some number of users were created, and now I want to know how many were created in each month. I want the data returned to me in the above format: grouped by month, with an associated count reflecting how many users were created that month.
Does anyone know how to perform the above SQL query in postgresql?
The query would look something like this:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= '2016-01-01' and
created_at < '2017-01-01'
group by date_trunc('month', created_at);
I don't know where the constant '2017-03-20 13:38:46.688-04' is coming from.
Of course you can make the year comparison dynamic:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= date_trunc('year', now()) - interval '1 year' and
created_at < date_trunc('year', now())
group by date_trunc('month', created_at);
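If you want literal month names, as in the desired output ("January", "February", ...), to_char() can format the truncated date. A small variation on the query above, with the same assumed users table and subscribed flag (the FM prefix suppresses padding blanks):
select to_char(date_trunc('month', created_at), 'FMMonth') as month,
       count(*) as count
from users u
where subscribed = true and
      created_at >= date_trunc('year', now()) - interval '1 year' and
      created_at < date_trunc('year', now())
group by date_trunc('month', created_at)
order by date_trunc('month', created_at);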

How to select the count of total active listings week by week?

I want to get the count from a table of active apartment listings week by week. The table looks like this (except much longer):
id created_at delisted_at
2318867 2014-11-12 18:57:44 Null
2329665 2014-11-14 4:36:32 Null
1431098 2014-07-25 5:45:03 Null
1930123 2014-09-28 10:10:46 2014-09-28 10:10:45
2490774 2014-12-05 0:08:47 Null
To get the active listings for a single week, you have to check that created_at <= end_of_week and delisted_at > end_of_week.
The results table would look like a longer version of:
Week Number of Active Listings
5/1/2016 3024
5/8/2016 11234
5/15/2016 11234
I would also like to produce another results table month by month as opposed to week by week.
How do I write a query to achieve this behavior?
Here is an example for months. First generate a list of months using generate_series(); the rest is just joins and aggregation:
select g.mon_start, g.mon_end, count(a.id) as numActives
from (select g.mon_start, g.mon_start + interval '1 month' as mon_end
      from generate_series('2016-01-01'::timestamp, '2016-06-01'::timestamp, interval '1 month') g(mon_start)
     ) g left join
     actives a
     on a.created_at < g.mon_end and
        (a.delisted_at >= g.mon_end or a.delisted_at is null)
group by g.mon_start, g.mon_end
order by g.mon_start, g.mon_end;
A similar idea works for weeks as well.
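For completeness, a sketch of the weekly variant under the same assumptions (an actives(id, created_at, delisted_at) table; the date range is only illustrative):
select g.wk_start, g.wk_end, count(a.id) as numActives
from (select g.wk_start, g.wk_start + interval '1 week' as wk_end
      from generate_series('2016-05-01'::timestamp, '2016-06-12'::timestamp, interval '1 week') g(wk_start)
     ) g left join
     actives a
     on a.created_at < g.wk_end and
        (a.delisted_at >= g.wk_end or a.delisted_at is null)
group by g.wk_start, g.wk_end
order by g.wk_start, g.wk_end;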