Postgres: Count over a series of days


I have created the following query, which returns 3 values for 1 day ('20170731'). What I am struggling to figure out is how to run this query for every day in a series, from 30 days ago to 60 days from now, and return a row for each day.
SELECT DATE_TRUNC('day', '20170731'::TIMESTAMP),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions
WHERE DATE_TRUNC('day', run) >= '20170731'
AND DATE_TRUNC('day', created_at) <= '20170731'
ORDER BY 1

You can use generate_series() to generate the dates. The idea is:
SELECT gs.dte,
SUM( (state NOT IN ('unsub','skipped', 'error'))::int) AS a,
SUM( (state IN ('unsub') AND DATE_TRUNC('month', unsub_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS b,
SUM( (state IN ('skipped') AND DATE_TRUNC('month', skipped_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS c
FROM subscriptions s CROSS JOIN
generate_series(current_date - interval '30 day',
current_date + interval '60 day',
interval '1 day'
) gs(dte)
WHERE DATE_TRUNC('day', run) >= gs.dte AND
DATE_TRUNC('day', created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY 1;
I switched the query to cast the booleans as integers -- I just find that easier to follow.

See Set Returning Functions. The generate_series function is what you want.
First check this, so you know what it does:
SELECT
*
FROM
generate_series(
'2017-07-31'::TIMESTAMP - INTERVAL '30 days',
'2017-07-31'::TIMESTAMP + INTERVAL '60 days',
INTERVAL '1 day');
Then your query could look something like this:
SELECT DATE_TRUNC('day', stamp),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions,
generate_series('2017-07-31'::TIMESTAMP - INTERVAL '30 days', '2017-07-31'::TIMESTAMP + INTERVAL '60 days', INTERVAL '1 day') AS stamp
WHERE DATE_TRUNC('day', run) >= stamp
AND DATE_TRUNC('day', created_at) <= stamp
GROUP BY 1
ORDER BY 1
Add the generate_series function to the FROM clause as you would a plain table (aliased AS stamp), which joins it with subscriptions as a Cartesian product. Then use the stamp value instead of the hard-coded '20170731', and group by the day so that each date gets its own row.
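One caveat with both queries above: because the date conditions sit in the WHERE clause, a day on which no subscription matches produces no row at all. If a row is wanted for every day, a LEFT JOIN from the generated series is one option. The following is a minimal sketch, not the original poster's schema: the inline sample rows and the id column are assumptions made only so the statement runs on its own, and only the first counter is shown.

```sql
-- Hypothetical sample rows standing in for the real subscriptions table.
WITH subscriptions (id, state, created_at, run) AS (
    VALUES
        (1, 'active',  TIMESTAMP '2017-07-01 10:00', TIMESTAMP '2017-08-02 09:00'),
        (2, 'active',  TIMESTAMP '2017-07-30 12:00', TIMESTAMP '2017-08-01 09:00'),
        (3, 'skipped', TIMESTAMP '2017-07-15 08:00', TIMESTAMP '2017-08-03 09:00')
)
SELECT gs.dte::date AS day,
       -- COUNT(s.id) ignores the all-NULL row that LEFT JOIN emits for an unmatched day
       COUNT(s.id) FILTER (WHERE s.state NOT IN ('unsub', 'skipped', 'error')) AS a
FROM generate_series(TIMESTAMP '2017-07-29',
                     TIMESTAMP '2017-08-04',
                     INTERVAL '1 day') AS gs(dte)
LEFT JOIN subscriptions s
       ON DATE_TRUNC('day', s.run) >= gs.dte
      AND DATE_TRUNC('day', s.created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY gs.dte;
```

With this sample data, 2017-08-04 matches no subscription but still appears in the result with a count of 0, because the join conditions live in ON rather than WHERE.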

Related

Get columns of data with two different date range

I would like to get the average rating for last 7 days and last 14 days.
I tried using WITH AS to get the data but it's taking way too long to load. Any other way that is better and could reduce the run time?
syntax:
WITH last_7_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-7 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
),
last_14_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-14 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
)
SELECT last_7_days.item, avg(last_7_days.rating) as "avg_last_7_days", avg(last_14_days.rating) as "avg_last_14_days", count(*) AS "count"
FROM last_7_days, last_14_days
WHERE last_7_days.item = last_14_days.item
GROUP BY last_7_days.item
ORDER BY "avg_last_7_days" DESC, last_7_days.item ASC
Result should be something like this:
item|avg_last_7_days|avg_last_14_days|count|
thank you

Use conditional aggregation:
SELECT item,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_seven_days,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_fourteen_days
FROM sales
WHERE rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY item;
Note: If you only care about the date, then perhaps you should use CURRENT_DATE or even NOW()::date.

Getting rid of all the casts and aggregating directly on the CTEs should help, try with the following:
WITH last_7_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_seven_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
),
last_14_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_fourteen_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
)
SELECT
lsd.item,
avg_rating_last_seven_days,
avg_rating_last_fourteen_days
FROM
last_7_days AS lsd
INNER JOIN
last_14_days AS lfd ON lsd.item = lfd.item
Let me know in case it helped on improving your current performance!

get List of counts from table based on dates in sql

I need to fetch a list of counts from a table, by department. Here is my table structure:
empid empname department departmentId joinedon
I want to count the employees who joined today, yesterday, and more than 2 days ago, as a list like [12,25,89], i.e.
12* joined today
25 joined yesterday
81 joined all prior to yesterday(2+day)
* 0 if there isn't any entries for given date range.

You would use aggregation on a case expression:
select (case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'older'
end) as grp,
count(*)
from t
group by grp;

In addition to @Gordon Linoff's answer:
SELECT
days.day,
coalesce(t.cnt, 0) count
FROM (
SELECT * FROM (VALUES ('today'), ('yesterday'), ('older')) AS days (day)
)days
LEFT JOIN (
SELECT (CASE WHEN joinedon::date = current_date THEN 'today'
WHEN joinedon::date = current_date - interval '1 day' THEN 'yesterday'
WHEN joinedon::date < current_date - interval '1 day' THEN 'older'
end) as day,
count(*) cnt
FROM t
GROUP BY day
) t on t.day = days.day;

You can use the group by as follows:
select department,
(case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'More than 2 days'
end) as grp,
Coalesce(count(*),0)
from t
group by grp, department;
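If the three counts are wanted side by side in a single row (closer to the [12,25,89] shape in the question), conditional aggregation with FILTER is another option, and it yields 0 for an empty bucket without needing COALESCE. A minimal sketch, assuming PostgreSQL 9.4 or later; the inline rows are made-up sample data in place of the real table t:

```sql
-- Hypothetical sample rows: two employees joined today, one joined five days ago.
WITH t (empid, joinedon) AS (
    VALUES
        (1, CURRENT_DATE),
        (2, CURRENT_DATE),
        (3, CURRENT_DATE - 5)
)
SELECT COUNT(*) FILTER (WHERE joinedon::date = CURRENT_DATE)     AS today,
       COUNT(*) FILTER (WHERE joinedon::date = CURRENT_DATE - 1) AS yesterday,
       COUNT(*) FILTER (WHERE joinedon::date < CURRENT_DATE - 1) AS older
FROM t;
-- today = 2, yesterday = 0, older = 1
```

Adding department to the select list and a GROUP BY department would give one such row per department.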

How to save the results of a select expression as a variable?

In the following Postgresql sql, is there a way to save mo.delivered_at - mo.created_at as a variable so I don't have to repeat myself?
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '3 days' and mo.delivered_at - mo.created_at <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '6 days' and mo.delivered_at - mo.created_at <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '9 days') as ten_days,
min(mo.delivered_at - mo.created_at),
max(mo.delivered_at - mo.created_at),
percentile_disc(0.5) within group (order by mo.delivered_at - mo.created_at) as median,
avg(mo.delivered_at - mo.created_at) as average
FROM medication_order mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name

You can compute the derived value in a subquery or CTE as has been suggested.
But there is more. This should be faster (and correct). And can be sorted properly, too:
SELECT
to_char(mo.month,'MM-YYYY') AS month, -- optionally prettify
mo.sku,
s.condition_id, -- I added this to make the result unambiguous
(SELECT name FROM condition WHERE id = s.condition_id) AS condition_name,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE mo.my_interval < interval '3 days') AS three_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '3 days' AND mo.my_interval <= interval '6 days') AS six_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '6 days' AND mo.my_interval <= interval '9 days') AS nine_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '9 days') AS ten_days,
min(mo.my_interval),
max(mo.my_interval),
percentile_disc(0.5) WITHIN GROUP (ORDER BY mo.my_interval) AS median,
avg(mo.my_interval) AS average
FROM (
SELECT
date_trunc('month', mo.created_at) AS month, -- faster, keeps ORDER
delivered_at - created_at AS my_interval, -- your core request
sku_key AS sku,
subscription_id -- needed for the join below
FROM medication_order mo
WHERE status = 'DELIVERED' -- filter early
AND payment_preference = 'INSURANCE'
AND created_at > '2020-01-01'
AND delivered_at IS NOT NULL
AND sku_key <> 'manual_order_sku'
) mo
LEFT JOIN subscription s ON s.id = mo.subscription_id
GROUP BY mo.month, mo.sku, s.condition_id -- GROUP BY unique ID! Correct - and cheaper, too
ORDER BY mo.month, mo.sku, s.condition_id; -- my addition: sorting by date works across years, 'MM-YYYY' does not
Aside: condition.name should probably be UNIQUE. And "name" is almost never a good name.

You could just compute the information in a subquery when selecting from the table:
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivery_interval < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '3 days' and mo.delivery_interval <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '6 days' and mo.delivery_interval <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '9 days') as ten_days,
min(mo.delivery_interval),
max(mo.delivery_interval),
percentile_disc(0.5) within group (order by mo.delivery_interval) as median,
avg(mo.delivery_interval) as average
FROM (
SELECT mo.*, mo.delivered_at - mo.created_at AS delivery_interval --> here
FROM medication_order mo
) mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name
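Another way to name the expression once, without wrapping the whole table in a subquery, is a LATERAL subquery that computes it per row. This is a minimal sketch rather than a drop-in replacement: the inline rows are hypothetical, only three of the original columns are kept, and the output column names are made up for illustration.

```sql
-- Hypothetical sample rows standing in for medication_order.
WITH medication_order (id, created_at, delivered_at) AS (
    VALUES
        (1, TIMESTAMP '2020-02-01 08:00', TIMESTAMP '2020-02-03 08:00'),
        (2, TIMESTAMP '2020-02-01 08:00', TIMESTAMP '2020-02-06 08:00'),
        (3, TIMESTAMP '2020-02-01 08:00', TIMESTAMP '2020-02-12 08:00')
)
SELECT COUNT(*) AS total,
       COUNT(*) FILTER (WHERE d.delivery_interval <= INTERVAL '3 days') AS within_three_days,
       MAX(d.delivery_interval) AS slowest
FROM medication_order mo
-- The derived value is computed once per row and reused under the name d.delivery_interval.
CROSS JOIN LATERAL (SELECT mo.delivered_at - mo.created_at AS delivery_interval) AS d;
-- total = 3, within_three_days = 1, slowest = 11 days
```

The same d.delivery_interval name can then be used in every FILTER clause, in min/max/avg, and in percentile_disc.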

How to group by only one column?

I would like to select one column (Failed_operation) together with the counts, keeping SN distinct but hidden from the output, as in the code below. However, I get this error:
ERROR: column "rw_pcba.sn" must appear in the GROUP BY clause or be used in an aggregate function
I tried removing DISTINCT ON (sn). The query then returned results, but they included duplicate SNs, which I don't want.
SELECT DISTINCT ON (sn) Failed_operation
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 0
AND (extract(day FROM NOW() - fail_timestamp)) <= 15 THEN 1 ELSE NULL END) AS AgingLessThan15
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 15
AND (extract(day FROM NOW() - fail_timestamp)) <= 30 THEN 1 ELSE NULL END) AS Aging16To30
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 30
AND (extract(day FROM NOW() - fail_timestamp)) <= 60 THEN 1 ELSE NULL END) AS Aging31To60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 60 THEN 1 ELSE NULL END) AS AgingGreaterThan60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) <= 0 THEN 1 ELSE NULL END) AS Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY Failed_operation
ORDER BY sn
,Failed_operation ASC

You need to group by the sn column as well. With GROUP BY, the result already contains one row per distinct combination of sn and failed_operation, so you don't have to specify DISTINCT.
SELECT sn, Failed_operation,
count (case when (extract(day from NOW() - fail_timestamp)) >0 and (extract(day from NOW() - fail_timestamp))<=15 then 1 else null end) as AgingLessThan15,
count (case when (extract(day from NOW() - fail_timestamp)) >15 and (extract(day from NOW() - fail_timestamp))<=30 then 1 else null end) as Aging16To30,
count (case when (extract(day from NOW() - fail_timestamp)) >30 and (extract(day from NOW() - fail_timestamp))<=60 then 1 else null end) as Aging31To60,
count (case when (extract(day from NOW() - fail_timestamp)) >60 then 1 else null end) as AgingGreaterThan60,
count (case when (extract(day from NOW() - fail_timestamp)) <=0 then 1 else null end) as Aging0
FROM rw_pcba where rework_status='In-Process'
GROUP by sn,Failed_operation ORDER BY sn,Failed_operation ASC

You want to aggregate by sn as well as failed_operation. I also think you can simplify the calculation of each column:
SELECT sn, Failed_operation,
count(*) filter (where fail_timestamp < current_date and fail_timestamp >= current_date - interval '15 day') as AgingLessThan15,
count(*) filter (where fail_timestamp < current_date - interval '15 day' and fail_timestamp >= current_date - interval '30 day') as Aging16To30,
count(*) filter (where fail_timestamp < current_date - interval '30 day' and fail_timestamp >= current_date - interval '60 day') as Aging31To60,
count(*) filter (where fail_timestamp < current_date - interval '60 day') as AgingGreaterThan60,
count(*) filter (where fail_timestamp >= current_date) as Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY sn, Failed_operation
ORDER BY sn, Failed_operation ASC;
I prefer direct date comparisons for this type of logic rather than working with the difference between the dates. I simply find it easier to follow. For instance, using current_date rather than now() removes the question of what happens to the time component of now().
EDIT:
In older versions of Postgres, you can phrase this using sum:
sum( (fail_timestamp < current_date and fail_timestamp >= current_date - interval '15 day')::int ) as AgingLessThan15,

SQL: Select average value of column for last hour and last day

I have a table like below image. What I need is to get average value of Volume column, grouped by User both for 1 hour and 24 hours ago. How can I use avg with two different date range in single query?

You can do it like:
SELECT user, AVG(Volume)
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user
A few things to keep in mind: this assumes the query runs on the same server, in the same time zone, as the stored timestamps. You need to group by user to collect each user's Volume values, and then apply an aggregate function such as AVG to them. If you need both averages together, you can join two such subqueries:
SELECT u1.user, u1.average, u2.average
FROM
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user) AS u1
INNER JOIN
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 day'
AND created <= NOW()
GROUP BY user) AS u2
ON u1.user = u2.user

Use conditional aggregation. Postgres offers very convenient syntax using the FILTER clause:
SELECT user,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 hour' AND created <= NOW()) as avg_1hour,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 day' AND created <= NOW()) as avg_1day
FROM mytable
WHERE created >= NOW() - interval '1 DAY' AND
created <= NOW()
GROUP BY user;
This will filter out users who have had no activity in the past day. If you want all users -- even those with no recent activity -- remove the WHERE clause.
The more traditional method uses CASE:
SELECT user,
AVG(CASE WHEN created >= NOW() - interval '1 hour' AND created <= NOW() THEN Volume END) as avg_1hour,
AVG(CASE WHEN created >= NOW() - interval '1 day' AND created <= NOW() THEN Volume END) as avg_1day
. . .
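To see the effect of dropping the WHERE clause, here is a minimal self-contained sketch of the FILTER form. The inline rows are hypothetical sample data in place of the real mytable. Note that the column is written as "user" in double quotes: in PostgreSQL an unquoted user is a reserved word that returns the current session user rather than the column, so quoting it (or renaming the column) is assumed here.

```sql
-- Hypothetical sample rows: alice is active today, bob was last seen three days ago.
WITH mytable ("user", volume, created) AS (
    VALUES
        ('alice', 10, NOW() - INTERVAL '30 minutes'),
        ('alice', 30, NOW() - INTERVAL '5 hours'),
        ('bob',   50, NOW() - INTERVAL '3 days')
)
SELECT "user",
       AVG(volume) FILTER (WHERE created >= NOW() - INTERVAL '1 hour') AS avg_1hour,
       AVG(volume) FILTER (WHERE created >= NOW() - INTERVAL '1 day')  AS avg_1day
FROM mytable
GROUP BY "user"
ORDER BY "user";
-- alice: avg_1hour = 10, avg_1day = 20; bob: both averages are NULL
```

Because there is no WHERE clause, bob still gets a row, and AVG over an empty filtered set comes back as NULL.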

SELECT User, AVG(Volume),
CASE WHEN created < DATE_SUB(NOW(), INTERVAL 1 HOUR) THEN 1 ELSE 0 END AS IntervalType
FROM mytable
WHERE created >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY User, IntervalType
Please let me know how it works for you :)
