In the following PostgreSQL query, is there a way to save mo.delivered_at - mo.created_at as a variable so I don't have to repeat myself?
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '3 days' and mo.delivered_at - mo.created_at <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '6 days' and mo.delivered_at - mo.created_at <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '9 days') as ten_days,
min(mo.delivered_at - mo.created_at),
max(mo.delivered_at - mo.created_at),
percentile_disc(0.5) within group (order by mo.delivered_at - mo.created_at) as median,
avg(mo.delivered_at - mo.created_at) as average
FROM medication_order mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name
You can compute the derived value in a subquery or CTE as has been suggested.
But there is more. This should be faster (and correct), and it sorts properly, too:
SELECT
to_char(mo.month,'MM-YYYY') AS month, -- optionally prettify
mo.sku,
s.condition_id, -- I added this to make the result unambiguous
(SELECT name FROM condition WHERE id = s.condition_id) AS condition_name,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE mo.my_interval <= interval '3 days') AS three_days, -- <= so that exactly 3 days is counted somewhere
COUNT(*) FILTER (WHERE mo.my_interval > interval '3 days' AND mo.my_interval <= interval '6 days') AS six_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '6 days' AND mo.my_interval <= interval '9 days') AS nine_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '9 days') AS ten_days,
min(mo.my_interval),
max(mo.my_interval),
percentile_disc(0.5) WITHIN GROUP (ORDER BY mo.my_interval) AS median,
avg(mo.my_interval) AS average
FROM (
SELECT
date_trunc('month', mo.created_at) AS month, -- faster, keeps ORDER
delivered_at - created_at AS my_interval, -- your core request
sku_key AS sku,
subscription_id -- needed for the join to subscription below
FROM medication_order mo
WHERE status = 'DELIVERED' -- filter early
AND payment_preference = 'INSURANCE'
AND created_at > '2020-01-01'
AND delivered_at IS NOT NULL
AND sku_key <> 'manual_order_sku'
) mo
LEFT JOIN subscription s ON s.id = mo.subscription_id
GROUP BY mo.month, mo.sku, s.condition_id -- GROUP BY unique ID! Correct - and cheaper, too
ORDER BY mo.month, mo.sku, s.condition_id; -- my addition: sorting by date works across years, 'MM-YYYY' does not
Aside: condition.name should probably be UNIQUE. And "name" is almost never a good name.
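Another way to name the expression once is a LATERAL subquery that computes it per row, so every aggregate can refer to it by name without an extra level of nesting. A minimal sketch follows: the medication_order CTE is hypothetical sample data standing in for the real table (only the columns needed here), and the first bucket uses <= so that an interval of exactly 3 days is not dropped between buckets.

```sql
-- Hypothetical sample rows standing in for the real medication_order table
WITH medication_order (sku_key, created_at, delivered_at) AS (
   VALUES ('sku_a', timestamp '2020-03-01 10:00', timestamp '2020-03-02 10:00')  -- 1 day
        , ('sku_a', timestamp '2020-03-05 10:00', timestamp '2020-03-08 10:00')  -- exactly 3 days
        , ('sku_a', timestamp '2020-03-10 10:00', timestamp '2020-03-20 10:00')  -- 10 days
)
SELECT date_trunc('month', mo.created_at) AS month
     , mo.sku_key AS sku
     , count(*) AS total
     , count(*) FILTER (WHERE d.delivery_interval <= interval '3 days') AS three_days
     , count(*) FILTER (WHERE d.delivery_interval >  interval '3 days'
                          AND d.delivery_interval <= interval '6 days') AS six_days
     , count(*) FILTER (WHERE d.delivery_interval >  interval '6 days'
                          AND d.delivery_interval <= interval '9 days') AS nine_days
     , count(*) FILTER (WHERE d.delivery_interval >  interval '9 days') AS ten_days
     , percentile_disc(0.5) WITHIN GROUP (ORDER BY d.delivery_interval) AS median
     , avg(d.delivery_interval) AS average
FROM   medication_order mo
CROSS  JOIN LATERAL (SELECT mo.delivered_at - mo.created_at AS delivery_interval) d  -- computed and named once
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

The LATERAL subquery is evaluated per row of mo and adds no rows of its own, so grouping and counting behave exactly as if the expression had been written out in every aggregate.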
You could just compute the information in a subquery when selecting from the table:
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivery_interval <= interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '3 days' and mo.delivery_interval <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '6 days' and mo.delivery_interval <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '9 days') as ten_days,
min(mo.delivery_interval),
max(mo.delivery_interval),
percentile_disc(0.5) within group (order by mo.delivery_interval) as median,
avg(mo.delivery_interval) as average
FROM (
SELECT mo.*, mo.delivered_at - mo.created_at AS delivery_interval --> here
FROM medication_order mo
) mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name
Related
I would like to get the average rating for last 7 days and last 14 days.
I tried using WITH AS to get the data but it's taking way too long to load. Any other way that is better and could reduce the run time?
Query:
WITH last_7_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-7 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
),
last_14_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-14 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
)
SELECT last_7_days.item, avg(last_7_days.rating) as "avg_last_7_days", avg(last_14_days.rating) as "avg_last_14_days", count(*) AS "count"
FROM last_7_days, last_14_days
WHERE last_7_days.item = last_14_days.item
GROUP BY last_7_days.item
ORDER BY "avg_last_7_days" DESC, last_7_days.item ASC
Result should be something like this:
item|avg_last_7_days|avg_last_14_days|count|
thank you
Use conditional aggregation:
SELECT item,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_seven_days,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_fourteen_days
FROM sales
WHERE rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY item;
Note: If you only care about the date, then perhaps you should use CURRENT_DATE or even NOW()::date.
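Building on that note, here is a date-based sketch that also returns the count column the question asks for. It assumes entry_date is a date column, reuses the question's window boundaries (from 7 or 14 days before today through today), and uses hypothetical sample rows in place of the real sales table. Both averages come from a single scan, so there is no join between two CTEs that would multiply rows per item.

```sql
-- Hypothetical sample rows standing in for sales (entry_date assumed to be a date)
WITH sales (item, rating, entry_date) AS (
   VALUES ('apple', 4,         current_date)       -- today
        , ('apple', 2,         current_date - 10)  -- only inside the 14-day window
        , ('pear',  5,         current_date - 3)
        , ('pear',  NULL::int, current_date)       -- ignored: no rating
)
SELECT item
     , avg(rating) FILTER (WHERE entry_date >= current_date - 7) AS avg_last_7_days
     , avg(rating) AS avg_last_14_days               -- the WHERE clause already limits rows to 14 days
     , count(*) AS count                             -- rated rows in the 14-day window
FROM   sales
WHERE  rating IS NOT NULL
AND    entry_date >= current_date - 14
AND    entry_date <  current_date + 1
GROUP  BY item
ORDER  BY avg_last_7_days DESC NULLS LAST, item;
```

An item with ratings only 8 to 14 days old still appears, with NULL for avg_last_7_days; NULLS LAST keeps such items at the bottom.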
Getting rid of all the casts and aggregating directly in the CTEs should help. Try the following:
WITH last_7_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_seven_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
),
last_14_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_fourteen_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
)
SELECT
lsd.item,
avg_rating_last_seven_days,
avg_rating_last_fourteen_days
FROM
last_7_days AS lsd
INNER JOIN
last_14_days AS lfd ON lsd.item = lfd.item
Let me know whether this improves performance!
I have a report (using Blazer, if you care) that displays data like this, of recently updated or created rows in the jobs table:
5 Minutes | 1 Hour | 1 Day | Total
----------------------------------
0 0 367 30,989
The SQL looks something like this:
SELECT
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
;
I want to add a second row, for jobs WHERE "Jobs"."active" IS TRUE. How do I make this display another row?
I want the final result to be something like this:
Status | 5 Minutes | 1 Hour | 1 Day | Total
-------------------------------------------
* 0 0 367 30,989
Active 0 0 123 24,972
The labels are not the issue. The only thing that's not obvious is how to create a new row.
The simplest way is to UNION ALL a second set of subqueries that use a more restrictive WHERE clause:
SELECT
'*' as Kind,
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
UNION ALL
SELECT
'Active',
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE
) as "Total";
If I were you, I would prefer this way to resolve your query:
select
"Jobs"."active" as Status,
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW() then 1 else 0 end) as "5 Minutes",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW() then 1 else 0 end) as "1 Hour",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW() then 1 else 0 end) as "1 Day",
count(*) as "Total"
from public.jobs AS "Jobs"
group by "Jobs"."active"
This way you read the table public.jobs once instead of several times (once per count), and grouping by status is a simple GROUP BY. Note that this returns one row per distinct value of active (e.g. true and false), not an overall row plus an active row.
Basically, you want conditional aggregation. In Postgres, that would normally use filter:
SELECT COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '5 minute' AND NOW()) as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 hour' AND NOW()) as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 day' AND NOW()) as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
You probably don't have future update dates, so this would more simply be written as:
SELECT COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '5 minute') as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 hour') as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 day') as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
In addition, I would advise you to drop the double quotes from updated_at. Using double quotes around identifiers is just a bad habit.
The only thing that's not obvious is how to create a new row.
Basically, add a second row with UNION ALL.
First get rid of all the separate SELECT queries for each metric, though. That's needlessly expensive (important if the table is not trivially small). A single SELECT with conditional aggregates can replace all of your original subqueries (like Gordon suggested). In Postgres 9.4 or later, the aggregate FILTER clause is the way to go. See:
Aggregate columns with additional (distinct) filters
To get another row, you could just run another query adding the filter "active" IS TRUE to each expression (which boils down to just active, as a boolean column needs no further evaluation).
But that would double the cost again, and we can avoid that. Run a single SELECT in a CTE, and then split the results with UNION ALL in the outer query:
WITH cte AS (
SELECT count(*) FILTER (WHERE updated_at > now() - interval '5 min') AS ct_5min
, count(*) FILTER (WHERE updated_at > now() - interval '5 min' AND active) AS ct_5min_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour') AS ct_1h
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour' AND active) AS ct_1h_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 day') AS ct_1d
, count(*) FILTER (WHERE updated_at > now() - interval '1 day' AND active) AS ct_1d_a
, count(*) AS ct_all
, count(*) FILTER (WHERE active) AS ct_all_a
FROM public.jobs
)
SELECT '*' AS status, ct_5min, ct_1h, ct_1d, ct_all
FROM cte
UNION ALL
SELECT 'Active', ct_5min_a, ct_1h_a, ct_1d_a, ct_all_a
FROM cte
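In Postgres 9.5 or later, GROUPING SETS is another way to get both rows from a single scan without the CTE: the empty set () produces the '*' total row and (active) produces one row per value of active. A sketch with hypothetical sample rows in place of public.jobs; the HAVING clause keeps only the total row and the active = true row, to match the requested output.

```sql
-- Hypothetical sample rows standing in for public.jobs
WITH jobs (updated_at, active) AS (
   VALUES (now() - interval '2 min',  true)
        , (now() - interval '30 min', false)
        , (now() - interval '5 hour', true)
        , (now() - interval '3 day',  true)
)
SELECT CASE WHEN grouping(active) = 1 THEN '*' ELSE 'Active' END AS status
     , count(*) FILTER (WHERE updated_at > now() - interval '5 min')  AS "5 Minutes"
     , count(*) FILTER (WHERE updated_at > now() - interval '1 hour') AS "1 Hour"
     , count(*) FILTER (WHERE updated_at > now() - interval '1 day')  AS "1 Day"
     , count(*) AS "Total"
FROM   jobs
GROUP  BY GROUPING SETS ((), (active))   -- () = all rows, (active) = one row per value
HAVING grouping(active) = 1 OR active    -- keep the total row and the active = true row
ORDER  BY grouping(active) DESC;         -- total row first
```

Dropping the HAVING clause would also return an inactive row, and a row for NULL if the column allows it.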
I have to fetch a list of counts by department from a table. Here is my table structure:
empid empname department departmentId joinedon
I want to get the number of employees who joined today, yesterday, and more than 2 days ago, like [12,25,89], i.e.:
12* joined today
25 joined yesterday
81 joined any day before yesterday (2+ days ago)
* 0 if there aren't any entries for the given date range.
You would use aggregation on a case expression:
select (case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'older'
end) as grp,
count(*)
from t
group by grp;
In addition to Gordon Linoff's answer, this version also returns 0 for buckets that have no rows:
SELECT
days.day,
coalesce(t.cnt, 0) count
FROM (
SELECT * FROM (VALUES ('today'), ('yesterday'), ('older')) AS days (day)
)days
LEFT JOIN (
SELECT (CASE WHEN joinedon::date = current_date THEN 'today'
WHEN joinedon::date = current_date - interval '1 day' THEN 'yesterday'
WHEN joinedon::date < current_date - interval '1 day' THEN 'older'
end) as day,
count(*) cnt
FROM t
GROUP BY day
) t on t.day = days.day;
You can also group by department. Note that count(*) never returns NULL, so wrapping it in coalesce() has no effect; buckets without any rows simply do not appear in the result:
select department,
(case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'More than 2 days'
end) as grp,
count(*)
from t
group by grp, department;
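To get the counts per department in the array form the question shows, with zeros for empty buckets, one option is conditional aggregation: count(*) FILTER (...) returns 0 for a bucket with no matching rows instead of omitting it. A sketch with hypothetical sample rows in place of the asker's table t; a department with no rows at all still won't appear unless the query is driven from a separate list of departments.

```sql
-- Hypothetical sample rows standing in for the asker's table t
WITH t (empid, department, joinedon) AS (
   VALUES (1, 'Sales', now())
        , (2, 'Sales', now() - interval '1 day')
        , (3, 'Sales', now() - interval '5 day')
        , (4, 'IT',    now() - interval '3 day')
)
SELECT department
     , ARRAY[
          count(*) FILTER (WHERE joinedon::date = current_date)      -- today
        , count(*) FILTER (WHERE joinedon::date = current_date - 1)  -- yesterday
        , count(*) FILTER (WHERE joinedon::date < current_date - 1)  -- 2+ days ago
       ] AS joined_counts
FROM   t
GROUP  BY department
ORDER  BY department;
```

With this sample data, IT gets {0,0,1} and Sales gets {1,1,1}.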
I have created the following query, which returns 3 values for 1 day ('20170731'). What I am struggling to figure out is how to run this query for every day from 30 days ago to 60 days from now and return a row for each day.
SELECT DATE_TRUNC('day', '20170731'::TIMESTAMP),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions
WHERE DATE_TRUNC('day', run) >= '20170731'
AND DATE_TRUNC('day', created_at) <= '20170731'
ORDER BY 1
You can use generate_series() to generate the dates. The idea is:
SELECT gs.dte,
SUM( (state NOT IN ('unsub','skipped', 'error'))::int) AS a,
SUM( (state IN ('unsub') AND DATE_TRUNC('month', unsub_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS b,
SUM( (state IN ('skipped') AND DATE_TRUNC('month', skipped_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS c
FROM subscriptions s CROSS JOIN
generate_series(current_date - interval '30 day',
current_date + interval '60 day',
interval '1 day'
) gs(dte)
WHERE DATE_TRUNC('day', run) >= gs.dte AND
DATE_TRUNC('day', created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY 1;
I switched the query to cast the booleans as integers -- I just find that easier to follow.
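If the series should consist of plain dates rather than timestamps, one variant is to generate integer offsets and add them to current_date; date + integer yields a date, so no cast is needed. A small sketch of just the day list for the requested range; it is an alternative to the timestamp-based generate_series() call above, not what that answer uses.

```sql
-- One row per calendar day, from 30 days ago to 60 days ahead (91 rows)
SELECT current_date + i AS day
FROM   generate_series(-30, 60) AS g(i)
ORDER  BY day;
```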
See Set Returning Functions. The generate_series function is what you want.
First check this, so you know what it does:
SELECT
*
FROM
generate_series(
'2017-07-31'::TIMESTAMP - INTERVAL '30 days',
'2017-07-31'::TIMESTAMP + INTERVAL '60 days',
INTERVAL '1 day');
Then your query could look something like that:
SELECT DATE_TRUNC('day', stamp),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions,
generate_series('2017-07-31'::TIMESTAMP - INTERVAL '30 days', '2017-07-31'::TIMESTAMP + INTERVAL '60 days', INTERVAL '1 day') AS stamp
WHERE DATE_TRUNC('day', run) >= stamp
AND DATE_TRUNC('day', created_at) <= stamp
GROUP BY 1 -- one row per generated day
ORDER BY 1
Just add the generate_series() call as you would a plain input table (aliased AS stamp), join it with subscriptions (a Cartesian product), use stamp instead of the hard-coded '20170731', and group by the generated day.
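Both queries above filter in the WHERE clause after a cross join, so a day with no qualifying subscription produces no row at all. If every day must appear, one option is a LEFT JOIN with the conditions moved into ON. A sketch with hypothetical sample rows in place of subscriptions and a one-week range to keep the output short; only column a from the question is reproduced.

```sql
-- Hypothetical sample rows standing in for subscriptions
WITH subscriptions (state, created_at, run) AS (
   VALUES ('active', timestamp '2017-07-10 08:00', timestamp '2017-07-12 08:00')
        , ('error',  timestamp '2017-07-13 08:00', timestamp '2017-07-20 08:00')
)
SELECT gs.dte::date AS day
     , count(*) FILTER (WHERE s.state NOT IN ('unsub', 'skipped', 'error')) AS a
     , count(s.state) AS matching_rows                -- 0 on days without any match
FROM   generate_series(timestamp '2017-07-08', timestamp '2017-07-14', interval '1 day') AS gs(dte)
LEFT   JOIN subscriptions s
       ON  date_trunc('day', s.run) >= gs.dte         -- conditions live in ON, not WHERE,
       AND date_trunc('day', s.created_at) <= gs.dte  -- so unmatched days are kept
GROUP  BY gs.dte
ORDER  BY gs.dte;
```

All 7 days come back; July 8 and 9 show 0 in both columns instead of being missing.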
I'm having difficulty grabbing rows from December (anything from the 3rd previous month). I'm attempting to count the amount of products sold within a certain time period. This is my current query:
SELECT
a.id,
a.default_code,
(
SELECT SUM(product_uom_qty)
AS
"Total Sold"
FROM
sale_order_line c
WHERE
c.product_id = a.id
),
(
SELECT SUM(product_uom_qty)
AS
"Month 3"
FROM sale_order_line c
WHERE
c.product_id = a.id
AND
MONTH(c.create_date) = MONTH(CURRENT_DATE - INTERVAL '3 Months')
AND
YEAR(c.create_date) = YEAR(CURRENT_DATE - INTERVAL '3 Months')
)
FROM
product_product a
This is what the DB looks like:
sale_order_line
product_id product_uom_qty create_date
33 230 2014-07-01 16:47:45.294313
product_product
id default_code
33 WHDXEB33
Here's the error I'm receiving:
ERROR: function month(timestamp without time zone) does not exist
LINE 21: MONTH(c.create_date) = MONTH(CURRENT_DATE - INTERVAL
Any help pointing me in the right direction?
Use date_trunc() to calculate timestamp bounds:
SELECT id, default_code
, (SELECT SUM(product_uom_qty)
FROM sale_order_line c
WHERE c.product_id = a.id
) AS "Total Sold"
, (SELECT SUM(product_uom_qty)
FROM sale_order_line c
WHERE c.product_id = a.id
AND c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
) AS "Month 3"
FROM product_product a;
To get December (now being February), use these expressions:
AND c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
date_trunc('month', now()) yields '2015-02-01 00:00'; after subtracting 2 months, you get '2014-12-01 00:00'. So "3 months" can be deceiving.
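The bound arithmetic can be checked with a fixed timestamp in place of now(); February 14, 2015 is an arbitrary example date in the month the answer assumes.

```sql
-- Lower (inclusive) and upper (exclusive) bounds for December 2014,
-- computed from a reference timestamp in February 2015
SELECT date_trunc('month', timestamp '2015-02-14 09:30') - interval '2 month' AS lower_bound  -- 2014-12-01 00:00:00
     , date_trunc('month', timestamp '2015-02-14 09:30') - interval '1 month' AS upper_bound; -- 2015-01-01 00:00:00
```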
Also, be sure to use sargable expressions like the ones demonstrated, for faster performance and to allow index usage.
Alternatives
Depending on your actual DB design and data distribution, this may be faster:
SELECT a.id, a.default_code, c."Total Sold", c."Month 3"
FROM product_product a
LEFT JOIN (
SELECT product_id AS id
, SUM(product_uom_qty) AS "Total Sold"
, SUM(CASE WHEN c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
THEN product_uom_qty ELSE 0 END) AS "Month 3"
FROM sale_order_line c
GROUP BY 1
) c USING (id);
Since you are selecting all rows, this is probably faster than correlated subqueries. While at it, aggregate before you join; that's cheaper still.
When selecting a single or few products, this may actually be slower, though! Compare:
Aggregate a single column in query with many columns
Optimize GROUP BY query to retrieve latest record per user
Or with the FILTER clause in Postgres 9.4+:
...
, SUM(product_uom_qty)
FILTER (WHERE c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
) AS "Month 3"
...
Details:
Select multiple row values into single row with multi-table clauses
This avoids the costly correlated subqueries:
select
pp.id, pp.default_code,
sum(sol.product_uom_qty) as "Total Sold",
sum((
date_trunc('month', sol.create_date) =
date_trunc('month', current_date) - interval '2 months'
)::int * sol.product_uom_qty
) as "Month 3"
from
product_product pp
left join
sale_order_line sol on pp.id = sol.product_id
group by 1, 2
The cast from boolean to integer yields 0 or 1, which can conveniently be multiplied by the value being summed.
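A small sketch comparing the boolean-cast sum with the equivalent FILTER clause, using hypothetical sample rows for sale_order_line and a fixed reference date in February 2015 instead of now(), so the target month is December 2014. One difference shows up for a product without December sales: the cast version sums zeros and returns 0, while SUM ... FILTER aggregates no rows and returns NULL.

```sql
-- Hypothetical sample rows standing in for sale_order_line
WITH sale_order_line (product_id, product_uom_qty, create_date) AS (
   VALUES (33, 230, timestamp '2014-12-05 10:00')
        , (33,  20, timestamp '2015-01-10 10:00')
        , (34,   7, timestamp '2014-12-20 10:00')
        , (35,   4, timestamp '2015-02-01 10:00')   -- no December sales
)
, ref (month_start) AS (
   SELECT date_trunc('month', timestamp '2015-02-14')   -- stands in for date_trunc('month', now())
)
SELECT sol.product_id
     , sum(sol.product_uom_qty) AS total_sold
     , sum((date_trunc('month', sol.create_date) = r.month_start - interval '2 month')::int
           * sol.product_uom_qty) AS month_3_cast        -- 0 when no row matches
     , sum(sol.product_uom_qty) FILTER (
          WHERE date_trunc('month', sol.create_date) = r.month_start - interval '2 month'
       ) AS month_3_filter                               -- NULL when no row matches
FROM   sale_order_line sol
CROSS  JOIN ref r
GROUP  BY sol.product_id
ORDER  BY sol.product_id;
```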