How to group by only one column? - sql

I would like to group by only one column (Failed_operation) while keeping SN distinct and without showing SN in the output. With the code below I get this error:
ERROR: column "rw_pcba.sn" must appear in the GROUP BY clause or be used in an aggregate function
When I remove DISTINCT ON (sn), results appear, but they include duplicate SN values. I don't want duplicate SN values in the result.
SELECT DISTINCT ON (sn) Failed_operation
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 0
AND (extract(day FROM NOW() - fail_timestamp)) <= 15 THEN 1 ELSE NULL END) AS AgingLessThan15
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 15
AND (extract(day FROM NOW() - fail_timestamp)) <= 30 THEN 1 ELSE NULL END) AS Aging16To30
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 30
AND (extract(day FROM NOW() - fail_timestamp)) <= 60 THEN 1 ELSE NULL END) AS Aging31To60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 60 THEN 1 ELSE NULL END) AS AgingGreaterThan60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) <= 0 THEN 1 ELSE NULL END) AS Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY Failed_operation
ORDER BY sn
,Failed_operation ASC

You need to add sn to the GROUP BY. When you group by sn and Failed_operation, each output row is a distinct combination of the two, so you don't need DISTINCT.
SELECT sn, Failed_operation,
count (case when (extract(day from NOW() - fail_timestamp)) >0 and (extract(day from NOW() - fail_timestamp))<=15 then 1 else null end) as AgingLessThan15,
count (case when (extract(day from NOW() - fail_timestamp)) >15 and (extract(day from NOW() - fail_timestamp))<=30 then 1 else null end) as Aging16To30,
count (case when (extract(day from NOW() - fail_timestamp)) >30 and (extract(day from NOW() - fail_timestamp))<=60 then 1 else null end) as Aging31To60,
count (case when (extract(day from NOW() - fail_timestamp)) >60 then 1 else null end) as AgingGreaterThan60,
count (case when (extract(day from NOW() - fail_timestamp)) <=0 then 1 else null end) as Aging0
FROM rw_pcba where rework_status='In-Process'
GROUP by sn,Failed_operation ORDER BY sn,Failed_operation ASC
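To illustrate why grouping by both columns removes the duplicates without DISTINCT, here is a minimal TypeScript sketch of what GROUP BY sn, Failed_operation does to in-memory rows. The Row type and the sample values are hypothetical; in practice the database does this work.

```typescript
// Hypothetical row shape mirroring rw_pcba (only the columns used here).
interface Row {
  sn: string;
  failedOperation: string;
}

// Emulates `SELECT sn, failed_operation, count(*) ... GROUP BY sn, failed_operation`:
// every distinct (sn, failedOperation) pair becomes exactly one output group.
function groupBySnAndOperation(rows: Row[]): Record<string, number> {
  const groups: Record<string, number> = {};
  for (const row of rows) {
    // JSON-encode the pair so values containing a separator cannot collide.
    const key = JSON.stringify([row.sn, row.failedOperation]);
    groups[key] = (groups[key] || 0) + 1;
  }
  return groups;
}

const sample: Row[] = [
  { sn: "A1", failedOperation: "ICT" },
  { sn: "A1", failedOperation: "ICT" },
  { sn: "A1", failedOperation: "FCT" },
  { sn: "B2", failedOperation: "ICT" },
];
const grouped = groupBySnAndOperation(sample);
console.log(Object.keys(grouped).length); // 3
```

In this sample A1 still appears in two groups, once per failed operation: grouping removes duplicate (sn, Failed_operation) pairs, not repeated sn values across different operations.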

You want to aggregate by sn as well as failed_operation. I also think you can simplify the calculation of each column:
SELECT sn, Failed_operation,
count(*) filter (where fail_timestamp >= current_date - interval '15 day' and fail_timestamp < current_date) as AgingLessThan15,
count(*) filter (where fail_timestamp >= current_date - interval '30 day' and fail_timestamp < current_date - interval '15 day') as Aging16To30,
count(*) filter (where fail_timestamp >= current_date - interval '60 day' and fail_timestamp < current_date - interval '30 day') as Aging31To60,
count(*) filter (where fail_timestamp < current_date - interval '60 day') as AgingGreaterThan60,
count(*) filter (where fail_timestamp >= current_date) as Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY sn, Failed_operation
ORDER BY sn, Failed_operation ASC;
I prefer direct date comparisons for this type of logic rather than working with the difference between the dates. I simply find it easier to follow. For instance, using current_date rather than now() removes the question of what happens to the time component of now().
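The half-open ranges can be sketched in TypeScript as a single classification function. This assumes the failure time has already been reduced to ageDays, the number of calendar days between the failure date and today (0 = failed today); the function names are hypothetical. Because each range includes its lower bound and excludes its upper bound, every failure falls into exactly one bucket, which strict > and < on both ends would not guarantee.

```typescript
type AgingBucket =
  | "Aging0"
  | "AgingLessThan15"
  | "Aging16To30"
  | "Aging31To60"
  | "AgingGreaterThan60";

// ageDays = calendar days between the failure date and today (0 = failed today).
// Each branch mirrors one half-open range on fail_timestamp relative to current_date.
function agingBucket(ageDays: number): AgingBucket {
  if (ageDays <= 0) return "Aging0"; // fail_timestamp >= current_date
  if (ageDays <= 15) return "AgingLessThan15"; // [current_date - 15 days, current_date)
  if (ageDays <= 30) return "Aging16To30"; // [current_date - 30 days, current_date - 15 days)
  if (ageDays <= 60) return "Aging31To60"; // [current_date - 60 days, current_date - 30 days)
  return "AgingGreaterThan60"; // fail_timestamp < current_date - 60 days
}

// One counter per bucket, like one count(*) FILTER column per range.
function countBuckets(ages: number[]): Record<AgingBucket, number> {
  const counts: Record<AgingBucket, number> = {
    Aging0: 0,
    AgingLessThan15: 0,
    Aging16To30: 0,
    Aging31To60: 0,
    AgingGreaterThan60: 0,
  };
  for (const age of ages) counts[agingBucket(age)] += 1;
  return counts;
}

// Boundary values on both sides of every threshold.
const counts = countBuckets([0, 1, 15, 16, 30, 31, 60, 61]);
console.log(counts);
```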
EDIT:
Before Postgres 9.4, which introduced FILTER, you can phrase this using sum:
sum( (fail_timestamp >= current_date - interval '15 day' and fail_timestamp < current_date)::int ) as AgingLessThan15,
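The sum(...::int) form and count(*) FILTER agree on TRUE and FALSE rows, but they differ on NULL. A minimal TypeScript sketch, modeling a SQL predicate as boolean | null (an assumption made for illustration):

```typescript
// A SQL predicate evaluates to TRUE, FALSE or NULL; NULL is modeled as null.
type SqlBool = boolean | null;

// count(*) FILTER (WHERE p): counts rows where p is TRUE; FALSE and NULL rows are skipped.
function countFilter(preds: SqlBool[]): number {
  let n = 0;
  for (const p of preds) {
    if (p === true) n += 1;
  }
  return n;
}

// sum(p::int): TRUE -> 1, FALSE -> 0, NULL -> NULL, and sum() ignores NULL inputs.
// With no non-NULL input at all, SQL sum() returns NULL (modeled as null).
function sumBoolAsInt(preds: SqlBool[]): number | null {
  let total: number | null = null;
  for (const p of preds) {
    if (p === null) continue;
    total = (total === null ? 0 : total) + (p ? 1 : 0);
  }
  return total;
}

const mixed: SqlBool[] = [true, false, null, true];
console.log(countFilter(mixed), sumBoolAsInt(mixed)); // 2 2
console.log(countFilter([null, null]), sumBoolAsInt([null, null])); // 0 null
```

When every predicate in a group is NULL (for example, fail_timestamp is NULL), sum returns NULL while count returns 0, so wrap the sum in coalesce(..., 0) if you need a zero.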

Related

How to use CASE WHEN aggregation with bookshelf

I'm working with PostgreSQL and bookshelf and trying to run a simple SQL query in order to get multiple counts in a single query.
The query looks like this:
SELECT SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months,
SUM(CASE WHEN date_last_check > (now() - interval '7 DAY') THEN 1 ELSE 0 END) as since_one_week,
SUM(CASE WHEN date_last_check > (now() - interval '1 DAY') THEN 1 ELSE 0 END) as since_one_days
FROM myTable;
It seems impossible to use a CASE expression inside sum() in bookshelf. I tried:
return myTable.query(function(qb:any){
qb.sum("(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months")
})
And this returns the following query:
select sum("(SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END)") as "since_two_months" from "myTable"
This does not work because of the double quotes after sum(": the whole expression is quoted as if it were a column name.
Does anyone know how to make this work without using a raw query?
I found a workaround, though not a pretty one: use knex.raw inside the bookshelf query:
return myTable.query(function(qb:any){
qb.select(bookshelf.knex.raw("SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_one_month"));
})
Use the modern syntax for conditional aggregates instead, the aggregate FILTER clause:
SELECT count(*) FILTER (WHERE date_last_check > now() - interval '1 month') AS since_two_months -- one_month?
, count(*) FILTER (WHERE date_last_check > now() - interval '7 days') AS since_one_week
, count(*) FILTER (WHERE date_last_check > now() - interval '1 day') AS since_one_day
FROM mytable;
See:
Aggregate columns with additional (distinct) filters
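To show what the three FILTER clauses compute in a single scan, here is a TypeScript sketch. It assumes timestamps are plain numbers (for example, day offsets) and that the cutoff values are precomputed stand-ins for now() - interval '1 month', '7 days' and '1 day'; the names are hypothetical.

```typescript
// Hypothetical precomputed cutoffs: now() minus 1 month, 7 days and 1 day.
interface Cutoffs {
  month: number;
  week: number;
  day: number;
}

// One pass over the rows, three independent counters, like three FILTERed count(*) columns.
function countSince(lastChecks: number[], c: Cutoffs) {
  let sinceOneMonth = 0;
  let sinceOneWeek = 0;
  let sinceOneDay = 0;
  for (const t of lastChecks) {
    if (t > c.month) sinceOneMonth++; // FILTER (WHERE date_last_check > now() - interval '1 month')
    if (t > c.week) sinceOneWeek++; // FILTER (WHERE date_last_check > now() - interval '7 days')
    if (t > c.day) sinceOneDay++; // FILTER (WHERE date_last_check > now() - interval '1 day')
  }
  return { sinceOneMonth, sinceOneWeek, sinceOneDay };
}

// Day offsets with "now" = 100: cutoffs 70 (about a month), 93 (a week), 99 (a day).
const since = countSince([99.5, 95, 80, 10], { month: 70, week: 93, day: 99 });
console.log(since); // { sinceOneMonth: 3, sinceOneWeek: 2, sinceOneDay: 1 }
```

Because the windows overlap, one recent row increments all three counters. That is why each column gets its own FILTER instead of a single CASE label in a GROUP BY.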

Get columns of data with two different date range

I would like to get the average rating for last 7 days and last 14 days.
I tried using WITH ... AS (CTEs) to get the data, but the query takes far too long to run. Is there a better way that could reduce the run time?
Query:
WITH last_7_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-7 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
),
last_14_days AS (
SELECT item, rating
FROM sales
WHERE (
rating IS NOT NULL
AND (entry_date >= CAST((CAST(now() AS timestamp) + (INTERVAL '-14 day')) AS date) AND entry_date < CAST((CAST(now() AS timestamp) + (INTERVAL '1 day')) AS date))
)
)
SELECT last_7_days.item, avg(last_7_days.rating) as "avg_last_7_days", avg(last_14_days.rating) as "avg_last_14_days", count(*) AS "count"
FROM last_7_days, last_14_days
WHERE last_7_days.item = last_14_days.item
GROUP BY last_7_days.item
ORDER BY "avg_last_7_days" DESC, last_7_days.item ASC
The result should look something like this:
item|avg_last_7_days|avg_last_14_days|count|
Thank you.
Use conditional aggregation:
SELECT item,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_seven_days,
AVG(rating) FILTER (WHERE entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day') AS avg_rating_last_fourteen_days
FROM sales
WHERE rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY item;
Note: If you only care about the date, then perhaps you should use CURRENT_DATE or even NOW()::date.
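A TypeScript sketch of the conditional-aggregation query, computing both averages per item in one pass. It assumes each sale carries ageDays, the whole number of days before today (0 = today), and that there are no future-dated rows; the types are hypothetical. Like SQL AVG, it skips NULL ratings and yields null when no row matches a window.

```typescript
// Hypothetical sale shape; ageDays = whole days before today (0 = today).
interface Sale {
  item: string;
  rating: number | null;
  ageDays: number;
}

interface ItemAverages {
  avg7: number | null;
  avg14: number | null;
}

// Mirrors AVG(rating) FILTER (WHERE ...) for two windows, grouped by item.
function averagesByItem(sales: Sale[]): Record<string, ItemAverages> {
  const acc: Record<string, { s7: number; n7: number; s14: number; n14: number }> = {};
  for (const s of sales) {
    // Outer WHERE: rating IS NOT NULL and within the last 14 days.
    if (s.rating === null || s.ageDays > 14) continue;
    const a = acc[s.item] || (acc[s.item] = { s7: 0, n7: 0, s14: 0, n14: 0 });
    a.s14 += s.rating;
    a.n14 += 1;
    if (s.ageDays <= 7) {
      a.s7 += s.rating;
      a.n7 += 1;
    }
  }
  const out: Record<string, ItemAverages> = {};
  for (const item of Object.keys(acc)) {
    const a = acc[item];
    // AVG over zero matching rows is NULL in SQL.
    out[item] = { avg7: a.n7 ? a.s7 / a.n7 : null, avg14: a.n14 ? a.s14 / a.n14 : null };
  }
  return out;
}

const averages = averagesByItem([
  { item: "A", rating: 4, ageDays: 1 },
  { item: "A", rating: 2, ageDays: 10 },
  { item: "A", rating: null, ageDays: 2 },
  { item: "B", rating: 5, ageDays: 12 },
  { item: "A", rating: 3, ageDays: 20 },
]);
console.log(averages); // A: avg7 4, avg14 3; B: avg7 null, avg14 5
```

Item B has ratings only in the 8 to 14 day window, so its seven-day average is null. The two-CTE INNER JOIN version would drop such an item entirely.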
Getting rid of all the casts and aggregating directly in the CTEs should help; try the following:
WITH last_7_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_seven_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-7 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
),
last_14_days AS (
SELECT
item,
AVG(rating) AS avg_rating_last_fourteen_days
FROM
sales
WHERE
rating IS NOT NULL AND
(entry_date >= NOW() + interval '-14 day' AND entry_date < NOW() + interval '1 day')
GROUP BY
1
)
SELECT
lsd.item,
avg_rating_last_seven_days,
avg_rating_last_fourteen_days
FROM
last_7_days AS lsd
INNER JOIN
last_14_days AS lfd ON lsd.item = lfd.item
Let me know whether this improves the performance of your current query.

get List of counts from table based on dates in sql

I have to fetch a list of counts from a table by department. Here is my table structure:
empid empname department departmentId joinedon
I want to count the employees who joined today, yesterday, and more than 2 days ago, as a list like [12,25,89], i.e.:
12* joined today
25 joined yesterday
81 joined before yesterday (2+ days ago)
* 0 if there aren't any entries for a given date range.
You would use aggregation on a case expression:
select (case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'older'
end) as grp,
count(*)
from t
group by grp;
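A TypeScript sketch of the CASE expression, comparing calendar dates in UTC. It assumes dates arrive as 'YYYY-MM-DD' strings; the helper names are hypothetical. Future dates match no branch, just as the SQL CASE yields NULL for them.

```typescript
// Calendar days between two 'YYYY-MM-DD' dates, computed in UTC so DST cannot shift it.
function dayDiff(today: string, d: string): number {
  const toUtc = (s: string): number => {
    const [y, m, day] = s.split("-").map(Number);
    return Date.UTC(y, m - 1, day);
  };
  return Math.round((toUtc(today) - toUtc(d)) / 86400000);
}

type Grp = "today" | "yesterday" | "older" | null;

// Mirrors the CASE on joinedon::date compared with current_date.
function classify(today: string, joinedOn: string): Grp {
  const diff = dayDiff(today, joinedOn);
  if (diff === 0) return "today";
  if (diff === 1) return "yesterday";
  if (diff > 1) return "older";
  return null; // future date: no CASE branch matches
}

// Mirrors GROUP BY grp with count(*); a NULL group would appear under the key "null".
function countByGroup(today: string, dates: string[]): Record<string, number> {
  const result: Record<string, number> = {};
  for (const d of dates) {
    const key = String(classify(today, d));
    result[key] = (result[key] || 0) + 1;
  }
  return result;
}

// 2024 is a leap year, so 2024-02-29 is "yesterday" relative to 2024-03-01.
const byGroup = countByGroup("2024-03-01", ["2024-03-01", "2024-02-29", "2024-02-29", "2024-01-15"]);
console.log(byGroup); // { today: 1, yesterday: 2, older: 1 }
```

A label with no matching rows is simply missing from the result, exactly as with GROUP BY; the next answer addresses that.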
In addition to @Gordon Linoff's answer:
SELECT
days.day,
coalesce(t.cnt, 0) count
FROM (
SELECT * FROM (VALUES ('today'), ('yesterday'), ('older')) AS days (day)
)days
LEFT JOIN (
SELECT (CASE WHEN joinedon::date = current_date THEN 'today'
WHEN joinedon::date = current_date - interval '1 day' THEN 'yesterday'
WHEN joinedon::date < current_date - interval '1 day' THEN 'older'
end) as day,
count(*) cnt
FROM t
GROUP BY day
) t on t.day = days.day;
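The zero-filling idea (a LEFT JOIN from a fixed list of labels, then coalesce) can be sketched in TypeScript as follows; the function name and the sample counts are hypothetical.

```typescript
// Counts as a GROUP BY query returns them: labels with no rows are absent.
type Counts = Record<string, number>;

// Mirrors: FROM (VALUES ('today'), ('yesterday'), ('older')) days
//          LEFT JOIN (...) t ON t.day = days.day, selecting coalesce(t.cnt, 0).
function zeroFill(labels: string[], counts: Counts): number[] {
  return labels.map((label) =>
    Object.prototype.hasOwnProperty.call(counts, label) ? counts[label] : 0
  );
}

// Nobody joined yesterday, so that label is missing from the grouped counts.
const filled = zeroFill(["today", "yesterday", "older"], { today: 12, older: 89 });
console.log(filled); // [12, 0, 89]
```

The sketch returns the counts in label order. In SQL the row order of the result is not guaranteed without an ORDER BY.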
You can also group by department:
select department,
(case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'More than 2 days'
end) as grp,
count(*) AS cnt -- count(*) is never NULL, so COALESCE adds nothing
from t
group by grp, department;

How to save the results of a select expression as a variable?

In the following PostgreSQL query, is there a way to save mo.delivered_at - mo.created_at as a variable so I don't have to repeat myself?
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '3 days' and mo.delivered_at - mo.created_at <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '6 days' and mo.delivered_at - mo.created_at <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '9 days') as ten_days,
min(mo.delivered_at - mo.created_at),
max(mo.delivered_at - mo.created_at),
percentile_disc(0.5) within group (order by mo.delivered_at - mo.created_at) as median,
avg(mo.delivered_at - mo.created_at) as average
FROM medication_order mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name
You can compute the derived value in a subquery or CTE as has been suggested.
But there is more. This should be faster (and correct). And can be sorted properly, too:
SELECT
to_char(mo.month,'MM-YYYY') AS month, -- optionally prettify
mo.sku,
s.condition_id, -- I added this to make the result unambiguous
(SELECT name FROM condition WHERE id = s.condition_id) AS condition_name,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE mo.my_interval <= interval '3 days') AS three_days, -- <= so exactly 3 days is not skipped
COUNT(*) FILTER (WHERE mo.my_interval > interval '3 days' AND mo.my_interval <= interval '6 days') AS six_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '6 days' AND mo.my_interval <= interval '9 days') AS nine_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '9 days') AS ten_days,
min(mo.my_interval),
max(mo.my_interval),
percentile_disc(0.5) WITHIN GROUP (ORDER BY mo.my_interval) AS median,
avg(mo.my_interval) AS average
FROM (
SELECT
date_trunc('month', mo.created_at) AS month, -- faster, keeps ORDER
delivered_at - created_at AS my_interval, -- your core request
sku_key AS sku,
subscription_id -- needed for the LEFT JOIN below
FROM medication_order mo
WHERE status = 'DELIVERED' -- filter early
AND payment_preference = 'INSURANCE'
AND created_at > '2020-01-01'
AND delivered_at IS NOT NULL
AND sku_key <> 'manual_order_sku'
) mo
LEFT JOIN subscription s ON s.id = mo.subscription_id
GROUP BY mo.month, mo.sku, s.condition_id -- GROUP BY unique ID! Correct - and cheaper, too
ORDER BY mo.month, mo.sku, s.condition_id; -- my addition: sorting by date works across years, 'MM-YYYY' does not
Aside: condition.name should probably be UNIQUE. And "name" is almost never a good name.
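The same idea, computing the interval once per row and reusing it, sketched in TypeScript. It assumes timestamps are epoch milliseconds and that every group is non-empty (true for GROUP BY output); the Order type is hypothetical. percentileDisc follows the percentile_disc definition: the first value in sort order whose cumulative fraction reaches p. The buckets use <= 3 days for three_days so that a delivery of exactly 3 days is counted; the original < 3 / > 3 pair skips it.

```typescript
// Hypothetical order shape: timestamps as epoch milliseconds.
interface Order {
  createdAt: number;
  deliveredAt: number;
}

const DAY_MS = 86400000;

// percentile_disc(p): first value in ascending order whose cumulative fraction >= p.
function percentileDisc(sorted: number[], p: number): number {
  const idx = Math.max(Math.ceil(p * sorted.length) - 1, 0);
  return sorted[idx];
}

function deliveryStats(orders: Order[]) {
  // Derive the interval once per row, like my_interval in the subquery.
  const days = orders.map((o) => (o.deliveredAt - o.createdAt) / DAY_MS);
  const sorted = days.slice().sort((a, b) => a - b);
  let threeDays = 0;
  let sixDays = 0;
  let nineDays = 0;
  let tenDays = 0;
  for (const d of days) {
    if (d <= 3) threeDays++;
    else if (d <= 6) sixDays++;
    else if (d <= 9) nineDays++;
    else tenDays++;
  }
  return {
    total: days.length,
    threeDays,
    sixDays,
    nineDays,
    tenDays,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    median: percentileDisc(sorted, 0.5),
    average: days.reduce((a, b) => a + b, 0) / days.length,
  };
}

// Deliveries taking 1, 3, 5 and 10 days.
const stats = deliveryStats([1, 3, 5, 10].map((d) => ({ createdAt: 0, deliveredAt: d * DAY_MS })));
console.log(stats);
```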
You could just compute the information in a subquery when selecting from the table:
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivery_interval <= interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '3 days' and mo.delivery_interval <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '6 days' and mo.delivery_interval <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '9 days') as ten_days,
min(mo.delivery_interval),
max(mo.delivery_interval),
percentile_disc(0.5) within group (order by mo.delivery_interval) as median,
avg(mo.delivery_interval) as average
FROM (
SELECT mo.*, mo.delivered_at - mo.created_at AS delivery_interval --> here
FROM medication_order mo
) mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name

Postgres: Count over a series of days

I have created the following query, which returns 3 values for 1 day ('20170731'). What I am struggling to figure out is how to run this query for every day in a series from 30 days ago to 60 days from now, returning a row for each day.
SELECT DATE_TRUNC('day', '20170731'::TIMESTAMP),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions
WHERE DATE_TRUNC('day', run) >= '20170731'
AND DATE_TRUNC('day', created_at) <= '20170731'
ORDER BY 1
You can use generate_series() to generate the dates. The idea is:
SELECT gs.dte,
SUM( (state NOT IN ('unsub','skipped', 'error'))::int) AS a,
SUM( (state IN ('unsub') AND DATE_TRUNC('month', unsub_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS b,
SUM( (state IN ('skipped') AND DATE_TRUNC('month', skipped_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS c
FROM subscriptions s CROSS JOIN
generate_series(current_date - interval '30 day',
current_date + interval '60 day',
interval '1 day'
) gs(dte)
WHERE DATE_TRUNC('day', run) >= gs.dte AND
DATE_TRUNC('day', created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY 1;
I switched the query to cast the booleans as integers -- I just find that easier to follow.
See Set Returning Functions. The generate_series function is what you want.
First run this, so you see what it does:
SELECT
*
FROM
generate_series(
'2017-07-31'::TIMESTAMP - INTERVAL '30 days',
'2017-07-31'::TIMESTAMP + INTERVAL '60 days',
INTERVAL '1 day');
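To make the inclusive bounds of generate_series concrete, here is a TypeScript sketch that builds the same list of days. It works on 'YYYY-MM-DD' strings in UTC (an assumption, so daylight-saving changes cannot skip or repeat a day); the function name is hypothetical.

```typescript
// Mirrors generate_series(start - daysBefore, start + daysAfter, interval '1 day'):
// both ends are included.
function dailySeries(start: string, daysBefore: number, daysAfter: number): string[] {
  const [y, m, d] = start.split("-").map(Number);
  const base = Date.UTC(y, m - 1, d);
  const out: string[] = [];
  for (let i = -daysBefore; i <= daysAfter; i++) {
    out.push(new Date(base + i * 86400000).toISOString().slice(0, 10));
  }
  return out;
}

const series = dailySeries("2017-07-31", 30, 60);
console.log(series.length, series[0], series[series.length - 1]); // 91 2017-07-01 2017-09-29
```

Thirty days back, sixty days forward, plus the anchor day itself gives 91 rows.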
Then your query could look something like that:
SELECT DATE_TRUNC('day', stamp),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions,
generate_series('2017-07-31'::TIMESTAMP - INTERVAL '30 days', '2017-07-31'::TIMESTAMP + INTERVAL '60 days', INTERVAL '1 day') AS stamp
WHERE DATE_TRUNC('day', run) >= stamp
AND DATE_TRUNC('day', created_at) <= stamp
GROUP BY stamp
ORDER BY 1
Just add the generate_series function as you would a plain input table (alias it AS stamp), join it with subscriptions (a Cartesian product), and use the stamp value instead of the hard-coded '20170731'.
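What the cross join plus GROUP BY computes for column a can be sketched in TypeScript. Dates are assumed to be 'YYYY-MM-DD' strings, which compare correctly as text, and the Subscription type is hypothetical.

```typescript
// Hypothetical subscription shape; dates as 'YYYY-MM-DD' strings.
interface Subscription {
  state: string;
  createdAt: string;
  run: string;
}

// Mirrors: subscriptions CROSS JOIN generate_series(...) AS stamp
//          WHERE run >= stamp AND created_at <= stamp
//          GROUP BY stamp, with a = count of rows whose state is not excluded.
function activePerDay(subs: Subscription[], days: string[]): Record<string, number> {
  const excluded = ["unsub", "skipped", "error"];
  const a: Record<string, number> = {};
  for (const day of days) {
    a[day] = 0;
    for (const s of subs) {
      if (s.run >= day && s.createdAt <= day && excluded.indexOf(s.state) === -1) a[day]++;
    }
  }
  return a;
}

const perDay = activePerDay(
  [
    { state: "active", createdAt: "2017-07-01", run: "2017-08-15" },
    { state: "unsub", createdAt: "2017-07-10", run: "2017-09-01" },
    { state: "active", createdAt: "2017-07-30", run: "2017-07-31" },
  ],
  ["2017-07-15", "2017-07-31", "2017-08-20"]
);
console.log(perDay); // { '2017-07-15': 1, '2017-07-31': 2, '2017-08-20': 0 }
```

One difference: in SQL, a day for which no subscription passes the WHERE clause produces no row at all, while this sketch reports 0 for it.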