How to use CASE WHEN aggregation with bookshelf

I'm working with PostgreSQL and bookshelf and trying to run a simple SQL query in order to get multiple counts in a single query.
The query looks like this:
SELECT SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months,
SUM(CASE WHEN date_last_check > (now() - interval '7 DAY') THEN 1 ELSE 0 END) as since_one_week,
SUM(CASE WHEN date_last_check > (now() - interval '1 DAY') THEN 1 ELSE 0 END) as since_one_days
FROM myTable;
It seems impossible to use a CASE expression inside a sum() call in bookshelf. I tried:
return myTable.query(function(qb:any){
qb.sum("(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_two_months")
})
And this returns the following query:
select sum("(SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END)") as "since_two_months" from "myTable"
This does not work because the whole expression passed to sum() is wrapped in double quotes and therefore treated as an identifier.
Does anyone know how to make this work without using a raw query?

I found a workaround, though a poor one: use knex.raw inside the bookshelf query:
return myTable.query(function(qb:any){
qb.select(bookshelf.knex.raw("SUM(CASE WHEN date_last_check > (now() - interval '1 MONTH') THEN 1 ELSE 0 END) as since_one_month"));
})

Rather use modern syntax for conditional aggregates: the aggregate FILTER clause:
SELECT count(*) FILTER (WHERE date_last_check > now() - interval '1 month') AS since_two_months -- one_month?
, count(*) FILTER (WHERE date_last_check > now() - interval '7 days') AS since_one_week
, count(*) FILTER (WHERE date_last_check > now() - interval '1 day') AS since_one_day
FROM mytable;
See:
Aggregate columns with additional (distinct) filters
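To compare the two spellings of a conditional aggregate, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL and bookshelf. The fixed reference time NOW, the sample rows, the 30-day approximation of one month, and the iso() helper are all assumptions made for illustration. The FILTER variant only runs if the bundled SQLite is 3.30 or newer.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed reference time (stands in for now()) so the counts are reproducible.
NOW = datetime(2024, 3, 31, 12, 0, 0)


def iso(dt):
    """Format a datetime as sortable ISO text, which SQLite compares as a string."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date_last_check TEXT)")
ages = [timedelta(hours=3), timedelta(days=3), timedelta(days=20), timedelta(days=90)]
conn.executemany("INSERT INTO mytable VALUES (?)", [(iso(NOW - a),) for a in ages])

# Cutoffs are computed in Python; one month is approximated as 30 days here.
cutoffs = {
    "month": iso(NOW - timedelta(days=30)),
    "week": iso(NOW - timedelta(days=7)),
    "day": iso(NOW - timedelta(days=1)),
}

# Classic form: SUM over a CASE expression that yields 1 or 0.
case_sql = """
SELECT SUM(CASE WHEN date_last_check > :month THEN 1 ELSE 0 END) AS since_one_month,
       SUM(CASE WHEN date_last_check > :week  THEN 1 ELSE 0 END) AS since_one_week,
       SUM(CASE WHEN date_last_check > :day   THEN 1 ELSE 0 END) AS since_one_day
FROM mytable
"""
case_counts = conn.execute(case_sql, cutoffs).fetchone()

# FILTER form: only attempted when the SQLite library supports it (3.30+).
filter_counts = None
if sqlite3.sqlite_version_info >= (3, 30, 0):
    filter_sql = """
    SELECT COUNT(*) FILTER (WHERE date_last_check > :month) AS since_one_month,
           COUNT(*) FILTER (WHERE date_last_check > :week)  AS since_one_week,
           COUNT(*) FILTER (WHERE date_last_check > :day)   AS since_one_day
    FROM mytable
    """
    filter_counts = conn.execute(filter_sql, cutoffs).fetchone()

print(case_counts, filter_counts)
```

Both forms scan the table once. The difference is only in how the condition is written.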

Related

Create a report with 2 rows of values, each of which is from a separate SELECT statement

I have a report (using Blazer, if you care) that displays data like this, of recently updated or created rows in the jobs table:
5 Minutes | 1 Hour | 1 Day | Total
----------------------------------
        0 |      0 |   367 | 30,989
The SQL looks something like this:
SELECT
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
;
I want to add a second row, for jobs WHERE "Jobs"."active" IS TRUE. How do I make this display another row?
I want the final result to be something like this:
Status | 5 Minutes | 1 Hour | 1 Day | Total
-------------------------------------------
*      |         0 |      0 |   367 | 30,989
Active |         0 |      0 |   123 | 24,972
The labels are not the issue. The only thing that's not obvious is how to create a new row.

The simplest way is to UNION on another bunch of queries, that have this more restrictive where clause:
SELECT
'*' as Kind,
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
) as "Total"
UNION ALL
SELECT
'Active',
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW()
) as "5 Minutes",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW()
) as "1 Hour",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE AND "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW()
) as "1 Day",
(SELECT COUNT(*)
FROM public.jobs AS "Jobs"
WHERE "Jobs"."active" IS TRUE
) as "Total"

If I were you, I would prefer this way to resolve your query:
select
"Jobs"."active" as Status,
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '5 minutes' AND NOW() then 1 else 0 end) as "5 Minutes",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Hours' AND NOW() then 1 else 0 end) as "1 Hour",
sum(case when "Jobs"."updated_at" BETWEEN NOW() - INTERVAL '1 Day' AND NOW() then 1 else 0 end) as "1 Day",
count(*) as "Total"
from public.jobs AS "Jobs"
group by "Jobs"."active"
This way you read your table public.jobs once instead of several times (once per count). With this choice, grouping by the status is a simple GROUP BY operation.
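The single-pass GROUP BY idea can be tried on a small in-memory table. This is only a sketch in Python with sqlite3, not PostgreSQL. The fixed NOW, the sample rows, the column alias names, and the iso() helper are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed reference time (stands in for NOW()) so the output is reproducible.
NOW = datetime(2024, 3, 31, 12, 0, 0)


def iso(dt):
    """Format a datetime as sortable ISO text."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, active INTEGER, updated_at TEXT)")
sample = [
    (1, timedelta(minutes=2)),
    (1, timedelta(minutes=30)),
    (0, timedelta(minutes=45)),
    (1, timedelta(hours=5)),
    (0, timedelta(days=3)),
]
conn.executemany(
    "INSERT INTO jobs (active, updated_at) VALUES (?, ?)",
    [(active, iso(NOW - age)) for active, age in sample],
)

params = {
    "m5": iso(NOW - timedelta(minutes=5)),
    "h1": iso(NOW - timedelta(hours=1)),
    "d1": iso(NOW - timedelta(days=1)),
}

# One scan of the table; each SUM counts only the rows matching its condition.
sql = """
SELECT active,
       SUM(CASE WHEN updated_at >= :m5 THEN 1 ELSE 0 END) AS last_5_minutes,
       SUM(CASE WHEN updated_at >= :h1 THEN 1 ELSE 0 END) AS last_hour,
       SUM(CASE WHEN updated_at >= :d1 THEN 1 ELSE 0 END) AS last_day,
       COUNT(*) AS total
FROM jobs
GROUP BY active
ORDER BY active
"""
grouped = conn.execute(sql, params).fetchall()
for row in grouped:
    print(row)
```

Note that grouping returns one row per value of active (inactive and active), not an overall '*' row next to an 'Active' row. The overall figures are the column-wise sums of the rows.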

Basically, you want conditional aggregation. In Postgres, that would normally use filter:
SELECT COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '5 minute' AND NOW()) as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 hour' AND NOW()) as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" BETWEEN NOW() - INTERVAL '1 day' AND NOW()) as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
You probably don't have future update dates, so this would more simply be written as:
SELECT COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '5 minute') as cnt_5_minutes,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 hour') as cnt_1_hour,
COUNT(*) FILTER (WHERE j."updated_at" >= NOW() - INTERVAL '1 day') as cnt_1_day,
COUNT(*) as Total
FROM public.jobs j;
In addition, I would advise you to drop the double quotes from updated_at. Using double quotes around identifiers is just a bad habit.

The only thing that's not obvious is how to create a new row.
Basically, add a second row with UNION ALL.
First get rid of all the separate SELECT queries for each metric, though. That's needlessly expensive (important if the table is not trivially small). A single SELECT with conditional aggregates can replace all of your original (like Gordon suggested). In Postgres 9.4 or later, the aggregate FILTER clause is the way to go. See:
Aggregate columns with additional (distinct) filters
To get another row you could just run another query, adding the filter "active" IS TRUE to each expression (which boils down to just active, as a boolean column needs no further evaluation).
But that would double the cost again, and we can avoid that. Run a single SELECT in a CTE, and then split the results with UNION ALL in the outer query:
WITH cte AS (
SELECT count(*) FILTER (WHERE updated_at > now() - interval '5 min') AS ct_5min
, count(*) FILTER (WHERE updated_at > now() - interval '5 min' AND active) AS ct_5min_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour') AS ct_1h
, count(*) FILTER (WHERE updated_at > now() - interval '1 hour' AND active) AS ct_1h_a
, count(*) FILTER (WHERE updated_at > now() - interval '1 day') AS ct_1d
, count(*) FILTER (WHERE updated_at > now() - interval '1 day' AND active) AS ct_1d_a
, count(*) AS ct_all
, count(*) FILTER (WHERE active) AS ct_all_a
FROM public.jobs
)
SELECT '*' AS status, ct_5min, ct_1h, ct_1d, ct_all
FROM cte
UNION ALL
SELECT 'Active', ct_5min_a, ct_1h_a, ct_1d_a, ct_all_a
FROM cte
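The CTE plus UNION ALL pattern can be checked on toy data. The sketch below uses Python's sqlite3 rather than PostgreSQL, and writes the conditional aggregates as SUM(CASE ...) so it also runs on SQLite builds without the FILTER clause. The fixed NOW, the sample rows, and the iso() helper are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed reference time (stands in for now()) so the output is reproducible.
NOW = datetime(2024, 3, 31, 12, 0, 0)


def iso(dt):
    """Format a datetime as sortable ISO text."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (active INTEGER, updated_at TEXT)")
sample = [
    (1, timedelta(minutes=2)),
    (1, timedelta(minutes=30)),
    (0, timedelta(minutes=45)),
    (1, timedelta(hours=5)),
    (0, timedelta(days=3)),
]
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [(active, iso(NOW - age)) for active, age in sample],
)

params = {
    "m5": iso(NOW - timedelta(minutes=5)),
    "h1": iso(NOW - timedelta(hours=1)),
    "d1": iso(NOW - timedelta(days=1)),
}

# The CTE computes every count in one scan; the outer query only reshapes
# the single CTE row into two output rows.
sql = """
WITH cte AS (
    SELECT SUM(CASE WHEN updated_at > :m5 THEN 1 ELSE 0 END)                AS ct_5min,
           SUM(CASE WHEN updated_at > :m5 AND active = 1 THEN 1 ELSE 0 END) AS ct_5min_a,
           SUM(CASE WHEN updated_at > :h1 THEN 1 ELSE 0 END)                AS ct_1h,
           SUM(CASE WHEN updated_at > :h1 AND active = 1 THEN 1 ELSE 0 END) AS ct_1h_a,
           SUM(CASE WHEN updated_at > :d1 THEN 1 ELSE 0 END)                AS ct_1d,
           SUM(CASE WHEN updated_at > :d1 AND active = 1 THEN 1 ELSE 0 END) AS ct_1d_a,
           COUNT(*)                                                         AS ct_all,
           SUM(CASE WHEN active = 1 THEN 1 ELSE 0 END)                      AS ct_all_a
    FROM jobs
)
SELECT '*' AS status, ct_5min, ct_1h, ct_1d, ct_all FROM cte
UNION ALL
SELECT 'Active', ct_5min_a, ct_1h_a, ct_1d_a, ct_all_a FROM cte
"""
# Key the rows by status so the check does not depend on row order.
report = {row[0]: row[1:] for row in conn.execute(sql, params)}
print(report)
```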

Postgres: must appear in the GROUP BY while I am using an aggregate function

When I run the sql as below I receive the following error message:
column subQuery.numbers must appear in the GROUP BY clause or be used in an aggregate function
I don't understand why this error occurs, since I'm using the aggregate functions sum and count in the left-joined subquery with an alias.
I think the parent query doesn't recognize the subquery by its alias ("subQuery").
I have looked for a solution but couldn't find a case similar to mine.
Could you please explain why this error occurs even though aggregate functions are used?
select to_char(customer1.date_time, 'MM-dd') as DateTime,
(case when subQuery.numbers is null then 0 else subQuery.numbers end) as "2019-numbers",
(case when subQuery.amount is null then 0 else subQuery.amount end) as "2019-amount"
from customer_table customer1
left join (
select to_char(customer2.date_time, 'MM-dd') as DateTime,
count(*) as numbers,
sum(amount) as amount
from customer_table customer2
where customer2.date_time > date_trunc('day', (now() - interval '1 day') - interval '1 year')
and customer2.date_time < date_trunc('day', now() - interval '1 year')
and customer2.status = 'OK'
group by to_char(customer2.date_time, 'MM-dd')) as subQuery on subQuery.DateTime = to_char(customer1.date_time, 'MM-dd')
where customer1.date_time > date_trunc('day', now() - interval '1 day')
and customer1.date_time < current_date - interval '1 day' + time '23:59'
group by to_char(customer1.date_time, 'MM-dd');

Simply add subQuery.numbers and subQuery.amount (they appear in the SELECT list) to the GROUP BY clause.
Otherwise, which of the several subQuery.numbers that belong to one to_char(customer1.date_time, 'MM-dd') should be used?

How to group by only one column?

I would like to select only one column (Failed_operation), with rows distinct on another column (SN) that is not shown in the output, as in the code below, but I get this error:
ERROR: column "rw_pcba.sn" must appear in the GROUP BY clause or be used in an aggregate function
I tried removing DISTINCT ON (sn). The query then ran, but the result included duplicate SNs. I don't want duplicate SNs in the result.
SELECT DISTINCT ON (sn) Failed_operation
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 0
AND (extract(day FROM NOW() - fail_timestamp)) <= 15 THEN 1 ELSE NULL END) AS AgingLessThan15
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 15
AND (extract(day FROM NOW() - fail_timestamp)) <= 30 THEN 1 ELSE NULL END) AS Aging16To30
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 30
AND (extract(day FROM NOW() - fail_timestamp)) <= 60 THEN 1 ELSE NULL END) AS Aging31To60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) > 60 THEN 1 ELSE NULL END) AS AgingGreaterThan60
,count(CASE WHEN (extract(day FROM NOW() - fail_timestamp)) <= 0 THEN 1 ELSE NULL END) AS Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY Failed_operation
ORDER BY sn
,Failed_operation ASC

You need to group by the column sn as well. With GROUP BY, each row of the result is already a distinct combination of sn and failed_operation, so you don't have to specify DISTINCT.
SELECT sn, Failed_operation,
count (case when (extract(day from NOW() - fail_timestamp)) >0 and (extract(day from NOW() - fail_timestamp))<=15 then 1 else null end) as AgingLessThan15,
count (case when (extract(day from NOW() - fail_timestamp)) >15 and (extract(day from NOW() - fail_timestamp))<=30 then 1 else null end) as Aging16To30,
count (case when (extract(day from NOW() - fail_timestamp)) >30 and (extract(day from NOW() - fail_timestamp))<=60 then 1 else null end) as Aging31To60,
count (case when (extract(day from NOW() - fail_timestamp)) >60 then 1 else null end) as AgingGreaterThan60,
count (case when (extract(day from NOW() - fail_timestamp)) <=0 then 1 else null end) as Aging0
FROM rw_pcba where rework_status='In-Process'
GROUP by sn,Failed_operation ORDER BY sn,Failed_operation ASC

You want to aggregate by sn as well as failed_operation. I also think you can simplify the calculation of each column:
SELECT sn, Failed_operation,
count(*) filter (where fail_timestamp < current_date and fail_timestamp >= current_date - interval '15 day') as AgingLessThan15,
count(*) filter (where fail_timestamp < current_date - interval '15 day' and fail_timestamp >= current_date - interval '30 day') as Aging16To30,
count(*) filter (where fail_timestamp < current_date - interval '30 day' and fail_timestamp >= current_date - interval '60 day') as Aging31To60,
count(*) filter (where fail_timestamp < current_date - interval '60 day') as AgingGreaterThan60,
count(*) filter (where fail_timestamp >= current_date) as Aging0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY sn, Failed_operation
ORDER BY sn, Failed_operation ASC;
I prefer direct date comparisons for this type of logic rather than working with the difference between the dates. I simply find it easier to follow. For instance, using current_date rather than now() removes the question of what happens to the time component of now().
EDIT:
In older versions of Postgres, you can phrase this using sum:
sum( (fail_timestamp < current_date and fail_timestamp >= current_date - interval '15 day')::int ) as AgingLessThan15,
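As a rough way to try the bucketing with direct date comparisons, here is a sketch in Python with sqlite3 rather than PostgreSQL. It assumes aging is measured backwards from midnight today, so older failures fall into higher buckets. The fixed TODAY, the sample rows, the snake_case aliases, and the iso() helper are hypothetical. In SQLite a comparison already evaluates to 0 or 1, so SUM(condition) plays the role of the ::int cast.

```python
import sqlite3
from datetime import datetime, timedelta

# Midnight of a fixed "today" (stands in for current_date).
TODAY = datetime(2024, 3, 31)


def iso(dt):
    """Format a datetime as sortable ISO text."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rw_pcba (sn TEXT, failed_operation TEXT, "
    "fail_timestamp TEXT, rework_status TEXT)"
)
# Each failure happened at 09:00, the given number of days before TODAY.
sample = [
    ("A1", "solder", 3, "In-Process"),
    ("A1", "solder", 20, "In-Process"),
    ("B2", "test", 0, "In-Process"),
    ("B2", "test", 45, "In-Process"),
    ("B2", "test", 100, "In-Process"),
    ("C3", "test", 5, "Done"),  # excluded by the WHERE clause
]
conn.executemany(
    "INSERT INTO rw_pcba VALUES (?, ?, ?, ?)",
    [(sn, op, iso(TODAY + timedelta(hours=9) - timedelta(days=d)), st)
     for sn, op, d, st in sample],
)

params = {f"d{n}": iso(TODAY - timedelta(days=n)) for n in (0, 15, 30, 60)}

# Each SUM adds up a boolean (0 or 1), so it counts the rows in that bucket.
sql = """
SELECT sn, failed_operation,
       SUM(fail_timestamp < :d0  AND fail_timestamp >= :d15) AS aging_less_than_15,
       SUM(fail_timestamp < :d15 AND fail_timestamp >= :d30) AS aging_16_to_30,
       SUM(fail_timestamp < :d30 AND fail_timestamp >= :d60) AS aging_31_to_60,
       SUM(fail_timestamp < :d60)                            AS aging_greater_than_60,
       SUM(fail_timestamp >= :d0)                            AS aging_0
FROM rw_pcba
WHERE rework_status = 'In-Process'
GROUP BY sn, failed_operation
ORDER BY sn, failed_operation
"""
buckets = conn.execute(sql, params).fetchall()
for row in buckets:
    print(row)
```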

Postgres: Count over a series of days

I have created the following query, which returns 3 values for 1 day ('20170731'). What I am struggling to figure out is how to run this query for every day in a series from 30 days ago to 60 days from now and return a row for each day.
SELECT DATE_TRUNC('day', '20170731'::TIMESTAMP),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN '20170731' AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions
WHERE DATE_TRUNC('day', run) >= '20170731'
AND DATE_TRUNC('day', created_at) <= '20170731'
ORDER BY 1

You can use generate_series() to generate the dates. The idea is:
SELECT gs.dte,
SUM( (state NOT IN ('unsub','skipped', 'error'))::int) AS a,
SUM( (state IN ('unsub') AND DATE_TRUNC('month', unsub_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS b,
SUM( (state IN ('skipped') AND DATE_TRUNC('month', skipped_at) BETWEEN gs.dte AND DATE_TRUNC('day', NOW()))::int) AS c
FROM subscriptions s CROSS JOIN
generate_series(current_date - interval '30 day',
current_date + interval '60 day',
interval '1 day'
) gs(dte)
WHERE DATE_TRUNC('day', run) >= gs.dte AND
DATE_TRUNC('day', created_at) <= gs.dte
GROUP BY gs.dte
ORDER BY 1;
I switched the query to cast the booleans as integers -- I just find that easier to follow.

See Set Returning Functions. The generate_series function is what you want.
First check this, so you know what it does:
SELECT
*
FROM
generate_series(
'2017-07-31'::TIMESTAMP - INTERVAL '30 days',
'2017-07-31'::TIMESTAMP + INTERVAL '60 days',
INTERVAL '1 day');
Then your query could look something like this:
SELECT DATE_TRUNC('day', stamp),
COUNT(CASE WHEN state NOT IN ('unsub','skipped', 'error') THEN 1 ELSE NULL END) AS a,
COUNT(CASE WHEN (state IN ('unsub')) AND (DATE_TRUNC('month', unsub_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS b,
COUNT(CASE WHEN (state IN ('skipped')) AND (DATE_TRUNC('month', skipped_at) BETWEEN stamp AND DATE_TRUNC('day', NOW())) THEN 1 ELSE NULL END) AS c
FROM subscriptions,
generate_series('2017-07-31'::TIMESTAMP - INTERVAL '30 days', '2017-07-31'::TIMESTAMP + INTERVAL '60 days', INTERVAL '1 day') AS stamp
WHERE DATE_TRUNC('day', run) >= stamp
AND DATE_TRUNC('day', created_at) <= stamp
GROUP BY 1
ORDER BY 1
Just add the generate_series function as you would a plain input table (aliased AS stamp), join it with subscriptions (a cartesian product), and use the stamp value instead of the hard-coded '20170731'.
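The same "one row per day" shape can be reproduced outside PostgreSQL. The sketch below uses Python's sqlite3, where generate_series is not available by default, so a recursive CTE is swapped in to produce the dates. The sample subscriptions, the five-day range, and the reduction to the single count a are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (state TEXT, created_at TEXT, run TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [
        ("active", "2017-07-01", "2017-08-02"),
        ("active", "2017-07-31", "2017-07-31"),
        ("error", "2017-07-01", "2017-08-05"),
    ],
)

# The recursive CTE stands in for generate_series(start, stop, '1 day').
sql = """
WITH RECURSIVE days(dte) AS (
    SELECT :start
    UNION ALL
    SELECT date(dte, '+1 day') FROM days WHERE dte < :stop
)
SELECT d.dte,
       SUM(CASE WHEN s.state NOT IN ('unsub', 'skipped', 'error') THEN 1 ELSE 0 END) AS a
FROM days AS d
CROSS JOIN subscriptions AS s
WHERE s.run >= d.dte
  AND s.created_at <= d.dte
GROUP BY d.dte
ORDER BY d.dte
"""
per_day = conn.execute(sql, {"start": "2017-07-30", "stop": "2017-08-03"}).fetchall()
for day, a in per_day:
    print(day, a)
```

One thing to keep in mind with this join shape: a day for which no subscription passes the WHERE clause produces no row at all. If every day must appear, a LEFT JOIN from the date series with the conditions moved into the ON clause is the usual adjustment.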

Two SELECT statements as two columns

This seems like a simple query, but I'm struggling with it.
Here's a sampling of my data.
user_id dated
463 2016-01-01
463 2016-01-02
1456 2016-01-01
1456 2016-01-02
1398 2015-12-01
1398 2015-12-02
I want to get the number of unique users in two different time periods. Here are the queries I want to get a combined output from in a single row, and two columns.
-- 60
SELECT COUNT(DISTINCT(tld.user_id)) count_active_users_60
FROM table tld
WHERE tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days')
-- 30
SELECT COUNT(DISTINCT(tld.user_id)) count_active_users_30
FROM table tld
WHERE tld.dated >= NOW() - INTERVAL '30 days'
I'd like an output that looks like this:
count_active_users_60 count_active_users_30
1 2
I've been messing with various CASE statements, and sub-selects, but the distinct clause is throwing me off.
SELECT COUNT(DISTINCT(rar.user_id))
FROM
(
SELECT user_id,
COUNT(CASE WHEN tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days') THEN 1 ELSE NULL END) AS count_active_users_60,
COUNT(CASE WHEN tld.dated >= NOW() - INTERVAL '30 days' THEN 1 ELSE NULL END) AS count_active_users_30
FROM testing_login_duration tld
GROUP BY user_id
) rar;

Use conditional aggregation:
SELECT COUNT(DISTINCT CASE WHEN tld.dated BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days')
THEN tld.user_id
END) count_active_users_60,
COUNT(DISTINCT CASE WHEN tld.dated >= NOW() - INTERVAL '30 days'
THEN tld.user_id
END) count_active_users_30
FROM table tld
WHERE tld.dated >= NOW() - INTERVAL '60 days';
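For a quick check against the sample data from the question, here is a sketch in Python with sqlite3 rather than PostgreSQL. The table name logins and the fixed reference date 2016-01-15 (standing in for NOW()) are assumptions. SQLite's date() modifiers replace the INTERVAL arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, dated TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (463, "2016-01-01"),
        (463, "2016-01-02"),
        (1456, "2016-01-01"),
        (1456, "2016-01-02"),
        (1398, "2015-12-01"),
        (1398, "2015-12-02"),
    ],
)

# CASE yields the user_id inside the window and NULL outside it;
# COUNT(DISTINCT ...) ignores the NULLs and counts each user once.
sql = """
SELECT COUNT(DISTINCT CASE WHEN dated BETWEEN date(:now, '-60 days')
                                          AND date(:now, '-30 days')
                           THEN user_id END) AS count_active_users_60,
       COUNT(DISTINCT CASE WHEN dated >= date(:now, '-30 days')
                           THEN user_id END) AS count_active_users_30
FROM logins
WHERE dated >= date(:now, '-60 days')
"""
counts = conn.execute(sql, {"now": "2016-01-15"}).fetchone()
print(counts)
```

With that reference date, user 1398 falls in the 60-to-30-day window and users 463 and 1456 fall in the last 30 days, which matches the output asked for in the question.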
