Sum of values from 3rd previous month - sql

I'm having difficulty grabbing rows from December (anything from the 3rd previous month). I'm attempting to count the amount of products sold within a certain time period. This is my current query:
SELECT
a.id,
a.default_code,
(
SELECT SUM(product_uom_qty)
AS
"Total Sold"
FROM
sale_order_line c
WHERE
c.product_id = a.id
),
(
SELECT SUM(product_uom_qty)
AS
"Month 3"
FROM sale_order_line c
WHERE
c.product_id = a.id
AND
MONTH(c.create_date) = MONTH(CURRENT_DATE - INTERVAL '3 Months')
AND
YEAR(c.create_date) = YEAR(CURRENT_DATE - INTERVAL '3 Months')
)
FROM
product_product a
This is what the DB looks like:
sale_order_line
product_id product_uom_qty create_date
33 230 2014-07-01 16:47:45.294313
product_product
id default_code
33 WHDXEB33
Here's the error I'm receiving:
ERROR: function month(timestamp without time zone) does not exist
LINE 21: MONTH(c.create_date) = MONTH(CURRENT_DATE - INTERVAL
Any help pointing me in the right direction?

Use date_trunc() to calculate timestamp bounds:
SELECT id, default_code
, (SELECT SUM(product_uom_qty)
FROM sale_order_line c
WHERE c.product_id = a.id
) AS "Total Sold"
, (SELECT SUM(product_uom_qty)
FROM sale_order_line c
WHERE c.product_id = a.id
AND c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
) AS "Month 3"
FROM product_product a;
To get December (now being February), use these expressions:
AND c.create_date >= date_trunc('month', now()) - interval '2 month'
AND c.create_date < date_trunc('month', now()) - interval '1 month'
date_trunc('month', now()) yields '2015-02-01 00:00'; after subtracting 2 months, you get '2014-12-01 00:00'. So "3 months" can be deceiving.
Also, be sure to use sargable expressions as demonstrated, for faster performance and to allow index usage.
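The bound arithmetic above can be checked outside the database. Here is a minimal Python sketch, assuming "now" is 2015-02-15 as in the February example; the helper names month_start and shift_months are made up for illustration and are not part of Postgres:

```python
from datetime import date

def month_start(d: date) -> date:
    """Rough analogue of date_trunc('month', d) for plain dates."""
    return d.replace(day=1)

def shift_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

today = date(2015, 2, 15)                      # assumed "now"
lower = shift_months(month_start(today), -2)   # inclusive lower bound
upper = shift_months(month_start(today), -1)   # exclusive upper bound

def in_target_month(d: date) -> bool:
    # Half-open range, mirroring: create_date >= lower AND create_date < upper
    return lower <= d < upper

print(lower, upper)
```

The half-open range (>= lower, < upper) covers all of December without applying any function to the stored column, which is what keeps the SQL condition index-friendly.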
Alternatives
Depending on your actual DB design and data distribution, this may be faster:
SELECT a.id, a.default_code, c."Total Sold", c."Month 3"
FROM product_product a
LEFT JOIN (
SELECT product_id AS id
, SUM(product_uom_qty) AS "Total Sold"
, SUM(CASE WHEN create_date >= date_trunc('month', now()) - interval '2 month'
AND create_date < date_trunc('month', now()) - interval '1 month'
THEN product_uom_qty ELSE 0 END) AS "Month 3"
FROM sale_order_line
GROUP BY 1
) c USING (id);
Since you are selecting all rows, this is probably faster than correlated subqueries. While you are at it, aggregate before you join; that is cheaper still.
When selecting a single or few products, this may actually be slower, though! Compare:
Aggregate a single column in query with many columns
Optimize GROUP BY query to retrieve latest record per user
Or with the FILTER clause in Postgres 9.4+:
...
, SUM(product_uom_qty)
FILTER (WHERE create_date >= date_trunc('month', now()) - interval '2 month'
AND create_date < date_trunc('month', now()) - interval '1 month'
) AS "Month 3"
...
Details:
Select multiple row values into single row with multi-table clauses

This will avoid the costly correlated subquery:
select
pp.id, pp.default_code,
sum(sol.product_uom_qty) as "Total Sold",
sum((
date_trunc('month', sol.create_date) =
date_trunc('month', current_date) - interval '3 months'
)::int * sol.product_uom_qty
) as "Month 3"
from
product_product pp
left join
sale_order_line sol on pp.id = sol.product_id
group by 1, 2
The cast from boolean to integer yields 0 or 1, which is convenient to multiply by the value being summed.
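The same multiply-by-0-or-1 idea can be sketched in a few lines of Python. The rows and the target month below are made-up sample data, not taken from the question:

```python
from datetime import date

# Hypothetical (create_date, product_uom_qty) rows for one product.
rows = [
    (date(2014, 11, 5), 10),
    (date(2014, 11, 20), 5),
    (date(2014, 12, 1), 7),
]
target_month = date(2014, 11, 1)  # assumed first day of the month of interest

# int(True) == 1 and int(False) == 0, so non-matching rows contribute 0.
month_total = sum(int(d.replace(day=1) == target_month) * qty for d, qty in rows)
grand_total = sum(qty for _, qty in rows)

print(month_total, grand_total)
```

Every row still takes part in the aggregate, so this is a conditional sum inside a single pass rather than a filter.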

Related

Filtering query with join is not returning the correct results

I am trying to filter my query, which looks at payments data, by joining it with another table (the accounts table), as I want the data filtered by the condition accounts.provider = 'z'. However, the results I get back are exact multiples of the real figures (times 13, 20, etc.); different dates have different multiples. The query is also really slow, so I'm looking for advice on making it run quicker too.
SELECT
distinct on (t.day) t.day as day,
coalesce(collected_payments,0)
from
( SELECT day::date
FROM generate_series(timestamp '2017-03-13', current_date + interval '1 week', interval '1 day') day
) d
left JOIN (
SELECT date_trunc('day', t.payment_date)::date AS day,
sum(case when t.payment_amount > 0
and t.description not ilike '%credit%'
and t.state = 'success'
then t.payment_amount end) as collected_payments
FROM payments t
inner join payments p on p.payment_date = date_trunc('day', t.payment_date)::date
inner join accounts on accounts.id = p.account_id and accounts.provider = 'z'
where date_trunc('day', t.payment_date)::date <= current_date + interval '1 week'
and date_trunc('day', t.payment_date)::date >= current_date - interval'1 months'
GROUP BY 1
) t USING (day)
ORDER BY day desc

postgres sql query to convert group by result in multiple columns

I have two tables
financial_account having columns account_name
financial_transaction having columns transaction_date,transaction_type, transaction_amount
I need data as SUM(transaction_amount) where transaction_type='A' under column SUM_A and SUM(transaction_amount) under column SUM_B where transaction_type='B'
Following this Stack Overflow post, I wrote the query below:
select fa.account_name ,
to_char(current_date - interval '1' month, 'Mon-YY') as "Previous Month",
SUM(case when ft.transaction_type='A' then ft.transaction_amount else 0 end) as "SUM_A",
SUM(case when ft.transaction_type='B' then ft.transaction_amount else 0 end) as "SUM_B"
from financial_transaction ft
join financial_account fa on fa.account_name = 'XYZ'
where ft.transaction_date >= date_trunc('month', now()) - interval '1 month' and
ft.transaction_date < date_trunc('month', now())
group by ft.transaction_type,fa.account_name
having ft.transaction_type in ('A','B')
However, this query generates the data in two rows.
I need the data in a single row. How can I get that?
Basically, you want to remove ft.transaction_type from the group by clause, so all rows of the same account are grouped together.
Let me point out that your query seems to be missing a join condition between the transactions and accounts.
I would write the query as:
select
fa.account_name,
date_trunc('month', current_date - interval '1 month') as previous_month,
sum(ft.transaction_amount) filter(where ft.transaction_type = 'A') as sum_a,
sum(ft.transaction_amount) filter(where ft.transaction_type = 'B') as sum_b
from financial_transaction ft
inner join financial_account fa on ??
where
fa.account_name = 'XYZ'
and ft.transaction_date >= date_trunc('month', current_date) - interval '1 month'
and ft.transaction_date < date_trunc('month', current_date)
and ft.transaction_type in ('A','B')
group by fa.account_name
Changes to your original code:
Fixed the group by clause
I represented that missing join condition as ??.
The condition on the transaction type should belong to the where clause rather than the having clause
The conditional sums can be simplified with the standard filter clause
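To see the single-row result in action, here is a small runnable sketch using SQLite through Python's sqlite3 module as a stand-in for Postgres. The join column account_id is hypothetical (the real join condition is unknown, hence the ?? above), the data is made up, and CASE is used instead of FILTER only because older SQLite builds lack FILTER:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE financial_account (id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE financial_transaction (
        account_id INTEGER,          -- hypothetical join column
        transaction_type TEXT,
        transaction_amount REAL
    );
    INSERT INTO financial_account VALUES (1, 'XYZ');
    INSERT INTO financial_transaction VALUES
        (1, 'A', 100), (1, 'A', 50), (1, 'B', 30), (1, 'C', 999);
""")

# Group by the account only; the transaction type is handled inside the sums.
rows = conn.execute("""
    SELECT fa.account_name,
           SUM(CASE WHEN ft.transaction_type = 'A' THEN ft.transaction_amount ELSE 0 END) AS sum_a,
           SUM(CASE WHEN ft.transaction_type = 'B' THEN ft.transaction_amount ELSE 0 END) AS sum_b
    FROM financial_transaction ft
    JOIN financial_account fa ON fa.id = ft.account_id
    WHERE fa.account_name = 'XYZ'
      AND ft.transaction_type IN ('A', 'B')
    GROUP BY fa.account_name
""").fetchall()

print(rows)
```

Because transaction_type is no longer part of the grouping key, the account collapses to one row with one column per type.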
Assuming your query otherwise works properly, you can write it like this:
select
fa.account_name ,
to_char(current_date - interval '1' month, 'Mon-YY') as "Previous Month",
SUM(case when ft.transaction_type='A' then ft.transaction_amount else 0 end) as "SUM_A",
SUM(case when ft.transaction_type='B' then ft.transaction_amount else 0 end) as "SUM_B"
from financial_transaction ft
join financial_account fa on fa.account_name = 'XYZ'
where ft.transaction_date >= date_trunc('month', now()) - interval '1 month' and
ft.transaction_date < date_trunc('month', now()) and ft.transaction_type in ('A','B')
group by 1,2
You can also write it like this:
select
fa.account_name ,
to_char(current_date - interval '1' month, 'Mon-YY') as "Previous Month",
SUM(ft.transaction_amount) filter (where ft.transaction_type='A') as "SUM_A",
SUM(ft.transaction_amount) filter (where ft.transaction_type='B') as "SUM_B"
from financial_transaction ft
join financial_account fa on fa.account_name = 'XYZ'
where ft.transaction_date >= date_trunc('month', now()) - interval '1 month' and
ft.transaction_date < date_trunc('month', now()) and ft.transaction_type in ('A','B')
group by 1,2

How to save the results of a select expression as a variable?

In the following PostgreSQL query, is there a way to save mo.delivered_at - mo.created_at as a variable so I don't have to repeat myself?
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '3 days' and mo.delivered_at - mo.created_at <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '6 days' and mo.delivered_at - mo.created_at <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivered_at - mo.created_at > interval '9 days') as ten_days,
min(mo.delivered_at - mo.created_at),
max(mo.delivered_at - mo.created_at),
percentile_disc(0.5) within group (order by mo.delivered_at - mo.created_at) as median,
avg(mo.delivered_at - mo.created_at) as average
FROM medication_order mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name
You can compute the derived value in a subquery or CTE as has been suggested.
But there is more. This should be faster (and correct). And can be sorted properly, too:
SELECT
to_char(mo.month,'MM-YYYY') AS month, -- optionally prettify
mo.sku,
s.condition_id, -- I added this to make the result unambiguous
(SELECT name FROM condition WHERE id = s.condition_id) AS condition_name,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE mo.my_interval < interval '3 days') AS three_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '3 days' AND mo.my_interval <= interval '6 days') AS six_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '6 days' AND mo.my_interval <= interval '9 days') AS nine_days,
COUNT(*) FILTER (WHERE mo.my_interval > interval '9 days') AS ten_days,
min(mo.my_interval),
max(mo.my_interval),
percentile_disc(0.5) WITHIN GROUP (ORDER BY mo.my_interval) AS median,
avg(mo.my_interval) AS average
FROM (
SELECT
date_trunc('month', mo.created_at) AS month, -- faster, keeps ORDER
delivered_at - created_at AS my_interval, -- your core request
sku_key AS sku,
subscription_id -- needed for the LEFT JOIN below
FROM medication_order mo
WHERE status = 'DELIVERED' -- filter early
AND payment_preference = 'INSURANCE'
AND created_at > '2020-01-01'
AND delivered_at IS NOT NULL
AND sku_key <> 'manual_order_sku'
) mo
LEFT JOIN subscription s ON s.id = mo.subscription_id
GROUP BY mo.month, mo.sku, s.condition_id -- GROUP BY unique ID! Correct - and cheaper, too
ORDER BY mo.month, mo.sku, s.condition_id; -- my addition: sorting by date works across years, 'MM-YYYY' does not
Aside: condition.name should probably be UNIQUE. And "name" is almost never a good name.
You could just compute the information in a subquery when selecting from the table:
SELECT
to_char(mo.created_at,'MM-YYYY') AS month,
mo.sku_key as sku,
c.name,
COUNT(*) as total,
COUNT(*) FILTER (WHERE mo.delivery_interval < interval '3 days') as three_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '3 days' and mo.delivery_interval <= interval '6 days') as six_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '6 days' and mo.delivery_interval <= interval '9 days') as nine_days,
COUNT(*) FILTER (WHERE mo.delivery_interval > interval '9 days') as ten_days,
min(mo.delivery_interval),
max(mo.delivery_interval),
percentile_disc(0.5) within group (order by mo.delivery_interval) as median,
avg(mo.delivery_interval) as average
FROM (
SELECT mo.*, mo.delivered_at - mo.created_at AS delivery_interval --> here
FROM medication_order mo
) mo
LEFT JOIN subscription s ON s.id=mo.subscription_id
LEFT JOIN condition c on s.condition_id = c.id
WHERE
mo.status = 'DELIVERED' AND
mo.payment_preference = 'INSURANCE' AND
mo.created_at > '2020-01-01' AND
mo.delivered_at IS NOT null AND
mo.sku_key != 'manual_order_sku'
GROUP BY month, mo.sku_key, c.name

Speed up query where results with count(*) = 0 are included

I have a table squitters with, amongst others, a column parsed_time. I want to know the number of records per hour for the last two days and used this query:
SELECT date_trunc('hour', parsed_time) AS hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY hour
ORDER BY hour DESC;
This works, but hours with zero records do not appear in the result. I want hours with zero records to appear in the result too, with a count equal to zero, so I wrote this query using the generate_series function:
SELECT bins.hour, count(squitters.parsed_time)
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(hour)
LEFT OUTER JOIN squitters ON bins.hour = date_trunc('hours', squitters.parsed_time)
GROUP BY bins.hour
ORDER BY bins.hour DESC;
This works: the results include hour bins with counts equal to zero, but it is considerably slower.
How can I have the speed of the first query with the count=zero results of the second query?
(btw. there is an index on parsed_time)
You could try and change the join condition so no date function is applied on column parsed_time:
SELECT b.hour, COUNT(s.parsed_time) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
LEFT OUTER JOIN squitters s
ON s.parsed_time >= b.hour
AND s.parsed_time < b.hour + interval '1 hour'
GROUP BY b.hour
ORDER BY b.hour DESC;
Alternatively, you could also try using a correlated subquery (or a lateral join) instead of a left join - this avoids the need for outer aggregation:
SELECT
b.hour,
(
SELECT COUNT(*)
FROM squitters s
WHERE s.parsed_time >= b.hour AND s.parsed_time < b.hour + interval '1 hour'
) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
ORDER BY b.hour desc
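The range-join idea can be tried on a toy data set. This is only a sketch, using SQLite via Python's sqlite3 module: the bins table stands in for generate_series(), timestamps are stored as ISO text (which sorts correctly as strings), and all the data is made up:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squitters (parsed_time TEXT)")
conn.execute("CREATE TABLE bins (hour TEXT, next_hour TEXT)")

fmt = "%Y-%m-%d %H:%M:%S"
start = datetime(2020, 1, 1, 0)  # hypothetical window start

# Stand-in for generate_series(): four consecutive hourly bins.
for i in range(4):
    lo = start + timedelta(hours=i)
    hi = lo + timedelta(hours=1)
    conn.execute("INSERT INTO bins VALUES (?, ?)", (lo.strftime(fmt), hi.strftime(fmt)))

# Two events in hour 0, one exactly at the start of hour 2, none in hours 1 and 3.
for ts in ("2020-01-01 00:05:00", "2020-01-01 00:59:59", "2020-01-01 02:00:00"):
    conn.execute("INSERT INTO squitters VALUES (?)", (ts,))

# The join compares the raw column against bin bounds; no function wraps parsed_time.
counts = conn.execute("""
    SELECT b.hour, COUNT(s.parsed_time) AS cnt
    FROM bins b
    LEFT JOIN squitters s
           ON s.parsed_time >= b.hour
          AND s.parsed_time <  b.next_hour
    GROUP BY b.hour
    ORDER BY b.hour
""").fetchall()

for hour, cnt in counts:
    print(hour, cnt)
```

COUNT(s.parsed_time) counts only matched rows, so empty bins come out as 0 rather than 1. Whether this is faster than the date_trunc() join on a real table depends on the data and the planner, so it is worth checking with EXPLAIN ANALYZE.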
You could take advantage of Common Table Expressions to divide your problem into small chunks:
WITH cte AS (
--First query your table
SELECT date_trunc('hour', parsed_time) AS sq_hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY sq_hour
ORDER BY sq_hour DESC
), series AS (
--Create the series without the data returned from 1st query
SELECT
bins.series_hour,
0
FROM
generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(series_hour)
WHERE
series_hour not in (SELECT sq_hour FROM cte)
)
--Union the result
SELECT * FROM cte
UNION
SELECT * FROM series
ORDER BY 1

Postgres - Return 0 count for intervals with no data in date_trunc

I am trying to create a table that lists counts in 5-minute intervals over 10 days. I think my join is wrong, since I am not getting the empty rows in my query.
select date_trunc('minute', activities.activitytime) -
(CAST(EXTRACT(MINUTE FROM activities.activitytime)
AS integer) % 5) * interval '1 minute' as day_column, count(activities.activityid)
from generate_series(current_date - interval '10 day', current_date, '1 minute') d
left join activities on date(activities.activitytime) = d
group by day_column
order by day_column;
You are close. But the key idea is that you need to use the columns from the generate_series() for the group by key:
select d.dte, count(a.activitytime)
from generate_series(current_date - interval '10 day', current_date, '5 minute') d(dte) left join
activities a
on a.activitytime >= d.dte and a.activitytime < d.dte + interval '5 minute'
group by d.dte
order by d.dte;