How to get Postgres to return 0 for empty rows - sql

I have a query that summarises data between two dates, like so:
SELECT date(created_at),
COUNT(COALESCE(id, 0)) AS total_orders,
SUM(COALESCE(total_price, 0)) AS total_price,
SUM(COALESCE(taxes, 0)) AS taxes,
SUM(COALESCE(shipping, 0)) AS shipping,
AVG(COALESCE(total_price, 0)) AS average_order_value,
SUM(COALESCE(total_discount, 0)) AS total_discount,
SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc
However, for dates that do not have any orders, the query returns no row at all, and I'd like it to return 0.
I have tried COALESCE, but that doesn't seem to do the trick.
Any suggestions?

This should be substantially faster - and correct:
SELECT *
, total_price - taxes - shipping - total_discount AS net_sales -- ⑤
FROM (
SELECT created_at
, COALESCE(total_orders , 0) AS total_orders
, COALESCE(total_price , 0) AS total_price
, COALESCE(taxes , 0) AS taxes
, COALESCE(shipping , 0) AS shipping
, COALESCE(average_order_value , 0) AS average_order_value
, COALESCE(total_discount , 0) AS total_discount
FROM generate_series(timestamp '2022-07-20' -- ①
, timestamp '2022-07-26'
, interval '1 day') AS g(created_at)
LEFT JOIN ( -- ③
SELECT created_at::date
, count(*) AS total_orders -- ⑥
, sum(total_price) AS total_price
, sum(taxes) AS taxes
, sum(shipping) AS shipping
, avg(total_price) AS average_order_value
, sum(total_discount) AS total_discount
FROM orders
WHERE shop_id = 43
AND active -- simpler
AND created_at >= '2022-07-20'
AND created_at < '2022-07-27' -- ② !
GROUP BY 1
) o USING (created_at) -- ④
) sub
ORDER BY created_at DESC;
db<>fiddle here
I copied, simplified, and extended Xu's fiddle for comparison.
① Why this particular form for generate_series()? See:
Generating time series between two dates in PostgreSQL
② Assuming created_at is of data type timestamp, your original formulation is most probably incorrect. created_at <= '2022-07-26' would include the first instant of '2022-07-26' and exclude the rest of that day. To include all of '2022-07-26', use created_at < '2022-07-27'. See:
How do I write a function in plpgsql that compares a date with a timestamp without time zone?
③ The LEFT JOIN is the core feature of this answer. Generate all days with generate_series(), independently aggregate days from table orders, then LEFT JOIN to retain one row per day like you requested.
④ I made the column name match created_at, so we can conveniently shorten the join syntax with the USING clause.
⑤ Compute net_sales in an outer SELECT after replacing NULL values, so we need COALESCE() only once.
⑥ count(*) is equivalent to COUNT(COALESCE(id, 0)) in any case, but cheaper. See:
Optimizing GROUP BY + COUNT DISTINCT on unnested jsonb column
PostgreSQL: running count of rows for a query 'by minute'
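As a rough, runnable illustration of notes ② and ③, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. The trimmed-down orders table and its three rows are made up for the example. SQLite has no generate_series() by default, so a recursive CTE plays that role, and timestamps are stored as ISO text, which happens to compare the same way here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down orders table (only the columns needed here).
conn.execute("CREATE TABLE orders (created_at TEXT, total_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2022-07-20 09:15:00", 10.0),
        ("2022-07-20 18:40:00", 30.0),
        ("2022-07-22 23:59:59", 5.0),  # late on the last day of the range
    ],
)

# Note 2: a closed upper bound at midnight loses the late order ...
closed_count = conn.execute(
    "SELECT count(*) FROM orders "
    "WHERE created_at >= '2022-07-20' AND created_at <= '2022-07-22'"
).fetchone()[0]
# ... while the half-open bound on the next day keeps it.
half_open_count = conn.execute(
    "SELECT count(*) FROM orders "
    "WHERE created_at >= '2022-07-20' AND created_at < '2022-07-23'"
).fetchone()[0]

# Note 3: generate every day, aggregate orders separately, then LEFT JOIN.
rows = conn.execute(
    """
    WITH RECURSIVE days(day) AS (          -- stand-in for generate_series()
        SELECT '2022-07-20'
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < '2022-07-22'
    )
    SELECT d.day
         , COALESCE(o.total_orders, 0) AS total_orders
         , COALESCE(o.total_price, 0)  AS total_price
    FROM days d
    LEFT JOIN (
        SELECT date(created_at) AS day
             , count(*)         AS total_orders
             , sum(total_price) AS total_price
        FROM orders
        WHERE created_at >= '2022-07-20'
          AND created_at <  '2022-07-23'
        GROUP BY 1
    ) o ON o.day = d.day
    ORDER BY d.day
    """
).fetchall()

for row in rows:
    print(row)
```

The day without orders (2022-07-21) still produces a row, with zeros filled in by COALESCE after the join.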

Please refer to the below script.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT created_at
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
There is one sample for your reference.
Sample
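I could reproduce the duplicate rows from your test cases. The root cause is that created_at has data type timestamp, so the NOT IN comparison against whole days rarely matches, and days that do have orders also get a zero row.
The script below fixes this by comparing date(created_at) instead.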
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT date (created_at)
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
Here is a sample that matches your setup: Link
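To see why the first version of the script produced duplicate rows, here is a small Python sketch of the two NOT IN comparisons. The single order timestamp is invented for illustration; the point is only the difference between comparing full timestamps and comparing dates.

```python
from datetime import datetime, timedelta

# Hypothetical data: one order placed mid-morning on 2022-07-20.
order_timestamps = [datetime(2022, 7, 20, 9, 15)]

# One midnight value per day, similar to what
# generate_series(..., interval '1 day') yields.
days = [datetime(2022, 7, 20) + timedelta(days=n) for n in range(3)]

# First script: midnight values are compared against full timestamps,
# so 2022-07-20 is not recognised as a day that has orders.
zero_rows_wrong = [d for d in days if d not in order_timestamps]

# Corrected script: both sides are reduced to a date before comparing.
order_dates = {t.date() for t in order_timestamps}
zero_rows_right = [d for d in days if d.date() not in order_dates]

print(len(zero_rows_wrong), len(zero_rows_right))
```

In the first comparison 2022-07-20 gets a zero row in addition to its real aggregate row, which is the duplicate seen in the test cases.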

You can use WITH RECURSIVE to build a table of dates and then select dates that are not in your table
WITH RECURSIVE t(d) AS (
(SELECT '2015-01-01'::date)
UNION ALL
(SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);
See this post: https://stackoverflow.com/questions/28583379/find-missing-dates-postgresql
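For a quick way to try the recursive approach, here is a sketch that runs it through Python's built-in sqlite3 module. This is SQLite rather than Postgres, so d + 1 becomes date(d, '+1 day'), and the contents of tbl are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the dates that do exist.
conn.execute("CREATE TABLE tbl (d_date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("2015-01-02",), ("2015-01-03",), ("2015-01-07",)],
)

missing = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE t(d) AS (
            SELECT '2015-01-01'
            UNION ALL
            -- SQLite spelling of "d + 1" on a date
            SELECT date(d, '+1 day') FROM t
            WHERE date(d, '+1 day') <= '2015-01-10'
        )
        SELECT d FROM t
        WHERE d NOT IN (SELECT d_date FROM tbl)
        ORDER BY d
        """
    )
]

print(missing)
```

The result lists every day in the range that has no matching row in tbl; these are the days that would need a zero row.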

Related

PostgreSQL showing different time periods in a single query

I have a query that returns, for each network, the ratio of its issuances in a given time period to the total issuances from all networks in that period. Right now it only returns the ratios for the current year (year-to-date). I want to include several time periods in it, such as one month ago, two months ago, and so on. A LEFT JOIN usually works, but I couldn't figure it out for this one. How do I do it?
Here is the query:
SELECT IR1.network,
count(*) / ((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at >= date_trunc('year',current_date)) * 1.) as issuance_ratio_ytd
FROM issuances_extended as IR1 WHERE status = 'completed' and
(issued_at >= date_trunc('year',current_date))
GROUP BY
IR1.network
order by IR1.network
I would break your query into CTEs something like this:
with periods (period_name, period_range) as (
values
('YTD', daterange(date_trunc('year', current_date)::date, null)),
('LY', daterange(date_trunc('year', current_date - interval '1 year')::date,
date_trunc('year', current_date)::date)),
-- as written, this range covers the previous calendar month
('MTD', daterange(date_trunc('month', current_date - interval '1 month')::date,
date_trunc('month', current_date)::date))
-- Add whatever other intervals you want to see
), period_totals as ( -- Get period totals
select p.period_name, p.period_range, count(*) as total_issuances
from periods p
join issuances_extended i
on i.status = 'completed'
and i.issued_at::date <@ p.period_range
group by p.period_name, p.period_range
)
select p.period_name, p.period_range,
i.network, count(*) as network_issuances,
1.0 * count(*) / p.total_issuances as issuance_ratio
from period_totals p
join issuances_extended i
on i.status = 'completed'
and i.issued_at::date <@ p.period_range
group by p.period_name, p.period_range, i.network, p.total_issuances;
The problem with this is that you get rows instead of columns, but you can use a spreadsheet program or reporting tool to pivot if you need to. This method simplifies the calculations and lets you add whatever period ranges you want by adding more values to the periods CTE.
Something like this? Obviously not tested
SELECT
IR1.network,
count(*)/((select count(*) FROM issuances_extended
where status = 'completed' and
issued_at between mon.t and current_date ) * 1.) as issuance_ratio_ytd
FROM
issuances_extended as IR1 ,
(
SELECT
generate_series('2022-01-01'::date,
'2022-07-01'::date, '1 month') AS t)
AS mon
WHERE
status = 'completed' and
(issued_at between mon.t and current_date)
GROUP BY
IR1.network
ORDER BY
IR1.network
I've managed to join these tables, so I am answering my own question for anyone who needs help. To add more periods, put each new query in another LEFT JOIN and reference it in the base query (IR3, IR4, and so on).
SELECT
IR1.network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('year', current_date)
) * 1./ 100
) as issuances_ratio_ytd,
max(coalesce(IR2.issuances_ratio_m0, 0)) as issuances_ratio_m0
FROM
issuances_extended as IR1
LEFT JOIN (
SELECT
network,
count(*) / (
(
select
count(*)
FROM
issuances_extended
where
status = 'completed'
and issued_at >= date_trunc('month', current_date)
) * 1./ 100
) as issuances_ratio_m0
FROM
issuances_extended
WHERE
status = 'completed'
and (issued_at >= date_trunc('month', current_date))
GROUP BY
network
) AS IR2 ON IR1.network = IR2.network
WHERE
status = 'completed'
and (issued_at >= date_trunc('year', current_date))
GROUP BY
IR1.network,
IR2.issuances_ratio_m0
order by
IR1.network

How to subtract two timestamps in SQL and then count?

I want to find out how many users paid within 15, 30 and 60 minutes, based on the difference between payment_time and trigger_time.
I have the following query
with redshift_direct() as conn:
    trigger_time_1 = pd.read_sql(f"""
with new_data as
(
select
cycle_end_date
, prime_tagging_by_issuer_and_product
, u.user_id
, settled_status
, delay,
ots_created_at + interval '5:30 hours' as payment_time
,case when to_char(cycle_end_date,'DD') = '15' then 'Odd' else 'Even' end as cycle_order
from
settlement_summary_from_snapshot s
left join (select distinct user_phone_number, user_id from user_events where event_name = 'UserCreatedEvent') u
on u.user_id = s.user_id
and cycle_type = 'end_cycle'
and cycle_end_date > '2021-11-30' and cycle_end_date < '2022-01-15'
)
select
bucket_id
, cycle_end_date, d.cycle_order
, date(cycle_end_date) as t_cycle_end_date
,d.prime_tagging_by_issuer_and_product
,source
,status as cause
,split_part(campaign_name ,'|', 1) as campaign
,split_part(campaign_name ,'|', 2) as sms_cycle_end_date
,split_part(campaign_name ,'|', 3) as day
,split_part(campaign_name ,'|', 4) as type
,to_char(to_date(split_part(campaign_name ,'|', 2) , 'DD/MM/YYYY'), 'YYYY-MM-DD') as campaign_date,
d.payment_time, payload_event_timestamp + interval '5:30 hours' as trigger_time
,count( s.user_id) as count
from sms_callback_events s
inner join new_data d
on s.user_id = d.user_id
where bucket_id > 'date_2021_11_30' and bucket_id < 'date_2022_01_15'
and campaign_name like '%RC%'
and event_name = 'SmsStatusUpdatedEvent'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14
""",conn)
How do I add three columns with the number of users who paid within 15, 30 and 60 minutes after trigger_time in this query? I was doing it with Pandas, but I want to find a way to do it in the query itself. Can someone help?
I wrote my own DATEDIFF function, which returns the difference between two dates as an integer, in days, months, years, hours, minutes, and so on. You can use this function in your queries.
DATEDIFF Function SQL Code on GitHub
Sample Query about using our DATEDIFF function:
select
datediff('minute', mm.start_date, mm.end_date) as diff_minute
from
(
select
'2022-02-24 09:00:00.100'::timestamp as start_date,
'2022-02-24 09:15:21.359'::timestamp as end_date
) mm;
Result:
---------------
diff_minute
---------------
15
---------------
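Once the difference is available as a number, the three columns can be built with conditional aggregation. The following is only a sketch, run through Python's built-in sqlite3 module: the payments table and its rows are hypothetical, and the strftime('%s', ...) subtraction is a SQLite stand-in for a DATEDIFF-style call in the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per user with trigger and payment times.
conn.execute(
    "CREATE TABLE payments (user_id INTEGER, trigger_time TEXT, payment_time TEXT)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2022-01-01 10:00:00", "2022-01-01 10:10:00"),  # 10 minutes later
        (2, "2022-01-01 10:00:00", "2022-01-01 10:25:00"),  # 25 minutes later
        (3, "2022-01-01 10:00:00", "2022-01-01 10:50:00"),  # 50 minutes later
        (4, "2022-01-01 10:00:00", "2022-01-01 12:00:00"),  # 120 minutes later
        (5, "2022-01-01 10:00:00", "2022-01-01 09:30:00"),  # paid before the trigger
    ],
)

within_15, within_30, within_60 = conn.execute(
    """
    SELECT COUNT(DISTINCT CASE WHEN diff_s BETWEEN 0 AND 15 * 60 THEN user_id END)
         , COUNT(DISTINCT CASE WHEN diff_s BETWEEN 0 AND 30 * 60 THEN user_id END)
         , COUNT(DISTINCT CASE WHEN diff_s BETWEEN 0 AND 60 * 60 THEN user_id END)
    FROM (
        -- difference in seconds between payment and trigger
        SELECT user_id
             , CAST(strftime('%s', payment_time) AS INTEGER)
             - CAST(strftime('%s', trigger_time) AS INTEGER) AS diff_s
        FROM payments
    ) AS d
    """
).fetchone()

print(within_15, within_30, within_60)
```

The buckets are cumulative here (a user who paid within 15 minutes also counts towards 30 and 60); if exclusive buckets are wanted, the BETWEEN bounds would need to be adjusted.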

Divide results from two query by another query in SQL

I have this query in Metabase:
with l1 as (SELECT date_trunc ('day', Ticket_Escalated_At) as time_scale, count (Ticket_ID) as chat_per_day
FROM CHAT_TICKETS where SUPPORT_QUEUE = 'transfer_investigations'
and date_trunc('month', TICKET_ESCALATED_AT) > now() - interval '6' Month
GROUP by 1),
l2 as (SELECT date_trunc('day', created_date) as week, count(*) as TI_watchman_ticket
FROM jira_issues
WHERE issue_type NOT IN ('Transfer - General', 'TI - Advanced')
and date_trunc('month', created_date) > now() - interval '6' Month
and project_key = 'TI2'
GROUP BY 1)
SELECT l1.* from l1
UNION SELECT l2.* from l2
ORDER by 1
and this one:
with hours as (SELECT date_trunc('day', ws.start_time) as date_
,(ifnull(sum((case when ws.shift_position = 'TI - Non-watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) + ifnull(sum((case when ws.shift_position = 'TI - Watchman' then (minutes_between(ws.end_time, ws.start_time)/60) end)),0) ) as total
from chat_agents a
join wiw_shifts ws on a.email = ws.user_email
left join people_ops.employees h on substr(h.email,1, instr(h.email,'@revolut') - 1) = a.login
where (seniority != 'Lead' or seniority is null)
and date_trunc('month', ws.start_time) > now() - interval '6' Month
GROUP BY 1)
I would like to divide the output of the UNION of the first one, by the result of the second one, any ideas.

Postgresql - can you do this without a CTE?

I wanted to get the number of orders and the money spent by customers in the first 7 days from their initial order. I managed to do it with a common table expression, but was curious whether someone could point out an obvious change to the main query's WHERE or HAVING clause, or perhaps a subquery, that would do the same.
--This is a temp table to use in the main query
WITH first_seven AS
(
select min(o.created_at), min(o.created_at) + INTERVAL '7 day' as max_order_date, o.user_id
from orders o
where o.total_price > 0 and o.status = 30
group by o.user_id
having min(o.created_at) > '2015-09-01'
)
--This is the main query, find orders in first 7 days of purchasing
SELECT sum(o.total_price) as sales, count(distinct o.objectid) as orders, o.user_id, min(o.created_at) as first_order
from orders o, first_seven f7
where o.user_id = f7.user_id and o.created_at < f7.max_order_date and o.total_price > 0 and o.status = 30
group by o.user_id
having min(o.created_at) > '2015-09-01'
You can do this without the join by using window functions:
select sum(o.total_price) as sales, count(distinct o.objectid) as orders,
o.user_id, min(o.created_at) as first_order
from (select o.*,
min(o.created_at) over (partition by user_id) as startdate
from orders o
where o.total_price > 0 and o.status = 30
) o
where startdate > '2015-09-01' and
created_at <= startdate + INTERVAL '7 day'
group by o.user_id;
A more complicated query (with the right indexes) is probably more efficient:
select sum(o.total_price) as sales, count(distinct o.objectid) as orders,
o.user_id, min(o.created_at) as first_order
from (select o.*,
min(o.created_at) over (partition by user_id) as startdate
from orders o
where o.total_price > 0 and o.status = 30 and
not exists (select 1 from orders o2 where o2.user_id = o.user_id and created_at <= '2015-09-01')
) o
where startdate > '2015-09-01' and
created_at <= startdate + INTERVAL '7 day'
group by o.user_id;
This filters out older customers before the windows calculation, which should make it more efficient. Indexes that are useful are orders(user_id, created_at) and orders(status, total_price).
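The window-function variant can be tried out with a sketch like the following, which uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The sample orders are made up, the per-user GROUP BY is written out explicitly, and startdate + INTERVAL '7 day' is spelled datetime(startdate, '+7 days') in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (objectid INTEGER, user_id INTEGER, "
    "created_at TEXT, total_price REAL, status INTEGER)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2015-09-05 10:00:00", 20.0, 30),  # user 1: first order
        (2, 1, "2015-09-08 10:00:00", 30.0, 30),  # user 1: within 7 days
        (3, 1, "2015-09-20 10:00:00", 99.0, 30),  # user 1: after 7 days
        (4, 2, "2015-08-15 10:00:00", 10.0, 30),  # user 2: started before the cutoff
        (5, 2, "2015-09-10 10:00:00", 15.0, 30),
    ],
)

first_week = conn.execute(
    """
    SELECT o.user_id
         , sum(o.total_price)         AS sales
         , count(DISTINCT o.objectid) AS n_orders
         , min(o.created_at)          AS first_order
    FROM (
        SELECT *
             , min(created_at) OVER (PARTITION BY user_id) AS startdate
        FROM orders
        WHERE total_price > 0 AND status = 30
    ) o
    WHERE o.startdate > '2015-09-01'
      AND o.created_at <= datetime(o.startdate, '+7 days')
    GROUP BY o.user_id
    """
).fetchall()

print(first_week)
```

Only user 1 qualifies, and only the two orders placed in the first seven days are counted.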

How to get the discount number of customers in prior period?

I have a requirement to roll up customer data over the prior period of 365 days.
Table:
CREATE TABLE orders (
persistent_key_str character varying,
ord_id character varying(50),
ord_submitted_date date,
item_sku_id character varying(50),
item_extended_actual_price_amt numeric(18,2)
);
Sample data:
INSERT INTO orders VALUES
('01120736182','ORD6266073' ,'2010-12-08','100856-01',39.90),
('01120736182','ORD33997609' ,'2011-11-23','100265-01',49.99),
('01120736182','ORD33997609' ,'2011-11-23','200020-01',29.99),
('01120736182','ORD33997609' ,'2011-11-23','100817-01',44.99),
('01120736182','ORD89267964' ,'2012-12-05','200251-01',79.99),
('01120736182','ORD89267964' ,'2012-12-05','200269-01',59.99),
('01011679971','ORD89332495' ,'2012-12-05','200102-01',169.99),
('01120736182','ORD89267964' ,'2012-12-05','100907-01',89.99),
('01120736182','ORD89267964' ,'2012-12-05','200840-01',129.99),
('01120736182','ORD125155068','2013-07-27','201443-01',199.99),
('01120736182','ORD167230815','2014-06-05','200141-01',59.99),
('01011679971','ORD174927624','2014-08-16','201395-01',89.99),
('01000217334','ORD92524479' ,'2012-12-20','200021-01',29.99),
('01000217334','ORD95698491' ,'2013-01-08','200021-01',19.99),
('01000217334','ORD90683621' ,'2012-12-12','200021-01',29.990),
('01000217334','ORD92524479' ,'2012-12-20','200560-01',29.99),
('01000217334','ORD145035525','2013-12-09','200972-01',49.99),
('01000217334','ORD145035525','2013-12-09','100436-01',39.99),
('01000217334','ORD90683374' ,'2012-12-12','200284-01',39.99),
('01000217334','ORD139437285','2013-11-07','201794-01',134.99),
('01000827006','W02238550001','2010-06-11','HL 101077',349.000),
('01000827006','W01738200001','2009-12-10','EL 100310 BLK',119.96),
('01000954259','P00444170001','2009-12-03','PC 100455 BRN',389.99),
('01002319116','W02242430001','2010-06-12','TR 100966',35.99),
('01002319116','W02242430002','2010-06-12','EL 100985',99.99),
('01002319116','P00532470001','2010-05-04','HO 100482',49.99);
Using the query below I am trying to get the number of distinct customers by ord_submitted_date:
select
g.order_date as "Ordered",
count(distinct o.persistent_key_str) as "customers"
from
generate_series(
(select min(ord_submitted_date) from orders),
(select max(ord_submitted_date) from orders),
'1 day'
) g (order_date)
left join
orders o on o.ord_submitted_date between g.order_date - interval '364 days'
and g.order_date
WHERE extract(year from ord_submitted_date) <= 2009
group by 1
order by 1
This is the output I expected.
Ordered Customers
2009-12-03 1
2009-12-10 1
When I execute the query above I get incorrect results.
How can I make this right?
To get your expected output ("the number of distinct customers"), only for days with actual orders in 2009:
SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
FROM orders
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
Formulate the WHERE conditions this way to make the query sargable, and input easy.
If you want one row per day (from the earliest entry up to the latest in orders) - within 2009:
SELECT ord_submitted_date AS ordered
, count(DISTINCT o.persistent_key_str) AS customers
FROM (SELECT generate_series(min(ord_submitted_date) -- single query ...
, max(ord_submitted_date) -- ... to get min / max
, '1d')::date FROM orders) g (ord_submitted_date)
LEFT join orders o USING (ord_submitted_date)
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
SQL Fiddle.
Distinct customers per year
SELECT extract(year from ord_submitted_date) AS year
, count(DISTINCT persistent_key_str) AS customers
FROM orders
GROUP BY 1
ORDER BY 1;
SQL Fiddle.