PostgreSQL - can you do this without a CTE?

I wanted to get the number of orders and the money spent by customers in the first 7 days after their initial order. I managed to do it with a common table expression, but I am curious whether someone can point out an obvious change to the main query's WHERE or HAVING clause, or perhaps a subquery, that achieves the same.
--This is a temp table to use in the main query
WITH first_seven AS
(
select min(o.created_at), min(o.created_at) + INTERVAL '7 day' as max_order_date, o.user_id
from orders o
where o.total_price > 0 and o.status = 30
group by o.user_id
having min(o.created_at) > '2015-09-01'
)
--This is the main query, find orders in first 7 days of purchasing
SELECT sum(o.total_price) as sales, count(distinct o.objectid) as orders, o.user_id, min(o.created_at) as first_order
from orders o, first_seven f7
where o.user_id = f7.user_id and o.created_at < f7.max_order_date and o.total_price > 0 and o.status = 30
group by o.user_id
having min(o.created_at) > '2015-09-01'

You can do this without the join by using window functions:
select sum(o.total_price) as sales, count(distinct o.objectid) as orders,
o.user_id, min(o.created_at) as first_order
from (select o.*,
min(o.created_at) over (partition by user_id) as startdate
from orders o
where o.total_price > 0 and o.status = 30
) o
where startdate > '2015-09-01' and
created_at <= startdate + INTERVAL '7 day'
group by o.user_id;
A more complicated query (with the right indexes) is probably more efficient:
select sum(o.total_price) as sales, count(distinct o.objectid) as orders,
o.user_id, min(o.created_at) as first_order
from (select o.*,
min(o.created_at) over (partition by user_id) as startdate
from orders o
where o.total_price > 0 and o.status = 30 and
not exists (select 1 from orders o2 where o2.user_id = o.user_id and o2.created_at <= '2015-09-01' and o2.total_price > 0 and o2.status = 30)
) o
where startdate > '2015-09-01' and
created_at <= startdate + INTERVAL '7 day'
group by o.user_id;
This filters out older customers before the window calculation, which should make it more efficient. Useful indexes are orders(user_id, created_at) and orders(status, total_price).
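To make the window-function idea easier to try out, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite only stands in for PostgreSQL here (it needs SQLite 3.25 or later for window functions), date(x, '+7 days') replaces the INTERVAL arithmetic, and the table contents and the order_count alias are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (objectid INTEGER PRIMARY KEY, user_id INTEGER,
                     created_at TEXT, total_price REAL, status INTEGER);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO orders VALUES
  (1, 1, '2015-09-02', 10.0, 30),
  (2, 1, '2015-09-05', 20.0, 30),
  (3, 1, '2015-09-20', 40.0, 30),  -- outside the 7-day window
  (4, 2, '2015-08-15',  5.0, 30),  -- first order before the cutoff
  (5, 2, '2015-09-03',  7.0, 30);
""")

rows = conn.execute("""
SELECT o.user_id,
       SUM(o.total_price)         AS sales,
       COUNT(DISTINCT o.objectid) AS order_count,
       MIN(o.created_at)          AS first_order
FROM (SELECT o.*,
             -- first qualifying order per customer, repeated on every row
             MIN(o.created_at) OVER (PARTITION BY o.user_id) AS startdate
      FROM orders o
      WHERE o.total_price > 0 AND o.status = 30) o
WHERE o.startdate > '2015-09-01'
  AND o.created_at <= date(o.startdate, '+7 days')
GROUP BY o.user_id
""").fetchall()

print(rows)
```

Customer 2 drops out because the first order predates the cutoff, and the late order of customer 1 is ignored because it falls outside the seven days.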

Related

How to get Postgres to return 0 for empty rows

I have a query which gets data summarised between two dates, like so:
SELECT date(created_at),
COUNT(COALESCE(id, 0)) AS total_orders,
SUM(COALESCE(total_price, 0)) AS total_price,
SUM(COALESCE(taxes, 0)) AS taxes,
SUM(COALESCE(shipping, 0)) AS shipping,
AVG(COALESCE(total_price, 0)) AS average_order_value,
SUM(COALESCE(total_discount, 0)) AS total_discount,
SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc
However, for dates that do not have any orders, the query returns nothing, and I'd like it to return 0.
I have tried COALESCE, but that doesn't seem to do the trick.
Any suggestions?
This should be substantially faster - and correct:
SELECT *
, total_price - taxes - shipping - total_discount AS net_sales -- ⑤
FROM (
SELECT created_at
, COALESCE(total_orders , 0) AS total_orders
, COALESCE(total_price , 0) AS total_price
, COALESCE(taxes , 0) AS taxes
, COALESCE(shipping , 0) AS shipping
, COALESCE(average_order_value , 0) AS average_order_value
, COALESCE(total_discount , 0) AS total_discount
FROM generate_series(timestamp '2022-07-20' -- ①
, timestamp '2022-07-26'
, interval '1 day') AS g(created_at)
LEFT JOIN ( -- ③
SELECT created_at::date
, count(*) AS total_orders -- ⑥
, sum(total_price) AS total_price
, sum(taxes) AS taxes
, sum(shipping) AS shipping
, avg(total_price) AS average_order_value
, sum(total_discount) AS total_discount
FROM orders
WHERE shop_id = 43
AND active -- simpler
AND created_at >= '2022-07-20'
AND created_at < '2022-07-27' -- ② !
GROUP BY 1
) o USING (created_at) -- ④
) sub
ORDER BY created_at DESC;
db<>fiddle here
I copied, simplified, and extended Xu's fiddle for comparison.
① Why this particular form for generate_series()? See:
Generating time series between two dates in PostgreSQL
② Assuming created_at is of data type timestamp, your original formulation is most probably incorrect. created_at <= '2022-07-26' would include the first instant of '2022-07-26' and exclude the rest. To include all of '2022-07-26', use created_at < '2022-07-27'. See:
How do I write a function in plpgsql that compares a date with a timestamp without time zone?
③ The LEFT JOIN is the core feature of this answer. Generate all days with generate_series(), independently aggregate days from table orders, then LEFT JOIN to retain one row per day like you requested.
④ I made the column name match created_at, so we can conveniently shorten the join syntax with the USING clause.
⑤ Compute net_sales in an outer SELECT after replacing NULL values, so we need COALESCE() only once.
⑥ count(*) is equivalent to COUNT(COALESCE(id, 0)) in any case, but cheaper. See:
Optimizing GROUP BY + COUNT DISTINCT on unnested jsonb column
PostgreSQL: running count of rows for a query 'by minute'
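The core pattern from the notes above (generate every day, aggregate orders separately, LEFT JOIN, then COALESCE) can be sketched in a runnable form. This is only an illustration under assumptions: it uses Python's sqlite3 with an in-memory database, a recursive CTE stands in for generate_series() because SQLite does not ship that function by default, and the three sample orders are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, total_price REAL);
-- Hypothetical sample rows; 2022-07-21 deliberately has no orders.
INSERT INTO orders (created_at, total_price) VALUES
  ('2022-07-20 09:15:00', 10.0),
  ('2022-07-20 18:40:00', 30.0),
  ('2022-07-22 12:00:00',  5.0);
""")

rows = conn.execute("""
WITH RECURSIVE days(day) AS (          -- stand-in for generate_series()
    SELECT '2022-07-20'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2022-07-22'
)
SELECT d.day,
       COALESCE(o.total_orders, 0) AS total_orders,
       COALESCE(o.total_price, 0)  AS total_price
FROM days d
LEFT JOIN (
    SELECT date(created_at) AS day,
           COUNT(*)         AS total_orders,
           SUM(total_price) AS total_price
    FROM orders
    WHERE created_at >= '2022-07-20'
      AND created_at <  '2022-07-23'   -- exclusive upper bound covers all of 07-22
    GROUP BY date(created_at)
) o ON o.day = d.day
ORDER BY d.day
""").fetchall()

print(rows)
```

The day without orders survives the join as a row of zeros, which is what the original COALESCE-only query could not produce.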
Please refer to the script below.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT created_at
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
There is one sample for your reference.
Sample
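I could reproduce the duplicate rows on my side. The root cause is that created_at is a timestamp: compared against the generated dates without a cast, it does not match unless an order was placed exactly at midnight, so days that do have orders also get a zero row.
The script below casts created_at to a date and is correct for your request.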
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT date (created_at)
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
Here is a sample that matches your case. Link
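The difference between the two scripts comes down to what the NOT IN list contains. A small sketch of just that comparison, using Python's sqlite3 as a stand-in for PostgreSQL; the days table and the single order row are hypothetical and exist only to isolate the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
INSERT INTO orders (created_at) VALUES ('2022-07-20 09:15:00');
CREATE TABLE days (day TEXT);           -- hypothetical calendar table
INSERT INTO days VALUES ('2022-07-20'), ('2022-07-21');
""")

# A bare day never equals a timestamp taken at 09:15, so every day looks empty
# and the day that has an order would also receive a zero row.
raw = conn.execute(
    "SELECT day FROM days "
    "WHERE day NOT IN (SELECT created_at FROM orders) ORDER BY day"
).fetchall()

# Truncating the timestamp to a date first makes the comparison meaningful.
truncated = conn.execute(
    "SELECT day FROM days "
    "WHERE day NOT IN (SELECT date(created_at) FROM orders) ORDER BY day"
).fetchall()

print(raw)
print(truncated)
```

Only the second form leaves out 2022-07-20, which is why the corrected script no longer emits a duplicate line for days with orders.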
You can use WITH RECURSIVE to build a table of dates and then select the dates that are not in your table:
WITH RECURSIVE t(d) AS (
(SELECT '2015-01-01'::date)
UNION ALL
(SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);
See this post: https://stackoverflow.com/questions/28583379/find-missing-dates-postgresql#:~:text=You%20can%20use%20WITH%20RECURSIVE,SELECT%20d_date%20FROM%20tbl)%3B

How to avoid duplicate code in an SQL subquery

I'm new to SQL. I need some help improving my query to avoid duplicate code.
SELECT customers.name, orders.price
FROM customers
JOIN orders ON orders.id = customers.order_id
WHERE customers.order_id IN (
SELECT orders.id
FROM orders
WHERE orders.price = (
SELECT orders.price
FROM orders
WHERE orders.order_date BETWEEN
(SELECT MIN(orders.order_date) FROM orders)
AND
(SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year) FROM orders)
ORDER BY orders.price DESC LIMIT 1
)
AND orders.order_date BETWEEN
(SELECT MIN(orders.order_date) FROM orders)
AND
(SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year) FROM orders)
)
I would like to avoid the duplicated code here:
SELECT MIN(orders.order_date) FROM orders
and
SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year) FROM orders
You can use WITH to get the orders of the first 10 years. By definition there are no orders with a date < min(date), so you don't need BETWEEN, just <=.
WITH firstOrders AS (
SELECT *
FROM orders
WHERE order_date <=
(SELECT DATE_ADD(MIN(o.order_date), INTERVAL 10 year)
FROM orders o)
)
SELECT customers.name, orders.price
FROM customers
JOIN firstOrders orders ON orders.id = customers.order_id
AND orders.price = (
select price
from firstOrders
order by price desc
limit 1
)
You want orders from the first ten years where the price equals the maximum price among those orders. So restrict to that period first, then rank by price and grab the rows holding the #1 spot.
with data as (
select *,
date_add(min(order_date) over (), interval 10 year) as max_date
from orders
), ranked as (
select *,
rank() over (order by price desc) as price_rank
from data
where order_date <= max_date
)
select *
from ranked
where price_rank = 1;
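One point worth checking with the ranking approach: a window function is evaluated before the WHERE clause of the query that consumes its result, so the date restriction has to be applied before the ranking for "rank 1" to mean "highest price within the period". A minimal sketch of that ordering, run through Python's sqlite3 as a stand-in for the real database (date(x, '+10 years') replaces DATE_ADD, the CTE names bounded and ranked are made up, and the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, price REAL);
-- Hypothetical sample rows.
INSERT INTO orders VALUES
  (1, '2000-01-10', 50.0),
  (2, '2005-06-01', 80.0),
  (3, '2005-07-01', 80.0),   -- ties share rank 1
  (4, '2015-03-01', 99.0);   -- highest overall, but outside the first ten years
""")

rows = conn.execute("""
WITH bounded AS (
    SELECT *, date(MIN(order_date) OVER (), '+10 years') AS max_date
    FROM orders
), ranked AS (
    -- rank only the rows that survive the date restriction
    SELECT id, price, RANK() OVER (ORDER BY price DESC) AS price_rank
    FROM bounded
    WHERE order_date <= max_date
)
SELECT id, price FROM ranked WHERE price_rank = 1 ORDER BY id
""").fetchall()

print(rows)
```

Order 4 has the highest price overall but is never ranked, so both 80.0 orders come back.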

Add running or cumulative total

I have the query below, which gives me the expected results:
SELECT
total_orders,
quantity,
available_store_credits
FROM
(
SELECT
COUNT(orders.id) as total_orders,
date_trunc('year', confirmed_at) as year,
date_trunc('month', confirmed_at) as month,
SUM( quantity ) as quantity
FROM
orders
INNER JOIN (
SELECT
orders.id,
sum(quantity) as quantity
FROM
orders
INNER JOIN line_items ON line_items.order_id = orders.id
WHERE
orders.deleted_at IS NULL
AND orders.status IN (
'paid', 'packed', 'in_transit', 'delivered'
)
GROUP BY
orders.id
) as order_quantity
ON order_quantity.id = orders.id
GROUP BY month, year) as orders_transactions
FULL OUTER JOIN
(
SELECT
date_trunc('year', created_at) as year,
date_trunc('month', created_at) as month,
SUM( ROUND( ( CASE WHEN amount_in_cents > 0 THEN amount_in_cents end) / 100, 2 )) AS store_credit_given,
SUM( ROUND( amount_in_cents / 100, 2 )) AS available_store_credits
FROM
store_credit_transactions
GROUP BY month, year
) as store_credit_results
ON orders_transactions.month = store_credit_results.month
I want to add one more column next to available_store_credits that holds the running total of available_store_credits.
These are my attempts, but none of them work:
Attempt #1
SELECT
total_orders,
quantity,
available_store_credits,
cum_amt
FROM
(
SELECT
COUNT(orders.id) as total_orders,
date_trunc('year', confirmed_at) as year,
date_trunc('month', confirmed_at) as month,
SUM( quantity ) as quantity,
FROM
orders
INNER JOIN (
SELECT
orders.id,
sum(quantity) as quantity
FROM
orders
INNER JOIN line_items ON line_items.order_id = orders.id
WHERE
orders.deleted_at IS NULL
AND orders.status IN (
'paid', 'packed', 'in_transit', 'delivered'
)
GROUP BY
orders.id
) as order_quantity
ON order_quantity.id = orders.id
GROUP BY month, year) as orders_transactions
FULL OUTER JOIN
(
SELECT
date_trunc('year', created_at) as year,
date_trunc('month', created_at) as month,
SUM( ROUND( ( CASE WHEN amount_in_cents > 0 THEN amount_in_cents end) / 100, 2 )) AS store_credit_given,
SUM( ROUND( amount_in_cents / 100, 2 )) AS available_store_credits
SUM( amount_in_cents ) OVER (ORDER BY date_trunc('month', created_at), date_trunc('year', created_at)) AS cum_amt
FROM
store_credit_transactions
GROUP BY month, year
) as store_credit_results
ON orders_transactions.month = store_credit_results.month
Attempt #2
SELECT
total_orders,
quantity,
available_store_credits,
running_tot
FROM
(
SELECT
COUNT(orders.id) as total_orders,
date_trunc('year', confirmed_at) as year,
date_trunc('month', confirmed_at) as month,
FROM
orders
INNER JOIN (
SELECT
orders.id,
sum(quantity) as quantity
FROM
orders
INNER JOIN line_items ON line_items.order_id = orders.id
WHERE
orders.deleted_at IS NULL
AND orders.status IN (
'paid', 'packed', 'in_transit', 'delivered'
)
GROUP BY
orders.id
) as order_quantity
ON order_quantity.id = orders.id
GROUP BY month, year) as orders_transactions
FULL OUTER JOIN
(
SELECT
date_trunc('year', created_at) as year,
date_trunc('month', created_at) as month,
SUM( ROUND( amount_in_cents / 100, 2 )) AS available_store_credits,
SUM (available_store_creds) as running_tot
FROM
store_credit_transactions
INNER JOIN (
SELECT t0.id,
(
SELECT SUM( ROUND( amount_in_cents / 100, 2 )) as running_total
FROM store_credit_transactions as t1
WHERE date_trunc('month', t1.created_at) <= date_trunc('month', t0.created_at)
) AS available_store_creds
FROM store_credit_transactions AS t0
) as results
ON results.id = store_credit_transactions.id
GROUP BY month, year
) as store_credit_results
ON orders_transactions.month = store_credit_results.month
Making some assumptions about the undisclosed table definition and Postgres version (assuming current Postgres 14), this should do it:
SELECT total_orders, quantity, available_store_credits
, sum(available_store_credits) OVER (ORDER BY month) AS cum_amt -- HERE!!
FROM (
SELECT date_trunc('month', confirmed_at) AS month
, count(*) AS total_orders
, sum(quantity) AS quantity
FROM (
SELECT o.id, o.confirmed_at, sum(quantity) AS quantity
FROM orders o
JOIN line_items l ON l.order_id = o.id
WHERE o.deleted_at IS NULL
AND o.status IN ('paid', 'packed', 'in_transit', 'delivered')
GROUP BY 1
) o
GROUP BY 1
) orders_transactions
FULL JOIN (
SELECT date_trunc('month', created_at) AS month
, round(sum(amount_in_cents) FILTER (WHERE amount_in_cents > 0) / 100, 2) AS store_credit_given
, round(sum(amount_in_cents) / 100, 2) AS available_store_credits
FROM store_credit_transactions
GROUP BY 1
) store_credit_results USING (month)
This assumes you want the running sum to show up in every row, ordered by month.
First, I simplified and removed some cruft:
date_trunc('year', confirmed_at) as year, was 100 % redundant noise in your query. I removed it.
As was another join to orders. Removed that, too. Assuming orders.id is defined as PK, we can further simplify. See:
PostgreSQL - GROUP BY clause
Use the superior aggregate FILTER. See:
Aggregate columns with additional (distinct) filters
Simplified a couple of other minor bits.
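The running total itself boils down to aggregating per month first and then applying sum() OVER (ORDER BY month) on top of the aggregated rows. A minimal sketch of only that step, run through Python's sqlite3 as a stand-in for Postgres; strftime('%Y-%m', ...) replaces date_trunc('month', ...), amounts stay in cents to avoid rounding questions, and the transactions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_credit_transactions (
    id INTEGER PRIMARY KEY, created_at TEXT, amount_in_cents INTEGER);
-- Hypothetical sample rows.
INSERT INTO store_credit_transactions (created_at, amount_in_cents) VALUES
  ('2022-01-05', 1000), ('2022-01-20', -250),
  ('2022-02-11',  500),
  ('2022-03-02', -300), ('2022-03-15', 1200);
""")

rows = conn.execute("""
SELECT month,
       credits_cents,
       -- window function over the already aggregated monthly rows
       SUM(credits_cents) OVER (ORDER BY month) AS running_cents
FROM (
    SELECT strftime('%Y-%m', created_at) AS month,
           SUM(amount_in_cents)          AS credits_cents
    FROM store_credit_transactions
    GROUP BY strftime('%Y-%m', created_at)
) monthly
ORDER BY month
""").fetchall()

print(rows)
```

Keeping the window function in the outer query is the design choice that matters: it then sees one row per month instead of one row per transaction.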

Filtering data set on first where clause then filtering left over data set on 2nd where clause

I have a table with order item and order date.
I want to pull out all records with an order date in 2020. But if an item does not have any order in 2020, I want to pull out all records for that item.
So if an item has an order in 2020, then only its 2020 records;
if it does not have an order in 2020, then all of its other records.
OrderItem Order Date
----------------------
A 4/21/2020
A 7/22/2020
A 5/15/2019
B 2/20/2019
Expected Output
OrderItem Order Date
----------------------
A 4/21/2020
A 7/22/2020
B 2/20/2019
You can use not exists:
select o.*
from orders o
where (o.orderdate >= '2020-01-01' and o.orderdate < '2021-01-01') or
not exists (select 1
from orders o2
where o2.orderitem = o.orderitem and
o2.orderdate >= '2020-01-01' and
o2.orderdate < '2021-01-01'
);
If "orders" is really a complicated query, you might find window functions are a better choice:
select o.*
from (select o.*,
count(case when o.orderdate >= '2020-01-01' and o.orderdate < '2021-01-01' then 1 end) over (partition by orderitem) as cnt_2020
from orders o
) o
where cnt_2020 = 0 or
(o.orderdate >= '2020-01-01' and o.orderdate < '2021-01-01');
This only references the "complex query" once.
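A detail to watch in the windowed variant: count() skips NULL but does count 0, so the CASE expression must not have an ELSE 0 branch if cnt_2020 is ever supposed to be 0. Here is a sketch that runs the query against the sample rows from the question, using Python's sqlite3 as a stand-in for the real database (dates are stored as ISO text, which is an assumption of this sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderitem TEXT, orderdate TEXT);
INSERT INTO orders VALUES
  ('A', '2020-04-21'), ('A', '2020-07-22'), ('A', '2019-05-15'),
  ('B', '2019-02-20');
""")

rows = conn.execute("""
SELECT orderitem, orderdate
FROM (SELECT o.*,
             -- no ELSE branch: rows outside 2020 yield NULL and are not counted
             COUNT(CASE WHEN o.orderdate >= '2020-01-01'
                         AND o.orderdate <  '2021-01-01' THEN 1 END)
                 OVER (PARTITION BY o.orderitem) AS cnt_2020
      FROM orders o) o
WHERE cnt_2020 = 0
   OR (orderdate >= '2020-01-01' AND orderdate < '2021-01-01')
ORDER BY orderitem, orderdate
""").fetchall()

print(rows)
```

Item A keeps only its 2020 rows and item B keeps its 2019 row, matching the expected output in the question.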

How to get the distinct number of customers in prior period?

I have a requirement where I am supposed to roll up customer data over the prior period of 365 days.
Table:
CREATE TABLE orders (
persistent_key_str character varying,
ord_id character varying(50),
ord_submitted_date date,
item_sku_id character varying(50),
item_extended_actual_price_amt numeric(18,2)
);
Sample data:
INSERT INTO orders VALUES
('01120736182','ORD6266073' ,'2010-12-08','100856-01',39.90),
('01120736182','ORD33997609' ,'2011-11-23','100265-01',49.99),
('01120736182','ORD33997609' ,'2011-11-23','200020-01',29.99),
('01120736182','ORD33997609' ,'2011-11-23','100817-01',44.99),
('01120736182','ORD89267964' ,'2012-12-05','200251-01',79.99),
('01120736182','ORD89267964' ,'2012-12-05','200269-01',59.99),
('01011679971','ORD89332495' ,'2012-12-05','200102-01',169.99),
('01120736182','ORD89267964' ,'2012-12-05','100907-01',89.99),
('01120736182','ORD89267964' ,'2012-12-05','200840-01',129.99),
('01120736182','ORD125155068','2013-07-27','201443-01',199.99),
('01120736182','ORD167230815','2014-06-05','200141-01',59.99),
('01011679971','ORD174927624','2014-08-16','201395-01',89.99),
('01000217334','ORD92524479' ,'2012-12-20','200021-01',29.99),
('01000217334','ORD95698491' ,'2013-01-08','200021-01',19.99),
('01000217334','ORD90683621' ,'2012-12-12','200021-01',29.990),
('01000217334','ORD92524479' ,'2012-12-20','200560-01',29.99),
('01000217334','ORD145035525','2013-12-09','200972-01',49.99),
('01000217334','ORD145035525','2013-12-09','100436-01',39.99),
('01000217334','ORD90683374' ,'2012-12-12','200284-01',39.99),
('01000217334','ORD139437285','2013-11-07','201794-01',134.99),
('01000827006','W02238550001','2010-06-11','HL 101077',349.000),
('01000827006','W01738200001','2009-12-10','EL 100310 BLK',119.96),
('01000954259','P00444170001','2009-12-03','PC 100455 BRN',389.99),
('01002319116','W02242430001','2010-06-12','TR 100966',35.99),
('01002319116','W02242430002','2010-06-12','EL 100985',99.99),
('01002319116','P00532470001','2010-05-04','HO 100482',49.99);
Using the query below I am trying to get the number of distinct customers by ord_submitted_date:
select
g.order_date as "Ordered",
count(distinct o.persistent_key_str) as "customers"
from
generate_series(
(select min(ord_submitted_date) from orders),
(select max(ord_submitted_date) from orders),
'1 day'
) g (order_date)
left join
orders o on o.ord_submitted_date between g.order_date - interval '364 days'
and g.order_date
WHERE extract(year from ord_submitted_date) <= 2009
group by 1
order by 1
This is the output I expected.
Ordered Customers
2009-12-03 1
2009-12-10 1
When I execute the query above I get incorrect results.
How can I make this right?
To get your expected output ("the number of distinct customers") - only days with actual orders in 2009:
SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
FROM orders
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
Formulate the WHERE conditions this way to make the query sargable and the input easy to write.
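For a quick check of the half-open date range, the following sketch loads a few of the sample rows from the question into an in-memory SQLite database via Python and runs the same filter. SQLite is only a stand-in for PostgreSQL here, dates are stored as ISO text, and the item columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (persistent_key_str TEXT, ord_id TEXT, ord_submitted_date TEXT);
-- Subset of the sample data from the question.
INSERT INTO orders VALUES
  ('01000827006', 'W01738200001', '2009-12-10'),
  ('01000954259', 'P00444170001', '2009-12-03'),
  ('01000827006', 'W02238550001', '2010-06-11'),
  ('01002319116', 'P00532470001', '2010-05-04');
""")

rows = conn.execute("""
SELECT ord_submitted_date, COUNT(DISTINCT persistent_key_str) AS customers
FROM orders
WHERE ord_submitted_date >= '2009-01-01'   -- inclusive lower bound
  AND ord_submitted_date <  '2010-01-01'   -- exclusive upper bound
GROUP BY ord_submitted_date
ORDER BY ord_submitted_date
""").fetchall()

print(rows)
```

Because the filter compares the bare column against constants, an index on ord_submitted_date can be used, which is not the case when the column is wrapped in extract(year from ...).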
If you want one row per day (from the earliest entry up to the latest in orders) - within 2009:
SELECT ord_submitted_date AS ordered
, count(DISTINCT o.persistent_key_str) AS customers
FROM (SELECT generate_series(min(ord_submitted_date) -- single query ...
, max(ord_submitted_date) -- ... to get min / max
, '1d')::date FROM orders) g (ord_submitted_date)
LEFT join orders o USING (ord_submitted_date)
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
Distinct customers per year
SELECT extract(year from ord_submitted_date) AS year
, count(DISTINCT persistent_key_str) AS customers
FROM orders
GROUP BY 1
ORDER BY 1;