SQL | How to sum over partition group of 3 items? - sql

I'm trying to get each of the top 3 product categories' share of a day's revenue, but I'm struggling with the percentage. I already have the revenue per category ranked 1 to 3, but I can't work out how to calculate the percentage.
Any pointers would be appreciated.
SELECT * FROM (
SELECT date,
category_name,
revenue,
row_number() OVER(PARTITION BY DATE(date) ORDER BY revenue DESC) AS category_rank,
(revenue / (select sum(revenue) from a group by 1)) * 100 percentage AS percentage_of_daily_total -- this is the wrong one
FROM (
SELECT DATE(orders.order_timestamp) AS date,
products.product_cat AS category_name,
ROUND(SUM(payments.payment)) AS revenue
FROM table1.orders orders
JOIN table1.t_payments payments ON orders.order_id = payments.order_id
JOIN table1.t_items items ON orders.order_id = items.order_id
JOIN table1.t_products products ON items.product_id = products.product_id
GROUP BY 1 ,2) a) b
WHERE category_rank <= 3;
Sample data is as follows (date, category_name, revenue, category_rank):
2016-10-05 jeans 20 1
2016-10-05 shirts 15 2
2016-10-05 shoes 10 3
2016-10-06 skirts 50 1
2016-10-06 sports_wear 30 2
2016-10-06 accessories 10 3
Desired outcome (date, category_name, revenue, category_rank, percentage_of_daily_total):
2016-10-05 jeans 30 1 50
2016-10-05 shirts 20 2 33
2016-10-05 shoes 10 3 17
2016-10-06 skirts 20 1 50
2016-10-06 sports_wear 16 2 40
2016-10-06 accessories 4 3 10

Use CTEs
WITH a AS (
SELECT DATE(orders.order_timestamp) AS date,
products.product_cat AS category_name,
ROUND(SUM(payments.payment)) AS revenue
FROM table1.orders orders
JOIN table1.t_payments payments ON orders.order_id = payments.order_id
JOIN table1.t_items items ON orders.order_id = items.order_id
JOIN table1.t_products products ON items.product_id = products.product_id
GROUP BY 1 ,2
)
SELECT * FROM (
SELECT a.date,
a.category_name,
a.revenue,
row_number() OVER(PARTITION BY DATE(a.date) ORDER BY a.revenue DESC) AS category_rank,
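100.0 * a.revenue / b.revenue_sum AS percentage_of_daily_total -- share of the full day's revenue; 100.0 avoids integer division
FROM a
JOIN (SELECT date, SUM(revenue) AS revenue_sum FROM a GROUP BY 1) AS b
ON a.date = b.date) AS ranked -- most engines require an alias on a derived table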
WHERE category_rank <= 3;

Your original query is very close. Calculate the percentage in the outermost SELECT: drop the percentage calculation from the inner query and add this to the outer SELECT instead:
SELECT *, 100.0 * revenue / SUM(revenue) OVER (PARTITION BY date) AS percentage_of_daily_total
Remember that window functions are evaluated after the WHERE clause has been applied. By that point only 3 rows per day remain, so the total is based on the top 3 categories only, not on the whole day's revenue.
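To make the difference between the two totals concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is assumed, for window function support). The table name daily_revenue stands in for the question's inner aggregate, and the "socks" row is a hypothetical fourth category added only so that the two percentages differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (date TEXT, category_name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?, ?)",
    [
        ("2016-10-05", "jeans", 30),
        ("2016-10-05", "shirts", 20),
        ("2016-10-05", "shoes", 10),
        ("2016-10-05", "socks", 5),  # hypothetical 4th category, not in the question
    ],
)

query = """
SELECT category_name,
       category_rank,
       pct_of_day,
       -- evaluated after the WHERE below, so this window only sees the top 3 rows
       100.0 * revenue / SUM(revenue) OVER (PARTITION BY date) AS pct_of_top3
FROM (
    SELECT date, category_name, revenue,
           ROW_NUMBER() OVER (PARTITION BY date ORDER BY revenue DESC) AS category_rank,
           -- evaluated before any rank filter, so this window sees every category
           100.0 * revenue / SUM(revenue) OVER (PARTITION BY date) AS pct_of_day
    FROM daily_revenue
) ranked
WHERE category_rank <= 3
ORDER BY category_rank
"""

top3_rows = conn.execute(query).fetchall()
for name, rank, pct_of_day, pct_of_top3 in top3_rows:
    print(name, rank, round(pct_of_day, 1), round(pct_of_top3, 1))
```

With this data pct_of_top3 adds up to 100 across the three rows, which is the shape of the desired outcome in the question, while pct_of_day is smaller because the fourth category still counts toward the day's total.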

Related

Find the second top selling product in terms of sales and quantity

There are two tables, orders and item_line.
orders
order_id created_at total_amount
123 2022-11-11 13:40:50 450.00
124 2022-10-30 00:40:50 1500.00
item_line
order_id product_id product_name quantity unit_price
123 a1b milo 4 100.00
123 c2d coke 5 10.00
124 c2d coke 150 10.00
The question is:
Find the second top-selling product in terms of sales and quantity, among products sold in the current year between 6 PM and 9 PM.
My attempt:
SELECT * FROM (
SELECT i.product_name,
SUM(o.total_amount)sales,
SUM(i.quantity)total_qty,
ROW_NUMBER() OVER (ORDER BY SUM(o.total_amount) DESC,SUM(i.quantity)total_qty DESC) AS rn
FROM item_line i
WHERE o.created_at BETWEEN 18:00:00 AND 21:00:00
JOIN orders o on o.order_id = i.order_id
GROUP BY i.product_name ) temp
WHERE rn = 2;
But it's not correct. What am I doing wrong?
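-- In terms of sales
SELECT * FROM (
SELECT i.product_name, SUM(i.quantity * i.unit_price) AS net_sales, -- per-product sales; o.total_amount is the total of the whole order
ROW_NUMBER() OVER(ORDER BY SUM(i.quantity * i.unit_price) DESC) AS rn
FROM item_line i
JOIN orders o ON o.order_id = i.order_id
WHERE DATEPART(HOUR, o.created_at) BETWEEN 18 AND 20 -- 18:00:00 through 20:59:59
AND YEAR(o.created_at) = YEAR(GETDATE()) -- current year only
GROUP BY i.product_name) temp
WHERE rn = 2;
-- In terms of total quantity
SELECT * FROM (
SELECT i.product_name, SUM(i.quantity) AS total_quantity,
ROW_NUMBER() OVER(ORDER BY SUM(i.quantity) DESC) AS rn
FROM item_line i
JOIN orders o ON o.order_id = i.order_id
WHERE DATEPART(HOUR, o.created_at) BETWEEN 18 AND 20 -- 18:00:00 through 20:59:59
AND YEAR(o.created_at) = YEAR(GETDATE()) -- current year only
GROUP BY i.product_name) temp
WHERE rn = 2;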
select o.order_id, sum(quantity), total_amount from orders [o]
inner join item_line[i] on o.order_id = i.order_id
group by o.order_id, total_amount order by total_amount desc, sum(quantity) desc
OFFSET 1 ROWS
FETCH NEXT 1 ROWS ONLY;
You can add the target date and time range as a filter in a WHERE clause.

Calculating average time between customer orders and average order value in Postgres

In PostgreSQL I have an orders table that represents orders made by customers of a store:
SELECT * FROM orders
order_id customer_id value created_at
1 1 188.01 2020-11-24
2 2 25.74 2022-10-13
3 1 159.64 2022-09-23
4 1 201.41 2022-04-01
5 3 357.80 2022-09-05
6 2 386.72 2022-02-16
7 1 200.00 2022-01-16
8 1 19.99 2020-02-20
For a specified time range (e.g. 2022-01-01 to 2022-12-31), I need to find the following:
Average 1st order value
Average 2nd order value
Average 3rd order value
Average 4th order value
E.g. the 1st purchases for each customer are:
for customer_id 1, order_id 8 is their first purchase
customer 2, order 6
customer 3, order 5
So, the 1st-purchase average order value is (19.99 + 386.72 + 357.80) / 3 = $254.84
This needs to be found for the 2nd, 3rd and 4th purchases also.
I also need to find the average time between purchases:
order 1 to order 2
order 2 to order 3
order 3 to order 4
The final result would ideally look something like this:
order_number AOV av_days_since_last_order
1 254.84 0
2 300.00 28
3 322.22 21
4 350.00 20
Note that average days since last order for order 1 would always be 0 as it's the 1st purchase.
Thanks.
select order_number
,round(avg(value),2) as AOV
,coalesce(round(avg(days_between_orders),0),0) as av_days_since_last_order
from
(
select *
,row_number() over(partition by customer_id order by created_at) as order_number
,created_at - lag(created_at) over(partition by customer_id order by created_at) as days_between_orders
from t
) t
where created_at between '2022-01-01' and '2022-12-31'
group by order_number
order by order_number
order_number aov av_days_since_last_order
1 372.26 0
2 25.74 239
3 200.00 418
4 201.41 75
5 159.64 175
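As a rough check of the ROW_NUMBER/LAG query above, here is a minimal sketch that runs the same logic on the question's eight sample rows with Python's built-in sqlite3 module (SQLite 3.25 or later assumed). SQLite is used only for illustration: it has no date subtraction operator, so julianday() stands in for the Postgres expression created_at - lag(created_at), and the table is named orders as in the question rather than t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 188.01, "2020-11-24"),
        (2, 2, 25.74, "2022-10-13"),
        (3, 1, 159.64, "2022-09-23"),
        (4, 1, 201.41, "2022-04-01"),
        (5, 3, 357.80, "2022-09-05"),
        (6, 2, 386.72, "2022-02-16"),
        (7, 1, 200.00, "2022-01-16"),
        (8, 1, 19.99, "2020-02-20"),
    ],
)

query = """
SELECT order_number,
       ROUND(AVG(value), 2) AS aov,
       COALESCE(ROUND(AVG(days_between_orders)), 0) AS av_days_since_last_order
FROM (
    SELECT value, created_at,
           -- nth order of each customer, counted over the customer's whole history
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS order_number,
           -- days since the same customer's previous order (NULL for the first one)
           julianday(created_at)
             - julianday(LAG(created_at) OVER (PARTITION BY customer_id ORDER BY created_at))
             AS days_between_orders
    FROM orders
) numbered
WHERE created_at BETWEEN '2022-01-01' AND '2022-12-31'  -- applied after numbering
GROUP BY order_number
ORDER BY order_number
"""

order_stats = conn.execute(query).fetchall()
for row in order_stats:
    print(row)
```

Because the date filter is applied after the numbering, order_number counts orders placed before 2022 as well. That is why customer 1 contributes to order numbers 3, 4 and 5 here. Moving the filter inside the subquery would instead number only the orders within the chosen range.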
I suppose it should be something like this:
WITH prep_data AS (
SELECT order_id,
customer_id,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY created_at) AS purchase_num, -- nth order of each customer
created_at,
value
FROM orders
WHERE created_at BETWEEN :date_from AND :date_to
), prep_data2 AS (
SELECT pd1.customer_id,
pd1.purchase_num,
pd1.created_at - pd2.created_at AS date_diff, -- days since the customer's previous order
pd1.value
FROM prep_data pd1
LEFT JOIN prep_data pd2 ON (pd1.customer_id = pd2.customer_id AND pd1.purchase_num = pd2.purchase_num + 1) -- pd2 is the previous order
)
SELECT purchase_num,
avg(value) AS avg_val,
avg(date_diff) AS avg_date_diff
FROM prep_data2
GROUP BY purchase_num
ORDER BY purchase_num

BigQuery missing rows with SUM OVER PARTITION BY

TL;DR:
Given this table:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
How do I get a table where the missing date/product combination (2020-11-02, premium) is included with a fallback diff value of 0?
Ideally this should work for multiple products. A list of all products can be obtained like this:
SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions
I want to be able to get the subscription count per day, either for all products or just for some products.
I think the easiest way to achieve this is to prepare a table that looks like this:
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 50 |
|---------------------|------------------|------------------|
With this table, I can easily group by date and product or just by date and sum the total.
Before I get to the result table I have generated a table where for each day and product I calculate the difference in subscriptions. How many new subscribers for each product are there and how many are no longer subscribed.
This table looks like this:
|---------------------|------------------|------------------|
| date | product | diff |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | -20 |
|---------------------|------------------|------------------|
This means that on November 1st the total count of premium subscribers increased by 50 and the total count of basic subscribers decreased by 20.
The problem is that this temporary table is missing data points whenever a product had no changes on a given day; see the example below.
When I started there was no product column, only the date and diff columns.
To get from the second table to the first, I used this query, which worked perfectly:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT
*,
SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
But when I add the product column and try to calculate the sum per day and product there are some data points missing.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT
*,
SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
--
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-02 | basic | 90 |
|---------------------|------------------|------------------|
| 2020-11-03 | basic | 130 |
|---------------------|------------------|------------------|
| 2020-11-03 | premium | 70 |
|---------------------|------------------|------------------|
If I now show the total number of subscriptions per day, I would get:
150 -> 90 -> 200
But I would expect:
150 -> 140 -> 200
Same goes for the total number of premium subscriptions per day:
50 -> 0 -> 70
But I would expect:
50 -> 50 -> 70
I believe the best option to fix this would be to add the missing date/product combinations.
How would I do this?
-- Try this. I create a list of products, add a "Total" entry to that list, and join it with your table to get the data in the shape you need.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
product_name as (
Select product from subscriptions group by 1
union all
Select "Total" as product
)
Select date
,product
,total_subscriptions
from (
Select a.date
,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY a.date) as total_subscriptions
from
(
Select date,a.product
from product_name A
join subscriptions B
on 1=1
where a.product !='Total'
group by 1,2
) A
left join subscriptions B
on A.product = B.product
and A.date = B.date
group by 1,2,3
) group by 1,2,3
union all
Select date
,product
,total_subscriptions
from
(
Select date,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY date) as total_subscriptions
from product_name A
join subscriptions B
on 1=1
where a.product ='Total'
group by 1,2,3
) group by 1,2,3
order by 1,2
If I follow you correctly, one approach is to generate a fixed list of dates for the period you want and cross join it with the list of products. This gives you all possible combinations. Then you can bring in the subscriptions table with a left join, and finally perform the window sum:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from unnest(generate_timestamp_array(
timestamp('2020-11-01'),
timestamp('2020-11-03'),
interval 1 day)
) dt
cross join (
select 'basic' product
union all select 'premium'
) p
left join subscriptions s on s.product = p.product and s.date = dt
We can make the query more generic by dynamically generating the date range and the list of products:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from (select min(date) min_dt, max(date) max_dt from subscriptions) d0
cross join unnest(generate_timestamp_array(d0.min_dt, d0.max_dt, interval 1 day)) dt
cross join (select distinct product from subscriptions) p
left join subscriptions s on s.product = p.product and s.date = dt
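The cross join plus left join idea can be tried outside BigQuery as well. Below is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later assumed); it is not BigQuery syntax. A recursive CTE stands in for GENERATE_TIMESTAMP_ARRAY, dates are stored as plain 'YYYY-MM-DD' text, and COALESCE supplies the fallback diff of 0 for the missing combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (date TEXT, product TEXT, diff INTEGER)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [
        ("2020-11-01", "premium", 50),
        ("2020-11-01", "basic", 100),
        ("2020-11-02", "basic", -10),
        ("2020-11-03", "premium", 20),
        ("2020-11-03", "basic", 40),
    ],
)

query = """
WITH RECURSIVE
bounds AS (
    SELECT MIN(date) AS min_dt, MAX(date) AS max_dt FROM subscriptions
),
dates(dt) AS (
    -- one row per calendar day between the first and last date in the data
    SELECT min_dt FROM bounds
    UNION ALL
    SELECT date(dt, '+1 day') FROM dates, bounds WHERE dt < max_dt
)
SELECT d.dt,
       p.product,
       -- missing date/product pairs contribute 0 to the running total
       SUM(COALESCE(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY d.dt) AS total
FROM dates d
CROSS JOIN (SELECT DISTINCT product FROM subscriptions) p
LEFT JOIN subscriptions s ON s.product = p.product AND s.date = d.dt
ORDER BY d.dt, p.product
"""

running_totals = conn.execute(query).fetchall()

# Sum the per-product running totals to get the overall count per day.
daily_totals = {}
for dt, product, total in running_totals:
    daily_totals[dt] = daily_totals.get(dt, 0) + total

for row in running_totals:
    print(row)
print(daily_totals)
```

On the sample data the premium series comes out as 50, 50, 70 and the daily totals as 150, 140, 200, which are the values the question expects.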
Use GENERATE_TIMESTAMP_ARRAY:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
SELECT *
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2020-11-01 00:00:00', '2020-11-03 00:00:00', INTERVAL 1 DAY)) as date
),
products AS (
SELECT DISTINCT product FROM subscriptions
)
SELECT dates.date, products.product, subscriptions.diff
FROM dates
CROSS JOIN products
LEFT JOIN subscriptions
ON subscriptions.date = dates.date AND subscriptions.product = products.product

calculating percentage of sales profit in SQL

Product Group Product ID Sales Profit
A 6797 1,000 200
A 6745 500 90
B 1278 200 60
B 1245 1,500 350
C 7890 650 80
D 4587 350 50
Q1). Filter out product IDs that contribute to top 80% of the total profit of their respective group.
I'm not sure which RDBMS you are using, but in SQL Server you can get the output this way: compute the total profit per group, then compare each product's profit against it to filter the rows.
select 'A' as Product_group, 6797 as ProductID, 1000 as Sales , 200 as Profit into #temp1 union all
select 'A' as Product_group, 6745 as ProductID, 500 as Sales , 90 as Profit union all
select 'B' as Product_group, 1278 as ProductID, 200 as Sales , 60 as Profit union all
select 'B' as Product_group, 1245 as ProductID, 1500 as Sales , 350 as Profit union all
select 'C' as Product_group, 7890 as ProductID, 650 as Sales , 80 as Profit union all
select 'D' as Product_group, 4587 as ProductID, 350 as Sales , 50 as Profit
select t.Product_group, t.ProductID, sum(t.sales) totalsles, sum(t.profit) totalProfit, sum(Profit_grp) Groupprofit from #temp1 t
join (select Product_group, sum(sales) totalsles_group, sum(profit) Profit_grp from #temp1 t1 group by Product_group) t1 on t1.Product_group = t.Product_group
group by t.Product_group, t.ProductID
having sum(t.profit) *1.0/ sum(t1.Profit_grp) *1.0 >= 0.8
Output (I added the group profit just for comparison; you can remove the aggregate and add the column to the GROUP BY instead if you like):
Product_group ProductID totalsles totalProfit Groupprofit
B 1245 1500 350 410
C 7890 650 80 80
D 4587 350 50 50
I think this may work:
with CTE as (
select [Product Group], sum([Profit]) as Tolsum from [Table] -- total profit per group; [Table] is a placeholder name
group by [Product Group]
)
select * from (
select prod.*,
sum(prod.[Profit] * 1.0 / cte.[Tolsum]) over (Partition by prod.[Product Group] Order by prod.[Profit] desc) as contribution -- running share of group profit, largest first
from CTE cte
inner join
[Table] prod
on
cte.[Product Group] = prod.[Product Group]
) ranked
where contribution < 0.8 -- a window function cannot appear in HAVING, so filter in an outer query
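The question can be read in more than one way, so the sketch below only computes the two quantities the answers rely on and leaves the cutoff to the reader: each product's own share of its group's profit, and the running share when products are ordered by profit within the group. It uses Python's built-in sqlite3 module (SQLite 3.25 or later assumed); the table name products and the snake_case column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_group TEXT, product_id INTEGER, sales INTEGER, profit INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("A", 6797, 1000, 200),
        ("A", 6745, 500, 90),
        ("B", 1278, 200, 60),
        ("B", 1245, 1500, 350),
        ("C", 7890, 650, 80),
        ("D", 4587, 350, 50),
    ],
)

query = """
SELECT product_group,
       product_id,
       -- this product's own share of its group's profit
       1.0 * profit / SUM(profit) OVER (PARTITION BY product_group) AS share,
       -- cumulative share, adding products from most to least profitable
       1.0 * SUM(profit) OVER (PARTITION BY product_group ORDER BY profit DESC)
           / SUM(profit) OVER (PARTITION BY product_group) AS running_share
FROM products
ORDER BY product_group, profit DESC
"""

profit_shares = conn.execute(query).fetchall()
for group, product_id, share, running_share in profit_shares:
    print(group, product_id, round(share, 2), round(running_share, 2))
```

Filtering on share >= 0.8 keeps the same products as the first answer's HAVING clause, while filtering on running_share implements a cumulative cutoff; which of the two the exercise intends is not stated in the question.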

How can I sum cost items, grouped by invoice?

I have two SQL Server tables below:
Invoice
InvoiceId Amount [Date]
1 10 2015-05-28 21:47:50.000
2 20 2015-05-28 21:47:50.000
3 25 2015-05-28 23:25:50.000
InvoiceItem
Id InvoiceId Cost
1 1 8
2 1 3
3 1 7
4 2 15
5 2 17
6 3 20
7 3 22
Now I want to JOIN these two tables ON InvoiceId and retrieve the following:
COUNT of DISTINCT InvoiceId from Invoice table AS [Count]
SUM of Amount from Invoice table AS Amount
SUM of Cost from InvoiceItem table AS Cost
HOUR part of [Date]
and GROUP them BY HOUR part of [Date].
The desired output would be:
[Count] Amount Cost HourOfDay
2 30 50 22
1 25 42 23
How can I do this?
One approach is to use a derived table that sums the cost per invoice before joining:
SELECT CAST([Date] AS DATE) AS [Date],
DATEPART(HOUR,i.[Date]) AS HourOfDay,
COUNT(i.InvoiceId) AS NumberOfInvoices,
SUM(i.Amount) AS Amount,
SUM(it.Cost) AS Cost
FROM invoice i
INNER JOIN
(SELECT InvoiceId, SUM(Cost) AS Cost
FROM invoiceitem
GROUP BY InvoiceId) it ON i.InvoiceId = it.InvoiceId
GROUP BY CAST([Date] AS DATE), DATEPART(HOUR,i.[Date]) -- group by day and hour, not by the full timestamp
Or use a CTE (common table expression):
WITH InvoiceCosts (InvoiceId, Cost)
AS
(
SELECT InvoiceId, SUM(Cost) AS Cost
FROM invoiceitem
GROUP BY InvoiceId
)
SELECT CAST([Date] AS DATE) AS [Date],
DATEPART(HOUR,i.[Date]) AS HourOfDay,
COUNT(i.InvoiceId) AS NumberOfInvoices,
SUM(i.Amount) AS Amount,
SUM(ic.Cost) AS Cost
FROM invoice i
INNER JOIN
InvoiceCosts ic ON i.InvoiceId = ic.InvoiceId
GROUP BY CAST([Date] AS DATE), DATEPART(HOUR,i.[Date]) -- group by day and hour, not by the full timestamp
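Another option is to join the two tables directly. Note that this join repeats each invoice row once per item, so SUM(Amount) is inflated for any invoice with more than one item:
SELECT COUNT (DISTINCT inv.InvoiceId) [Count],
SUM (Amount) Amount,
SUM (Cost) Cost,
datepart(HOUR, inv.[Date]) HourOfDay
FROM Invoice inv
INNER JOIN InvoiceItem itm
ON inv.InvoiceId = itm.InvoiceId
GROUP BY datepart(HOUR, inv.[Date]);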
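To see why the answers that pre-aggregate InvoiceItem behave differently from a direct join, here is a minimal sketch on the question's sample rows using Python's built-in sqlite3 module. SQLite is an assumption made for the illustration, so strftime('%H', ...) stands in for SQL Server's DATEPART(HOUR, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER, Amount INTEGER, [Date] TEXT)")
conn.execute("CREATE TABLE InvoiceItem (Id INTEGER, InvoiceId INTEGER, Cost INTEGER)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?)",
    [
        (1, 10, "2015-05-28 21:47:50.000"),
        (2, 20, "2015-05-28 21:47:50.000"),
        (3, 25, "2015-05-28 23:25:50.000"),
    ],
)
conn.executemany(
    "INSERT INTO InvoiceItem VALUES (?, ?, ?)",
    [(1, 1, 8), (2, 1, 3), (3, 1, 7), (4, 2, 15), (5, 2, 17), (6, 3, 20), (7, 3, 22)],
)

# Direct join: every invoice row appears once per item, so Amount is counted repeatedly.
direct_join = conn.execute("""
    SELECT CAST(strftime('%H', inv.[Date]) AS INTEGER) AS HourOfDay,
           COUNT(DISTINCT inv.InvoiceId), SUM(inv.Amount), SUM(itm.Cost)
    FROM Invoice inv
    JOIN InvoiceItem itm ON inv.InvoiceId = itm.InvoiceId
    GROUP BY HourOfDay
    ORDER BY HourOfDay
""").fetchall()

# Pre-aggregated join: one cost row per invoice, so each Amount is counted once.
pre_aggregated = conn.execute("""
    SELECT CAST(strftime('%H', inv.[Date]) AS INTEGER) AS HourOfDay,
           COUNT(*), SUM(inv.Amount), SUM(ic.Cost)
    FROM Invoice inv
    JOIN (SELECT InvoiceId, SUM(Cost) AS Cost
          FROM InvoiceItem
          GROUP BY InvoiceId) ic ON inv.InvoiceId = ic.InvoiceId
    GROUP BY HourOfDay
    ORDER BY HourOfDay
""").fetchall()

print("direct join:   ", direct_join)
print("pre-aggregated:", pre_aggregated)
```

On this data the pre-aggregated query returns amounts of 30 and 25 with costs of 50 and 42, matching the figures in the desired output, while the direct join reports the same costs but amounts of 70 and 50. The hour computed from the 21:47:50 timestamps is 21.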