Calculate the number of products responsible for 50% of my sales - sql

I have a shop that sells products in different countries.
I end up with a sales table like this (with many more months):
Month    Country  Product  Sales
01-2022  UK       Tomato   10
01-2022  UK       Banana   4
01-2022  UK       Garlic   1
01-2022  FR       Tomato   1
01-2022  FR       Banana   2
01-2022  FR       Garlic   1
I would like to know the number of products responsible for 50% of the sales per month and country. Something like this.
Month    Country  Nb products accountable for 50% sales
01-2022  UK       1
02-2022  UK       3
03-2022  UK       2
01-2022  FR       1
02-2022  FR       4
03-2022  FR       3
The objective is to get the percentage of my catalogue that is responsible for the majority of sales. Example: 10% of my catalogue represents 50% of sales.
I have tried to solve the problem with multiple window functions and have already searched the existing topics, without success.

I finally found a solution by tweaking window functions.
-- t0 is defined in an earlier CTE that is not shown here
,t1 AS (
SELECT
*
,SUM(sales) OVER (PARTITION BY country_group, order_date ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
,0.5*SUM(sales) OVER (PARTITION BY country_group, order_date) AS total_sales_x_50perc
FROM t0
)
SELECT
order_date
,country_group
-- a product counts while the running total before it is still below 50% of sales
,COUNT(DISTINCT CASE WHEN running_total - sales < total_sales_x_50perc THEN product ELSE NULL END) AS nb_products
,COUNT(DISTINCT product) AS total_nb_products
-- multiply by 1.0 to avoid integer division
,1.0*COUNT(DISTINCT CASE WHEN running_total - sales < total_sales_x_50perc THEN product ELSE NULL END)/COUNT(DISTINCT product) AS perc
FROM t1
GROUP BY 1,2
ORDER BY 1
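For anyone who wants to check the idea on the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, the column names and the helper `products_for_half_of_sales` are my own assumptions, not the asker's schema, and the tie-break on `product` is an extra choice to make the ordering deterministic.

```python
import sqlite3

# Sample rows from the question: (month, country, product, sales).
rows = [
    ("01-2022", "UK", "Tomato", 10),
    ("01-2022", "UK", "Banana", 4),
    ("01-2022", "UK", "Garlic", 1),
    ("01-2022", "FR", "Tomato", 1),
    ("01-2022", "FR", "Banana", 2),
    ("01-2022", "FR", "Garlic", 1),
]

QUERY = """
WITH ranked AS (
    SELECT
        month, country, product, sales,
        -- cumulative sales, best sellers first; product breaks ties
        SUM(sales) OVER (
            PARTITION BY month, country
            ORDER BY sales DESC, product
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total,
        0.5 * SUM(sales) OVER (PARTITION BY month, country) AS half_total
    FROM sales
)
SELECT
    month,
    country,
    -- a product is needed while the total before it is still below 50%
    SUM(CASE WHEN running_total - sales < half_total THEN 1 ELSE 0 END) AS nb_products,
    COUNT(*) AS total_nb_products
FROM ranked
GROUP BY month, country
ORDER BY month, country
"""


def products_for_half_of_sales(data):
    """Load the rows into an in-memory table (hypothetical schema) and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sales (month TEXT, country TEXT, product TEXT, sales REAL)"
    )
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", data)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


result = products_for_half_of_sales(rows)
for row in result:
    print(row)
```

On these rows both FR and UK need a single product to reach half of their January sales, which matches the expected output in the question. Dividing `nb_products` by `total_nb_products` then gives the share of the catalogue.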

Related

How to find the first product purchased by clients who bought specific products?

I want to write a query that finds the clients who purchased two specific product categories and, at the same time, returns the date of their first transaction and the first item they purchased. Since I used GROUP BY, I can only get the customer id and not the first item purchased, because of how GROUP BY works. Any thoughts on how to solve this problem?
I have a transaction table (t), a customer_id table (c) and a product table (p). I am using SQL Server 2008.
Update
SELECT t.customer_id
,t.product_category
,MIN(t.transaction_date) AS FIRST_TRANSACTION_DATE
,SUM(t.quantity) AS TOTAL_QTY
,SUM(t.sales) AS TOTAL_SALES
FROM transaction t
WHERE t.product_category IN ('VEGETABLES', 'FRUITS')
AND t.transaction_date BETWEEN '2020/01/01' AND '2022/09/30'
GROUP BY t.customer_id
HAVING COUNT(DISTINCT t.product_category) = 2
Customer_id  transaction_date  product_category  quantity  sales
1            2022-05-30        VEGETABLES        1         100
1            2022-08-30        VEGETABLES        1         100
2            2022-07-30        VEGETABLES        1         100
2            2022-07-30        FRUITS            1         50
2            2022-07-30        VEGETABLES        2         200
3            2022-07-30        VEGETABLES        3         300
3            2022-08-01        FRUITS            1         50
3            2022-08-05        FRUITS            1         50
4            2022-08-07        FRUITS            1         50
4            2022-09-05        FRUITS            2         100
Given the data above, this is what I want the SQL query to return:
Customer_id  FIRST_TRANSACTION_DATE  first_product_category  TOTAL_QUANTITY  TOTAL_SALES
2            2022-07-30              VEGETABLES, FRUITS      4               350
3            2022-07-30              VEGETABLES              5               400
Customer_id 1 and 4 will not be shown, as they purchased only vegetables or only fruits, but not both.
Check the query below. By the way, the logic for product_category still needs to be worked out.
select CustomerId, transaction_date, product_category, quantity, sales
from (
    select CustomerId, transaction_date, product_category,
           sum(quantity) over (partition by CustomerId) as quantity,
           sum(sales) over (partition by CustomerId) as sales,
           row_number() over (partition by CustomerId order by transaction_date asc) as rn
    from (
        select CustomerId, transaction_date, product_category, quantity, sales
        from tablee t
        where (product_category = 'FRUITS'
               and exists (select CustomerId
                           from tablee tt
                           where product_category = 'VEGETABLES'
                             and t.CustomerId = tt.CustomerId))
           or (product_category = 'VEGETABLES'
               and exists (select CustomerId
                           from tablee tt
                           where product_category = 'FRUITS'
                             and t.CustomerId = tt.CustomerId))
    ) x
) over_all
where rn = 1;
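As a rough sketch of one way to also get the first category, the totals can be computed per customer first and the categories bought on the first date looked up afterwards. This is my own variant, run through Python's built-in sqlite3 on the sample rows from the question. The table name `transactions` is an assumption, and `GROUP_CONCAT` is SQLite-specific (SQL Server 2008 has no direct equivalent, so a `FOR XML PATH` workaround would be needed there).

```python
import sqlite3

# Sample rows from the question:
# (customer_id, transaction_date, product_category, quantity, sales)
rows = [
    (1, "2022-05-30", "VEGETABLES", 1, 100),
    (1, "2022-08-30", "VEGETABLES", 1, 100),
    (2, "2022-07-30", "VEGETABLES", 1, 100),
    (2, "2022-07-30", "FRUITS", 1, 50),
    (2, "2022-07-30", "VEGETABLES", 2, 200),
    (3, "2022-07-30", "VEGETABLES", 3, 300),
    (3, "2022-08-01", "FRUITS", 1, 50),
    (3, "2022-08-05", "FRUITS", 1, 50),
    (4, "2022-08-07", "FRUITS", 1, 50),
    (4, "2022-09-05", "FRUITS", 2, 100),
]

QUERY = """
WITH buyers AS (
    -- customers who bought both categories, with their totals
    SELECT customer_id,
           MIN(transaction_date) AS first_transaction_date,
           SUM(quantity) AS total_qty,
           SUM(sales) AS total_sales
    FROM transactions
    WHERE product_category IN ('VEGETABLES', 'FRUITS')
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product_category) = 2
)
SELECT b.customer_id,
       b.first_transaction_date,
       -- every category bought on the first date (ties are all listed)
       (SELECT GROUP_CONCAT(DISTINCT t.product_category)
        FROM transactions t
        WHERE t.customer_id = b.customer_id
          AND t.transaction_date = b.first_transaction_date
          AND t.product_category IN ('VEGETABLES', 'FRUITS')) AS first_product_category,
       b.total_qty,
       b.total_sales
FROM buyers b
ORDER BY b.customer_id
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, transaction_date TEXT, "
    "product_category TEXT, quantity INTEGER, sales INTEGER)"
)
con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```

Only customers 2 and 3 come back. Customer 2 bought both categories on the first date, so both are listed (the order inside the concatenated string is not guaranteed).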

Calculating multiple averages across different parts of the table?

I have the following transactions table:
customer_id  purchase_date  product   category     department   quantity  store_id
1            2020-10-01     Kit Kat   Candy        Food         2         store_A
1            2020-10-01     Snickers  Candy        Food         1         store_A
1            2020-10-01     Snickers  Candy        Food         1         store_A
2            2020-10-01     Snickers  Candy        Food         2         store_A
2            2020-10-01     Baguette  Bread        Food         5         store_A
2            2020-10-01     iPhone    Cell phones  Electronics  2         store_A
3            2020-10-01     Sony PS5  Games        Electronics  1         store_A
I would like to calculate the average number of products purchased (for each product in the table). I'm also looking to calculate averages across each category and each department by accounting for all products within the same category or department respectively. Care should be taken to divide by the number of unique customers and to include only rows whose quantity is greater than 0 (a 0 quantity indicates a refund, and should not be accounted for).
So basically, the output table would look like the one below:
...where store_id and average_level_type are partition columns.
Is there a way to achieve this in a single pass over the transactions table? or do I need to break down my approach into multiple steps?
Thanks!
How about using UNION ALL, as below:
Select store_id, 'product' as average_level_type,product as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,product
Union all
Select store_id, 'category' as average_level_type, category as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,category
Union all
Select store_id, 'department' as average_level_type,department as id, sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
from transactions
where quantity > 0
group by store_id,department;
If you want to avoid UNION ALL, you can use rollup() or group by grouping sets() to achieve the same result, but the query becomes a little more complicated if you need the output in the exact format shown in the question.
EDIT : Below is how you can use grouping sets to get the same output -
Select store_id,
case when G_ID = 3 then 'product'
when G_ID = 5 then 'category'
when G_ID = 6 then 'department' end As average_level_type,
case when G_ID = 3 then product
when G_ID = 5 then category
when G_ID = 6 then department end As id,
total_quantity,
unique_customer_count,
average
from
(select store_id, product, category, department,
sum(quantity) as total_quantity,
Count(distinct customer_id) as unique_customer_count,
sum(quantity)/count(distinct customer_id) as average,
GROUPING__ID As G_ID
from transactions
where quantity > 0
group by store_id,product,category,department
grouping sets((store_id,product),(store_id,category),(store_id,department))
) Tab
order by 2
;
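If your engine has no grouping sets, a third option is to duplicate each row once per level with a cross join and then group once. Below is a small sketch of that idea, run with Python's built-in sqlite3 on the sample rows from the question. This is a different technique from the two queries above, and the `levels` and `unpivoted` names are my own.

```python
import sqlite3

# Sample rows from the question (purchase_date omitted, it is not used here):
# (customer_id, product, category, department, quantity, store_id)
rows = [
    (1, "Kit Kat", "Candy", "Food", 2, "store_A"),
    (1, "Snickers", "Candy", "Food", 1, "store_A"),
    (1, "Snickers", "Candy", "Food", 1, "store_A"),
    (2, "Snickers", "Candy", "Food", 2, "store_A"),
    (2, "Baguette", "Bread", "Food", 5, "store_A"),
    (2, "iPhone", "Cell phones", "Electronics", 2, "store_A"),
    (3, "Sony PS5", "Games", "Electronics", 1, "store_A"),
]

QUERY = """
WITH levels(average_level_type) AS (
    VALUES ('product'), ('category'), ('department')
),
unpivoted AS (
    -- one copy of every non-refund row per aggregation level
    SELECT t.store_id,
           l.average_level_type,
           CASE l.average_level_type
               WHEN 'product' THEN t.product
               WHEN 'category' THEN t.category
               ELSE t.department
           END AS id,
           t.customer_id,
           t.quantity
    FROM transactions t
    CROSS JOIN levels l
    WHERE t.quantity > 0
)
SELECT store_id,
       average_level_type,
       id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       1.0 * SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM unpivoted
GROUP BY store_id, average_level_type, id
ORDER BY average_level_type, id
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, product TEXT, category TEXT, "
    "department TEXT, quantity INTEGER, store_id TEXT)"
)
con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", rows)
result = con.execute(QUERY).fetchall()
con.close()

# Index the averages by (level, id) for easy lookup.
averages = {(level, key): avg for _, level, key, _, _, avg in result}
for row in result:
    print(row)
```

The `1.0 *` factor avoids integer division, which would otherwise truncate averages such as 3 / 2 on engines that divide integers as integers.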

Find out the top 3 customers by sum of sales from different groups for the last 30 days - Amazon interview

This was my Amazon SQL interview question which I bombed miserably.
We have 3 tables:
customers   orders       catalog
cust_id     order_date   catalog_id
cust_name   order_id     catalog_name
            unit_price
            cust_id
            quantity
            catalog_id
The output expected was to find top 3 customers from the 3 catalog / business units for the last 30 days. I tried partitioning over total sales but the last 30 day sales and multiple joins threw me off. Following were the columns requested:
cust_id cust_name catalog_name total_sales(unit_price*quantity)
1 David Books 1400
2 John Books 1200
3 Lisa Books 1000
4 Paul DVDs 500
2 John DVDs 313.5
5 James DVDs 220
6 Alice TV 110
1 David TV 87.5
7 Jerry TV 56
I understand basic 'partitioning over order by' however I have not used it over multiple tables with a datestamp. Kindly help me in understanding this concept. Thank you all in advance!
The query below should give you an idea.
select *
from (select c.cust_id,c.cust_name,ct.catalog_name,sum(o.unit_price * o.quantity) as total_sales
,dense_rank() over(partition by ct.catalog_name order by sum(o.unit_price * o.quantity) desc) as rnk
from customers c
join orders o on o.cust_id = c.cust_id
join catalog ct on ct.catalog_id = o.catalog_id
--last 30 days filter
where o.order_date >= date_add(day,-30,cast(getdate() as date)) and o.order_date < cast(getdate() as date)
group by c.cust_id,c.cust_name,ct.catalog_name
) t
where rnk <= 3
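To see the pieces work together (join, 30-day filter, aggregation, then ranking per catalog), here is a small self-contained sketch using Python's built-in sqlite3. All rows are made up for illustration, the date filter uses SQLite's `date('now', ...)` instead of `getdate()`, and the ranking is split into two CTEs to keep each step visible.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical order dates: one inside the 30-day window, one outside it.
recent = (date.today() - timedelta(days=5)).isoformat()
old = (date.today() - timedelta(days=60)).isoformat()

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (cust_id INTEGER, cust_name TEXT);
CREATE TABLE catalog (catalog_id INTEGER, catalog_name TEXT);
CREATE TABLE orders (order_id INTEGER, order_date TEXT, cust_id INTEGER,
                     catalog_id INTEGER, unit_price REAL, quantity INTEGER);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "David"), (2, "John"), (3, "Lisa"), (4, "Paul")])
con.executemany("INSERT INTO catalog VALUES (?, ?)", [(1, "Books"), (2, "DVDs")])
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", [
    (1, recent, 1, 1, 100.0, 2),   # David, Books, 200
    (2, recent, 2, 1, 50.0, 3),    # John,  Books, 150
    (3, recent, 3, 1, 20.0, 1),    # Lisa,  Books, 20
    (4, recent, 4, 1, 10.0, 1),    # Paul,  Books, 10 (4th, dropped)
    (5, old,    4, 1, 1000.0, 1),  # Paul,  Books, too old to count
    (6, recent, 2, 2, 30.0, 1),    # John,  DVDs,  30
])

QUERY = """
WITH totals AS (
    -- sales per customer and catalog over the last 30 days
    SELECT c.cust_id, c.cust_name, ct.catalog_name,
           SUM(o.unit_price * o.quantity) AS total_sales
    FROM customers c
    JOIN orders o ON o.cust_id = c.cust_id
    JOIN catalog ct ON ct.catalog_id = o.catalog_id
    WHERE o.order_date >= date('now', '-30 days')
      AND o.order_date < date('now')
    GROUP BY c.cust_id, c.cust_name, ct.catalog_name
),
ranked AS (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY catalog_name
                              ORDER BY total_sales DESC) AS rnk
    FROM totals
)
SELECT catalog_name, cust_name, total_sales, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY catalog_name, rnk
"""

result = con.execute(QUERY).fetchall()
con.close()
for row in result:
    print(row)
```

Note that `DENSE_RANK` keeps ties, so a catalog can return more than three customers when totals are equal. `ROW_NUMBER` would cap it at exactly three.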

pgsql - Showing top 10 products' sales and other products as 'others' with their sum of sales

I have a table called "products" with 100 records containing sales details. My requirement is simple, but I have not been able to implement it.
I need to show the top 10 product names with their sales, and group all other products under "Others" with their combined sales, so the output will have 11 rows in total. The 11th row should be "Others" with the sum of sales of all remaining products. Can anyone give me the logic?
The output should look like this:
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
Here is my SQL query:
select case when rank<11 then prod_cat else 'Others' END as prod_cat,
total_sales,ID,rank
from (select ROW_NUMBER() over (order by (sum(i.grandtotal)) desc) as rank,
pc.name as prod_cat,sum(i.grandtotal) as total_sales, pc.m_product_category_id as ID
from adempiere.c_invoice i
join adempiere.c_invoiceline il on il.c_invoice_id=i.c_invoice_id
join adempiere.m_product p on p.m_product_id=il.m_product_id
join adempiere.m_product_category pc on pc.m_product_category_id=p.m_product_category_id
where extract(year from i.dateacct)=extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
The output I got is:
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13
As @e4c5 suggests, use UNION:
with
totals as (
    select
        -- pc.m_product_category_id as id,
        pc.name as prod_cat,
        sum(i.grandtotal) as total_sales,
        ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
    from adempiere.c_invoice i
    join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
    join adempiere.m_product p on (p.m_product_id=il.m_product_id)
    join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
    where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
    group by pc.m_product_category_id, pc.name
),
ranked_others as (
    -- top 10 rows as they are
    select prod_cat, total_sales, rank
    from totals where rank <= 10
    union all
    -- everything else collapsed into a single 11th row
    select 'Others', sum(total_sales), 11
    from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
Also, I recommend using sargable conditions like the one above, which is slightly more complicated than the one you implemented, but generally worth the extra effort.
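Here is a small runnable sketch of the same "top N plus Others" idea, using Python's built-in sqlite3. It uses the first five product names and sales figures from the question's expected output with N = 3 to keep it short, and it labels the rows first and groups once instead of using UNION. The flat `products(name, sales)` table and the `labelled` CTE are assumptions for the sketch.

```python
import sqlite3

# First five rows of the expected output in the question: (name, sales).
rows = [
    ("colgate", 9000),
    ("pepsodent", 8000),
    ("closeup", 7000),
    ("brittal", 6000),
    ("ariies", 5000),
]

QUERY = """
WITH ranked AS (
    SELECT name, sales,
           ROW_NUMBER() OVER (ORDER BY sales DESC) AS rn
    FROM products
),
labelled AS (
    -- keep the top :n names, send everything else to one 'Others' bucket
    SELECT CASE WHEN rn <= :n THEN name ELSE 'Others' END AS label,
           CASE WHEN rn <= :n THEN rn ELSE :n + 1 END AS pos,
           sales
    FROM ranked
)
SELECT label, SUM(sales) AS sales
FROM labelled
GROUP BY pos, label
ORDER BY pos
"""


def top_n_with_others(data, n):
    """Return the top n rows plus one 'Others' row with the remaining sales."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (name TEXT, sales INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?)", data)
    result = con.execute(QUERY, {"n": n}).fetchall()
    con.close()
    return result


result = top_n_with_others(rows, 3)
for row in result:
    print(row)
```

With N = 3 the last row is 'Others' with 6000 + 5000 = 11000. If there are N rows or fewer, no 'Others' row is produced at all.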

SQL - select top xx% rows

I have a table, sales, which is ordered by descending TotalSales:
user_id | TotalSales
----------------------
4 10
2 1.5
5 0.99
3 0.5
1 0.33
What I would like to do is find the percentage of the sum of all sales that the xx% most important sales represent.
For example if I wanted to do it for top 40% sales, here I would get (10+1.5)/(10+1.5+0.99+0.5+0.33)= 86%
But right now I haven't been able to select "top xx% rows".
Edit: DB management system can be MySQL or Vertica or Hive
select sum(TotalSales) as s from sales where TotalSales >= x
select sum(TotalSales) as b from sales
Your result is s/b, where x is the TotalSales threshold that corresponds to the percentage you set each time.
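If window functions are available, the "top xx% of rows" cut can be made directly from a row number and a row count instead of a hand-picked threshold. A minimal sketch on the question's data follows, run through Python's built-in sqlite3. Whether your MySQL, Vertica or Hive version supports these window functions is something to check, and treating "top 40% of 5 rows" as the first 2 rows is an assumption about how the cutoff should round.

```python
import sqlite3

# Rows from the question: (user_id, TotalSales).
rows = [(4, 10), (2, 1.5), (5, 0.99), (3, 0.5), (1, 0.33)]

QUERY = """
WITH ranked AS (
    SELECT TotalSales,
           ROW_NUMBER() OVER (ORDER BY TotalSales DESC) AS rn,
           COUNT(*) OVER () AS n
    FROM sales
)
-- share of all sales made by the top :pct fraction of rows
SELECT SUM(CASE WHEN rn <= :pct * n THEN TotalSales ELSE 0 END)
       / SUM(TotalSales) AS share
FROM ranked
"""


def top_share(data, pct):
    """Fraction of total sales contributed by the top `pct` of rows (0 < pct <= 1)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (user_id INTEGER, TotalSales REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", data)
    (share,) = con.execute(QUERY, {"pct": pct}).fetchone()
    con.close()
    return share


share = top_share(rows, 0.4)
print(round(share * 100), "%")
```

For the top 40% this keeps the rows with 10 and 1.5, so the share is (10+1.5)/(10+1.5+0.99+0.5+0.33), the 86% worked out in the question.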