Find out the top 3 customers by sum of sales from different groups for the last 30 days - Amazon interview

This was my Amazon SQL interview question which I bombed miserably.
We have 3 tables:
customers: cust_id, cust_name
orders: order_date, order_id, unit_price, cust_id, quantity, catalog_id
catalog: catalog_id, catalog_name
The expected output was the top 3 customers in each of the 3 catalogs (business units) by total sales over the last 30 days. I tried partitioning over total sales, but the 30-day filter and the multiple joins threw me off. These were the requested columns:
cust_id cust_name catalog_name total_sales(unit_price*quantity)
1 David Books 1400
2 John Books 1200
3 Lisa Books 1000
4 Paul DVDs 500
2 John DVDs 313.5
5 James DVDs 220
6 Alice TV 110
1 David TV 87.5
7 Jerry TV 56
I understand basic PARTITION BY ... ORDER BY window functions, but I have not used them across multiple joined tables with a date filter. Kindly help me understand this concept. Thank you all in advance!

The query below should give you an idea.
select *
from (select c.cust_id, c.cust_name, ct.catalog_name,
             sum(o.unit_price * o.quantity) as total_sales,
             -- rank customers within each catalog by their 30-day sales
             dense_rank() over (partition by ct.catalog_name
                                order by sum(o.unit_price * o.quantity) desc) as rnk
      from customers c
      join orders o on o.cust_id = c.cust_id
      join catalog ct on ct.catalog_id = o.catalog_id
      -- last 30 days filter (SQL Server syntax; today itself is excluded)
      where o.order_date >= dateadd(day, -30, cast(getdate() as date))
        and o.order_date < cast(getdate() as date)
      group by c.cust_id, c.cust_name, ct.catalog_name
     ) t
where rnk <= 3  -- dense_rank: ties can return more than 3 rows per catalog
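The sketch below checks the ranking logic end to end. It runs the same query shape in SQLite through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions and ISO 'YYYY-MM-DD' date strings. It also assumes a fixed "today" passed in as a parameter in place of SQL Server's getdate(). The sample rows are hypothetical.

```python
import sqlite3

# Same shape as the query above, translated to SQLite date syntax.
TOP3_SQL = """
SELECT cust_id, cust_name, catalog_name, total_sales
FROM (
    SELECT c.cust_id, c.cust_name, ct.catalog_name,
           SUM(o.unit_price * o.quantity) AS total_sales,
           DENSE_RANK() OVER (PARTITION BY ct.catalog_name
                              ORDER BY SUM(o.unit_price * o.quantity) DESC) AS rnk
    FROM customers c
    JOIN orders o ON o.cust_id = c.cust_id
    JOIN catalog ct ON ct.catalog_id = o.catalog_id
    -- the 30 days before 'today', excluding today itself
    WHERE o.order_date >= date(:today, '-30 days')
      AND o.order_date < :today
    GROUP BY c.cust_id, c.cust_name, ct.catalog_name
) t
WHERE rnk <= 3
ORDER BY catalog_name, total_sales DESC
"""


def build_demo_db():
    """In-memory database filled with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER, cust_name TEXT);
        CREATE TABLE catalog (catalog_id INTEGER, catalog_name TEXT);
        CREATE TABLE orders (order_id INTEGER, order_date TEXT, cust_id INTEGER,
                             catalog_id INTEGER, unit_price INTEGER, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "David"), (2, "John"), (3, "Lisa"), (4, "Paul"), (5, "James")])
    conn.executemany("INSERT INTO catalog VALUES (?, ?)", [(1, "Books"), (2, "DVDs")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "2024-03-10", 1, 1, 700, 2),   # David, Books: 1400
        (2, "2024-03-12", 2, 1, 600, 2),   # John, Books: 1200
        (3, "2024-03-15", 3, 1, 500, 2),   # Lisa, Books: 1000
        (4, "2024-03-20", 4, 1, 100, 1),   # Paul, Books: 100 (ranked 4th, dropped)
        (5, "2024-02-01", 4, 1, 5000, 1),  # outside the 30-day window
        (6, "2024-03-05", 4, 2, 250, 2),   # Paul, DVDs: 500
        (7, "2024-03-31", 5, 2, 999, 1),   # "today": excluded by the < filter
    ])
    return conn


def top3_per_catalog(conn, today):
    """Return (cust_id, cust_name, catalog_name, total_sales) rows."""
    return conn.execute(TOP3_SQL, {"today": today}).fetchall()


if __name__ == "__main__":
    for row in top3_per_catalog(build_demo_db(), "2024-03-31"):
        print(row)
```

Filtering happens in WHERE, before GROUP BY. So the window function ranks customers only on their in-window totals, and an old large order (order 5) cannot push Paul into the Books top 3.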

Related

How to apply join on tables while using having function?

There are two tables - trans and customer.
trans
customerid pdate purchase
1 11-01-2021 200
2 13-02-2021 400
1 18-01-2021 400
3 31-01-2021 600
4 23-03-2021 700
customer
customerid firstname lastname
1 raj sharma
2 rahul dev
3 riya sen
4 rose menon
The question is: find the firstname, lastname and sum of purchases of the customers whose sum of purchases in the month of January is greater than 500 (without using any CTE).
My take on this was:
SELECT
c.firstname,
c.lastname,
SUM( t.purchase )
FROM
customer c
INNER JOIN trans t ON c.id = t.id
GROUP BY
c.firstname,
c.lastname
HAVING
SUM( t.purchase ) > 500;
But this ignores the customer with id 1. Customer 1 made two purchases, and their sum is greater than 500.
I am using PostgreSQL.
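Two changes seem to be needed. First, join on customerid, because neither table has an id column. Second, restrict to January in WHERE, so that HAVING only sees January purchases. The sketch below runs in SQLite via Python's sqlite3 and uses no CTE. It assumes dates are stored as ISO strings. In PostgreSQL with a real date column, the equivalent filter would be t.pdate >= date '2021-01-01' and t.pdate < date '2021-02-01'.

```python
import sqlite3

JANUARY_SQL = """
SELECT c.firstname, c.lastname, SUM(t.purchase) AS total_purchase
FROM customer c
JOIN trans t ON t.customerid = c.customerid               -- join key is customerid
WHERE t.pdate >= '2021-01-01' AND t.pdate < '2021-02-01'  -- January only, before grouping
GROUP BY c.customerid, c.firstname, c.lastname
HAVING SUM(t.purchase) > 500
ORDER BY c.customerid
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE trans (customerid INTEGER, pdate TEXT, purchase INTEGER);
        CREATE TABLE customer (customerid INTEGER, firstname TEXT, lastname TEXT);
    """)
    # Sample rows from the question, with dd-mm-yyyy rewritten as ISO dates.
    conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", [
        (1, "2021-01-11", 200), (2, "2021-02-13", 400), (1, "2021-01-18", 400),
        (3, "2021-01-31", 600), (4, "2021-03-23", 700),
    ])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
        (1, "raj", "sharma"), (2, "rahul", "dev"), (3, "riya", "sen"), (4, "rose", "menon"),
    ])
    return conn


def january_big_spenders(conn):
    """Customers whose January purchases add up to more than 500."""
    return conn.execute(JANUARY_SQL).fetchall()


if __name__ == "__main__":
    print(january_big_spenders(build_demo_db()))
```

Customer 1 (200 + 400) and customer 3 (600) qualify. Customer 4's 700 is in March, so the WHERE filter removes it before HAVING runs.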

Calculate the number of products responsible for 50% of my sales

I have a shop that sells products in different countries.
I end up with a sales table like this (with many more months):
Month Country Product Sales
01-2022 UK Tomato 10
01-2022 UK Banana 4
01-2022 UK Garlic 1
01-2022 FR Tomato 1
01-2022 FR Banana 2
01-2022 FR Garlic 1
I would like to know the number of products responsible for 50% of the sales per month and country, something like this:
Month Country Nb products accountable for 50% sales
01-2022 UK 1
02-2022 UK 3
03-2022 UK 2
01-2022 FR 1
02-2022 FR 4
03-2022 FR 3
The objective is to find the percentage of my catalogue responsible for the majority of sales. Example: 10% of my catalogue represents 50% of sales.
I have tried to solve the problem with multiple window functions, and I have already searched existing topics without success.

I finally found a solution by tweaking window functions.
-- continues a WITH clause; t0 (not shown) provides order_date, country_group, product, sales
,t1 AS (
SELECT
*
-- running total of sales within each country/month, biggest sellers first
,SUM(sales) OVER (PARTITION BY country_group, order_date ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
,0.5*SUM(sales) OVER(PARTITION BY country_group, order_date) AS total_sales_x_50perc
FROM t0
)
SELECT
order_date
,country_group
-- counts products whose running total stays at or below 50% of the month's sales
,COUNT(DISTINCT CASE WHEN running_total <= total_sales_x_50perc THEN product ELSE NULL END) AS nb_products
,COUNT(DISTINCT product) AS total_nb_products
-- 1.0* avoids integer division of the two counts
,1.0*COUNT(DISTINCT CASE WHEN running_total <= total_sales_x_50perc THEN product ELSE NULL END)/COUNT(DISTINCT product) AS perc
FROM t1
GROUP BY 1,2
ORDER BY 1
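With the question's 01-2022 rows, the running_total <= 50% rule above gives 0 for UK, while the expected table says 1. Tomato alone is 10 of 15, already past the 7.5 halfway mark. If the goal is the number of top products needed to reach half the sales, one option is to also count the product that crosses the threshold. In other words, count every product whose running total before it is still below half. The sketch below does this in SQLite via Python's sqlite3, which needs 3.25+ for window functions. The column names (sales_month, country, product, sales) and the product-name tie-breaker are assumptions.

```python
import sqlite3

NB_PRODUCTS_SQL = """
WITH t1 AS (
    SELECT sales_month, country, product, sales,
           -- running total, biggest sellers first; product name breaks ties
           SUM(sales) OVER (PARTITION BY country, sales_month
                            ORDER BY sales DESC, product
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
           0.5 * SUM(sales) OVER (PARTITION BY country, sales_month) AS half_total
    FROM monthly_sales
)
SELECT sales_month, country,
       -- a product counts if the total before it had not yet reached 50%
       SUM(CASE WHEN running_total - sales < half_total THEN 1 ELSE 0 END) AS nb_products,
       COUNT(*) AS total_nb_products
FROM t1
GROUP BY sales_month, country
ORDER BY sales_month, country
"""


def nb_products_for_half(rows):
    """rows: (sales_month, country, product, sales) tuples, one per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_sales "
                 "(sales_month TEXT, country TEXT, product TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?, ?)", rows)
    return conn.execute(NB_PRODUCTS_SQL).fetchall()


# Sample rows from the question (only 01-2022 is given in full).
SAMPLE = [
    ("01-2022", "UK", "Tomato", 10), ("01-2022", "UK", "Banana", 4),
    ("01-2022", "UK", "Garlic", 1), ("01-2022", "FR", "Tomato", 1),
    ("01-2022", "FR", "Banana", 2), ("01-2022", "FR", "Garlic", 1),
]

if __name__ == "__main__":
    print(nb_products_for_half(SAMPLE))
```

Dividing nb_products by total_nb_products (as a decimal) then gives the share of the catalogue behind half of the sales.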

Aggregate before and after a date column

I have two tables: db.transactions and db.salesman, which I would like to combine in order to create an output that has aggregated sales before each salesman's hire date and after each salesman's hire date.
select * from db.transactions
index sales_rep sales trx_date
1 Tom 200 9/18/2020
2 Jerry 435 6/21/2020
3 Patrick 1400 4/30/2020
4 Tom 560 5/24/2020
5 Francis 240 1/2/2021
select * from db.salesman
index sales_rep hire_date
1 Tom 8/19/2020
2 Jerry 1/28/2020
3 Patrick 4/6/2020
4 Francis 9/4/2020
I would like to aggregate sales from db.transactions before and after each sales rep's hire date.
Expected output:
index sales_rep hire_date agg_sales_before_hire_date agg_sales_after_hire_date
1 Tom 8/19/2020 1200 5000
2 Jerry 1/28/2020 500 900
3 Patrick 4/6/2020 5000 300
4 Francis 9/4/2020 2900 1500
For a single sales rep, I think agg_sales_before_hire_date could be calculated like this:
select tx.sales_rep, sum(tx.sales)
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
where tx.trx_date < sm.hire_date and tx.sales_rep = 'Tom'
group by tx.sales_rep
I am using PostgreSQL. I am also open to doing this in Tableau or Python.

Using CROSS JOIN LATERAL:
select
sa.sales_rep, sa.hire_date,
l.agg_sales_before_hire_date,
l.agg_sales_after_hire_date
from salesman sa
cross join lateral
(
select
sum(tx.sales) filter (where tx.trx_date < sa.hire_date) agg_sales_before_hire_date,
sum(tx.sales) filter (where tx.trx_date >= sa.hire_date) agg_sales_after_hire_date
from transactions tx
where tx.sales_rep = sa.sales_rep
) l;

Use conditional aggregation:
select tx.sales_rep,
       sum(case when tx.trx_date < sm.hire_date then tx.sales else 0 end) as before_sales,
       sum(case when tx.trx_date >= sm.hire_date then tx.sales else 0 end) as after_sales
from db.transactions tx
inner join db.salesman sm
        on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
EDIT:
In Postgres, you can use FILTER for the same logic:
select tx.sales_rep,
       sum(tx.sales) filter (where tx.trx_date < sm.hire_date) as before_sales,
       sum(tx.sales) filter (where tx.trx_date >= sm.hire_date) as after_sales
from db.transactions tx
inner join db.salesman sm
        on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
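The sketch below is a runnable version of the conditional-aggregation idea in SQLite via Python's sqlite3, with two assumptions. Dates are stored as ISO strings, because the question's M/D/YYYY text would not compare chronologically. A LEFT JOIN from salesman is used so that a rep with no transactions still appears with zeros, which differs slightly from the inner join above. The question's expected output is illustrative and does not come from its sample rows, so the numbers here differ from it.

```python
import sqlite3

BEFORE_AFTER_SQL = """
SELECT sm.sales_rep, sm.hire_date,
       COALESCE(SUM(CASE WHEN tx.trx_date <  sm.hire_date THEN tx.sales END), 0) AS before_sales,
       COALESCE(SUM(CASE WHEN tx.trx_date >= sm.hire_date THEN tx.sales END), 0) AS after_sales
FROM salesman sm
LEFT JOIN transactions tx ON tx.sales_rep = sm.sales_rep   -- keep reps with no sales
GROUP BY sm.sales_rep, sm.hire_date
ORDER BY sm.sales_rep
"""


def sales_before_after_hire():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (sales_rep TEXT, sales INTEGER, trx_date TEXT);
        CREATE TABLE salesman (sales_rep TEXT, hire_date TEXT);
    """)
    # Sample rows from the question, with M/D/YYYY rewritten as ISO dates so that
    # string comparison matches chronological order.
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
        ("Tom", 200, "2020-09-18"), ("Jerry", 435, "2020-06-21"),
        ("Patrick", 1400, "2020-04-30"), ("Tom", 560, "2020-05-24"),
        ("Francis", 240, "2021-01-02"),
    ])
    conn.executemany("INSERT INTO salesman VALUES (?, ?)", [
        ("Tom", "2020-08-19"), ("Jerry", "2020-01-28"),
        ("Patrick", "2020-04-06"), ("Francis", "2020-09-04"),
    ])
    return conn.execute(BEFORE_AFTER_SQL).fetchall()


if __name__ == "__main__":
    for row in sales_before_after_hire():
        print(row)
```

CASE without ELSE yields NULL for non-matching rows, so SUM ignores them. COALESCE then turns an all-NULL sum into 0.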

pgsql - Showing the top 10 products' sales, with the remaining products as 'Others' and their sum of sales

I have a table called "products" with 100 records of sales details. The requirement seems simple, but I was not able to do it.
I need to show the top 10 product names with their sales, and group all other products as "Others" with their combined sales, so the output will have 11 rows in total. The 11th row should be "Others" with the sum of sales of all remaining products. Can anyone give me the logic?
The output should look like this:
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
Here is my SQL query:
select case when rank<11 then prod_cat else 'Others' END as prod_cat,
total_sales,ID,rank
from (select ROW_NUMBER() over (order by (sum(i.grandtotal)) desc) as rank,pc.name as prod_cat,sum(i.grandtotal) as total_sales, pc.m_product_category_id as ID
from adempiere.c_invoice i join adempiere.c_invoiceline il on il.c_invoice_id=i.c_invoice_id join adempiere.m_product p on p.m_product_id=il.m_product_id join adempiere.m_product_category pc on pc.m_product_category_id=p.m_product_category_id
where extract(year from i.dateacct)=extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
The output I got:
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13

As @e4c5 suggests, use UNION:
with
totals as (
select pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
from adempiere.c_invoice i
join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
join adempiere.m_product p on (p.m_product_id=il.m_product_id)
join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
group by pc.m_product_category_id, pc.name
),
ranked_others as (
select prod_cat, total_sales, rank
from totals where rank <= 10
union
select 'Others', sum(total_sales), 11
from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
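Also, I recommend a sargable condition like the range on i.dateacct above instead of extract(year from i.dateacct). Comparing the bare column lets the database use an index on dateacct. It is slightly more verbose than your version, but generally worth the extra effort.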
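The asker's original CASE approach can also work without UNION. The outer query just has to re-aggregate by the CASE label, so that all "Others" rows collapse into a single row. The sketch below shows this alternative in SQLite via Python's sqlite3. It assumes a simplified, hypothetical products(name, sales) table with unique names, not the adempiere schema, and it needs window-function support (SQLite 3.25+).

```python
import sqlite3

TOP_N_SQL = """
WITH ranked AS (
    SELECT name, sales, ROW_NUMBER() OVER (ORDER BY sales DESC) AS rn
    FROM products
)
SELECT CASE WHEN rn <= :n THEN name ELSE 'Others' END AS label,
       SUM(sales) AS total_sales
FROM ranked
GROUP BY label          -- every row ranked below n lands in the single 'Others' group
ORDER BY MIN(rn)        -- top n in rank order, then 'Others' last
"""


def top_n_with_others(rows, n):
    """rows: (name, sales) tuples; returns at most n + 1 (label, total_sales) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    return conn.execute(TOP_N_SQL, {"n": n}).fetchall()


if __name__ == "__main__":
    demo = [("colgate", 9000), ("pepsodent", 8000), ("closeup", 7000),
            ("brittal", 6000), ("ariies", 5000)]
    print(top_n_with_others(demo, 3))
```

A product that is itself named "Others" and ranks in the top n would merge with the catch-all row. The UNION version above does not have this quirk.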

Firebird Query- Return first row each group

In a Firebird database with a table "Sales", I need to select the first sale of each customer. Below is a sample that shows the table and the desired result of the query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I have already tried the code from "Select first row in each GROUP BY group?", but it did not work.

In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.

Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.

In Firebird 3, get the first row for each customer by minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id, min(sales_date)
, id, total -- invalid: id and total are neither grouped nor aggregated
from SALES
group by customer_id

As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID
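The sketch below compares the anti-join answer ("sales with no earlier sales") with the ROW_NUMBER answer on the question's sample rows, in SQLite via Python's sqlite3 (3.25+ for ROW_NUMBER). It assumes the dates read as DD/MM/YY, which matches the expected result picking 10/01/16 over 01/02/16 for customer 22. They are rewritten as ISO timestamps so they compare correctly as text.

```python
import sqlite3

ANTI_JOIN_SQL = """
SELECT s1.id, s1.customerid, s1.dthrsale
FROM sales s1
LEFT JOIN sales s2
       ON s2.customerid = s1.customerid AND s2.dthrsale < s1.dthrsale
WHERE s2.id IS NULL          -- no earlier sale exists for this customer
ORDER BY s1.id
"""

ROW_NUMBER_SQL = """
SELECT id, customerid, dthrsale
FROM (
    SELECT id, customerid, dthrsale,
           ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY dthrsale) AS rn
    FROM sales
) sub
WHERE rn = 1                 -- exactly one row per customer, even on ties
ORDER BY id
"""


def first_sales(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
    # The question's rows, read as DD/MM/YY and rewritten as ISO timestamps.
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        (1, 25, "2016-04-01 09:32"), (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"), (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"), (6, 22, "2016-01-10 08:45"),
    ])
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(first_sales(ANTI_JOIN_SQL))
    print(first_sales(ROW_NUMBER_SQL))
```

Both return IDs 1, 2, 4 and 6 here. They differ only on ties. If a customer has two sales at the same earliest timestamp, the anti-join (like the join on min(dthrsale)) returns both rows, while ROW_NUMBER keeps one of them arbitrarily.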
