Get the first 10 rows for each group - sql

I have three tables:
Customer(IdCustomer, Name)
Product(IdProduct, Product)
Order(IdProduct, IdCustomer, nbOrders)
So the Order table stores how many times a customer has ordered a product.
I need a view like this:
TopOrder(Name, Product, nbCommands)
But I only want the 10 products each customer has ordered the most, and I can't figure out how to do it.

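The DENSE_RANK window function is exactly what you need here. Note that with ties, DENSE_RANK can return more than 10 products per customer; use ROW_NUMBER instead if you need exactly 10: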
CREATE VIEW TopOrder AS
SELECT Name, Product, nbOrders
FROM (SELECT Name, Product, nbOrders,
             DENSE_RANK() OVER (PARTITION BY o.idCustomer
                                ORDER BY nbOrders DESC) AS rk
      FROM Customer c
      -- the question calls this table Order; ORDER is a reserved word, so rename it (here Orders) or quote it
      JOIN Orders o ON c.idCustomer = o.idCustomer
      JOIN Product p ON p.idProduct = o.idProduct
     ) t
WHERE rk <= 10;
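Here is a small sketch that runs the same view against made-up data, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (needed for window functions). The table is named Orders as in the answer, and N is lowered from 10 to 2 so the tie behaviour is easy to see: Ann's second place is shared by two products, so DENSE_RANK returns three rows for her.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (idCustomer INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Product  (idProduct  INTEGER PRIMARY KEY, Product TEXT);
    CREATE TABLE Orders   (idProduct INTEGER, idCustomer INTEGER, nbOrders INTEGER);

    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Product  VALUES (1, 'pen'), (2, 'ink'), (3, 'pad'), (4, 'cap');
    -- Hypothetical sample data; Ann has a tie at 5 orders.
    INSERT INTO Orders VALUES
        (1, 1, 9), (2, 1, 5), (3, 1, 5), (4, 1, 1),
        (1, 2, 2), (2, 2, 7), (3, 2, 4);

    -- Same shape as the answer, with N = 2 instead of 10.
    CREATE VIEW TopOrder AS
    SELECT Name, Product, nbOrders
    FROM (SELECT Name, Product, nbOrders,
                 DENSE_RANK() OVER (PARTITION BY o.idCustomer
                                    ORDER BY nbOrders DESC) AS rk
          FROM Customer c
          JOIN Orders o  ON c.idCustomer = o.idCustomer
          JOIN Product p ON p.idProduct = o.idProduct) t
    WHERE rk <= 2;
""")

rows = conn.execute(
    "SELECT Name, Product, nbOrders FROM TopOrder "
    "ORDER BY Name, nbOrders DESC, Product"
).fetchall()
for row in rows:
    print(row)
```

Swapping DENSE_RANK for ROW_NUMBER in the view would cap every customer at exactly two rows, with the tie broken arbitrarily.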

Related

Calculating the average of order value without using a WITH statement

I am trying to add a new column containing the average order value, calculated by dividing two existing columns: Average value = Total Sales / Number of Orders.
My data looks like this: (screenshot not included)
I don't understand why Example Code A does not work but Example Code B does. Please can someone explain?
Example Code A
%%sql
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
order_num / total_sales avg_order_value
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
Example Code B
%%sql
WITH
customer_sales AS
(
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC
)
SELECT
country,
customer_num,
order_num,
total_sales,
total_sales / order_num avg_order_value
FROM customer_sales;
Thank you!
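Whether you can reference a column alias elsewhere in the same SELECT list depends on the DBMS. Standard SQL doesn't allow it, and most systems (PostgreSQL, MySQL, SQL Server, SQLite) reject it; a few accept it as an extension. Otherwise you have to either compute the aggregates in a subquery or CTE and do the division in the outer query (as Example B does), or repeat the aggregate expressions, such as the count and the sum. Note also that Example A divides order_num by total_sales, which is the inverse of the formula you stated: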
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
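-- repeat the aggregates rather than referencing their aliases; NULLIF returns NULL instead of dividing by zero for countries without invoices
ROUND(SUM(i.total),2) / NULLIF(count(i.invoice_id), 0) avg_order_value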
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
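A quick way to see the alias rule in action, assuming an SQLite backend and a miniature, made-up version of the customer/invoice tables: the first query references total_sales and order_num inside the same SELECT list and fails, while the version that repeats the aggregates runs. The NULLIF guard is my addition so a country with no invoices yields NULL instead of a division by zero.

```python
import sqlite3

# In-memory SQLite database with a tiny, made-up customer/invoice schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice  (invoice_id INTEGER PRIMARY KEY,
                           customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile');
    INSERT INTO invoice  VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 30.0);
""")

# Like Example A: reuses aliases from the same SELECT list.
alias_query = """
    SELECT c.country,
           COUNT(i.invoice_id)     AS order_num,
           ROUND(SUM(i.total), 2)  AS total_sales,
           total_sales / order_num AS avg_order_value
    FROM customer c
    LEFT JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY c.country
"""
try:
    conn.execute(alias_query)
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)  # the aliases are not visible yet
print("alias version:", alias_error)

# Repeating the aggregate expressions works.
repeat_query = """
    SELECT c.country,
           COUNT(DISTINCT c.customer_id) AS customer_num,
           COUNT(i.invoice_id)           AS order_num,
           ROUND(SUM(i.total), 2)        AS total_sales,
           ROUND(SUM(i.total), 2) / NULLIF(COUNT(i.invoice_id), 0) AS avg_order_value
    FROM customer c
    LEFT JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY c.country
    ORDER BY total_sales DESC
"""
rows = conn.execute(repeat_query).fetchall()
print(rows)
```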

Include the sales revenue amount into the select query

I have a somewhat similar scenario to another question on this site.
This is my current code:
SELECT Vendor_Name, Product
FROM (
SELECT v.Vendor_Name, p.Description AS Product, ROW_NUMBER() OVER (PARTITION BY v.Vendor_Key ORDER BY SUM(sf.Price * sf.Quantity) DESC) AS seqnum
FROM SalesFacts sf JOIN Vendor v
ON sf.Vendor_Key = v.Vendor_Key JOIN Product p
ON sf.Product_Key = p.Product_Key
GROUP BY v.Vendor_Key, v.Vendor_Name, p.Product_Key, p.Description
) vp
WHERE vp.seqnum = 1
The result of the query is shown below: (screenshot not included)
The query above returns the top-grossing product for each vendor across the whole database, i.e. the highest-revenue product per vendor.
I want to add a new column, Sales Revenue, calculated as price of item * quantity, so that I can see how much revenue each vendor earned from its best-selling product.
How do I get the same result with the Sales Revenue column included?
As your question is phrased, you just need to return this column from the subquery so the outer query can select it:
SELECT Vendor_Name, Product, Sales_Revenue
FROM (
SELECT
v.Vendor_Name,
p.Description AS Product,
SUM(sf.Price * sf.Quantity) AS Sales_Revenue,
ROW_NUMBER() OVER (PARTITION BY v.Vendor_Key ORDER BY SUM(sf.Price * sf.Quantity) DESC) AS seqnum
FROM SalesFacts sf JOIN Vendor v
ON sf.Vendor_Key = v.Vendor_Key JOIN Product p
ON sf.Product_Key = p.Product_Key
GROUP BY v.Vendor_Key, v.Vendor_Name, p.Product_Key, p.Description
) vp
WHERE vp.seqnum = 1
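The corrected query can be checked with a sketch like this, using Python's sqlite3 module (SQLite 3.25+ for window functions) and hypothetical sample rows; the vendor and product names are made up. Each vendor gets one row: its highest-revenue product together with that revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendor     (Vendor_Key INTEGER PRIMARY KEY, Vendor_Name TEXT);
    CREATE TABLE Product    (Product_Key INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE SalesFacts (Vendor_Key INTEGER, Product_Key INTEGER,
                             Price REAL, Quantity INTEGER);
    -- Hypothetical sample data
    INSERT INTO Vendor  VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Product VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO SalesFacts VALUES
        (1, 1, 2.0, 10), (1, 1, 2.0, 5),   -- Acme/Bolt:   30
        (1, 2, 1.0, 40),                   -- Acme/Nut:    40
        (2, 1, 3.0, 20),                   -- Globex/Bolt: 60
        (2, 2, 0.5, 10);                   -- Globex/Nut:   5
""")

rows = conn.execute("""
    SELECT Vendor_Name, Product, Sales_Revenue
    FROM (
        SELECT v.Vendor_Name,
               p.Description AS Product,
               SUM(sf.Price * sf.Quantity) AS Sales_Revenue,
               ROW_NUMBER() OVER (PARTITION BY v.Vendor_Key
                                  ORDER BY SUM(sf.Price * sf.Quantity) DESC) AS seqnum
        FROM SalesFacts sf
        JOIN Vendor v  ON sf.Vendor_Key = v.Vendor_Key
        JOIN Product p ON sf.Product_Key = p.Product_Key
        GROUP BY v.Vendor_Key, v.Vendor_Name, p.Product_Key, p.Description
    ) vp
    WHERE vp.seqnum = 1
    ORDER BY Vendor_Name
""").fetchall()
print(rows)
```

Because ROW_NUMBER keeps just one row per vendor, two products tied on revenue would be resolved arbitrarily; RANK would return both.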

Selecting the vendor with highest Ratio and lowest price per product

From my Orders table I'm trying to get, for each product, the vendor with the highest order ratio and, among those, the lowest price.
select v.[Vendor_pk], p.[Product_pk],o.[UnitPrice], o.[RatioOrder], o.[TotalPrice]
from [DWH].[ORDERSfact] o
inner join [DWH].[VENDORdim] v
on o.[Vendor_pk]=v.[Vendor_pk]
inner join [DWH].[PRODUCTdim] p
on o.[Product_pk]= p.[Product_pk]
order by 2, 4 desc, 3
How can I do that here?
I'm trying to get per product the vendor with the highest Ratio order and lowest price.
Use row_number():
select vp.*
from (select v.[Vendor_pk], p.[Product_pk],o.[UnitPrice], o.[RatioOrder], o.[TotalPrice],
row_number() over (partition by p.[Product_pk] order by o.[RatioOrder] desc, o.[UnitPrice] asc) as seqnum
from [DWH].[ORDERSfact] o join
[DWH].[VENDORdim] v
on o.[Vendor_pk] = v.[Vendor_pk] join
[DWH].[PRODUCTdim] p
on o.[Product_pk] = p.[Product_pk]
) vp
where seqnum = 1;
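A minimal sketch in SQLite via Python, assuming only the fact table matters for the ranking: the [DWH] schema prefix and the dimension joins are dropped, and the rows are invented. Product 100 has two vendors tied on RatioOrder, so the second sort key, UnitPrice ascending, picks the cheaper one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERSfact (Vendor_pk INTEGER, Product_pk INTEGER,
                             UnitPrice REAL, RatioOrder REAL, TotalPrice REAL);
    -- Hypothetical sample data
    INSERT INTO ORDERSfact VALUES
        (1, 100, 5.0, 0.9, 50.0),
        (2, 100, 4.0, 0.9, 40.0),   -- same ratio as vendor 1, but cheaper
        (3, 100, 3.0, 0.5, 30.0),
        (1, 200, 8.0, 0.7, 80.0),
        (2, 200, 9.0, 0.8, 90.0);
""")

rows = conn.execute("""
    SELECT Vendor_pk, Product_pk, UnitPrice, RatioOrder
    FROM (SELECT o.*,
                 -- highest ratio first; ties broken by the lowest unit price
                 ROW_NUMBER() OVER (PARTITION BY o.Product_pk
                                    ORDER BY o.RatioOrder DESC, o.UnitPrice ASC) AS seqnum
          FROM ORDERSfact o) vp
    WHERE seqnum = 1
    ORDER BY Product_pk
""").fetchall()
print(rows)
```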

How to find the three greatest values in each category in PostgreSQL?

I am a SQL beginner and I'm having trouble finding the top 3 values in each category. The question was:
"For order_ids in January 2006, what were the top (by revenue) 3 product_ids for each category_id? "
Table A columns: customer_id, order_id, order_date, revenue, product_id
Table B columns: product_id, category_id
I joined tables A and B with an inner join and filtered by order_date, but I'm stuck on how to find the top 3 values for each category_id.
Thanks.
This is what I have so far:
SELECT B.product_id, category_id FROM A
JOIN B ON B.product_id = A.product_id
WHERE order_date BETWEEN '2006-01-01' AND '2006-01-31'
ORDER BY revenue DESC
LIMIT 3;
This kind of query is typically solved using window functions:
select *
from (
  select b.product_id,
         b.category_id,
         a.revenue,
         -- partition by category only, so products are ranked against each other within a category
         dense_rank() over (partition by b.category_id order by a.revenue desc) as rnk
  from a
    join b on b.product_id = a.product_id
  where a.order_date between date '2006-01-01' and date '2006-01-31'
) as t
where rnk <= 3
order by category_id, rnk;
dense_rank() also deals with ties (products with the same revenue in the same category), so you might get more than 3 rows per category.
If the same product can show up more than once in table a (i.e. it appears in several orders), you need to combine this with a GROUP BY so you rank on the sum of its revenues:
select *
from (
  select b.product_id,
         b.category_id,
         sum(a.revenue) as total_revenue,
         dense_rank() over (partition by b.category_id order by sum(a.revenue) desc) as rnk
  from a
    join b on b.product_id = a.product_id
  where a.order_date between date '2006-01-01' and date '2006-01-31'
  group by b.product_id, b.category_id
) as t
where rnk <= 3
order by category_id, rnk;
When combining window functions and GROUP BY, the window function will be applied after the GROUP BY.
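To check the partitioning, here is a sketch that runs the GROUP BY version in SQLite from Python on invented data. Two details differ from the answer and are my assumptions: dates are stored as ISO text because SQLite has no dedicated date type, and the month filter uses a half-open range (>= '2006-01-01' and < '2006-02-01'), which also works if order_date holds timestamps. Products 3 and 4 tie for third place in category 1, so DENSE_RANK returns four rows there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (customer_id INTEGER, order_id INTEGER, order_date TEXT,
                    revenue REAL, product_id INTEGER);
    CREATE TABLE B (product_id INTEGER, category_id INTEGER);
    -- Hypothetical data: category 1 has five products, category 2 has two.
    INSERT INTO B VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 2), (7, 2);
    INSERT INTO A VALUES
        (1, 1, '2006-01-05', 50, 1), (2, 2, '2006-01-06', 50, 1),  -- product 1: 100
        (1, 3, '2006-01-07', 90, 2),                               -- product 2: 90
        (3, 4, '2006-01-08', 80, 3),                               -- product 3: 80
        (3, 5, '2006-01-09', 80, 4),                               -- product 4: 80 (tie)
        (4, 6, '2006-01-10', 10, 5),                               -- product 5: 10
        (4, 7, '2006-02-01', 999, 5),                              -- outside January
        (5, 8, '2006-01-11', 20, 6), (5, 9, '2006-01-12', 30, 7);
""")

rows = conn.execute("""
    SELECT category_id, product_id, total_revenue, rnk
    FROM (
        SELECT b.category_id,
               b.product_id,
               SUM(a.revenue) AS total_revenue,
               DENSE_RANK() OVER (PARTITION BY b.category_id
                                  ORDER BY SUM(a.revenue) DESC) AS rnk
        FROM A a
        JOIN B b ON b.product_id = a.product_id
        WHERE a.order_date >= '2006-01-01' AND a.order_date < '2006-02-01'
        GROUP BY b.category_id, b.product_id
    ) t
    WHERE rnk <= 3
    ORDER BY category_id, rnk, product_id
""").fetchall()
for row in rows:
    print(row)
```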
You can use window functions to rank the grouped revenue and then pull the top X in the outer query. I haven't worked in PostgreSQL in a while, so I may be missing a shortcut function below.
WITH ByRevenue AS
(
  -- A CTE can be queried like a table by the statements that follow it
  SELECT
    B.category_id,
    B.product_id,
    SUM(A.revenue) AS total_revenue
  FROM
    A
    JOIN B ON B.product_id = A.product_id
  WHERE
    A.order_date BETWEEN '2006-01-01' AND '2006-01-31'
  GROUP BY
    B.category_id, B.product_id
),
Normalized AS
(
  -- Number the products within each category by revenue, highest first.
  -- ROW_NUMBER yields exactly 3 rows per category; ties are broken arbitrarily.
  SELECT
    category_id,
    product_id,
    total_revenue,
    ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY total_revenue DESC) AS rn
  FROM
    ByRevenue
)
-- Final query: keep the top 3 products per category
SELECT *
FROM Normalized
WHERE rn <= 3;
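For comparison with DENSE_RANK, here is a sketch of the corrected CTE chain in SQLite (via Python's sqlite3), with invented data in which two products tie for third place. ROW_NUMBER still returns exactly three rows per category. The product_id tie-breaker is an assumption added here so the result is deterministic; without it the database may keep either tied product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (order_date TEXT, revenue REAL, product_id INTEGER);
    CREATE TABLE B (product_id INTEGER, category_id INTEGER);
    -- Hypothetical data: products 3 and 4 tie for third place.
    INSERT INTO B VALUES (1, 1), (2, 1), (3, 1), (4, 1);
    INSERT INTO A VALUES
        ('2006-01-02', 100, 1), ('2006-01-03', 90, 2),
        ('2006-01-04', 80, 3),  ('2006-01-05', 80, 4);
""")

rows = conn.execute("""
    WITH ByRevenue AS (
        SELECT b.category_id, b.product_id, SUM(a.revenue) AS total_revenue
        FROM A a JOIN B b ON b.product_id = a.product_id
        WHERE a.order_date >= '2006-01-01' AND a.order_date < '2006-02-01'
        GROUP BY b.category_id, b.product_id
    ),
    Normalized AS (
        SELECT category_id, product_id, total_revenue,
               -- product_id is an added tie-breaker for a deterministic result
               ROW_NUMBER() OVER (PARTITION BY category_id
                                  ORDER BY total_revenue DESC, product_id) AS rn
        FROM ByRevenue
    )
    SELECT category_id, product_id, total_revenue
    FROM Normalized
    WHERE rn <= 3
    ORDER BY category_id, rn
""").fetchall()
print(rows)
```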
For top-n queries, the first thing to try is usually the lateral join:
WITH categories as (
SELECT DISTINCT category_id
FROM B
)
SELECT categories.category_id, sub.product_id
FROM categories
JOIN LATERAL (
SELECT a.product_id
FROM B
JOIN A ON (a.product_id = b.product_id)
WHERE b.category_id = categories.category_id
AND order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY a.product_id
ORDER BY sum(revenue) desc
LIMIT 3
) sub on true;
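SQLite doesn't support LATERAL, so this sketch (Python's sqlite3, made-up data) reproduces what the lateral join does by hand: collect the distinct categories, then run the inner GROUP BY / ORDER BY / LIMIT 3 query once per category, with the category bound as a parameter. It only illustrates the semantics; in PostgreSQL you would use the single LATERAL query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (order_date TEXT, revenue REAL, product_id INTEGER);
    CREATE TABLE B (product_id INTEGER, category_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO B VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2);
    INSERT INTO A VALUES
        ('2006-01-02', 10, 1), ('2006-01-03', 40, 2), ('2006-01-04', 30, 3),
        ('2006-01-05', 20, 4), ('2006-01-06', 25, 4), ('2006-01-07', 5, 5);
""")

# Outer side: the distinct categories (the "categories" CTE in the answer).
categories = [r[0] for r in conn.execute(
    "SELECT DISTINCT category_id FROM B ORDER BY category_id")]

# Inner side: the body of the LATERAL subquery, run once per category.
inner = """
    SELECT a.product_id
    FROM B b JOIN A a ON a.product_id = b.product_id
    WHERE b.category_id = ?
      AND a.order_date BETWEEN '2006-01-01' AND '2006-01-31'
    GROUP BY a.product_id
    ORDER BY SUM(a.revenue) DESC
    LIMIT 3
"""
result = []
for category_id in categories:
    for (product_id,) in conn.execute(inner, (category_id,)):
        result.append((category_id, product_id))
print(result)
```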
Try using FETCH FIRST n ROWS ONLY.
Note: I'm assuming product_id is the key that links the two tables, so I used it to join them. Be aware that FETCH FIRST limits the whole result to 3 rows, not 3 per category.
SELECT B.category_id, A.revenue
FROM A
INNER JOIN B ON A.product_id = B.product_id
WHERE A.order_date BETWEEN (from date) AND (to date)
ORDER BY A.revenue DESC
FETCH FIRST 3 ROWS ONLY;

SQL nested aggregate grouping error

I'm trying to find the name of the product that has sold the most units. I have two tables, purchases and products: products has pname and pid, and purchases has pid and qty (units sold).
I've managed this much:
select p.pname, sum(q.qty) from purchases q
inner join products p on p.pid=q.pid
where p.pid=q.pid
group by p.pname
order by sum(q.qty) desc
I'm getting the results in descending order, but I need only the top-selling product; several products can tie for the most units sold. When I use
max(sum(q.qty))
I get a grouping error.
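One approach is to derive the totals first using a common table expression, then compare each total against the maximum.
Simply put, standard SQL doesn't let you nest one aggregate directly inside another. You can, however, apply a window (analytic) function to an aggregate, for example MAX(SUM(q.qty)) OVER ().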
with cte as (select p.pname, sum(q.qty) as total_qty
             from purchases q
             inner join products p on p.pid = q.pid
             group by p.pname)
select pname, total_qty
from cte
where total_qty = (select max(total_qty) from cte);
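A sketch of the corrected CTE in SQLite via Python, assuming the purchases/products columns from the question and made-up rows in which two products tie for the most units; both tied products are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products  (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE purchases (pid INTEGER, qty INTEGER);
    -- Hypothetical sample data: 'apple' and 'pear' tie at 7 units.
    INSERT INTO products  VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
    INSERT INTO purchases VALUES (1, 3), (1, 4), (2, 7), (3, 2);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT p.pname, SUM(q.qty) AS total_qty
        FROM purchases q
        JOIN products p ON p.pid = q.pid
        GROUP BY p.pname
    )
    SELECT pname, total_qty
    FROM cte
    WHERE total_qty = (SELECT MAX(total_qty) FROM cte)  -- keeps every tie
    ORDER BY pname
""").fetchall()
print(rows)
```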
You can use CTEs to do this:
1) First, get the total quantity of each product.
2) Then get the maximum of all those totals.
3) Join it with your original query.
with totals as (select pid, sum(qty) totalqty from purchases group by pid)
, t1 as (select p.pid, p.pname, sum(q.qty) totqty
from purchases q
inner join products p on p.pid=q.pid
group by p.pid, p.pname)
, t2 as (select max(totalqty) maxtotal from totals)
select pname, totqty
from t1
join t2 on t1.totqty = t2.maxtotal
Analytic functions can simplify this. If more than one product has the same sum(qty) and that sum happens to be the max(sum(qty)), this returns all of them:
select pname, quantity
from (
    select p.pname
         , sum(q.qty) quantity
         , rank() over (order by sum(q.qty) desc) ranking
    from purchases q
    inner join products p on p.pid = q.pid
    group by p.pname
) ranked
where ranking = 1
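The same kind of invented data run through the RANK version, assuming SQLite 3.25+ via Python's sqlite3. A ROW_NUMBER variant is included for contrast: it keeps only one of the tied products, chosen arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products  (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE purchases (pid INTEGER, qty INTEGER);
    -- Hypothetical sample data: 'apple' and 'pear' tie at 7 units.
    INSERT INTO products  VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
    INSERT INTO purchases VALUES (1, 3), (1, 4), (2, 7), (3, 2);
""")

def top_sellers(ranking_function):
    # Only the two fixed strings below are passed in, so the f-string is safe.
    return conn.execute(f"""
        SELECT pname, quantity
        FROM (
            SELECT p.pname,
                   SUM(q.qty) AS quantity,
                   {ranking_function} OVER (ORDER BY SUM(q.qty) DESC) AS ranking
            FROM purchases q
            JOIN products p ON p.pid = q.pid
            GROUP BY p.pname
        ) ranked
        WHERE ranking = 1
        ORDER BY pname
    """).fetchall()

with_rank = top_sellers("RANK()")              # every product tied for first
with_row_number = top_sellers("ROW_NUMBER()")  # exactly one row
print(with_rank, with_row_number)
```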