get row with max from group by results - sql

I have sql such as:
select
c.customerID, sum(o.orderCost)
from customer c, order o
where c.customerID=o.customerID
group by c.customerID;
This returns a list of
customerID, orderCost
where orderCost is the total cost of all orders the customer has made. I want to select the customer who has paid us the most (who has the highest orderCost). Do I need to create a nested query for this?

You need a nested query, but you don't have to access the tables twice if you use analytic functions.
select customerID, sumOrderCost from
(
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc) as rn
from (
select c.customerID, sum(o.orderCost) as sumOrderCost
from customer c, orders o
where c.customerID=o.customerID
group by c.customerID
)
)
where rn = 1;
The rank() function ranks the results from your original query by the sum() value, then you only pick those with the highest rank - that is, the row(s) with the highest total order cost.
If more than one customer has the same total order cost, this will return both. If that isn't what you want you'll have to decide how to determine which single result to use. If you want the lowest customer ID, for example, add that to the ranking function:
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc, customerID) as rn
You can adjust you original query to return other data instead, just for the ordering, and not include it in the outer select.

You need to create nested query for this.
Two queries.

Related

SQL How to select customers with highest transaction amount by state

I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
Attempts look something like this
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
First, you need for your JOIN to be correct. Second, you want to use window functions:
SELECT ct.*
FROM (SELECT c.customer_id, c.name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transaactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
While transact_amt can be NULL (has not been ruled out) any sum may end up to be NULL as well. With an an unsuspecting ORDER BY t.transact_sum DESC those customers come out on top as NULL comes first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
I Expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But using ...Max(Stock) I get 10, and with (Sum) I get 25....
I have tried with over(partition.....) also without any luck...
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
You can do something like this
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber
if you don't want to use joins
My solution, like that of Gordon Linoff, will use the window functions. But in my case, everything will turn around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold

How do I proceed on this query

I want to know if there's a way to display more than one column on an aggregate result but without it affecting the group by.
I need to display the name alongside an aggregate result, but I have no idea what I am missing here.
This is the data I'm working with:
It is the result of the following query:
select * from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
I need to find the salesman that sold the most stuff (total price) for a specific year.
This is what I have so far:
select DATEPART(year,sale.sale_date)'year', Salesman.First_Name,sum(sale.price)
from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
group by DATEPART(year,sale.sale_date),Salesman.First_Name
This returns me the total sales made by each salesman.
How do I continue from here to get the top salesman of each year?
Maybe the query I am doing is completely wrong and there is a better way?
Any advice would be helpful.
Thanks.
This should work for you:
select *
from(
select DATEPART(year,s.sale_date) as SalesYear -- Avoid reserved words for object names
,sm.First_Name
,sum(s.price) as TotalSales
,row_number() over (partition by DATEPART(year,s.sale_date) -- Rank the data within the same year as this data row.
order by sum(s.price) desc -- Order by the sum total of sales price, with the largest first (Descending). This means that rank 1 is the highest amount.
) as SalesRank -- Orders your salesmen by the total sales within each year, with 1 as the best.
from Buyer b
inner join Sale s
on(b.ID = s.Buyer_ID)
inner join Salesman sm
on(sm.ID = b.Salesman_ID)
group by DATEPART(year,s.sale_date)
,sm.First_Name
) a
where SalesRank = 1 -- This means you only get the top salesman for each year.
First, never use commas in the FROM clause. Always use explicit JOIN syntax.
The answer to your question is to use window functions. If there is a tie and you wand all values, then RANK() or DENSE_RANK(). If you always want exactly one -- even if there are ties -- then ROW_NUMBER().
select ss.*
from (select year(s.sale_date) as yyyy, sm.First_Name, sum(s.price) as total_price,
row_number() over (partition by year(s.sale_date)
order by sum(s.price) desc
) as seqnum
from Salesman sm join
Sale s
on sm.ID = s.Salesman_ID
group by year(s.sale_date), sm.First_Name
) ss
where seqnum = 1;
Note that the Buyers table is unnecessary for this query.

how to match a value with SQL max(count) function?

I have a orderLine table looks like this
I would like to know which pizza is the best seller, and the quantity of pizza sold.
I've tried query:
select sum(quantity), pizza_name from order_line group by pizza_name;
it returns
which is almost what I want, But when I start adding Max function, it could not match the pizza name with the total quantity of pizza sold
For example:
select MAX(sum(quantity)), pizza_name from order_line group by pizza_name;
it returns following error:
"not a single-group group function"
I guess I could achieve this by using a sub-query, but I have no idea how to do this.
You don't need max for this. If you only want one pizza, then you can use order by and fetch first 1 row only (or something similar such as limit or top):
select sum(quantity), pizza_name
from order_line
group by pizza_name
order by sum(quantity)
fetch first 1 row only;
Or, if you want all such pizzas, use rank():
select p.*
from (select sum(quantity) as quantity, pizza_name,
rank() over (order by sum(quantity) desc) as seqnum
from order_line
group by pizza_name
) p
where seqnum = 1;
Both of the queries give the same desired result
SELECT PIZZA_NAME,
SUM(QUANTITY) "Total Quant"
FROM Order_line
GROUP BY PIZZA_NAME
ORDER BY "Total Quant" DESC
FETCH FIRST 1 row only;
SELECT PIZZA_NAME, "Total Quantity" FROM (
SELECT PIZZA_NAME,SUM(QUANTITY) "Total Quantity", RANK() OVER (ORDER BY SUM(QUANTITY) DESC) T FROM Order_line GROUP BY PIZZA_NAME
) query1 where query1.T=1 ;
You group by pizza_name to get sum(quantity) per pizza_name.
Then you aggregate again by using MAX on the quantity sum, but you don't specify which of the three pizza names to have in the result. You need an aggregate function on pizza_name as well, which you don't have. Hence the error.
If you want to use your query, you must apply the appropriate aggregation function on pizza_name, which is KEEP DENSE_RANK FIRST/LAST.
select
max(sum(quantity)),
max(pizza_name) keep (dense_rank last order by sum(quantity))
from order_line
group by pizza_name;
But on one hand Gordon's queries are more readable in my opinion. And on the other this double aggregation is Oracle specific and not SQL standard. Unexperienced readers may be confused that the query produces one result row in spite of the GROUP BY clause.

SQL: Sort Results of a Query Based on the Order of values from another Query

Suppose I have a table with 3 columns:
YEAR
COMPANY
DEPARTMENT
SALES
And I wanted to display the following:
COMPANY
DEPARTMENT
SUM(SALES)
The additional requirement being that the COMPANY with the Maximum Sales (across all departments) shows first, and the rest of the results ordered accordingly.
How would I go about writing my query?
I've tried the following, and it doesn't work:
SELECT t1.company,
cs1.deparment,
SUM(cs1.sales)
FROM company_sales cs1,
(SELECT cs2.company,
SUM(cs2.sales)
FROM company_sales cs2
WHERE cs2.company IS NOT NULL
GROUP BY cs2.company
ORDER BY 2 DESC) t1
WHERE cs1.company = t1.company
GROUP BY t1.company,
cs1.deparment;
You can do this using window functions:
select company, department, sum(sales)
from t
group by company, department
order by sum(sum(sales)) over (partition by company) desc, company;
You can also include the expression in the select clause to see the sum of sales for the entire company.
Try:
SELECT company,department,sum(sales)
FROM table GROUP BY company,department
ORDER BY Max(sales)