SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. It is included to avoid joins when reporting (the table is created for reporting only).
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers), like this:
select stylenumber, sum(sold) as sold, max(stock) as stockcount
from MGTest
group by stylenumber
I expect to get the style number, total sold, and most recent stock total:
A, 6, 15
B, 1, 6
But with Max(stock) I get 10, and with Sum(stock) I get 25.
I have also tried over(partition ...), without any luck.
How do I solve this?

I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;

The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date

If you don't want to use joins, you can do something like this:
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber

My solution, like that of Gordon Linoff, will use the window functions. But in my case, everything will turn around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
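To check the RANK-based query against the expected output, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for the original database (window functions need SQLite 3.25 or later). The sample rows are hypothetical: the question's table is not shown, so they were chosen only to reproduce the figures quoted there (6 sold, 10 with MAX, 25 with SUM, 15 on the most recent date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MGTest (RetailerId INTEGER, Stylenumber TEXT, "
    "Date TEXT, Sold INTEGER, Stock INTEGER)"
)
# Hypothetical sample data (not from the original question).
rows = [
    (1, "A", "2020-01-01", 2, 10),
    (1, "A", "2020-01-02", 3, 10),
    (2, "A", "2020-01-02", 1, 5),
    (1, "B", "2020-01-02", 1, 6),
]
conn.executemany("INSERT INTO MGTest VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT stylenumber, sold, SUM(stock) AS totalstock
FROM (
    SELECT stylenumber,
           -- total sold over all dates for the style
           SUM(sold) OVER (PARTITION BY stylenumber) AS sold,
           -- rank 1 marks every row on the style's most recent date
           RANK() OVER (PARTITION BY stylenumber ORDER BY Date DESC) AS r,
           stock
    FROM MGTest
) t
WHERE r = 1
GROUP BY stylenumber, sold
ORDER BY stylenumber
"""
result = conn.execute(query).fetchall()
print(result)
```

RANK (rather than ROW_NUMBER) matters here: all retailers reporting on the latest date tie at rank 1, so their stock values are summed together.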

Related

SQL How to select customers with highest transaction amount by state

I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
Attempts look something like this
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
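First, your JOIN needs to be correct: the subquery joins customers to itself instead of to transactions, which is what the error message is about. Second, you want to use window functions: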
SELECT ct.*
FROM (SELECT c.customer_id, c.customer_name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id, c.customer_name, c.state
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
As long as transact_amt can be NULL (it has not been ruled out), any sum may end up NULL as well. With an unsuspecting ORDER BY t.transact_sum DESC, those customers come out on top, because NULL sorts first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?
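The tie-breaking point above can be seen in a small sketch. It uses Python's sqlite3 module in place of PostgreSQL (an assumption made so the example is self-contained; SQLite 3.25+ is needed for window functions), and the customers and amounts are made up. The NULLS LAST part is left out because the sample has no NULL amounts. Only the top 1 per state is requested, to keep the data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_name TEXT, state TEXT);
CREATE TABLE transactions (customer_id INTEGER, transact_amt INTEGER);
-- Hypothetical data: Ann and Bob tie at 150 in NY.
INSERT INTO customers VALUES (1,'Ann','NY'), (2,'Bob','NY'),
                             (3,'Cid','NY'), (4,'Dee','CA');
INSERT INTO transactions VALUES (1,100), (1,50), (2,150), (3,20), (4,70);
""")

# Aggregate before the join, then number the customers within each state.
RANKED = """
SELECT c.state, c.customer_name, t.transact_sum,
       ROW_NUMBER() OVER (PARTITION BY c.state
                          ORDER BY t.transact_sum DESC, c.customer_id) AS rn,
       RANK()       OVER (PARTITION BY c.state
                          ORDER BY t.transact_sum DESC) AS rk
FROM (SELECT customer_id, SUM(transact_amt) AS transact_sum
      FROM transactions
      GROUP BY customer_id) t
JOIN customers c ON c.customer_id = t.customer_id
"""

def top_per_state(rank_column, n):
    """Return (state, customer_name) rows whose rank_column is <= n."""
    if rank_column not in ("rn", "rk"):
        raise ValueError("rank_column must be 'rn' or 'rk'")
    sql = (f"SELECT state, customer_name FROM ({RANKED}) sub "
           f"WHERE {rank_column} <= ? ORDER BY state, customer_name")
    return conn.execute(sql, (n,)).fetchall()

by_row_number = top_per_state("rn", 1)  # exactly one row per state
by_rank = top_per_state("rk", 1)        # ties are all kept
print(by_row_number)
print(by_rank)
```

With row_number() and the customer_id tiebreaker, NY always yields Ann; with rank(), both Ann and Bob are returned.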

Improve performance of select on select query using temp table

The table holds weekly product prices per country.
My goal is to select the lowest price of each product for the most recent week/year, per country.
The query below fulfills this goal, but it is pretty slow. I was wondering if there is a more efficient way of doing the same task.
In the first part I'm selecting the latest year and week of prices per country. I included the CASE WHEN to account for the new year.
I'm saving this in a #TempTable.
Then I am selecting the min price based on the previously selected year, week and country combination.
DECLARE @date DATE
SET @date = GETDATE()
SELECT YearNb, Max(WeekNb) AS WeekNb, ISOCountryCode INTO #TempTable FROM PriceBenchWeekly
WHERE PriceBenchWeekly.YearNb = CASE WHEN DATEPART(ww,@date) = 1 THEN
Year(@date)-1
ELSE
Year(@date)
END
GROUP BY YearNb, ISOCountryCode
SELECT ProdNb,Min(WeeklyPrice) AS MinPrice, MarketPlayerCode, 'MKT' AS PriceOriginTypeCode, NatCoCode
FROM CE.PriceBenchWeekly INNER JOIN #TempTable ON PriceBenchWeekly.YearNb = #TempTable.YearNb AND
PriceBenchWeekly.WeekNb = #TempTable.WeekNb AND PriceBenchWeekly.ISOCountryCode = #TempTable.ISOCountryCode
GROUP BY PriceBenchweekly.YearNb, PriceBenchWeekly.ISOCountryCode, BNCode, MarketPlayerCode
the table has weekly product prices for per country. My goal here is to select the lowest price of each product for the most recent week/year per country per product.
Use window functions. Without sample data and desired results, it is a little hard to figure out what you really want. But the following gets the minimum price for each product from the most recent week in the data:
select pbw.*
from (select pbw.*,
min(weeklyprice) over (partition by prodnb) as min_weeklyprice
from (select pbw.*,
dense_rank() over (order by yearnb desc, weeknb desc) as seqnum
from CE.PriceBenchWeekly pbw
) pbw
where seqnum = 1
) pbw
where weeklyprice = min_weeklyprice;
If you want to stay with temp tables, do not create the table with SELECT INTO. Use CREATE TABLE #TempTable instead; then you can create a nonclustered index on year, week and country code.
Anyway, I would prefer OUTER APPLY:
SELECT DISTINCT A.ProductCode, A.CountryCode, B.YearNo, B.WeekNo, B.MinPrice
FROM YourTable A
OUTER APPLY (
SELECT TOP 1 YearNo, WeekNo, Min(Price) AS MinPrice
FROM YourTable
WHERE ProductCode = A.ProductCode AND CountryCode = A.CountryCode
GROUP BY YearNo, WeekNo
ORDER BY YearNo DESC, WeekNo DESC
) B
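The dense_rank() idea from the first answer can also be applied per country, which is what the question asks for. The following is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the column names are taken from the question's query, and the rows are invented. It does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PriceBenchWeekly (YearNb INTEGER, WeekNb INTEGER, "
    "ISOCountryCode TEXT, ProdNb TEXT, MarketPlayerCode TEXT, WeeklyPrice REAL)"
)
# Hypothetical rows: DE has data up to week 52, FR only up to week 50.
rows = [
    (2023, 51, "DE", "P1", "M1", 10.0),
    (2023, 52, "DE", "P1", "M1", 12.0),
    (2023, 52, "DE", "P1", "M2", 11.0),
    (2023, 50, "FR", "P1", "M1", 9.0),
    (2023, 50, "FR", "P2", "M1", 7.5),
]
conn.executemany("INSERT INTO PriceBenchWeekly VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT ISOCountryCode, ProdNb, MIN(WeeklyPrice) AS MinPrice
FROM (
    SELECT p.*,
           -- seqnum = 1 marks each country's most recent year/week
           DENSE_RANK() OVER (PARTITION BY ISOCountryCode
                              ORDER BY YearNb DESC, WeekNb DESC) AS seqnum
    FROM PriceBenchWeekly p
) ranked
WHERE seqnum = 1
GROUP BY ISOCountryCode, ProdNb
ORDER BY ISOCountryCode, ProdNb
"""
latest_min_prices = conn.execute(query).fetchall()
print(latest_min_prices)
```

Note that the cheaper DE price from week 51 is ignored, because only each country's latest week is considered. The table is read once, with no temp table or self-join.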

How do I proceed on this query

I want to know if there's a way to display more than one column on an aggregate result but without it affecting the group by.
I need to display the name alongside an aggregate result, but I have no idea what I am missing here.
This is the data I'm working with:
It is the result of the following query:
select * from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
I need to find the salesman that sold the most stuff (total price) for a specific year.
This is what I have so far:
select DATEPART(year,sale.sale_date)'year', Salesman.First_Name,sum(sale.price)
from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
group by DATEPART(year,sale.sale_date),Salesman.First_Name
This returns me the total sales made by each salesman.
How do I continue from here to get the top salesman of each year?
Maybe the query I am doing is completely wrong and there is a better way?
Any advice would be helpful.
Thanks.
This should work for you:
select *
from(
select DATEPART(year,s.sale_date) as SalesYear -- Avoid reserved words for object names
,sm.First_Name
,sum(s.price) as TotalSales
,row_number() over (partition by DATEPART(year,s.sale_date) -- Rank the data within the same year as this data row.
order by sum(s.price) desc -- Order by the sum total of sales price, with the largest first (Descending). This means that rank 1 is the highest amount.
) as SalesRank -- Orders your salesmen by the total sales within each year, with 1 as the best.
from Buyer b
inner join Sale s
on(b.ID = s.Buyer_ID)
inner join Salesman sm
on(sm.ID = b.Salesman_ID)
group by DATEPART(year,s.sale_date)
,sm.First_Name
) a
where SalesRank = 1 -- This means you only get the top salesman for each year.
First, never use commas in the FROM clause. Always use explicit JOIN syntax.
The answer to your question is to use window functions. If there is a tie and you want all tied values, use RANK() or DENSE_RANK(). If you always want exactly one row, even if there are ties, use ROW_NUMBER().
select ss.*
from (select year(s.sale_date) as yyyy, sm.First_Name, sum(s.price) as total_price,
row_number() over (partition by year(s.sale_date)
order by sum(s.price) desc
) as seqnum
from Salesman sm join
Sale s
on sm.ID = s.Salesman_ID
group by year(s.sale_date), sm.First_Name
) ss
where seqnum = 1;
Note that the Buyer table is unnecessary for this query, assuming Sale has its own Salesman_ID column.
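For a quick check of the "top salesman per year" pattern, here is a minimal sketch. It is written for SQLite via Python's sqlite3 module (so strftime('%Y', ...) replaces DATEPART/YEAR), it joins through Buyer as in the question, and it aggregates in a CTE before ranking. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salesman (ID INTEGER PRIMARY KEY, First_Name TEXT);
CREATE TABLE Buyer (ID INTEGER PRIMARY KEY, Salesman_ID INTEGER);
CREATE TABLE Sale (Buyer_ID INTEGER, sale_date TEXT, price INTEGER);
-- Hypothetical data.
INSERT INTO Salesman VALUES (1,'Ann'), (2,'Bob');
INSERT INTO Buyer VALUES (10,1), (11,2);
INSERT INTO Sale VALUES (10,'2019-03-01',100), (10,'2019-06-01',50),
                        (11,'2019-04-01',120), (11,'2020-01-15',80),
                        (10,'2020-02-01',30);
""")

query = """
WITH totals AS (
    -- total sales per salesman per year
    SELECT strftime('%Y', s.sale_date) AS SalesYear,
           sm.First_Name,
           SUM(s.price) AS TotalSales
    FROM Buyer b
    JOIN Sale s ON b.ID = s.Buyer_ID
    JOIN Salesman sm ON sm.ID = b.Salesman_ID
    GROUP BY strftime('%Y', s.sale_date), sm.First_Name
),
ranked AS (
    -- rank 1 is the largest total within each year
    SELECT totals.*,
           ROW_NUMBER() OVER (PARTITION BY SalesYear
                              ORDER BY TotalSales DESC) AS SalesRank
    FROM totals
)
SELECT SalesYear, First_Name, TotalSales
FROM ranked
WHERE SalesRank = 1
ORDER BY SalesYear
"""
top_salesmen = conn.execute(query).fetchall()
print(top_salesmen)
```

Swapping ROW_NUMBER() for RANK() would return every salesman tied for first place in a year instead of an arbitrary one.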

Is it possible to calculate the sum of each group in a table without using group by clause

I am trying to find out if there is any way to aggregate sales for each product. I realise I can achieve it either by using a GROUP BY clause or by writing a procedure.
example:
Table name: Details
Sales Product
10 a
20 a
4 b
12 b
3 b
5 c
Is there a way to perform the following query without using GROUP BY?
select
product,
sum(sales)
from
Details
group by
product
having
sum(sales) > 20
I realize it is possible using Procedure, could it be done in any other way?
You could do
SELECT product,
(SELECT SUM(sales) FROM details x where x.product = a.product) sales
from Details a;
(and wrap it into another select to simulate the HAVING).
It's possible to use analytic functions to do the sum calculation, and then wrap that with another query to do your filtering.
Here is an example against a Posts table, where OwnerUserId plays the role of product:
select
running_sum,
OwnerUserId
from (
select
id,
score,
OwnerUserId,
sum(score) over (partition by OwnerUserId order by Id) running_sum,
last_value(id) over (partition by OwnerUserId order by OwnerUserId) last_id
from
Posts
where
OwnerUserId in (2934433, 10583)
) inner_q
where inner_q.id = inner_q.last_id
--and running_sum > 20;
We keep a running sum going on the partition of the owner (product), and we tally up the last id for the same window, which is the ID we'll use to get the total sum. Wrap it all up with another query to make sure you get the "last id", take the sum, and then do any filtering you want on the result.
This is an extremely round-about way to avoid using GROUP BY though.
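Applied to the Details table from the question, both GROUP BY-free approaches can be written more compactly. The sketch below assumes SQLite (through Python's sqlite3 module, version 3.25+ for the window function) and uses the six rows listed in the question; DISTINCT collapses the per-row results, and the outer WHERE simulates HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Details (sales INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO Details VALUES (?, ?)",
    [(10, "a"), (20, "a"), (4, "b"), (12, "b"), (3, "b"), (5, "c")],
)

# Variant 1: analytic SUM over a partition, filtered in an outer query.
window_query = """
SELECT DISTINCT product, total
FROM (SELECT product,
             SUM(sales) OVER (PARTITION BY product) AS total
      FROM Details) t
WHERE total > 20
"""

# Variant 2: correlated scalar subquery, wrapped the same way.
subquery_query = """
SELECT DISTINCT product, total
FROM (SELECT a.product,
             (SELECT SUM(x.sales) FROM Details x
              WHERE x.product = a.product) AS total
      FROM Details a) t
WHERE total > 20
"""

window_result = conn.execute(window_query).fetchall()
subquery_result = conn.execute(subquery_query).fetchall()
print(window_result, subquery_result)
```

Only product a (10 + 20 = 30) passes the filter; b sums to 19 and c to 5.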
If you don't want nested select statements (which can run slower), use CASE:
select
sum(case
when c.qty > 20
then c.qty
else 0
end) as mySum
from Sales.CustOrders c

Write an Oracle query to get top 10 products for top 5000 stores [duplicate]

Closed 12 years ago.
Possible Duplicate:
Get top 10 products for every category
I am looking for an Oracle query to get the top 5000 stores, and for each store the top 10 products, and for each of those products the top 5 sub-products. So in total I should get 5000*10*5 rows.
Can someone help me get this using Oracle's analytic functions?
My current query looks like
SELECT
store_id,
product,
sub_product,
count(*) as sales
FROM stores_data
GROUP BY store_id, product, sub_product;
Please assume the table name is stores_data, with columns store_id, product, sub_product.
You should use dense_rank to get the top N rows.
Something like
SELECT
storeid,
store,
productid,
product,
subproductid,
subproduct
FROM
(
SELECT
s.storeid,
s.store,
p.productid,
p.product,
sp.subproductid,
sp.subproduct,
dense_rank() over ( order by s.storeid) as storerank,
dense_rank() over ( partition by s.storeid
order by p.productid) as productrank,
dense_rank() over ( partition by s.storeid, p.productid
order by sp.subproductid) as subproductrank
FROM
stores s
INNER JOIN products p on p.storeid = s.storeid
INNER JOIN subproduct sp on sp.productid = p.productid
) t
WHERE
t.storerank <= 5000 and
t.productrank <= 10 and
t.subproductrank <= 5
Of course, I don't know your tables or the relations between them, nor the actual fields and conditions you want to check, so this is just a simple query getting the top N records based on their id. Also, this query expects a product to belong to only one store, which might not be the case. At least it will show you how to use dense_rank to get a three-layered sorting/filtering.
I'll leave the other answer because that looks more like how such a table structure should be, I think.
But you described in your other thread to have a table that looks like this:
create table store_data (
store varchar2(40),
product varchar2(40),
subproduct varchar2(40),
sales int);
That actually looks like data that is already aggregated and that you now want to analyze again. Your query could look like this. It first aggregates the sum of the sales, so you can order shops and products by sales too (the sales in the table seem to be for the subproducts). After that, you can add ranks to the shops and products by sales. I added a rank to the subproducts too. I used rank here, so there is a gap in the numbering when several records have the same sales. This way, when you get 8 records with a rank of 1 because they all have the same sales, the 9th record will actually have rank 9 instead of 2, so you will only select the 8 top stores (you wanted 5, but why skip the other 3 if they sold exactly the same?) and not 4 others too.
select
ts.*
from
(
select
ss.*,
rank() over (order by storesales desc) as storerank,
rank() over (partition by store order by productsales desc) as productrank,
rank() over (partition by store, product order by subproductsales desc) as subproductrank
from
(
select
sd.*,
sum(sales) over (partition by store) as STORESALES,
sum(sales) over (partition by store, product) as PRODUCTSALES,
sum(sales) over (partition by store, product, subproduct) as SUBPRODUCTSALES
from
store_data sd
) ss
) ts
where
ts.storerank <= 2 and
ts.productrank <= 3 and
ts.subproductrank <= 4
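As a way to experiment with the three-level filtering, here is a small sketch on SQLite via Python's sqlite3 module (an assumption; the question is about Oracle, though the window functions used here exist in both). It is a variation on the query above, not the same query: it ranks in descending order of sales and uses DENSE_RANK instead of RANK, because each store spans several rows and a gapless rank keeps "top 2 stores" meaning two distinct sales totals. The data is invented and assumes one row per store/product/subproduct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_data (store TEXT, product TEXT, "
    "subproduct TEXT, sales INTEGER)"
)
# Hypothetical data: store totals are S1 = 18, S2 = 17, S3 = 1.
rows = [
    ("S1", "P1", "X", 10),
    ("S1", "P1", "Y", 5),
    ("S1", "P2", "X", 3),
    ("S2", "P1", "X", 8),
    ("S2", "P2", "Y", 9),
    ("S3", "P1", "X", 1),
]
conn.executemany("INSERT INTO store_data VALUES (?, ?, ?, ?)", rows)

query = """
SELECT store, product, subproduct, sales
FROM (
    SELECT ss.*,
           DENSE_RANK() OVER (ORDER BY storesales DESC) AS storerank,
           DENSE_RANK() OVER (PARTITION BY store
                              ORDER BY productsales DESC) AS productrank,
           DENSE_RANK() OVER (PARTITION BY store, product
                              ORDER BY sales DESC) AS subproductrank
    FROM (
        SELECT sd.*,
               SUM(sales) OVER (PARTITION BY store) AS storesales,
               SUM(sales) OVER (PARTITION BY store, product) AS productsales
        FROM store_data sd
    ) ss
) ts
-- top 2 stores, top 1 product per store, top 1 subproduct per product
WHERE storerank <= 2 AND productrank <= 1 AND subproductrank <= 1
ORDER BY store
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

With DENSE_RANK, stores or products that tie on sales share a rank, so the result can contain more than N entries at a level when ties occur.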