Improve performance of select on select query using temp table - sql

The table holds weekly product prices per country.
My goal is to select the lowest price of each product for the most recent week/year per country.
The query below achieves this, but it is quite slow. I was wondering if there is a more efficient way of doing the same task.
In the first part I'm selecting the latest year and week of prices per country. I included the CASE WHEN to account for the turn of the year.
I'm saving this in #TempTable.
Then I select the minimum price based on the previously selected year, week and country combination:
DECLARE @date DATE;
SET @date = GETDATE();
-- Latest week per country for the relevant year
SELECT YearNb, MAX(WeekNb) AS WeekNb, ISOCountryCode
INTO #TempTable
FROM PriceBenchWeekly
WHERE PriceBenchWeekly.YearNb = CASE WHEN DATEPART(ww, @date) = 1 THEN YEAR(@date) - 1
                                     ELSE YEAR(@date)
                                END
GROUP BY YearNb, ISOCountryCode;
-- Minimum price for that year/week/country combination
SELECT ProdNb, MIN(WeeklyPrice) AS MinPrice, MarketPlayerCode, 'MKT' AS PriceOriginTypeCode, NatCoCode
FROM CE.PriceBenchWeekly
INNER JOIN #TempTable ON PriceBenchWeekly.YearNb = #TempTable.YearNb
    AND PriceBenchWeekly.WeekNb = #TempTable.WeekNb
    AND PriceBenchWeekly.ISOCountryCode = #TempTable.ISOCountryCode
GROUP BY PriceBenchWeekly.YearNb, PriceBenchWeekly.ISOCountryCode, BNCode, MarketPlayerCode

"The table has weekly product prices per country. My goal here is to select the lowest price of each product for the most recent week/year per country per product."
Use window functions. Without sample data and desired results, it is a little hard to figure out what you really want. But the following gets the minimum price for each product from the most recent week in the data:
select pbw.*
from (select pbw.*,
             min(weeklyprice) over (partition by prodnb) as min_weeklyprice
      from (select pbw.*,
                   dense_rank() over (order by YearNb desc, WeekNb desc) as seqnum
            from CE.PriceBenchWeekly pbw
           ) pbw
      where seqnum = 1
     ) pbw
where weeklyprice = min_weeklyprice;
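To make the ranking idea concrete, here is a minimal runnable sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; window functions need SQLite 3.25 or newer), the sample rows are invented, and it additionally partitions by ISOCountryCode because the question asks for the latest week per country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PriceBenchWeekly (
    ProdNb TEXT, ISOCountryCode TEXT, YearNb INT, WeekNb INT, WeeklyPrice REAL)""")

# Hypothetical sample rows, not taken from the question.
rows = [
    ("P1", "DE", 2023, 52, 9.0),   # older week, should be ignored
    ("P1", "DE", 2024, 2, 11.0),
    ("P1", "DE", 2024, 2, 10.0),   # lowest P1 price in DE's latest week
    ("P2", "DE", 2024, 2, 5.0),
    ("P1", "FR", 2023, 52, 7.0),   # older week, should be ignored
    ("P1", "FR", 2024, 1, 8.0),    # FR's latest week is week 1
]
conn.executemany("INSERT INTO PriceBenchWeekly VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT ISOCountryCode, ProdNb, MIN(WeeklyPrice) AS MinPrice
FROM (SELECT pbw.*,
             -- seqnum = 1 marks every row of the newest year/week in each country
             DENSE_RANK() OVER (PARTITION BY ISOCountryCode
                                ORDER BY YearNb DESC, WeekNb DESC) AS seqnum
      FROM PriceBenchWeekly pbw) ranked
WHERE seqnum = 1
GROUP BY ISOCountryCode, ProdNb
ORDER BY ISOCountryCode, ProdNb
"""
result = conn.execute(query).fetchall()
print(result)
```

The table is read once, so no temp table or self-join is needed; whether that is actually faster on the real data depends on indexes and volume.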

If you want to stay with temp tables, create the table explicitly with CREATE TABLE #TempTable rather than SELECT INTO; you can then add a nonclustered index on year, week and country code.
That said, I would prefer OUTER APPLY:
SELECT DISTINCT A.ProductCode, A.CountryCode, B.YearNo, B.WeekNo, B.MinPrice
FROM YourTable A
OUTER APPLY (
    -- Latest year/week for this product and country, with its minimum price
    SELECT TOP 1 T.YearNo, T.WeekNo, MIN(T.Price) AS MinPrice
    FROM YourTable T
    WHERE T.ProductCode = A.ProductCode AND T.CountryCode = A.CountryCode
    GROUP BY T.YearNo, T.WeekNo
    ORDER BY T.YearNo DESC, T.WeekNo DESC
) B

Related

SQL - Pull distinct row based on max value

I am trying to pull the most recent sale amount for each salesperson. The salespeople have made a sale on multiple days, I only want the most recent one.
My attempt below:
SELECT salesperson, amount
FROM table
WHERE date = (SELECT MAX(date) FROM table);
Use a correlated subquery:
SELECT t.salesperson, t.amount
FROM table t
WHERE t.date = (SELECT MAX(t1.date)
FROM table t1
WHERE t1.salesperson = t.salesperson -- for each salesperson
);
If you are using PostgreSQL, you can take advantage of DISTINCT ON:
SELECT DISTINCT ON (salesperson) salesperson, amount
FROM table t
ORDER BY salesperson, date DESC
This will return only one row for each salesperson. The ORDER BY clause says to return the one with the largest date for that salesperson.
Unfortunately, DISTINCT ON is a PostgreSQL extension rather than standard SQL, so most other databases do not support it.
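Where DISTINCT ON is unavailable, ROW_NUMBER() gives the same "one row per salesperson" result. The sketch below is illustrative only: it runs against SQLite through Python's sqlite3 module (SQLite 3.25 or newer), and the table name sales, the column sale_date and the rows are assumptions made to avoid the reserved words table and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, sale_date TEXT, amount REAL)")

# Hypothetical rows; ISO-formatted dates sort correctly as text.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("alice", "2024-01-05", 100.0),
    ("alice", "2024-02-01", 250.0),   # alice's most recent sale
    ("bob", "2024-01-20", 75.0),      # bob's most recent sale
    ("bob", "2024-01-10", 300.0),
])

latest = conn.execute("""
SELECT salesperson, amount
FROM (SELECT salesperson, amount,
             -- rn = 1 is the newest sale within each salesperson
             ROW_NUMBER() OVER (PARTITION BY salesperson
                                ORDER BY sale_date DESC) AS rn
      FROM sales) ranked
WHERE rn = 1
ORDER BY salesperson
""").fetchall()
print(latest)
```

Like DISTINCT ON, this keeps exactly one row per salesperson; if two sales share the latest date, which one survives is arbitrary unless the ORDER BY gets a tie-breaker.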

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
I expect to get style number, total sold, and most recent stock total:
A, 6, 15
B, 1, 6
But using MAX(stock) I get 10, and with SUM(stock) I get 25.
I have also tried OVER (PARTITION BY ...), without any luck.
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
If you don't want to use joins, you can do something like this:
SELECT B.Stylenumber, SUM(B.Sold), SUM(B.Stock)
FROM (SELECT Stylenumber AS 'Stylenumber', SUM(Sold) AS 'Sold', MAX(Stock) AS 'Stock'
      FROM MGTest A
      GROUP BY RetailerId, Stylenumber) B
GROUP BY B.Stylenumber
My solution, like Gordon Linoff's, uses window functions, but here everything revolves around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
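As a quick check of the RANK-based query, the following sketch runs it against SQLite via Python's sqlite3 module (an assumption; SQLite 3.25 or newer). The original sample table was not preserved, so the rows below are invented to be consistent with the expected output quoted in the question (A, 6, 15 and B, 1, 6).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE MGTest (
    RetailerId INT, Stylenumber TEXT, [Date] TEXT, Sold INT, Stock INT)""")

# Hypothetical rows: style A is stocked by two retailers, style B by one.
conn.executemany("INSERT INTO MGTest VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2024-01-01", 3, 10),
    (1, "A", "2024-01-02", 2, 10),
    (2, "A", "2024-01-02", 1, 5),
    (1, "B", "2024-01-02", 1, 6),
])

totals = conn.execute("""
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
    SELECT
        stylenumber,
        SUM(sold) OVER (PARTITION BY stylenumber) sold,               -- total sold per style
        RANK() OVER (PARTITION BY stylenumber ORDER BY [Date] DESC) r, -- 1 = latest date
        stock
    FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
ORDER BY stylenumber
""").fetchall()
print(totals)
```

Note that this only sums stock for retailers that have a row on the style's latest date; a retailer whose last row is older would be left out.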

How do I proceed on this query

I want to know if there's a way to display more than one column on an aggregate result but without it affecting the group by.
I need to display the name alongside an aggregate result, but I have no idea what I am missing here.
This is the data I'm working with:
It is the result of the following query:
select * from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
I need to find the salesman that sold the most stuff (total price) for a specific year.
This is what I have so far:
select DATEPART(year,sale.sale_date)'year', Salesman.First_Name,sum(sale.price)
from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
group by DATEPART(year,sale.sale_date),Salesman.First_Name
This returns me the total sales made by each salesman.
How do I continue from here to get the top salesman of each year?
Maybe the query I am doing is completely wrong and there is a better way?
Any advice would be helpful.
Thanks.
This should work for you:
select *
from(
select DATEPART(year,s.sale_date) as SalesYear -- Avoid reserved words for object names
,sm.First_Name
,sum(s.price) as TotalSales
,row_number() over (partition by DATEPART(year,s.sale_date) -- Rank the data within the same year as this data row.
order by sum(s.price) desc -- Order by the sum total of sales price, with the largest first (Descending). This means that rank 1 is the highest amount.
) as SalesRank -- Orders your salesmen by the total sales within each year, with 1 as the best.
from Buyer b
inner join Sale s
on(b.ID = s.Buyer_ID)
inner join Salesman sm
on(sm.ID = b.Salesman_ID)
group by DATEPART(year,s.sale_date)
,sm.First_Name
) a
where SalesRank = 1 -- This means you only get the top salesman for each year.
First, never use commas in the FROM clause. Always use explicit JOIN syntax.
The answer to your question is to use window functions. If there is a tie and you want all tied rows, use RANK() or DENSE_RANK(). If you always want exactly one row, even if there are ties, use ROW_NUMBER().
select ss.*
from (select year(s.sale_date) as yyyy, sm.First_Name, sum(s.price) as total_price,
row_number() over (partition by year(s.sale_date)
order by sum(s.price) desc
) as seqnum
from Salesman sm join
Sale s
on sm.ID = s.Salesman_ID
group by year(s.sale_date), sm.First_Name
) ss
where seqnum = 1;
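Note that this version skips the Buyer table, which only works if Sale carries its own Salesman_ID column; with the joins shown in the question, the link from Salesman to Sale goes through Buyer.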

Product sales by month - SQL

I just created a small data warehouse with the following details.
Fact table: Sales
Dimensions: Supplier, Products, Time (range is one year), Stores
I want to find which product has the highest sales in each month. The output should look like this:
Month - Product Code - Num_Of_Items
JAN xxxx xxxxx
FEB xxxx xxxxx
I tried the following query
with product_sales as(
SELECT dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num
FROM fact_sales fs
INNER JOIN dim_products dp
ON fs.p_id = dp.p_id
INNER JOIN dim_date dd
ON dd.date_id = fs.date_id
GROUP BY dd.month,
fs.p_id,
dp.title
)
select distinct month,movie_id,max(num)
from product_sales
group by movie_id,title, month;
Instead of at most 12 rows, I am getting 132 records. I need guidance with this. Thanks.
There are a few things about your query that don't make sense, such as:
Where does movie_id come from?
What is from abc? Should it be from product_sales?
That said, if you need the maximum product sales by month and you need to include the product code (or movie ID or whatever), you need an analytical query. Yours would go something like this:
WITH product_sales AS (
SELECT
dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num,
RANK() OVER (PARTITION BY dd.month ORDER BY SUM(number_of_items) DESC) NumRank
FROM fact_sales fs
INNER JOIN dim_products dp ON fs.p_id = dp.p_id
INNER JOIN dim_date dd ON dd.date_id = fs.date_id
GROUP BY dd.month, fs.p_id, dp.title
)
SELECT month, p_id, title, num
FROM product_sales
WHERE NumRank = 1
Note that if there's a tie for top sales in any month, this query will show all top sales for the month. In other words, if product codes AAAA and BBBB are tied for top sales in January, the query results will have a January row for both products.
If you want just one row per month even if there's a tie, use ROW_NUMBER instead of RANK(), but note that ROW_NUMBER will arbitrarily pick a winner unless you define a tie-breaker. For example, to have the lowest p_id be the tie-breaker, define the NumRank column like this:
ROW_NUMBER() OVER (
    PARTITION BY dd.month
    ORDER BY SUM(number_of_items) DESC, fs.p_id  -- qualified: p_id exists in both joined tables
) NumRank
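The difference between the two functions on a tie can be seen in a small sketch. It runs on SQLite through Python's sqlite3 module (an assumption; SQLite 3.25 or newer) and, to stay short, starts from an already aggregated, made-up product_sales table rather than the fact and dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (month TEXT, p_id TEXT, num INT)")

# Hypothetical monthly totals; AAAA and BBBB tie for the top spot in JAN.
conn.executemany("INSERT INTO product_sales VALUES (?, ?, ?)", [
    ("JAN", "AAAA", 10),
    ("JAN", "BBBB", 10),
    ("JAN", "CCCC", 4),
    ("FEB", "AAAA", 3),
    ("FEB", "BBBB", 7),
])

TEMPLATE = """
SELECT month, p_id, num
FROM (SELECT month, p_id, num,
             {fn} OVER (PARTITION BY month ORDER BY {order}) AS NumRank
      FROM product_sales) ranked
WHERE NumRank = 1
ORDER BY month, p_id
"""

# RANK() gives every tied top seller rank 1.
with_ties = conn.execute(
    TEMPLATE.format(fn="RANK()", order="num DESC")).fetchall()

# ROW_NUMBER() with p_id as tie-breaker keeps exactly one row per month.
one_per_month = conn.execute(
    TEMPLATE.format(fn="ROW_NUMBER()", order="num DESC, p_id")).fetchall()

print(with_ties)
print(one_per_month)
```

With the tie-breaker in place the ROW_NUMBER() result is deterministic: the lowest p_id wins among equal totals.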
In Oracle, you can use MAX() KEEP (DENSE_RANK FIRST ORDER BY ...) to select the movie_id with the max value of num:
...
select
month,
MAX(movie_id) KEEP (DENSE_RANK FIRST order by num desc) as movie_id,
MAX(num)
from
abc
group by month
;

SQL join using TOP condition in subquery

I'm writing a stock program to improve my programming skills and I've hit a roadblock.
These are the tables I'm working with:
**stocks**
---------
id
name
symbol
ipo_year
sector
industry
**stock_trends**
----------------
stock_id
trend_id
direction_id
date
price
breakout_price
**trends**
----------
id
type
An entry is made in the stock_trends table for a stock when the condition of one of my four trends is met.
What I'm looking to do is create a query that will return all the information from the stock table and the date from the stock_trends table where the most recent entry in stock_trends for that stock is the trend_id I'm interested in looking at.
I have this query, which works great and returns the most recent entry of a given trend for a single stock:
SELECT top 1 stock_id, trend_id, [timestamp], price, breakout_price from stock_trends
WHERE stock_id = @stock_id and trend_id = @trend_id order by [timestamp] desc
I just haven't been able to figure out how to write a query that returns the stocks whose top entry in the stock_trends table is the trend I wish to analyze.
Thanks in advance for your help!
Edit
So I have made some progress and I'm almost there. I'm using this query to return the max "timestamp" (it's really a date, just have to fix it) for each stock.
select s.*, v.latest_trend_date from stocks s
join(select stock_id, MAX(timestamp) as latest_trend_date from stock_trends st
group by st.stock_id) v on v.stock_id = s.id
Now if I could only find a way to determine which trend_id "latest_trend_date" is associated with I would be all set!
select stock_id
from stock_trends
where trend_id = (select top 1 trend_id
from stock_trends
order by [timestamp] desc)
This will select all the stock_id that are in the stock_trends table, with the same trend_id as the most recent entry in the stock_trends table.
See if something like this works:
with TrendsRanked as (
    select
        *,
        rank() over (
            partition by stock_id
            order by [date] desc
        ) as daterank_by_stock
    from stock_trends
)
select
    S.id, S.name, S.symbol,
    T.[date]
from stocks as S
join TrendsRanked as T
    on T.stock_id = S.id
where T.daterank_by_stock = 1
    and T.trend_id = @my_trend
The idea here is to add a date ranking to the stock_trends table: for a given stock, daterank_by_stock will equal 1 for the most recent stock_trends row (including ties) for that stock.
Then in the main query, the only results will be those that match the trend you're following (@my_trend) for a row in stock_trends ranked #1.
This gives what I think you want - stock information for stocks whose latest stock_trends entry happens to be an entry for the trend you're following.
I'm assuming that [timestamp] in your original query is "date" from your table model.
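Here is a small runnable sketch of that ranking approach. It uses SQLite via Python's sqlite3 module in place of SQL Server (an assumption; SQLite 3.25 or newer), passes the trend as a bound parameter instead of a T-SQL variable, and all stocks, symbols and dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (id INTEGER PRIMARY KEY, name TEXT, symbol TEXT);
CREATE TABLE stock_trends (stock_id INTEGER, trend_id INTEGER, [date] TEXT);

-- Hypothetical data
INSERT INTO stocks VALUES (1, 'Acme', 'ACME'), (2, 'Beta', 'BETA'), (3, 'Gamma', 'GAMA');
INSERT INTO stock_trends VALUES
    (1, 1, '2024-01-01'), (1, 2, '2024-02-01'),   -- stock 1: latest entry is trend 2
    (2, 2, '2024-01-15'), (2, 1, '2024-03-01'),   -- stock 2: latest entry is trend 1
    (3, 2, '2024-02-10');                         -- stock 3: latest entry is trend 2
""")

my_trend = 2  # the trend being analysed

matches = conn.execute("""
WITH TrendsRanked AS (
    SELECT *,
           RANK() OVER (PARTITION BY stock_id ORDER BY [date] DESC) AS daterank_by_stock
    FROM stock_trends
)
SELECT S.id, S.symbol, T.[date]
FROM stocks AS S
JOIN TrendsRanked AS T ON T.stock_id = S.id
WHERE T.daterank_by_stock = 1   -- only each stock's most recent entry
  AND T.trend_id = ?            -- and only if that entry is the chosen trend
ORDER BY S.id
""", (my_trend,)).fetchall()
print(matches)
```

Stock 2 is excluded even though it has an older trend 2 entry, because its most recent entry belongs to a different trend.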
select s.*, st1.trend_id, st1.timestamp
from stocks as s
inner join (
    select top 1 stock_id, trend_id, [timestamp] as timestamp
    from stock_trends
    where stock_id = @stock_id and trend_id = @trend_id
    order by [timestamp] desc
) as st1
on s.id = st1.stock_id
Putting the sub-query in as a joined in-line view lets you easily include the date in the results, as your specification asks:
"... return all the information from the stock table and the date from the stock_trends table ..."