Product sales by month - SQL

I just created a small data warehouse with the following details.
Fact Table
Sales
Dimensions
Supplier
Products
Time (Range is one year)
Stores
I want to find which product has the highest sales in each month. The output should look like this:
Month - Product Code - Num_Of_Items
JAN xxxx xxxxx
FEB xxxx xxxxx
I tried the following query
with product_sales as(
SELECT dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num
FROM fact_sales fs
INNER JOIN dim_products dp
ON fs.p_id = dp.p_id
INNER JOIN dim_date dd
ON dd.date_id = fs.date_id
GROUP BY dd.month,
fs.p_id,
dp.title
)
select distinct month,movie_id,max(num)
from product_sales
group by movie_id,title, month;
Instead of at most 12 rows, I am getting 132 records. I need guidance with this. Thanks.

There are a few things about your query that don't make sense, such as:
Where does movie_id come from?
What is from abc? Should it be from product_sales?
That said, if you need the maximum product sales by month and you need to include the product code (or movie ID or whatever), you need an analytical query. Yours would go something like this:
WITH product_sales AS (
SELECT
dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num,
RANK() OVER (PARTITION BY dd.month ORDER BY SUM(number_of_items) DESC) NumRank
FROM fact_sales fs
INNER JOIN dim_products dp ON fs.p_id = dp.p_id
INNER JOIN dim_date dd ON dd.date_id = fs.date_id
GROUP BY dd.month, fs.p_id, dp.title
)
SELECT month, p_id, title, num
FROM product_sales
WHERE NumRank = 1
Note that if there's a tie for top sales in any month, this query will show all top sales for the month. In other words, if product codes AAAA and BBBB are tied for top sales in January, the query results will have a January row for both products.
If you want just one row per month even if there's a tie, use ROW_NUMBER instead of RANK(), but note that ROW_NUMBER will arbitrarily pick a winner unless you define a tie-breaker. For example, to have the lowest p_id be the tie-breaker, define the NumRank column like this:
ROW_NUMBER() OVER (
PARTITION BY dd.month
ORDER BY SUM(number_of_items) DESC, fs.p_id
) NumRank
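
To see the difference between RANK() and ROW_NUMBER() on a tie, here is a minimal sketch that runs both variants from Python against an in-memory SQLite database (assuming SQLite 3.25 or later, which supports window functions). The product_sales table here is a hypothetical, already aggregated stand-in for the CTE above, and the data is made up so that products 1 and 2 tie in January.

```python
import sqlite3

# Hypothetical sample data: monthly totals per product, already aggregated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (month TEXT, p_id INTEGER, num INTEGER);
    INSERT INTO product_sales VALUES
        ('JAN', 1, 50), ('JAN', 2, 50), ('JAN', 3, 10),
        ('FEB', 1, 20), ('FEB', 2, 35);
""")


def top_per_month(rank_expr):
    """Return the rows ranked first in each month under the given window expression."""
    sql = f"""
        SELECT month, p_id, num
        FROM (SELECT month, p_id, num, {rank_expr} AS num_rank
              FROM product_sales)
        WHERE num_rank = 1
        ORDER BY month, p_id
    """
    return conn.execute(sql).fetchall()


# RANK() keeps every product tied for first place in a month.
with_ties = top_per_month("RANK() OVER (PARTITION BY month ORDER BY num DESC)")
# ROW_NUMBER() with p_id as tie-breaker keeps exactly one row per month.
one_each = top_per_month(
    "ROW_NUMBER() OVER (PARTITION BY month ORDER BY num DESC, p_id)"
)

print(with_ties)
print(one_each)
```

With this data the RANK() variant returns two January rows, while the ROW_NUMBER() variant returns only the one with the lowest p_id.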

You can use MAX() KEEP (DENSE_RANK FIRST ORDER BY ...) to select the movie_id with the max value of num
...
select
month,
MAX(movie_id) KEEP (DENSE_RANK FIRST order by num desc) as movie_id,
MAX(num)
from
abc
group by month
;

Related

How to show, for every user, the last two payment dates and the sum of amounts?

I have 2 tables. From table 1, I'm trying to extract the last 2 tax dates per user, for users who were last taxed on 19/06/2022 and who have product id 12 in table 2. I also need the sum of the tax amounts and the time range between the two last tax dates, as shown in the image below.
The first step is to add a RANK() or ROW_NUMBER() that orders each id's payments from newest to oldest, so you only keep the last 2 payments.
The next step is to aggregate those rows to get the min and max dates and the sum of the amounts.
Lastly, you calculate the difference between the min and max dates. The full query:
WITH LAST_TWO AS (
SELECT *,ROW_NUMBER() OVER(PARTITION BY id ORDER BY tax_date DESC) AS time_ago
FROM table1
QUALIFY time_ago <= 2
),
AGG AS (
SELECT
id,
MIN(tax_date) as tax_date_MIN,
MAX(tax_date) as tax_date_MAX,
SUM(amount) as amount_SUM
FROM LAST_TWO
GROUP BY id
)
SELECT AGG.id, amount_SUM, DATEDIFF(day, tax_date_MIN, tax_date_MAX) as DATE_RANGE
FROM AGG
INNER JOIN table2 ON AGG.id = table2.id
WHERE table2.product_id = 12;
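
QUALIFY is not supported by every engine (Teradata and Snowflake have it, many others do not), so here is a rough sketch of the same three steps with the filter moved into an ordinary WHERE on a CTE. It runs against in-memory SQLite from Python, so DATEDIFF is replaced by a julianday() subtraction. The sample rows are hypothetical, and the join to table2 is left out to keep it short.

```python
import sqlite3

# Hypothetical sample data; dates stored as ISO-8601 text so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, tax_date TEXT, amount INTEGER);
    INSERT INTO table1 VALUES
        (1, '2022-06-01', 10), (1, '2022-06-10', 20), (1, '2022-06-19', 30),
        (2, '2022-05-01', 5),  (2, '2022-06-19', 7);
""")

sql = """
    WITH ranked AS (
        -- Step 1: number each id's payments from newest to oldest.
        SELECT id, tax_date, amount,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY tax_date DESC) AS time_ago
        FROM table1
    )
    -- Steps 2 and 3: aggregate the last two payments and take the date difference.
    SELECT id,
           SUM(amount) AS amount_sum,
           CAST(julianday(MAX(tax_date)) - julianday(MIN(tax_date)) AS INTEGER)
               AS date_range
    FROM ranked
    WHERE time_ago <= 2          -- plays the role of QUALIFY
    GROUP BY id
    ORDER BY id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

For id 1 only the payments of 10 June and 19 June are kept, so the sum is 50 and the range is 9 days.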

Improve performance of select on select query using temp table

The table holds weekly product prices per country.
My goal is to select the lowest price of each product for the most recent week/year, per country.
The query below fulfills this goal, but it is pretty slow. I was wondering if there is a more efficient way of doing the same task.
In the first part I'm selecting the latest year and week of prices per country. I included the CASE WHEN to account for the new year.
I'm saving this in a #TempTable.
Then I am selecting the min price based on the previously selected year, week and country combination.
DECLARE @date DATE; SET @date = GETDATE();
SELECT YearNb, Max(WeekNb) AS WeekNb, ISOCountryCode INTO #TempTable FROM PriceBenchWeekly
WHERE PriceBenchWeekly.YearNb = CASE WHEN DATEPART(ww,@date) = 1 THEN
Year(@date)-1
ELSE
Year(@date)
END
GROUP BY YearNb, ISOCountryCode
SELECT ProdNb,Min(WeeklyPrice) AS MinPrice, MarketPlayerCode, 'MKT' AS PriceOriginTypeCode, NatCoCode
FROM CE.PriceBenchWeekly INNER JOIN #TempTable ON PriceBenchWeekly.YearNb = #TempTable.YearNb AND
PriceBenchWeekly.WeekNb = #TempTable.WeekNb AND PriceBenchWeekly.ISOCountryCode = #TempTable.ISOCountryCode
GROUP BY PriceBenchweekly.YearNb, PriceBenchWeekly.ISOCountryCode, BNCode, MarketPlayerCode
"The table has weekly product prices per country. My goal here is to select the lowest price of each product for the most recent week/year per country per product."
Use window functions. Without sample data and desired results, it is a little hard to figure out what you really want. But the following gets the minimum price for each product from the most recent week in the data:
select pbw.*
from (select pbw.*,
min(weeklyprice) over (partition by prodnb) as min_weeklyprice
from (select pbw.*,
dense_rank() over (order by yearnb desc, weeknb desc) as seqnum
from CE.PriceBenchWeekly pbw
) pbw
where seqnum = 1
) pbw
where weeklyprice = min_weeklyprice;
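
If the most recent week can differ from country to country, as the question suggests, one possible variation is to partition the DENSE_RANK() by country and then group. This is only a sketch, run from Python against in-memory SQLite. The column names are taken from the question, the rows are invented, and the schema prefix CE is dropped.

```python
import sqlite3

# Hypothetical rows: DE has data up to week 52, FR only up to week 50.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PriceBenchWeekly (
        YearNb INTEGER, WeekNb INTEGER, ISOCountryCode TEXT,
        ProdNb TEXT, WeeklyPrice REAL
    );
    INSERT INTO PriceBenchWeekly VALUES
        (2023, 51, 'DE', 'P1', 9.0),
        (2023, 52, 'DE', 'P1', 10.0),
        (2023, 52, 'DE', 'P1', 8.5),
        (2023, 52, 'DE', 'P2', 4.0),
        (2023, 50, 'FR', 'P1', 7.0),
        (2023, 50, 'FR', 'P1', 7.5);
""")

sql = """
    SELECT ISOCountryCode, ProdNb, YearNb, WeekNb, MIN(WeeklyPrice) AS MinPrice
    FROM (
        -- seqnum = 1 marks every row in the latest year/week of its country.
        SELECT pbw.*,
               DENSE_RANK() OVER (PARTITION BY ISOCountryCode
                                  ORDER BY YearNb DESC, WeekNb DESC) AS seqnum
        FROM PriceBenchWeekly pbw
    ) latest
    WHERE seqnum = 1
    GROUP BY ISOCountryCode, ProdNb, YearNb, WeekNb
    ORDER BY ISOCountryCode, ProdNb
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The table is read once and no temp table is needed. Whether that is actually faster on the real data depends on the indexes and the engine.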
If you want to go with temp tables, do not create the table using SELECT INTO. Use CREATE TABLE #TempTable instead; then you can create a nonclustered index on year, week and country code.
Anyway, I would prefer OUTER APPLY:
SELECT DISTINCT A.ProductCode, A.CountryCode, B.YearNo, B.WeekNo, B.MinPrice
FROM YourTable A
OUTER APPLY (
SELECT TOP 1 YearNo, WeekNo, Min(Price) AS MinPrice
FROM YourTable
WHERE ProductCode = A.ProductCode AND CountryCode = A.CountryCode
GROUP BY YearNo, WeekNo
ORDER BY YearNo DESC, WeekNo DESC
) B

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
I expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But using Max(Stock) I get 10, and with Sum(Stock) I get 25.
I have also tried OVER (PARTITION BY ...), without any luck.
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
If you don't want to use joins, you can do something like this:
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber
My solution, like that of Gordon Linoff, will use the window functions. But in my case, everything will turn around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
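
Here is a small sketch of the RANK() query above, run from Python against in-memory SQLite. The original sample table is not shown in the question, so the rows below are hypothetical; they are chosen so that the totals come out as the A, 6, 15 and B, 1, 6 the asker expects.

```python
import sqlite3

# Hypothetical rows: one row per retailer, style and date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (
        RetailerId INTEGER, Stylenumber TEXT, [Date] TEXT,
        Sold INTEGER, Stock INTEGER
    );
    INSERT INTO MGTest VALUES
        (1, 'A', '2020-01-01', 2, 12),
        (2, 'A', '2020-01-01', 1, 6),
        (1, 'A', '2020-01-02', 2, 10),
        (2, 'A', '2020-01-02', 1, 5),
        (1, 'B', '2020-01-02', 1, 6);
""")

sql = """
    SELECT stylenumber, sold, SUM(stock) AS totalstock
    FROM (
        SELECT stylenumber,
               -- total sold over all dates of the style
               SUM(sold) OVER (PARTITION BY stylenumber) AS sold,
               -- r = 1 for every row on the style's most recent date
               RANK() OVER (PARTITION BY stylenumber ORDER BY [Date] DESC) AS r,
               stock
        FROM MGTest
    ) T
    WHERE r = 1
    GROUP BY stylenumber, sold
    ORDER BY stylenumber
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

RANK() rather than ROW_NUMBER() matters here: both retailers' rows on the latest date get rank 1, so their stock is summed.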

Show duplicate rows(all columns of that row) where all columns are duplicate except one column

In the table below, I need to select duplicate records where all columns are duplicated except Customer Type and Price for a particular week.
For example:
Week  Customer  Product  Customer Type  Price
1     Alex      Cycle    Consumer       100
1     Alex      Cycle    Reseller       101
2     John      Motor    Consumer       200
3     John      Motor    Consumer       200
3     John      Motor    Reseller       201
I am using the query below, but it doesn't show me both customer types; it only shows the count of customer types for each combination.
select Week, Customer, product, count(distinct Customer Type)
from table
group by Week, Customer, product
having count(distinct Customer Type) > 1
I would like to see the result below, which shows the duplicate rows themselves and not just their count. I am trying to find customers assigned to multiple customer types in a particular week for a product, and to show all columns for them. It doesn't matter if the price is different.
Week  Customer  Product  Customer Type  Price
1     Alex      Cycle    Consumer       100
1     Alex      Cycle    Reseller       101
3     John      Motor    Consumer       200
3     John      Motor    Reseller       201
Thanks
Shaki
WITH CustomerDistribution_CTE (WeekC ,CustomerC, ProductC)
AS
(
select Week, Customer, product
from Your_Table_Name group by Week, Customer,
product having count(distinct CustomerType) > 1
)
SELECT Y.*
FROM CustomerDistribution_CTE C
inner join Your_Table_Name Y on C.WeekC =Y.Week
and C.CustomerC =Y.Customer and C.productC =Y.product
Note: Please replace "Your_Table_Name" with the exact table name and try it.
One way to achieve this, using generic SQL, is to use a "derived table" like this:
select x.*
from tablex x
inner join (
select Week, Customer, Product
from tablex
group by Week, Customer, Product
having count(*) > 1
) d on x.Week = d.Week and x.Customer = d.Customer and x.Product = d.Product
You can do that by using DISTINCT like
select DISTINCT Customer,Product,Customer_Type,Price from Your_Table_Name
This returns each distinct combination of these columns.
Note: This query is for SQL Server.
From the expected result that you have pasted, it looks like you are not concerned about the week.
If you have an ID (incremental PK), it would be much simpler, like below:
select * from table where ID not in
(select max(ID) from table group by Customer, Product, CustomerType having count(*) > 1 )
This is tested on MySQL. Do you have an ID column?
In case you don't have an ID column, try the below:
select max(week) week, Customer, Product, CustomerType, max(price) from device group by Customer, Product, CustomerType;
I have not verified this one.
This will return your expected result set:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify
count(*)
over (partition by Week, Customer, product) > 1
For other DBMSes you will need to nest your query:
select *
from
(
select ...,
count(*)
over (partition by Week, Customer, product) as cnt
from table
) as dt
where cnt > 1
Edit:
After re-reading your description, the SELECT above might not be exactly what you want, because it will also return rows with a single type. In that case, switch to:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify -- at least two different types:
min(Customer_Type) over (partition by Week, Customer, product)
<> max(Customer_Type) over (partition by Week, Customer, product)
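
For a DBMS without QUALIFY, the MIN/MAX comparison can be nested in a derived table instead. Below is a minimal sketch run from Python against in-memory SQLite, using the five rows from the question. The table name customer_prices and the column name Customer_Type are assumptions made for the example.

```python
import sqlite3

# The five sample rows from the question; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_prices (
        Week INTEGER, Customer TEXT, Product TEXT,
        Customer_Type TEXT, Price INTEGER
    );
    INSERT INTO customer_prices VALUES
        (1, 'Alex', 'Cycle', 'Consumer', 100),
        (1, 'Alex', 'Cycle', 'Reseller', 101),
        (2, 'John', 'Motor', 'Consumer', 200),
        (3, 'John', 'Motor', 'Consumer', 200),
        (3, 'John', 'Motor', 'Reseller', 201);
""")

sql = """
    SELECT Week, Customer, Product, Customer_Type, Price
    FROM (
        SELECT t.*,
               MIN(Customer_Type) OVER (PARTITION BY Week, Customer, Product) AS min_type,
               MAX(Customer_Type) OVER (PARTITION BY Week, Customer, Product) AS max_type
        FROM customer_prices t
    ) dt
    -- min <> max means the group holds at least two different customer types
    WHERE min_type <> max_type
    ORDER BY Week, Customer_Type
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The week 2 row for John is dropped because its group contains a single customer type, which matches the result the asker listed.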

How do I proceed on this query

I want to know if there's a way to display more than one column on an aggregate result but without it affecting the group by.
I need to display the name alongside an aggregate result, but I have no idea what I am missing here.
This is the data I'm working with:
It is the result of the following query:
select * from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
I need to find the salesman that sold the most stuff (total price) for a specific year.
This is what I have so far:
select DATEPART(year,sale.sale_date)'year', Salesman.First_Name,sum(sale.price)
from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
group by DATEPART(year,sale.sale_date),Salesman.First_Name
This returns the total sales made by each salesman in each year.
How do I continue from here to get the top salesman of each year?
Maybe the query I am doing is completely wrong and there is a better way?
Any advice would be helpful.
Thanks.
This should work for you:
select *
from(
select DATEPART(year,s.sale_date) as SalesYear -- Avoid reserved words for object names
,sm.First_Name
,sum(s.price) as TotalSales
,row_number() over (partition by DATEPART(year,s.sale_date) -- Rank the data within the same year as this data row.
order by sum(s.price) desc -- Order by the sum total of sales price, with the largest first (Descending). This means that rank 1 is the highest amount.
) as SalesRank -- Orders your salesmen by the total sales within each year, with 1 as the best.
from Buyer b
inner join Sale s
on(b.ID = s.Buyer_ID)
inner join Salesman sm
on(sm.ID = b.Salesman_ID)
group by DATEPART(year,s.sale_date)
,sm.First_Name
) a
where SalesRank = 1 -- This means you only get the top salesman for each year.
First, never use commas in the FROM clause. Always use explicit JOIN syntax.
The answer to your question is to use window functions. If there is a tie and you want all values, then use RANK() or DENSE_RANK(). If you always want exactly one row, even if there are ties, then use ROW_NUMBER().
select ss.*
from (select year(s.sale_date) as yyyy, sm.First_Name, sum(s.price) as total_price,
row_number() over (partition by year(s.sale_date)
order by sum(s.price) desc
) as seqnum
from Salesman sm join
Sale s
on sm.ID = s.Salesman_ID
group by year(s.sale_date), sm.First_Name
) ss
where seqnum = 1;
Note that the Buyers table is unnecessary for this query.
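
To try the top-per-year pattern without a SQL Server instance, here is a rough sketch run from Python against in-memory SQLite. It keeps the join through Buyer from the question's schema and swaps DATEPART/YEAR for SQLite's strftime('%Y', ...). All rows are hypothetical.

```python
import sqlite3

# Hypothetical data: Ann sells most in 2020, Bob sells most in 2021.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesman (ID INTEGER, First_Name TEXT);
    CREATE TABLE Buyer (ID INTEGER, Salesman_ID INTEGER);
    CREATE TABLE Sale (Buyer_ID INTEGER, sale_date TEXT, price INTEGER);
    INSERT INTO Salesman VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Buyer VALUES (10, 1), (20, 2);
    INSERT INTO Sale VALUES
        (10, '2020-03-01', 100), (10, '2020-07-01', 50),
        (20, '2020-05-01', 120), (20, '2021-01-15', 300),
        (10, '2021-02-01', 200);
""")

sql = """
    SELECT SalesYear, First_Name, TotalSales
    FROM (
        SELECT strftime('%Y', s.sale_date) AS SalesYear,
               sm.First_Name,
               SUM(s.price) AS TotalSales,
               -- rank salesmen within each year by their summed sales
               ROW_NUMBER() OVER (PARTITION BY strftime('%Y', s.sale_date)
                                  ORDER BY SUM(s.price) DESC) AS SalesRank
        FROM Buyer b
        JOIN Sale s ON b.ID = s.Buyer_ID
        JOIN Salesman sm ON sm.ID = b.Salesman_ID
        GROUP BY strftime('%Y', s.sale_date), sm.First_Name
    ) a
    WHERE SalesRank = 1
    ORDER BY SalesYear
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Note that strftime returns the year as text. Swapping ROW_NUMBER() for RANK() would return every salesman tied for first place in a year.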