MAX of SUMs from GROUP BY of JOIN - sql

I'm stuck.
I have two tables:
First, [PurchasedItemsByCustomer] with the columns:
[CustID] INT NULL,
[ItemId] INT NULL,
[Quantity] INT NULL,
[OnDate] DATE NULL
Second, Table [Items] with the columns:
[ItemId] INT NULL,
[Price] FLOAT NULL,
[CategoryId] INT NULL
I need to output a list with 3 columns:
a month
the category which sold the most (in items quantity) in that month
how many items from that category were purchased in that month.
Thank you

I think you can use a query like this:
;With SoldPerMonth as (
select datepart(month, p.onDate) [Month], i.CategoryId [Category], sum(p.Quantity) [Count]
from PurchasedItemsByCustomer p
join Items i on p.ItemId = i.ItemId
group by datepart(month, p.onDate), i.CategoryId
), SoldPerMonthRanked as (
select *, rank() over (partition by [Month] order by [Count] desc) rnk
from SoldPerMonth
)
select [Month], [Category], [Count]
from SoldPerMonthRanked
where rnk = 1;
Note: rank() in the query above returns every category tied for the maximum in a month; if you want just one row per month, use row_number() instead.
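To make the rank() versus row_number() difference concrete, here is a minimal sketch that runs the same kind of query through Python's sqlite3 module. SQLite stands in for SQL Server here, so strftime replaces datepart, and the sample rows are invented so that two categories tie in January. The extra [Category] tie-breaker in the row_number() ordering is an assumption added to make the single pick deterministic.

```python
import sqlite3

# Hypothetical sample data: categories 10 and 20 tie in January (5 items each).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Items (ItemId INT, Price FLOAT, CategoryId INT);
    CREATE TABLE PurchasedItemsByCustomer
        (CustID INT, ItemId INT, Quantity INT, OnDate DATE);
    INSERT INTO Items VALUES (1, 9.5, 10), (2, 4.0, 20), (3, 7.0, 30);
    INSERT INTO PurchasedItemsByCustomer VALUES
        (1, 1, 5, '2023-01-05'),
        (2, 2, 5, '2023-01-09'),
        (1, 3, 2, '2023-01-20'),
        (2, 3, 4, '2023-02-02'),
        (1, 1, 1, '2023-02-03');
""")

rows = con.execute("""
    WITH SoldPerMonth AS (
        SELECT CAST(strftime('%m', p.OnDate) AS INTEGER) AS [Month],
               i.CategoryId AS [Category],
               SUM(p.Quantity) AS [Count]
        FROM PurchasedItemsByCustomer p
        JOIN Items i ON p.ItemId = i.ItemId
        GROUP BY strftime('%m', p.OnDate), i.CategoryId
    )
    SELECT [Month], [Category], [Count],
           RANK() OVER (PARTITION BY [Month] ORDER BY [Count] DESC) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY [Month]
                              ORDER BY [Count] DESC, [Category]) AS rn
    FROM SoldPerMonth
    ORDER BY [Month], [Category]
""").fetchall()

# rank() keeps every tied category; row_number() keeps exactly one per month.
with_rank = [r[:3] for r in rows if r[3] == 1]
with_row_number = [r[:3] for r in rows if r[4] == 1]
print(with_rank)
print(with_row_number)
```

With this data the rank() filter keeps both tied January categories, while the row_number() filter keeps only one of them.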

Divide et Impera:
with dept_sales as(
select month(p.OnDate) as month, year(p.OnDate) as year, i.CategoryId, sum(p.Quantity) as N -- total quantity sold for each month and category
from PurchasedItemsByCustomer p join Items i on p.ItemId = i.ItemId
group by year(p.OnDate), month(p.OnDate), i.CategoryId)
select top 1 * --pick the highest
from dept_sales
where year = year(current_timestamp) -- I imagine you need data only for current year
order by N desc --order by N asc if you want the least selling category
If you don't group by year, January of every year ends up in the same 'January' entry, so I added a filter on the current year.
I used CTE for code clarity to split the phases of calculation, you can nest them if you want to.
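The point about grouping by year can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (strftime instead of month()/year()), and the two purchases, one in January 2022 and one in January 2023, are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Items (ItemId INT, Price FLOAT, CategoryId INT);
    CREATE TABLE PurchasedItemsByCustomer
        (CustID INT, ItemId INT, Quantity INT, OnDate DATE);
    INSERT INTO Items VALUES (1, 9.5, 10);
    -- hypothetical rows: same category, January of two different years
    INSERT INTO PurchasedItemsByCustomer VALUES
        (1, 1, 3, '2022-01-15'),
        (1, 1, 4, '2023-01-10');
""")

# Grouping by month only: both Januaries collapse into one row.
by_month_only = con.execute("""
    SELECT CAST(strftime('%m', p.OnDate) AS INTEGER), i.CategoryId, SUM(p.Quantity)
    FROM PurchasedItemsByCustomer p
    JOIN Items i ON p.ItemId = i.ItemId
    GROUP BY strftime('%m', p.OnDate), i.CategoryId
""").fetchall()

# Grouping by year and month keeps them apart.
by_year_and_month = con.execute("""
    SELECT CAST(strftime('%Y', p.OnDate) AS INTEGER),
           CAST(strftime('%m', p.OnDate) AS INTEGER),
           i.CategoryId, SUM(p.Quantity)
    FROM PurchasedItemsByCustomer p
    JOIN Items i ON p.ItemId = i.ItemId
    GROUP BY strftime('%Y', p.OnDate), strftime('%m', p.OnDate), i.CategoryId
    ORDER BY 1, 2
""").fetchall()

print(by_month_only)
print(by_year_and_month)
```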

Here you go,
SELECT
A.[CategoryId],
A.[Month],
A.[CategoryMonthCount]
FROM
(
SELECT
A.[CategoryId],
A.[Month],
A.[CategoryMonthCount],
RANK() OVER(
PARTITION BY A.[Month]
ORDER BY A.[CategoryMonthCount] DESC) [RN]
FROM
(
SELECT
I.[CategoryId],
MONTH(PIBC.[OnDate]) [Month],
SUM(PIBC.[Quantity]) [CategoryMonthCount]
FROM
[dbo].[PurchasedItemsByCustomer] PIBC
JOIN
[dbo].[Items] I ON PIBC.[ItemId] = I.[ItemId]
GROUP BY
I.[CategoryId],
MONTH(PIBC.[OnDate])
) A
) A
WHERE
A.[RN] = 1;

Related

SQL - Return count of consecutive days where value was unchanged

I have a table like
date          ticker  Action
'2022-03-01'  AAPL    BUY
'2022-03-02'  AAPL    SELL
'2022-03-03'  AAPL    BUY
'2022-03-01'  CMG     SELL
'2022-03-02'  CMG     HOLD
'2022-03-03'  CMG     HOLD
'2022-03-01'  GPS     SELL
'2022-03-02'  GPS     SELL
'2022-03-03'  GPS     SELL
I want to group by ticker and count how many consecutive prior days the Action has held the value it has on the last date (here 2022-03-03). For this example table the result would be:
ticker  NumSequentialDaysAction
AAPL    0
CMG     1
GPS     2
Fine to pass in 2022-03-03 as a value, don't need to figure that out on the fly.
Tried something like this
---Table Creation---
CREATE TABLE UserTable
([Date] DATETIME2, [Ticker] varchar(5), [Action] varchar(5))
;
INSERT INTO UserTable
([Date], [Ticker], [Action])
VALUES
('2022-03-01' , 'AAPL' , 'BUY'),
('2022-03-02' , 'AAPL' , 'SELL'),
('2022-03-03' , 'AAPL' , 'BUY'),
('2022-03-01' , 'CMG' , 'SELL'),
('2022-03-02' , 'CMG' , 'HOLD'),
('2022-03-03' , 'CMG' , 'HOLD'),
('2022-03-01' , 'GPS' , 'SELL'),
('2022-03-02' , 'GPS' , 'SELL'),
('2022-03-03' , 'GPS' , 'SELL')
;
---Attempted Solution---
I'm thinking that I need a subquery to get the last value and a self-join to get the matching values, then a window function ordered by date to check that the preceding value is sequential.
WITH CTE AS (SELECT Date, Ticker, Action,
ROW_NUMBER() OVER (PARTITION BY Ticker, Action ORDER BY Date) as row_num
FROM UserTable)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE row_num = 1
GROUP BY Ticker;
WITH CTE AS (SELECT Date, Ticker, Action,
DENSE_RANK() OVER (PARTITION BY Ticker ORDER BY Action,Date) as rank
FROM table)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE rank = 1
GROUP BY Ticker;
You can do this with the help of the LEAD function like so. You didn't specify which RDBMS you're using. This solution works in PostgreSQL:
WITH "withSequential" AS (
SELECT
ticker,
(LEAD("Action") OVER (PARTITION BY ticker ORDER BY date ASC) = "Action") AS "nextDayIsSameAction"
FROM UserTable
)
SELECT
ticker,
SUM(
CASE
WHEN "nextDayIsSameAction" IS TRUE THEN 1
ELSE 0
END
) AS "NumSequentialDaysAction"
FROM "withSequential"
GROUP BY ticker
Here is a way to do this using a gaps-and-islands solution.
Thanks for sharing the create and insert scripts, which helps to build the solution quickly.
dbfiddle link.
https://dbfiddle.uk/rZLDTrNR
with data
as (
select date
,ticker
,action
,case when lag(action) over(partition by ticker order by date) <> action then
1
else 0
end as marker
from usertable
)
,interim_data
as (
select *
,sum(marker) over(partition by ticker order by date) as grp_val
from data
)
,interim_data2
as (
select *
,count(*) over(partition by ticker,grp_val) as NumSequentialDaysAction
from interim_data
)
select ticker,NumSequentialDaysAction
from interim_data2
where date='2022-03-03'
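As a cross-check against the question's expected output, here is a hedged sketch of the same gaps-and-islands steps run through Python's sqlite3 module (a stand-in for the real database). It uses the question's sample rows. One assumption to note: the question's expected numbers (0, 1, 2) are one less than the size of the final island, so this sketch subtracts one from the window count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UserTable ([Date] TEXT, [Ticker] TEXT, [Action] TEXT);
    INSERT INTO UserTable ([Date], [Ticker], [Action]) VALUES
        ('2022-03-01', 'AAPL', 'BUY'),
        ('2022-03-02', 'AAPL', 'SELL'),
        ('2022-03-03', 'AAPL', 'BUY'),
        ('2022-03-01', 'CMG',  'SELL'),
        ('2022-03-02', 'CMG',  'HOLD'),
        ('2022-03-03', 'CMG',  'HOLD'),
        ('2022-03-01', 'GPS',  'SELL'),
        ('2022-03-02', 'GPS',  'SELL'),
        ('2022-03-03', 'GPS',  'SELL');
""")

def sequential_days(as_of):
    """Per ticker: prior consecutive days sharing the action seen on as_of."""
    return con.execute("""
        WITH marked AS (
            -- marker = 1 whenever the action differs from the previous day's
            SELECT [Date], [Ticker], [Action],
                   CASE WHEN LAG([Action]) OVER
                             (PARTITION BY [Ticker] ORDER BY [Date]) <> [Action]
                        THEN 1 ELSE 0 END AS marker
            FROM UserTable
            WHERE [Date] <= :as_of
        ), grouped AS (
            -- running sum of markers gives each run (island) its own number
            SELECT *, SUM(marker) OVER
                      (PARTITION BY [Ticker] ORDER BY [Date]) AS grp_val
            FROM marked
        ), sized AS (
            SELECT *, COUNT(*) OVER (PARTITION BY [Ticker], grp_val) AS island_size
            FROM grouped
        )
        SELECT [Ticker], island_size - 1
        FROM sized
        WHERE [Date] = :as_of
        ORDER BY [Ticker]
    """, {"as_of": as_of}).fetchall()

print(sequential_days("2022-03-03"))
```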
Another option is the difference-between-two-row_numbers approach, as follows:
select [Ticker], count(*)-1 NumSequentialDaysAction -- you could use (distinct) to remove duplicate rows
from
(
select *,
row_number() over (partition by [Ticker] order by [Date]) -
row_number() over (partition by [Ticker], [Action] order by [Date]) grp
from UserTable
where [date] <= '2022-03-03'
) RN_Groups
/* get only rows where [Action] = last date [Action] */
where [Action] = (select top 1 [Action] from UserTable T
where T.[Ticker] = RN_Groups.[Ticker] and [date] <= '2022-03-03'
order by [Date] desc)
group by [Ticker], [Action], grp

Taking most recent values in sum over date range

I have a table which has the following columns: DeskID *, ProductID *, Date *, Amount (where the columns marked with * make the primary key). The products in use vary over time, as represented in the image below.
Table format on the left, and a (hopefully) intuitive representation of the data on the right for one desk
The objective is to have the sum of the latest amounts of products by desk and date, including products which are no longer in use, over a date range.
e.g. using the data above the desired table is:
So on the 1st Jan, the sum is 1 of Product A
On the 2nd Jan, the sum is 2 of A and 5 of B, so 7
On the 4th Jan, the sum is 1 of A (out of use, so take the value from the 3rd), 5 of B, and 2 of C, so 8 in total
etc.
I have tried using a partition on the desk and product, ordered by date, to get the most recent value, and turned the following code into a function (Function1 below) with a @date Date parameter:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum' from (
select @date 'Date', t.DeskID, t.ProductID, t.Amount
, row_number() over (partition by t.DeskID, t.ProductID order by t.Date desc) as roworder
from Table1 t
where 1 = 1
and t.Date <= @date
) t
where t.roworder = 1
group by t.DeskID
And then using a utility calendar table and cross apply to get the required values over a time range, as below
select * from Calendar c
cross apply Function1(c.CalendarDate)
where c.CalendarDate >= '20190101' and c.CalendarDate <= '20191009'
This has the expected results, but is far too slow. Currently each desk uses around 50 products, and the products roll every month, so after just 5 years each desk has a history of ~3000 products, which causes the whole thing to grind to a halt. (Roughly 30 seconds for a range of a single month)
Is there a better approach?
Changing your function to the following should be faster:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum'
FROM (SELECT m.DeskID, m.ProductID, MAX(m.[Date]) AS MaxDate
FROM Table1 m
where m.[Date] <= @date
GROUP BY m.DeskID, m.ProductID) d
INNER JOIN Table1 t
ON d.DeskID=t.DeskID
AND d.ProductID=t.ProductID
and t.[Date] = d.MaxDate
group by t.DeskID
TVF performance often suffers. The following removes the TVF completely:
-- DROP TABLE Table1;
CREATE TABLE Table1 (DeskID int not null, ProductID nvarchar(32) not null, [Date] Date not null, Amount int not null, PRIMARY KEY ([Date],DeskID,ProductID));
INSERT Table1(DeskID,ProductID,[Date],Amount)
VALUES (1,'A','2019-01-01',1),(1,'A','2019-01-02',2),(1,'B','2019-01-02',5),(1,'A','2019-01-03',1)
,(1,'B','2019-01-03',4),(1,'C','2019-01-03',3),(1,'B','2019-01-04',5),(1,'C','2019-01-04',2),(1,'C','2019-01-05',2)
GO
DECLARE @StartDate date=N'2019-01-01';
DECLARE @EndDate date=N'2019-01-05';
;WITH cte_p
AS
(
SELECT DISTINCT DeskID,ProductID
FROM Table1
WHERE [Date] <= @EndDate
),
cte_a
AS
(
SELECT @StartDate AS [Date], p.DeskID, p.ProductID, ISNULL(a.Amount,0) AS Amount
FROM (
SELECT t.DeskID, t.ProductID
, MAX(t.Date) AS FirstDate
FROM Table1 t
WHERE t.Date <= @StartDate
GROUP BY t.DeskID, t.ProductID) f
INNER JOIN Table1 a
ON f.DeskID=a.DeskID
AND f.ProductID=a.ProductID
AND f.[FirstDate]=a.[Date]
RIGHT JOIN cte_p p
ON p.DeskID=a.DeskID
AND p.ProductID=a.ProductID
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], t.DeskID, t.ProductID, t.Amount
FROM Table1 t
INNER JOIN cte_a a
ON t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date])
WHERE a.[Date]<@EndDate
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], a.DeskID, a.ProductID, a.Amount
FROM cte_a a
WHERE NOT EXISTS(SELECT 1 FROM Table1 t
WHERE t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date]))
AND a.[Date]<@EndDate
)
SELECT [Date], DeskID, SUM(Amount)
FROM cte_a
GROUP BY [Date], DeskID;
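For checking any of these approaches against the numbers in the question, here is a small reference sketch using Python's sqlite3 module as a stand-in for SQL Server. It is not the recursive approach above: it generates a calendar, finds the latest row per desk and product on or before each day, and sums those amounts. The helper name daily_sums and the generated calendar CTE are assumptions for illustration, and no performance claim is made for it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (
        DeskID INT NOT NULL, ProductID TEXT NOT NULL,
        [Date] TEXT NOT NULL, Amount INT NOT NULL,
        PRIMARY KEY ([Date], DeskID, ProductID));
    INSERT INTO Table1 (DeskID, ProductID, [Date], Amount) VALUES
        (1,'A','2019-01-01',1),(1,'A','2019-01-02',2),(1,'B','2019-01-02',5),
        (1,'A','2019-01-03',1),(1,'B','2019-01-03',4),(1,'C','2019-01-03',3),
        (1,'B','2019-01-04',5),(1,'C','2019-01-04',2),(1,'C','2019-01-05',2);
""")

def daily_sums(start_date, end_date):
    """Sum of the most recent amount of every product, per day and desk."""
    return con.execute("""
        WITH RECURSIVE cal(d) AS (
            -- one row per calendar day in the requested range
            SELECT :start_date
            UNION ALL
            SELECT date(d, '+1 day') FROM cal WHERE d < :end_date
        ), latest AS (
            -- latest date on or before each day, per desk and product
            SELECT c.d AS d, t.DeskID, t.ProductID, MAX(t.[Date]) AS last_date
            FROM cal c
            JOIN Table1 t ON t.[Date] <= c.d
            GROUP BY c.d, t.DeskID, t.ProductID
        )
        SELECT l.d, l.DeskID, SUM(t.Amount)
        FROM latest l
        JOIN Table1 t
          ON t.DeskID = l.DeskID
         AND t.ProductID = l.ProductID
         AND t.[Date] = l.last_date
        GROUP BY l.d, l.DeskID
        ORDER BY l.d, l.DeskID
    """, {"start_date": start_date, "end_date": end_date}).fetchall()

for row in daily_sums("2019-01-01", "2019-01-05"):
    print(row)
```

On the sample rows this reproduces the figures from the question: 1 on the 1st, 7 on the 2nd, and 8 on the 4th.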

Incremental count of duplicates

The following query displays duplicates in a table with the qty alias showing the total count, eg if there are five duplicates then all five will have the same qty = 5.
select s.*, t.*
from [Migrate].[dbo].[Table1] s
join (
select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[d1] and s.[product] = t.[h1]
ORDER BY s.[product], s.[date], s.[id]
Is it possible to amend the count(*) as qty to show an incremental count so that five duplicates would display 1,2,3,4,5?
The answer to your question is row_number(). How you use it is rather unclear, because you provide no guidance, such as sample data or desired results. Hence this answer is rather general:
select s.*, t.*,
row_number() over (partition by s.product order by s.date) as seqnum
from [Migrate].[dbo].[Table1] s join
(select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t
on s.[date] = t.[d1] and s.[product] = t.[h1]
order by s.[product], s.[date], s.[id];
The speculation is that the duplicates are by product. This enumerates them by date. Some combination of the partition by and group by is almost certainly what you need.
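If the duplicates are defined by the (date, product) pair, as the question's group by suggests, one possible combination is sketched below with Python's sqlite3 module standing in for SQL Server. The sample rows are invented, and the sketch swaps the join for a windowed count(*) so that the total (qty) and the incremental number (seqnum) come from the same partition; ordering by id inside the partition is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (id INT, [date] TEXT, product TEXT);
    -- hypothetical rows: three duplicates of X, one Y, two duplicates of Z
    INSERT INTO Table1 VALUES
        (1, '2020-01-01', 'X'), (2, '2020-01-01', 'X'), (3, '2020-01-01', 'X'),
        (4, '2020-01-02', 'Y'),
        (5, '2020-01-03', 'Z'), (6, '2020-01-03', 'Z');
""")

numbered_duplicates = con.execute("""
    SELECT id, [date], product, qty, seqnum
    FROM (
        SELECT s.*,
               COUNT(*) OVER (PARTITION BY s.[date], s.product) AS qty,
               ROW_NUMBER() OVER (PARTITION BY s.[date], s.product
                                  ORDER BY s.id) AS seqnum
        FROM Table1 s
    ) d
    WHERE qty > 1          -- keep only rows that actually have duplicates
    ORDER BY product, [date], id
""").fetchall()

for row in numbered_duplicates:
    print(row)
```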

TSQL Row_Number

This question has been covered similarly before, but I'm struggling.
I need to find the top N sales based on customer buying patterns.
Ideally this needs to be top N by customer by month period by year, but for now I'm just looking at top N over the whole DB.
My query looks like:
-- QUERY TO SHOW TOP 2 CUSTOMER INVOICES BY CUSTOMER BY MONTH
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS 'SALES',
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC ) AS 'Row'
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
--WHERE
-- 'ROW' < 2
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
I can't understand the solutions given to restrict Row to <= N.
Please help.
Try this.
-- QUERY TO SHOW TOP 2 CUSTOMER INVOICES BY CUSTOMER BY MONTH
;WITH Top2Customers
AS
(
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS 'SALES',
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC )
AS 'RowNumber'
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
)
SELECT * FROM Top2Customers WHERE RowNumber < 3
You have to wrap your select in another one to filter on the value produced by row_number():
select * from (
SELECT
bill_to_code,
INVOICE_NUMBER,
SUM( INVOICE_AMOUNT_CORP ) AS SALES,
ROW_NUMBER() OVER ( PARTITION BY bill_to_code ORDER BY SUM( INVOICE_AMOUNT_CORP ) DESC ) AS RowNo
FROM
FACT_OM_INVOICE
JOIN dim_customer_bill_to ON FACT_OM_INVOICE.dim_customer_bill_to_key = dim_customer_bill_to.dim_customer_bill_to_key
--WHERE
-- 'ROW' < 2
GROUP BY
invoice_number,
Dim_customer_bill_to.bill_to_code
) base where RowNo < 2
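The wrapping pattern can be tried out with a minimal sketch in Python's sqlite3 module (a stand-in for SQL Server). The invoice_lines table, its rows, and the helper top_n_invoices are all hypothetical, and the aggregation is split into its own CTE so the row number is computed over already-summed invoices before the outer query filters on it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical single-table stand-in for the fact/dimension join
    CREATE TABLE invoice_lines (bill_to_code TEXT, invoice_number TEXT, amount INT);
    INSERT INTO invoice_lines VALUES
        ('C1', 'INV1', 100), ('C1', 'INV1', 50),
        ('C1', 'INV2', 300), ('C1', 'INV3', 20),
        ('C2', 'INV4', 10),  ('C2', 'INV5', 70);
""")

def top_n_invoices(n):
    """Top n invoices by summed amount for each bill_to_code."""
    return con.execute("""
        WITH invoice_totals AS (
            SELECT bill_to_code, invoice_number, SUM(amount) AS sales
            FROM invoice_lines
            GROUP BY bill_to_code, invoice_number
        ), ranked AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY bill_to_code
                                      ORDER BY sales DESC) AS row_no
            FROM invoice_totals
        )
        -- the filter lives in the outer query, where row_no is an ordinary column
        SELECT bill_to_code, invoice_number, sales, row_no
        FROM ranked
        WHERE row_no <= ?
        ORDER BY bill_to_code, row_no
    """, (n,)).fetchall()

print(top_n_invoices(2))
```

Passing n as a parameter makes it easy to see that `row_no <= 2` (or `< 3`) yields the top two per customer, while `< 2` yields only the top one.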

Return max value from a SQL selection

I have a table license_Usage which works like a log of the licenses used in a day:
ID User license date
1 1 A 22/2/2015
2 1 A 23/2/2015
3 1 B 22/2/2015
4 2 A 22/2/2015
I want to count how many licenses each user used in a day; the result should look like:
QuantityOfLicenses User date
2 1 22/2/2015
1 2 22/2/2015
For that I did the following query :
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
where date = '22/2/2015'
Group by [date], [user]
which works, but now I want to know which user has used the most licenses. For that I wrote the following query:
select MAX(result.[Quantity of licenses])
From (
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
Group by [date], [user]
) as result
It returns the max value of 2, but when I want to know which user used 2 licenses, I try this query with no success:
select result.user, MAX(result.[Quantity of licenses])
From (
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
Group by [date], [user]
) as result
Group by result.user
You can use something like this:
select top 1 *
From (
select count(license) as Quantity,[user],[date]
From license_Usage
Group by [date], [user]
) as result
order by Quantity desc
If you need all the rows that share the maximum when there are several, you'll have to use the rank() window function instead.
Use RANK to rank the users by the number of licenses per day.
SELECT
LicPerDay.*,
RANK() OVER (PARTITION BY [date] ORDER BY Qty DESC) AS User_Rank
FROM (
SELECT
COUNT(license) AS Qty,
[User],
[date]
FROM license_usage
GROUP BY [User], [date]
) LicPerDay
Any user with User_Rank = 1 will have the most licenses for that day.
If you only want the top user for each day, wrap the query above as a subquery and filter on User_Rank = 1:
SELECT * FROM (
SELECT
LicPerDay.*,
RANK() OVER (PARTITION BY [date] ORDER BY Qty DESC) AS User_Rank
FROM (
SELECT
COUNT(license) AS Qty,
[User],
[date]
FROM license_usage
GROUP BY [User], [date]
) LicPerDay
) LicPerDayRanks
WHERE User_Rank = 1
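A quick way to sanity-check the ranked query is to run it on the four sample rows from the question. The sketch below does that with Python's sqlite3 module as a stand-in for SQL Server; the dates are rewritten in ISO format (an assumption, so they sort correctly as text).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE license_usage (ID INT, [User] INT, license TEXT, [date] TEXT);
    INSERT INTO license_usage VALUES
        (1, 1, 'A', '2015-02-22'),
        (2, 1, 'A', '2015-02-23'),
        (3, 1, 'B', '2015-02-22'),
        (4, 2, 'A', '2015-02-22');
""")

top_users_per_day = con.execute("""
    SELECT [date], [User], Qty
    FROM (
        SELECT LicPerDay.*,
               -- DESC matters: rank 1 must go to the highest count
               RANK() OVER (PARTITION BY [date] ORDER BY Qty DESC) AS User_Rank
        FROM (
            SELECT COUNT(license) AS Qty, [User], [date]
            FROM license_usage
            GROUP BY [User], [date]
        ) LicPerDay
    ) LicPerDayRanks
    WHERE User_Rank = 1
    ORDER BY [date], [User]
""").fetchall()

print(top_users_per_day)
```

On 22/2/2015 user 1 ranks first with 2 licenses; on 23/2/2015 user 1 is the only user and ranks first with 1.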
Use a Windowed Aggregate Function, RANK, to get the highest count:
SELECT * FROM (
SELECT
[User],
[date],
COUNT(license) AS Qty,
-- rank by descending number for each day ??
--RANK() OVER (PARTITION BY [date] ORDER BY COUNT(license) DESC) AS rnk
-- rank by descending number
RANK() OVER (ORDER BY COUNT(license) DESC) AS rnk
FROM license_usage
GROUP BY [User], [date]
) dt
WHERE rnk = 1