SQL query to select percentage of total - sql

I have an MSSQL table, stores, with the following columns:
Storeid, NumEmployees
1 125
2 154
3 10
4 698
5 54
6 98
7 87
8 100
9 58
10 897
Can someone help me with a SQL query that returns the top stores (Storeid) that account for 30% of the total employees (NumEmployees)?

WITH cte
AS (SELECT storeid,
numemployees,
( numemployees * 100 ) / SUM(numemployees) OVER (PARTITION BY 1)
AS
percentofstores
FROM stores)
SELECT *
FROM cte
WHERE percentofstores >= 30
ORDER BY numemployees desc
Alternative that doesn't use SUM/OVER
SELECT s.storeid, s.numemployees
FROM (SELECT SUM(numemployees) AS [tots]
FROM stores) AS t,
stores s
WHERE CAST(numemployees AS DECIMAL(15, 5)) / tots >= .3
ORDER BY s.numemployees desc
Note that in the second version I decided not to multiply by 100 before dividing. This requires a cast to decimal; otherwise both operands are int, the division is integer division, every ratio truncates to 0, and no records are returned.
Also, I'm not completely clear that you want this, but you can add TOP 1 to both queries to limit the results to just the one store with the most employees among those above 30%.
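To illustrate the integer-division note above, here is a minimal sketch using literal values taken from the sample data (the largest store has 897 employees and the ten stores total 2281). It is only an illustration and not part of the original answer.

```sql
-- Both operands of the first two expressions are int, so SQL Server
-- performs integer division and truncates the result.
SELECT 897 / 2281                         AS int_division,     -- 0
       ( 897 * 100 ) / 2281               AS int_percent,      -- 39
       CAST(897 AS DECIMAL(15, 5)) / 2281 AS decimal_fraction; -- roughly 0.3932
```

Multiplying by 100 first (as the CTE version does) keeps everything in integers but loses the fractional part of the percentage; casting one operand keeps the fraction.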
UPDATE
Based on your comments, and to paraphrase Kevin, it sounds like you want the rows starting at the store with the most employees and working down until you have at least 30%.
This is difficult because it requires a running percentage, and it's a bin packing problem; however, this does work. Note that I've included two other test cases (one where the percentage exactly equals the top two stores combined, and one where it's just over that).
DECLARE @percent DECIMAL (20, 16)
SET @percent = 0.3
--Other test values
--SET @percent = 0.6992547128452433
--SET @percent = 0.6992547128452434
;WITH sums
AS (SELECT DISTINCT s.storeid,
s.numemployees,
-- this store plus every store with more employees
s.numemployees + Coalesce(SUM(s2.numemployees) OVER (
PARTITION BY s.numemployees), 0) runningsum
FROM stores s
LEFT JOIN stores s2
ON s.numemployees < s2.numemployees),
percents
AS (SELECT storeid,
numemployees,
runningsum,
CAST(runningsum AS DECIMAL(15, 5)) / tots.total running_percent,
Row_number() OVER (ORDER BY runningsum, storeid ) rn
FROM sums,
(SELECT SUM(numemployees) total
FROM stores) AS tots)
SELECT p.storeID,
p.numemployees,
p.runningsum,
p.running_percent,
p.rn
FROM percents p
CROSS JOIN (SELECT MAX(rn) rn
FROM percents
WHERE running_percent = @percent) exactpercent
LEFT JOIN (SELECT MAX(rn) rn
FROM percents
WHERE running_percent <= @percent) underpercent
ON p.rn <= underpercent.rn
OR ( exactpercent.rn IS NULL
AND p.rn <= underpercent.rn + 1 )
WHERE
underpercent.rn is not null or p.rn = 1
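As a possible alternative to the self-join above, the same "work down from the largest store until the target is reached" idea can be sketched with a windowed running sum. This is a different technique from the answer's query, it assumes SQL Server 2012 or later, and the tie-break on storeid is an assumption added to make the order deterministic.

```sql
DECLARE @percent DECIMAL(20, 16) = 0.3;

WITH running AS (
    SELECT storeid,
           numemployees,
           -- employees in this store plus all stores ranked before it
           SUM(numemployees) OVER (ORDER BY numemployees DESC, storeid
                                   ROWS UNBOUNDED PRECEDING) AS runningsum,
           SUM(numemployees) OVER ()                          AS total
    FROM stores
)
SELECT storeid,
       numemployees,
       runningsum,
       CAST(runningsum AS DECIMAL(15, 5)) / total AS running_percent
FROM running
-- keep a store only while the stores before it have not yet reached the target
WHERE runningsum - numemployees < @percent * total
ORDER BY numemployees DESC, storeid;
```

With the sample data and a 30% target this keeps only store 10, because its 897 employees already exceed 30% of the 2281 total.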

Related

Get rows in SQL by summing up a until certain value is exceeded and stop retrieving

I have to return rows from the database until their summed value exceeds a certain point.
I should get just enough rows for the sum to be greater than my quantity, and then stop retrieving rows.
Is this possible, and does it make sense?
Can this be translated into LINQ for EF Core?
I am currently stuck with a query that returns all the rows...
SELECT [i].[InventoryArticleId], [i].[ArticleId], [i].[ArticleQuantity], [i].[InventoryId]
FROM [InventoryArticle] AS [i]
INNER JOIN [Article] AS [a] ON [i].[ArticleId] = [a].[ArticleId]
WHERE (([i].[ArticleId] = 1) AND ([a].[ArticlePrice] <= 1500))
AND ((
SELECT COALESCE(SUM([i0].[ArticleQuantity]), 0)
FROM [InventoryArticle] AS [i0]
INNER JOIN [Article] AS [a0] ON [i0].[ArticleId] = [a0].[ArticleId]
WHERE ([i0].[ArticleId] = 1) AND ([a0].[ArticlePrice] < 1500)) > 10)
The expected result is one row. If the number were greater than 34, more rows should be added.
You can use a windowed SUM to calculate a running sum of ArticleQuantity. It is likely to be far more efficient than self-joining.
The trick is that you need all rows where the running sum up to the previous row is less than the requirement.
You could utilize a ROWS clause of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with possible NULLs on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
DECLARE @requirement int = 10;
SELECT
i.InventoryArticleId,
i.ArticleId,
i.ArticleQuantity,
i.InventoryId
FROM (
SELECT
i.*,
RunningSum = SUM(i.ArticleQuantity) OVER (PARTITION BY i.ArticleId ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
FROM InventoryArticle i
INNER JOIN Article a ON i.ArticleId = a.ArticleId
WHERE i.ArticleId = 1
AND a.ArticlePrice <= 1500
) i
-- the running sum up to the previous row must still be below the requirement
WHERE i.RunningSum - i.ArticleQuantity < @requirement;
You may want to choose a better ordering clause.
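The ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING variant mentioned above could look roughly like the following sketch. The table variable and its three rows are hypothetical stand-ins for InventoryArticle, reduced to the columns needed here.

```sql
-- Hypothetical sample rows; only the columns needed for the running sum.
DECLARE @InventoryArticle TABLE (InventoryArticleId int, ArticleId int, ArticleQuantity int);
INSERT @InventoryArticle VALUES (1, 1, 4), (2, 1, 8), (3, 1, 20);
DECLARE @requirement int = 10;

SELECT i.InventoryArticleId, i.ArticleQuantity, i.PreviousSum
FROM (
    SELECT ia.*,
           PreviousSum = COALESCE(
               SUM(ia.ArticleQuantity) OVER (PARTITION BY ia.ArticleId
                                             ORDER BY ia.InventoryArticleId
                                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
               0)  -- the first row has an empty frame, so SUM returns NULL
    FROM @InventoryArticle ia
) i
WHERE i.PreviousSum < @requirement;
```

Here rows 1 and 2 are returned (previous sums 0 and 4), and row 3 is excluded because the 12 units before it already cover the requirement.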
EF Core cannot use window functions, unless you specifically define a SqlExpression for it.
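To make the RANGE versus ROWS remark in this answer concrete, here is a small sketch with a tie on the ordering column. The table variable and its values are made up for the illustration.

```sql
-- Hypothetical three-row sample; rows 1 and 2 tie on sortkey.
DECLARE @t TABLE (id int, sortkey int, qty int);
INSERT @t (id, sortkey, qty) VALUES (1, 10, 5), (2, 10, 7), (3, 20, 3);

SELECT id, sortkey, qty,
       -- no frame given, so the default RANGE frame applies
       SUM(qty) OVER (ORDER BY sortkey)                          AS range_sum,
       SUM(qty) OVER (ORDER BY sortkey ROWS UNBOUNDED PRECEDING) AS rows_sum
FROM @t;
```

With the default RANGE frame, both tied rows get the same total (12), because peers are included in each other's frame. With ROWS the sum accumulates one row at a time, in whichever order the tied rows happen to be processed, which is why an unambiguous ORDER BY matters.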
My approach would be to:
Filter for the eligible records.
Calculate the running total.
Identify the first record where the running total satisfies your criteria.
Perform a final select of all eligible records up to that point.
Something like the following somewhat stripped down example:
-- Some useful generated data
-- Some useful generated data
DECLARE @Inventory TABLE (InventoryArticleId INT, ArticleId INT, ArticleQuantity INT)
INSERT @Inventory(InventoryArticleId, ArticleId, ArticleQuantity)
SELECT TOP 1000
InventoryArticleId = N.n,
ArticleId = N.n % 5,
ArticleQuantity = 5 * N.n
FROM (
-- Generate a range of integers (no ORDER BY here: it is not allowed in a derived table without TOP)
SELECT n = ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)
) N
ORDER BY N.n
SELECT * FROM @Inventory
DECLARE @ArticleId INT = 2
DECLARE @QuantityNeeded INT = 500
;
WITH isum as (
SELECT i.*, runningTotalQuantity = SUM(i.ArticleQuantity) OVER(ORDER BY i.InventoryArticleId)
FROM @Inventory i
WHERE i.ArticleId = @ArticleId
)
SELECT isum.*
FROM (
-- first row at which the running total reaches the needed quantity
SELECT TOP 1 InventoryArticleId
FROM isum
WHERE runningTotalQuantity >= @QuantityNeeded
ORDER BY InventoryArticleId
) selector
JOIN isum ON isum.InventoryArticleId <= selector.InventoryArticleId
ORDER BY isum.InventoryArticleId
Results:
InventoryArticleId  ArticleId  ArticleQuantity  runningTotalQuantity
2                   2          10               10
7                   2          35               45
12                  2          60               105
17                  2          85               190
22                  2          110              300
27                  2          135              435
32                  2          160              595
All of the ORDER BY clauses in the running total calculation, the selector, and the final select must be consistent and unambiguous (no duplicates). If a more complex order or preference is needed, it may be necessary to assign a rank value to the eligible records before calculating the running total.
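Assigning a rank first might look like the sketch below. The preference used here (largest quantities first, ties broken by id) and the four sample rows are assumptions chosen only to show the pattern.

```sql
-- Hypothetical data and preference: consume the largest quantities first.
DECLARE @Inventory TABLE (InventoryArticleId INT, ArticleId INT, ArticleQuantity INT);
INSERT @Inventory VALUES (1, 2, 10), (2, 2, 35), (3, 2, 35), (4, 2, 60);
DECLARE @QuantityNeeded INT = 90;

WITH ranked AS (
    SELECT i.*,
           -- the tie-break on the id makes rn unique and repeatable
           rn = ROW_NUMBER() OVER (ORDER BY i.ArticleQuantity DESC, i.InventoryArticleId)
    FROM @Inventory i
    WHERE i.ArticleId = 2
),
isum AS (
    SELECT r.*,
           runningTotalQuantity = SUM(r.ArticleQuantity) OVER (ORDER BY r.rn ROWS UNBOUNDED PRECEDING)
    FROM ranked r
)
SELECT *
FROM isum
WHERE runningTotalQuantity - ArticleQuantity < @QuantityNeeded
ORDER BY rn;
```

Because every later step orders by rn alone, the running total, the cut-off, and the final output cannot disagree about the order. With this data the rows with ids 4 and 2 are returned (60 + 35 = 95, which covers the 90 needed).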

SQL Get closest value to a number

I need to find the closest value to each number in column Divide from the column Quantity, and put the value found in the Value column for both Quantities.
Example:
In the column Divide the value of 5166 would be closest to Quantity column value 5000. To keep from using those two values more than once I need to place the value of 5000 in the value column for both numbers, like the example below. Also, is it possible to do this without a loop?
Quantity Divide Rank Value
15500 5166 5 5000
1250 416 5 0
5000 1666 5 5000
12500 4166 4 0
164250 54750 3 0
5250 1750 3 0
6250 2083 3 0
12250 4083 3 0
1750 583 2 0
17000 5666 2 0
2500 833 2 0
11500 3833 2 0
1250 416 1 0
There are a couple of answers here, but they both use CTEs or complex subqueries. There is a much simpler and faster way: a couple of self joins and a GROUP BY.
https://www.db-fiddle.com/f/rM268EYMWuK7yQT3gwSbGE/0
select
min(min.quantity) as minQuantityOverDivide
, t1.divide
, max(max.quantity) as maxQuantityUnderDivide
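, case
-- only one neighbour exists: return it
when max(max.quantity) is null then min(min.quantity)
when min(min.quantity) is null then max(max.quantity)
-- otherwise return whichever neighbour is nearer to divide
when (min(min.quantity) - t1.divide)
<
(t1.divide - max(max.quantity))
then min(min.quantity)
else max(max.quantity) end as closestQuantity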
from t1
left join (select quantity from t1) min on min.quantity >= t1.divide
left join (select quantity from t1) max on max.quantity < t1.divide
group by
t1.divide
If I understood the requirements, 5166 is not closest to 5000; it's closest to 5250 (the delta to 5000 is 166, versus 84 to 5250).
The corresponding query, without loops, would be (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=be434e67ba73addba119894a98657f17).
(I added a Value_Rank, as it's not clear whether you want Rank to be kept or recomputed.)
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
Quantity, Divide, Rank,
--
case
when abs(Quantity_let_delta) < abs(Quantity_get_delta) then Divide + Quantity_let_delta
else Divide + Quantity_get_delta
end as Value
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume GreaterEqualThan
max(isnull(so_let.Quantity, so_get.Quantity)) - so.Divide as Quantity_let_delta,
-- There is no GreaterEqualThan, assume LessEqualThan
min(isnull(so_get.Quantity, so_let.Quantity)) - so.Divide as Quantity_get_delta
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
--
left outer join SO so_get
on so_get.Quantity >= so.Divide
group by so.Quantity, so.Divide, so.Rank
) so
) result
Or, if by closest you mean the previous closest (fiddle here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b41fb1a3fc11039c7f82926f8816e270).
select
Quantity, Divide, Rank, Value,
dense_rank() over(order by Value) as Value_Rank
from
(
select
so.Quantity, so.Divide, so.Rank,
-- There is no LessEqualThan, assume 0
max(isnull(so_let.Quantity, 0)) as Value
from
SO so
left outer join SO so_let
on so_let.Quantity <= so.Divide
group by so.Quantity, so.Divide, so.Rank
) result
You don't need a loop. Basically, you need to find the lowest difference between each Divide and all the quantities (first CTE), then use this distance to find the corresponding record (second CTE), and then join with your initial table to get the converted values (final select).
;with cte as (
select t.Divide, min(abs(t2.Quantity-t.Divide)) as ClosestQuantity
from #t1 as t
cross apply #t1 as t2
group by t.Divide
)
,cte2 as (
select distinct
t.Divide, t2.Quantity
from #t1 as t
cross apply #t1 as t2
where abs(t2.Quantity-t.Divide) = (select ClosestQuantity from cte as c where c.Divide = t.Divide)
)
select t.Quantity, cte2.Quantity as Divide, t.Rank, t.Value
from #t1 as t
left outer join cte2 on t.Divide = cte2.Divide
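For comparison, the "smallest absolute difference" idea can also be sketched with OUTER APPLY and TOP (1). This is a different technique from the two-CTE query above; the table variable @t1 and its four rows (taken from the question's data) are assumptions for the illustration.

```sql
-- A few Quantity/Divide pairs from the question; @t1 is a hypothetical stand-in table.
DECLARE @t1 TABLE (Quantity int, Divide int);
INSERT @t1 VALUES (15500, 5166), (5000, 1666), (5250, 1750), (1750, 583);

SELECT t.Quantity, t.Divide, c.Quantity AS ClosestQuantity
FROM @t1 AS t
OUTER APPLY (
    SELECT TOP (1) q.Quantity
    FROM @t1 AS q
    -- nearest quantity first; ties go to the smaller quantity
    ORDER BY ABS(q.Quantity - t.Divide), q.Quantity
) AS c;
```

On these rows, Divide 5166 maps to 5250 rather than 5000, which matches the delta comparison made in the earlier answer.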

SQL Server - Grouping Combination of possibilities by fixed value

I have to create the cheapest basket that includes a fixed number of items.
For example, for a basket that has 5 items:
1 and 4 = (1 * 50) + (1 * 100) = 150
2 and 3 = (1 * 60) + (1 * 80) = 140 -- this is my guy
2 and 2 and 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170
3 and 3 = (1 * 80) + (1 * 80) = 160 **** this is 6 items, but the total may exceed the minimum item count. The important thing is the total cost...
....
This should also work for any number of items a basket may have. There are also lots of stores, and each store has different packages that may include several items.
How can I handle this with SQL?
UPDATE
Here is example data generation code. Recursive CTE solutions are more expensive. I have to finish the job in under 500-600 ms over 600-700 stores each time; this is a package search engine. Manual scenario creation using #temp tables or UNION is 15-20 times cheaper than a recursive CTE.
Also, concatenating Item or PackageId is very expensive. I can find the required package id or item after selecting the cheapest package, with a join to the source table.
I am hoping for a magical solution that is ultra fast and gets the correct option.
Only the cheapest basket is required for each store. Manual scenario creation is very fast but sometimes fails to find the correct cheapest basket.
CREATE TABLE #storePackages(
StoreId int not null,
PackageId int not null,
ItemType int not null, -- there are three item types: 0 is a normal item, 1 is a discounted item, 2 is a free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in general a package can have a quota between 1 and 6, but in rare cases up to 20-25
MaxFullQouta int not null -- sometimes a package can have an additional free or discounted item quota. MaxFullQouta will always be greater than or equal to MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO #storePackages
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
set @packageId = @packageId + @maxPackageForStore;
END
set @storeId = @storeId + 1;
END
SELECT * FROM #storePackages
drop table #storePackages
MY SOLUTION
First of all, I am thankful to everyone who tried to help me. However, all the suggested solutions are based on CTEs. As I said before, recursive CTEs cause performance problems when hundreds of stores are considered. Also, multiple baskets can be requested at one time: a request can include multiple baskets, one with 5 items, another with 3 items, and another with 7 items...
Last Solution
First of all, I generate all possible scenarios in a table by item size. This way, I have the option to eliminate unwanted scenarios.
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
Then I generated all possible scenarios from 2 items to 25 items and inserted them into the ItemScenarios table. The scenarios can be generated once, using WHILE or a recursive CTE; the advantage of this approach is that they only have to be generated one time.
The results look like this:
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
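One possible way to generate such scenarios with a recursive CTE is sketched below. It is an assumption about how the generation could be done, not the script actually used here: it only enumerates package sizes that sum exactly to the requested count (the table above also contains oversized packages, such as a 3-item package for a 2-item request), and the names sizes, parts, and scenario are made up.

```sql
DECLARE @n int = 4;  -- requested item count (example value)

WITH sizes AS (            -- candidate package sizes 1..@n
    SELECT k FROM (VALUES (1),(2),(3),(4),(5),(6)) v(k) WHERE k <= @n
),
parts AS (
    SELECT CAST(s.k AS varchar(200)) AS scenario, s.k AS lastPart, @n - s.k AS remaining
    FROM sizes s
    UNION ALL
    SELECT CAST(p.scenario + '+' + CAST(s.k AS varchar(10)) AS varchar(200)),
           s.k,
           p.remaining - s.k
    FROM parts p
    JOIN sizes s
      ON s.k <= p.lastPart    -- non-increasing sizes avoid duplicates such as 1+2 and 2+1
     AND s.k <= p.remaining   -- never overshoot the requested count
)
SELECT scenario
FROM parts
WHERE remaining = 0
ORDER BY scenario DESC;
```

For @n = 4 this yields the five scenarios 4, 3+1, 2+2, 2+1+1 and 1+1+1+1. Each would still need to be split into one row per package size to match the ItemScenarios layout.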
This way, I can restrict scenario sizes, the maximum number of different stores, the maximum number of different packages, etc.
I can also eliminate scenarios that mathematically cannot be cheaper than the others. For example, for a 4-item request, some scenarios are:
Scenario 1: 2+2
Scenario 2: 2+1+1
Scenario 3: 1+1+1+1
Among these scenarios, it is impossible for Scenario 2 to be the single cheapest basket, because:
If Scenario 2 < Scenario 3, then Scenario 1 is lower than Scenario 2, because what reduces the cost is the 2-item price, and Scenario 1 has two 2-item packages.
And if Scenario 2 < Scenario 1, then Scenario 3 is lower than Scenario 2.
So if I delete scenarios like Scenario 2, I gain some performance.
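The elimination argument above can be checked with a little arithmetic. In the sketch below, @p1 and @p2 are hypothetical prices for the cheapest 1-item and 2-item packages of one store; the point is that Scenario 2 always costs exactly the average of Scenario 1 and Scenario 3, so it can never be strictly cheaper than both.

```sql
-- Hypothetical prices: @p1 = cheapest 1-item package, @p2 = cheapest 2-item package.
DECLARE @p1 decimal(18,8) = 50, @p2 decimal(18,8) = 60;

SELECT Scenario1 = 2 * @p2,                  -- 2+2
       Scenario2 = @p2 + 2 * @p1,            -- 2+1+1
       Scenario3 = 4 * @p1,                  -- 1+1+1+1
       Midpoint  = (2 * @p2 + 4 * @p1) / 2;  -- always equals Scenario2
```

With these prices the scenarios cost 120, 160 and 200, and the midpoint is 160. This relies on each scenario using the same per-size prices, which is an assumption consistent with picking the cheapest package per item count.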
Now I can choose the cheapest item prices among stores:
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8),
ScenarioId int not null
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE THE CHEAPEST PRICE FOR EACH ITEM COUNT, SO I CAN CREATE THE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- Concatenating PackageId here hurts performance. We can join the selected scenarios with #JoinedPackageItemWithScenarios after the selection is complete.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SELECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
SUM
In my tests, this approach is at least 10 times faster than the recursive CTE, especially as the number of stores and requested items increases, and it still returns 100% correct results. The recursive CTE attempts millions of unnecessary JOINs as the number of stores and requested items grows.
If you want combinations, you'll need a recursive CTE. Preventing infinite recursion is a challenge. Here is one method:
with cte as (
select cast(packageid as nvarchar(4000)) as packs, item, cost
from t
union all
select concat(cte.packs, ',', t.packageid), cte.item + t.item, cte.cost + t.cost
from cte join
t
on cte.item + t.item < 10 -- some "reasonable" stop condition
)
select top 1 cte.*
from cte
where cte.item >= 5
order by cost; -- ascending, so that top 1 is the cheapest combination
I'm not 100% sure that SQL Server will accept the join condition, but this should work.
Assuming you want to compare all possible permutations of items until the total items in the basket exceeds your total basket number, something like the following would do what you want.
DECLARE @N INT = 1;
DECLARE @myTable TABLE (storeID INT DEFAULT(1), packageID INT IDENTITY(1, 1), item INT, cost INT);
INSERT @myTable (item, cost) VALUES (1, 50), (2, 60), (3, 80), (4, 100), (5, 169), (5, 165), (4, 101), (2, 61);
WITH CTE1 AS (
-- cheapest package for each item count
SELECT item, cost
FROM (
SELECT item, cost, ROW_NUMBER() OVER (PARTITION BY item ORDER BY cost) RN
FROM @myTable) T
WHERE RN = 1)
, CTE2 AS (
SELECT CAST('items'+CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)) items, C1.cost totalCost, C1.item totalItems
FROM CTE1 C1
UNION ALL
SELECT CAST(C2.items + ' + items' + CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)), C1.cost + C2.totalCost, C1.item + C2.totalItems
FROM CTE2 C2
CROSS JOIN CTE1 C1
WHERE C2.totalItems < @N)
SELECT TOP 1 *
FROM CTE2
WHERE totalItems >= @N
ORDER BY totalCost, totalItems DESC;
Edited to deal with the issue @Matt mentioned.
First we should find all combinations, and then select the one with the minimal price for the requested item count:
DECLARE @Table as TABLE (StoreId INT, PackageId INT, Item INT, Cost INT)
INSERT INTO @Table VALUES (1,1,1,50),(1,2,2,60),(1,3,3,80),(1,4,4,100)
DECLARE @MinItemCount INT = 5;
WITH cteCombinationTable AS (
SELECT cast(PackageId as NVARCHAR(4000)) as Package, Item, Cost
FROM @Table
UNION ALL
SELECT CONCAT(o.Package,',',c.PackageId), c.Item + o.Item, c.Cost + o.Cost FROM @Table as c join cteCombinationTable as o on CONCAT(o.Package,',',c.PackageId) <> Package
where o.Item < @MinItemCount
)
select top 1 *
from cteCombinationTable
where item >= @MinItemCount
order by cast(cost as decimal)/@MinItemCount
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,'<PackageId>' + CAST(PackageId AS VARCHAR(MAX)) + '</PackageId>' AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + '<PackageId>' + CAST(t.PackageId AS VARCHAR(MAX)) + '</PackageId>'
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,CAST(PackageIds AS XML) AS PackageIds
,CombinedCost
,CombinedItemCount
,DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS CombinedCostRowNum
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS PseudoCartId
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,x.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
CROSS APPLY (
SELECT( STUFF ( (
SELECT ',' + PackageId
FROM
(SELECT T.N.value('.','VARCHAR(100)') as PackageId FROM c.PackageIds.nodes('PackageId') as T(N)) p
ORDER BY
PackageId
FOR XML PATH(''), TYPE ).value('.','NVARCHAR(MAX)'), 1, 1, '')
) as PackageIds
) x
WHERE
CombinedCostRowNum = 1
SELECT *
FROM
#TestResults
This takes about 1000-2000 ms, varying widely with the combinations that have to be considered in the test data (e.g. sometimes more or less data is generated by your script).
This answer no doubt looks a bit more complicated than Gordon's or ZLK's, but it handles ties, repeated values, a single package meeting the criteria, and a few other things. The main difference is in the last query, where I take the XML that was built during the recursive query, split it, and then recombine it in order so that DISTINCT yields a unique pairing. For example, package 2 + package 3 = 140 and package 3 + package 2 = 140 would be the first two results in all of the queries, so using the XML to split and then recombine collapses them into a single row. But if you also had another row such as (1,5,2,60), with 2 items and a cost of 60, this query would return that combination too.
You can cherry-pick between the answers, using their methods to get the combinations and my methods to get the final results. To explain the process of my query:
cteMaxCostToConsider - a way of getting a cost ceiling for the recursive query so that fewer records have to be considered. It determines the cost of all of the packages together, or the cost of buying enough copies of a single package to satisfy the minimum count.
cteRecursive - this is similar to ZLK's answer and a little like Gordon's; it keeps adding items and item combinations until it reaches MaxCostToConsider. If I limited by item count, it could miss a situation where 7 items would be cheaper than 5, so constraining by the determined combined cost limits the recursion and performs better.
cteCombinedCostRowNum - this simply finds the lowest combined cost among combinations with at least the minimum item count.
The final query is a bit trickier: the CROSS APPLY splits the XML string built in the recursive CTE into separate rows, reorders those rows, and concatenates them again so that a combination and its reverse (e.g. Package 2 & Package 3 versus Package 3 & Package 2) become the same record; DISTINCT then removes the duplicate.
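The single-package part of cteMaxCostToConsider is an integer ceiling: the number of copies of one package needed to reach the minimum count, multiplied by its price. A small sketch using the four example packages from the question (the column aliases PackagesNeeded and CostIfOnlyThisPackage are made up for the illustration):

```sql
DECLARE @MinItemCount int = 5;

SELECT ItemCount,
       ItemPrice,
       -- integer division rounds down; add 1 when there is a remainder
       PackagesNeeded        = (@MinItemCount / ItemCount) + IIF(@MinItemCount % ItemCount > 0, 1, 0),
       CostIfOnlyThisPackage = ((@MinItemCount / ItemCount) + IIF(@MinItemCount % ItemCount > 0, 1, 0)) * ItemPrice
FROM (VALUES (1, 50), (2, 60), (3, 80), (4, 100)) p(ItemCount, ItemPrice);
```

For 5 items this gives 5, 3, 2 and 2 packages, costing 250, 180, 160 and 200. The minimum, 160, becomes the ceiling for the recursion here, since buying all four packages together (290) is not cheaper; the 2 + 3 combination at 140 stays under it.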
This is a bit more flexible than SELECT top N. To see the difference add the following test cases to your test data 1 at a time:
(StoreId, PackageId, Item, Cost)
(1,5,2,60)
(1,6,1,1),(1,7,1,1)
(1,8,50,1)
Edited. The above will give you every combination for a store that has the lowest combined cost. The bug that you noted was due to cteMaxCostToConsider: I was using SUM(ItemPrice), but sometimes the related SUM(ItemCount) didn't contain enough items for it to be considered as the MaxCostToConsider. I modified the CASE expression to correct that.
I have also modified it to work with the data example you provided. NOTE: you should change PackageId to an IDENTITY column, though, because I was getting duplicate PackageIds within a store with the method you used.
Here is a modified version of your script to show what I am talking about:
IF OBJECT_ID('storePackages') IS NOT NULL
BEGIN
DROP TABLE storePackages
END
CREATE TABLE storePackages(
StoreId int not null,
PackageId int not null IDENTITY(1,1),
ItemType int not null, -- there are three item types: 0 is a normal item, 1 is a discounted item, 2 is a free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in general a package can have a quota between 1 and 6, but in rare cases up to 20-25
MaxFullQouta int not null -- sometimes a package can have an additional free or discounted item quota. MaxFullQouta will always be greater than or equal to MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
-- the column list must match the order of the SELECT list, because INSERT ... SELECT maps columns by position
INSERT INTO storePackages (StoreId, ItemType, ItemCount, ItemPrice, MaxItemQouta, MaxFullQouta)
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
--set @packageId = @packageId + @maxPackageForStore;
END
set @storeId = @storeId + 1;
END
SELECT * FROM storePackages
--drop table #storePackages
No PackageIds Simply StoreId and Lowest CombinedCost - ~200-300MS depending on data
Next if you don't care what Packages are in there and you only want 1 row per store you can do the following:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
SELECT
StoreId
,MIN(CombinedCost) as CombinedCost
INTO #TestResults
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
GROUP BY
StoreId
SELECT *
FROM
#TestResults
WITH PackageIds, Only 1 Record Per StoreId - varies widely depending on test data/combinations to consider, ~600-1300 ms
Or if you still want package ids but you don't care which combination you choose and you only want 1 record then you can do:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,CAST(PackageId AS VARCHAR(MAX)) AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + ',' + CAST(t.PackageId AS VARCHAR(MAX))
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,PackageIds
,CombinedCost
,CombinedItemCount
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS RowNumber
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,c.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
WHERE
RowNumber = 1
SELECT *
FROM
#TestResults
Note that all benchmarking was done on a 4-year-old laptop (Intel i7-3520M CPU at 2.9 GHz, 8 GB of RAM, Samsung 500 GB EVO SSD), so on an appropriately resourced server I would expect this to run considerably faster. Adding indexes on storePackages would no doubt speed up the query as well.
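As a rough idea of the kind of index meant here: every CTE above joins or groups storePackages on StoreId and reads PackageId, ItemCount and ItemPrice, so a covering index along those lines may help. The index name and the column choice are assumptions that were not benchmarked in this answer, so measure before relying on them.

```sql
-- Hypothetical covering index for the StoreId joins used by the CTEs above.
CREATE NONCLUSTERED INDEX IX_storePackages_StoreId
    ON storePackages (StoreId)
    INCLUDE (PackageId, ItemCount, ItemPrice);
```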
MY SOLUTION
First of all I am thankful for everyone who try to help me. However all suggested solutions are based on CTE. As I said before recursive CTEs cause performace problems when hunderds of stores are considered. Also multiple packages are requested for one time. This means, A request can include mutiple baskets. One is 5 items other is 3 items and another one is 7 items...
Last Solution
First of all I generates all possible scenarios in a table by item size... By this way, I have option eleminate unwanted scenarios.
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
Then I generated all possible scenario from 2 item to 25 item and insert to the ItemScenarios table. Scenarios can be genereated one time by using WHILE or recursive CTE. The advantage of this way, scenarios generated only for one time.
Resuls are like below.
Item | ScenarioId | CalculatedItem
--------------------------------------------------------
2 1 2
2 2 3
2 3 1
2 3 1
3 4 5
3 5 4
3 6 3
3 7 2
3 7 2
3 8 2
3 8 1
3 9 1
3 9 1
3 9 1
....
.....
......
25 993 10
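A minimal sketch of how such scenarios could be enumerated with a recursive CTE. This is not the author's actual generator: it only lists splits that add up exactly to the requested count (the table above also contains package sizes larger than the request), and the names @n, sizes, parts, Parts, LastPart and Total are made up for illustration.

```sql
-- Sketch only: lists the ways to split @n items into package sizes,
-- largest size first (e.g. 2+1+1), using a recursive CTE.
DECLARE @n int;
SET @n = 4;

WITH sizes AS (
    -- candidate package sizes; extend the list for larger requests
    SELECT n FROM (VALUES (1), (2), (3), (4)) AS v(n)
),
parts AS (
    -- anchor: choose the first (largest) package size
    SELECT CAST(s.n AS varchar(200)) AS Parts,
           s.n AS LastPart,
           s.n AS Total
    FROM sizes AS s
    WHERE s.n <= @n
    UNION ALL
    -- recursive step: append a size no larger than the previous one,
    -- so each combination is produced once regardless of order
    SELECT CAST(p.Parts + '+' + CAST(s.n AS varchar(10)) AS varchar(200)),
           s.n,
           p.Total + s.n
    FROM parts AS p
    JOIN sizes AS s
      ON s.n <= p.LastPart
     AND p.Total + s.n <= @n
)
SELECT Parts
FROM parts
WHERE Total = @n;
```

For @n = 4 this yields the five splits 4, 3+1, 2+2, 2+1+1 and 1+1+1+1. Turning each split into ItemScenarios rows (one row per package size, with a ScenarioId) would be an additional step.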
This way, I can restrict scenario sizes, the maximum number of different stores, the maximum number of different packages, etc.
I can also eliminate scenarios that mathematically can never be cheaper than the others. For example, for a 4-item request, consider these scenarios:
Scenario 1: 2+2
Scenario 2: 2+1+1
Scenario 3: 1+1+1+1
Among these scenarios, Scenario 2 can never be the cheapest basket, because:
If Scenario 2 < Scenario 3, then Scenario 1 is lower than Scenario 2, because what reduces the cost is the 2-item price, and Scenario 1 contains two 2-item packages.
Likewise, if Scenario 2 < Scenario 1, then Scenario 3 is lower than Scenario 2.
So if I delete scenarios like Scenario 2, I gain some performance.
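The argument can be written out as a short derivation, assuming each store contributes a single cheapest price per package size. The symbols $p_1$ and $p_2$ are introduced here for illustration and denote the cheapest 1-item and 2-item package prices in one store; $S_1$, $S_2$, $S_3$ are the costs of Scenarios 1 to 3.

```latex
\begin{align*}
S_1 &= 2p_2, \qquad S_2 = p_2 + 2p_1, \qquad S_3 = 4p_1 \\
S_2 &= \tfrac{1}{2}\,(S_1 + S_3) \\
\Rightarrow\ \min(S_1, S_3) &\le S_2 \le \max(S_1, S_3)
\end{align*}
```

Since Scenario 2 is the average of the other two, it is never strictly cheaper than both, so dropping it does not change the minimum cost.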
Now I can choose the cheapest item prices among stores:
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8),
ScenarioId int not null
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW THAT I HAVE THE CHEAPEST PRICE FOR EACH ITEM COUNT, I CAN CREATE THE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- Concatenating PackageId here hurts performance. We can join the selected scenarios with #JoinedPackageItemWithScenarios after the selection is completed.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SELECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
SUM
In my tests, this approach is at least 10 times faster than the recursive CTE, especially as the number of stores and requested items increases, because the recursive CTE attempts millions of unnecessary JOINs in that case. It also returns 100% correct results.

SQL Server Amount Split

I have the following 2 tables in a SQL Server database.
Customer Main Expense Table
ReportID CustomerID TotalExpenseAmount
1000 1 200
1001 2 600
Attendee Table
ReportID AttendeeName
1000 Mark
1000 Sam
1000 Joe
There is no amount at the attendee level. I need to calculate each attendee's amount manually as described below (i.e. split TotalExpenseAmount by the number of attendees, making sure the individual figures are rounded to 2 decimals and sum exactly to TotalExpenseAmount).
The final report should look like:
ReportID CustID AttendeeName TotalAmount AttendeeAmount
1000 1 Mark 200 66.66
1000 1 Sam 200 66.66
1000 1 Joe 200 66.68
The final report will have about 150,000 records. Notice that I have adjusted the last attendee amount so that the total matches 200. What is the best way to write an efficient SQL query for this scenario?
You can do this using window functions:
select ReportID, CustID, AttendeeName, TotalAmount,
(case when seqnum = 1
then TotalAmount - perAttendee * (cnt - 1)
else perAttendee
end) as AttendeeAmount
from (select a.ReportID, a.CustID, a.AttendeeName, e.TotalAmount,
row_number() over (partition by reportId order by AttendeeName) as seqnum,
count(*) over (partition by reportId) as cnt,
cast(TotalAmount * 1.0 / count(*) over (partition by reportId) as decimal(10, 2)) as perAttendee
from attendee a join
expense e
on a.ReportID = e.ReportID
) ae;
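The perAttendee amount is calculated in the subquery and reduced to two decimal places with cast(). Note that SQL Server rounds, rather than truncates, when converting a numeric value to a smaller scale, so for 200 split three ways perAttendee is 66.67 rather than 66.66; floor() is not used because it doesn't accept a decimal places argument. For one of the rows, the amount is the total minus the sum of all the other attendees, so the split always adds up to the total.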
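A self-contained sketch of this approach, using the sample data from the question in table variables. It assumes the leftover cents should land on a single row per report and that the per-attendee share should be truncated, for which it uses ROUND(x, 2, 1) (a non-zero third argument makes ROUND truncate). The table variable names are illustrative only.

```sql
-- Sample data from the question, held in table variables for a self-contained run.
DECLARE @Expense TABLE (ReportID int, CustomerID int, TotalExpenseAmount decimal(10, 2));
DECLARE @Attendee TABLE (ReportID int, AttendeeName varchar(50));

INSERT INTO @Expense VALUES (1000, 1, 200), (1001, 2, 600);
INSERT INTO @Attendee VALUES (1000, 'Mark'), (1000, 'Sam'), (1000, 'Joe');

SELECT ae.ReportID, ae.CustomerID, ae.AttendeeName, ae.TotalExpenseAmount,
       CASE WHEN ae.seqnum = 1
            -- one row per report absorbs the leftover cents
            THEN ae.TotalExpenseAmount - ae.perAttendee * (ae.cnt - 1)
            ELSE ae.perAttendee
       END AS AttendeeAmount
FROM (SELECT e.ReportID, e.CustomerID, a.AttendeeName, e.TotalExpenseAmount,
             ROW_NUMBER() OVER (PARTITION BY a.ReportID ORDER BY a.AttendeeName) AS seqnum,
             COUNT(*) OVER (PARTITION BY a.ReportID) AS cnt,
             -- ROUND(x, 2, 1) truncates to 2 decimals instead of rounding
             CAST(ROUND(e.TotalExpenseAmount / COUNT(*) OVER (PARTITION BY a.ReportID), 2, 1)
                  AS decimal(10, 2)) AS perAttendee
      FROM @Attendee AS a
      JOIN @Expense AS e ON a.ReportID = e.ReportID
     ) AS ae
ORDER BY ae.ReportID, ae.seqnum;
```

With this sample data, two attendees should come out at 66.66 and the first attendee by name at 66.68, which adds up to 200. Report 1001 has no attendees, so the inner join leaves it out.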
This is similar to @Gordon's answer, but uses a CTE instead.
with CTECount AS (
select a.ReportId, a.AttendeeName,
ROW_NUMBER() OVER (PARTITION BY A.ReportId ORDER BY A.AttendeeName) [RowNum],
COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId) [AttendeeCount],
CAST(c.TotalExpenseAmount / (COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId)) AS DECIMAL(10,2)) [PerAmount]
FROM #Customer C INNER JOIN #Attendee A ON A.ReportId = C.ReportID
)
SELECT CT.ReportID, CT.CustomerId, AT.AttendeeName,
CASE WHEN CC.RowNum = 1 THEN CT.TotalExpenseAmount - CC.PerAmount * (CC.AttendeeCount - 1)
ELSE CC.PerAmount END [AttendeeAmount]
FROM #Customer CT INNER JOIN #Attendee AT
ON CT.ReportID = AT.ReportId
INNER JOIN CTECount CC
ON CC.ReportId = CT.ReportID AND CC.AttendeeName = AT.AttendeeName
I like the CTE because it lets me separate the different aspects of the query. The neat part of @Gordon's answer is the CASE expression and the inner calculation that make the lines total correctly.

SQL query - Difference between the values from two rows and two columns

I am struggling to get this working, using T-SQL Query (SQL SERVER 2008) for the following problem:
Ky ProductID Start # End # Diff
1 100 10 12 0
2 100 14 20 2 (14 - 12)
3 100 21 25 1 (21 - 20)
4 100 30 33 5 (30 - 25)
1 110 6 16 0
2 110 20 21 4 (20 - 16)
3 110 22 38 1 (22 - 21)
As you can see, I need the difference between values in two different rows and two different columns.
I tried
with t1
( select ROW_NUMBER() OVER (PARTITION by ProductID ORDER BY ProductID, Start# ) as KY
, productid
, start#
, end#
from mytable)
and
select DATEDIFF(ss, T2.complete_dm, T1.start_dm)
, <Keeping it simple not including all the columns which I selected..>
FROM T1 as T2
RIGHT OUTER JOIN T1 on T2.Ky + 1 = T1.KY
and T1.ProductID = T2.ProductID
The problem with the above query is that it still calculates the difference when the ProductID changes from 100 to 110.
Any help in modifying the query, or any simpler solution, would be much appreciated.
Thanks
You can try the code below to get the required result:
select ky,Start,[End],(select [end] from table1 tt where (tt.ky)=(t.ky-1) and tt.ProductID=t.ProductID) [End_Prev_Row],
case ky when 1 then 0
else (t.start -(select [end] from table1 tt where (tt.ky)=(t.ky-1) and tt.ProductID=t.ProductID))
end as Diff
from table1 t
Try something like this. It should give you the difference you want. The anchor part gets the first row for each product, and the recursive part then builds up the result by joining to the next Ky.
with t1
as
(
select ProductID, Ky, 0 as Difference, [End#]
from mytable where ky = 1
union all
select m.ProductID, m.Ky, m.[Start#] - t1.[End#] as Difference, m.[End#]
from mytable m
inner join t1 on m.ProductID = t1.ProductID and m.Ky = t1.Ky + 1
)
select Ky, ProductID, Difference from t1
order by ProductID, Ky
As Anup mentioned, your query seems to work fine. I only removed DATEDIFF from the difference calculation, since judging by your example the columns are not of a date type, which I guess was the issue. Here is the modified query:
with t1
as
( select ROW_NUMBER() OVER (PARTITION by ProductID ORDER BY ProductID, st ) as KY
, productid
, st
, ed
from YourTable)
select T1.ProductID, t1.ST,t1.ED, ISNULL(T1.st - T2.ed,0) as Diff
FROM T1 as T2
RIGHT OUTER JOIN T1 on T2.KY+1 = T1.KY
and T1.ProductID = T2.ProductID
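For reference, a self-contained sketch of the same row-numbering self-join, run against the sample data from the question. The table variable and the column names StartNo and EndNo are hypothetical stand-ins for Start # and End #; the multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Sample rows from the question (StartNo/EndNo stand in for "Start #"/"End #").
DECLARE @mytable TABLE (ProductID int, StartNo int, EndNo int);
INSERT INTO @mytable (ProductID, StartNo, EndNo) VALUES
    (100, 10, 12), (100, 14, 20), (100, 21, 25), (100, 30, 33),
    (110,  6, 16), (110, 20, 21), (110, 22, 38);

WITH numbered AS (
    -- number the rows per product, in order of their start value
    SELECT ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY StartNo) AS Ky,
           ProductID, StartNo, EndNo
    FROM @mytable
)
SELECT cur.Ky, cur.ProductID, cur.StartNo, cur.EndNo,
       -- the first row of each product has no predecessor, so Diff is 0
       ISNULL(cur.StartNo - prev.EndNo, 0) AS Diff
FROM numbered AS cur
LEFT JOIN numbered AS prev
       ON prev.ProductID = cur.ProductID
      AND prev.Ky = cur.Ky - 1
ORDER BY cur.ProductID, cur.Ky;
```

Because the join also matches on ProductID, the difference restarts at 0 for product 110, giving 0, 2, 1, 5 for product 100 and 0, 4, 1 for product 110. On SQL Server 2012 or later, LAG(EndNo) OVER (PARTITION BY ProductID ORDER BY StartNo) returns the previous row's value without the self-join.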
SELECT ROW_NUMBER() OVER (PARTITION by rc.ContractID ORDER BY rc.ID) AS ROWID,rc.ID,rc2.ID,rc.ContractID,rc2.ContractID,rc.ToDate,rc2.FromDate
FROM tbl_RenewContracts rc
LEFT OUTER JOIN tbl_RenewContracts rc2
ON rc2.ID = (SELECT MAX(ID) FROM tbl_RenewContracts rcs WHERE rcs.ID < rc.ID AND rcs.ContractID = rc.ContractID)
ORDER BY rc.ContractID
Substitute your own table and column names, and add a calculated column to get the DATEDIFF.