Alternative: SQL - SELECT rows until the sum of a row is a certain value

My question is very similar to my previous one posted here:
Sql - SELECT rows until the sum of a row is a certain value
To sum up: I need to return rows until a certain sum is reached. The difference this time is that I need to find the best fit for this sum, so the rows don't have to be sequential. For example:
Let's say I have 5 unpaid receipts from customer 1:
Receipt_id: 1 | Amount: 110€
Receipt_id: 2 | Amount: 110€
Receipt_id: 3 | Amount: 130€
Receipt_id: 4 | Amount: 110€
Receipt_id: 5 | Amount: 190€
So, customer 1 ought to pay me 220€.
Now I need to select receipts until this 220€ sum is met. The match might be sequential, like (receipt 1 + receipt 2), or not, like (receipt 1 + receipt 4). Either would be suitable.
I am using SQL Server 2016.
Any additional questions, feel free to ask.
Thanks in advance for all your help.

This query should solve it.
It can be very expensive, because the recursive CTE enumerates every combination of each customer's receipts, so please be careful!
You can find some documentation here: https://www.essentialsql.com/recursive-ctes-explained/
WITH the_data as (
SELECT *
FROM (
VALUES (1, 1, 110),(1, 2,110),(1, 3,130),(1, 4,110),(1, 5,190),
(2, 1, 10),(2, 2,20),(2, 3,200),(2, 4,190)
) t (user_id, receipt_id, amount)
), permutation /* recursive used here */ as (
SELECT
user_id,
amount as sum_amount,
CAST(receipt_id as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id,
1 as i
FROM the_data
WHERE amount > 0 -- remove empty amount
UNION ALL
SELECT
the_data.user_id,
sum_amount + amount as sum_amount,
CAST(concat(visited_receipt_id, ',', CAST(receipt_id as varchar))as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id ,
i + 1
FROM the_data
JOIN permutation
ON the_data.user_id = permutation.user_id
WHERE i < 1000 -- at most 1000 levels, i.e. combinations of fewer than 1000 receipts (note: SQL Server's default MAXRECURSION is 100)
and receipt_id > max_receipt_id -- addition is commutative, so each combination only needs checking in one order; extending only with a larger receipt_id produces no duplicates
-- AND sum_amount + amount <= 220 -- ignore everything that is bigger than the expected value (optional)
)
SELECT *
FROM permutation
WHERE sum_amount = 220
To select only one combination per user_id, replace the last three lines of the previous query with:
SELECT *
FROM (
SELECT *, row_number() OVER (partition by user_id order by newid()) as r -- NEWID() gives a random order; SQL Server has no random()
FROM permutation
WHERE sum_amount = 220
) as t
WHERE r = 1
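To check what the recursive CTE should return, the same enumeration (every combination exactly once, extended only with larger receipt ids) can be sketched in Python with `itertools.combinations`. `subsets_with_sum` is a hypothetical helper, and the data is the sample data from the CTE above. Like the CTE, it is exponential in the number of receipts per customer.

```python
from itertools import combinations


def subsets_with_sum(receipts, target):
    """Return every combination of receipt ids whose amounts add up to target.

    receipts: list of (receipt_id, amount). Like the recursive CTE, each
    combination appears once, with ids in increasing order, so (1, 4) never
    reappears as (4, 1). Runtime grows exponentially with len(receipts).
    """
    ordered = sorted(receipts)
    matches = []
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            if sum(amount for _, amount in combo) == target:
                matches.append(tuple(receipt_id for receipt_id, _ in combo))
    return matches


customer_1 = [(1, 110), (2, 110), (3, 130), (4, 110), (5, 190)]
customer_2 = [(1, 10), (2, 20), (3, 200), (4, 190)]
print(subsets_with_sum(customer_1, 220))  # [(1, 2), (1, 4), (2, 4)]
print(subsets_with_sum(customer_2, 220))  # [(2, 3), (1, 2, 4)]
```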

If you only need to combine two receipts, one of which is the first receipt, this could be a solution:
DECLARE @TARGET INT = 220 --SET YOUR TARGET
, @FIRSTID INT
, @FIRSTVAL INT
SELECT TOP 1 @FIRSTID = RECEIPT_ID, @FIRSTVAL = AMOUNT
FROM myRECEIPTS
ORDER BY RECEIPT_ID ASC
-- the first receipt is @FIRSTID; look for a different receipt that completes the target
SELECT TOP 1 *
FROM myRECEIPTS
WHERE AMOUNT = @TARGET - @FIRSTVAL
AND RECEIPT_ID <> @FIRSTID
ORDER BY RECEIPT_ID ASC
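The query above only tries pairs that include the first receipt, and the partner must be a different receipt (otherwise 110 + 110 would match receipt 1 with itself). As a more general sketch in Python rather than T-SQL (`first_pair_with_sum` is a hypothetical name), any pair can be found in a single pass by remembering the amounts seen so far:

```python
def first_pair_with_sum(receipts, target):
    """Return (earlier_id, later_id) for the first pair whose amounts add up to target.

    receipts: list of (receipt_id, amount). One pass in receipt_id order, with a
    dict from amount to the first receipt id seen with that amount. Each receipt
    is only paired with an earlier one, so it can never be paired with itself.
    Returns None when no pair exists.
    """
    seen = {}
    for receipt_id, amount in sorted(receipts):
        partner = seen.get(target - amount)
        if partner is not None:
            return partner, receipt_id
        seen.setdefault(amount, receipt_id)
    return None


customer_1 = [(1, 110), (2, 110), (3, 130), (4, 110), (5, 190)]
print(first_pair_with_sum(customer_1, 220))  # (1, 2)
```

This stays linear in the number of receipts, but like the query it only covers two-receipt solutions.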

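This code will do it, taking receipts in receipt_id order until the target is reached (assuming columns amount and receipt_id):
declare @target int = 220
declare @sum1 int = 0
declare @numrows int = 0
-- grow the row count one at a time until the first @numrows rows reach the target
while (@sum1 < @target and @numrows < (select count(*) from receipts))
begin
set @numrows += 1
select @sum1 = sum(amount) from (select top (@numrows) amount from receipts order by receipt_id) t
end
select top (@numrows) * from receipts order by receipt_id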

Related

Get rows in SQL by summing up a until certain value is exceeded and stop retrieving

I have to return rows from the database until their cumulative value exceeds a certain point.
I should get just enough rows to sum up to a value greater than my quantity, and then stop retrieving rows.
Is this possible, and does it make sense?
Can this be translated into LINQ for EF Core?
I am currently stuck with a query that returns all the rows...
SELECT [i].[InventoryArticleId], [i].[ArticleId], [i].[ArticleQuantity], [i].[InventoryId]
FROM [InventoryArticle] AS [i]
INNER JOIN [Article] AS [a] ON [i].[ArticleId] = [a].[ArticleId]
WHERE (([i].[ArticleId] = 1) AND ([a].[ArticlePrice] <= 1500))
AND ((
SELECT COALESCE(SUM([i0].[ArticleQuantity]), 0)
FROM [InventoryArticle] AS [i0]
INNER JOIN [Article] AS [a0] ON [i0].[ArticleId] = [a0].[ArticleId]
WHERE ([i0].[ArticleId] = 1) AND ([a0].[ArticlePrice] < 1500)) > 10)
The expected result is one row. If the required quantity were greater than 34, more rows should be added.
You can use a windowed SUM to calculate a running sum of ArticleQuantity. It is likely to be far more efficient than self-joining.
The trick is that you need all rows where the running sum up to the previous row is less than the requirement.
You could utilize a ROWS clause of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with possible NULLs on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
DECLARE @requirement int = 10;
SELECT
i.InventoryArticleId,
i.ArticleId,
i.ArticleQuantity,
i.InventoryId
FROM (
SELECT
i.*,
RunningSum = SUM(i.ArticleQuantity) OVER (PARTITION BY i.ArticleId ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
FROM InventoryArticle i
INNER JOIN Article a ON i.ArticleId = a.ArticleId
WHERE i.ArticleId = 1
AND a.ArticlePrice <= 1500
) i
WHERE i.RunningSum - i.ArticleQuantity < @requirement;
You may want to choose a better ordering clause.
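The two framings mentioned above (running sum minus the current row, or a frame ending at `1 PRECEDING` with the first-row NULL handled) should select the same rows. A minimal sketch that checks this, using SQLite through Python's `sqlite3` module as a portable stand-in for SQL Server (it assumes SQLite 3.25+ for window functions); the table contents are invented for illustration, and the `?` parameter plays the role of `@requirement`:

```python
import sqlite3

# In-memory SQLite stand-in for InventoryArticle (sample data invented for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InventoryArticle "
    "(InventoryArticleId INTEGER, ArticleId INTEGER, ArticleQuantity INTEGER)"
)
conn.executemany(
    "INSERT INTO InventoryArticle VALUES (?, ?, ?)",
    [(1, 1, 4), (2, 1, 5), (3, 1, 3), (4, 1, 8), (5, 2, 50)],
)

# Variant 1: running sum including the current row, minus the current row.
RUNNING_MINUS_CURRENT = """
SELECT InventoryArticleId FROM (
    SELECT InventoryArticleId, ArticleQuantity,
           SUM(ArticleQuantity) OVER (PARTITION BY ArticleId ORDER BY InventoryArticleId
                                      ROWS UNBOUNDED PRECEDING) AS RunningSum
    FROM InventoryArticle WHERE ArticleId = 1
) AS t
WHERE RunningSum - ArticleQuantity < ?
ORDER BY InventoryArticleId
"""

# Variant 2: sum of the preceding rows only; COALESCE turns the empty frame (NULL) on row 1 into 0.
PRECEDING_ONLY = """
SELECT InventoryArticleId FROM (
    SELECT InventoryArticleId,
           COALESCE(SUM(ArticleQuantity) OVER (PARTITION BY ArticleId ORDER BY InventoryArticleId
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS SumBefore
    FROM InventoryArticle WHERE ArticleId = 1
) AS t
WHERE SumBefore < ?
ORDER BY InventoryArticleId
"""


def rows_needed(sql, requirement):
    return [row[0] for row in conn.execute(sql, (requirement,))]


print(rows_needed(RUNNING_MINUS_CURRENT, 10))  # [1, 2, 3]
print(rows_needed(PRECEDING_ONLY, 10))         # [1, 2, 3]
```

With quantities 4, 5, 3, 8, the first three rows sum to 12, the first point where the total reaches 10, so exactly those rows are kept.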
EF Core cannot use window functions, unless you specifically define a SqlExpression for it.
My approach would be to:
Filter for the eligible records.
Calculate the running total.
Identify the first record where the running total satisfies your criteria.
Perform a final select of all eligible records up to that point.
Something like the following somewhat stripped down example:
-- Some useful generated data
DECLARE @Inventory TABLE (InventoryArticleId INT, ArticleId INT, ArticleQuantity INT)
INSERT @Inventory(InventoryArticleId, ArticleId, ArticleQuantity)
SELECT TOP 1000
InventoryArticleId = N.n,
ArticleId = N.n % 5,
ArticleQuantity = 5 * N.n
FROM (
-- Generate a range of integers
SELECT n = ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)
-- no ORDER BY here: SQL Server rejects ORDER BY in a derived table that has no TOP
) N
ORDER BY N.n
SELECT * FROM @Inventory
DECLARE @ArticleId INT = 2
DECLARE @QuantityNeeded INT = 500
;
WITH isum as (
SELECT i.*, runningTotalQuantity = SUM(i.ArticleQuantity) OVER(ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
FROM @Inventory i
WHERE i.ArticleId = @ArticleId
)
SELECT isum.*
FROM (
SELECT TOP 1 InventoryArticleId
FROM isum
WHERE runningTotalQuantity >= @QuantityNeeded
ORDER BY InventoryArticleId
) selector
JOIN isum ON isum.InventoryArticleId <= selector.InventoryArticleId
ORDER BY isum.InventoryArticleId
Results:
InventoryArticleId  ArticleId  ArticleQuantity  runningTotalQuantity
------------------  ---------  ---------------  --------------------
2                   2          10               10
7                   2          35               45
12                  2          60               105
17                  2          85               190
22                  2          110              300
27                  2          135              435
32                  2          160              595
All of the ORDER BY clauses in the running total calculation, the selector, and the final select must be consistent and unambiguous (no duplicates). If a more complex order or preference is needed, it may be necessary to assign a rank value to the eligible records before calculating the running total.
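The four steps (filter, running total, pick the first row that satisfies the requirement, select everything up to it) can be reproduced in SQLite through Python's `sqlite3` module as a quick sanity check. This is a sketch under the assumption of SQLite 3.25+ (for window functions): `LIMIT 1` stands in for `TOP 1`, and named parameters replace the T-SQL variables; otherwise it mirrors the query above with the same generated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory "
    "(InventoryArticleId INTEGER PRIMARY KEY, ArticleId INTEGER, ArticleQuantity INTEGER)"
)
# Same generated data as the T-SQL example: n = 0..999, ArticleId = n % 5, ArticleQuantity = 5 * n.
conn.executemany(
    "INSERT INTO Inventory VALUES (?, ?, ?)",
    [(n, n % 5, 5 * n) for n in range(1000)],
)

SQL = """
WITH isum AS (
    SELECT i.*, SUM(i.ArticleQuantity) OVER (ORDER BY i.InventoryArticleId
                ROWS UNBOUNDED PRECEDING) AS runningTotalQuantity
    FROM Inventory i
    WHERE i.ArticleId = :article_id
)
SELECT isum.*
FROM (
    SELECT InventoryArticleId FROM isum
    WHERE runningTotalQuantity >= :needed
    ORDER BY InventoryArticleId
    LIMIT 1                       -- SQLite's counterpart of TOP 1
) AS selector
JOIN isum ON isum.InventoryArticleId <= selector.InventoryArticleId
ORDER BY isum.InventoryArticleId
"""

rows = conn.execute(SQL, {"article_id": 2, "needed": 500}).fetchall()
for row in rows:
    print(row)
```

The output matches the results table: seven rows, ending with InventoryArticleId 32 and a running total of 595.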

Accounting Calculate Debit credit in SQL(ssms)

I have an accounting calculation problem that I want to solve with a SQL query (in SSMS).
I have two groups of documents related to one person (creditor and debtor)
Creditor documents cover debtor documents.
Consider the following example: (How can the result be achieved?)
USE [master]
GO
DROP TABLE IF EXISTS #credit/*creditor=0*/,#debit/*Debtor=1*/
SELECT *
INTO #debit
FROM (values
(88,'2/14',1,5,1),(88,'2/15',2,5,1)
)A (personID,DocDate,DocID,Fee,IsDebit)
SELECT *
INTO #credit
FROM (values
(88,'2/16',3,3,0),(88,'2/17',4,7,0)
)A (personID,DocDate,DocID,Fee,ISDeb)
SELECT * FROM #credit
SELECT * FROM #debit
--result:
;WITH res AS
(
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 3 Cre_DocID ,3 Cre_Fee, 0 remain_Cre_Fee
UNION
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 5 remain_Cre_Fee
UNION
SELECT 88 AS personID ,2 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 0 remain_Cre_Fee
)
SELECT *
FROM res
Sample data
Using an ISO date format to avoid any confusion.
The docdate and isdebit columns will not be used in the solution...
I ignored docdate under the assumptions that the values are incremental and that it is allowed to deposit a credit fee before any debit fee.
The isdebit flag seems redundant if you are going to store debit and credit transactions in separate tables anyway.
Updated sample data:
create table debit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into debit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-14', 1, 5, 1),
(88, '2021-02-15', 2, 5, 1);
create table credit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into credit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-16', 3, 3, 0),
(88, '2021-02-17', 4, 7, 0);
Solution
A couple of steps here:
Construct a rolling sum for the debit fees. Done with a first common table expression (cte_debit).
Construct a rolling sum for the credit fees. Done with a second common table expression (cte_credit).
Take all debit info (select * from cte_debit)
Find the first credit info that applies to the current debit info. Done with a first cross apply (cc1). This contains the docid of the first document that applies to the debit document.
Find the last credit info that applies to the current debit info. Done with a second cross apply (cc2). This contains the docid of the last document that applies to the debit document.
Find all credit info that applies to the current debit info by selecting all documents between the first and last applicable document (join cte_credit cc on cc.docid >= cc1.docid and cc.docid <= cc2.docid).
Combine the rolling sums to calculate the remaining credit fees (cc.credit_sum - cd.debit_sum). Use a case expression to replace negative values with 0.
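The rule behind these steps can be checked outside the database: a credit document applies to a debit document when their rolling-sum intervals overlap, and the remaining credit is clamped at 0 like the case expression. A minimal Python sketch, assuming documents for a single person that are already sorted by docid (`allocate` is a hypothetical name, not part of the solution):

```python
from itertools import accumulate


def allocate(debits, credits):
    """Pair each debit document with the credit documents that cover it.

    debits, credits: lists of (personid, docid, fee) for ONE person, already
    sorted by docid (an assumption of this sketch). Returns rows shaped like
    the query result: (personid, deb_docid, deb_fee, cre_docid, cre_fee,
    cre_fee_remaining).
    """
    debit_sums = list(accumulate(fee for _, _, fee in debits))
    credit_sums = list(accumulate(fee for _, _, fee in credits))
    rows = []
    for (person, d_id, d_fee), d_sum in zip(debits, debit_sums):
        d_prev = d_sum - d_fee  # rolling sum before this debit
        for (_, c_id, c_fee), c_sum in zip(credits, credit_sums):
            c_prev = c_sum - c_fee
            # the credit applies when (c_prev, c_sum] overlaps (d_prev, d_sum]
            if c_sum > d_prev and c_prev < d_sum:
                rows.append((person, d_id, d_fee, c_id, c_fee, max(c_sum - d_sum, 0)))
    return rows


debits = [(88, 1, 5), (88, 2, 5)]
credits = [(88, 3, 3), (88, 4, 7)]
for row in allocate(debits, credits):
    print(row)
```

For the sample data this prints the same three rows as the expected result.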
Full solution:
with cte_debit as
(
select d.personid,
d.docid,
d.fee,
sum(d.fee) over(partition by d.personid order by d.docid rows between unbounded preceding and current row) as debit_sum
from debit d
),
cte_credit as
(
select c.personid,
c.docid,
c.fee,
sum(c.fee) over(partition by c.personid order by c.docid rows between unbounded preceding and current row) as credit_sum
from credit c
)
select cd.personid,
cd.docid as deb_docid,
cd.fee as deb_fee,
cc.docid as cre_docid,
cc.fee as cre_fee,
case
when cc.credit_sum - cd.debit_sum >= 0
then cc.credit_sum - cd.debit_sum
else 0
end as cre_fee_remaining
from cte_debit cd
-- first credit document not yet fully used by earlier debits
cross apply ( select top 1 cc1.docid, cc1.credit_sum
from cte_credit cc1
where cc1.personid = cd.personid
and cc1.credit_sum > cd.debit_sum - cd.fee
order by cc1.credit_sum ) cc1
-- first credit document whose rolling sum covers this debit
cross apply ( select top 1 cc2.docid, cc2.credit_sum
from cte_credit cc2
where cc2.personid = cd.personid
and cc2.credit_sum >= cd.debit_sum
order by cc2.credit_sum ) cc2
join cte_credit cc
on cc.personid = cd.personid
and cc.docid >= cc1.docid
and cc.docid <= cc2.docid
order by cd.personid,
cd.docid,
cc.docid;
Result
personid deb_docid deb_fee cre_docid cre_fee cre_fee_remaining
-------- --------- ------- --------- ------- -----------------
88 1 5 3 3 0
88 1 5 4 7 5
88 2 5 4 7 0
Fiddle to see things in action. This also contains the intermediate CTE results and some commented helper columns that can be uncommented to help to further understand the solution.

SQL Implementation with Sliding-Window functions or Recursive CTEs

I have a problem that would be very easy to solve in, for example, C# code, but I have no idea how to write it as a SQL query with recursive CTEs or sliding-window functions.
Here is the situation: let's say I have a table with 3 columns (ID, Date, Amount), and here is some data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want to get from the table is this (ID, Amount .... Order By Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, separately for each client. The catch is that when a negative amount comes into play, you need to remove that amount from the last installment. I don't know how clear I am, so here is an example:
Let's say I have 3 invoices for one client with amounts 500, 200, -300.
If I start distributing these invoices, I first distribute 500, then 200. But when I come to the third one, -300, I need to remove it from the last invoice. In other words, 200 - 300 = -100, so the amount of the second invoice disappears, but there is still -100 that needs to be subtracted from the first invoice: 500 - 100 = 400. The result I need is one row (the first invoice, with amount 400).
Another example, where the first invoice has a negative amount: (-500, 300, 500).
In this case, the first invoice (-500) makes the second disappear, and the remaining 200 is subtracted from the third. So the result is the third invoice with amount 300.
This is something like a stack implementation in a programming language, but I need to do it with sliding-window functions in SQL Server.
I need an implementation with window functions or recursive CTEs,
not with loops...
Thanks.
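Before writing the SQL, it can help to pin down the intended semantics with a reference implementation that query results can be compared against. A minimal Python sketch of the stack behaviour described in the question (`distribute` is a hypothetical name; it assumes one client's rows in date order, and carries a leading negative amount forward to the next positive amounts, as in the (-500, 300, 500) example):

```python
def distribute(rows):
    """Apply the question's stack rule to (id, amount) rows of ONE client, in date order.

    Positive amounts are pushed as installments; a negative amount is taken from
    the most recent installment(s). A negative amount with nothing left to take
    from is carried forward and subtracted from the next positive amounts.
    Returns the surviving (id, remaining_amount) pairs, oldest first.
    """
    stack = []  # [id, remaining] entries, newest last
    debt = 0    # negative amount not yet absorbed
    for row_id, amount in rows:
        if amount >= 0:
            absorbed = min(amount, debt)
            debt -= absorbed
            if amount - absorbed > 0:
                stack.append([row_id, amount - absorbed])
        else:
            need = -amount
            while need > 0 and stack:
                take = min(need, stack[-1][1])
                stack[-1][1] -= take
                need -= take
                if stack[-1][1] == 0:
                    stack.pop()
            debt += need
    return [tuple(entry) for entry in stack]


data = [(1, -500), (2, 1000), (3, -200), (4, 300), (5, 500), (6, 1000), (7, -100), (8, 200)]
print(distribute(data))  # [(2, 300), (4, 300), (5, 500), (6, 900), (8, 200)]
```

The printed result is the expected output table from the question.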
OK, I think this is what you want. There are two recursive queries: one for upward propagation and one for downward propagation.
with your_data_rn as
(
select *, row_number() over (order by date) rn
from your_data
), rcte_up(id, date, ammount, rn, running) as
(
select *, ammount as running
from your_data_rn
union all
select d.*,
d.ammount + rcte_up.running
from your_data_rn d
join rcte_up on rcte_up.running < 0 and d.rn = rcte_up.rn - 1
), data2 as
(
select id, date, min(running) ammount,
row_number() over (order by date) rn
from rcte_up
group by id, date, rn
having min(running) > 0 or rn = 1
), rcte_down(id, date, ammount, rn, running) as
(
select *, ammount as running
from data2
union all
select d.*, d.ammount + rcte_down.running
from data2 d
join rcte_down on rcte_down.running < 0 and d.rn = rcte_down.rn + 1
)
select id, date, min(running) ammount
from rcte_down
group by id, date
having min(running) > 0
You could also do only the upward propagation in SQL and handle the downward propagation of the first row in a procedural language. Downward propagation is a single scan through the first few rows, so a recursive query may be overkill there.
I added a client ID to the table for a more general solution. Then I implemented the stack, stored as XML in a query column, and emulated a program loop with a recursive CTE:
with Data as( -- Numbering rows for iteration on CTE
select Client, id, Amount,
cast(row_number() over(partition by Client order by Date) as int) n
from TabW
),
CTE(Client, n, stack) as( -- Recursive CTE
select Client, 1, cast(NULL as xml) from Data where n=1
UNION ALL
select D.Client, D.n+1, (
-- Stack operations to process current row (D)
select row_number() over(order by n) n,
-- Use calculated amount in first positive and oldest stack cell
-- Else preserve value stored in stack
case when n=1 or (n=0 and last=1) then new else Amount end Amount,
-- Set ID in stack cell for positive and new data
case when n=1 and D.Amount>0 then D.id else id end id
from (
select Y.Amount, Y.id, new,
-- Count positive stack entries
sum(case when new<=0 or (n=0 and Amount<0) then 0 else 1 end) over (order by n) n,
row_number() over(order by n desc) last -- mark oldest stack cell by 1
from (
select X.*,sum(Amount) over(order by n) new
from (
select case when C.stack.value('(/row/@Amount)[1]','int')<0 then -1 else 0 end n,
D.Amount, D.id -- Data from new record
union all -- And expand current stack in XML to table
select node.value('@n','int') n, node.value('@Amount','int'), node.value('@id','int')
from C.stack.nodes('//row') N(node)
) X
) Y where n>=0 -- Suppress new cell if the stack contained a negative amount
) Z
where n>0 or (n=0 and last=1)
for xml raw, type
)
from Data D, CTE C
where D.n=C.n and D.Client=C.Client
) -- Expand stack into result table
select CTE.Client, node.value('@id','int') id, node.value('@Amount','int')
from CTE join (select Client, max(n) max_n from Data group by Client) X on CTE.Client=X.Client and CTE.n=X.max_n+1
cross apply stack.nodes('//row') N(node)
order by CTE.Client, node.value('@n','int') desc
Test on sqlfiddle.com
I think this method is slower than @RadimBača's. It is shown to demonstrate that a sequential algorithm can be implemented in SQL.

SQL Server - Grouping Combination of possibilities by fixed value

I have to build the cheapest basket that contains a fixed (minimum) number of items.
For example, for a basket of 5 items:
1 and 4 = (1 * 50) + (1 * 100) = 150
2 and 3 = (1 * 60) + (1 * 80) = 140 -- this is my guy
2 and 2 and 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170
3 and 3 = (1 * 80) + (1 * 80) = 160 -- this is 6 items, but the total item count may exceed the minimum; what matters is the total cost...
....
This must work for any number of items a basket may have. There are also many stores, and each store has different packages that may include several items.
How can I handle this with SQL?
UPDATE
Here is example data-generation code. Recursive CTE solutions are too expensive: I have to finish the job in under 500-600 ms over 600-700 stores each time, because this is a package search engine. Creating scenarios manually with #temp tables or UNION is 15-20 times cheaper than a recursive CTE.
Concatenating Item or PackageId is also very expensive. I can find the required package IDs or items after selecting the cheapest package, by joining back to the source table.
I am hoping for a magical solution that is ultra fast and finds the correct option.
Only the cheapest basket is required for each store. Manual scenario creation is very fast but sometimes fails to find the cheapest basket.
CREATE TABLE #storePackages(
StoreId int not null,
PackageId int not null,
ItemType int not null, -- there are three item types: 0 is a normal item, 1 is a discounted item, 2 is a free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in general a package has a quota between 1 and 6, but in rare cases up to 20-25
MaxFullQouta int not null -- sometimes a package has an additional free or discounted item quota; MaxFullQouta is always greater than MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
INSERT INTO #storePackages
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
set @packageId = @packageId + @maxPackageForStore;
END
set @storeId = @storeId + 1;
END
SELECT * FROM #storePackages
drop table #storePackages
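For checking results (or as an application-side alternative), the per-store problem can be seen as an unbounded min-cost cover: choose package sizes, with repetition, so that the item total is at least the requested count at minimum cost. The sketch below swaps in a dynamic-programming technique that none of the answers here use. It is Python rather than SQL, it ignores ItemType and the quota columns, and `cheapest_basket` is a hypothetical name. Its cost is O(requested items × packages) per store instead of enumerating combinations.

```python
def cheapest_basket(packages, min_items):
    """Cheapest way to buy at least min_items items from one store's packages.

    packages: list of (item_count, price) with item_count >= 1; any package may
    be bought several times. best[k] is the minimum cost of covering at least k
    items; a package larger than the remaining need simply overshoots, which is
    allowed (only total cost matters). Returns (cost, item_counts) or None.
    """
    INF = float("inf")
    best = [0] + [INF] * min_items
    choice = [None] * (min_items + 1)
    for k in range(1, min_items + 1):
        for count, price in packages:
            cost = price + best[max(0, k - count)]
            if cost < best[k]:
                best[k], choice[k] = cost, count
    if best[min_items] == INF:
        return None  # no packages at all
    picked, k = [], min_items
    while k > 0:
        picked.append(choice[k])
        k = max(0, k - choice[k])
    return best[min_items], picked


store_1 = [(1, 50), (2, 60), (3, 80), (4, 100)]
print(cheapest_basket(store_1, 5))  # (140, [2, 3])
```

With the prices from the question it picks the 2-item and 3-item packages for 140, the "this is my guy" basket.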
MY SOLUTION
First of all, I am thankful to everyone who tried to help me. However, all suggested solutions are based on CTEs. As I said before, recursive CTEs cause performance problems when hundreds of stores are considered. Also, multiple baskets are requested at once: one request can include several baskets, e.g. one of 5 items, another of 3 items, and another of 7 items...
Last Solution
First, I generate all possible scenarios by item count and store them in a table. This way, I have the option to eliminate unwanted scenarios.
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
Then I generated all possible scenarios from 2 items to 25 items and inserted them into the ItemScenarios table. The scenarios can be generated with a WHILE loop or a recursive CTE; the advantage of this approach is that they are generated only once.
The results look like this:
Item  ScenarioId  CalculatedItem
----  ----------  --------------
2     1           2
2     2           3
2     3           1
2     3           1
3     4           5
3     5           4
3     6           3
3     7           2
3     7           2
3     8           2
3     8           1
3     9           1
3     9           1
3     9           1
...
25    993         10
This way, I can restrict scenario sizes, the maximum number of different stores, the maximum number of different packages, etc.
I can also eliminate scenarios that can mathematically never be cheaper than the others. For example, for a 4-item request, consider these scenarios:
Scenario 1 : 2+2
Scenario 2: 2+1+1
Scenario 3: 1+1+1+1
Among these, Scenario 2 can never be the cheapest basket, because:
If Scenario 2 < Scenario 3, then Scenario 1 is lower than Scenario 2, because the 2-item package is what lowers the cost, and Scenario 1 contains two of them.
Likewise, if Scenario 2 < Scenario 1, then Scenario 3 is lower than Scenario 2.
So if I delete scenarios like Scenario 2, I gain some performance.
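The argument can be written as a single line. Let a and b denote the cheapest 1-item and 2-item package prices in a store (names introduced here for illustration). Scenario 2 always costs exactly the average of Scenarios 1 and 3, so one of those two is never more expensive:

```latex
\[
S_1 = 2b, \qquad S_2 = b + 2a, \qquad S_3 = 4a
\quad\Longrightarrow\quad
S_2 = \tfrac{1}{2}\,(S_1 + S_3) \;\ge\; \min(S_1, S_3)
\]
```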
Now I can choose the cheapest item prices for each store:
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8),
ScenarioId int not null
)
INSERT INTO #JoinedPackageItemWithScenarios
SELECT
SPM.StoreId
,SPM.PackageId
,SPM.ItemCount
,SPM.ItemPrice
,SPM.ScenarioId
FROM (
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice)
FROM ItemScenarios SC
LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
WHERE SC.Item = @requestedItems
) SPM
WHERE SPM.RowNumber = 1
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- concatenating PackageId here hurts performance; the selected scenarios can be joined with #JoinedPackageItemWithScenarios after selection is complete.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SELECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
Summary
In my tests, this approach is at least 10 times faster than a recursive CTE, especially as the number of stores and requested items grows. It also returns 100% correct results, whereas the recursive CTE tried millions of unnecessary JOINs as the number of stores and requested items increased.
If you want combinations, you'll need a recursive CTE. Preventing infinite recursion is a challenge. Here is one method:
with cte as (
select cast(packageid as nvarchar(4000)) as packs, item, cost
from t
union all
select concat(cte.packs, ',', t.packageid), cte.item + t.item, cte.cost + t.cost
from cte join
t
on cte.item + t.item < 10 -- some "reasonable" stop condition
)
select top 1 cte.*
from cte
where cte.item >= 5
order by cost; -- cheapest first
I'm not 100% sure that SQL Server will accept the join condition, but this should work.
Assuming you want to compare all possible permutations of items until the total number of items in the basket reaches your required basket size, something like the following would do what you want.
DECLARE @N INT = 1;
DECLARE @myTable TABLE (storeID INT DEFAULT(1), packageID INT IDENTITY(1, 1), item INT, cost INT);
INSERT @myTable (item, cost) VALUES (1, 50), (2, 60), (3, 80), (4, 100), (5, 169), (5, 165), (4, 101), (2, 61);
WITH CTE1 AS (
SELECT item, cost
FROM (
SELECT item, cost, ROW_NUMBER() OVER (PARTITION BY item ORDER BY cost) RN
FROM @myTable) T
WHERE RN = 1)
, CTE2 AS (
SELECT CAST('items'+CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)) items, C1.cost totalCost, C1.item totalItems
FROM CTE1 C1
UNION ALL
SELECT CAST(C2.items + ' + items' + CAST(C1.item AS VARCHAR(10)) AS VARCHAR(4000)), C1.cost + C2.totalCost, C1.item + C2.totalItems
FROM CTE2 C2
CROSS JOIN CTE1 C1
WHERE C2.totalItems < @N)
SELECT TOP 1 *
FROM CTE2
WHERE totalItems >= #N
ORDER BY totalCost, totalItems DESC;
Edited to deal with the issue @Matt mentioned.
First we find all combinations, and then select the one with the minimal price for the requested item count:
DECLARE @Table as TABLE (StoreId INT, PackageId INT, Item INT, Cost INT)
INSERT INTO @Table VALUES (1,1,1,50),(1,2,2,60),(1,3,3,80),(1,4,4,100)
DECLARE @MinItemCount INT = 5;
WITH cteCombinationTable AS (
SELECT cast(PackageId as NVARCHAR(4000)) as Package, Item, Cost
FROM @Table
UNION ALL
SELECT CONCAT(o.Package,',',c.PackageId), c.Item + o.Item, c.Cost + o.Cost FROM @Table as c join cteCombinationTable as o on CONCAT(o.Package,',',c.PackageId) <> Package
where o.Item < @MinItemCount
)
select top 1 *
from cteCombinationTable
where item >= @MinItemCount
order by cast(cost as decimal)/@MinItemCount
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,'<PackageId>' + CAST(PackageId AS VARCHAR(MAX)) + '</PackageId>' AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + '<PackageId>' + CAST(t.PackageId AS VARCHAR(MAX)) + '</PackageId>'
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,CAST(PackageIds AS XML) AS PackageIds
,CombinedCost
,CombinedItemCount
,DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS CombinedCostRowNum
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS PseudoCartId
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,x.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
CROSS APPLY (
SELECT( STUFF ( (
SELECT ',' + PackageId
FROM
(SELECT T.N.value('.','VARCHAR(100)') as PackageId FROM c.PackageIds.nodes('PackageId') as T(N)) p
ORDER BY
PackageId
FOR XML PATH(''), TYPE ).value('.','NVARCHAR(MAX)'), 1, 1, '')
) as PackageIds
) x
WHERE
CombinedCostRowNum = 1
SELECT *
FROM
#TestResults
This takes about 1000-2000 ms, varying widely with the number of combinations that have to be considered in the test data (your script generates more or less data on each run).
This answer no doubt looks a bit more complicated than Gordon's or ZLK's, but it handles ties, repeated values, a single package meeting the criteria, and a few other things. The main difference, however, is in the last query: I take the XML that was built during the recursive query, split it, and recombine it in order, so that DISTINCT yields a unique combination. For example, package 2 + package 3 = 140 and package 3 + package 2 = 140 would be the first two results in all of the other queries; splitting and recombining the XML turns them into a single row. And if you also had another row such as (1,5,2,60), with 2 items and a cost of 60, this query would return that combination too.
You can cherry-pick between the answers, for example using their methods to generate the combinations and mine to produce the final results. Here is how my query works:
cteMaxCostToConsider - this computes a cost ceiling for the recursive query so that fewer records have to be considered. It uses the cost of all the store's packages together, if they contain enough items and are cheaper; otherwise it uses the cheapest cost of buying enough of a single package to satisfy the minimum count.
cteRecursive - this is similar to ZLK's answer and a little like Gordon's: it keeps adding items and item combinations until the combined cost reaches MaxCostToConsider. If I limited on item count instead, it could miss a case where 7 items are cheaper than 5, so constraining on the combined cost limits the recursion and performs better.
cteCombinedCostRowNum - this keeps only combinations with at least the minimum item count and ranks them by combined cost, so the lowest cost gets rank 1.
The final query is a bit trickier: the CROSS APPLY splits the XML string built in the recursive CTE into rows, reorders those rows, and concatenates them again, so that a reversed combination (e.g. package 2 & package 3 vs. package 3 & package 2) becomes the same record; DISTINCT then removes the duplicates.
This is a bit more flexible than SELECT TOP N. To see the difference, add the following test cases to your test data one at a time:
(StoreId, PackageId, Item, Cost)
(1,5,2,60)
(1,6,1,1),(1,7,1,1)
(1,8,50,1)
Edited. The above gives you every combination per store that has the lowest combined cost. The bug you noted was caused by cteMaxCostToConsider: I was using SUM(ItemPrice), but sometimes the corresponding SUM(ItemCount) did not contain enough items for that cost to be a valid MaxCostToConsider. I modified the CASE expression to correct that issue.
I have also modified it to work with the example data you provided. NOTE: you should change PackageId in your script to an IDENTITY column, because the method you used produced duplicate PackageIds within a store.
Here is a modified version of your script to see what I am talking about:
IF OBJECT_ID('storePackages') IS NOT NULL
BEGIN
DROP TABLE storePackages
END
CREATE TABLE storePackages(
StoreId int not null,
PackageId int not null IDENTITY(1,1),
ItemType int not null, -- there are three item types: 0 is a normal item, 1 is a discounted item, 2 is a free item
ItemCount int not null,
ItemPrice decimal(18,8) not null,
MaxItemQouta int not null, -- in general a package has a quota between 1 and 6, but in rare cases up to 20-25
MaxFullQouta int not null -- sometimes a package has an additional free or discounted item quota; MaxFullQouta is always greater than MaxItemQouta
)
declare @totalStores int
set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())
declare @storeId int;
declare @packageId int;
declare @maxPackageForStore int;
declare @itemMinPrice decimal(18,8);
set @storeId = 1;
set @packageId = 1
while(@storeId <= @totalStores)
BEGIN
set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
BEGIN
-- column list must follow the SELECT order: MaxItemQouta before MaxFullQouta
INSERT INTO storePackages (StoreId, ItemType, ItemCount, ItemPrice, MaxItemQouta, MaxFullQouta)
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 0
,ItemCount = number
,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 1 AND @maxPackageForStore
UNION ALL
SELECT DISTINCT
StoreId = @storeId
--,PackageId = CAST(@packageId + number AS int)
,ItemType = 1
,ItemCount = 1
,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2 ORDER BY NEWID()))
,MaxItemQouta = @maxPackageForStore
,MaxFullQouta = @maxPackageForStore + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
FROM master..[spt_values] pkgNo
WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)
--set @packageId = @packageId + @maxPackageForStore;
END
set @storeId = @storeId + 1;
END
SELECT * FROM storePackages
--drop table #storePackages
No PackageIds, simply StoreId and lowest CombinedCost (~200-300 ms depending on data)
Next, if you don't care which packages are included and only want one row per store, you can do the following:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
-- smaller of: buying every package once (only if that reaches @MinItemCount)
-- and the cheapest basket made of ceiling(@MinItemCount / ItemCount) copies of one package
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
SELECT
StoreId
,MIN(CombinedCost) as CombinedCost
INTO #TestResults
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
GROUP BY
StoreId
SELECT *
FROM
#TestResults
With PackageIds, only 1 record per StoreId (varies widely depending on test data/combinations to consider, ~600-1300 ms)
Or, if you still want package IDs but don't care which combination you get and only want one record, you can do:
IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
BEGIN
DROP TABLE #TestResults
END
DECLARE @MinItemCount INT = 5
;WITH cteMaxCostToConsider AS (
SELECT
StoreId
,CASE
WHEN (SUM(ItemCount) >= @MinItemCount) AND
SUM(ItemPrice) < MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice) THEN SUM(ItemPrice)
ELSE MIN(((@MinItemCount / ItemCount) + IIF((@MinItemCount % ItemCount) > 0, 1,0)) * ItemPrice)
END AS MaxCostToConsider
FROM
storePackages
GROUP BY
StoreId
)
, cteRecursive AS (
SELECT
StoreId
,CAST(PackageId AS VARCHAR(MAX)) AS PackageIds
,ItemCount AS CombinedItemCount
,CAST(ItemPrice AS decimal(18,8)) AS CombinedCost
FROM
storePackages
UNION ALL
SELECT
r.StoreId
,r.PackageIds + ',' + CAST(t.PackageId AS VARCHAR(MAX))
,r.CombinedItemCount + t.ItemCount
,CAST(r.CombinedCost + t.ItemPrice AS decimal(18,8))
FROM
cteRecursive r
INNER JOIN storePackages t
ON r.StoreId = t.StoreId
INNER JOIN cteMaxCostToConsider m
ON r.StoreId = m.StoreId
AND r.CombinedCost + t.ItemPrice <= m.MaxCostToConsider
)
, cteCombinedCostRowNum AS (
SELECT
StoreId
,PackageIds
,CombinedCost
,CombinedItemCount
,ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY CombinedCost) AS RowNumber
FROM
cteRecursive
WHERE
CombinedItemCount >= @MinItemCount
)
SELECT DISTINCT
c.StoreId
,c.PackageIds
,c.CombinedItemCount
,c.CombinedCost
INTO #TestResults
FROM
cteCombinedCostRowNum c
WHERE
RowNumber = 1
SELECT *
FROM
#TestResults
Note: all benchmarking was done on a four-year-old laptop (Intel i7-3520M CPU at 2.9 GHz, 8 GB of RAM, Samsung 500 GB EVO SSD), so on an appropriately resourced server I would expect it to run considerably faster. Adding indexes on storePackages would very likely speed it up as well.
MY SOLUTION
First of all, I am thankful to everyone who tried to help me. However, all the suggested solutions are based on CTEs. As I said before, recursive CTEs cause performance problems when hundreds of stores are considered. Also, multiple baskets are requested at once: a request can include multiple baskets, e.g. one with 5 items, another with 3 items and another with 7 items...
Last Solution
First, I generate all possible scenarios, by item count, into a table. This way, I have the option to eliminate unwanted scenarios.
CREATE TABLE ItemScenarios(
Item int,
ScenarioId int,
CalculatedItem int --this will be joined with Store Item
)
Then I generated all possible scenarios from 2 items to 25 items and inserted them into the ItemScenarios table. The scenarios can be generated with a WHILE loop or a recursive CTE; the advantage of this approach is that they only need to be generated once.
The results look like this:
Item | ScenarioId | CalculatedItem
-----|------------|---------------
2    | 1          | 2
2    | 2          | 3
2    | 3          | 1
2    | 3          | 1
3    | 4          | 5
3    | 5          | 4
3    | 6          | 3
3    | 7          | 2
3    | 7          | 2
3    | 8          | 2
3    | 8          | 1
3    | 9          | 1
3    | 9          | 1
3    | 9          | 1
...  | ...        | ...
25   | 993        | 10
This way, I can restrict scenario sizes, the maximum number of different stores, the maximum number of different packages, etc.
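The scenario generation step could look roughly like this Python sketch. The function names are hypothetical, and so is the rule for over-sized single packages: the default max_part_for(n) = 2n - 1 is only inferred from the sample rows above (it reproduces the same package-size combinations as scenarios 1-9), and the author's actual rule is evidently stricter, since item 25 stops at scenario 993. A scenario is kept as soon as its sizes cover the requested item count, so no package in it is redundant.

```python
def scenarios(n, max_part):
    """Minimal package-size combinations covering n items (parts listed largest first)."""
    result = []

    def build(parts, total, largest):
        if total >= n:
            # The last part is the smallest, and the total was < n before adding it,
            # so dropping any single part leaves fewer than n items: the cover is minimal
            result.append(tuple(parts))
            return
        for size in range(largest, 0, -1):
            build(parts + [size], total + size, size)

    build([], 0, max_part)
    return result


def scenario_rows(max_items, max_part_for=lambda n: 2 * n - 1):
    """(Item, ScenarioId, CalculatedItem) rows, one per package in each scenario.

    max_part_for is a hypothetical cap on package size, inferred from the sample rows.
    """
    rows, scenario_id = [], 0
    for n in range(2, max_items + 1):
        for parts in scenarios(n, max_part_for(n)):
            scenario_id += 1
            rows.extend((n, scenario_id, size) for size in parts)
    return rows
```

Each returned tuple becomes one ScenarioId; scenario_rows flattens them into (Item, ScenarioId, CalculatedItem) rows that could be bulk-inserted into ItemScenarios.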
I can also eliminate scenarios that can mathematically never be cheaper than another one. For example, for a 4-item request, consider these scenarios:
Scenario 1: 2+2
Scenario 2: 2+1+1
Scenario 3: 1+1+1+1
Among these, Scenario 2 can never be strictly cheaper than both of the others, because:
If Scenario 2 < Scenario 3, then Scenario 1 < Scenario 2: what makes Scenario 2 cheaper than Scenario 3 is the 2-item price, and Scenario 1 contains two 2-item packages.
Likewise, if Scenario 2 < Scenario 1, then Scenario 3 < Scenario 2.
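The same argument fits in one line, assuming each scenario is priced with the store's cheapest 1-item package (price $c_1$) and cheapest 2-item package (price $c_2$):

```latex
S_1 = 2c_2, \qquad S_2 = c_2 + 2c_1, \qquad S_3 = 4c_1
\quad\Longrightarrow\quad
S_2 = \tfrac{1}{2}\,(S_1 + S_3) \;\ge\; \min(S_1, S_3)
```

Scenario 2 is the average of the other two, so at best it ties with the cheaper of them.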
So deleting scenarios like Scenario 2 gives a performance advantage.
Now I can choose the cheapest item prices in each store:
DECLARE @requestedItems int;
SET @requestedItems = 5;
CREATE TABLE #JoinedPackageItemWithScenarios(
StoreId int not null,
PackageId int not null,
ItemCount int not null,
ItemPrice decimal(18,8),
ScenarioId int not null
)
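INSERT INTO #JoinedPackageItemWithScenarios (StoreId, PackageId, ItemCount, ItemPrice, ScenarioId)
SELECT
SP.StoreId
,SP.PackageId
,SP.ItemCount
,SP.ItemPrice
,SC.ScenarioId
FROM ItemScenarios SC
-- inner join: a LEFT JOIN would try to insert NULL StoreIds for sizes no store sells
JOIN (
-- cheapest package of each item count in each store
SELECT
StoreId
,PackageId
,ItemCount
,ItemPrice
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId, ItemCount ORDER BY ItemPrice)
FROM StorePackages
) SP ON SP.ItemCount = SC.CalculatedItem AND SP.RowNumber = 1
-- choosing the cheapest package before the join keeps one row per scenario row,
-- so a scenario such as 1+1+1 keeps all three of its packages
WHERE SC.Item = @requestedItems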
-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
CREATE TABLE #selectedScenarios(
StoreId int not null,
ScenarioId int not null,
TotalItem int not null,
TotalCost decimal(18,8)
)
INSERT INTO #selectedScenarios
SELECT
StoreId
,ScenarioId
,TotalItem
,TotalCost
FROM (
SELECT
StoreId
,ScenarioId
--,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- concatenating PackageIds here hurts performance; we can join the selected scenarios with #JoinedPackageItemWithScenarios after the selection is complete.
,TotalItem = SUM(ItemCount)
,TotalCost = SUM(ItemPrice)
,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
FROM #JoinedPackageItemWithScenarios JPS
GROUP BY StoreId,ScenarioId
HAVING(SUM(ItemCount) >= @requestedItems)
) SELECTED
WHERE RowNumber = 1
-- NOW WE CAN POPULATE PackageIds if needed
SELECT
SS.StoreId
,SS.ScenarioId
,TotalItem = MAX(SS.TotalItem)
,TotalCost = MAX(SS.TotalCost)
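,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20))) -- dbo.GROUP_CONCAT is a user-defined aggregate, not built into SQL Server; on SQL Server 2017+ STRING_AGG(CAST(JPS.PackageId AS nvarchar(20)), ',') does the same job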
FROM #selectedScenarios SS
JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
GROUP BY SS.StoreId,SS.ScenarioId
SUM
In my tests, this approach is at least 10 times faster than the recursive CTE, especially as the number of stores and requested items increases. It also gets 100% correct results. The recursive CTE tried millions of unnecessary JOINs as the number of stores and requested items increased.
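For readers who want to check the set-based approach outside SQL Server, here is a minimal Python sketch of the same idea; it is not the author's code, and the data shapes are assumptions. For one store it keeps the cheapest package of each size, prices every precomputed scenario the store can fill, and returns the cheapest one. Repeated sizes in a scenario (such as 1+1+1) reuse the same package, so every part of the scenario is priced.

```python
def cheapest_basket(packages, scenarios):
    """Cheapest scenario for one store.

    packages:  (package_id, item_count, price) tuples for the store (hypothetical layout)
    scenarios: tuples of package sizes, e.g. (2, 1) = one 2-item and one 1-item package
    Returns (total_cost, scenario, package_ids), or None if no scenario can be filled.
    """
    # Cheapest package for each item count (the ROW_NUMBER() ... = 1 step)
    cheapest = {}
    for package_id, count, price in packages:
        if count not in cheapest or price < cheapest[count][1]:
            cheapest[count] = (package_id, price)

    best = None
    for scenario in scenarios:
        # A scenario is only usable if the store sells every size it needs
        if not all(size in cheapest for size in scenario):
            continue
        cost = sum(cheapest[size][1] for size in scenario)
        if best is None or cost < best[0]:
            # Repeated sizes reuse the same package id, e.g. (1, 1, 1)
            best = (cost, scenario, [cheapest[size][0] for size in scenario])
    return best
```

Running this once per store and per requested basket size mirrors the GROUP BY StoreId, ScenarioId / ROW_NUMBER() = 1 selection in the SQL.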

SQL query to select percentage of total

I have an MSSQL table, stores, with the following columns:
Storeid, NumEmployees
1 125
2 154
3 10
4 698
5 54
6 98
7 87
8 100
9 58
10 897
Can someone help me with the SQL query to return the top stores (StoreId) that have 30% of the total employees (NumEmployees)?
WITH cte
AS (SELECT storeid,
numemployees,
( numemployees * 100 ) / SUM(numemployees) OVER (PARTITION BY 1)
AS
percentofstores
FROM stores)
SELECT *
FROM cte
WHERE percentofstores >= 30
ORDER BY numemployees desc
Alternative that doesn't use SUM() OVER
SELECT s.storeid, s.numemployees
FROM (SELECT SUM(numemployees) AS [tots]
FROM stores) AS t,
stores s
WHERE CAST(numemployees AS DECIMAL(15, 5)) / tots >= .3
ORDER BY s.numemployees desc
Note that in the second version I decided not to multiply by 100 before dividing. This requires a cast to decimal; otherwise the division is integer division, which truncates every ratio to 0, so no records are returned.
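The integer-division pitfall is easy to reproduce. This sketch runs the second query against the sample data using Python's built-in sqlite3 module. SQLite only stands in for SQL Server here, and because a DECIMAL cast in SQLite keeps whole numbers as integers, REAL is used in its place; treat that substitution as an assumption about the closest equivalent, not as the answer's exact SQL.

```python
import sqlite3

# Sample data from the question: (storeid, numemployees)
STORES = [(1, 125), (2, 154), (3, 10), (4, 698), (5, 54),
          (6, 98), (7, 87), (8, 100), (9, 58), (10, 897)]


def stores_at_or_above(share, cast_first):
    """Run the CROSS JOIN version of the query in SQLite (standing in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stores (storeid INTEGER, numemployees INTEGER)")
        conn.executemany("INSERT INTO stores VALUES (?, ?)", STORES)
        # SQLite's REAL plays the role of DECIMAL(15, 5) here
        ratio = ("CAST(s.numemployees AS REAL) / t.tots" if cast_first
                 else "s.numemployees / t.tots")  # integer / integer truncates
        sql = ("SELECT s.storeid "
               "FROM (SELECT SUM(numemployees) AS tots FROM stores) AS t, stores s "
               f"WHERE {ratio} >= ? ORDER BY s.numemployees DESC")
        return [row[0] for row in conn.execute(sql, (share,))]
    finally:
        conn.close()
```

With the cast, stores 10 (about 39%) and 4 (about 31%) are returned; without it, every ratio truncates to 0 and the result is empty.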
Also, I'm not completely sure this is what you want, but you can add TOP 1 to both queries to limit the results to the single store with the most employees among those with at least 30%.
UPDATE
Based on your comments, it sounds like (to paraphrase Kevin):
You want the rows, starting at the store with the most employees and working down until you have at least 30 %
This is tricky because it requires a running percentage, and it's a kind of bin-packing problem, but this does work. Note that I've included two other test cases: one where the percentage exactly equals the share of the top two stores combined, and one where it is just over it.
DECLARE @percent DECIMAL (20, 16)
SET @percent = 0.3
--Other test values
--SET @percent = 0.6992547128452433
--SET @percent = 0.6992547128452434
;WITH sums
AS (SELECT DISTINCT s.storeid,
s.numemployees,
s.numemployees + Coalesce(SUM(s2.numemployees) OVER (
PARTITION
BY
s.numemployees), 0)
runningsum
FROM stores s
LEFT JOIN stores s2
ON s.numemployees < s2.numemployees),
percents
AS (SELECT storeid,
numemployees,
runningsum,
CAST(runningsum AS DECIMAL(15, 5)) / tots.total
running_percent,
Row_number() OVER (ORDER BY runningsum, storeid ) rn
FROM sums,
(SELECT SUM(numemployees) total
FROM stores) AS tots)
SELECT p.storeID,
p.numemployees,
p.running_percent,
p.rn
FROM percents p
CROSS JOIN (SELECT MAX(rn) rn
FROM percents
WHERE running_percent = @percent) exactpercent
LEFT JOIN (SELECT MAX(rn) rn
FROM percents
WHERE running_percent <= @percent) underpercent
ON p.rn <= underpercent.rn
OR ( exactpercent.rn IS NULL
AND p.rn <= underpercent.rn + 1 )
WHERE
underpercent.rn is not null or p.rn = 1
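As a cross-check of what the running-percentage query is meant to return, here is a short Python sketch (names and data layout are assumptions): take stores from largest to smallest and stop once the running total reaches the requested share. It uses exact fractions rather than the query's DECIMAL arithmetic, so an exact match stops on that store and anything just above it pulls in one more store.

```python
from fractions import Fraction

# Sample data from the question: (storeid, numemployees)
STORES = [(1, 125), (2, 154), (3, 10), (4, 698), (5, 54),
          (6, 98), (7, 87), (8, 100), (9, 58), (10, 897)]


def top_stores(stores, share):
    """Largest stores first, stopping once they hold at least `share` of all employees."""
    total = sum(n for _, n in stores)
    target = Fraction(share) * total  # exact arithmetic, so an exact tie is detected
    chosen, running = [], 0
    for store_id, n in sorted(stores, key=lambda s: (-s[1], s[0])):
        chosen.append(store_id)
        running += n
        if running >= target:  # an exact match stops here; otherwise one store over
            break
    return chosen
```

With the sample data, 30% is already covered by store 10 alone; a share of exactly 1595/2281 (the top two stores, roughly the 0.6992547128452433 test value) returns stores 10 and 4, and anything above it adds store 2.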