Distribute large quantities over multiple rows - sql

I have a simple Order table. One order can have several products, each with a quantity and the total weight for that product line, as below:
OrderID  ProductName  Qty  Weight
---------------------------------
101      ProductA     2    24
101      ProductB     1    24
101      ProductC     1    48
101      ProductD     1    12
101      ProductE     1    12
102      ProductA     5    60
102      ProductB     1    12
I am trying to partition and group the products so that, within an order, the total weight of each group does not exceed 48.
The expected table looks like this:
OrderID  ProductName  Qty  Weight  GroupedID
--------------------------------------------
101      ProductA     2    24      1
101      ProductB     1    24      1
101      ProductC     1    48      2
101      ProductD     1    12      3
101      ProductE     1    12      3
102      ProductA     4    48      1
102      ProductA     1    12      2
102      ProductB     1    12      2
Kindly let me know if this is possible.
Thank you.

This is a bin packing problem, which is non-trivial in general. Bin packing is NP-hard: no polynomial-time exact algorithm is known, so the cost of finding an optimal packing grows very quickly with the number of items. Dai posted a link to Hugo Kornelis's article series, which is referenced by everyone trying to solve this problem in SQL. The set-based solution performs really badly. For realistic scenarios you need iteration and, preferably, a bin packing library, e.g. in Python.
For production work it would be better to take advantage of SQL Server 2017+'s support for Python scripts and use a bin packing library like Google's OR Tools or the binpacking module. Even if you don't want to use sp_execute_external_script you can use a Python script to read the data from the database and split them.
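As a rough illustration of what such a library does, here is a minimal first-fit decreasing sketch in plain Python. It is not the API of OR Tools or the binpacking module; the function name `first_fit_decreasing` is made up for this example, and the heuristic is not guaranteed to find the optimal packing.

```python
def first_fit_decreasing(weights, capacity):
    """Pack item weights into bins using the first-fit decreasing heuristic."""
    bins = []  # each bin is a list of item weights
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item of weight {w} exceeds capacity {capacity}")
        for b in bins:
            # Put the item into the first bin that still has room for it.
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            # No existing bin can take the item, so open a new one.
            bins.append([w])
    return bins


# Order 101 from the question, expanded to individual item weights.
order_101 = [12, 12, 24, 48, 12, 12]
packed = first_fit_decreasing(order_101, 48)
for group_id, contents in enumerate(packed, start=1):
    print(group_id, contents, sum(contents))
```

For order 101 this needs three bins, the same number as in the question's expected output, although the items land in different groups.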
The question's numbers are so regular though you could cheat a bit (actually quite a lot) and distribute all order lines into individual items, calculate the running total per order and then divide the total by the limit to produce the group number.
This works only because the running totals are guaranteed to align with the bin size.
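A small Python sketch of the same running-total idea shows both the case where it works and a case where it does not. The helper names are made up for this illustration, and the second input (three items of weight 30) is a hypothetical example, not data from the question.

```python
import math
from itertools import accumulate


def running_total_groups(item_weights, limit):
    """Assign each item a group: ceiling(running total / limit), smallest items first."""
    ordered = sorted(item_weights)
    return [(w, math.ceil(total / limit))
            for w, total in zip(ordered, accumulate(ordered))]


def group_weights(assignments):
    """Sum the item weights that ended up in each group."""
    totals = {}
    for w, g in assignments:
        totals[g] = totals.get(g, 0) + w
    return totals


# Order 101 from the question: the running totals hit 48 exactly.
aligned = group_weights(running_total_groups([12, 12, 24, 48, 12, 12], 48))

# Hypothetical order with three items of weight 30: group 2 ends up overweight.
misaligned = group_weights(running_total_groups([30, 30, 30], 48))

print(aligned)
print(misaligned)
```

In the second case the running totals are 30, 60 and 90, so the last two items both land in group 2, which then weighs 60 and breaks the limit of 48.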
Distributing into items can be done using a Tally/Numbers table, a table with a single Number column storing numbers from 0 to eg 1M.
Given the question's data:
create table #OrderItems (id int identity(1,1) primary key, OrderID int, ProductName varchar(20), Qty int, Weight int);
insert into #OrderItems(OrderId,ProductName,Qty,Weight)
values
(101,'ProductA',2,24),
(101,'ProductB',1,24),
(101,'ProductC',1,48),
(101,'ProductD',1,12),
(101,'ProductE',1,12),
(102,'ProductA',5,60),
(102,'ProductB',1,12);
The following query splits each order line into individual items. It repeats each order line once per individual item and calculates the weight of a single item:
select o.*, Weight/Qty as ItemWeight
from #OrderItems o inner join Numbers ON Qty >Numbers.Number;
This row:
1 101 ProductA 2 24
Becomes
1 101 ProductA 2 24 12
1 101 ProductA 2 24 12
Calculating the running total inside a query can be done with :
SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
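The Order By ItemWeight clause means the smallest items are picked first. Groups are then filled strictly in that order, so this is a simple sequential fill rather than a real bin packing heuristic such as first fit or best fit.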
The overall query calculating the total and Group ID is
with items as (
select o.*, Weight/Qty as ItemWeight
from #OrderItems o INNER JOIN Numbers ON Qty > Numbers.Number
)
select Id,OrderId,ProductName,Qty,Weight, ItemWeight,
ceiling(SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/48.0)
As GroupId
from items;
After that, individual items need to be grouped back into order items and groups. This produces the final query:
with items as (
select o.*, Weight/Qty as ItemWeight
from #OrderItems o INNER JOIN Numbers ON Qty > Numbers.Number
)
,bins as(
select Id,OrderId,ProductName,Qty,Weight, ItemWeight,
ceiling(SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/48.0) As GroupId
from items
)
select
max(OrderId) as orderid,
max(productname) as ProductName,
count(*) as Qty,
sum(ItemWeight) as Weight,
max(GroupId) as GroupId
from bins
group by id,groupid
order by orderid,groupid
This returns
orderid  ProductName  Qty  Weight  GroupId
------------------------------------------
101      ProductA     2    24      1
101      ProductD     1    12      1
101      ProductE     1    12      1
101      ProductB     1    24      2
101      ProductC     1    48      3
102      ProductA     4    48      1
102      ProductA     1    12      2
102      ProductB     1    12      2

Related

Getting latest price of different products from control table

I have a control table, where Prices with Item number are tracked date wise.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is my query right or wrong, even though I am getting my desired result?
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
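Before benchmarking, it can help to confirm that the variants return the same rows. The sketch below does this for the join and the correlated subquery using Python's built-in sqlite3 module as a stand-in database. That choice is an assumption made for illustration: the question does not say which database is in use, and the dates are stored here as ISO text so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, itemno TEXT, price INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, "a001", 100, "2003-01-01"), (2, "a001", 105, "2003-01-02"),
        (3, "a001", 110, "2003-01-03"), (4, "b100", 50, "2003-01-01"),
        (5, "b100", 55, "2003-01-02"), (6, "b100", 60, "2003-01-03"),
        (7, "c501", 35, "2003-01-01"), (8, "c501", 38, "2003-01-02"),
        (9, "c501", 42, "2003-01-03"), (10, "a001", 95, "2004-01-01"),
    ],
)

# The question's approach: join against the latest date per item.
join_sql = """
    SELECT pr.id, pr.itemno, pr.price, pr.date
    FROM prices pr
    INNER JOIN (SELECT itemno, MAX(date) AS max_date
                FROM prices
                GROUP BY itemno) p
        ON pr.itemno = p.itemno AND pr.date = p.max_date
    ORDER BY pr.itemno
"""

# The alternative: filter with a correlated subquery.
correlated_sql = """
    SELECT p.id, p.itemno, p.price, p.date
    FROM prices p
    WHERE p.date = (SELECT MAX(p1.date) FROM prices p1 WHERE p1.itemno = p.itemno)
    ORDER BY p.itemno
"""

join_result = conn.execute(join_sql).fetchall()
correlated_result = conn.execute(correlated_sql).fetchall()
print(join_result)
print(join_result == correlated_result)
```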

Access sql Moving Average of Top N With 2 criterias

I have been searching the forum and found a single post that is a little similar to my problem here: Calculate average for Top n combined with SQL Group By.
My situation is:
I have a table tblWEIGHT that contains: ID, Date, idPONR, Weight
I have a second table tblSALES that contains: ID, Date, Sales, idPONR
I have a third table tblPONR that contains: ID, PONR, idProduct
And a fourth table tblPRODUCT that contains: ID, Product
The linking:
tblWEIGHT.idPONR = tblPONR.ID
tblSALES.idPONR = tblPONR.ID
tblPONR.idProduct = tblPRODUCT.ID
The main table of my query is tblSALES. I want all my sales listed, with the moving average of the top 5 weights of the PRODUCT where the date of the weight is less than the sales date, and the product is the same as the sold product. It's IMPORTANT that the result isn't grouped by the date. I need all the records of tblSALES.
I have gotten as far as getting the top 1 weight, but I'm not able to get the moving average instead.
The query that gets the top 1 is the following, and I am guessing that the query I need is going to look a lot like it.
SELECT tblSALES.ID, tblSALES.Date, tblPONR.idPRODUCT,
(
SELECT top 1 Weight FROM tblWEIGHT INNER JOIN tblPONR ON tblWeight.idPONR = tblPONR.ID
WHERE tblPONR.idPRODUCT = idPRODUCT AND
tblSALES.Date > tblWEIGHT.Date
ORDER BY tblWEIGHT.Date desc
) AS LatestWeight
FROM tblSALES INNER JOIN tblPONR ON tblSALES.idPONR = tblPONR.ID
This is not my exact query, since I'm Danish and it wouldn't make sense. I know I'm not supposed to use Date as a field name.
I imagine the final query would be something like:
SELECT tblSALES.ID..... avg(SELECT TOP 5 weight .........)
but when I do this I keep getting the error "at most one record can be returned by this subquery".
Final Question.
How do i make a query that creates a moving average of the top 5 weights of my sold product, where the date of the weight is earlier than the date i sold the product?
EDIT Sampledata:
DATEFORMAT: dd/mm/yyyy
tblWEIGHT
ID Date idPONR Weight
1 01-01-2020 1 100
2 02-01-2020 2 200
3 03-01-2020 3 200
4 04-01-2020 3 400
5 05-01-2020 2 250
6 06-01-2020 1 150
7 07-01-2020 2 200
tblSALES
ID Date Sales(amt) idPONR
1 05-01-2020 30 1
2 06-01-2020 15 2
3 10-01-2020 20 3
tblPONR
ID PONR(production Number) idProduct
1 2521 1
2 1548 1
3 5484 2
tblPRODUCT
ID Product
1 Bricks
2 Tiles
Desired outcome read comments for AvgWeight
tblSALES.ID tblSALES.Date tblSales.Sales(amt) AvgWeight
1 05-01-2020 30 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<05-01-2020)
2 06-01-2020 15 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<06-01-2020)
3 10-01-2020 20 123 -->avg(top 5 newest weight of idPONR 3 since thats the only idPONR with that product, and where tblWeight.Date<10-01-2020)
Consider:
Query1
SELECT tblWeight.ID AS WeightID, tblWeight.Date AS WtDate,
tblWeight.idPONR, tblPONR.PONR, tblPONR.idProduct, tblWeight.Weight, tblSales.SalesAmt,
tblSales.ID AS SalesID, tblSales.Date AS SalesDate
FROM (tblPONR INNER JOIN tblWeight ON tblPONR.ID = tblWeight.idPONR)
INNER JOIN tblSales ON tblPONR.ID = tblSales.idPONR;
Query2
SELECT * FROM Query1 WHERE WeightID IN (
SELECT TOP 5 WeightID FROM Query1 AS Dupe WHERE Dupe.idProduct = Query1.idProduct
AND Dupe.WtDate<Query1.SalesDate ORDER BY Dupe.WtDate DESC);
Query3
SELECT Query2.SalesID, Query2.SalesDate, Query2.SalesAmt,
First(DAvg("Weight","Query2","idProduct=" & [idProduct] & " AND WtDate<#" & [SalesDate] & "#")) AS AvgWt
FROM Query2
GROUP BY Query2.SalesID, Query2.SalesDate, Query2.SalesAmt;
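To check what the queries should return for the sample data, the calculation can be sketched outside Access. The Python below is only an illustration of the intended logic (average of the up-to-five newest weights of the same product, dated strictly before the sale); the function name `avg_recent_weights` is made up, and unlike Access's TOP it does not include extra rows on ties.

```python
from datetime import date

# (ID, Date, idPONR, Weight) from tblWEIGHT
weights = [
    (1, date(2020, 1, 1), 1, 100), (2, date(2020, 1, 2), 2, 200),
    (3, date(2020, 1, 3), 3, 200), (4, date(2020, 1, 4), 3, 400),
    (5, date(2020, 1, 5), 2, 250), (6, date(2020, 1, 6), 1, 150),
    (7, date(2020, 1, 7), 2, 200),
]
# (ID, Date, Sales, idPONR) from tblSALES
sales = [
    (1, date(2020, 1, 5), 30, 1),
    (2, date(2020, 1, 6), 15, 2),
    (3, date(2020, 1, 10), 20, 3),
]
# tblPONR: ID -> idProduct
ponr_product = {1: 1, 2: 1, 3: 2}


def avg_recent_weights(sale_date, product_id, n=5):
    """Average the n newest weights of a product dated strictly before sale_date."""
    candidates = sorted(
        ((d, w) for _, d, ponr, w in weights
         if ponr_product[ponr] == product_id and d < sale_date),
        reverse=True,  # newest first
    )
    recent = [w for _, w in candidates[:n]]
    return sum(recent) / len(recent) if recent else None


# One output row per sale, never grouped by date.
result = [
    (sale_id, sale_date, amount, avg_recent_weights(sale_date, ponr_product[ponr]))
    for sale_id, sale_date, amount, ponr in sales
]
for row in result:
    print(row)
```

With the sample data, sale 1 averages the weights 100 and 200, sale 2 averages 100, 200 and 250, and sale 3 averages 200 and 400.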

Dividing a sum value into multiple rows due to field length constraint

I am migrating financial data from a very large table (more than 100 million rows) by summarizing the amounts and inserting them into a summary table. I ran into a problem when a summary amount (3 billion) is larger than what the field in the summary table can hold (it can only hold up to 999 million). Changing the field size is not an option, as it requires a change process.
The only option I have is to divide any amount that breaches the size limit into smaller ones so it can be inserted into the table.
I came across "SQL - I need to divide a total value into multiple rows in another table", which is similar, except that the number of rows I need to insert is dynamic.
For simplicity, this is how the source table might look like
account_table
acct_num | amt
-------------------------------
101 125.00
101 550.00
101 650.00
101 375.00
101 475.00
102 15.00
103 325.00
103 875.00
104 200.00
104 275.00
The summary records are as follows
select acct_num, sum(amt)
from account_table
group by acct_num
Account Summary
acct_num | amt
-------------------------------
101 2175.00
102 15.00
103 1200.00
104 475.00
Assuming the maximum value in the destination table is 1000.00, the expected output will be
summary_table
acct_num | amt
-------------------------------
101 1000.00
101 1000.00
101 175.00
102 15.00
103 1000.00
103 200.00
104 475.00
How do I create a query to get the expected result? Thanks in advance.
You need a numbers table. If you have a handful of values, you can define it manually. Otherwise, you might have one on hand or use a similar logic:
with n as (
select (rownum - 1) as n
from account_table
where rownum <= 10
),
a as (
select acct_num, sum(amt) as amt
from account_table
group by acct_num
)
select acct_num,
(case when (n.n + 1) * 1000 < amt then 1000
else amt - n.n * 1000
end) as amt
from a join
n
on n.n * 1000 < amt ;
A variation along these lines might give some ideas (using the 1,000 of your sample data):
WITH summary AS (
SELECT acct_num
,TRUNC(SUM(amt) / 1000) AS times
,MOD(SUM(amt), 1000) AS remainder
FROM account_table
GROUP BY acct_num
), x(acct_num, times, remainder) AS (
SELECT acct_num, times, remainder
FROM summary
UNION ALL
SELECT s.acct_num, x.times - 1, s.remainder
FROM summary s
,x
WHERE s.acct_num = x.acct_num
AND x.times > 0
)
SELECT acct_num
,CASE WHEN times = 0 THEN remainder ELSE 1000 END AS amt
FROM x
ORDER BY acct_num, amt DESC
The idea is to first build a summary table with div and modulo:
ACCT_NUM TIMES REMAINDER
101 2 175
102 0 15
103 1 200
104 0 475
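Then perform a recursive query on the summary table based on the number of "times" (i.e. rows) you want, with an extra row for the remainder. Note that an account whose total is an exact multiple of 1000 also gets a final row with an amount of 0, which you may want to filter out.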
ACCT_NUM AMT
101 1000
101 1000
101 175
102 15
103 1000
103 200
104 475
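The div and modulo logic is easy to check outside the database. The following Python sketch applies it to the sample data; the function name `split_amount` is made up for this illustration, amounts are treated as non-negative whole numbers, and unlike the recursive query it emits no zero row for exact multiples of the limit.

```python
from collections import defaultdict


def split_amount(total, limit):
    """Split a non-negative total into chunks of at most `limit`."""
    full, remainder = divmod(total, limit)
    chunks = [limit] * full
    if remainder:
        chunks.append(remainder)
    return chunks


# (acct_num, amt) rows from account_table
account_table = [
    (101, 125), (101, 550), (101, 650), (101, 375), (101, 475),
    (102, 15), (103, 325), (103, 875), (104, 200), (104, 275),
]

# Step 1: summarize per account, like SUM(amt) ... GROUP BY acct_num.
totals = defaultdict(int)
for acct_num, amt in account_table:
    totals[acct_num] += amt

# Step 2: split each total into rows that fit the destination column.
summary_table = [
    (acct_num, chunk)
    for acct_num in sorted(totals)
    for chunk in split_amount(totals[acct_num], 1000)
]
for row in summary_table:
    print(row)
```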

Selecting ID's based on multiple subquery

This is my first post to Stack Overflow. I appreciate and will take in any constructive criticism to better form any future questions.
Question:
I'm trying to create a Select query to gather all orders which have only the top 8 items in them.
I'm working with MS-Access 2013.
My current Query, which doesn't work, looks like this.
SELECT OrderID
From DirectOrders
WHERE OrderID <> ANY
(
SELECT OrderID
FROM DirectOrders
WHERE SKU <> ANY
(
SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC
)
)
The single table is shown below.
OrderID Customer SKU Qty
177622 CustomerA 1001 20
177622 CustomerA 1002 2
177624 CustomerB 1001 200
177626 CustomerC 1003 50
177626 CustomerC 1004 150
177630 CustomerC 1005 1000
177632 CustomerA 1006 1
177632 CustomerA 1007 3
177632 CustomerA 1008 9
177632 CustomerA 1009 1
177632 CustomerA 1010 4
177632 CustomerA 1011 3
177634 CustomerC 1012 5
177634 CustomerC 1013 5
177640 CustomerD 1014 4
177642 CustomerA 1015 4
177642 CustomerA 1016 48
177642 CustomerA 1017 15
177644 CustomerB 1018 50
Here was the flow that I was trying to accomplish.
Select Top 8 SKU's by Count
Select All OrderID's that do not have one of those 8 SKU's
Select All OrderID's That are not part of the selected OrderID's in List 2.
I would do this with aggregation:
SELECT do.OrderID
FROM DirectOrders as do LEFT JOIN
(SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC, SKU
) as s8
ON do.SKU = s8.SKU
GROUP BY do.OrderId
HAVING COUNT(*) = COUNT(s8.SKU);
Notes:
In MS Access, TOP is really TOP WITH TIES. To get exactly 8 values you need a tie breaker. This query uses SKU for that purpose.
The LEFT JOIN determines if there is a match between each item in an order and the top 8 items.
The HAVING clause is saying: The count of rows with items is the same as the count of rows that match one of the top 8. Hence, all are in the order.
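The same logic can be sketched in Python against the question's sample data, which makes the effect of the tie breaker visible. This is only an illustration of the idea (top 8 SKUs by count with SKU as tie breaker, then orders whose SKUs are all in that set); the variable names are made up here.

```python
from collections import Counter

# (OrderID, SKU) pairs from the question's sample table
rows = [
    (177622, 1001), (177622, 1002), (177624, 1001), (177626, 1003),
    (177626, 1004), (177630, 1005), (177632, 1006), (177632, 1007),
    (177632, 1008), (177632, 1009), (177632, 1010), (177632, 1011),
    (177634, 1012), (177634, 1013), (177640, 1014), (177642, 1015),
    (177642, 1016), (177642, 1017), (177644, 1018),
]

# Top 8 SKUs by row count; ties are broken by SKU, as in ORDER BY COUNT(SKU) DESC, SKU.
sku_counts = Counter(sku for _, sku in rows)
ranked = sorted(sku_counts.items(), key=lambda kv: (-kv[1], kv[0]))
top8 = {sku for sku, _ in ranked[:8]}

# Collect the set of SKUs that appear in each order.
order_skus = {}
for order_id, sku in rows:
    order_skus.setdefault(order_id, set()).add(sku)

# Keep an order only if every one of its SKUs is in the top 8.
matching = sorted(o for o, skus in order_skus.items() if skus <= top8)
print(sorted(top8))
print(matching)
```

Because every SKU except 1001 appears exactly once, the tie breaker alone decides that SKUs 1002 to 1008 make it into the top 8.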
I think you need something like this. However, you might get some strange results if you are using count SKU because apart from 1001 the count of the other SKUs is 1. So apart from 1001, all the other SKUs are in the top 8 based on the count(SKU)
SELECT * FROM DirectOrders where SKU in
(select top 8 SKU from DirectOrders group by SKU order by count(SKU) desc);
Access's TOP does not break ties. It orders the rows per your ORDER BY and then returns enough rows to cover the TOP value you asked for, plus all ties. For example, with your sample data, TOP 8 and TOP 2 return the same result, namely every SKU in the table, since all but one of your SKUs appear in only one order.
If you want to report only the top 8, you should extend the ORDER BY to make the ordering unique. In this case, I would probably order by COUNT(SKU) DESC, SUM(QTY) DESC, MAX(ORDERID) DESC, SKU. That prioritizes the highest number of orders, then the highest quantity, then makes a choice based on the latest OrderID with that SKU, and if all else fails, orders by the SKU itself. Only the SKU is guaranteed to be unique for each row, but ordering by SKU alone might not give the best result if you are looking for the truly relevant "top 8".
SELECT OrderID
From DirectOrders
WHERE OrderID NOT IN
(
SELECT OrderID
FROM DirectOrders
WHERE SKU NOT IN
(
SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC, SUM(QTY) DESC, MAX(ORDERID) DESC, SKU
)
)

SQL summary by ID with period to period comparison

I am a beginner in SQL, hope someone can help me on this:
I have a Items Category Table:
ItemID | ItemName | ItemCategory | Active/Inactive
100 Carrot Veg Yes
101 Apple Fruit Yes
102 Beef Meat No
103 Pineapple Fruit Yes
And I have a sales table:
Date | ItemID | Sales
01/01/2010 100 50
05/01/2010 101 200
06/01/2010 101 250
06/01/2010 102 300
07/01/2010 103 50
08/01/2010 100 100
10/01/2010 102 250
How Can I achieve a sales summary table by Item By Period as below (with only active item)
ItemID | ItemName | ItemCategory | (01/01/2010 – 07/01/2010) | (08/01/2010 – 14/01/2010)
100 Carrot Veg 50 100
101 Apple Fruit 450 0
103 Pineapple Fruit 50 0
A very dirty solution
SELECT s.ItemId,
    (SELECT ItemName FROM Items WHERE ItemId = s.ItemId) ItemName,
    ISNULL((SELECT Sum(Sales) FROM sales
            WHERE [Date] BETWEEN '2010/01/01' AND '2010/01/07'
            AND itemid = s.itemid
            GROUP BY ItemId), 0) as firstdaterange,
    ISNULL((SELECT Sum(Sales) FROM sales
            WHERE [Date] BETWEEN '2010/01/08' AND '2010/01/14'
            AND itemid = s.itemid
            GROUP BY ItemId), 0) as seconddaterange
FROM Sales s
INNER JOIN Items i ON s.ItemId = i.ItemId
WHERE i.IsActive = 'Yes'
GROUP BY s.ItemId
Again a dirty solution, also the dates are hardcoded. You can probably turn this into a stored procedure taking in the dates as parameters.
I'm not too clued up on PIVOT command but maybe that will be worth a google.
You can pivot the data using the SQL PIVOT operator. Unfortunately, that operator has limited scope due to the requirement to pre-specify the output columns.
You normally achieve this by grouping on a calculated column (in this case, one that computes the week number or first day of the week in which each row falls). You can then either generate SQL on-the-fly with columns derived using SELECT DISTINCT week FROM result, or just drop the result into Excel and use its pivot table facility.
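The "group on a calculated week column, then pivot" idea can be sketched in Python with the question's sample data. This is only an illustration under some assumptions: dates are read as dd/mm/yyyy, weeks are counted in 7-day blocks starting on 01/01/2010, and the names `week_index` and `table` are made up here.

```python
from datetime import date

# ItemID -> (ItemName, ItemCategory, active?)
items = {
    100: ("Carrot", "Veg", True),
    101: ("Apple", "Fruit", True),
    102: ("Beef", "Meat", False),
    103: ("Pineapple", "Fruit", True),
}
# (Date, ItemID, Sales)
sales = [
    (date(2010, 1, 1), 100, 50), (date(2010, 1, 5), 101, 200),
    (date(2010, 1, 6), 101, 250), (date(2010, 1, 6), 102, 300),
    (date(2010, 1, 7), 103, 50), (date(2010, 1, 8), 100, 100),
    (date(2010, 1, 10), 102, 250),
]

start = date(2010, 1, 1)


def week_index(d):
    """Calculated column: number of whole 7-day periods since the start date."""
    return (d - start).days // 7


# Group sales by (item, week), keeping only active items.
summary = {item_id: {} for item_id, (_, _, active) in items.items() if active}
for d, item_id, amount in sales:
    if item_id in summary:
        w = week_index(d)
        summary[item_id][w] = summary[item_id].get(w, 0) + amount

# Pivot: one output column per week that occurs in the data.
weeks = sorted({week_index(d) for d, _, _ in sales})
table = [
    (item_id, *items[item_id][:2], *(summary[item_id].get(w, 0) for w in weeks))
    for item_id in sorted(summary)
]
for row in table:
    print(row)
```

The list of weeks is derived from the data, which is the step that plain PIVOT cannot do without dynamically generated SQL.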