Approach to a Bin Packing SQL problem

I have a problem in SQL where I need to generate a packing list from a list of transactions.
Data Model
The transactions are stored in a table that contains:
transaction id
item id
item quantity
Each transaction can have multiple items (and consequently multiple rows with the same transaction id). Each item then has a quantity from 1 to N.
Business Problem
The business requires that we create a packing list, where each line item in the packing list contains the count of each item in the box.
Each box can only contain 160 items (they all happen to be the same size/weight). Based on the total count of the order we need to split items into different boxes (sometimes splitting even a single item's quantity across two boxes).
So the challenge is to take that data schema and come up with the result set that includes how many of each item belong in each box.
I am currently brute forcing this in some not so pretty ways and wondering if anyone has an elegant/simple solution that I've overlooked.
Example In/Out
We really need to isolate how many of each item end up in each box...for example:
Order 1:
100 of item A
100 of item B
140 of item C
This should result in three rows in the result set:
Box 1: A (100), B (60)
Box 2: B (40), C (120)
Box 3: C (20)
Ideally the query would be smart enough to put all of C together, but at this point - we're not too concerned with that.

How about something like
SELECT SUM([Item quantity]) as totalItems
, SUM([Item quantity]) / 160 as totalBoxes
, SUM([Item Quantity]) % 160 as amountInLastBox
FROM [Transactions]
GROUP BY [Transaction Id]
Let me know what fields you're looking for in the result set and I could come up with a better one.

I was looking for something similar, and all I could achieve was expanding the rows by the item count in each transaction and grouping them into bins. Not very elegant, though. Moreover, because string aggregation is still cumbersome in SQL Server (Oracle, I miss you!), I have to leave the last part out, i.e. putting each bin's counts into a single row.
My solution is as follows:
Example transactions table:
-- (table definition assumed from the INSERT that follows)
CREATE TABLE transactions (trans_id int, item varchar(10), cnt int);
GO
INSERT INTO transactions
(trans_id, item, cnt) VALUES
('1','A','50'),
('2','A','140'),
('3','B','100'),
('4','C','80');
GO
Create a dummy sequence table containing numbers from 1 to 1000 (I assume the maximum quantity allowed for an item in a single transaction is 1000):
CREATE TABLE numseq (n INT NOT NULL IDENTITY) ;
GO
INSERT numseq DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 1000 INSERT numseq DEFAULT VALUES ;
GO
Now we can build the result from the transactions table: a subquery repeats each transaction/item row "cnt" times, assigns a bin number using integer division, and the outer query groups by bin number and item:
SELECT bin_id, item, count(*) count_in_bin
INTO result
FROM (
    SELECT t.item, ((row_number() over (order by t.item, s.n) - 1) / 160) + 1 as bin_id
    FROM transactions t
    INNER JOIN numseq s
        ON t.cnt >= s.n -- join conditionally to repeat transaction rows "cnt" times
) a
GROUP BY bin_id, item
ORDER BY bin_id, item
GO
Result is:
bin_id item count_in_bin
1 A 160
2 A 30
2 B 100
2 C 30
3 C 50
In Oracle, the last step would be as simple as this:
SELECT bin_id, WM_CONCAT(CONCAT(item,'(',count_in_bin,')')) contents
FROM result
GROUP BY bin_id
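For what it's worth, the concatenation can be approximated in SQL Server 2005+ with the FOR XML PATH trick (or STRING_AGG on SQL Server 2017+). This is a sketch against the result table above, not part of the original answer:
-- emulate WM_CONCAT: one row per bin with its contents as a comma-separated string
SELECT r1.bin_id,
       STUFF((SELECT ', ' + r2.item + '(' + CAST(r2.count_in_bin AS varchar(10)) + ')'
              FROM result r2
              WHERE r2.bin_id = r1.bin_id
              ORDER BY r2.item
              FOR XML PATH('')), 1, 2, '') AS contents
FROM result r1
GROUP BY r1.bin_id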

This isn't the prettiest answer, but I am using a similar method to keep track of stock items through an order process; it is easy to understand and may lead you to develop a better method than mine.
I would create a table called "PackedItem" or something similar. The columns would be:
packed_item_id (int) - Primary Key, Identity column
trans_id (int)
item_id (int)
box_number (int)
Each record in this table represents 1 physical unit you will ship.
Let's say someone adds a line to transaction 4 with 20 of item 12; I would add 20 records to the PackedItem table, all with that transaction ID, that item ID, and a NULL box number. If a line is updated, you need to add or remove records from the PackedItem table so that there is always a 1:1 correlation.
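As an aside (not part of the original answer), that expansion does not have to be done row by row; reusing a numbers table such as the numseq table from the earlier answer, the 20 rows can be inserted in one statement:
-- one PackedItem row per physical unit: 20 of item 12 on transaction 4
INSERT INTO PackedItem (trans_id, item_id, box_number)
SELECT 4, 12, NULL
FROM numseq
WHERE n <= 20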
When the time comes to ship, you can simply
SELECT TOP 160 * FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL
and set the box_number on those records to the next available box number, until no records remain where the box_number is NULL. This is possible using one fairly complicated UPDATE statement inside a WHILE loop - which I don't have the time to construct fully.
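Here is a minimal sketch of what that loop could look like, assuming SQL Server 2005 or later (an updatable CTE with TOP); this is one possible shape, not the author's code:
DECLARE @box int
SET @box = 1
WHILE EXISTS (SELECT 1 FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL)
BEGIN
    -- stamp the next 160 unboxed units with the current box number
    ;WITH next_box AS (
        SELECT TOP 160 box_number
        FROM PackedItem
        WHERE trans_id = 4 AND box_number IS NULL
        ORDER BY item_id, packed_item_id
    )
    UPDATE next_box SET box_number = @box
    SET @box = @box + 1
END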
You can now easily get your desired packing list by querying this table as follows:
SELECT box_number, item_id, COUNT(*) AS Qty
FROM PackedItem
WHERE trans_id = 4
GROUP BY box_number, item_id
Advantages - easy to understand, fairly easy to implement.
Pitfalls - if the table gets out of sync with the lines on the transaction, the final result can be wrong; the table will accumulate many records and add extra work for the server, so each ID field will need to be indexed to keep performance good.

Related

Limiting output of rows based on count of values in another table?

As a base example, I have a query that effectively produces a table with a list of values (ID numbers), each of which is attached to a specific category. As a simplified example, it would produce something like this (but at a much larger scale):
IDS      Categories
12345    type 1
12456    type 6
77689    type 3
32456    type 4
67431    type 2
13356    type 2
.....    .....
Using this table, I want to populate another table that gives me a list of ID numbers, with a limit placed on how many of each category are in that list, cross-referenced against a sort of range-based chart. For instance, if there are 5-15 IDs of type 1 in my first table, I want the new table with the column of IDs to have 3 type 1 IDs in it; if there are 15-30 type 1 IDs in the first table, I want to have 6 type 1 IDs in the new table.
This sort of range based limit would apply to each category, and the IDS would all populate the same column in the new table. The order, or specific IDS that end up in the final table don't matter, as long as the correct number of IDS end up as a part of that final list of ID numbers. This is being used to provide a semi-random sampling of ID numbers based on categories for a sort of QA related process.
If parts of this are unclear, I can do my best to explain more. My initial thought was using a variable for a LIMIT clause, but that isn't possible. I have been trying to sort out how to do this with a CASE statement, but I'm really just not making any headway there; I feel like I am at this sort of paper-thin wall I just can't break through.
You can use two window functions:
COUNT to keep track of the number of IDs for each category
ROW_NUMBER to uniquely identify each ID within each category
Once you have collected this information, it's sufficient to keep all those rows that satisfy either of the following conditions:
count of rows less or equal to 30 >> ranking less or equal to 6
count of rows less or equal to 15 >> ranking less or equal to 3
WITH cte AS (
    SELECT IDS,
           Categories,
           ROW_NUMBER() OVER(PARTITION BY Categories ORDER BY IDS) AS rn,
           COUNT(IDS) OVER(PARTITION BY Categories) AS cnt
    FROM tab
)
SELECT *
FROM cte
WHERE (rn <= 3 AND cnt <= 15)
OR (rn <= 6 AND cnt <= 30)
Note: If you have concerns regarding a specific ordering, you need to adjust the ORDER BY clause inside the ROW_NUMBER window function accordingly.

How can I calculate the sum of my top N records in Crystal Reports?

I'm using Report tab -> Group Sort Expert -> Top N to get the top N records, but I'm getting the sum of the values for all records in the report footer.
I want only the sum of the values of the top N records...
In the image below I have selected the top 3 records, but it gives the sum of all records.
The Group Sort Expert (and the Record Sort Expert too) intervenes in your final result after the total summary is calculated. It is unable to filter and remove rows, in the same way an ORDER BY clause in SQL cannot affect the SELECT's count result (that is a job for the WHERE clause). As a result, your summary will always be computed over all rows of your detail section and, of course, over all your group sums.
If you have in mind a specific way to exclude rows so that the appropriate sum appears, you can use the Select Expert of Crystal Reports to remove them.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and send to the report only the top 3 group sums (then you can get what you want with a simple total summary of those 3 records).
Something like this:
CREATE TABLE #TEMP
(
    DEP_NAME varchar(50),
    MINVAL int,
    RMAVAL int,
    NETVAL int
)

INSERT INTO #TEMP
SELECT TOP 3
    T.DEP_NAME, T.MINVAL, T.RMAVAL, T.NETVAL
FROM
    (SELECT DEP_NAME AS DEP_NAME,
            SUM(MINVAL) AS MINVAL,
            SUM(RMAVAL) AS RMAVAL,
            SUM(NETVAL) AS NETVAL
     FROM YOURTABLE
     GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC

SELECT * FROM #TEMP

How to return sample row from database one by one

The web page should show one product image for a specific product category from a PostgreSQL database.
This image should change automatically to another image every 25 seconds.
The returned product may be random or in some sequence. Some products may be missing and some repeated, but most of the products matching the criteria should be returned.
The total available image count may change slightly between sample retrievals.
Currently the code below is used, executed every 25 seconds.
This requires two queries to the database: one for the count, which may be slow, and a second for the single image retrieval. In both cases the WHERE clauses are duplicated; in the real application the WHERE clause is very big, and changing it requires changes in two places.
How can this be improved so that a single query returns the sample?
Column types cannot be changed; natural primary keys are used. Additional columns, triggers, indexes, and sequences can be added if that helps.
ASP.NET/Mono MVC3 and Npgsql are used.
$count = select count(*)
from products
where prodtype=$sometype and productid in (select productid from images);
$random = next random integer between 0 .. $count-1;
-- $productsample is result: desired sample product
$productsample = select product
from products
where prodtype=$sometype and productid in (select productid from images)
offset $random
limit 1;
create table products ( productid char(20) primary key,
prodtype char(10) references producttype
);
create table images(
id serial primary key,
productid char(20) references products,
mainimage bool
);
An ORDER BY will always be expensive, especially if the expression in the ORDER BY is not indexed. So don't order. Instead, do a random offset into the count() as in your queries, but do it all at once.
with t as (
select *
from
products p
inner join
images i using (productid)
where
prodtype = $sometype
)
select *
from t
offset floor(random() * (select count(*) from t))
limit 1
This version might be faster
with t as (
select *, count(*) over() total
from
products p
inner join
images i using (productid)
where
prodtype = $sometype
)
select *
from t
offset floor(random() * (select total from t limit 1))
limit 1
PostgreSQL:
SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1
This gives you one random row. You can of course add your WHERE filter back in to make sure it is the right category.
This removes your requirement to do a count first, and it also has the advantage of letting the database engine do the selection, reducing round trips.
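Applied to the schema in the question, the filtered version might look like this (a sketch; $sometype is the same placeholder used in the question):
-- one random product that has an image, for the requested category
SELECT p.productid
FROM products p
WHERE p.prodtype = $sometype
  AND p.productid IN (SELECT productid FROM images)
ORDER BY random()
LIMIT 1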
Note: For people looking at ways to do this in other SQL engines: http://www.petefreitag.com/item/466.cfm

Finding consecutive date pairs in SQL

I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.
The data looks something like this:
id user_id effective_date important_value useless_value
1 1 1/3/2007 3 0
2 1 1/4/2007 3 1
3 1 1/6/2007 NULL 1
4 1 2/1/2007 3 0
5 2 1/5/2007 12 1
6 3 1/1/1899 7 0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete row with id=2, preserving the information from 1-3-2007, showing that the important_value changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the user_id values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or if they are both NULL.
WHILE (@@FETCH_STATUS <> -1)
BEGIN
SELECT * INTO #history FROM History WHERE user_id = @UserId
--return entries to delete
SELECT h2.id
INTO #delete_history_ids
FROM #history h1
JOIN #history h2 ON
h1.effective_date < h2.effective_date
AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1
WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date)
DELETE h1
FROM History h1
JOIN #delete_history_ids dh ON
h1.id = dh.id
FETCH NEXT FROM UserCursor INTO @UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process basically takes the table of values and self-joins on equal user_id and important_value and prior effective_date. Then, we do 1 more self-join on user_id that effectively checks to see if the 2 joined records above are sequential by verifying that there is no effective_date record that occurs between those 2 records.
It's just a select statement for now - it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
*
from
History tcheck
inner join History tprev
on tprev.[user_id] = tcheck.[user_id]
and tprev.important_value = tcheck.important_value
and tprev.effective_date < tcheck.effective_date
left join History checkbtwn
on tcheck.[user_id] = checkbtwn.[user_id]
and checkbtwn.effective_date < tcheck.effective_date
and checkbtwn.effective_date > tprev.effective_date
where
checkbtwn.[user_id] is null
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
id int,
user_id int,
effective_date datetime,
important_value int,
useless_value int,
idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date and now indexed in order
INSERT INTO #history
SELECT * FROM History
WHERE user_id = @user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
h1.idx+1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.
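For completeness, here is a sketch of how that outer iteration could be wrapped in a delete loop, reusing the #history table built above (an illustration under those assumptions, not part of the original post). The OR clause mirrors the UDF_SameOrNull behaviour described in the question:
DECLARE @deleted int
SET @deleted = 1
WHILE @deleted > 0
BEGIN
    -- delete the later row of each consecutive matching pair from the real table
    DELETE h2src
    FROM History h2src
    JOIN #history h2 ON h2src.id = h2.id
    JOIN #history h1 ON h1.idx + 1 = h2.idx
    WHERE h1.important_value = h2.important_value
       OR (h1.important_value IS NULL AND h2.important_value IS NULL)
    SET @deleted = @@ROWCOUNT
    -- rebuild #history so the surviving rows get fresh consecutive idx values
    TRUNCATE TABLE #history
    INSERT INTO #history
    SELECT * FROM History
    WHERE user_id = @user_id
    ORDER BY effective_date
END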

SQL Update - Commit each row in order

Good morning. I'll do my best to explain my question without posting the SQL (it's 650 lines). Let me know if more information is needed.
We have an in-house fulfillment system that is allocating inventory in real time. For allocation to work properly, we need to know how much inventory is available each time a user asks what they should be working on (by loading/reloading their task list). The data would look something like this:
ID ItemID QtyOrdered QtyAvailableAfterAllocation ParentID
1 1234 5 500 NULL
2 1234 15 485 1
3 1234 10 475 2
Currently a while loop is being used to set the QtyAvailableAfterAllocation column. The example above demonstrates the need for the loop. Row 2's QtyAvailableAfterAllocation is dependent on the value of row 1's QtyAvailableAfterAllocation. Row 3 is dependent on row 2 and so on.
This is the (very) simplified version of the logic. It gets infinitely more complicated when you take into account kits (groups of inventory items that belong to a single parent item). There are times that inventory does not need to be allocated to the item because it exists inside of a kit that has sufficient inventory to fulfill the order. This is why we can't do a running total. Also, kits could be nested inside of kits to the Nth level. Therein lies the problem. When dealing with a large amount of orders that have nested kits, the performance of the query is very poor. I believe that the loop is to blame (testing has proved this). So, here's the question:
Is it possible to commit an update, one row at a time and in a specific order (without a loop), so that the child record(s) below can access the updated column (QtyAvailAfterOrder_AllocationScope) in the parent record?
EDIT
Here is a small portion of the SQL. It's the actual while loop. Maybe this will help show the logic that's needed to determine the allocation for each record.
http://pastebin.com/VM9iasq9
Can you cheat and do something like this?
DECLARE #CurrentCount int
SELECT #CurrentCount = QtyAvailableAfterAllocation
FROM blah
WHERE <select the parent of the first row>
UPDATE blah
SET QtyAvailableAfterAllocation = #CurrentCount - QtyOrdered,
#CurrentCount = #CurrentCount - QtyOrdered
WHERE <it is valid to deduct the count>
This should allow you to keep the update as set based and count downwards from a starting quantity. The crux of the problem here is the WHERE clause.
One method we have been doing is to flatten a hierarchy of values (in your case, the Nth kits idea) into a table, then you can join onto this flat table. The flattening of the hierarchy and the single join should help alleviate some of the performance quirks. Perhaps use a view to flatten the data.
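To illustrate that flattening idea (a sketch only, not from the original answer; the KitComponent table name and columns are invented, and it assumes SQL Server 2005+ for recursive CTEs):
-- Hypothetical table: KitComponent(ParentItemID, ChildItemID, QtyPerParent)
-- Flatten nested kits so each top-level kit maps to all of its leaf components.
;WITH FlatKit AS (
    -- anchor: direct components of every kit
    SELECT ParentItemID AS TopItemID,
           ChildItemID,
           QtyPerParent AS TotalQty
    FROM KitComponent
    UNION ALL
    -- recurse: components of components, multiplying quantities down the tree
    SELECT f.TopItemID,
           kc.ChildItemID,
           f.TotalQty * kc.QtyPerParent
    FROM FlatKit f
    JOIN KitComponent kc ON kc.ParentItemID = f.ChildItemID
)
SELECT TopItemID, ChildItemID, SUM(TotalQty) AS TotalQty
FROM FlatKit
GROUP BY TopItemID, ChildItemID
The result of that query (or a view over it) is the flat table you can join onto during allocation.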
Sorry this isn't a direct answer and only ideas.
If you can provide a sample data structure showing how the kits fit in, I'm sure someone can help thrash out a more specific solution.
If you do have requests queued up in some structure, you wouldn't employ a SQL statement to process the queue; "queue" and "SQL", conceptually, are at odds: SQL is set-based, not procedural.
So, forget about using a query to manage the queued requests, and process the queue in a procedure, wrapping each part requisition in a transaction:
pseudo:
WHILE REQUESTS_REMAIN_IN_QUEUE
begin trans
execute requisition SQL statements
commit
LOOP
Your requisition statements (simplified) might look like this:
update inventory
set QOH = QOH - {requested amount}
where partno = ? and QOH >= {requested amount}
insert orderdetail
(customer, orderheaderid, partno, requestedamount)
values
(custid, orderheaderid, partno, requested_amount)
Now, in a complicated system involving kits and custom business logic, you might have a rule that says not to decrement inventory unless every component in a kit is available. Then you'd have to wrap your kit requisition in a transaction and roll back if you encounter a situation where an individual component in the kit is backordered, say.
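A sketch of that wrapper, reusing the simplified inventory shape above; the kit_component table, the @kitno variable, and the error handling are assumptions, not part of the original answer:
-- requisition one kit atomically: either every component is decremented, or none is
DECLARE @kitno int            -- the kit being requisitioned (hypothetical)
DECLARE @updated int
BEGIN TRAN
-- deduct each component of the kit only where enough stock is on hand
UPDATE inv
SET QOH = QOH - kc.qty_per_kit
FROM inventory inv
JOIN kit_component kc ON kc.partno = inv.partno
WHERE kc.kitno = @kitno
  AND inv.QOH >= kc.qty_per_kit
SET @updated = @@ROWCOUNT
-- if any component could not be deducted (backordered), undo the whole kit
IF @updated < (SELECT COUNT(*) FROM kit_component WHERE kitno = @kitno)
    ROLLBACK TRAN
ELSE
    COMMIT TRAN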
I think this problem can be solved using a purely set-based approach.
Basically, you need to perform these steps:
Obtain the table of currently available quantity for every item.
Obtain the running totals from the ordered quantity due to be processed.
Get QtyAvailableAfterAllocation for every item as the result of subtraction of its running total from its available quantity.
Here's a sample solution:
/* sample data definition & initialisation */
DECLARE #LastQty TABLE (Item int, Qty int);
INSERT INTO #LastQty (Item, Qty)
SELECT 0123, 404 UNION ALL
SELECT 1234, 505 UNION ALL
SELECT 2345, 606 UNION ALL
SELECT 3456, 707 UNION ALL
SELECT 4567, 808 UNION ALL
SELECT 5678, 909;
DECLARE #Orders TABLE (ID int, Item int, OrderedQty int);
INSERT INTO #Orders (ID, Item, OrderedQty)
SELECT 1, 1234, 5 UNION ALL
SELECT 2, 1234, 15 UNION ALL
SELECT 3, 2345, 3 UNION ALL
SELECT 4, 1234, 10 UNION ALL
SELECT 5, 2345, 37 UNION ALL
SELECT 6, 2345, 45 UNION ALL
SELECT 7, 3456, 50 UNION ALL
SELECT 8, 4567, 25 UNION ALL
SELECT 9, 2345, 30;
/* the actual query begins here */
WITH RankedOrders AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ID)
FROM #Orders
),
RunningOrderTotals AS (
SELECT
ID,
Item,
OrderedQty,
RunningTotalQty = OrderedQty,
rn
FROM RankedOrders
WHERE rn = 1
UNION ALL
SELECT
o.ID,
o.Item,
o.OrderedQty,
RunningTotalQty = r.RunningTotalQty + o.OrderedQty,
o.rn
FROM RankedOrders o
INNER JOIN RunningOrderTotals r ON o.Item = r.Item AND o.rn = r.rn + 1
)
SELECT
t.ID,
t.Item,
t.OrderedQty,
QtyAvailableAfterAllocation = oh.Qty - t.RunningTotalQty
FROM RunningOrderTotals t
INNER JOIN #LastQty oh ON t.Item = oh.Item
ORDER BY t.ID;
Note: For the purpose of my example I initialised the available item quantity table (#LastQty) manually. However, you are most probably going to derive it from your data.
Based on the comments/answers above and my inability to accurately represent this complicated issue properly, I've rewritten the processing in C#. Using PLINQ, I've reduced the processing time from 15 seconds to 4. Thanks to all those who tried to help!
If this isn't the appropriate way to close a question, let me know (and let me know the appropriate way so I can do that instead).