Selecting ID's based on multiple subquery - sql

This is my first post to Stack Overflow. I appreciate and will take on board any constructive criticism to better form future questions.
Question:
I'm trying to create a SELECT query that gathers all orders which contain only items from the top 8.
I'm working with MS-Access 2013.
My current Query, which doesn't work, looks like this.
SELECT OrderID
From DirectOrders
WHERE OrderID <> ANY
(
SELECT OrderID
FROM DirectOrders
WHERE SKU <> ANY
(
SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC
)
)
The single table is below.
OrderID Customer SKU Qty
177622 CustomerA 1001 20
177622 CustomerA 1002 2
177624 CustomerB 1001 200
177626 CustomerC 1003 50
177626 CustomerC 1004 150
177630 CustomerC 1005 1000
177632 CustomerA 1006 1
177632 CustomerA 1007 3
177632 CustomerA 1008 9
177632 CustomerA 1009 1
177632 CustomerA 1010 4
177632 CustomerA 1011 3
177634 CustomerC 1012 5
177634 CustomerC 1013 5
177640 CustomerD 1014 4
177642 CustomerA 1015 4
177642 CustomerA 1016 48
177642 CustomerA 1017 15
177644 CustomerB 1018 50
Here was the flow that I was trying to accomplish.
1. Select the top 8 SKUs by count.
2. Select all OrderIDs that have a SKU which is not one of those 8 SKUs.
3. Select all OrderIDs that are not part of the OrderIDs selected in step 2.

I would do this with aggregation:
SELECT do.OrderID
FROM DirectOrders as do LEFT JOIN
(SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC, SKU
) as s8
ON do.SKU = s8.SKU
GROUP BY do.OrderId
HAVING COUNT(*) = COUNT(s8.SKU);
Notes:
In MS Access, TOP is really TOP WITH TIES. To get exactly 8 values you need a tie breaker. This query uses SKU for that purpose.
The LEFT JOIN determines if there is a match between each item in an order and the top 8 items.
The HAVING clause is saying: The count of rows with items is the same as the count of rows that match one of the top 8. Hence, all are in the order.
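As a way to check the logic, here is a minimal sketch that runs the same LEFT JOIN and HAVING pattern against the question's sample data. It assumes Python's built-in sqlite3 module as a stand-in for Access, so TOP 8 becomes LIMIT 8 (SQLite's LIMIT never returns ties), the alias do is renamed o, and the table keeps only the OrderID and SKU columns.

```python
import sqlite3

# (OrderID, SKU) pairs taken from the question's sample table.
rows = [
    (177622, 1001), (177622, 1002), (177624, 1001), (177626, 1003),
    (177626, 1004), (177630, 1005), (177632, 1006), (177632, 1007),
    (177632, 1008), (177632, 1009), (177632, 1010), (177632, 1011),
    (177634, 1012), (177634, 1013), (177640, 1014), (177642, 1015),
    (177642, 1016), (177642, 1017), (177644, 1018),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DirectOrders (OrderID INTEGER, SKU INTEGER)")
conn.executemany("INSERT INTO DirectOrders VALUES (?, ?)", rows)

# SQLite stand-in for the Access query: LIMIT 8 replaces TOP 8.
query = """
SELECT o.OrderID
FROM DirectOrders AS o
LEFT JOIN (
    SELECT SKU
    FROM DirectOrders
    GROUP BY SKU
    ORDER BY COUNT(SKU) DESC, SKU   -- SKU is the tie breaker
    LIMIT 8
) AS s8 ON o.SKU = s8.SKU
GROUP BY o.OrderID
HAVING COUNT(*) = COUNT(s8.SKU)     -- every row of the order matched a top SKU
ORDER BY o.OrderID
"""
top8_only_orders = [r[0] for r in conn.execute(query)]
print(top8_only_orders)  # [177622, 177624, 177626, 177630]
```

With SKU as the tie breaker the top 8 are SKUs 1001 to 1008, so only the four orders made up entirely of those SKUs come back.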

I think you need something like this. However, you might get some strange results if you rank by COUNT(SKU), because apart from 1001 every SKU has a count of 1. So apart from 1001, all the other SKUs tie for a place in the top 8 based on COUNT(SKU).
SELECT * FROM DirectOrders where SKU in
(select top 8 SKU from DirectOrders group by SKU order by count(SKU) desc);

Access's TOP function does not break ties, so instead of reporting just the top 8, it orders the items per your ORDER BY and then reports enough rows to cover the TOP value you asked for, plus all ties. For example, with your sample data, it will report the same 18 SKUs whether you ask for TOP 8 or just TOP 2, since all but one of your SKUs appear in only 1 order.
If you want to report only the top 8, you should add to the query to make the ordering unique. In this case, I would probably order by COUNT(SKU) DESC, SUM(QTY) DESC, MAX(ORDERID) DESC, SKU so that it prioritizes the highest number of orders, then the highest quantity, then the latest OrderID with that SKU, and if all else fails, the SKU itself. Only the SKU is guaranteed to be unique for each row, but ordering by SKU alone might not give the best result if you are looking for the truly relevant "top 8".
SELECT OrderID
From DirectOrders
WHERE OrderID NOT IN
(
SELECT OrderID
FROM DirectOrders
WHERE SKU NOT IN
(
SELECT TOP 8 SKU
FROM DirectOrders
GROUP BY SKU
ORDER BY COUNT(SKU) DESC, SUM(QTY) DESC, MAX(ORDERID) DESC, SKU
)
)
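The query above also swaps the question's <> ANY for NOT IN, and that change matters as much as the tie breaker. A small sketch of the difference, written in Python rather than SQL and ignoring NULL handling; the helper names and the shortened SKU list are hypothetical, for illustration only:

```python
def differs_from_any(value, candidates):
    # Mirrors: value <> ANY (subquery).
    # True as soon as at least one candidate differs from value.
    return any(value != c for c in candidates)


def not_in(value, candidates):
    # Mirrors: value NOT IN (subquery), i.e. value <> ALL (subquery).
    # True only if every candidate differs from value.
    return all(value != c for c in candidates)


top_skus = [1001, 1002, 1003]  # hypothetical top list, shortened

any_result = differs_from_any(1001, top_skus)  # True: 1001 differs from 1002
not_in_result = not_in(1001, top_skus)         # False: 1001 is in the list
print(any_result, not_in_result)
```

Because <> ANY is true whenever the list holds at least two distinct values, it filters nothing useful, which is why the original query did not work.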

Related

Distribute large quantities over multiple rows

I have a simple Order table. One order can have different products, each with a quantity and a weight, as below.
OrderID ProductName Qty Weight
101 ProductA 2 24
101 ProductB 1 24
101 ProductC 1 48
101 ProductD 1 12
101 ProductE 1 12
102 ProductA 5 60
102 ProductB 1 12
I am trying to partition and group the products in such a way that, within an order, the total weight of each group does not exceed 48.
The expected table looks as below.
OrderID ProductName Qty Weight GroupedID
101 ProductA 2 24 1
101 ProductB 1 24 1
101 ProductC 1 48 2
101 ProductD 1 12 3
101 ProductE 1 12 3
102 ProductA 4 48 1
102 ProductA 1 12 2
102 ProductB 1 12 2
Kindly let me know if this is possible.
Thank you.
This is a bin packing problem, which is non-trivial in general. Bin packing is NP-hard, so the time needed for an exact solution grows very quickly as the number of items increases. Dai posted a link to Hugo Kornelis's article series, which is referenced by everyone trying to solve this problem. The set-based solution performs really badly. For realistic scenarios you need iteration, preferably using a bin packing library, e.g. in Python.
For production work it would be better to take advantage of SQL Server 2017+'s support for Python scripts and use a bin packing library like Google's OR-Tools or the binpacking module. Even if you don't want to use sp_execute_external_script, you can use a Python script to read the data from the database and split it.
The question's numbers are so regular though you could cheat a bit (actually quite a lot) and distribute all order lines into individual items, calculate the running total per order and then divide the total by the limit to produce the group number.
This works only because the running totals are guaranteed to align with the bin size.
Distributing into items can be done using a Tally/Numbers table, a table with a single Number column storing numbers from 0 to eg 1M.
Given the question's data:
create table #OrderItems (id int identity(1,1) primary key, OrderID int,ProductName varchar(20),Qty int,Weight int)
insert into #OrderItems(OrderId,ProductName,Qty,Weight)
values
(101,'ProductA',2,24),
(101,'ProductB',1,24),
(101,'ProductC',1,48),
(101,'ProductD',1,12),
(101,'ProductE',1,12),
(102,'ProductA',5,60),
(102,'ProductB',1,12);
The following query splits each order item into individual items. It repeats each order item row once per individual item and calculates the individual item weight:
select o.*, Weight/Qty as ItemWeight
from #OrderItems o inner join Numbers ON Qty >Numbers.Number;
This row:
1 101 ProductA 2 24
Becomes
1 101 ProductA 2 24 12
1 101 ProductA 2 24 12
Calculating the running total inside a query can be done with:
SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
The Order By Itemweight clause means the smallest items are picked first, ie it's a Worst fit algorithm.
The overall query calculating the total and Group ID is
with items as (
select o.*, Weight/Qty as ItemWeight
from #OrderItems o INNER JOIN Numbers ON Qty > Numbers.Number
)
select Id,OrderId,ProductName,Qty,Weight, ItemWeight,
ceiling(SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/48.0)
As GroupId
from items;
After that, individual items need to be grouped back into order items and groups. This produces the final query:
with items as (
select o.*, Weight/Qty as ItemWeight
from #OrderItems o INNER JOIN Numbers ON Qty > Numbers.Number
)
,bins as(
select Id,OrderId,ProductName,Qty,Weight, ItemWeight,
ceiling(SUM(ItemWeight) OVER(Partition By OrderId
Order By Itemweight
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/48.0) As GroupId
from items
)
select
max(OrderId) as orderid,
max(productname) as ProductName,
count(*) as Qty,
sum(ItemWeight) as Weight,
max(GroupId) as GroupId
from bins
group by id,groupid
order by orderid,groupid
This returns
orderid ProductName Qty Weight GroupId
101 ProductA 2 24 1
101 ProductD 1 12 1
101 ProductE 1 12 1
101 ProductB 1 24 2
101 ProductC 1 48 3
102 ProductA 4 48 1
102 ProductA 1 12 2
102 ProductB 1 12 2
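The running-total trick can be tried end to end without SQL Server. The following is a rough sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions). Three things differ from the T-SQL above and are my own substitutions: the Numbers table is generated by a recursive CTE, ceiling(x/48.0) is written as the integer expression (x + 47) / 48, and id is added to the window ordering so that ties between equal item weights are resolved deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderItems (
    id INTEGER PRIMARY KEY, OrderID INT, ProductName TEXT, Qty INT, Weight INT
);
INSERT INTO OrderItems (OrderID, ProductName, Qty, Weight) VALUES
    (101,'ProductA',2,24), (101,'ProductB',1,24), (101,'ProductC',1,48),
    (101,'ProductD',1,12), (101,'ProductE',1,12),
    (102,'ProductA',5,60), (102,'ProductB',1,12);
""")

query = """
WITH RECURSIVE Numbers(Number) AS (      -- stand-in tally table, 0..99
    SELECT 0 UNION ALL SELECT Number + 1 FROM Numbers WHERE Number < 99
),
items AS (                               -- one row per individual item
    SELECT o.*, o.Weight / o.Qty AS ItemWeight
    FROM OrderItems o JOIN Numbers n ON o.Qty > n.Number
),
bins AS (                                -- running total -> group number
    SELECT id, OrderID, ProductName, ItemWeight,
           (SUM(ItemWeight) OVER (PARTITION BY OrderID
                                  ORDER BY ItemWeight, id
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND CURRENT ROW) + 47) / 48
           AS GroupId
    FROM items
)
SELECT OrderID, ProductName, COUNT(*) AS Qty,
       SUM(ItemWeight) AS Weight, GroupId
FROM bins
GROUP BY id, OrderID, ProductName, GroupId
ORDER BY OrderID, GroupId, id
"""
grouped = conn.execute(query).fetchall()
for row in grouped:
    print(row)
```

On the question's data this reproduces the table above, including the split of the five ProductA items in order 102 into a group of four and a group of one.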

Select the best selling product ID

What if I have table like this and I want to select the best selling product_id.
id transaction_id product_id qty_sold
1 21 2 5
2 22 3 2
3 23 4 2
3 24 2 1
3 25 2 4
I want the best selling product_id, i.e. the one with the highest total qty_sold.
Using SQL Server, you can group by product_id, add up the quantity sold, and order by the total descending. If we also take the minimum transaction ID per product, then when two products have the same total quantity we can use the minimum transaction ID to break the tie:
SELECT TOP 1 product_id, SUM(qty_sold) as sellcount, MIN(transaction_id) as firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
Once you're happy the sums are right, you can remove the SUM(qty_sold) as sellcount and MIN(transaction_id) as firsttran columns from the SELECT if you only need the product ID.
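To see the grouping and tie breaker at work, here is a small sketch against the question's rows. It assumes Python's built-in sqlite3 module in place of SQL Server, so TOP 1 is written as LIMIT 1; the table name t comes from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, transaction_id INT, product_id INT, qty_sold INT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 21, 2, 5),
    (2, 22, 3, 2),
    (3, 23, 4, 2),
    (3, 24, 2, 1),
    (3, 25, 2, 4),
])

# SQLite stand-in for the TOP 1 query: LIMIT 1 keeps only the first row.
best = conn.execute("""
    SELECT product_id,
           SUM(qty_sold) AS sellcount,
           MIN(transaction_id) AS firsttran
    FROM t
    GROUP BY product_id
    ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
    LIMIT 1
""").fetchone()
print(best)  # (2, 10, 21)
```

Product 2 wins with a total of 10. Products 3 and 4 tie at 2 each, and the MIN(transaction_id) ordering would place product 3 ahead of product 4.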

Getting latest price of different products from control table

I have a control table where prices are tracked by item number and date.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is my query right or wrong? I am getting my desired result, though.
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
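Before benchmarking, it can help to confirm that the three forms agree on the sample data. A minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for RANK()) and dates stored as ISO strings so that MAX and ORDER BY sort them correctly; the correlated subquery is written with its inner alias p1 declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INT, ItemNo TEXT, Price INT, date TEXT)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", [
    (1, "a001", 100, "2003-01-01"), (2, "a001", 105, "2003-01-02"),
    (3, "a001", 110, "2003-01-03"), (4, "b100", 50, "2003-01-01"),
    (5, "b100", 55, "2003-01-02"), (6, "b100", 60, "2003-01-03"),
    (7, "c501", 35, "2003-01-01"), (8, "c501", 38, "2003-01-02"),
    (9, "c501", 42, "2003-01-03"), (10, "a001", 95, "2004-01-01"),
])

queries = {
    "join": """
        SELECT pr.id FROM prices pr
        JOIN (SELECT ItemNo, MAX(date) AS max_date
              FROM prices GROUP BY ItemNo) p
          ON pr.ItemNo = p.ItemNo AND pr.date = p.max_date
        ORDER BY pr.ItemNo""",
    "correlated": """
        SELECT p.id FROM prices p
        WHERE p.date = (SELECT MAX(p1.date) FROM prices p1
                        WHERE p1.ItemNo = p.ItemNo)
        ORDER BY p.ItemNo""",
    "window": """
        SELECT id FROM (
            SELECT p.*, RANK() OVER (PARTITION BY ItemNo
                                     ORDER BY date DESC) AS rn
            FROM prices p
        ) ranked
        WHERE rn = 1
        ORDER BY ItemNo""",
}

# Map each approach to the list of row ids it returns.
results = {name: [r[0] for r in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```

All three return rows 10, 6 and 9 here. Note that each of them returns several rows per item if the latest date occurs more than once, since none of them breaks ties.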

Select all rows based on alternative publisher

I want to list all the rows alternating by publisher, with price ascending. See the example table below.
id publisher price
1 ABC 100.00
2 ABC 150.00
3 ABC 105.00
4 XYZ 135.00
5 XYZ 110.00
6 PQR 105.00
7 PQR 125.00
The expected result would be:
id publisher price
1 ABC 100.00
6 PQR 105.00
5 XYZ 110.00
3 ABC 105.00
7 PQR 125.00
4 XYZ 135.00
2 ABC 150.00
What would be the required SQL?
This should do it:
select id, publisher, price
from (
select id, publisher, price,
row_number() over (partition by publisher order by price) as rn
from publisher
) t
order by rn, publisher, price
The window function assigns a sequential number to each row within a publisher, ordered by price. Based on that, the outer order by first displays all rows with rn = 1, which are the rows with the lowest price for each publisher. Next come the rows with rn = 2, which hold each publisher's second lowest price, and so on.
SQLFiddle example: http://sqlfiddle.com/#!4/06ece/2
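The same idea can be checked locally against the question's data. This is a sketch only, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for row_number()) and the table name publisher used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publisher (id INT, publisher TEXT, price REAL)")
conn.executemany("INSERT INTO publisher VALUES (?, ?, ?)", [
    (1, "ABC", 100.00), (2, "ABC", 150.00), (3, "ABC", 105.00),
    (4, "XYZ", 135.00), (5, "XYZ", 110.00),
    (6, "PQR", 105.00), (7, "PQR", 125.00),
])

# rn numbers each publisher's rows from cheapest to most expensive;
# sorting by rn first interleaves the publishers.
ordered_ids = [r[0] for r in conn.execute("""
    SELECT id, publisher, price
    FROM (
        SELECT id, publisher, price,
               row_number() OVER (PARTITION BY publisher
                                  ORDER BY price) AS rn
        FROM publisher
    ) t
    ORDER BY rn, publisher, price
""")]
print(ordered_ids)  # [1, 6, 5, 3, 7, 4, 2]
```

The id sequence matches the expected result listed in the question.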
SELECT id, publisher, price
FROM tbl
ORDER BY row_number() OVER (PARTITION BY publisher ORDER BY price), publisher;
One cannot use the output of window functions in the WHERE or HAVING clauses because window functions are applied after those. But one can use window functions in the ORDER BY clause.
Not sure what your table name is - I have called it publishertable. But the following will order the result by price in ascending order - which is the result you are looking for:
select id, publisher, price from publishertable order by price asc
If I've got it right, you should use the ROW_NUMBER() function to rank prices within each publisher and then order by this rank and publisher.
SELECT ID,
Publisher,
Price,
Row_number() OVER (PARTITION BY Publisher ORDER BY Price) as rn
FROM T
ORDER BY RN,Publisher

Find the value of one field that matches the maximum value of data in another field

I'm trying to write a query that gets the value of one field that's associated with the maximum value of another field (or fields). Let's say I have the following table of data:
OrderID CustomerID OrderDate LocationID
1 4 1/1/2001 1001
2 4 1/2/2001 1003
3 4 1/3/2001 1001
4 5 1/4/2001 1001
5 5 1/5/2001 1001
6 5 1/6/2001 1003
7 5 1/7/2001 1002
8 5 1/8/2001 1003
9 5 1/8/2001 1002
Grouping by CustomerID, I want to get the maximum OrderDate and then the LocationID associated with whatever is the maximum OrderDate. If there are several records that share the maximum order date, then take the LocationID associated with the maximum OrderID from among those records with the maximum date.
The final set of data should look like this:
CustomerID OrderDate LocationID
4 1/3/2001 1001
5 1/8/2001 1002
I had been trying to write a query with lots of nested subqueries and ugly joins, but I'm not really getting anywhere. What SQL do I need to write to get this result?
with cte As
(
select *,
row_number() over (partition by CustomerID
order by OrderDate desc, OrderId desc) as rn
from yourtable
)
select CustomerID, OrderDate,LocationID
from cte
where rn=1;
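A quick way to confirm that the row_number() ordering handles the tie on 1/8/2001 is to run it on the question's rows. A minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later), a table named orders in place of yourtable, and dates rewritten as ISO strings so they sort chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderID INT, CustomerID INT, "
    "OrderDate TEXT, LocationID INT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 4, "2001-01-01", 1001), (2, 4, "2001-01-02", 1003),
    (3, 4, "2001-01-03", 1001), (4, 5, "2001-01-04", 1001),
    (5, 5, "2001-01-05", 1001), (6, 5, "2001-01-06", 1003),
    (7, 5, "2001-01-07", 1002), (8, 5, "2001-01-08", 1003),
    (9, 5, "2001-01-08", 1002),
])

# Latest date first, then highest OrderID, so rn = 1 is the wanted row.
latest = conn.execute("""
    WITH cte AS (
        SELECT *,
               row_number() OVER (PARTITION BY CustomerID
                                  ORDER BY OrderDate DESC,
                                           OrderID DESC) AS rn
        FROM orders
    )
    SELECT CustomerID, OrderDate, LocationID
    FROM cte
    WHERE rn = 1
    ORDER BY CustomerID
""").fetchall()
print(latest)  # [(4, '2001-01-03', 1001), (5, '2001-01-08', 1002)]
```

Customer 5 has two orders on the latest date, and the OrderID DESC term picks order 9 with LocationID 1002, as the question requires.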
SELECT
C.Name,
C.CustomerID,
X.*
FROM
Customers C
CROSS APPLY (
SELECT TOP 1 OrderDate, LocationID
FROM Orders O
WHERE C.CustomerID = O.CustomerID
ORDER BY OrderDate Desc, OrderID Desc
) X
If you pull any columns from the Customers table, this will probably outperform the other methods.
If not, then the Row_Number answer, pulling only from Orders, will probably be best. But if you restrict by customer in any way, then the CROSS APPLY will again be best, possibly by a big margin.
The trick is to use a subquery as a value, not as a join:
select customerId,orderDate,locationId
from orders o1
where orderId = (
select top 1 orderId
from orders o2
where o1.customerId = o2.customerId
order by orderdate desc, orderId desc
)