One-to-many relation between table columns: grouping and finding combinations - SQL

In sample table t0 :
OrderID | ProductID
0001 1254
0001 1252
0002 0038
0003 1254
0003 1252
0003 1432
0004 0038
0004 1254
0004 1252
I need to find the most popular combination of two ProductIDs under one OrderID. The purpose is to decide which products are more likely to be sold together in one order, e.g. phone - handsfree. I think the logic is to group by OrderID, calculate every possible combination of ProductID pairs, count them per OrderID and select the TOP 2, but I really can't tell if it is doable.

A "self-join" may be used, while ensuring that one of the product IDs is greater than the other so that we get each "pair" of products only once per order. Then it is simple to count:
Demo
CREATE TABLE OrderDetail
([OrderID] int, [ProductID] int)
;
INSERT INTO OrderDetail
([OrderID], [ProductID])
VALUES
(0001, 1254), (0001, 1252), (0002, 0038), (0003, 1254), (0003, 1252), (0003, 1432), (0004, 0038), (0004, 1254), (0004, 1252)
;
Query 1:
select -- top(2)
od1.ProductID, od2.ProductID, count(*) count_of
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
od1.ProductID, od2.ProductID
order by
count_of DESC
Results:
| ProductID | ProductID | count_of |
|-----------|-----------|----------|
| 1252 | 1254 | 3 |
| 1252 | 1432 | 1 |
| 1254 | 1432 | 1 |
| 38 | 1252 | 1 |
| 38 | 1254 | 1 |
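The counts above can be cross-checked outside the database. This is only a minimal Python sketch of the same idea (group by order, build each unordered pair once, count), using the sample rows from the question; the names `rows`, `count_pairs` and `pair_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Sample rows from the question: (OrderID, ProductID)
rows = [
    (1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252),
    (3, 1432), (4, 38), (4, 1254), (4, 1252),
]

def count_pairs(rows):
    """Count how many orders contain each unordered product pair."""
    products_by_order = {}
    for order_id, product_id in rows:
        # A set ignores a product that appears twice in one order
        products_by_order.setdefault(order_id, set()).add(product_id)
    counts = Counter()
    for products in products_by_order.values():
        # sorted() yields every pair as (smaller, larger), which mirrors
        # the od2.ProductID > od1.ProductID condition in the self-join
        counts.update(combinations(sorted(products), 2))
    return counts

pair_counts = count_pairs(rows)
print(pair_counts.most_common(1))  # [((1252, 1254), 3)]
```

The pair (1252, 1254) comes out on top with 3 orders, and the other four pairs each appear once, matching the SQL results.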
----
With respect to displaying the "top 2" or whatever: you are likely to get "equal top" results, so I would suggest using dense_rank(), and you may even want to "unpivot" the result so you have a single column of product IDs with their associated rank. How often you perform this and/or store it I leave to you.
with ProductPairs as (
select
p1, p2, count_pair
, dense_rank() over(order by count_pair DESC) as ranked
from (
select
od1.ProductID p1, od2.ProductID p2, count(*) count_pair
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
od1.ProductID, od2.ProductID
) d
)
, RankedProducts as (
select p1 as ProductID, ranked, count_pair
from ProductPairs
union all
select p2 as ProductID, ranked, count_pair
from ProductPairs
)
select *
from RankedProducts
where ranked <= 2
order by ranked, ProductID
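To see why ties matter for a "top 2", here is a rough sketch that runs the pair-ranking part of the query against the sample data. It assumes SQLite (3.25 or later, for window functions) through Python's `sqlite3` as a stand-in for SQL Server; `RANKED_SQL`, `ranked_pairs` and `top_two` are illustrative names only.

```python
import sqlite3

# Sample rows from the question: (OrderID, ProductID)
rows = [
    (1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252),
    (3, 1432), (4, 38), (4, 1254), (4, 1252),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetail (OrderID INTEGER, ProductID INTEGER)")
conn.executemany("INSERT INTO OrderDetail VALUES (?, ?)", rows)

# Same pair-counting self-join, with a dense rank over the pair counts
RANKED_SQL = """
WITH ProductPairs AS (
  SELECT p1, p2, count_pair,
         DENSE_RANK() OVER (ORDER BY count_pair DESC) AS ranked
  FROM (
    SELECT od1.ProductID AS p1, od2.ProductID AS p2, COUNT(*) AS count_pair
    FROM OrderDetail od1
    JOIN OrderDetail od2
      ON od1.OrderID = od2.OrderID AND od2.ProductID > od1.ProductID
    GROUP BY od1.ProductID, od2.ProductID
  ) d
)
SELECT p1, p2, count_pair, ranked
FROM ProductPairs
ORDER BY ranked, p1, p2
"""
ranked_pairs = conn.execute(RANKED_SQL).fetchall()

# Keeping ranks 1 and 2 returns every pair tied at rank 2, not just two rows
top_two = [row for row in ranked_pairs if row[3] <= 2]
print(ranked_pairs[0], len(top_two))  # (1252, 1254, 3, 1) 5
```

On this small sample four pairs tie at rank 2, so filtering on `ranked <= 2` returns all five pairs, whereas a plain TOP 2 would cut the tie arbitrarily.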

WITH products as (
SELECT DISTINCT ProductID
FROM orders
), permutations as (
SELECT p1.ProductID as pidA,
p2.ProductID as pidB
FROM products p1
JOIN products p2
ON p1.ProductID < p2.ProductID
), check_frequency as (
SELECT pidA, pidB, COUNT (o2.orderID) total_orders
FROM permutations p
LEFT JOIN orders o1
ON p.pidA = o1.ProductID
LEFT JOIN orders o2
ON p.pidB = o2.ProductID
AND o1.orderID = o2.orderID
GROUP BY pidA, pidB
)
SELECT TOP 2 *
FROM check_frequency
ORDER BY total_orders DESC

The following query calculates the number of two-way combinations
among all orders in Orderline:
SELECT SUM(numprods * (numprods - 1)/2) as numcombo2
FROM ( SELECT orderid, COUNT(DISTINCT productid) as numprods
FROM orderline ol
GROUP BY orderid ) o
Notice that this query counts distinct products rather than order lines, so
orders with the same product on multiple lines do not affect the count.
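As a quick check of the n * (n - 1) / 2 arithmetic, the sketch below runs the same query on the nine sample rows from the question, used here as a stand-in for the book's Orderline table (an assumption; the real table is not shown). It uses SQLite via Python's `sqlite3`. The duplicate order line and the helper `pairs_in_order` are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderline (orderid INTEGER, productid INTEGER)")
conn.executemany(
    "INSERT INTO orderline VALUES (?, ?)",
    [
        (1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252),
        (3, 1432), (4, 38), (4, 1254), (4, 1252),
        (1, 1254),  # hypothetical duplicate line; COUNT(DISTINCT) ignores it
    ],
)

numcombo2 = conn.execute(
    """
    SELECT SUM(numprods * (numprods - 1) / 2) AS numcombo2
    FROM (SELECT orderid, COUNT(DISTINCT productid) AS numprods
          FROM orderline
          GROUP BY orderid) o
    """
).fetchone()[0]

def pairs_in_order(numprods):
    """Number of two-way combinations among numprods distinct products."""
    return numprods * (numprods - 1) // 2

print(numcombo2)             # 7  (orders with 2, 1, 3 and 3 products: 1 + 0 + 3 + 3)
print(pairs_in_order(1000))  # 499500
```

The second figure shows how fast the count grows: a single order with a thousand products yields 499,500 pairs.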
The number of two-way combinations is 185,791. This is useful because the
number of combinations pretty much determines how quickly the query generating
them runs. A single order with a large number of products can seriously
degrade performance. For instance, if one order contains a thousand
products, there would be about five hundred thousand two-way combinations
in just that one order, versus 185,791 in all the orders data. As the number of
products in the largest order increases, the number of combinations increases
much faster.
The approach for calculating the combinations is to do a self-join on the Orderline
table, with duplicate product pairs removed. The goal is to get all pairs of
products subject to the conditions:
The two products in the pair are different.
No two combinations have the same two products.
The first condition is easily met by filtering out any pairs where the two products
are equal. The second condition is also easily met, by requiring that the
first product id be smaller than the second product id. The following query
generates all the combinations in a subquery and counts the number of orders
containing each one:
SELECT p1, p2, COUNT(*) as numorders
FROM (SELECT op1.orderid, op1.productid as p1, op2.productid as p2
FROM (SELECT DISTINCT orderid, productid FROM orderline) op1 JOIN
(SELECT DISTINCT orderid, productid FROM orderline) op2
ON op1.orderid = op2.orderid AND
op1.productid < op2.productid
) combinations
GROUP BY p1, p2
Source: Data Analysis Using SQL and Excel

Try using the following command:
-- orderID is left out of the SELECT and GROUP BY so that each pair is counted across all orders
SELECT T1.productId AS p1, T2.productID AS p2, COUNT(*) AS Occurence
FROM TBL T1 INNER JOIN TBL T2
ON T1.orderid = T2.orderid
WHERE T1.productid > T2.productId
GROUP BY T1.productId, T2.productID
ORDER BY Occurence DESC

Related

SQL subqueries using SELECTs only

I need to write a query with subqueries using SELECT and aggregation functions only, e.g.:
select distinct m_name
from MANUFACT
where m_id in (select TOP 1 m_id
from PRODUCT
where p_id = (select p_id
from PRODUCT
where p_desc = 'Bronze Sculpture'));
The question is about query similar to this one, but using SUM(). The data I have:
Table SPERSON:
sp_id | sp_name
---------------
10 | Jones
39 | Matsu
23 | Atsuma
Table SALE:
sp_id | qty
-----------
10 | 20
23 | 30
10 | 10
39 | 20
etc.
The task is to return the sp_name s whose total number of products is <= 75.
The teacher says we're not allowed to use JOIN, but I doubt there is any way to avoid it.
This is what I have so far:
select sp_name
from SPERSON
where sp_id in (select sp_id from SALE
where qty in (select sum(qty) group by sp_id));
Anyway, I only got the 'Each GROUP BY expression must contain at least one column that is not an outer reference' error, and I can't really work out what it means.
You can use a correlated subquery:
SELECT q.sp_name
FROM( SELECT sp_name,
(SELECT SUM(qty) FROM sale s WHERE s.sp_id = p.sp_id ) AS qty
FROM SPERSON p
) q
GROUP BY q.sp_name
HAVING SUM(q.qty) <= 75
A correlated subquery references the outer query and so is evaluated for each row of the outer query, which is why it is generally not the recommended approach. I suggest it here only as an alternative because you are not permitted to use JOIN. By the way, it is more straightforward to use JOIN.
You can try to approach the problem from a different direction.
Create a query to calculate the total quantity grouped by sp_id:
SELECT s.sp_id, SUM(s.qty)
FROM SALE s
GROUP BY s.sp_id
Keep only the sp_id values whose total quantity is less than or equal to 75:
SELECT s.sp_id, SUM(s.qty)
FROM SALE s
GROUP BY s.sp_id
HAVING SUM(s.qty) <= 75
Because joins are not allowed, "inject" the name as a subquery:
SELECT
(SELECT p.sp_name FROM SPERSON p WHERE p.sp_id = s.sp_id) AS name
FROM SALE s
GROUP BY s.sp_id
HAVING SUM(s.qty) <= 75
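For anyone who wants to try the no-JOIN idea end to end, here is a small sketch using SQLite through Python's `sqlite3` (an assumption; the question does not name a runnable setup). It filters SPERSON with a correlated subquery in the WHERE clause, which is a slightly different shape from the queries above. The last SALE row is hypothetical, added so that one salesperson exceeds 75.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SPERSON (sp_id INTEGER, sp_name TEXT);
    CREATE TABLE SALE (sp_id INTEGER, qty INTEGER);
    INSERT INTO SPERSON VALUES (10, 'Jones'), (39, 'Matsu'), (23, 'Atsuma');
    -- the first four rows come from the question; (39, 60) is a made-up row
    INSERT INTO SALE VALUES (10, 20), (23, 30), (10, 10), (39, 20), (39, 60);
    """
)

# No JOIN: the subquery totals qty for the salesperson of the current outer row
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.sp_name
        FROM SPERSON p
        WHERE (SELECT SUM(s.qty) FROM SALE s WHERE s.sp_id = p.sp_id) <= 75
        ORDER BY p.sp_name
        """
    )
]
print(names)  # ['Atsuma', 'Jones']
```

Jones and Atsuma total 30 each and are kept; Matsu totals 80 with the extra row and is filtered out. A salesperson with no sales at all gets a NULL sum and is also left out, which may or may not be what the task intends.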

SQL join to and from dates - return most recent if no match found

I have two tables that I need to join. I have:
LEFT JOIN AutoBAF on (GETDATE() BETWEEN AutoBAF.FromDate and AutoBAF.ToDate)
and I get the expected result. Now if no matching record is found between the two dates (AutoBAF.FromDate and AutoBAF.ToDate) I would like to join the most recent matching record instead.
Can anyone point me in the right direction?
I am using an MS SQL database hosted in Azure.
A small example of what I am trying to achieve:
Table Product:
Product | Description
A | Product A
Table Price
Product | FromDate | ToDate | Price
A | 01-01-20 | 31-01-20 | 100
A | 01-02-20 | 28-02-20 | 110
I need a query that will return the price according to the date returned by GETDATE().
If I run the query 15-01-20 I should get:
Product | Description | Price
A | Product A | 100
If I run the query 15-02-20 I should get:
Product | Description | Price
A | Product A | 110
and finally if I run the query 15-03-20 I will have no price in the Price table. Instead of returning null I would like to "fall back" to the most recent known price instead which in this example is 110
This is not the fastest query, because it joins products with all records with future dates. But if your tables are small, it works.
SELECT product.product, product.description, isnull(pr_curr.price, pr_fut.price) as price
FROM product
left join PRICE pr_curr on product.product=pr_curr.product
and GETDATE() BETWEEN pr_curr.FromDate and pr_curr.ToDate
left join PRICE pr_fut on product.product=pr_fut.product
and GETDATE() < pr_fut.FromDate
where pr_fut.FromDate = (
select min(FromDate) from PRICE dates
where dates.product=pr_fut.product and dates.FromDate>GETDATE()
) or pr_fut.FromDate is null
This looks like SQL Server code, which supports lateral joins via the apply keyword. Assuming you want only one match:
from product p outer apply
(select top (1) ab.*
from autobaf ab
where ab.product = p.product and
ab.fromdate <= getdate()
order by ab.fromdate desc
) ab
Note that this correlates on the product, which is not part of your question.
If that is not necessary, then you can use:
from t left join
(select top (1) ab.*
from autobaf ab
where ab.fromdate <= getdate()
order by ab.fromdate desc
) ab
on 1 = 1
If you know that there is some record in the past, then you can use cross join instead of left join and dispense with the on clause.
SELECT product.product, product.description, isnull(pr_curr.price, pr_fut.price) as price
FROM product
left join PRICE pr_curr on product.product=pr_curr.product
and GETDATE() BETWEEN pr_curr.FromDate and pr_curr.ToDate
left join PRICE pr_fut on product.product=pr_fut.product
and GETDATE() > pr_fut.FromDate
where pr_fut.FromDate = (
select max(FromDate) from PRICE dates
where dates.product=pr_fut.product and dates.FromDate<GETDATE()
) or pr_fut.FromDate is null
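The fallback behaviour can be checked against the example from the question with a small sketch. It makes several assumptions: SQLite via Python's `sqlite3` instead of SQL Server, so `LIMIT 1` stands in for `TOP (1)`, a `:today` parameter stands in for `GETDATE()`, and the dates are rewritten as ISO strings so they compare correctly as text. `price_on` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product TEXT, description TEXT);
    CREATE TABLE price (product TEXT, fromdate TEXT, todate TEXT, price INTEGER);
    INSERT INTO product VALUES ('A', 'Product A');
    INSERT INTO price VALUES
        ('A', '2020-01-01', '2020-01-31', 100),
        ('A', '2020-02-01', '2020-02-28', 110);
    """
)

# Pick the price row that started most recently on or before :today.
# Inside a price period this is the current price; after the last
# period has ended it is the most recent known price.
PRICE_SQL = """
SELECT p.product, p.description,
       (SELECT pr.price
        FROM price pr
        WHERE pr.product = p.product AND pr.fromdate <= :today
        ORDER BY pr.fromdate DESC
        LIMIT 1) AS price
FROM product p
"""

def price_on(today):
    return conn.execute(PRICE_SQL, {"today": today}).fetchall()

print(price_on("2020-01-15"))  # [('A', 'Product A', 100)]
print(price_on("2020-02-15"))  # [('A', 'Product A', 110)]
print(price_on("2020-03-15"))  # [('A', 'Product A', 110)]
```

The sketch treats "most recent" as the latest FromDate that is not in the future. If price periods can have gaps or overlap, the ordering rule would need to be decided explicitly.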

Join tables based on dates with check

I have two tables in PostgreSQL:
Demans_for_parts:
demandid partid demanddate quantity
40 125 01.01.17 10
41 125 05.01.17 30
42 123 20.06.17 10
Orders_for_parts:
orderid partid orderdate quantity
1 125 07.01.17 15
54 125 10.06.17 25
14 122 05.01.17 30
Basically, Demans_for_parts says what to buy and Orders_for_parts says what we bought. We can buy parts which are not listed in Demans_for_parts.
I need a report which shows all parts in Demans_for_parts and how many weeks have passed since the most recent matching row in Orders_for_parts. Note that the quantity field is irrelevant here.
The expected result is (if there is more than one row per part, show the oldest):
partid demanddate weeks_since_recent_order
125 01.01.17 2 (last order is on 10.06.17)
123 20.06.17 Unhandled
I think the tricky part is getting one row per part from each table. But that is easy using distinct on. Then you need to calculate the months. You can use age() for this purpose:
select dp.partid, dp.demanddate,
(extract(year from age(dp.demanddate, op.orderdate))*12 +
extract(month from age(dp.demanddate, op.orderdate))
) as months
from (select distinct on (dp.partid) dp.*
from demans_for_parts dp
order by dp.partid, dp.demanddate desc
) dp left join
(select distinct on (op.partid) op.*
from Orders_for_parts op
order by op.partid, op.orderdate desc
) op
on dp.partid = op.partid;
Something like this?
with o as (
select distinct partid, max(orderdate) over (partition by partid)
from Orders_for_parts
)
, p as (
select distinct partid, min(demanddate) over (partition by partid)
from Demans_for_parts
)
select p.partid, min as demanddate, date_part('day',o.max - p.min)/7
from p
left outer join o on (p.partid = o.partid)
;
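One possible reading of "weeks since the most recent order" is that the weeks are counted from the last order up to a reference date (such as today), which would explain the 2 in the expected result. The sketch below follows that reading only as an assumption. It uses SQLite via Python's `sqlite3` rather than PostgreSQL, ISO-formatted dates, and a made-up reference date of 2017-06-24; `WEEKS_SQL` and `report` are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demans_for_parts (demandid INTEGER, partid INTEGER, demanddate TEXT, quantity INTEGER);
    CREATE TABLE orders_for_parts (orderid INTEGER, partid INTEGER, orderdate TEXT, quantity INTEGER);
    INSERT INTO demans_for_parts VALUES
        (40, 125, '2017-01-01', 10), (41, 125, '2017-01-05', 30), (42, 123, '2017-06-20', 10);
    INSERT INTO orders_for_parts VALUES
        (1, 125, '2017-01-07', 15), (54, 125, '2017-06-10', 25), (14, 122, '2017-01-05', 30);
    """
)

# One row per demanded part: oldest demand date, plus whole weeks between the
# most recent order for that part and the reference date (NULL if never ordered)
WEEKS_SQL = """
SELECT d.partid,
       MIN(d.demanddate) AS demanddate,
       CAST((julianday(:as_of) - julianday(
                (SELECT MAX(o.orderdate)
                 FROM orders_for_parts o
                 WHERE o.partid = d.partid)
            )) / 7 AS INTEGER) AS weeks_since_recent_order
FROM demans_for_parts d
GROUP BY d.partid
ORDER BY d.partid
"""
report = conn.execute(WEEKS_SQL, {"as_of": "2017-06-24"}).fetchall()
print(report)  # [(123, '2017-06-20', None), (125, '2017-01-01', 2)]
```

Part 123 has no order, so it comes back as NULL (None), which corresponds to the "Unhandled" row in the expected result. Part 122 was ordered but never demanded, so it does not appear.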

How to join two tables with max value not greater than another value of column

Apologies for the confusing question title, but I'm not exactly sure how to describe the issue at hand.
I have two tables in Oracle 9i:
Pricing
-------
SKU
ApplicableTime
CostPerUnit
Inventory
---------
SKU
LastUpdatedTime
NumberOfUnits
Pricing contains incremental updates to the costs of each particular SKU item, at a specific Unix time. For example, if I have records:
SKU ApplicableTime CostPerUnit
------------------------------------
12345 1000 1.00
12345 1500 1.50
, then item 12345 is $1.00 per unit for any time between 1000 and 1500, and $1.50 for any time after 1500.
Inventory contains SKU, last updated time, and number of units.
What I'm trying to do is construct a query such that for each row in Inventory, I join the two tables based on SKU, I find the largest value for Pricing.ApplicableTime that is NOT greater than Inventory.LastUpdatedTime, get the CostPerUnit of that particular record from Pricing, and calculate TotalCost = CostPerUnit * NumberOfUnits:
SKU TotalCost
-----------------------------------------------------------------------------------
12345 (CostPerUnit at most recent ApplicableTime <= LastUpdatedTime)*NumberOfUnits
12346 <same>
... ...
How would I do this?
SELECT *
FROM
(select p.SKU,
p.ApplicableTime,
p.CostPerUnit*i.NumberOfUnits as cost,
row_number() over (partition by p.SKU order by p.ApplicableTime desc) as rnk
from Pricing p
join
Inventory i on (p.sku = i.sku and i.LastUpdatedTime >= p.ApplicableTime)
)
where rnk=1
select i1.SKU, i1.NumberOfUnits * p1.CostPerUnit as TotalCost
from Inventory i1
join (
select p.SKU, max(p.ApplicableTime) as ApplicableTime, max(i.LastUpdatedTime) as LastUpdatedTime
from Pricing p
join Inventory i on p.sku = i.sku
where p.ApplicableTime <= i.LastUpdatedTime
group by p.SKU
) t on i1.sku = t.sku and i1.LastUpdatedTime = t.LastUpdatedTime
join Pricing p1 on p1.sku = t.sku and p1.ApplicableTime = t.ApplicableTime
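Here is a small runnable sketch of the row_number() approach, assuming SQLite (3.25+) via Python's `sqlite3` in place of Oracle. The two Pricing rows for SKU 12345 come from the question; the Inventory rows and the 12346 pricing row are invented for illustration. It uses `<=` so that a price applying at exactly LastUpdatedTime is still picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Pricing (SKU INTEGER, ApplicableTime INTEGER, CostPerUnit REAL);
    CREATE TABLE Inventory (SKU INTEGER, LastUpdatedTime INTEGER, NumberOfUnits INTEGER);
    INSERT INTO Pricing VALUES (12345, 1000, 1.00), (12345, 1500, 1.50),
                               (12346, 1500, 2.00);  -- hypothetical row
    -- hypothetical inventory rows
    INSERT INTO Inventory VALUES (12345, 1200, 10), (12346, 1500, 3);
    """
)

# Rank the applicable prices per SKU, newest first, and keep the newest one
TOTAL_COST_SQL = """
SELECT SKU, TotalCost
FROM (
  SELECT i.SKU,
         p.CostPerUnit * i.NumberOfUnits AS TotalCost,
         ROW_NUMBER() OVER (PARTITION BY i.SKU
                            ORDER BY p.ApplicableTime DESC) AS rnk
  FROM Inventory i
  JOIN Pricing p
    ON p.SKU = i.SKU AND p.ApplicableTime <= i.LastUpdatedTime
)
WHERE rnk = 1
ORDER BY SKU
"""
total_costs = conn.execute(TOTAL_COST_SQL).fetchall()
print(total_costs)  # [(12345, 10.0), (12346, 6.0)]
```

SKU 12345 was last updated at 1200, so the 1.00 price from time 1000 applies (10 units, total 10.0). SKU 12346 was updated at exactly 1500, the same time as its price change, and gets 2.00 (3 units, total 6.0). This partitions by SKU and so assumes one Inventory row per SKU.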

Getting products from SQL query

I'm currently working on a proprietary shopping cart system and was having a few problems with getting products out with the correct pricing.
Basically my table structure is as follows:
Products table: (Only relevant columns are represented)
----------------------------------------------------
productid | product | descr | disporder| list_price|
----------------------------------------------------
1 name desc 1 0.00
2 name desc 4 0.00
3 name desc 2 2.45
Pricing table:
----------------------------------------
priceid | productid | price | variantid|
----------------------------------------
1 1 13.91 1
2 2 54.25 4
3 2 47.23 2
Variants Table:
-------------------------------
variantid | productid | active|
-------------------------------
1 1 Y
2 2 Y
3 2 Y
So, each product can have - and in most cases does have - multiple variants. The SQL query I have managed to create so far is:
SELECT
products.productid, product, descr, p.price, i.image_path
FROM
products
LEFT JOIN
pricing AS p
ON
p.variantid = (SELECT variantid FROM variants
WHERE productid = products.productid LIMIT 1)
LEFT JOIN
images_T AS i
ON
i.id = products.productid
GROUP BY
products.productid
ORDER BY
products.disporder
However, my problem arises when a product does not have a variant. If a product does not have a variant associated with it, the price will be in the list_price column of the products table. How would I go about checking whether a product does indeed have a variant? If not, the query should effectively bypass the variants table and get the pricing from list_price within the products table.
Yes, CASE is an option, or COALESCE:
SELECT
products.productid, product, descr,
COALESCE(p.price, products.list_price) AS price,
i.image_path
...
Just join both prices and when the first is NULL the other will be selected.
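A rough way to try the fallback on the sample tables from the question, assuming SQLite via Python's `sqlite3` as a stand-in for the real database. This sketch assumes the variant price should win and list_price is used only when no variant price is found. The images join is left out, and an `ORDER BY variantid` is added inside the subquery (not in the original query) so that `LIMIT 1` picks a predictable variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (productid INTEGER, product TEXT, descr TEXT,
                           disporder INTEGER, list_price REAL);
    CREATE TABLE pricing (priceid INTEGER, productid INTEGER, price REAL, variantid INTEGER);
    CREATE TABLE variants (variantid INTEGER, productid INTEGER, active TEXT);
    INSERT INTO products VALUES (1, 'name', 'desc', 1, 0.00),
                                (2, 'name', 'desc', 4, 0.00),
                                (3, 'name', 'desc', 2, 2.45);
    INSERT INTO pricing VALUES (1, 1, 13.91, 1), (2, 2, 54.25, 4), (3, 2, 47.23, 2);
    INSERT INTO variants VALUES (1, 1, 'Y'), (2, 2, 'Y'), (3, 2, 'Y');
    """
)

# Variant price first; list_price only when the LEFT JOIN found no price row
PRICE_SQL = """
SELECT products.productid,
       COALESCE(p.price, products.list_price) AS price
FROM products
LEFT JOIN pricing AS p
  ON p.variantid = (SELECT variantid FROM variants
                    WHERE productid = products.productid
                    ORDER BY variantid
                    LIMIT 1)
ORDER BY products.disporder
"""
prices = conn.execute(PRICE_SQL).fetchall()
print(prices)  # [(1, 13.91), (3, 2.45), (2, 47.23)]
```

Product 3 has no variant, so the join finds nothing and its list_price of 2.45 is used. Products 1 and 2 take the price of their first variant.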
The simplest way is to use a CASE in the SELECT clause, like so:
SELECT
products.productid, product, descr,
CASE
WHEN p.price IS NULL
THEN products.list_price
ELSE p.price
END AS price,
i.image_path
[...]
Since you're left-joining on pricing/variants, p.price should reliably be NULL for products with no variants.
Hopefully that's what you meant by "bypassing" the variants table. :)
You can do a plain (inner) join with the variants table (which will ONLY give you products which have variants), and then UNION it with a join of products and pricing where no variant exists (using AND NOT EXISTS (SELECT 1 FROM variants v WHERE p.productid = v.productid AND p.variantid = v.variantid)).
Otherwise, use CASE on pricing.price.