Joining two tables + third table as a conditional column - sql

I have three tables: warehouse, warehouse_order and warehouse_fulfillment. A warehouse order is created by a warehouse admin and initially has 0 fulfillment records. It can have many warehouse_fulfillment records with failed/rejected statuses, and at most one with a success status (at which point it's done):
-- Warehouse
+---------------------------------------+-----------+----------+
| id | name | location |
+---------------------------------------+-----------+----------+
| 9bcae08e-ad36-4d97-b9ec-4857714e902a | "big" | "MLB" |
+---------------------------------------+-----------+----------+
| b442e783-4725-41e9-af83-f75004ee1b38 | "bigger" | "MLB" |
+---------------------------------------+-----------+----------+
| 986d5aa9-0523-42d8-b183-dfd546d3e682 | "biggest" | "MLB" |
+---------------------------------------+-----------+----------+
-- Warehouse_order Table
+---------------------------------------+--------------------------------------+--------+----------+
| id | warehouse_id | type | quantity |
+---------------------------------------+--------------------------------------+--------+----------+
| 9cb99fd9-9e5e-4240-8162-d28747be01cd | b442e783-4725-41e9-af83-f75004ee1b38 | BN_100 | 100 |
+---------------------------------------+-------------------------------------+--------+-----------+
| eceb0b5a-5afa-40e4-ac62-efb686e3bdae | 9bcae08e-ad36-4d97-b9ec-4857714e902a | BN_200 | 400 |
+---------------------------------------+--------------------------------------+--------+----------+
| 13370467-cf0c-47f2-8fea-a215500607e6 | 986d5aa9-0523-42d8-b183-dfd546d3e68 | BN_300 | 10 |
+---------------------------------------+--------------------------------------+--------+----------+
-- Warehouse_fulfillment Table
+---------------------------------------+---------------------------------------+------------+
| id | order_id | status |
+---------------------------------------+---------------------------------------+------------+
| 8a69edde-2346-48b8-96d0-6c4e25527f38 | 9cb99fd9-9e5e-4240-8162-d28747be01cd | "FAILLED" |
+---------------------------------------+---------------------------------------+------------+
| a2006a64-9bdc-4bfa-ba14-a44769aeb4a2 | 9cb99fd9-9e5e-4240-8162-d28747be01cd | "REJECTED" |
+---------------------------------------+---------------------------------------+------------+
| bf0aa1fc-6dfc-4fd0-ba20-be101b1985d1 | 9cb99fd9-9e5e-4240-8162-d28747be01cd | "FAILED" |
+---------------------------------------+---------------------------------------+------------+
| 48c7d747-2f9b-4535-8f27-210a43cf5c30 | 9cb99fd9-9e5e-4240-8162-d28747be01cd | "SUCCESS" |
+---------------------------------------+---------------------------------------+------------+
| 7f8e18c9-4322-428a-9370-9ecd1c5ef286 | 13370467-cf0c-47f2-8fea-a215500607e6 | "FAILED" |
+---------------------------------------+---------------------------------------+------------+
I want to query the above records in such a way that result looks like so:
+--------------------------------------+-----------+----------+---------------------------------------+------------+----------------+--------------------------------------+
| id | name | location | order_id | order_type | order_quantity | fulfillment_id |
+--------------------------------------+-----------+----------+---------------------------------------+------------+----------------+--------------------------------------+
| 9bcae08e-ad36-4d97-b9ec-4857714e902a | "big" | "MLB" | eceb0b5a-5afa-40e4-ac62-efb686e3bdae | "BN_200" | 400 | NULL |
+--------------------------------------+-----------+----------+---------------------------------------+------------+----------------+--------------------------------------+
| b442e783-4725-41e9-af83-f75004ee1b38 | "bigger" | "MLB" | 9cb99fd9-9e5e-4240-8162-d28747be01cd | "BN_100" | 100 | 48c7d747-2f9b-4535-8f27-210a43cf5c30 |
+--------------------------------------+-----------+----------+---------------------------------------+------------+----------------+--------------------------------------+
| 986d5aa9-0523-42d8-b183-dfd546d3e682 | "biggest" | "MLB" | 13370467-cf0c-47f2-8fea-a215500607e6 | "BN_300" | 10 | NULL |
+--------------------------------------+-----------+----------+---------------------------------------+------------+----------------+--------------------------------------+
I couldn't do this without the repeated rows in cases where an order has multiple failed statuses.

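Did you try SELECT DISTINCT? It removes the repeated rows only when no column from warehouse_fulfillment is in the select list. The query below selects WF.id, so an order with several fulfillment records still returns one row per record: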
SELECT DISTINCT W.id, W.name, W.location, WO.id order_id, WO.type order_type, WO.quantity order_quantity, WF.id fulfillment_id
FROM warehouse W
LEFT JOIN warehouse_order WO ON W.id = WO.warehouse_id
LEFT JOIN warehouse_fulfillment WF on WF.order_id = WO.id
Otherwise, I would need to know what DBMS the SQL is for, but every flavor I've worked with has some way to rank/order results with a partition so that you can take just the first record based on some key, for example:
SELECT id, name, location, order_id, order_type, order_quantity, fulfillment_id
FROM (
    SELECT W.id, W.name, W.location, WO.id order_id, WO.type order_type, WO.quantity order_quantity, WF.id fulfillment_id,
           -- partition by warehouse as well, so warehouses without orders (WO.id is NULL) are not collapsed into one row
           ROW_NUMBER() OVER (PARTITION BY W.id, WO.id ORDER BY WF.id) rNum
    FROM warehouse W
    LEFT JOIN warehouse_order WO ON W.id = WO.warehouse_id
    LEFT JOIN warehouse_fulfillment WF ON WF.order_id = WO.id
) A
WHERE rNum = 1
It would be better to order by date DESC or something like that to get the most recent record.

Without some reliable way within the fulfillment table to determine the "latest status", such as a timestamp, you will need to choose some arbitrary priority order among the possible status values. Below I have used a case expression within the over clause so that 'SUCCESS' gets row number 1 if that status exists for an order. Adjust the case expression as you see fit for the other possible values of that column.
The subquery containing the row number is joined to the main query with rn = 1 in the join condition, so at most one fulfillment row per order is possible.
Please note that in the sample data one order has a warehouse_id that matches no warehouse row, so I had to use a left join, but I expect it would be an inner join in the real db.
SQL Fiddle Demo
CREATE TABLE Warehouse
(ID varchar(36), Name varchar(9), Location varchar(5))
;
INSERT INTO Warehouse
("id", "name", "location")
VALUES
('9bcae08e-ad36-4d97-b9ec-4857714e902a', 'big', 'MLB'),
('b442e783-4725-41e9-af83-f75004ee1b38', 'bigger', 'MLB'),
('986d5aa9-0523-42d8-b183-dfd546d3e682', 'biggest', 'MLB')
;
CREATE TABLE Warehouse_order
(ID varchar(36), Warehouse_id varchar(36), type varchar(6), quantity int)
;
INSERT INTO Warehouse_order
("id", "warehouse_id", "type", "quantity")
VALUES
('9cb99fd9-9e5e-4240-8162-d28747be01cd', 'b442e783-4725-41e9-af83-f75004ee1b38', 'BN_100', 100),
('eceb0b5a-5afa-40e4-ac62-efb686e3bdae', '9bcae08e-ad36-4d97-b9ec-4857714e902a', 'BN_200', 400),
('13370467-cf0c-47f2-8fea-a215500607e6', '986d5aa9-0523-42d8-b183-dfd546d3e68', 'BN_300', 10)
;
CREATE TABLE Warehouse_fulfillment
(ID varchar(36), Order_id varchar(36), Status varchar(10))
;
INSERT INTO Warehouse_fulfillment
("id", "order_id", "status")
VALUES
('8a69edde-2346-48b8-96d0-6c4e25527f38', '9cb99fd9-9e5e-4240-8162-d28747be01cd', 'FAILLED'),
('a2006a64-9bdc-4bfa-ba14-a44769aeb4a2', '9cb99fd9-9e5e-4240-8162-d28747be01cd', 'REJECTED'),
('bf0aa1fc-6dfc-4fd0-ba20-be101b1985d1', '9cb99fd9-9e5e-4240-8162-d28747be01cd', 'FAILED'),
('48c7d747-2f9b-4535-8f27-210a43cf5c30', '9cb99fd9-9e5e-4240-8162-d28747be01cd', 'SUCCESS'),
('7f8e18c9-4322-428a-9370-9ecd1c5ef286', '13370467-cf0c-47f2-8fea-a215500607e6', 'FAILED')
;
Query 1:
select
o.*, w.name, s.status, s.rn
from Warehouse_order o
left join Warehouse w on o.Warehouse_id = w.id
left join (
select id, order_id, status
, row_number() over(partition by order_id
order by case when status = 'SUCCESS' then 1
when status = 'FAILED' then 2
when status = 'REJECTED' then 3
else 4 end) as rn
from Warehouse_fulfillment
) s on o.id = s.Order_id and rn=1
Results:
| id | warehouse_id | type | quantity | name | status | rn |
|--------------------------------------|--------------------------------------|--------|----------|--------|---------|--------|
| eceb0b5a-5afa-40e4-ac62-efb686e3bdae | 9bcae08e-ad36-4d97-b9ec-4857714e902a | BN_200 | 400 | big | (null) | (null) |
| 9cb99fd9-9e5e-4240-8162-d28747be01cd | b442e783-4725-41e9-af83-f75004ee1b38 | BN_100 | 100 | bigger | SUCCESS | 1 |
| 13370467-cf0c-47f2-8fea-a215500607e6 | 986d5aa9-0523-42d8-b183-dfd546d3e68 | BN_300 | 10 | (null) | FAILED | 1 |
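The prioritised row_number() join can be tried end to end with a small sketch. It runs the same shape of query through Python's built-in sqlite3 module rather than the fiddle's database, and the short ids (o1, f1, ...) are made up for illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_order (id TEXT, type TEXT);
CREATE TABLE warehouse_fulfillment (id TEXT, order_id TEXT, status TEXT);
INSERT INTO warehouse_order VALUES ('o1', 'BN_200'), ('o2', 'BN_100'), ('o3', 'BN_300');
INSERT INTO warehouse_fulfillment VALUES
    ('f1', 'o2', 'FAILED'), ('f2', 'o2', 'REJECTED'), ('f3', 'o2', 'SUCCESS'),
    ('f4', 'o3', 'REJECTED'), ('f5', 'o3', 'FAILED');
""")

# Rank each order's fulfillment rows by status priority and keep only rank 1.
ranked_rows = conn.execute("""
SELECT o.id, s.id, s.status
FROM warehouse_order o
LEFT JOIN (
    SELECT id, order_id, status,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY CASE status WHEN 'SUCCESS'  THEN 1
                                    WHEN 'FAILED'   THEN 2
                                    WHEN 'REJECTED' THEN 3
                                    ELSE 4 END) AS rn
    FROM warehouse_fulfillment
) s ON o.id = s.order_id AND s.rn = 1
ORDER BY o.id
""").fetchall()

for row in ranked_rows:
    print(row)
```

Each order comes back exactly once: o1 with no fulfillment, o2 with its SUCCESS row, and o3 with its FAILED row, because FAILED outranks REJECTED in the case expression.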

I'm not entirely sure on this, but it sounds like you just want a left join on the warehouse_fulfillment table with the 'SUCCESS' condition in the join:
select
w.id, w.name, w.location,
o.id as order_id, o.type as order_type,
o.quantity as order_quantity,
f.id as fulfillment_id
from
warehouse w
join warehouse_order o on
w.id = o.warehouse_id
left join warehouse_fulfillment f on
o.id = f.order_id and
f.status = 'SUCCESS'
Since you don't seem to care about non-successful records, and there is a guarantee that there will be at most one 'SUCCESS' fulfillment record per order, this should avoid any duplicates.
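Why the status test belongs in the join condition rather than in a WHERE clause can be seen in a short sketch. It uses Python's sqlite3 module with made-up short ids (w1, o1, f1, ...) standing in for the UUIDs; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (id TEXT, name TEXT, location TEXT);
CREATE TABLE warehouse_order (id TEXT, warehouse_id TEXT, type TEXT, quantity INTEGER);
CREATE TABLE warehouse_fulfillment (id TEXT, order_id TEXT, status TEXT);
INSERT INTO warehouse VALUES ('w1', 'big', 'MLB'), ('w2', 'bigger', 'MLB');
INSERT INTO warehouse_order VALUES ('o1', 'w1', 'BN_200', 400), ('o2', 'w2', 'BN_100', 100);
INSERT INTO warehouse_fulfillment VALUES
    ('f1', 'o2', 'FAILED'), ('f2', 'o2', 'REJECTED'), ('f3', 'o2', 'SUCCESS');
""")

base = """
SELECT w.name, o.id, f.id
FROM warehouse w
JOIN warehouse_order o ON w.id = o.warehouse_id
LEFT JOIN warehouse_fulfillment f ON o.id = f.order_id {extra}
ORDER BY w.name
"""

# Condition inside ON: unmatched orders survive with a NULL fulfillment id.
on_rows = conn.execute(base.format(extra="AND f.status = 'SUCCESS'")).fetchall()

# Condition in WHERE: NULL-extended rows fail the test, so order o1 disappears.
where_rows = conn.execute(base.format(extra="WHERE f.status = 'SUCCESS'")).fetchall()

print(on_rows)
print(where_rows)
```

With the filter in ON, the order without a successful fulfillment is kept and shows NULL; moving the same filter to WHERE silently turns the left join into an inner join.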

Related

SQL left join with latest record

I want to left join a table with the latest record only.
I have Customer1 table:
+--------+----------+
| CustID | CustName |
+--------+----------+
| 1 | ABC123 |
| 2 | 456XYZ |
| 3 | 5PQR3 |
| 4 | 789XYZ |
| 5 | 789A |
+--------+----------+
SalesInvoice table:
+------------+--------+-----------+
| InvDate | CustID | InvNumber |
+------------+--------+-----------+
| 2020-03-01 | 1 | IV236 |
| 2020-04-07 | 1 | IV644 |
| 2020-06-13 | 2 | IV869 |
| 2020-03-29 | 3 | IV436 |
| 2020-02-06 | 3 | IV126 |
+------------+--------+-----------+
And I want this required output:
+--------+------------+-----------+
| CustID | InvDate | InvNumber |
+--------+------------+-----------+
| 1 | 2020-04-07 | IV644 |
| 2 | 2020-06-13 | IV869 |
| 3 | 2020-03-29 | IV436 |
| 4 | | |
| 5 | | |
+--------+------------+-----------+
For convenience, here is the sample setup code:
drop table if exists #Customer1
create table #Customer1(CustID int, CustName varchar (100))
insert into #Customer1 values
(1,'ABC123'),
(2,'456XYZ'),
(3,'5PQR3'),
(4,'789XYZ'),
(5,'789A')
drop table if exists #SalesInvoice
create table #SalesInvoice(InvDate DATE, CustID INT, InvNumber varchar (100))
insert into #SalesInvoice values
('2020-03-01',1,'IV236'),
('2020-04-07',1,'IV644'),
('2020-06-13',2,'IV869'),
('2020-03-29',3,'IV436'),
('2020-02-06',3,'IV126')
I like using TOP 1 WITH TIES in this case:
SELECT TOP 1 WITH TIES c.CustID, i.InvDate, i.InvNumber
FROM #Customer1 c
LEFT JOIN #SalesInvoice i ON c.CustID = i.CustID
ORDER BY ROW_NUMBER() OVER (PARTITION BY c.CustID ORDER BY i.InvDate DESC);
Demo
The top 1 trick here is to order by row number, assigning a sequence to each customer, with the sequence descending by invoice date. Then, this approach retains just the most recent invoice record for each customer.
I recommend outer apply:
select c.*, i.*
from #Customer1 c outer apply
     (select top (1) i.*
      from #SalesInvoice i
      where i.CustID = c.CustID
      order by i.InvDate desc
     ) i;
outer apply implements a special type of join called a "lateral join". This is a very powerful construct. But when learning about them, you can think of a lateral join as a correlated subquery that can return more than one column and more than one row.
You can try the ROW_NUMBER window function instead of a lateral join, with this simple, self-explanatory T-SQL:
SELECT c.CustID
     , d.InvDate
     , d.InvNumber
FROM #Customer1 c
LEFT JOIN (
    SELECT *
         , ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY InvDate DESC) AS RowNo
    FROM #SalesInvoice
) d
ON c.CustID = d.CustID
AND d.RowNo = 1
Basically, ROW_NUMBER is used to pick the "last" invoice in one table scan, instead of performing SELECT TOP 1 ... ORDER BY in a correlated query, which has to be executed once per customer.
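The ROW_NUMBER variant can be checked against the question's sample data with a small sketch. It uses Python's sqlite3 module instead of SQL Server, so the temp tables become ordinary tables named customer and sales_invoice (names chosen here for illustration), and dates are stored as ISO text so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustID INTEGER, CustName TEXT);
CREATE TABLE sales_invoice (InvDate TEXT, CustID INTEGER, InvNumber TEXT);
INSERT INTO customer VALUES
    (1, 'ABC123'), (2, '456XYZ'), (3, '5PQR3'), (4, '789XYZ'), (5, '789A');
INSERT INTO sales_invoice VALUES
    ('2020-03-01', 1, 'IV236'), ('2020-04-07', 1, 'IV644'),
    ('2020-06-13', 2, 'IV869'),
    ('2020-03-29', 3, 'IV436'), ('2020-02-06', 3, 'IV126');
""")

# Number each customer's invoices newest-first, then join only row 1.
latest_rows = conn.execute("""
SELECT c.CustID, d.InvDate, d.InvNumber
FROM customer c
LEFT JOIN (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY InvDate DESC) AS RowNo
    FROM sales_invoice
) d ON c.CustID = d.CustID AND d.RowNo = 1
ORDER BY c.CustID
""").fetchall()

for row in latest_rows:
    print(row)
```

Customers 4 and 5 have no invoices, so they appear with NULL date and number, matching the required output in the question.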

Return rows from a table and add field for that row if the ID has a relationship with another table

DBMS used: Amazon Aurora
I have a table that I store a list of all my products, let's call it products
+----+--------------+
| id | product_name |
+----+--------------+
| 1 | Product 1 |
+----+--------------+
| 2 | Product 2 |
+----+--------------+
| | |
+----+--------------+
Another table called redeemed_products stores the ID of the product that the user has redeemed.
+----+---------+------------+
| id | user_id | product_id |
+----+---------+------------+
| 1 | 1 | 1 |
+----+---------+------------+
| | | |
+----+---------+------------+
| | | |
+----+---------+------------+
I would like to retrieve all rows of products and add an extra field to the row which has a relation in the redeemed_products
+----+--------------+----------+
| id | product_name | redeemed |
+----+--------------+----------+
| 1 | Product 1 | true |
+----+--------------+----------+
| 2 | Product 2 | |
+----+--------------+----------+
| | | |
+----+--------------+----------+
The purpose of this is to retrieve the list of products and it will show which of the product has already been redeemed by the user. I do not know how I should approach this problem.
Use an outer join:
select p.id, p.product_name, rp.product_id is not null as redeemed
from products p
left join redeemed_products rp on rp.product_id = p.id;
Note that this will repeat rows from the products table if the product_id occurs more than once in the redeemed_products table (e.g. the same product_id for multiple user_ids).
If that is the case you could use a scalar sub-select:
select p.id, p.product_name,
       exists (select *
               from redeemed_products rp
               where rp.product_id = p.id) as redeemed
from products p;
You haven't tagged your DBMS. The above is standard ANSI SQL, but not all DBMS products actually support boolean expressions like that in the SELECT list.
One option would be using a conditional within a LEFT JOIN query :
SELECT p.*, CASE WHEN r.product_id IS NOT NULL THEN 'true' END AS redeemed
FROM products p
LEFT JOIN redeemed_products r
ON r.product_id = p.id
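The difference between the plain left join and the EXISTS form shows up as soon as one product is redeemed by two users. The sketch below uses Python's sqlite3 module as a stand-in (it is not Aurora), with a second redemption row added as an assumption to trigger the duplication. SQLite reports booleans as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER, product_name TEXT);
CREATE TABLE redeemed_products (id INTEGER, user_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'Product 1'), (2, 'Product 2');
-- Product 1 redeemed by two different users (second row is an added assumption).
INSERT INTO redeemed_products VALUES (1, 1, 1), (2, 2, 1);
""")

# Left join: one output row per matching redemption, so Product 1 repeats.
join_rows = conn.execute("""
SELECT p.id, p.product_name, rp.product_id IS NOT NULL AS redeemed
FROM products p
LEFT JOIN redeemed_products rp ON rp.product_id = p.id
ORDER BY p.id
""").fetchall()

# EXISTS: exactly one row per product, whatever the number of redemptions.
exists_rows = conn.execute("""
SELECT p.id, p.product_name,
       EXISTS (SELECT 1 FROM redeemed_products rp WHERE rp.product_id = p.id) AS redeemed
FROM products p
ORDER BY p.id
""").fetchall()

print(join_rows)
print(exists_rows)
```

To flag redemptions for one particular user only, the subquery would also need a filter such as rp.user_id = ?, which the question implies but the answers above do not include.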

SQL Select foo if all match condition, return foo

Long buildup prob simple answer...
I know this is going to require a subquery of some kind...
But I am joining 3 tables and trying to get an output...
table one 'Status'
Contains many pk_tickNum
id | pk_tickNum | Status | time
/*table two 'Order'
Only One Order*/
id | pk_order_num | tickNum | taker
/*table three 'Transaction'
Many Transactions, Many Item_num, One location p/item*/
id | pk_transaction | tickNum | item_num | Location
I have a statement that says...
Select
ticket1.pk_tickNum,ticket1.status,ticket1.time,order.pk_order_num
From
Status ticket1 left join Status ticket2
ON
(ticket1.pk_tickNum = ticket2.pk_tickNum AND ticket1.ID < ticket2.ID)
Inner Join
order
ON
ticket1.pk_tickNum = order.tickNum
WHERE
(ticket2.ID IS NULL)
This will give me the most current status of the order....
Works perfectly!!! However, we have Bins, ie: Locations. and every order has multiple items...
As the item moves through the warehouse, every location is recorded. So for every order, there are multiple items and each item has a location to include the 'shipped' location which marks the end.
If I run the above query to left join the third Transaction table I get as many entries as there are item_num on a single transaction. I don't need that!
All I am looking for is a single output for the current status of a ticket if ALL items on a ticket are NOT in location='shipped'
Edit -
Content
Status
id | pk_tickNum | Status |
1 | 123456 | Green |
2 | 123457 | Blue |
3 | 123456 | Yellow |
4 | 123456 | Red |
5 | 123457 | Green |
Order
id | pk_order_num | tickNum |
1 | 987654 | 123456
2 | 987656 | 123457
Transaction
id | pk_transaction | tickNum | item_num | Location
1 | 5555555555 | 123456 | Some | Floor
2 | 5555555556 | 123456 | Thing | Floor
3 | 5555555557 | 123456 | Smart | Shipped
4 | 5555555558 | 123456 | or | Shipped
5 | 5555555559 | 123457 | Really | Shipped
6 | 5555555560 | 123457 | Noth | Shipped
7 | 5555555561 | 123457 | ing | Shipped
Output -
pk_order_num | pk_tickNum | Status |
987654 | 123456 | Red |
/*987656 | 123457 | Green |*/ This should not show!
Answer: posted by @Used_By_Already, with sample code available at SQLFiddle.
Thank you!
I really do hope you don't have tables called "order" and "transaction"; if you do, make sure they are enclosed in [] or "". For my sanity I put an "s" on the end of those names.
To achieve this result (available at SQLFiddle):
| pk_order_num | tickNum | Status |
|--------------|---------|--------|
| 987654 | 123456 | Red |
I have assumed that the "most recent" row in the status table is determined by the reverse order of the ID column (this isn't a great way to do it, but that is the only column available to work with). A better basis would be a "last updated" datetime value; perhaps that is the column [time] in that table, but no data was supplied for it.
SELECT
o.pk_order_num
, o.tickNum
, s.Status
FROM [orders] o
INNER JOIN (
select pk_tickNum, Status
, row_number() over(partition by pk_tickNum
order by id desc) rn
from status
) s ON o.ticknum = s.pk_tickNum and s.rn = 1
INNER JOIN (
SELECT
ticknum
FROM [transactions]
GROUP BY ticknum
HAVING COUNT(*) <> SUM(CASE WHEN Location = 'shipped' THEN 1 ELSE 0 END)
) t ON s.pk_tickNum = t.ticknum
;
Also note that the final subquery using the having clause determines if all details in the transactions have been shipped or not. Only orders with unshipped transactions will be returned by that subquery.
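The HAVING test can be run in isolation on the question's Transaction data. This is a minimal sketch using Python's sqlite3 module; the table is named transactions as in the answer, and the literal is written 'Shipped' because SQLite compares strings case-sensitively by default, unlike a typical SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER, pk_transaction INTEGER, tickNum INTEGER,
                           item_num TEXT, Location TEXT);
INSERT INTO transactions VALUES
    (1, 5555555555, 123456, 'Some',   'Floor'),
    (2, 5555555556, 123456, 'Thing',  'Floor'),
    (3, 5555555557, 123456, 'Smart',  'Shipped'),
    (4, 5555555558, 123456, 'or',     'Shipped'),
    (5, 5555555559, 123457, 'Really', 'Shipped'),
    (6, 5555555560, 123457, 'Noth',   'Shipped'),
    (7, 5555555561, 123457, 'ing',    'Shipped');
""")

# A ticket is still open when its total item count differs from its shipped count.
open_tickets = conn.execute("""
SELECT tickNum
FROM transactions
GROUP BY tickNum
HAVING COUNT(*) <> SUM(CASE WHEN Location = 'Shipped' THEN 1 ELSE 0 END)
ORDER BY tickNum
""").fetchall()

print(open_tickets)
```

Ticket 123456 has four items of which two are shipped, so it is returned; ticket 123457 has three of three shipped and is filtered out.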
Select
s.pk_tickNum, s.status, s.time, o.pk_order_num
From Status s
-- actually this join already multiplies rows: ticket 123456 has more than one record in Status table in your sample data
Inner Join [order] o ON s.pk_tickNum = o.tickNum
WHERE NOT EXISTS
(
-- why is it named `pk_tickNum` if this is not a PK?
SELECT 1 FROM Status ticket2
WHERE s.pk_tickNum = ticket2.pk_tickNum AND s.ID < ticket2.ID
)
AND NOT EXISTS
(
-- might catch "empty orders" if any
SELECT 1 FROM [Transaction] t
WHERE t.tickNum = s.pk_tickNum
and t.Location = 'shipped'
)
Note that the output from your sample data would be empty, because ticket 123456 has two items with location 'shipped', which violates the condition you described.

Get nth level on self-referencing table

I have a self-referencing table which has at max 5 levels
groupid | parentid | detail
--------- | --------- | ---------
Group A | Highest | nope
Group B | Group A | i need this
Highest | NULL | nope
Group C | Group B | nope
Group D | Group C | nope
I have a transaction table which looks up the groupid in the table above to retrieve the detail value where groupid = Group B. The groupid values in the transaction table only range from Group B to Group D and never go any higher.
txnid | groupid | desired groupid | desired detail
--------- | --------- | --------- | ---------
1 | Group D | Group B | i need this
2 | Group B | Group B | i need this
3 | Group C | Group B | i need this
4 | Group B | Group B | i need this
What should my T-SQL script look like to produce the desired columns? I can left join to the self-referencing table multiple times to reach Group B, but the number of joins needed is not consistent.
Greatly appreciate any thoughts!
It is still not clear to me how you know which one is Group B; I suppose it's the record whose parent's parent is null.
create table org(groupid char(1), parentid char(1), details varchar(20));
insert into org values
('a', null, 'nope'),('b', 'a', 'I need this'),('c', 'b', 'nope'),('d', 'c', 'nope'),('e', 'd', 'nope');
create table trans(id int, groupid char(1));
insert into trans values
(1, 'b'),(2, 'c'),(3, 'c'),(4, 'd'),(5, 'e');
GO
10 rows affected
with all_levels as
(
select ob.groupid groupid_b, oc.groupid groupid_c,
od.groupid groupid_d, oe.groupid groupid_e,
ob.details
from org ob
inner join org oc
on oc.parentid = ob.groupid
inner join org od
on od.parentid = oc.groupid
inner join org oe
on oe.parentid = od.groupid
where ob.parentid is not null
) select * from all_levels;
GO
groupid_b | groupid_c | groupid_d | groupid_e | details
:-------- | :-------- | :-------- | :-------- | :----------
b | c | d | e | I need this
--= build a 4 levels row
with all_levels as
(
select ob.groupid groupid_b, oc.groupid groupid_c,
od.groupid groupid_d, oe.groupid groupid_e,
ob.details
from org ob
inner join org oc
on oc.parentid = ob.groupid
inner join org od
on od.parentid = oc.groupid
inner join org oe
on oe.parentid = od.groupid
where ob.parentid is not null
)
--= no matter what groupid returns b group details
, only_b as
(
select groupid_b as groupid, groupid_b, details from all_levels
union all
select groupid_c as groupid, groupid_b, details from all_levels
union all
select groupid_d as groupid, groupid_b, details from all_levels
union all
select groupid_e as groupid, groupid_b, details from all_levels
)
--= join with transactions table
select id, t.groupid, groupid_b, ob.details
from trans t
inner join only_b ob
on ob.groupid = t.groupid;
GO
id | groupid | groupid_b | details
-: | :------ | :-------- | :----------
1 | b | b | I need this
2 | c | b | I need this
3 | c | b | I need this
4 | d | b | I need this
5 | e | b | I need this
dbfiddle here
You can also use a scalar function that walks up the hierarchy, but I don't believe it will be any better in terms of performance.
create function findDetails(@groupid char(1))
returns varchar(100)
as
begin
    declare @parentid char(1) = '1';
    declare @next_parentid char(1) = '1';
    declare @details varchar(100) = '';
    while @next_parentid is not null
    begin
        select @details = org.details, @parentid = org.parentid, @next_parentid = op.parentid
        from org
        inner join org op
        on op.groupid = org.parentid
        where org.groupid = @groupid
        set @groupid = @parentid;
    end
    return @details;
end
GO
select id, groupid, dbo.findDetails(groupid) as details_b
from trans;
GO
id | groupid | details_b
-: | :------ | :----------
1 | b | I need this
2 | c | I need this
3 | c | I need this
4 | d | I need this
5 | e | I need this
dbfiddle here
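A different technique from the two shown above, offered only as a sketch, is a recursive CTE that walks from each group up to the root, so the number of levels does not have to be hard-coded. It assumes, like the answer, that "Group B" is the row whose parent is the root (the row with a NULL parentid). The example runs on Python's sqlite3 module with the answer's org and trans data; in T-SQL the CTE is written with plain WITH, without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org (groupid TEXT, parentid TEXT, details TEXT);
INSERT INTO org VALUES
    ('a', NULL, 'nope'), ('b', 'a', 'I need this'),
    ('c', 'b', 'nope'), ('d', 'c', 'nope'), ('e', 'd', 'nope');
CREATE TABLE trans (id INTEGER, groupid TEXT);
INSERT INTO trans VALUES (1, 'b'), (2, 'c'), (3, 'c'), (4, 'd'), (5, 'e');
""")

level_two_rows = conn.execute("""
WITH RECURSIVE chain (start_id, groupid, parentid, details) AS (
    -- anchor: every group starts a chain at itself
    SELECT groupid, groupid, parentid, details FROM org
    UNION ALL
    -- step: move one level up to the parent
    SELECT c.start_id, o.groupid, o.parentid, o.details
    FROM chain c
    JOIN org o ON o.groupid = c.parentid
)
SELECT t.id, t.groupid, c.groupid, c.details
FROM trans t
JOIN chain c ON c.start_id = t.groupid
-- keep only the ancestor whose own parent is the root
JOIN org root ON root.groupid = c.parentid AND root.parentid IS NULL
ORDER BY t.id
""").fetchall()

for row in level_two_rows:
    print(row)
```

Every transaction resolves to group b and its details, regardless of how deep its own group sits in the hierarchy.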

Doing a market basket analysis on the order details

I have a table that looks (abbreviated) like:
| order_id | item_id | amount | qty | date |
|---------- |--------- |-------- |----- |------------ |
| 1 | 1 | 10 | 1 | 10-10-2014 |
| 1 | 2 | 20 | 2 | 10-10-2014 |
| 2 | 1 | 10 | 1 | 10-12-2014 |
| 2 | 2 | 20 | 1 | 10-12-2014 |
| 2 | 3 | 45 | 1 | 10-12-2014 |
| 3 | 1 | 10 | 1 | 9-9-2014 |
| 3 | 3 | 45 | 1 | 9-9-2014 |
| 4 | 2 | 20 | 1 | 11-11-2014 |
I would like to run a query that would calculate the list of items
that most frequently occur together.
In this case the result would be:
|items|frequency|
|-----|---------|
|1,2 |2 |
|1,3 |1 |
|2,3 |1 |
|2 |1 |
Ideally, it would first present orders with more than one item, then present
the most frequently ordered single items.
Could anyone please provide an example for how to structure this SQL?
This query generates all of the requested output for the cases where 2 items occur together. It doesn't include the last row of the requested output, since a single value (2) technically doesn't occur together with anything, although you could easily add a UNION query to include items that are ordered alone. Note that it pairs items by date rather than by order_id, which works here only because each order in the sample data has its own date.
This is written for PostgreSQL 9.3:
create table orders(
order_id int,
item_id int,
amount int,
qty int,
date timestamp
);
INSERT INTO ORDERS VALUES(1,1,10,1,'10-10-2014');
INSERT INTO ORDERS VALUES(1,2,20,1,'10-10-2014');
INSERT INTO ORDERS VALUES(2,1,10,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,2,20,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,3,45,1,'10-12-2014');
INSERT INTO ORDERS VALUES(3,1,10,1,'9-9-2014');
INSERT INTO ORDERS VALUES(3,3,45,1,'9-9-2014');
INSERT INTO ORDERS VALUES(4,2,10,1,'11-11-2014');
with order_pairs as (
select (pg1.item_id, pg2.item_id) as items, pg1.date
from
(select distinct item_id, date
from orders) as pg1
join
(select distinct item_id, date
from orders) as pg2
ON
(
pg1.date = pg2.date AND
pg1.item_id != pg2.item_id AND
pg1.item_id < pg2.item_id
)
)
SELECT items, count(*) as frequency
FROM order_pairs
GROUP by items
ORDER by items;
output
items | frequency
-------+-----------
(1,2) | 2
(1,3) | 2
(2,3) | 1
(3 rows)
Market Basket Analysis with Join.
Join the table to itself on order_id and keep only pairs where x.item_id < y.item_id, so each pair of items is produced once per order. Then group by the pair and count the number of rows for each combination.
select items,count(*) as 'Freq' from
(select concat(x.item_id,',',y.item_id) as items from orders x
JOIN orders y ON x.order_id = y.order_id and
x.item_id != y.item_id and x.item_id < y.item_id) A
group by A.items order by A.items;
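The self-join on order_id can be verified against the question's sample rows with a short sketch. It uses Python's sqlite3 module, keeps the pair as two columns instead of a concatenated string, and drops the amount, qty and date columns, which play no part in the pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, item_id INTEGER);
INSERT INTO orders VALUES
    (1, 1), (1, 2),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 3),
    (4, 2);
""")

# x.item_id < y.item_id yields each unordered pair once and excludes self-pairs.
pair_counts = conn.execute("""
SELECT x.item_id, y.item_id, COUNT(*) AS freq
FROM orders x
JOIN orders y ON x.order_id = y.order_id AND x.item_id < y.item_id
GROUP BY x.item_id, y.item_id
ORDER BY x.item_id, y.item_id
""").fetchall()

for pair in pair_counts:
    print(pair)
```

Items 1 and 3 appear together in orders 2 and 3, so that pair has frequency 2, in agreement with the PostgreSQL output above. Order 4 contains a single item and contributes no pair.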