How to join tables together based on subselects? - sql

I'm stuck trying to figure out how to write a query. Basically, I have three tables (Orders, Products, Orders_Products) that I want to join together and then filter.
Orders table:
ORDER_ID CUSTOMER_ID
1 1
2 2
Products table:
PRODUCT_ID PRODUCT_NAME PRODUCT_TITLE
1 'P1' 'T1'
2 'P1' 'T2'
3 'P2' 'T3'
4 'P2' 'T4'
5 'P2' 'T5'
6 'P3' 'T6'
Orders_Products table:
ORDER_ID PRODUCT_ID
1 1
1 3
2 1
2 3
2 6
For example, I want to get all orders that consist (exactly) of the products P1/T1 and P2/T3. I tried something like this, but it doesn't work:
SELECT * FROM Orders
LEFT JOIN Orders_Products ON Orders_Products.ORDER_ID = Orders.ORDER_ID
LEFT JOIN Products ON Orders_Products.PRODUCT_ID = Products.PRODUCT_ID
WHERE EXISTS (SELECT * FROM Product WHERE PRODUCT_NAME = 'P1' AND PRODUCT_TITLE = 'T1')
AND EXISTS (SELECT * FROM Product WHERE PRODUCT_NAME = 'P2' AND PRODUCT_TITLE = 'T3');
EDIT: To clarify what I really have to achieve: the user should be able to search for orders matching the given products. The user enters one or more product name / product title combinations and gets all the orders that have exactly these products associated. All I get (from a web application) are the name/title combinations, and I have to use those in a query to get the ORDER_ID.

SELECT ORDER_ID, COUNT(*) AS ProductsCount
FROM Orders_Products
WHERE (PRODUCT_ID = 1 OR PRODUCT_ID = 3)
GROUP BY ORDER_ID
HAVING COUNT(*) = 2
EDIT: Please ignore the above statement; it also matches orders that contain additional products. See if the following works.
SELECT ORDER_ID,
SUM(CASE PRODUCT_ID WHEN 1 THEN 1 WHEN 3 THEN 1 ELSE 3 END)
AS ProductsCount
FROM Orders_Products
GROUP BY ORDER_ID
HAVING SUM(CASE PRODUCT_ID WHEN 1 THEN 1 WHEN 3 THEN 1 ELSE 3 END) = 2
I guess this should get you the orders that have only these 2 products.

You probably cannot write a simple query in MySQL to achieve this. However, ANSI SQL supports the table value constructor, which simplifies this type of query.
This basic query returns the full list of orders (5 rows):
SELECT * FROM Products
JOIN Orders_Products ON Orders_Products.PRODUCT_ID = Products.PRODUCT_ID
JOIN Orders ON Orders_Products.ORDER_ID = Orders.ORDER_ID
This query adds a table value constructor; rows whose product is not in the required list get NULL in P_NAME and P_TITLE:
SELECT * FROM Products
JOIN Orders_Products ON Orders_Products.PRODUCT_ID = Products.PRODUCT_ID
JOIN Orders ON Orders_Products.ORDER_ID = Orders.ORDER_ID
LEFT JOIN (VALUES('P1', 'T1'), ('P2', 'T3')) V(P_NAME, P_TITLE) ON PRODUCT_NAME = P_NAME AND PRODUCT_TITLE=P_TITLE
This query groups the above and returns the ORDER_ID of each order that contains every product in the required list and no product outside it (eliminating the orders that have rows containing NULL):
SELECT Orders.ORDER_ID FROM Products
JOIN Orders_Products ON Orders_Products.PRODUCT_ID = Products.PRODUCT_ID
JOIN Orders ON Orders_Products.ORDER_ID = Orders.ORDER_ID
LEFT JOIN (VALUES('P1', 'T1'), ('P2', 'T3')) V(P_NAME, P_TITLE) ON PRODUCT_NAME = P_NAME AND PRODUCT_TITLE=P_TITLE
GROUP BY Orders.ORDER_ID HAVING COUNT(*) = 2 AND COUNT(P_NAME) = 2
Among open source databases, HSQLDB is one that supports the table value constructor and other user-friendly features of ANSI SQL:2008.
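For anyone who wants to try the "exactly these products" idea without installing HSQLDB, here is a minimal sketch using Python's built-in sqlite3 module. It is not the answer's exact query: SQLite does not accept the V(P_NAME, P_TITLE) column-alias syntax, so the sketch puts the VALUES list in a CTE instead. The function name orders_matching_exactly is hypothetical, and the sketch assumes the list of wanted pairs is non-empty, has no duplicates, and that each name/title pair identifies a single product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
CREATE TABLE Products (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_NAME TEXT, PRODUCT_TITLE TEXT);
CREATE TABLE Orders_Products (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
INSERT INTO Orders VALUES (1, 1), (2, 2);
INSERT INTO Products VALUES
    (1, 'P1', 'T1'), (2, 'P1', 'T2'), (3, 'P2', 'T3'),
    (4, 'P2', 'T4'), (5, 'P2', 'T5'), (6, 'P3', 'T6');
INSERT INTO Orders_Products VALUES (1, 1), (1, 3), (2, 1), (2, 3), (2, 6);
""")


def orders_matching_exactly(conn, wanted):
    """Return ORDER_IDs whose products are exactly the wanted (name, title) pairs.

    Hypothetical helper; assumes `wanted` is non-empty and free of duplicates.
    """
    placeholders = ", ".join("(?, ?)" for _ in wanted)
    sql = f"""
    WITH wanted(P_NAME, P_TITLE) AS (VALUES {placeholders})
    SELECT op.ORDER_ID
    FROM Orders_Products AS op
    JOIN Products AS p ON p.PRODUCT_ID = op.PRODUCT_ID
    LEFT JOIN wanted AS w
           ON w.P_NAME = p.PRODUCT_NAME AND w.P_TITLE = p.PRODUCT_TITLE
    GROUP BY op.ORDER_ID
    -- no row falls outside the wanted list ...
    HAVING COUNT(*) = COUNT(w.P_NAME)
    -- ... and every wanted pair is present
       AND COUNT(DISTINCT p.PRODUCT_ID) = (SELECT COUNT(*) FROM wanted)
    ORDER BY op.ORDER_ID
    """
    params = [value for pair in wanted for value in pair]
    return [row[0] for row in conn.execute(sql, params)]


print(orders_matching_exactly(conn, [("P1", "T1"), ("P2", "T3")]))
print(orders_matching_exactly(conn, [("P1", "T1"), ("P2", "T3"), ("P3", "T6")]))
```

With the sample data from the question, the first call should select only order 1 and the second only order 2. The name/title pairs are passed as bound parameters, which suits the case where they come from a web form.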

Related

Using MS Access SQL - how would I select the top 5 food items by group (restaurant) based on the number of orders?

Relationship Diagram
What I want to be able to do is return a query that shows the top 5 items/products on the menu for each of the 3 restaurants in the dataset. I've attached an example of the relationship diagram for some more context. The columns I would like to see in the query are:
RestaurantName
ItemName
NumberofOrders (alias column)
This is what I have at the moment but it doesn't work as expected for the top 5.
SELECT RestaurantName, ItemName, COUNT(Orders.OrderNumber) AS NumberofOrders
FROM ((((Restaurants INNER JOIN
Orders ON Restaurants.RestID = Orders.RestID) INNER JOIN
OrderDetails ON Orders.OrderNumber=OrderDetails.OrderNumber) INNER JOIN
Products ON OrderDetails.ItemID = Products.ItemID) INNER JOIN
FoodType ON Products.ProdTypeID = FoodType.ProdType)
WHERE ItemName IN
(SELECT TOP 5 ItemName
FROM Products
WHERE ItemName IS NOT NULL)
GROUP BY RestaurantName, ItemName
ORDER BY COUNT(Orders.OrderNumber) DESC;
This just repeats the same 5 items across all the restaurants. Any point in the right direction would be awesome.
EDIT 1:
Based on a response I got yesterday, I have made some amendments to the code. The query now returns the full list, as though it were ignoring the top 5 in the subquery. I can see that all the items are sorted by Total Orders (I have also changed the formula for this). Any ideas what I am doing wrong here?
SELECT RestaurantName, ItemName, SUM(Quantity)*COUNT(Orders.OrderNumber) AS TotalOrders
FROM ((((Restaurants INNER JOIN
Orders ON Restaurants.RestID = Orders.RestID) INNER JOIN
OrderDetails ON Orders.OrderNumber=OrderDetails.OrderNumber) INNER JOIN
Products ON OrderDetails.ItemID = Products.ItemID) INNER JOIN
FoodType ON Products.ProdTypeID = FoodType.ProdType)
WHERE ItemName IN
(SELECT TOP 5 p2.ItemName
FROM Products AS p2
WHERE p2.ItemName = Products.ItemName
GROUP BY p2.ItemName
ORDER BY COUNT(*) DESC)
GROUP BY RestaurantName, ItemName
ORDER BY RestaurantName, SUM(Quantity) DESC;
Thanks
You want a correlated subquery:
WHERE ItemName IN (SELECT TOP 5 p2.ItemName
FROM Products as p2
WHERE p2.RestaurantName = products.RestaurantName
GROUP BY p2.ItemName
ORDER BY COUNT(*) DESC
)
It seems really odd to me that a table called products would have a column called RestaurantName. But you claim that your query works and it has the same reference.
Your filter in the outer WHERE is only using the ItemName field. For your purpose it should contain both fields.
Like so:
SELECT
RestaurantName,
ItemName,
COUNT(Orders.OrderNumber) AS NumberofOrders
FROM Restaurants
INNER JOIN Orders ON Restaurants.RestID = Orders.RestID
INNER JOIN OrderDetails ON Orders.OrderNumber=OrderDetails.OrderNumber
INNER JOIN Products ON OrderDetails.ItemID = Products.ItemID
INNER JOIN FoodType ON Products.ProdTypeID = FoodType.ProdType
WHERE (RestaurantName, ItemName) IN
(SELECT TOP 5 RestaurantName, ItemName
FROM Products
WHERE RestaurantName IS NOT NULL)
GROUP BY RestaurantName, ItemName
ORDER BY COUNT(Orders.OrderNumber) DESC;
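As a rough illustration of "top N per group" with a correlated subquery, here is a sketch that runs on SQLite through Python's sqlite3 module, not on MS Access. The table and column names follow the question, but the sample rows and the restaurant names are made up, FoodType is left out, and N is 2 instead of 5 to keep the data small. Access has no WITH clause, so there the counts CTE would presumably have to be a saved query; that part is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Restaurants (RestID INTEGER PRIMARY KEY, RestaurantName TEXT);
CREATE TABLE Products (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
CREATE TABLE Orders (OrderNumber INTEGER PRIMARY KEY, RestID INTEGER);
CREATE TABLE OrderDetails (OrderNumber INTEGER, ItemID INTEGER);
INSERT INTO Restaurants VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO Products VALUES (1, 'Pizza'), (2, 'Soup'), (3, 'Salad');
INSERT INTO Orders VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
INSERT INTO OrderDetails VALUES
    (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2),  -- Alpha: Pizza 3, Soup 2, Salad 1
    (4, 3), (5, 3), (5, 2), (6, 3), (6, 2), (6, 1);  -- Beta: Salad 3, Soup 2, Pizza 1
""")

TOP_N_PER_RESTAURANT = """
WITH counts AS (
    SELECT r.RestaurantName, p.ItemName, COUNT(*) AS NumberofOrders
    FROM Restaurants AS r
    JOIN Orders AS o ON o.RestID = r.RestID
    JOIN OrderDetails AS d ON d.OrderNumber = o.OrderNumber
    JOIN Products AS p ON p.ItemID = d.ItemID
    GROUP BY r.RestaurantName, p.ItemName
)
SELECT c.RestaurantName, c.ItemName, c.NumberofOrders
FROM counts AS c
-- keep an item if fewer than N items of the SAME restaurant outsell it
WHERE (SELECT COUNT(*)
       FROM counts AS c2
       WHERE c2.RestaurantName = c.RestaurantName
         AND c2.NumberofOrders > c.NumberofOrders) < ?
ORDER BY c.RestaurantName, c.NumberofOrders DESC
"""

top_items = conn.execute(TOP_N_PER_RESTAURANT, (2,)).fetchall()
for row in top_items:
    print(row)
```

The key point is that the subquery is correlated on the restaurant, so the ranking restarts for each restaurant. Items tied on the count are all kept, so a restaurant can return more than N rows.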

Querying DB where two customers in one order

Let's say I have two tables:
Order
order_id (PK)
ordered_date
CustomerOrders
Customer_order_id (PK)
order_id (FK)
customer_type(char1) ( can be S, T and M)
If customers of one or more different types are involved in an order, the tables will look like this:
Order
order_id 5
order_date '05-06-2020'
CustomerOrders
customer_order_id 1
order_id 5
type 'M'
customer_order_id 2
order_id 5
type 'S'
and so on
How can I write a query that will return all unique order_ids that have a combination of S and M type customers?
This is a simple self-join query:
SELECT DISTINCT M.order_id
FROM CustomerOrders AS M
INNER JOIN CustomerOrders AS S
ON M.order_id = S.order_id
WHERE M.customer_type = 'M'
AND S.customer_type = 'S'
You can use exists:
Select distinct order_id
from CustomerOrders co
where exists (select * from CustomerOrders co1
where co.order_id = co1.order_id and co1.customer_type = 'M') and
exists (select * from CustomerOrders co1
where co.order_id = co1.order_id and co1.customer_type = 'S');
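To check the self-join against sample data, here is a small sketch using Python's sqlite3 module. The rows are invented for illustration (orders 5 to 8), and the grouped variant with COUNT(DISTINCT ...) is an extra alternative that is not taken from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerOrders (
    customer_order_id INTEGER PRIMARY KEY,
    order_id INTEGER,
    customer_type TEXT
);
INSERT INTO CustomerOrders (order_id, customer_type) VALUES
    (5, 'M'), (5, 'S'),            -- has both S and M
    (6, 'S'), (6, 'T'),            -- no M
    (7, 'M'), (7, 'M'),            -- no S
    (8, 'S'), (8, 'M'), (8, 'T');  -- has both, plus T
""")

# Self-join: pair every M row with an S row of the same order.
SELF_JOIN = """
SELECT DISTINCT M.order_id
FROM CustomerOrders AS M
INNER JOIN CustomerOrders AS S ON M.order_id = S.order_id
WHERE M.customer_type = 'M' AND S.customer_type = 'S'
ORDER BY M.order_id
"""

# Alternative: keep only S and M rows, then require both types per order.
GROUPED = """
SELECT order_id
FROM CustomerOrders
WHERE customer_type IN ('S', 'M')
GROUP BY order_id
HAVING COUNT(DISTINCT customer_type) = 2
ORDER BY order_id
"""

self_join_ids = [row[0] for row in conn.execute(SELF_JOIN)]
grouped_ids = [row[0] for row in conn.execute(GROUPED)]
print(self_join_ids, grouped_ids)
```

Both queries should pick orders 5 and 8 here. The grouped form extends more easily if the list of required types grows, since only the IN list and the count change.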

SQL aggregate query with one-to-many relationship with postgres

I'm trying to aggregate a list of product skus with a query that relates through a line_items table. I've abstracted a simple example of my use case:
my expected result would look like this:
id name skus
1 mike bar sku1,sku2,sku3
2 bort baz sku4
given a schema and data like:
products
id sku
1 sku1
2 sku2
3 sku3
4 sku4
line_items
id order_id product_id
1 1 1
2 1 2
3 1 3
4 2 4
addresses
id name
1 'bill foo'
2 'mike bar'
3 'bort baz'
orders
id address_id total
1 2 66
2 3 99
here's a working query, but it's not correct, i'm getting ALL products for each order. my WHERE should be using orders.id
http://sqlfiddle.com/#!15/70cd7/3/0
however, i can't seem to use orders.id? i'm guessing i need to use a JOIN or LEFT JOIN or somehow change the order of things in my query...
http://sqlfiddle.com/#!15/70cd7/4
http://sqlfiddle.com/#!15/70cd7/12
SELECT orders.id,
addresses.name,
array_agg(DISTINCT products.sku )
FROM orders
LEFT JOIN addresses
ON orders.address_id = addresses.id
LEFT JOIN line_items
ON line_items.order_id = orders.id
LEFT JOIN products
ON products.id = line_items.product_id
GROUP BY orders.id,addresses.name
You can use a correlated subquery with a JOIN to get the list of skus for each order
SELECT
o.id,
a.name,
(SELECT array_to_string(array_agg(sku), ',') AS Skus
FROM products p
INNER JOIN line_items li
ON li.product_id = p.id
WHERE li.order_id = o.id
) AS Skus
FROM orders o
INNER JOIN addresses a
ON a.id = o.address_id
One solution could be
SELECT orders.id,
addresses.name,
(SELECT string_agg(sku,',') AS skus
FROM products
WHERE id IN
(SELECT DISTINCT line_items.product_id
FROM line_items
WHERE line_items.order_id = orders.id))
FROM orders
inner join addresses
on orders.address_id = addresses.id
;
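The same aggregation can be tried locally with Python's sqlite3 module. This is only a sketch: it uses SQLite's group_concat as a stand-in for PostgreSQL's string_agg, and it takes the join-and-group route instead of a correlated subquery. The data is the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER, product_id INTEGER);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, address_id INTEGER, total INTEGER);
INSERT INTO products VALUES (1, 'sku1'), (2, 'sku2'), (3, 'sku3'), (4, 'sku4');
INSERT INTO line_items VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 4);
INSERT INTO addresses VALUES (1, 'bill foo'), (2, 'mike bar'), (3, 'bort baz');
INSERT INTO orders VALUES (1, 2, 66), (2, 3, 99);
""")

SKUS_PER_ORDER = """
SELECT o.id, a.name, group_concat(p.sku, ',') AS skus
FROM orders AS o
JOIN addresses AS a ON a.id = o.address_id
LEFT JOIN line_items AS li ON li.order_id = o.id   -- restricts SKUs to this order
LEFT JOIN products AS p ON p.id = li.product_id
GROUP BY o.id, a.name
ORDER BY o.id
"""

rows = conn.execute(SKUS_PER_ORDER).fetchall()
for order_id, name, skus in rows:
    print(order_id, name, skus)
```

Each order should come back with only its own SKUs, because line_items is joined on order_id before products is reached. SQLite does not promise the order of the concatenated values, so sort them in the application if the order matters.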

Segment purchases based on new vs returning

I'm trying to write a query that can select a particular date and count how many of those customers have placed orders previously and how many are new. For simplicity, here is the table layout:
id (auto) | cust_id | purchase_date
-----------------------------------
1 | 1 | 2010-11-15
2 | 2 | 2010-11-15
3 | 3 | 2010-11-14
4 | 1 | 2010-11-13
5 | 3 | 2010-11-12
I was trying to select orders by a date and then join any previous orders on the same user_id from previous dates, then count how many had orders vs. how many didn't. This was my failed attempt:
SELECT SUM(
CASE WHEN id IS NULL
THEN 1
ELSE 0
END ) AS new, SUM(
CASE WHEN id IS NOT NULL
THEN 1
ELSE 0
END ) AS returning
FROM (
SELECT o1 . *
FROM orders AS o
LEFT JOIN orders AS o1 ON ( o1.user_id = o.user_id
AND DATE( o1.created ) = "2010-11-15" )
WHERE DATE( o.created ) < "2010-11-15"
GROUP BY o.user_id
) AS t
Given a reference date (2010-11-15), we are interested in the number of distinct customers who placed an order on that date (A), how many of those have placed an order previously (B), and how many have not (C). Clearly, A = B + C.
Q1: Count of customers who placed an order on the reference date
SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q2: List of customers placing order on reference date
SELECT DISTINCT Cust_ID
FROM Orders
WHERE Purchase_Date = '2010-11-15';
Q3: List of customers who placed an order on reference date who had ordered before
SELECT DISTINCT o1.Cust_ID
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q4: Count of customers who placed an order on the reference date who had ordered before
SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15';
Q5: Combining Q1 and Q4
There are several ways to do the combining. One is to use Q1 and Q4 as (complicated) expressions in the select-list; another is to use them as tables in the FROM clause which don't need a join between them because each is a single-row, single-column table that can be joined in a Cartesian product. Another would be a UNION, where each row is tagged with what it calculates.
SELECT (SELECT COUNT(DISTINCT Cust_ID)
FROM Orders
WHERE Purchase_Date = '2010-11-15') AS Total_Customers,
(SELECT COUNT(DISTINCT o1.Cust_ID)
FROM Orders AS o1
JOIN (SELECT DISTINCT o2.Cust_ID
FROM Orders AS o2
WHERE o2.Purchase_Date = '2010-11-15') AS c1
ON o1.Cust_ID = c1.Cust_ID
WHERE o1.Purchase_Date < '2010-11-15') AS Returning_Customers
FROM Dual;
(I'm blithely assuming MySQL has a DUAL table - similar to Oracle's. If not, it is trivial to create a table with a single column containing a single row of data. Update 2: bashing the MySQL 5.5 Manual shows that 'FROM Dual' is supported but not needed; MySQL is happy without a FROM clause.)
Update 1: added qualifier 'o1.Cust_ID' in key locations to avoid 'ambiguous column name' as indicated in the comment.
How about
SELECT * FROM
(SELECT * FROM
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 1 AS OLD_CUSTOMER, 0 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT > 1)
UNION ALL
(SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 0 AS OLD_CUSTOMER, 1 AS NEW_CUSTOMER
FROM ORDERS
GROUP BY CUST_ID
HAVING ORDER_COUNT = 1)) G
INNER JOIN
(SELECT CUST_ID, ORDER_DATE
FROM ORDERS) O
USING (CUST_ID)
WHERE ORDER_DATE = [date of interest] AND
OLD_CUSTOMER = [0 or 1, depending on what you want] AND
NEW_CUSTOMER = [0 or 1, depending on what you want]
Not sure if that'll do the whole thing, but it might provide a starting point.
Share and enjoy.
select count(distinct o1.cust_id) as repeat_count,
count(distinct o.cust_id)-count(distinct o1.cust_id) as new_count
from orders o
left join (select cust_id
from orders
where purchase_date < "2010-11-15"
group by cust_id) o1
on o.cust_id = o1.cust_id
where o.purchase_date = "2010-11-15"
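Here is a sketch that runs the left-join idea against the sample table from the question, using Python's sqlite3 module in place of MySQL. The function name new_vs_returning is hypothetical, and the sketch assumes purchase_date holds ISO-formatted text (YYYY-MM-DD), so that plain string comparison orders the dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, purchase_date TEXT);
INSERT INTO orders VALUES
    (1, 1, '2010-11-15'),
    (2, 2, '2010-11-15'),
    (3, 3, '2010-11-14'),
    (4, 1, '2010-11-13'),
    (5, 3, '2010-11-12');
""")

SQL = """
SELECT COUNT(DISTINCT o.cust_id)    AS total_count,
       COUNT(DISTINCT prev.cust_id) AS returning_count
FROM orders AS o
LEFT JOIN (SELECT DISTINCT cust_id
           FROM orders
           WHERE purchase_date < :day) AS prev   -- customers seen before the day
       ON prev.cust_id = o.cust_id
WHERE o.purchase_date = :day
"""


def new_vs_returning(conn, day):
    """Hypothetical helper: split the customers of one day into new and returning."""
    total, returning = conn.execute(SQL, {"day": day}).fetchone()
    return {"new": total - returning, "returning": returning}


print(new_vs_returning(conn, "2010-11-15"))
print(new_vs_returning(conn, "2010-11-14"))
```

On 2010-11-15, customers 1 and 2 ordered and only customer 1 had ordered before, so the first call should report one new and one returning customer.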

Optimize SQL query for canceled orders

Here is a subset of my tables:
orders:
- order_id
- customer_id
order_products:
- order_id
- order_product_id (unique key)
- canceled
I want to select all orders (order_id) for a given customer (customer_id) where ALL of the products in the order are canceled, not just some of the products. Is there a more elegant or efficient way of doing it than this:
select order_id from orders
where order_id in (
select orders.order_id from orders
inner join order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234 and order_products.canceled = 1
)
and order_id not in (
select orders.order_id from orders
inner join order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234 and order_products.canceled = 0
)
If all orders have at least one row in order_products, try this:
Select order_id from orders o
Where customer_id = 1234
And Not Exists
(Select * From order_products
Where order_id = o.order_id
And canceled = 0)
If the above assumption is not true, then you also need an Exists check:
Select order_id from orders o
Where customer_id = 1234
And Exists
(Select * From order_products
Where order_id = o.order_id)
And Not Exists
(Select * From order_products
Where order_id = o.order_id
And canceled = 0)
This is likely to be the fastest way, given the index mentioned below:
SELECT order_id
FROM orders o
WHERE customer_id = 1234
AND
(
SELECT canceled
FROM order_products op
WHERE op.order_id = o.order_id
ORDER BY
canceled
LIMIT 1
) = 1
The subquery returns the lowest canceled value for the order, so it returns 1 if and only if the order has some products and they have all been canceled.
If there are no products at all, the subquery returns NULL; if there is at least one uncanceled product, the subquery returns 0.
Make sure you have an index on order_products (order_id, canceled).
Something like this? This assumes that every order has at least one product; otherwise the query will also return orders without any products.
select order_id
from orders o
where not exists (select 1 from order_products op
where canceled = 0
and op.order_id = o.order_id
)
and o.customer_id = 1234
SELECT customer_id, orders.order_id, count(*) AS product_count, sum(canceled) AS canceled_count
FROM orders JOIN order_products
ON orders.order_id = order_products.order_id
WHERE customer_id = <<VALUE>>
GROUP BY customer_id, orders.order_id
HAVING product_count = canceled_count
You can try something like this
select orders.order_id
from #orders orders inner join
#order_products order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234
GROUP BY orders.order_id
HAVING SUM(order_products.canceled) = COUNT(order_products.canceled)
Since we don't know the database platform, here's an ANSI standard approach. Note that this assumes nothing about the schema (i.e. data type of the cancelled field, how the cancelled flag is set (i.e. 'YES',1,etc.)) and uses nothing specific to a given database platform (which would likely be a more efficient approach if you could give us the platform and version you are using):
select op1.order_id
from (
select op.order_id, cast( case when op.cancelled is not null then 1 else 0 end as tinyint) as is_cancelled
from #order_products op
) op1
group by op1.order_id
having count(*) = sum(op1.is_cancelled);
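To compare the approaches on concrete rows, here is a sketch with Python's sqlite3 module. It assumes canceled is stored as 0 or 1 and that customer_id lives in orders, as in the table outline in the question. The sample rows are invented, and the two queries are the GROUP BY/HAVING form and the EXISTS/NOT EXISTS form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_products (
    order_product_id INTEGER PRIMARY KEY,
    order_id INTEGER,
    canceled INTEGER   -- assumed to be 0 or 1
);
INSERT INTO orders VALUES (1, 1234), (2, 1234), (3, 1234), (4, 999);
INSERT INTO order_products (order_id, canceled) VALUES
    (1, 1), (1, 1),   -- order 1: everything canceled
    (2, 1), (2, 0),   -- order 2: partly canceled
    (4, 1);           -- order 4: all canceled, but another customer
-- order 3 has no products at all
""")

GROUPED = """
SELECT o.order_id
FROM orders AS o
JOIN order_products AS op ON op.order_id = o.order_id
WHERE o.customer_id = ?
GROUP BY o.order_id
HAVING COUNT(*) = SUM(op.canceled)   -- every product row is canceled
ORDER BY o.order_id
"""

EXISTS_FORM = """
SELECT o.order_id
FROM orders AS o
WHERE o.customer_id = ?
  AND EXISTS (SELECT 1 FROM order_products AS op
              WHERE op.order_id = o.order_id)
  AND NOT EXISTS (SELECT 1 FROM order_products AS op
                  WHERE op.order_id = o.order_id AND op.canceled = 0)
ORDER BY o.order_id
"""

grouped_ids = [row[0] for row in conn.execute(GROUPED, (1234,))]
exists_ids = [row[0] for row in conn.execute(EXISTS_FORM, (1234,))]
print(grouped_ids, exists_ids)
```

Both forms should return only order 1 for customer 1234: order 2 has an uncanceled product, order 3 has no products, and order 4 belongs to someone else. Which form is faster depends on the database and its indexes, so it is worth checking the query plan on the real data.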