sql select orders with similar items - sql

I need to select pairs of orders that contain exactly the same products. ORDER_ITEMS contains the product and a foreign key referencing the parent ORDER row. The two orders in a pair must be different rows.
I've managed to list pairs with a count of how many matching products they have, but that's only a similarity count. I need to exclude pairs whose orders contain different products.
Oracle-specific features are fine.
The two tables are:
Order(order_id, customer_id...)
Order_Item(item_id, order_id FK, product_id,...)
I need those order_id-s whose Order_Item children all have matching product_id-s.
Ex. in Orders
{ (ord1, cust1)
(ord2, cust2)}
and in Order_Items
{ (item1, ord1, product_id=3),
(item2, ord1, product_id=6),
(item3, ord2, product_id=3),
(item4, ord2, product_id=6) }
So basically, two people bought exactly the same two things. They are a pair. Those orders whose ordered products don't match exactly are not listed.

You haven't specified the db version, so I'm assuming 11g Release 2 or later (LISTAGG was introduced in 11gR2). Not tested, but I think it will give you the general idea:
SELECT * FROM (
WITH qry AS (
SELECT DISTINCT
order_id
,LISTAGG(product_id,'+')
WITHIN GROUP (ORDER BY product_id)
AS order_signature
FROM order_items
GROUP BY order_id)
SELECT order_id
,order_signature
,COUNT(DISTINCT order_id)
OVER (PARTITION BY order_signature)
count_same
FROM qry
) WHERE count_same > 1;
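Limitation: in 11g, LISTAGG builds a VARCHAR2 of at most 4000 bytes, so the query fails with ORA-01489 (result of string concatenation is too long) if some orders are very big, e.g. 100s or 1000s of product IDs.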
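If the result has to be actual pairs, as the question asks, one option is to extend the asker's matching-count query: compare the number of shared products with each order's own item count. This is a minimal sketch of that idea, not the LISTAGG approach above. It runs on SQLite through Python's sqlite3 module only so that it can be executed anywhere, the sample rows (including orders 3 and 4) are made up, and it assumes a product appears at most once per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (item_id INTEGER PRIMARY KEY,
                          order_id INTEGER, product_id INTEGER);
-- Made-up sample rows: orders 1 and 2 match exactly,
-- order 3 is a subset, order 4 is a superset.
INSERT INTO order_items (order_id, product_id) VALUES
  (1, 3), (1, 6),
  (2, 3), (2, 6),
  (3, 3),
  (4, 3), (4, 6), (4, 9);
""")

PAIRS_SQL = """
WITH counts AS (
  SELECT order_id, COUNT(*) AS n FROM order_items GROUP BY order_id
)
SELECT a.order_id AS order_a, b.order_id AS order_b
FROM order_items a
JOIN order_items b
  ON b.product_id = a.product_id AND a.order_id < b.order_id
JOIN counts ca ON ca.order_id = a.order_id
JOIN counts cb ON cb.order_id = b.order_id
GROUP BY a.order_id, b.order_id, ca.n, cb.n
-- The number of shared products must equal the size of both orders.
HAVING COUNT(*) = ca.n AND COUNT(*) = cb.n
ORDER BY order_a, order_b
"""

pairs = conn.execute(PAIRS_SQL).fetchall()
print(pairs)
```

The a.order_id < b.order_id condition keeps each pair once and prevents an order from pairing with itself.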

I'm not sure what your final data set needs to look like, but selecting from Customers with an EXISTS expression in the WHERE clause to look for order matches will get you there.

Related

What's the use of this WHERE clause

This is an answer to the question: We need a list of customer IDs with the total amount they have ordered. Write a SQL statement to return customer ID (cust_id in the Orders table) and total_ordered using a subquery to return the total of orders for each customer. Sort the results by amount spent from greatest to the least. Hint: you’ve used the SUM() to calculate order totals previously.
SELECT prod_name,
(SELECT Sum(quantity)
FROM OrderItems
WHERE Products.prod_id=OrderItems.prod_id) AS quant_sold
FROM Products
;
So there is this simple code up here, and I know that this WHERE clause compares two columns in two different tables. But since we are calculating the SUM of that quantity, why do we need that WHERE clause exactly? I really couldn't get it. Why prod_id exactly and not any other column? (P.S.: the only column shared between those two tables is prod_id.) I am still a beginner. Thank you!
First you want to know the sum for each product, so adjust the subquery to something like this:
(SELECT prod_id, Sum(quantity) qty
FROM OrderItems
group by prod_id
) AS quant_sold
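Once you know the total for each product, you can link it to Products by using the subquery as a derived table in the FROM clause (a subquery in the SELECT list may return only one column, and its alias cannot be referenced from WHERE):
SELECT p.prod_name, quant_sold.qty
FROM Products p
JOIN (SELECT prod_id, Sum(quantity) qty
FROM OrderItems
group by prod_id
) quant_sold
ON p.prod_id = quant_sold.prod_id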
Run it without the WHERE clause and compare the results. You'll learn a lot that way. Specifically, focus on two different product IDs, making sure they both have order items and quantities.
You have two different tables involved. There are multiple products. You don't want the sum of all orders on each product; which is what you would get without the where clause. So the where clause correlates the two tables ensuring you only SUM the quantity of each order item for each product between the tables. Personally, I'd use a join, sum, and a group by as I find it easier to read and I'm not a fan of sub selects in the select of another query; but that's me.
SELECT prod_name,
(SELECT Sum(quantity)
FROM OrderItems
WHERE Products.prod_id=OrderItems.prod_id) AS quant_sold
FROM Products
Should give the same result (except that products without order items show 0 instead of NULL):
SELECT prod_name, Sum(coalesce(OI.quantity,0)) AS quant_sold
FROM Products P
LEFT JOIN orderItems OI
on P.prod_id=OI.prod_id
GROUP BY Prod_Name
Notes:
The above is untested.
A left join is needed because all products should be listed, and if a product doesn't have an order, the quantity should be zero.
If we use an inner join, such a product would be excluded.
We use coalesce because otherwise you'd have a NULL quantity instead of zero for products without an order item.
As to which is "right": it depends. Each has its own merits; in some cases one will perform better than the other, and in other cases vice versa. See --> Join vs. sub-query
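The notes above can be checked with a small experiment. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than the asker's database, and the product names and rows are invented so that one product has no order items at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (prod_id TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE OrderItems (order_num INTEGER, prod_id TEXT, quantity INTEGER);
-- Invented data: 'Crate' has never been ordered.
INSERT INTO Products VALUES ('A', 'Anvil'), ('C', 'Crate');
INSERT INTO OrderItems VALUES (1, 'A', 1), (2, 'A', 2);
""")

# LEFT JOIN keeps every product; COALESCE turns the missing quantity into 0.
left_join = conn.execute("""
    SELECT P.prod_name, SUM(COALESCE(OI.quantity, 0))
    FROM Products P
    LEFT JOIN OrderItems OI ON P.prod_id = OI.prod_id
    GROUP BY P.prod_name
    ORDER BY P.prod_name
""").fetchall()

# INNER JOIN drops products that have no matching order item.
inner_join = conn.execute("""
    SELECT P.prod_name, SUM(OI.quantity)
    FROM Products P
    JOIN OrderItems OI ON P.prod_id = OI.prod_id
    GROUP BY P.prod_name
    ORDER BY P.prod_name
""").fetchall()

# The correlated subquery also keeps every product, but yields NULL (None).
subquery = conn.execute("""
    SELECT prod_name,
           (SELECT SUM(quantity) FROM OrderItems
            WHERE Products.prod_id = OrderItems.prod_id)
    FROM Products
    ORDER BY prod_name
""").fetchall()

print(left_join, inner_join, subquery)
```

So the join and the subquery agree for products that have order items; they differ only in how a product without any is reported.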
As an example:
Say you have Products A & B
"A" has Order Item Quantities of 1 & 2
"B" has order item Quantities of 10 & 20
If we don't have the WHERE clause, every result record would have qty 33.
If we have the WHERE clause, product "A" would have qty 3 and
product "B" would have qty 30.
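The A and B example can be reproduced directly. Here is a sketch using SQLite via Python's sqlite3 module; the numeric prod_id values are an assumption, while the quantities are the ones from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE OrderItems (prod_id INTEGER, quantity INTEGER);
INSERT INTO Products VALUES (1, 'A'), (2, 'B');
INSERT INTO OrderItems VALUES (1, 1), (1, 2), (2, 10), (2, 20);
""")

# Correlated subquery: the WHERE ties each sum to the current product row.
with_where = conn.execute("""
    SELECT prod_name,
           (SELECT SUM(quantity) FROM OrderItems
            WHERE Products.prod_id = OrderItems.prod_id) AS quant_sold
    FROM Products
    ORDER BY prod_name
""").fetchall()

# No WHERE: the subquery sums the whole OrderItems table for every product.
without_where = conn.execute("""
    SELECT prod_name,
           (SELECT SUM(quantity) FROM OrderItems) AS quant_sold
    FROM Products
    ORDER BY prod_name
""").fetchall()

print(with_where)
print(without_where)
```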

Return all data when grouping on a field

I have the following 2 tables (there are more fields in the real tables):
create table publisher(id serial not null primary key,
name text not null);
create table product(id serial not null primary key,
name text not null,
publisherRef int not null references publisher(id));
Sample data:
insert into publisher (id,name) values (1,'pub1'),(2,'pub2'),(3,'pub3');
insert into product (name,publisherRef) values('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
And I would like the query to return:
name, numProducts
pub2, 3
pub3, 2
pub1, 1
A product is published by a publisher. Now I need a list of name, id of all publishers which have at least one product, ordered by the total number of products each publisher has.
I can get the id of the publishers ordered by number of products with:
select publisherRef AS id, count(*)
from product
group by publisherRef
order by count(*) desc;
But I also need the name of each publisher in the result. I thought I could use a subquery like:
select *
from publisher
where id in (
select publisherRef
from product
group by publisherRef
order by count(*) desc)
But the order of rows in the subquery is lost in the outer SELECT.
Is there any way to do this with a single sql query?
SELECT pub.name, pro.num_products
FROM (
SELECT publisherref AS id, count(*) AS num_products
FROM product
GROUP BY 1
) pro
JOIN publisher pub USING (id)
ORDER BY 2 DESC;
Or (since the title mentions "all data") return all columns of the publisher with pub.*. After products have been aggregated in the subquery, you are free to list anything in the outer SELECT.
This only lists publishers which "have at least one product", and the result is ordered by "the total number of products each publisher has".
It's typically faster to aggregate the "n"-table before joining to the "1"-table. Then use an [INNER] JOIN (not a LEFT JOIN) to exclude publishers without products.
Note that the order of rows in an IN expression (or items in the given list - there are two syntax variants) is insignificant.
The column alias in publisherref AS id is optional; it only allows the simpler USING clause for identical column names in the following join condition.
Aside: avoid CaMeL-case names in Postgres. Use unquoted, legal, lowercase names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?
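To see the aggregate-then-join query at work on the sample data, here is a sketch that runs it on SQLite through Python's sqlite3 module. SQLite stands in for Postgres here, so serial becomes INTEGER PRIMARY KEY, and publisher pub4 is an extra made-up row that has no products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      publisherref INTEGER NOT NULL REFERENCES publisher(id));
-- pub4 is not in the question; it shows that publishers without products drop out.
INSERT INTO publisher (id, name) VALUES (1,'pub1'),(2,'pub2'),(3,'pub3'),(4,'pub4');
INSERT INTO product (name, publisherref)
VALUES ('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
""")

rows = conn.execute("""
    SELECT pub.name, pro.num_products
    FROM (
       SELECT publisherref AS id, count(*) AS num_products
       FROM product
       GROUP BY 1          -- aggregate the "n" table first
       ) pro
    JOIN publisher pub USING (id)   -- inner join: only publishers with products
    ORDER BY 2 DESC
""").fetchall()

for name, num_products in rows:
    print(name, num_products)
```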

GROUP BY clause with logical functions

I'm using Oracle 11g Application Express, and executing these commands within the SQL Plus CLI. This is for a class, and I cannot get past this problem. I don't know how to add the total quantity of the items on the orders - I get confused as I don't know how to take the SUM of the QUANTITY per ORDER (customers have multiple orders).
For each customer having an order, list the customer number, the number of orders
that customer has, the total quantity of items on those orders, and the total price for
the items. Order the output by customer number. (Hint: You must use a GROUP BY
clause in this query).
Tables (we will use):
CUSTOMER: contains customer_num
ORDERS: contains order_num, customer_num
ITEMS: contains order_num, quantity, total_price
My logic: I need to be able to calculate the sum of the quantity per order number per customer. I have sat here for over an hour and cannot figure it out.
So far this is what I can formulate..
SELECT customer_num, count(customer_num)
FROM orders
GROUP BY customer_num;
I don't understand how to GROUP BY very well (yes, I have googled and researched it for a bit, and it just isn't clicking), and I have no clue how to take the SUM of the QUANTITY per ORDER per CUSTOMER.
Not looking for someone to do my work for me, just some guidance - thanks!
select o.customer_num,
count(distinct o.order_num) as num_orders,
sum(i.quantity) as total_qty,
sum(i.total_price) as total_price
from orders o
join items i
on o.order_num = i.order_num
group by o.customer_num
order by o.customer_num
First thing:
you have to join the two tables necessary to solve the problem (orders and items). The related field appears to be order_num
Second thing:
Your group by clause is fine, you want one row per customer. But because of the join to the items table, you will have to count DISTINCT orders (because there may be a one to many relationship between orders and items). Otherwise an order with 2 different associated items would be counted twice.
Next, sum the quantity and total price, you can do this now because you've joined to the needed table.
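The double-counting point is easy to verify on a tiny data set. This is a sketch on SQLite via Python's sqlite3 module instead of Oracle, with invented customers, orders, and integer prices; the extra COUNT(*) column is there only to show what a plain count would report after the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER);
CREATE TABLE items (order_num INTEGER, quantity INTEGER, total_price INTEGER);
-- Invented data: customer 101 has two orders, and order 1 has two item rows.
INSERT INTO orders VALUES (1, 101), (2, 101), (3, 102);
INSERT INTO items VALUES (1, 1, 10), (1, 2, 20), (2, 5, 50), (3, 1, 5);
""")

rows = conn.execute("""
    SELECT o.customer_num,
           COUNT(DISTINCT o.order_num) AS num_orders,   -- one per order
           COUNT(*)                    AS joined_rows,  -- one per item row
           SUM(i.quantity)             AS total_qty,
           SUM(i.total_price)          AS total_price
    FROM orders o
    JOIN items i ON o.order_num = i.order_num
    GROUP BY o.customer_num
    ORDER BY o.customer_num
""").fetchall()

for row in rows:
    print(row)
```

For customer 101 the join produces three rows, so only the DISTINCT count gives the real number of orders.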
This can also be written with the older comma-join syntax, with the join condition in WHERE:
SELECT orders.customer_num, /*customer number*/
COUNT(DISTINCT orders.order_num) AS "num_orders", /*number of orders/c*/
SUM(items.quantity) as "total_qty", /*total quantity/c*/
SUM(items.total_price) as "total_price" /*total price/items*/
/* For each customer having an order, list the customer number,
the number of orders that customer has, the total quantity of items
on those orders, and the total price for the items.
Order the output by customer number.
(Hint: You must use a GROUP BY clause in this query). */
FROM orders, items
WHERE orders.order_num = items.order_num
GROUP BY orders.customer_num
ORDER BY orders.customer_num
;

SQL Optimized query to get suggested products [duplicate]

I will try now to better explain what is the purpose of the following question and query.
Let's assume we are talking about an e-commerce environment and database. We have, among many others, three tables: products, orders and orders_data. The ORDERS table holds all placed orders, and a sub-table, which we will call ORDERS_DATA, stores all products recorded within an order.
Here are the table definitions, omitting fields that are irrelevant to my question:
ORDERS (*id*, date, totale, ...);
ORDERS_DATA (id_order, id_product, ...);
PRODUCTS (id, name, ...);
The primary key in ORDERS is id, while in ORDERS_DATA the key is (id_order, id_product).
What I would like to do with a query is to retrieve, given a PRODUCT ID (while browsing a product page or the cart page), suggested products based on the ORDERS_DATA table. The query should return every id_product that appears in an order containing the given PRODUCT ID.
For the sake of simplicity, here is an example. Let's assume we have these rows in ORDERS_DATA:
R1(1, 1); R2(1, 2); R3(1, 3); R4(2, 2), R5(2, 5); R6(3, 3);
I want suggestions for product id = 2. My query should return ids 1 and 3 (from R1 and R3: they share order id 1, which also contains product 2 via R2) and product 5 thanks to order number 2. Order number 3 will be ignored.
This is the query I wrote, but I'm quite sure it is not the best one in performance and style.
SELECT
A.id_product, COUNT( A.id_product ) AS num
FROM
orders_data A, orders_data B
WHERE
A.id_order = B.id_order
AND B.id_product IN (*IDs-HERE*)
AND A.id_product NOT IN (*IDs-HERE*)
GROUP BY A.id_product
I've already tried the INNER JOIN syntax, but nothing changes in performance.
The query takes 0.0022 sec with just one product id in the IN clause. Performance will decrease exponentially with multiple product ids (for instance on the cart page, with more products in the basket).
Thanks.
Try replacing the JOIN with an EXISTS test:
SELECT
id_product,
COUNT(1) As num
FROM
orders_data As A
WHERE
id_product NOT IN (*IDs HERE*)
And
Exists
(
SELECT 1
FROM orders_data As B
WHERE B.id_order = A.id_order
And B.id_product IN (*IDs HERE*)
)
GROUP BY
id_product
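The EXISTS query can be checked against the rows R1 to R6 from the question. This is a sketch using SQLite via Python's sqlite3 module; the suggest helper and its placeholder handling are my own illustration (not part of the answer), and it assumes a non-empty list of product ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders_data (
        id_order INTEGER,
        id_product INTEGER,
        PRIMARY KEY (id_order, id_product)
    )
""")
# Rows R1..R6 from the question.
conn.executemany(
    "INSERT INTO orders_data VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 5), (3, 3)],
)


def suggest(conn, product_ids):
    """Hypothetical helper: products bought together with any of product_ids."""
    marks = ", ".join("?" for _ in product_ids)
    sql = f"""
        SELECT A.id_product, COUNT(1) AS num
        FROM orders_data AS A
        WHERE A.id_product NOT IN ({marks})
          AND EXISTS (
              SELECT 1
              FROM orders_data AS B
              WHERE B.id_order = A.id_order
                AND B.id_product IN ({marks})
          )
        GROUP BY A.id_product
        ORDER BY A.id_product
    """
    # The id list is bound twice: once for NOT IN, once for IN.
    return conn.execute(sql, list(product_ids) * 2).fetchall()


print(suggest(conn, [2]))
print(suggest(conn, [2, 3]))
```

Binding the ids as parameters rather than pasting them into the SQL text keeps the statement safe when the ids come from a cart.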

How to join table in many-to-many relationship?

Here is a simplified version of my problem. I have two tables. Each table has a unique ID field, but it's irrelevant in this case.
shipments has 3 fields: shipment_id, receive_by_datetime, and qty.
deliveries has 4 fields: delivery_id, shipment_id, delivered_on_datetime, and qty.
In shipments, the shipment_id and receive_by_datetime fields always match up. There are many rows in the table that would appear to be duplicates based off of those two columns (but they aren't... other fields are different).
In deliveries, the shipment_id matches up to the shipments table. There are also many rows that would appear to be duplicates based off of the delivery_id and delivered_on_datetime fields (but they aren't again... other fields exist that I didn't list).
I am trying to pull one row per aggregate delivered_on_datetime and receive_by_datetime, but because of the many-to-many relationships, it's difficult. Is a query somewhere along these lines correct?
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
FROM deliveries d
LEFT JOIN (
SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
FROM shipments s1
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
You will run into problems where the total SUM(d.qty) will be larger than the value from SELECT SUM(qty) FROM deliveries
Something like this might be better suited for you:
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id, s1.receive_by_datetime
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
If you somehow have (or might have) a shipment_id that has multiple values for receive_by_datetime (and it's best practice to assume that something might have corrupted the data slightly), then to prevent rows in the deliveries table from being duplicated while still returning a valid result you can use:
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, MAX(s1.receive_by_datetime) AS receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
Yep, the problem with many-to-many is you get the cartesian product of rows, so you end up counting the same row more than once. Once for each other row it matches against.
"In shipments, the shipment_id and receive_by_datetime fields always match up"
If this means there cannot be two shipments with the same ID but different dates, then your query will work. But in general it is not safe: if the DISTINCT subselect could return more than one row per shipment ID, you will be subject to the double-counting issue. In general this is a very tricky problem to solve; in fact I see no way it could be solved with this data model.
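The double counting described in these answers can be demonstrated with a few rows. This is a sketch on SQLite via Python's sqlite3 module: the dates and quantities are invented, and delivered_total is a hypothetical helper that sums d.qty after joining deliveries to whichever shipment subquery it is given. The true delivered total in this data is 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- Invented data: two shipment rows and two delivery rows for shipment 1.
INSERT INTO shipments VALUES (1, '2024-01-10', 10), (1, '2024-01-10', 20);
INSERT INTO deliveries VALUES (1, 1, '2024-01-09', 5), (2, 1, '2024-01-09', 7);
""")

RAW = "SELECT shipment_id, receive_by_datetime FROM shipments"
DISTINCT = "SELECT DISTINCT shipment_id, receive_by_datetime FROM shipments"
ONE_PER_ID = ("SELECT shipment_id, MAX(receive_by_datetime) AS receive_by_datetime "
              "FROM shipments GROUP BY shipment_id")


def delivered_total(shipment_subquery):
    """Hypothetical helper: SUM(d.qty) after joining to the given subquery."""
    sql = (f"SELECT SUM(d.qty) FROM deliveries d "
           f"LEFT JOIN ({shipment_subquery}) s ON s.shipment_id = d.shipment_id")
    return conn.execute(sql).fetchone()[0]


# Consistent data: only the raw many-to-many join inflates the total.
clean = (delivered_total(RAW), delivered_total(DISTINCT), delivered_total(ONE_PER_ID))

# Simulate a corrupted row: same shipment_id, different receive_by_datetime.
conn.execute("INSERT INTO shipments VALUES (1, '2024-01-11', 1)")
corrupt = (delivered_total(RAW), delivered_total(DISTINCT), delivered_total(ONE_PER_ID))

print(clean)
print(corrupt)
```

Once a shipment_id carries two different dates, the DISTINCT subquery returns two rows for it and the total doubles, while the one-row-per-id subquery still returns 12.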