Finding pairs that do not exist in a different table


I have a table (orders) with order id, location 1, location 2 and another table (mileage) with location 1 and location 2.
I'm using the Except action to return those location pairs in orders that are not in mileage. But I'm not sure how I can also return the corresponding order_id that belongs to those pairs (order_id doesn't exist in the mileage table). The only thing I can think of is having an outer select statement that searches orders for those location pairs. I haven't tried it but I'm looking for other options.
I have something like this.
SELECT location_id1, location_id2
FROM orders
except
SELECT lm.origin_id, lm.dest_id
FROM mileage lm
How can I also retrieve the order id for those pairs?

You might try using a Not Exists statement instead:
Select O.order_id, O.location_id1, O.location_id2
From orders As O
Where Not Exists (
Select 1
From mileage As M1
Where M1.origin_id = O.location_id1
And M1.dest_id = O.location_id2
)
Another solution, if you really want to use Except:
Select O.order_id, O.location_id1, O.location_id2
From Orders As O
Except
Select O.order_id, O.location_id1, O.location_id2
From Orders As O
Join Mileage As M
On M.origin_id = O.location_id1
And M.dest_id = O.location_id2
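To check that the two queries above agree, here is a minimal sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented for illustration (only the table and column names come from the question), and SQLite is just a stand-in for whatever database you actually use.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, location_id1 INTEGER, location_id2 INTEGER);
    CREATE TABLE mileage (origin_id INTEGER, dest_id INTEGER);
    INSERT INTO orders  VALUES (1, 10, 20), (2, 10, 30), (3, 40, 50), (4, 10, 30);
    INSERT INTO mileage VALUES (10, 20), (30, 10);
""")

# Orders whose (location_id1, location_id2) pair has no row in mileage.
not_exists_sql = """
    SELECT o.order_id, o.location_id1, o.location_id2
    FROM orders AS o
    WHERE NOT EXISTS (
        SELECT 1
        FROM mileage AS m
        WHERE m.origin_id = o.location_id1
          AND m.dest_id   = o.location_id2
    )
"""

# Same result via EXCEPT: all orders minus the orders that do join to mileage.
except_sql = """
    SELECT o.order_id, o.location_id1, o.location_id2
    FROM orders AS o
    EXCEPT
    SELECT o.order_id, o.location_id1, o.location_id2
    FROM orders AS o
    JOIN mileage AS m
      ON m.origin_id = o.location_id1
     AND m.dest_id   = o.location_id2
"""

missing = sorted(conn.execute(not_exists_sql).fetchall())
missing_via_except = sorted(conn.execute(except_sql).fetchall())
print(missing)
```

Note that the reversed pair (30, 10) in mileage does not count as a match for (10, 30); both queries compare origin and destination in order.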

You could left-outer-join to the mileage table, and only return rows that don't join. Like so:
select O.order_id, O.location_id1, O.location_id2
from orders O left outer join mileage M1 on
O.location_id1 = M1.origin_id and
O.location_id2 = M1.dest_id
where M1.origin_id is NULL
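As a rough illustration of why the IS NULL filter works, the sketch below runs the left join with and without the filter on an in-memory SQLite database (via Python's sqlite3). The rows are made up, and the duplicate mileage row is there on purpose: it multiplies matched orders but cannot affect the unmatched ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, location_id1 INTEGER, location_id2 INTEGER);
    CREATE TABLE mileage (origin_id INTEGER, dest_id INTEGER);
    INSERT INTO orders  VALUES (1, 10, 20), (2, 10, 30), (3, 40, 50);
    -- Hypothetical data: the (10, 20) pair appears twice.
    INSERT INTO mileage VALUES (10, 20), (10, 20);
""")

join_sql = """
    SELECT o.order_id, o.location_id1, o.location_id2, m.origin_id
    FROM orders AS o
    LEFT OUTER JOIN mileage AS m
      ON o.location_id1 = m.origin_id
     AND o.location_id2 = m.dest_id
"""

# Without the filter: order 1 appears twice, orders 2 and 3 once with NULLs.
joined = conn.execute(join_sql).fetchall()

# With the filter: only the rows that found no partner in mileage remain.
unmatched = conn.execute(
    "SELECT order_id, location_id1, location_id2 FROM (" + join_sql + ") "
    "WHERE origin_id IS NULL ORDER BY order_id"
).fetchall()
print(len(joined), unmatched)
```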

If you want to get the pairs that do not exist on mileage table you can do something like
select location_id1, location_id2
from orders
where (select count(*) from mileage
where mileage.origin_id = location_id1 and mileage.dest_id = location_id2) = 0

I thought this was valid, but as Gabe pointed out, it does NOT work in SQL Server 2008:
SELECT order_id
, location_id1
, location_id2
FROM orders
WHERE (location_id1, location_id2) NOT IN
( SELECT origin_id, dest_id
FROM mileage
)
Would this solution with EXCEPT (which actually is a JOIN between your original query and Orders) work fast or horribly? I have no idea.
SELECT o.order_id, o.location_id1, o.location_id2
FROM orders o
JOIN
( SELECT location_id1, location_id2
FROM orders
except
SELECT origin_id, dest_id
FROM mileage
) AS g
ON o.location_id1 = g.location_id1
AND o.location_id2 = g.location_id2

MySQL doesn't support Except. For anyone who comes across this question using MySQL, here's how you do it:
http://nimal.info/blog/2007/intersection-and-set-difference-in-mysql-a-workaround-for-except/

Related

Multi Join Table, Multiple Sums

I've got 3 tables I need to work with:
CREATE TABLE invoices (
id INTEGER,
number VARCHAR(256)
)
CREATE TABLE items (
invoice_id INTEGER,
total DECIMAL
)
CREATE TABLE payments (
invoice_id INTEGER,
total DECIMAL
)
I need a result set along the lines of:
invoices.id | invoices.number | item_total | payment_total | outstanding_balance
00001 | i82 | 42.50 | 42.50 | 00.00
00002 | i83 | 89.99 | 9.99 | 80.00
I tried
SELECT
invoices.*,
SUM(items.total) AS item_total,
SUM(payments.total) AS payment_total,
SUM(items.total) - SUM(payments.total) AS outstanding_balance
FROM
invoices
LEFT OUTER JOIN items ON items.invoice_id = invoices.id
LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id
GROUP BY
invoices.id
But that fails. The sum for payments ends up wrong since I'm doing 2 joins here and I end up counting payments multiple times.
I ended up with
SELECT
invoices.*,
invoices.item_total - invoices.payment_total AS outstanding_balance
FROM
(
SELECT invoices.*,
(SELECT SUM(items.total) FROM items WHERE items.invoice_id = invoices.id) AS item_total,
(SELECT SUM(payments.total) FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
FROM invoices
) AS invoices
But ... that feels ugly. Now I've got subqueries going on everywhere. It DOES work, but I'm concerned about performance?
There has to be some good way to do this with joins - I'm sure I'm missing something super obvious?

As you say, the sum behavior with multiple joins is normal, and working with subqueries (or CTEs in SQL Server) is not bad practice.
Doing such a GROUP BY on an ID and a total in subqueries won't significantly degrade your performance (depending on your table sizes).
Another solution could be doing one SUM subquery for each column you need. I think it would be easier to understand this way:
SELECT
invoices.id
, i_total.total as item_total
, p_total.total as payment_total
, ( i_total.total - p_total.total) as outstanding_balance
FROM
invoices
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM items GROUP BY invoice_id
) i_total
ON i_total.invoice_id = invoices.id
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM payments GROUP BY invoice_id
) p_total
ON p_total.invoice_id = invoices.id
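To make the double counting concrete, here is a small sketch using Python's sqlite3 with an in-memory database. The amounts are invented: one invoice with two items (30.0 in total) and two payments (20.0 in total). Joining both child tables directly produces 2 x 2 = 4 rows, so each sum is inflated; aggregating each table first avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER, number TEXT);
    CREATE TABLE items    (invoice_id INTEGER, total REAL);
    CREATE TABLE payments (invoice_id INTEGER, total REAL);
    INSERT INTO invoices VALUES (1, 'i82');
    INSERT INTO items    VALUES (1, 10.0), (1, 20.0);
    INSERT INTO payments VALUES (1, 5.0), (1, 15.0);
""")

# Joining both child tables at once: every item row pairs with every payment row.
naive = conn.execute("""
    SELECT invoices.id, SUM(items.total), SUM(payments.total)
    FROM invoices
    LEFT OUTER JOIN items    ON items.invoice_id    = invoices.id
    LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id
    GROUP BY invoices.id
""").fetchone()

# Aggregating each child table first yields one row per invoice on each side.
fixed = conn.execute("""
    SELECT invoices.id,
           i_total.total AS item_total,
           p_total.total AS payment_total,
           i_total.total - p_total.total AS outstanding_balance
    FROM invoices
    LEFT JOIN (SELECT invoice_id, SUM(total) AS total
               FROM items GROUP BY invoice_id) AS i_total
           ON i_total.invoice_id = invoices.id
    LEFT JOIN (SELECT invoice_id, SUM(total) AS total
               FROM payments GROUP BY invoice_id) AS p_total
           ON p_total.invoice_id = invoices.id
""").fetchone()

print(naive)  # sums are doubled
print(fixed)  # sums match the underlying rows
```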

I think a common table expression (or in this case two CTEs) will give you what you want. You are using something called a scalar subquery, which is not wrong, strictly speaking, but as you correctly identified it is ugly, hard to read, hard to maintain, and can perform poorly in many situations.
A CTE essentially takes a query and makes it "behave" like a table. We define it once and can then refer to it later.
with item_data as (
SELECT invoice_id, SUM(total) as item_total
FROM items
group by invoice_id
),
payment_data as (
SELECT invoice_id, SUM(total) as payment_total
FROM payments
group by invoice_id
)
select
i.*,
id.item_total - pd.payment_total as outstanding_balance
from
invoices i
join item_data id on i.id = id.invoice_id
join payment_data pd on i.id = pd.invoice_id
Untested, but hopefully you get the idea.
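One thing to watch with the CTE approach: an inner join only keeps invoices that have at least one item and at least one payment. If unpaid or empty invoices should still appear, a possible variant uses LEFT JOIN with COALESCE. The sketch below (Python's sqlite3, in-memory database, invented rows, and a hypothetical third invoice 'i84') shows the difference; the alias `it` is my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER, number TEXT);
    CREATE TABLE items    (invoice_id INTEGER, total REAL);
    CREATE TABLE payments (invoice_id INTEGER, total REAL);
    INSERT INTO invoices VALUES (1, 'i82'), (2, 'i83'), (3, 'i84');
    INSERT INTO items    VALUES (1, 40.0), (1, 2.5), (2, 90.0);  -- invoice 3 has no items
    INSERT INTO payments VALUES (1, 42.5);                       -- only invoice 1 is paid
""")

ctes = """
    WITH item_data AS (
        SELECT invoice_id, SUM(total) AS item_total
        FROM items GROUP BY invoice_id
    ),
    payment_data AS (
        SELECT invoice_id, SUM(total) AS payment_total
        FROM payments GROUP BY invoice_id
    )
"""

# Inner joins: invoices 2 and 3 disappear because one side has no rows.
inner_ids = [row[0] for row in conn.execute(ctes + """
    SELECT i.id
    FROM invoices AS i
    JOIN item_data    AS it ON it.invoice_id = i.id
    JOIN payment_data AS pd ON pd.invoice_id = i.id
    ORDER BY i.id
""")]

# Left joins plus COALESCE: every invoice is kept, missing totals count as 0.
balances = conn.execute(ctes + """
    SELECT i.id, i.number,
           COALESCE(it.item_total, 0) - COALESCE(pd.payment_total, 0)
               AS outstanding_balance
    FROM invoices AS i
    LEFT JOIN item_data    AS it ON it.invoice_id = i.id
    LEFT JOIN payment_data AS pd ON pd.invoice_id = i.id
    ORDER BY i.id
""").fetchall()

print(inner_ids)
print(balances)
```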

SQL LEFT JOIN - Inner select not returning columns

I have two tables called 'Customers' and 'Orders'. Their column names are as follows:
Customers: id, name, address
Orders: id, person_id, product, price
The desired outcome is to query all customers with one of their latest purchases. The 'Orders' table contains a lot of duplicates: because of a bug, two records can share the same timestamp.
I have written the following code, but the query does not return the column values from the second table (Orders). Can anyone advise what the issue is?
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C
LEFT JOIN
(
SELECT TOP 1 person_id
FROM Orders
WHERE status = 'Pending'
) O ON C.ID = O.person_id
Results: O.item, O.price, O.product values are all null
Edit: Sample Data
ID/ NAME/ ADDRESS/
1/ A/ Ad1/
2/ B/ Ad2/
3/ C/ Ad3/
ID/ Person ID/ PRODUCT/ PRICE/ Created Date
ID-1234/ 1/ Book/ $5/ 26-2-2017
ID-1235/ 1/ Book/ $5/ 26-2-2017
ID-1236/ 2/ Calendar/ $10/ 4-2-2017
ID-1238/ 1/ Pen/ $2/ 1-1-2016

Assuming that the id column in Orders is a primary key autoincrement, then the following should work:
SELECT c.id,
c.name,
COALESCE(t1.price, 0.0) AS price,
COALESCE(t1.product, 'NA') AS product
FROM Customers c
LEFT JOIN
(
SELECT person_id, MAX(CAST(SUBSTRING(id, 4, LEN(id)) AS INT)) AS max_id
FROM Orders
GROUP BY person_id
) t2
ON c.id = t2.person_id
LEFT JOIN Orders t1
ON t1.person_id = t2.person_id AND
t2.max_id = CAST(SUBSTRING(t1.id, 4, LEN(t1.id)) AS INT)
This answer assumes that taking the greatest order ID per customer will yield the most recent purchase. Ideally you should have a timestamp column which captures when a transaction took place. Note that even in the query above, we still have no way of knowing when the most recent transaction took place.
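The greatest-ID idea can be tried out on the sample data from the question. The sketch below is an adaptation for SQLite run through Python's sqlite3, so `substr` and `CAST(... AS INTEGER)` stand in for the SQL Server functions; the column name `created` and the alias `latest` are my own. It joins the per-customer maximum back to Orders so that each customer gets at most one order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER, name TEXT, address TEXT);
    CREATE TABLE Orders (id TEXT, person_id INTEGER, product TEXT,
                         price REAL, created TEXT);
    INSERT INTO Customers VALUES (1, 'A', 'Ad1'), (2, 'B', 'Ad2'), (3, 'C', 'Ad3');
    INSERT INTO Orders VALUES
        ('ID-1234', 1, 'Book',     5,  '26-2-2017'),
        ('ID-1235', 1, 'Book',     5,  '26-2-2017'),
        ('ID-1236', 2, 'Calendar', 10, '4-2-2017'),
        ('ID-1238', 1, 'Pen',      2,  '1-1-2016');
""")

latest = conn.execute("""
    SELECT c.id, c.name,
           COALESCE(o.price, 0.0)    AS price,
           COALESCE(o.product, 'NA') AS product
    FROM Customers AS c
    LEFT JOIN (
        -- Numeric part of the order ID, e.g. 'ID-1238' -> 1238
        SELECT person_id, MAX(CAST(substr(id, 4) AS INTEGER)) AS max_id
        FROM Orders
        GROUP BY person_id
    ) AS latest ON latest.person_id = c.id
    LEFT JOIN Orders AS o
           ON o.person_id = latest.person_id
          AND CAST(substr(o.id, 4) AS INTEGER) = latest.max_id
    ORDER BY c.id
""").fetchall()

for row in latest:
    print(row)
```

On this sample, customer A's highest order ID is ID-1238 (the Pen), even though its created date is older than the two Book orders, which is exactly the kind of case where the greatest-ID assumption and a real timestamp would disagree.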

So where is the timestamp column? It's not mentioned in your table schema. But your description does not mention the status column either, and that is clearly in there.
Is orders.id unique? Is it the key for the Orders table? If it is, then your schema has no way to identify "duplicate" records. You cannot mean to imply that only one order per customer is allowed, so if there are multiple orders for a single customer, how do we identify the duplicates? By the unmentioned timestamp column?
If there IS a timestamp column, and that's how you would identify dupes, then use it.
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C LEFT JOIN Orders o
on o.id = (Select Min(id) from orders
where person_id = c.Id
and timestamp = o.timestamp
and status = 'Pending')

Select rows that don't have a corresponding join in join table

I have two SQL tables - customer and widget. There's a join table between them, customers_widgets, that has two columns (customer_id and widget_id).
Is there a way I can select all the customers that aren't joined to a widget? So they have an id that doesn't appear in the customer_id column on the join table?

In general I've found NOT IN to be expensive and slow, but your mileage may vary on different RDBMS.
The two alternatives that I most often use are:
SELECT
*
FROM
customer
WHERE
NOT EXISTS (SELECT *
FROM customers_widgets
WHERE customers_widgets.customer_id = customer.customer_id
)
And...
SELECT
customer.*
FROM
customer
LEFT JOIN
customers_widgets
ON customers_widgets.customer_id = customer.customer_id
WHERE
customers_widgets.customer_id IS NULL

Try this:
SELECT customer_id
FROM customer
WHERE customer_id NOT IN (SELECT customer_id
FROM customers_widgets)
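One caveat with NOT IN is worth knowing: if the subquery can return a NULL (for example, if customers_widgets.customer_id is nullable), the NOT IN predicate is never true and the query returns no rows at all. The sketch below shows this with Python's sqlite3 and invented rows; NOT EXISTS, or filtering NULLs out of the subquery, does not have this problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT);
    CREATE TABLE customers_widgets (customer_id INTEGER, widget_id INTEGER);
    INSERT INTO customer VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Hypothetical data: one join row has a NULL customer_id.
    INSERT INTO customers_widgets VALUES (1, 100), (NULL, 200);
""")

def ids(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT IN against a set containing NULL: nothing qualifies.
not_in = ids("""
    SELECT customer_id FROM customer
    WHERE customer_id NOT IN (SELECT customer_id FROM customers_widgets)
    ORDER BY customer_id
""")

# Excluding NULLs in the subquery restores the expected result.
not_in_filtered = ids("""
    SELECT customer_id FROM customer
    WHERE customer_id NOT IN (SELECT customer_id FROM customers_widgets
                              WHERE customer_id IS NOT NULL)
    ORDER BY customer_id
""")

# NOT EXISTS is unaffected by the NULL row.
not_exists = ids("""
    SELECT c.customer_id FROM customer AS c
    WHERE NOT EXISTS (SELECT 1 FROM customers_widgets AS cw
                      WHERE cw.customer_id = c.customer_id)
    ORDER BY c.customer_id
""")

print(not_in, not_in_filtered, not_exists)
```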

You can use an OUTER JOIN for this:
Select C.*
From customer C
Left Join customers_widgets W On C.customer_id = W.customer_id
Where W.customer_id Is Null

Using SQL query to find details of customers who ordered > x types of products

Please note that I have seen a similar query here, but think my query is different enough to merit a separate question.
Suppose that there is a database with the following tables:
customer_table with customer_ID (key field), customer_name
orders_table with order_ID (key field), customer_ID, product_ID
Now suppose I would like to find the names of all the customers who have ordered more than 10 different types of product, and the number of types of products they ordered. Multiple orders of the same product do not count.
I think the query below should work, but have the following questions:
Is the use of count(distinct xxx) generally allowed with a "group by" statement?
Is the method I use the standard way? Does anybody have any better ideas (e.g. without involving temporary tables)?
Below is my query
select T1.customer_name, T1.customer_ID, T2.number_of_products_ordered
from customer_table T1
inner join
(
select cust.customer_ID as customer_identity, count(distinct ord.product_ID) as number_of_products_ordered
from customer_table cust
inner join orders_table ord on cust.customer_ID=ord.customer_ID
group by ord.customer_ID, ord.product_ID
having count(distinct ord.product_ID) > 10
) T2
on T1.customer_ID=T2.customer_identity
order by T2.number_of_products_ordered, T1.customer_name

Isn't that what you are looking for? Seems to be a little bit simpler. Tested it on SQL Server - works fine.
SELECT customer_name, COUNT(DISTINCT product_ID) as products_count FROM customer_table
INNER JOIN orders_table ON customer_table.customer_ID = orders_table.customer_ID
GROUP BY customer_table.customer_ID, customer_name
HAVING COUNT(DISTINCT product_ID) > 10
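COUNT(DISTINCT ...) can be combined with GROUP BY, but the grouping columns matter. As a small check, the sketch below (Python's sqlite3, invented rows, and a threshold of 2 instead of 10 to keep the data small) compares grouping per customer with grouping per customer and product. In the second case every group contains exactly one product, so the distinct count is always 1 and the HAVING filter removes everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (customer_ID INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders_table (order_ID INTEGER PRIMARY KEY,
                               customer_ID INTEGER, product_ID INTEGER);
    INSERT INTO customer_table VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders_table VALUES
        (1, 1, 100), (2, 1, 200), (3, 1, 300), (4, 1, 300),  -- Ann: 3 distinct products
        (5, 2, 100), (6, 2, 100);                            -- Bob: 1 distinct product
""")

THRESHOLD = 2  # stands in for the 10 used in the question

# One group per customer: the distinct count is the number of product types.
per_customer = conn.execute("""
    SELECT c.customer_name, COUNT(DISTINCT o.product_ID) AS products_count
    FROM customer_table AS c
    JOIN orders_table AS o ON o.customer_ID = c.customer_ID
    GROUP BY c.customer_ID, c.customer_name
    HAVING COUNT(DISTINCT o.product_ID) > ?
""", (THRESHOLD,)).fetchall()

# One group per (customer, product): the distinct count can never exceed 1.
per_customer_and_product = conn.execute("""
    SELECT o.customer_ID, COUNT(DISTINCT o.product_ID)
    FROM orders_table AS o
    GROUP BY o.customer_ID, o.product_ID
    HAVING COUNT(DISTINCT o.product_ID) > ?
""", (THRESHOLD,)).fetchall()

print(per_customer)
print(per_customer_and_product)
```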

You could do it more simply:
select
c.id,
c.cname,
count(distinct o.pid) as `uniques`
from o join c
on c.id = o.cid
group by c.id
having `uniques` > 10

SQL query for join with condition

I have these two tables:
Customers: Id, Name
Orders: Id, CustomerId, Time, Status
I want to get a list of customers for which the LAST order does not have a status of 'Wrong'.
I know how to use a LEFT JOIN to get a count of orders for each customer, but I don't know how I can use this statement for what I want. Maybe a JOIN is not the right thing to use too, I'm not sure.
It's possible that customers do not have any order, and they should be returned.
I'm abstracting the real tables here, but the scenario is for a windows phone app sending notifications. I want to get all clients for which their last notification does not have a 'Dropped' status. I can sort their notifications (orders) by the 'Time' field. Thanks for the help, while I continue experimenting with subqueries in the where clause.

Select ...
From Customers As C
Where Not Exists (
Select 1
From Orders As O1
Join (
Select O2.CustomerId, Max( O2.Time ) As Time
From Orders As O2
Group By O2.CustomerId
) As LastOrderTime
On LastOrderTime.CustomerId = O1.CustomerId
And LastOrderTime.Time = O1.Time
Where O1.Status = 'Dropped'
And O1.CustomerId = C.Id
)
There are obviously alternatives based on the actual database product and version. For example, in SQL Server one could use the TOP command or a CTE perhaps. However, without knowing what specific product is being used, the above solution should produce the results you want in almost any database product.
Addition
If you were using a product that supported ranking functions (which database product and version isn't mentioned) and common-table expressions, then an alternative solution might be something like so:
With RankedOrders As
(
Select O.CustomerId, O.Status
, Row_Number() Over( Partition By CustomerId Order By Time Desc ) As Rnk
From Orders As O
)
Select ...
From Customers As C
Where Not Exists (
Select 1
From RankedOrders As O1
Where O1.CustomerId = C.Id
And O1.Rnk = 1
And O1.Status = 'Dropped'
)
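To see how the Not Exists approach treats the edge cases, here is a sketch that runs the Max(Time) variant on an in-memory SQLite database via Python's sqlite3. The rows and the 'Sent' status are invented; 'Dropped' is used as in the query above. For contrast, it also runs a plain join that only filters on Status, which answers a different question (customers with any non-dropped order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER, Name TEXT);
    CREATE TABLE Orders (Id INTEGER, CustomerId INTEGER, Time TEXT, Status TEXT);
    INSERT INTO Customers VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Orders VALUES
        (1, 1, '2024-01-01', 'Sent'),
        (2, 1, '2024-01-02', 'Dropped'),   -- A: last order is Dropped
        (3, 2, '2024-01-01', 'Dropped'),
        (4, 2, '2024-01-02', 'Sent');      -- B: last order is fine; C has no orders
""")

# Customers for whom no "last order with status Dropped" exists.
last_not_dropped = conn.execute("""
    SELECT C.Id, C.Name
    FROM Customers AS C
    WHERE NOT EXISTS (
        SELECT 1
        FROM Orders AS O1
        JOIN (
            SELECT O2.CustomerId, MAX(O2.Time) AS Time
            FROM Orders AS O2
            GROUP BY O2.CustomerId
        ) AS LastOrderTime
          ON LastOrderTime.CustomerId = O1.CustomerId
         AND LastOrderTime.Time = O1.Time
        WHERE O1.Status = 'Dropped'
          AND O1.CustomerId = C.Id
    )
    ORDER BY C.Id
""").fetchall()

# Filtering on Status alone looks at any order, not the last one.
any_not_dropped = conn.execute("""
    SELECT DISTINCT C.Id, C.Name
    FROM Customers AS C
    JOIN Orders AS O ON O.CustomerId = C.Id
    WHERE O.Status <> 'Dropped'
    ORDER BY C.Id
""").fetchall()

print(last_not_dropped)
print(any_not_dropped)
```

Customer C, who has no orders, is kept by the Not Exists query, while customer A is excluded because the most recent order was dropped.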

Assuming Last order refers to the Time column here is my query:
SELECT C.Id,
C.Name,
MAX(O.Time)
FROM
Customers C
INNER JOIN Orders O
ON C.Id = O.CustomerId
WHERE
O.Status != 'Wrong'
GROUP BY C.Id,
C.Name
EDIT:
Regarding your table configuration. You should really consider revising the structure to include a third table. They would look like this:
Customer
CustomerId | Name
Order
OrderId | Status | Time
CompletedOrders
CoId | CustomerId | OrderId
Now what you do is store the info about a customer or order in their respective tables ... then when an order is made you just create a CompletedOrders entry with the ids of the 2 individual records. This will allow for a 1 to Many relationship between customer and orders.

Didn't check it out, but something like this?
SELECT c.CustomerId, c.Name, MAX(o.Time)
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.Status <> 'Wrong'
GROUP BY c.CustomerId, C.Name

You can get the list of customers whose LAST order has a status of 'Wrong' with something like
select customerId from orders where status='Wrong'
group by customerId
having time=max(time)
