T-SQL "not in (select " not working (as expected) - sql

I have an ordinary one-to-many relation:
customer.id = order.customerid
I want to find customers who have no associated orders.
I tried:
-- one record
select * from customers where id = 123
-- no records
select * from orders where customerid = 123
-- NO RECORDS
select *
from customers
where id not in (select customerid from orders)
-- many records, as expected.
select *
from customers
where not exists (select customerid from orders
where orders.customerid = customers.id)
Am I mistaken, or should it work?

NOT IN does not behave as expected when the in-list contains NULL values.
In fact, if any values are NULL, then no rows are returned at all. Remember: In SQL, NULL means "indeterminate" value, not "missing value". So, if the list contains any NULL value then it might be equal to a comparison value.
So at least one customerid in the orders table must be NULL.
For this reason, I strongly recommend that you always use NOT EXISTS with a subquery rather than NOT IN.
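A minimal sketch of the effect, using Python's built-in sqlite3 module with an in-memory database instead of SQL Server. The rows are invented for illustration (only id 123 comes from the question); SQLite applies the same NULL rules to NOT IN, so the behaviour should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customerid INTEGER);
    INSERT INTO customers VALUES (1, 'has order'), (123, 'no order');
    -- hypothetical data: one order row carries a NULL customerid
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# NOT IN against a list that contains NULL: no rows come back.
not_in = conn.execute(
    "SELECT id FROM customers WHERE id NOT IN (SELECT customerid FROM orders)"
).fetchall()

# Filtering the NULLs out of the subquery restores the expected result.
not_in_filtered = conn.execute(
    "SELECT id FROM customers "
    "WHERE id NOT IN (SELECT customerid FROM orders WHERE customerid IS NOT NULL)"
).fetchall()

# NOT EXISTS is unaffected by the NULL row.
not_exists = conn.execute(
    "SELECT id FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customerid = c.id)"
).fetchall()

print(not_in)           # []
print(not_in_filtered)  # [(123,)]
print(not_exists)       # [(123,)]
```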

I normally do this via a left join that looks for nulls created when the join fails:
SELECT c.*
FROM
customers c
LEFT JOIN orders o
ON c.id = o.customerid
WHERE
o.customerid IS NULL
Left join treats the customer table as "solid" and connects orders to it where there is an order with a given customer id and puts nulls where there isn't any matching order, hence the orders side of the relationship has "holes" in the data. By then saying we only want to see the holes (via the where clause), we get a list of "customers with no orders"
As per the comments: I've always worked to the rule "do not use IN for lists longer than you'd be prepared to write by hand". Increasingly, though, optimisers rewrite IN, EXISTS and LEFT JOIN WHERE NULL queries to function identically, as they're all recognised patterns of "data in A that has no matching data in B".
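To make the "holes" visible, here is a small sketch using Python's sqlite3 module with an in-memory database. The customer names and order ids are made up; the query is the one above. It first shows the raw joined rows, then the rows left after keeping only the holes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customerid INTEGER);
    -- hypothetical data: Ann has two orders, Bob has none
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# Every customer appears; Bob's order columns are filled with NULL.
joined = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customerid
    ORDER BY c.id, o.id
""").fetchall()

# Keeping only the NULL-filled rows gives the customers with no orders.
holes = conn.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customerid
    WHERE o.customerid IS NULL
""").fetchall()

print(joined)  # [('Ann', 10), ('Ann', 11), ('Bob', None)]
print(holes)   # [('Bob',)]
```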

Related

How to put conditions on left joins

I have two tables, CustomerCost and Products that look like the following:
I am joining the two tables using the following SQL query:
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod ON Cost.productId =prod.productId
WHERE prod.productId=4
AND (Cost.Customer_Id =2717
OR Cost.Customer_Id IS NULL)
The result of the join is:
[image: joins result]
What I want is this: when I pass customerId 2717, the query should return only that customer's specific cost, i.e. 258.93; only when no row matches the customerId should it fall back to the cost of 312.50.
What am I doing wrong here?
You can get your expected output as follows:
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
INNER JOIN PRODUCTS prod ON Cost.productId = prod.productId
WHERE prod.productId=4
AND Cost.Customer_Id = 2717
However, if you want to allow customer ID to be passed as NULL, you will have to change the last line to AND Cost.Customer_Id IS NULL. To do so dynamically, you'll need to use variables and generate the query based on the input.
The problem in the original query that you have posted is that you have used an alias called custCost which is not present in the query.
EDIT: Actually, you don't even need a join. The CUSTOMERCOST table seems to have both Customer and Product IDs.
You can simply:
SELECT
Cost.ProductId, Cost.CustomerCost
FROM
CUSTOMERCOST Cost
WHERE
Cost.Customer_Id = 2717
AND Cost.productId = 4
You seem to want:
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id = 2717
UNION ALL
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id IS NULL AND
NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2 WHERE c2.productId = 4 AND c2.Customer_Id = 2717);
That is, take the matching cost, if it exists for the customer. Otherwise, take the default cost.
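A runnable sketch of this "specific cost, else default" pattern, using Python's sqlite3 module. The two rows are reconstructed from the costs quoted in the question (258.93 for customer 2717, 312.50 for the NULL customer); the helper cost_for and the customer id 9999 are hypothetical names added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERCOST (productId INTEGER, Customer_Id INTEGER, CustomerCost REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERCOST VALUES (?, ?, ?)",
    [(4, 2717, 258.93), (4, None, 312.50)],  # customer-specific row and default row
)

QUERY = """
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :product AND c.Customer_Id = :customer
    UNION ALL
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :product AND c.Customer_Id IS NULL
      AND NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2
                      WHERE c2.productId = :product AND c2.Customer_Id = :customer)
"""

def cost_for(product, customer):
    """Return the customer-specific cost rows, or the default rows if none exist."""
    return conn.execute(QUERY, {"product": product, "customer": customer}).fetchall()

specific = cost_for(4, 2717)  # customer has its own cost
fallback = cost_for(4, 9999)  # hypothetical customer with no row of its own

print(specific)  # [(258.93,)]
print(fallback)  # [(312.5,)]
```

Only one branch of the UNION ALL can produce rows for a given customer, because the second branch is guarded by the NOT EXISTS on the first branch's condition.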
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod
ON Cost.productId =prod.productId
AND (Cost.Customer_Id =2717 OR Cost.Customer_Id IS NULL)
WHERE prod.productId=4
WHERE applies to the joined row. ON controls the join condition.
Outer joins are why the JOIN ... ON syntax was added in SQL-92. The old SQL-89 syntax had no support for them, and different vendors added different, incompatible syntax to support them.

How to select joined rows even if there is no match?

I checked so many similar questions but none apply to Firebird I guess.
I have two tables: one stores the customer information and the second stores the stock activities (which also include orders). I'd like to fetch all customers and the counts of orders they have made. But no matter how I join the orders table, I end up with only the customers that have at least one order. That means customers who don't have a match in the stock activities table won't show up in the result set.
Here is the query I run:
SELECT
C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S ON C.ID = S.CUSTOMERID
WHERE C.GROUPNAME = 'B'
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
Without the join, I get 570 rows (of customers), which is the correct result set. When I join the orders table to fetch the total order amount of these customers, I get only 379 rows: the ones having at least one order. Customers who don't have orders are not returned. As you might have guessed, I want the customers with zero activity to be returned with "0" as order amount and revenue.
The problem is that your WHERE clause filters on the "right hand" table's values.
WHERE ...
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
When the outer join generates records for "unmatched" rows from the left table, it supplies NULL values for all columns from the right table. So S.TYPE is NULL for those records.
There are two possible solutions:
1. Explicitly allow for the "NULL record" case in your WHERE logic.
By some standards this might be "more pure" in separating join conditions from filters, but it can get fairly complicated (and hence error-prone). One issue to be aware of is that you may have to distinguish generated NULL records from "real" records of the right table that just happen to have some NULL data.
Testing for the right table's value for the join key to be NULL should be reasonably safe. You could test for the right table's PK value to be NULL (assuming you have a true PK on that table).
2. Move the predicate from the WHERE clause to the outer join's ON clause.
This is very simple, and looks like
SELECT C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S
ON C.ID = S.CUSTOMERID
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
WHERE C.GROUPNAME = 'B'
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
This effectively filters the STOCK_ACTIVITY records presented to the join before attempting to match them against CUSTOMERS records (meaning the NULL records can still be generated without interference). ("Effectively" because it's folly to talk like you know what steps the DBMS will follow; all we can say is this has the same effect that you'd get by following certain steps...)
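The difference between the two placements can be checked with a small sketch. It uses Python's sqlite3 module rather than Firebird, and the two customers and their activity rows are invented; the queries are trimmed versions of the ones above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, GROUPNAME TEXT);
    CREATE TABLE STOCK_ACTIVITY (ORDERNO INTEGER, CUSTOMERID INTEGER, TYPE TEXT, AMOUNT INTEGER);
    -- hypothetical data: Ann has two orders, Bob has none
    INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'B'), (2, 'Bob', 'B');
    INSERT INTO STOCK_ACTIVITY VALUES (100, 1, 'INVOICE', 50), (101, 1, 'RECEIPT', 20);
""")

# Predicate on S.TYPE in WHERE: Bob's NULL-filled row fails the test and disappears.
filter_in_where = conn.execute("""
    SELECT C.NAME, COUNT(DISTINCT S.ORDERNO), COALESCE(SUM(S.AMOUNT), 0)
    FROM CUSTOMERS C
    LEFT OUTER JOIN STOCK_ACTIVITY S ON C.ID = S.CUSTOMERID
    WHERE C.GROUPNAME = 'B' AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
    GROUP BY C.NAME
    ORDER BY C.NAME
""").fetchall()

# Same predicate in ON: it only limits which activity rows can match, so Bob stays.
filter_in_on = conn.execute("""
    SELECT C.NAME, COUNT(DISTINCT S.ORDERNO), COALESCE(SUM(S.AMOUNT), 0)
    FROM CUSTOMERS C
    LEFT OUTER JOIN STOCK_ACTIVITY S
        ON C.ID = S.CUSTOMERID
        AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
    WHERE C.GROUPNAME = 'B'
    GROUP BY C.NAME
    ORDER BY C.NAME
""").fetchall()

print(filter_in_where)  # [('Ann', 2, 70)]
print(filter_in_on)     # [('Ann', 2, 70), ('Bob', 0, 0)]
```

COUNT already returns 0 rather than NULL when nothing matches, so the sketch only wraps SUM in COALESCE.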
If there is no STOCK_ACTIVITY for a CUSTOMER, a row full of NULLs will be attached. This also means that the WHERE condition AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE') can never be true for those rows.
Keep the aggregate operation separated from the JOIN. That is the cleanest. First do the grouping then join the additional information.
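One way to read "group first, then join" is to aggregate STOCK_ACTIVITY in a derived table and outer-join the totals to CUSTOMERS. The sketch below assumes that reading and runs on Python's sqlite3 module, not Firebird; the alias T, the sample rows and the 'TRANSFER' type are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, GROUPNAME TEXT);
    CREATE TABLE STOCK_ACTIVITY (ORDERNO INTEGER, CUSTOMERID INTEGER, TYPE TEXT, AMOUNT INTEGER);
    INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'B'), (2, 'Bob', 'B');
    -- hypothetical data: the TRANSFER row must not be counted
    INSERT INTO STOCK_ACTIVITY VALUES
        (100, 1, 'INVOICE', 50), (101, 1, 'RECEIPT', 20), (102, 1, 'TRANSFER', 5);
""")

totals = conn.execute("""
    SELECT C.NAME,
           COALESCE(T.TOTALORDERS, 0),
           COALESCE(T.TOTALREVENUE, 0)
    FROM CUSTOMERS C
    LEFT OUTER JOIN (
        -- aggregate per customer before joining
        SELECT CUSTOMERID,
               COUNT(DISTINCT ORDERNO) AS TOTALORDERS,
               SUM(AMOUNT) AS TOTALREVENUE
        FROM STOCK_ACTIVITY
        WHERE TYPE IN ('RECEIPT', 'INVOICE')
        GROUP BY CUSTOMERID
    ) T ON T.CUSTOMERID = C.ID
    WHERE C.GROUPNAME = 'B'
    ORDER BY C.NAME
""").fetchall()

print(totals)  # [('Ann', 2, 70), ('Bob', 0, 0)]
```

Here the outer query needs no GROUP BY at all, and the COALESCE calls are what turn the missing totals for Bob into zeros.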

SQL Server WHERE NOT EXISTS not working

I have the 3 below statements,
Selects the Order Numbers that don't exist
select Orders.OrderNumber
FROM Orders
inner join InvoiceControl on Orders.OrderNumber = InvoiceControl.OrderNumber
where not exists (select OrderNumber from Orders where InvoiceControl.OrderNumber = Orders.OrderNumber)
Selects a specific Order number that does not exist
select OrderNumber from Orders where OrderNumber = 987654
Selects the specific Order Number in the corresponding table that does not exist
select OrderNumber from InvoiceControl where OrderNumber = 987654
These 3 queries work in other scenarios with other tables, but not this one. Have I made an obvious mistake anywhere? Below are the queries that were run and their outputs:
The idea behind this is to locate the OrderNumbers that do not exist in InvoiceControl, based on the OrderNumbers in the Orders table. The top query should therefore also return the value 987654, as this value has not yet been included in the InvoiceControl table; it could be a new order without an invoice.
Because your INNER JOIN already pairs every Orders row with its correspondent in InvoiceControl (Orders.OrderNumber = InvoiceControl.OrderNumber).
After this result set is built, the condition in your WHERE filters out everything:
where not exists (select OrderNumber from Orders where InvoiceControl.OrderNumber = Orders.OrderNumber)
Hypothetically, if you'd have just 987654 in your Orders table and you'd have a Correspondent in your InvoiceControl table, then the following query, without your WHERE clause
select Orders.OrderNumber
FROM Orders
inner join InvoiceControl on Orders.OrderNumber = InvoiceControl.OrderNumber
would return:
OrderNumber
987654
Then, by applying your where not exists (select OrderNumber from Orders where InvoiceControl.OrderNumber = Orders.OrderNumber) condition, you'd be looking for all records that do not have a correspondent (but you already have all possible correspondents between your two tables, based on your INNER JOIN).
Thus, your result will be:
OrderNumber
In the first query, you first asked for rows in both Orders and InvoiceControl (by way of the FROM and JOIN tables), and then you added in your WHERE clause a request to exclude all rows that exist in Orders. Since your starting set only includes rows that are in Orders, if you ask for all of those rows to be excluded, you will get no results back.
If what you're looking for is all the order numbers that are in the Orders table and not in the InvoiceControl table,
then I would try this instead:
Select O.OrderNumber from Orders O
Left Join InvoiceControl I
On O.OrderNumber = I.OrderNumber
Where I.OrderNumber is null
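Both behaviours can be reproduced in a short sketch. It runs on Python's sqlite3 module instead of SQL Server; 987654 is the order number from the question, while 111111 is an invented order that already has an invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderNumber INTEGER PRIMARY KEY);
    CREATE TABLE InvoiceControl (OrderNumber INTEGER);
    -- hypothetical data: 111111 is invoiced, 987654 is not
    INSERT INTO Orders VALUES (111111), (987654);
    INSERT INTO InvoiceControl VALUES (111111);
""")

# The question's query: the inner join keeps only invoiced orders, and
# NOT EXISTS then removes every row that has an order, i.e. all of them.
original = conn.execute("""
    SELECT Orders.OrderNumber
    FROM Orders
    INNER JOIN InvoiceControl ON Orders.OrderNumber = InvoiceControl.OrderNumber
    WHERE NOT EXISTS (SELECT OrderNumber FROM Orders
                      WHERE InvoiceControl.OrderNumber = Orders.OrderNumber)
""").fetchall()

# The left join keeps uninvoiced orders, and IS NULL selects exactly those.
missing_invoice = conn.execute("""
    SELECT O.OrderNumber
    FROM Orders O
    LEFT JOIN InvoiceControl I ON O.OrderNumber = I.OrderNumber
    WHERE I.OrderNumber IS NULL
""").fetchall()

print(original)         # []
print(missing_invoice)  # [(987654,)]
```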

WHERE using a temporary table

I have three tables: customers, orders and refunds. Some customers (not all) placed orders and for some orders (not all) there were refunds.
When I join the three tables like this (the details are not that important):
SELECT ...
FROM customers LEFT JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id
-- WHERE orders.order_id IS NOT NULL -- uncomment to filter out customers that have no orders
;
I get a big table in which all customers are listed (even the ones that have not placed any orders and they have NULL in the 'order_id' column), with all their orders and the orders' refunds (even if not all orders have refunds):
NAME     ORDER_ID  ORDER AMOUNT  REFUND
----------------------------------------
Natalie  2         12.50         NULL
Natalie  3         18.00         18.00
Brenda   4         20.00         NULL
Adam     NULL      NULL          NULL
Since I want to see only customers that have placed orders, i.e. in this case I want to filter Adam out of the table, I uncomment the 'WHERE' row in the SQL query above.
This yields the desired result.
My question is:
On which table is the WHERE executed - on the original 'orders' table (which has no order_id that is NULL) or on the table that is result of the JOINs?
Apparently it is the latter, but just want to make sure, since it is not very obvious from the SQL syntax and it is a very important point.
Thank you
In this case, you're making the database work harder than it has to. The WHERE operates on the result of the joins (likely after a merge join, or something along those lines).
There's a chance the optimizer realizes what you're doing and changes the plan to an INNER JOIN for you. But I can't be certain (and neither can you: how it optimizes can change over time).
If you only want rows where an order exists, use an INNER JOIN instead. The database will handle that much more efficiently.
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
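The claim that WHERE is applied to the joined result, and that the filtered LEFT JOIN matches the INNER JOIN, can be checked with a sketch on Python's sqlite3 module. The rows mirror the table in the question; the numeric customer ids and the column name refund are assumptions, since the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE refunds (order_id INTEGER, refund REAL);
    INSERT INTO customers VALUES (1, 'Natalie'), (2, 'Brenda'), (3, 'Adam');
    INSERT INTO orders VALUES (2, 1, 12.5), (3, 1, 18.0), (4, 2, 20.0);
    INSERT INTO refunds VALUES (3, 18.0);
""")

SELECT_LIST = "SELECT customers.name, orders.order_id, refunds.refund FROM customers "
REFUND_JOIN = "LEFT JOIN refunds ON orders.order_id = refunds.order_id "

# Unfiltered: Adam appears with NULLs, which exist only in the joined result.
all_rows = conn.execute(
    SELECT_LIST
    + "LEFT JOIN orders ON customers.customer_id = orders.customer_id "
    + REFUND_JOIN
).fetchall()

# LEFT JOIN plus a WHERE on the joined rows.
left_filtered = conn.execute(
    SELECT_LIST
    + "LEFT JOIN orders ON customers.customer_id = orders.customer_id "
    + REFUND_JOIN
    + "WHERE orders.order_id IS NOT NULL ORDER BY orders.order_id"
).fetchall()

# INNER JOIN to orders, no WHERE needed.
inner = conn.execute(
    SELECT_LIST
    + "INNER JOIN orders ON customers.customer_id = orders.customer_id "
    + REFUND_JOIN
    + "ORDER BY orders.order_id"
).fetchall()

print(left_filtered)  # [('Natalie', 2, None), ('Natalie', 3, 18.0), ('Brenda', 4, None)]
print(left_filtered == inner)  # True
```

The orders table itself never contains a NULL order_id, so the filter can only be acting on the rows produced by the join.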
You can change the first LEFT JOIN to an INNER JOIN to eliminate customers who don't have any orders:
SELECT ...
FROM customers INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
It's because you're using LEFT JOIN, which returns all rows from the left-hand table (in your case the customers table) and returns NULL where no corresponding values appear in the right-hand tables.
Just rewrite it using inner joins, so only rows where matching data is found will be returned. Note that the inner join to refunds also drops orders that have no refund; keep that one as a LEFT JOIN if those orders should still be listed.
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
INNER JOIN refunds
ON orders.order_id=refunds.order_id;

What is the most efficient way to write a select statement with a "not in" subquery?

What is the most efficient way to write a select statement similar to the below.
SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)
The gist is you want the records from one table when the item is not in another table.
For starters, a link to an old article in my blog on how NOT IN predicate works in SQL Server (and in other systems too):
Counting missing rows: SQL Server
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
Most databases, however, will treat these two queries the same.
Both these queries will use some kind of an ANTI JOIN.
This is useful for SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
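As a sketch of the two-column case, the NOT EXISTS form simply correlates on both columns. The example runs on Python's sqlite3 module; col1 and col2 are the placeholder column names used above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (col1 INTEGER, col2 INTEGER);
    CREATE TABLE HeldOrders (col1 INTEGER, col2 INTEGER);
    -- hypothetical data: only the pair (1, 2) is held
    INSERT INTO Orders VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO HeldOrders VALUES (1, 2);
""")

# A row is excluded only when both columns match the same HeldOrders row.
not_held = conn.execute("""
    SELECT o.col1, o.col2
    FROM Orders o
    WHERE NOT EXISTS
    (
        SELECT NULL
        FROM HeldOrders ho
        WHERE ho.col1 = o.col1
          AND ho.col2 = o.col2
    )
    ORDER BY o.col1, o.col2
""").fetchall()

print(not_held)  # [(1, 1), (2, 1)]
```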
Note, however, that NOT IN may be tricky due to the way it treats NULL values.
If HeldOrders.OrderID is nullable and the subquery returns even a single NULL, then for any row with no match the predicate evaluates to NULL rather than TRUE or FALSE (for both IN and NOT IN), so the NOT IN query returns nothing at all.
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1.
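The reason lies in three-valued logic, which can be inspected directly by evaluating the predicates as plain expressions. This sketch uses Python's sqlite3 module, where SQL NULL comes back as None and booleans as 0 or 1; the literal lists stand in for the HeldOrders values 2 and NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OrderID 1 against the list (2, NULL): neither IN nor NOT IN can be decided.
in_result, not_in_result = conn.execute(
    "SELECT 1 IN (2, NULL), 1 NOT IN (2, NULL)"
).fetchone()

# A value that is definitely in the list still gives a definite answer.
match_in, match_not_in = conn.execute(
    "SELECT 2 IN (2, NULL), 2 NOT IN (2, NULL)"
).fetchone()

# WHERE keeps a row only when the predicate is true; NULL is not true.
kept = conn.execute("SELECT 'row kept' WHERE 1 NOT IN (2, NULL)").fetchall()

print(in_result, not_in_result)  # None None
print(match_in, match_not_in)    # 1 0
print(kept)                      # []
```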
Note that the LEFT JOIN solution proposed by others is far from the most efficient one.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that needs to evaluate and filter out all matching rows, which can be numerous.
An ANTI JOIN method used by both NOT IN and NOT EXISTS just needs to make sure that a record does not exist, once per row in Orders, so it will eliminate all possible duplicates first:
NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
A HASH ANTI JOIN will eliminate duplicates when building the hash table.
"Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
You can use a LEFT OUTER JOIN and check for NULL on the right table.
SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
I'm not sure what is the most efficient, but other options are:
1. Use EXISTS
SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
FROM HeldOrders HO
WHERE O.Order_ID = HO.Order_ID)
2. Use EXCEPT
SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders HO
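A small sketch of the EXCEPT variant, run on Python's sqlite3 module with invented rows. Two properties are worth noticing: EXCEPT works on distinct rows, so a duplicated Order_ID is returned once, and a NULL in HeldOrders does not empty the result the way it does with NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_ID INTEGER);
    CREATE TABLE HeldOrders (Order_ID INTEGER);
    -- hypothetical data: Order_ID 1 is duplicated, HeldOrders contains a NULL
    INSERT INTO Orders VALUES (1), (1), (2), (3);
    INSERT INTO HeldOrders VALUES (2), (NULL);
""")

remaining = conn.execute("""
    SELECT O.Order_ID
    FROM Orders O
    EXCEPT
    SELECT HO.Order_ID
    FROM HeldOrders HO
    ORDER BY 1
""").fetchall()

print(remaining)  # [(1,), (3,)]
```

Because EXCEPT compares whole rows of the select list, it returns only the compared columns; joining back to Orders would be needed to fetch the remaining columns.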
Try
SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL