How to select joined rows even if there is no match? - sql

I checked so many similar questions but none apply to Firebird I guess.
I have two tables; one stores the customer information and the second stores the stock activities (which also includes orders). I'd like to fetch all customers and the counts of orders they have made. But no matter how I join the orders table; I end up with only the customers that have at least one order. That means customers who don't have a match in the stock activities table won't show up in the result set.
Here is the query I run;
SELECT
C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S ON C.ID = S.CUSTOMERID
WHERE C.GROUPNAME = 'B'
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
Without the join, I get 570 rows (of customers) and it's the correct result set. When I join the orders table to fetch the total order amount of these customers; I get only 379 results; which are the ones having at least one order. That means customers who don't have orders won't return. As you might have guessed; I want to have the customers having zero activity to return "0" as order amount and revenue.

The problem is that your WHERE clause filters on the "right hand" table's values.
WHERE ...
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
When the outer join generates records for "unmatched" rows from the left table, it supplies NULL values for all columns from the right table. So S.TYPE is NULL for those records.
There are two possible solutions:
Explicitly allow for the "NULL record" case in your WHERE logic.
By some standards this might be "more pure" in separating join conditions from filters, but it can get fairly complicated (and hence error-prone). One issue to be aware of is that you may have to distinguish generated NULL records from "real" records of the right table that just happen to have some NULL data.
Testing for the right table's value for the join key to be NULL should be reasonably safe. You could test for the right table's PK value to be NULL (assuming you have a true PK on that table).
Move the predicate from the WHERE clause to the outer join's ON clause.
This is very simple, and looks like
SELECT C.NAME, C.GROUPNAME, C.EMAIL,
COALESCE(COUNT(DISTINCT S.ORDERNO), '0') AS TOTALORDERS,
COALESCE(SUM(S.AMOUNT), '0') as TOTALREVENUE
FROM CUSTOMERS C
LEFT OUTER JOIN STOCK_ACTIVITY S
ON C.ID = S.CUSTOMERID
AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE')
WHERE C.GROUPNAME = 'B'
GROUP BY C.NAME, C.GROUPNAME, C.EMAIL
This effectively filters the STOCK_ACTIVITY records presented to the join before attempting to match them against CUSTOMERS records (meaning the NULL records can still be generated without interference). ("Effectively" because it's folly to talk like you know what steps the DBMS will follow; all we can say is this has the same effect that you'd get by following certain steps...)

If there is no STOCK_ACTIVITY for a CUSTOMER a line full of NULLs will be attached. This also means that the WHERE statement AND (S.TYPE = 'RECEIPT' OR S.TYPE = 'INVOICE') never can be true for those lines.

Keep the aggregate operation separated from the JOIN. That is the cleanest. First do the grouping then join the additional information.

Related

T-SQL "not in (select " not working (as expected)

I have an ordinary one-to-many relation:
customer.id = order.customerid
I want to find customers who have no associated orders.
I tried:
-- one record
select * from customers where id = 123
-- no records
select * from orders where customerid = 123
-- NO RECORDS
select *
from customers
where id not in (select customerid from orders)
-- many records, as expected.
select *
from customers
where not exist (select customerid from orders
where customers.customerid = customer.id)
Am I mistaken, or should it work?
NOT IN does not behave as expected when the in-list contains NULL values.
In fact, if any values are NULL, then no rows are returned at all. Remember: In SQL, NULL means "indeterminate" value, not "missing value". So, if the list contains any NULL value then it might be equal to a comparison value.
So, customerid must be NULL in the orders table.
For this reason, I strongly recommend that you always use NOT EXISTS with a subquery rather than NOT IN.
I normally do this via a left join that looks for nulls created when the join fails:
SELECT c.*
FROM
customers c
LEFT JOIN orders o
ON c.id = o.customerid
WHERE
o.customerid IS NULL
Left join treats the customer table as "solid" and connects orders to it where there is an order with a given customer id and puts nulls where there isn't any matching order, hence the orders side of the relationship has "holes" in the data. By then saying we only want to see the holes (via the where clause), we get a list of "customers with no orders"
As per the comments I've always worked to the rule "do not use IN for lists longer than you'd be prepared to write by hand" but increasingly optimisers are rewriting IN, EXISTS and LEFT JOIN WHERE NULL queries to function identically as they're all recognised patterns of "data in A that has no matching data in B"

2 Tables - one customer, one transactions. How to handle a customer with no transaction?

I have 2 tables-one customers, one transactions. One customer does not have any transactions. How do I handle that? As I'm trying to join my tables, the customer with no transaction does not show up as shown in code below.
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
INNER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I need to be able to account for the customer with no transaction such as querying for least transaction amount.
Use below updated query - Right Outer join is used instead of Inner join to show all customers regardless of the customer placed an order yet.
SELECT Orders.Customer_Id, Customers.AcctOpenDate,
Customers.CustomerFirstName, Customers.CustomerLastName,
Orders.TxnDate, Orders.Amount
FROM Orders
Right Outer JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
INNER Joins show only those records that are present in BOTH tables
OUTER joins gets SQL to list all the records present in the designated table and shows NULLs for the fields in the other table that are not present
LEFT OUTER JOIN (the first table)
RIGHT OUTER JOIN (the second table)
FULL OUTER JOIN (all records for both tables)
Get up to speed on the join types and how to handle NULLS and that is 90% of writing SQL script.
Below is the same query with a left join and using ISNULL to turn the amount column into 0 if it has no records present
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName
, Orders.TxnDate, ISNULL(Orders.Amount,0)
FROM Customers
LEFT OUTER JOIN Orders ON Orders.Customer_Id=Customers.Customer_Id;
try this :
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
Right OUTER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I strongly recommend LEFT JOIN. This keeps all rows in the first table, along with matching columns in the second. If there are no matching rows, these columns are NULL:
SELECT c.Customer_Id, c.AcctOpenDate, c.CustomerFirstName, c.CustomerLastName,
o.TxnDate, o.Amount
FROM Customers c LEFT JOIN
Orders o
ON o.Customer_Id = c.Customer_Id;
Although you could use RIGHT JOIN, I never use RIGHT JOINs, because I find them much harder to follow. The logic of "keep all rows in the first table I read" is relatively simple. The logic of "I don't know which rows I'm keeping until I read the last table" is harder to follow.
Also note that I included table aliases and change the CustomerId to come from customers -- the table where you are keeping all rows.
Using CASE will replace "null" with 0 then you can sum the values. This will count customers with no transactions.
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as TransactionsPerCustomer
FROM Customers c
LEFT JOIN Transactions t
ON c.Name = t.customerID
group by c.Name
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as numberoftransaction
FROM customers c
LEFT JOIN transactions t
ON c.Name = t.customerID
group by c.Name

How to put conditions on left joins

I have two tables, CustomerCost and Products that look like the following:
I am joining the two tables using the following SQL query:
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod ON Cost.productId =prod.productId
WHERE prod.productId=4
AND (Cost.Customer_Id =2717
OR Cost.Customer_Id IS NULL)
The result of the join is:
joins result
What i want to do is when I pass customerId 2717 it should return only specific customer cost i.e. 258.93, and when customerId does not match then only it should take cost as 312.50
What am I doing wrong here?
You can get your expected output as follows:
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
INNER JOIN PRODUCTS prod ON Cost.productId = prod.productId
WHERE prod.productId=4
AND Cost.Customer_Id = 2717
However, if you want to allow customer ID to be passed as NULL, you will have to change the last line to AND Cost.Customer_Id IS NULL. To do so dynamically, you'll need to use variables and generate the query based on the input.
The problem in the original query that you have posted is that you have used an alias called custCost which is not present in the query.
EDIT: Actually, you don't even need a join. The CUSTOMERCOST table seems to have both Customer and Product IDs.
You can simply:
SELECT
Cost.ProductId, Cost.CustomerCost
FROM
CUSTOMERCOST Cost
WHERE
Cost.Customer_Id = 2717
AND Cost.productId = 4
You seem to want:
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id = 2717
UNION ALL
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id IS NULL AND
NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2 WHERE c2.productId = 4 AND c2.Customer_Id = 2717);
That is, take the matching cost, if it exists for the customer. Otherwise, take the default cost.
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod
ON Cost.productId =prod.productId
AND (Cost.Customer_Id =2717 OR Cost.Customer_Id IS NULL)
WHERE prod.productId=4
WHERE applies to the joined row. ON controls the join condition.
Outer joins are why FROM and ON were added to SQL-92. The old SQL-89
syntax had no support for them, and different vendors added different,
incompatible syntax to support them.

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skill are close to none it looks like it's doing its job. But now I would like to be able to show 0 whenever the one customer don't have sales for a particular week (weeks are just integers). And I wonder if somehow I should get distinct values of the weeks in the Sales table, and then loop through them (not sure how)
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.CustomerId = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.AddressId
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
You should generate a week numbers, rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeek.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Excluding multiple results in specific column (SQL JOIN)

I'm taking my first steps in terms of practical SQL use in real life.
I have a few tables with contractual and financial information and the query works exactly as I need - to a certain point. It looks more or less like that:
SELECT /some columns/ from CONTRACTS
Linked 3 extra tables with INNER JOIN to add things like department names, product information etc. This all works but they all have simplish one-to-one relationship (one contract related to single department in Department table, one product information entry in the corresponding table etc).
Now this is my challenge:
I also need to add contract invoicing information doing something like:
inner join INVOICES on CONTRACTS.contnoC = INVOICES.contnoI
(and selecting also the Invoice number linked to the Contract number, although that's partly optional)
The problem I'm facing is that unlike with other tables where there's always one-to-one relationship when joining tables, INVOICES table can have multiple (or none at all) entries that correspond to a single contract no. The result is that I will get multiple query results for a single contract no (with different invoice numbers presented), needlessly crowding the query results.
Essentially I'm looking to add INVOICES table to a query to just identify if the contract no is present in the INVOICES table (contract has been invoiced or not). Invoice number itself could be presented (it is with INNER JOIN), however it's not critical as long it's somehow marked. Invoice number fields remains blank in the result with the INNER JOIN function, which is also necessary (i.e. to have the row presented even if the match is not found in INVOICES table).
SELECT DISTINCT would look to do what I need, but I seemed to face the problem that I need to levy DISTINCT criteria only for column representing contract numbers, NOT any other column (there can be same values presented, but all those should be presented).
Unfortunately I'm not totally aware of what database system I am using.
Seems like the question is still getting some attention and in an effort to provide some explanation here are a few techniques.
If you just want any contract with details from the 1 to 1 tables you can do it similarily to what you have described. the key being NOT to include any column from Invoices table in the column list.
SELECT
DISTINCT Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
INNER JOIN Invoices i
ON c.contnoC = i.contnoI
Perhaps a Little cleaner would be to use IN or EXISTS like so:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC )
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
contnoC IN (SELECT contnoI FROM Invoices)
Don't use IN if the SELECT ... list can return a NULL!!!
If you Actually want all of the contracts and just know if a contract has been invoiced you can use aggregation and a case expression:
SELECT
Contract, Department, ProductId, CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END as Invoiced
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
LEFT JOIN Invoices i
ON c.contnoC = i.contnoI
GROUP BY
Contract, Department, ProductId
Then if you actually want to return details about a particular invoice you can use a technique similar to that of cybercentic87 if your RDBMS supports or you could use a calculated column with TOP or LIMIT depending on your system.
SELECT
Contract, Department, ProductId, (SELECT TOP 1 InvoiceNo FROM invoices i WHERE c.contnoC = i.contnoI ORDER BY CreateDate DESC) as LastestInvoiceNo
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
GROUP BY
Contract, Department, ProductId
I would do it this way:
with mainquery as(
<<here goes you main query>>
),
invoices_rn as(
select *,
ROW_NUMBER() OVER (PARTITION BY contnoI order by
<<some column to decide which invoice you want to take eg. date>>) as rn
)
invoices as (
select * from invoices_rn where rn = 1
)
select * from mainquery
left join invoices i on contnoC = i.contnoI
This gives you an ability to get all of the invoice details to your query, also it gives you full control of which invoice you want see in your main query. Please read more about CTEs; they are pretty handy and much easier to understand / read than nested selects.
I still don't know what database you are using. If ROW_NUMBER is not available, I will figure out something else :)
Also with a left join you should use COALESCE function for example:
COALESCE(i.invoice_number,'0')
Of course this gives you some more possibilities, you could for example in your main select do:
CASE WHEN i.invoicenumber is null then 'NOT INVOICED'
else 'INVOICED'
END as isInvoiced
You can use
SELECT ..., invoiced = 'YES' ... where exists ...
union
SELECT ..., invoiced = 'NO' ... where not exists ...
or you can use a column like "invoiced" with a subquery into invoices to set it's value depending on whether you get a hit or not