In interviews I often get asked, "What is an outer join in SQL?"
While that can be answered, I wonder what some classic, good real-life examples are where a (LEFT) OUTER JOIN is used.
Take the Customers and Orders tables in the Northwind database.
An inner join will only give you customers that have placed orders.
A left outer join from Customers to Orders will give you all customers, along with the orders of those customers that have placed any.
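A minimal sketch of that difference, run against an in-memory SQLite database through Python's sqlite3 module. The two tiny tables and their rows are made up for illustration and only mimic the shape of the Northwind Customers and Orders tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down stand-ins for the Northwind tables.
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI', 'Alfreds'), ('PARIS', 'Paris specialites');
    INSERT INTO Orders VALUES (1, 'ALFKI'), (2, 'ALFKI');
""")

# Inner join: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.CustomerID, o.OrderID
    FROM Customers c
    INNER JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""").fetchall()

# Left outer join: every customer; OrderID is NULL (None) when there is no order.
outer_rows = conn.execute("""
    SELECT c.CustomerID, o.OrderID
    FROM Customers c
    LEFT OUTER JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""").fetchall()

print(inner_rows)
print(outer_rows)
```

In this sketch the customer 'PARIS' has no orders, so it appears only in the outer join result, with None in place of an OrderID.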
To add to Robin Day's answer, you can also use a Left Outer Join to grab only customers who have NOT placed orders by checking for NULL.
SELECT *
FROM Customer
LEFT OUTER JOIN Order
ON Customer.CustomerId = Order.CustomerId
WHERE Order.CustomerId IS NULL
Following is the visual representation of the left outer join:
SELECT <select_list>
FROM Table_A A
LEFT JOIN Table_B B
ON A.Key = B.Key
Read more about joins in the article below:
http://www.codeproject.com/KB/database/Visual_SQL_Joins.aspx (one of the best articles on the subject; a must-read)
A LEFT OUTER JOIN can be used when you want all records from one table, as well as records from another table if any.
E.g., given table User and Address, where Address has a FK to User and there could be 0 or more addresses per user:
select *
from User u
left outer join Address a on u.UserID = a.UserID
This will ensure you get all User records, regardless of whether there was a corresponding Address record or not.
If you want to show all Users that do not have addresses, you can do this:
select *
from User u
left outer join Address a on u.UserID = a.UserID
where a.UserID is null
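Both queries can be tried out with a small sketch like the following, which uses Python's sqlite3 module and an in-memory database. The table is named Users here (an assumption, chosen because USER is a reserved word in some database systems), and the columns Name and City as well as all rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: Address has a foreign key to Users.
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,
        UserID INTEGER REFERENCES Users(UserID),
        City TEXT
    );
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Address VALUES (10, 1, 'Oslo'), (11, 1, 'Rome'), (12, 3, 'Lima');
""")

# All users, with their addresses if they have any.
all_users = conn.execute("""
    SELECT u.Name, a.City
    FROM Users u
    LEFT OUTER JOIN Address a ON u.UserID = a.UserID
    ORDER BY u.UserID, a.AddressID
""").fetchall()

# Only users without any address: the joined key is NULL for them.
without_address = conn.execute("""
    SELECT u.Name
    FROM Users u
    LEFT OUTER JOIN Address a ON u.UserID = a.UserID
    WHERE a.UserID IS NULL
""").fetchall()

print(all_users)
print(without_address)
```

Bob has no address in this data, so he shows up with None in the first result and is the only row in the second.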
The classic example is customers and orders. Some customers have orders and others do not. You want to show a list of customers with total sales. So you do a left outer join from the customer to the order and get:
Customer A: $100;
Customer B: $0;
Customer C: $500
instead of:
Customer A: $100;
Customer C: $500
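One possible way to produce the first list is sketched below with Python's sqlite3 module. The table and column names (Customer, Orders, Amount) and the order rows are assumptions made for this example; COALESCE turns the NULL sum of a customer without orders into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; Customer B has no orders.
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER, Amount INTEGER);
    INSERT INTO Customer VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO Orders VALUES (1, 1, 60), (2, 1, 40), (3, 3, 500);
""")

# The left outer join keeps Customer B; SUM over no rows is NULL, COALESCE makes it 0.
totals = conn.execute("""
    SELECT c.Name, COALESCE(SUM(o.Amount), 0) AS TotalSales
    FROM Customer c
    LEFT OUTER JOIN Orders o ON o.CustomerId = c.CustomerId
    GROUP BY c.CustomerId, c.Name
    ORDER BY c.Name
""").fetchall()

print(totals)
```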
Here is an example:
I need a list of all customers with their vouchers, including the customers that never used a voucher.
SELECT *
FROM Customer
LEFT OUTER JOIN Voucher
ON Customer.CustomerId = Voucher.CustomerId
Get a list of all customers including any details of orders they have made. Some customers may not have made orders and therefore an INNER JOIN would exclude them from this list.
SELECT
*
FROM
Customer
LEFT OUTER JOIN
Order
ON
Customer.CustomerId = Order.CustomerId
I need to do a query across 3 tables to pull all the data through. Here is what I have so far; I am not sure how to tie it all together. I basically want to end up knowing the current balance and amount for each account and the related email address.
SELECT account.account.AccountId, account.account.CustomerId, account.account.RentAmount, account.deposit.Balance, customer.customer.EmailAddress
FROM account.account
INNER JOIN account.deposit ON account.account.CustomerId=customer.customer.CustomerId
INNER JOIN customer.customer ON account.account.AccountId=account.deposit.AccountId
--Output Needed:
--Rent Amount
--Balance
--Email Address
It seems you are just beginning with SQL. It doesn't matter whether you select from one table or more, there is just one SELECT clause. List all columns you want to show and use the table names to qualify the columns, so the DBMS knows which column of which table you are talking about, e.g.
SELECT account.AccountId, account.CustomerId, sales.CurrentBalance
In a join you specify on which conditions you want to join the tables. You are confusing two syntaxes here. One option is to explicitly name the condition in an ON clause:
FROM account.account
INNER JOIN account.sales ON sales.AccountId = account.AccountId
Another option is to list the columns you want to be equal in the joined table rows:
FROM account.account
INNER JOIN account.sales USING (AccountId)
This second option, however, is not available in SQL Server. SQL Server doesn't feature the standard SQL USING clause.
So, the basic join you are looking for is this:
FROM account.account
INNER JOIN account.sales ON sales.AccountId = account.AccountId
INNER JOIN customer.customer ON customer.CustomerId = account.CustomerId
This gets more readable with table aliases:
FROM account.account a
INNER JOIN account.sales s ON s.accountid = a.accountid
INNER JOIN customer.customer c ON c.customerid = a.customerid
But then there may be accounts without any sales yet. In that case, if we wanted to see the accounts in our results despite the missing matching sales rows, we'd have to use an outer join:
FROM account.account a
LEFT OUTER JOIN account.sales s ON s.accountid = a.accountid
INNER JOIN customer.customer c ON c.customerid = a.customerid
This will give us all accounts with all their sales joined. Now you could do this and then aggregate your resulting rows such that you end up with one row per account. It is generally better, though, to join what you want to join in the first place. In your case you want to join sale sums (?) to the accounts. Something along the lines of
SELECT a.accountid, s.total_amount, c.emailaddress
FROM account.account a
LEFT OUTER JOIN
(
SELECT accountid, SUM(amount) AS total_amount
FROM account.sales
GROUP BY accountid
) s ON s.accountid = a.accountid
INNER JOIN customer.customer c ON c.customerid = a.customerid
ORDER BY a.accountid;
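The query above can be exercised with a sketch like this one, using Python's sqlite3 module. SQLite has no schemas in the SQL Server sense, so the tables are simply called account, sales and customer here; those names, the columns and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; account 2 has no sales rows yet.
    CREATE TABLE customer (customerid INTEGER PRIMARY KEY, emailaddress TEXT);
    CREATE TABLE account (accountid INTEGER PRIMARY KEY, customerid INTEGER);
    CREATE TABLE sales (accountid INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (10, 'a@example.com'), (20, 'b@example.com');
    INSERT INTO account VALUES (1, 10), (2, 20);
    INSERT INTO sales VALUES (1, 50), (1, 25);
""")

# Aggregate sales per account first, then outer join the sums to the accounts.
rows = conn.execute("""
    SELECT a.accountid, s.total_amount, c.emailaddress
    FROM account a
    LEFT OUTER JOIN
    (
        SELECT accountid, SUM(amount) AS total_amount
        FROM sales
        GROUP BY accountid
    ) s ON s.accountid = a.accountid
    INNER JOIN customer c ON c.customerid = a.customerid
    ORDER BY a.accountid
""").fetchall()

print(rows)
```

Account 2 is kept with a total of None; wrapping s.total_amount in COALESCE(..., 0) would show 0 instead.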
I have 2 tables-one customers, one transactions. One customer does not have any transactions. How do I handle that? As I'm trying to join my tables, the customer with no transaction does not show up as shown in code below.
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
INNER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I need to be able to account for the customer with no transaction such as querying for least transaction amount.
Use the updated query below. A right outer join is used instead of an inner join to show all customers, regardless of whether the customer has placed an order yet.
SELECT Orders.Customer_Id, Customers.AcctOpenDate,
Customers.CustomerFirstName, Customers.CustomerLastName,
Orders.TxnDate, Orders.Amount
FROM Orders
Right Outer JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
INNER joins show only those records that are present in BOTH tables.
OUTER joins get SQL to list all the records present in the designated table and show NULLs for the fields of the other table where no matching record is present:
LEFT OUTER JOIN (all records of the first table)
RIGHT OUTER JOIN (all records of the second table)
FULL OUTER JOIN (all records of both tables)
Get up to speed on the join types and how to handle NULLs, and that is 90% of writing SQL scripts.
Below is the same query with a left join and using ISNULL to turn the amount column into 0 if it has no records present
SELECT Customers.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName
, Orders.TxnDate, ISNULL(Orders.Amount,0) AS Amount
FROM Customers
LEFT OUTER JOIN Orders ON Orders.Customer_Id=Customers.Customer_Id;
Try this:
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
Right OUTER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I strongly recommend LEFT JOIN. This keeps all rows in the first table, along with matching columns in the second. If there are no matching rows, these columns are NULL:
SELECT c.Customer_Id, c.AcctOpenDate, c.CustomerFirstName, c.CustomerLastName,
o.TxnDate, o.Amount
FROM Customers c LEFT JOIN
Orders o
ON o.Customer_Id = c.Customer_Id;
Although you could use RIGHT JOIN, I never use RIGHT JOINs, because I find them much harder to follow. The logic of "keep all rows in the first table I read" is relatively simple. The logic of "I don't know which rows I'm keeping until I read the last table" is harder to follow.
Also note that I included table aliases and changed Customer_Id to come from Customers, the table where you are keeping all rows.
Using CASE replaces each NULL with 0 (and each matching row with 1), so you can then sum the values. This counts the transactions per customer and still lists customers with no transactions, showing 0 for them.
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as TransactionsPerCustomer
FROM Customers c
LEFT JOIN Transactions t
ON c.Name = t.customerID
group by c.Name
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as numberoftransaction
FROM customers c
LEFT JOIN transactions t
ON c.Name = t.customerID
group by c.Name
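As a sketch of what this computes, the following Python sqlite3 example runs the CASE-based count next to COUNT(t.ID). COUNT over a column skips NULLs, so both give 0 for a customer without transactions. The two tables and their rows are hypothetical and follow the join condition used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; Bob has no transactions.
    CREATE TABLE customers (Name TEXT PRIMARY KEY);
    CREATE TABLE transactions (ID INTEGER PRIMARY KEY, customerID TEXT);
    INSERT INTO customers VALUES ('Ann'), ('Bob');
    INSERT INTO transactions VALUES (1, 'Ann'), (2, 'Ann');
""")

counts = conn.execute("""
    SELECT c.Name,
           SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) AS via_case,
           COUNT(t.ID) AS via_count  -- COUNT(column) ignores NULLs
    FROM customers c
    LEFT JOIN transactions t ON c.Name = t.customerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()

print(counts)
```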
I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id -- this inner join might be bad mojo
Not that they have to be, but they should be (or perhaps a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
order o
on l.orderid = o.id
The left join says, "keep all customers, even if there is no matching record in ledger". The inner join says, "I have to have a matching ledger record", because a NULL l.orderid never equals an o.id. So, the inner join in effect converts the left join to an inner join.
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
order o
on l.orderid = o.id
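The effect can be seen in a small sketch using Python's sqlite3 module. The three tables hold invented rows, and the order table is written as "order" in double quotes because ORDER is a reserved word; customer 2 has no ledger entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: customer 2 has no ledger row and therefore no order.
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE ledger (customerid INTEGER, orderid INTEGER);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY);
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO ledger VALUES (1, 100);
    INSERT INTO "order" VALUES (100);
""")

# LEFT JOIN followed by INNER JOIN: customer 2 is lost again.
left_then_inner = conn.execute("""
    SELECT c.id, l.orderid, o.id
    FROM customer c
    LEFT JOIN ledger l ON c.id = l.customerid
    INNER JOIN "order" o ON l.orderid = o.id
    ORDER BY c.id
""").fetchall()

# LEFT JOIN all the way: customer 2 is kept with NULLs.
left_then_left = conn.execute("""
    SELECT c.id, l.orderid, o.id
    FROM customer c
    LEFT JOIN ledger l ON c.id = l.customerid
    LEFT JOIN "order" o ON l.orderid = o.id
    ORDER BY c.id
""").fetchall()

print(left_then_inner)
print(left_then_left)
```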
You remember some parts of it correctly!
The thing is, when you chain join tables like this
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id
The joins are executed sequentially, so when customer left join ledger happens, you are making sure all rows from customer are returned (because it's a left join and you placed customer on the left).
Next,
The result of the former join is joined with order (using an inner join), forcing the ledger.orderid values from the first join to match an id in order, so you end up only with records that were matched in the order table as well.
Bad mojo? It really depends on what you are trying to accomplish.
If you want to guarantee all records from customers return, you should keep "left joining" to it.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
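SELECT *
FROM
(
-- Step 1: customer LEFT JOIN ledger is evaluated first.
-- Columns are listed explicitly so the derived table has no duplicate names.
SELECT c.id AS customerid, l.orderid
FROM customer c
LEFT JOIN ledger l
ON c.id = l.customerid
) c_and_l
-- Step 2: the result is joined to order. Use LEFT JOIN here instead
-- if customers without a matching order must be kept.
INNER JOIN order o
ON c_and_l.orderid = o.id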
So now you understand that c_and_l is created first, and then joined to order (you can imagine it as 2 tables are joining again)
I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even though my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM Sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
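A minimal sketch of the CROSS JOIN plus LEFT JOIN approach, run with Python's sqlite3 module against the three tables described in the question. The key columns follow that description (Customers.Id, Addresses.Id, Addresses.CustomerId, Sales.AddressId); the rows are made up, and Bob deliberately has no sales in week 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo INTEGER, CustomerId INTEGER);
    CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Addresses VALUES (1, 'Main', 1, 1), (2, 'Oak', 2, 2);
    INSERT INTO Sales VALUES (1, 1, 10), (1, 1, 5), (1, 2, 20), (2, 2, 7);
""")

# CROSS JOIN builds every (customer, week) pair; LEFT JOIN attaches sales where they exist.
weekly = conn.execute("""
    SELECT c.Name, a.Street, a.StreetNo, w.Week,
           COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w
    LEFT JOIN Addresses a ON c.Id = a.CustomerId
    LEFT JOIN Sales s ON s.Week = w.Week AND s.AddressId = a.Id
    GROUP BY c.Name, a.Street, a.StreetNo, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()

for row in weekly:
    print(row)
```

Every customer gets one row per week found in Sales, and the missing week for Bob is reported with a Total of 0.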
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Sorry for the confusing title, but I wasn't sure how to define this issue.
The background is:
two tables - a customer table and an account table.
The tables are joined on a Customer ID number.
The Account Table contains a NAICS or SIC code field.
This NAICS (SIC) code is used to select the customers. Only specific SIC codes are defined in the report specification.
The customer, however, may have additional accounts under OTHER SIC or NAICS codes.
Both the Accounts with the SIC or NAICS code that is used in the WHERE to filter for the customer AND any additional accounts linked to that customer must be selected.
A simplified version of the query is here:
SELECT
dbo.CUSTOMER.customer_id,
dbo.CUSTOMER.customer_full_name,
dbo.ACCOUNT.account_id,
dbo.ACCOUNT.date_first_account_opened,
dbo.ACCOUNT.NAICS_No,
dbo.ACCOUNT.NAICS_description
FROM
dbo.CUSTOMER
LEFT OUTER JOIN dbo.ACCOUNT WITH (nolock)
ON dbo.CUSTOMER.account_id = dbo.ACCOUNT.account_id
WHERE
dbo.account.NAICS_No in ('6011','6062', '6021', '6022', '6035', '6036', '6029', '6081', '522110')-- SIC and NAICS codes
This code will return X number of customers and their associated accounts as selected by the condition in the WHERE clause. What I need to get are any OTHER accounts associated with a customer that are NOT in the WHERE filter list.
Any ideas?
When outer joining, don't put criteria on the outer-joined table in the WHERE clause. Your WHERE clause turns your join into an inner join because only found matches have a NAICS_No. Move the condition from the WHERE clause to the ON clause.
SELECT
dbo.CUSTOMER.customer_id,
dbo.CUSTOMER.customer_full_name,
dbo.ACCOUNT.account_id,
dbo.ACCOUNT.date_first_account_opened,
dbo.ACCOUNT.NAICS_No,
dbo.ACCOUNT.NAICS_description
FROM
dbo.CUSTOMER
LEFT OUTER JOIN dbo.ACCOUNT WITH (nolock)
ON dbo.CUSTOMER.account_id = dbo.ACCOUNT.account_id
AND dbo.account.NAICS_No in ('6011','6062', '6021', '6022', '6035', '6036', '6029', '6081', '522110')-- SIC and NAICS codes
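The difference between filtering in WHERE and filtering in ON can be sketched as follows with Python's sqlite3 module. The simplified customer and account tables, the join on customer_id and the rows are assumptions for this example and are not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: Ann has a target account (6011) and another one; Bob has no target account.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_full_name TEXT);
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, customer_id INTEGER, NAICS_No TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES (100, 1, '6011'), (101, 1, '9999'), (102, 2, '9999');
""")

# Filter in WHERE: rows with a NULL NAICS_No fail the test, so Bob disappears.
where_filtered = conn.execute("""
    SELECT c.customer_id, a.account_id
    FROM customer c
    LEFT OUTER JOIN account a ON a.customer_id = c.customer_id
    WHERE a.NAICS_No IN ('6011')
    ORDER BY c.customer_id
""").fetchall()

# Filter in ON: the join stays an outer join, so Bob is kept with NULLs.
on_filtered = conn.execute("""
    SELECT c.customer_id, a.account_id
    FROM customer c
    LEFT OUTER JOIN account a
        ON a.customer_id = c.customer_id
        AND a.NAICS_No IN ('6011')
    ORDER BY c.customer_id
""").fetchall()

print(where_filtered)
print(on_filtered)
```

Note that in this sketch neither variant lists Ann's second account (101); the ON version only restores the customers that the WHERE version dropped.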
Something like this should work:
SELECT
cu.customer_id,
cu.customer_full_name,
ac.account_id,
ac.date_first_account_opened,
ac.NAICS_No,
ac.NAICS_description
FROM
dbo.CUSTOMER cu
INNER JOIN dbo.ACCOUNT ac
ON cu.account_id = ac.account_id
WHERE cu.customer_id in (-- List of customers with at least one "target" account
select distinct cu2.customer_id
from dbo.CUSTOMER cu2
inner join dbo.ACCOUNT ac2
on ac2.account_id = cu2.account_id
where ac2.NAICS_No in ('6011','6062', '6021', '6022', '6035', '6036', '6029', '6081', '522110'))
The subquery gets a list of all customers with at least one of the specified accounts, and the "outer" query gets all accounts for those customers.
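A rough sketch of this subquery approach with Python's sqlite3 module is shown below. The simplified customer and account tables, the join on customer_id and the rows are invented for illustration; only one target code is used to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: Ann owns a target account (6011) plus one other; Bob owns no target account.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_full_name TEXT);
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, customer_id INTEGER, NAICS_No TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO account VALUES (100, 1, '6011'), (101, 1, '9999'), (102, 2, '9999');
""")

# Inner query: customers with at least one target account.
# Outer query: all accounts of those customers, whatever their code.
all_accounts = conn.execute("""
    SELECT c.customer_id, a.account_id, a.NAICS_No
    FROM customer c
    INNER JOIN account a ON a.customer_id = c.customer_id
    WHERE c.customer_id IN (
        SELECT a2.customer_id
        FROM account a2
        WHERE a2.NAICS_No IN ('6011')
    )
    ORDER BY a.account_id
""").fetchall()

print(all_accounts)
```

Ann qualifies through account 100, so her other account 101 is returned as well, while Bob's account is left out.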