Struggling to interpret this SQL join syntax - sql

Two pretty similar queries:
Query #1:
SELECT *
FROM employee e
LEFT JOIN employee_payments ep
INNER JOIN payments p ON ep.payment_id = p.id
ON ep.employee_id = e.id
Query #2:
SELECT *
FROM employee e
LEFT JOIN employee_payments ep ON ep.employee_id = e.id
INNER JOIN payments p ON ep.payment_id = p.id
But obviously crucially different syntax.
The way I learn these new syntax concepts best are to interpret them as plain English. So how could you describe what these are selecting?
I would expect that they'd produce the same results, but it feels to me like the LEFT JOIN in the second query acts as an INNER JOIN somehow - since a fraction of my results set are returned (i.e. the employees with payments).
If the first query 'says' "give me all employees, along with any available employee_payments (that have already been joined with their payment record)"- what does the second query say?

If the first query 'says' "give me all employees, along with any available employee_payments (that have already been joined with their payment record)"- what does the second query say?
I suppose you might put it as "Take all employees along with any available employee_payments. Join this with the payment records."
The "Join this with the payment records" is what filters out employees that don't have any associated employee_payments records: the attempt to join with the payment records will fail.
but it feels to me like the LEFT JOIN in the second query acts as an INNER JOIN somehow
It's not the LEFT JOIN that's doing the filtering, but it does indeed give the exact same result as if the LEFT JOIN had been an INNER JOIN.

In order to understand the logical order1 in which joins happen, you need to look at the ON clauses. For each ON clause that you encounter, you pair it with the closest previous JOIN clause that hasn't already been processed. This means that you first query is:
INNER JOIN ep to p (producing, say, ep')
LEFT JOIN e to ep'
And your second query is:
LEFT JOIN e to ep (producing, say, e')
INNER JOIN e' to p
Since the conditions of the INNER JOIN rely upon columns present in ep, this is why the different join orders matter here.
1The logical join order determines the final shape of the result set. SQL Server is free to perform joins in any order it sees fit, but it must produce results consistent with the logical join order.

Related

Is it true that all joins following a left join in a SQL query must also be left joins? Why or why not?

I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id -- this inner join might be bad mojo
Not that they have to be. They should be (or perhaps a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
order o
on l.orderid = o.id
The left join says "keep all customers, even if there is no matching record in ledger. The second says, "I have to have a matching ledger record". So, the inner join converts the first to an inner join.
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
order o
on l.orderid = o.id
You remember correctly some parts of it!
The thing is, when you chain join tables like this
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id
The JOIN is executed sequentialy, so when customer left join ledger happens, you are making sure all joined keys from customer return (because it's a left join! and you placed customers to the left).
Next,
The results of the former JOIN are joined with order (using inner join), forcing the "the first join keys" to match (1 to 1) with the keys from order so you will end up only with records that were matched in order table as well
Bad mojo? it really depends on what you are trying to accomplish.
If you want to guarantee all records from customers return, you should keep "left joining" to it.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
SELECT * FROM
(
(SELECT * from customer) c
LEFT JOIN
(SELECT * from ledger) l
ON
c.id= l.customerid
) c_and_l
INNER JOIN (OR PERHAPS LEFT JOIN)
(SELECT * FROM order) as o
ON c_and_l.orderid (better use c_and_l.id as you want to refer to customerid from customers table) = o.id
So now you understand that c_and_l is created first, and then joined to order (you can imagine it as 2 tables are joining again)

how to join these tables in sql

I have these tables and would like to query them to show the all clients and their groups (if any), the following image describes the case:
How to join tables to get the result using sql server?
This looks like a lesson teaching CROSS JOIN. Because you want a row in your result for each intersection of client and group, whether or not it is valid, you want to cross join those tables then see if there is a matching record in client_group. In a working application this cross join could get unwieldy very quickly, with a few thousand groups and clients you'd have many millions of results.
Something like this should get your cartesian result and see if a matching record is found:
SELECT
c.id 'client_id', g.Id 'group_id', ISNULL(cg.client_id)
FROM
(client c
CROSS JOIN group g)
LEFT JOIN client_group cg ON c.id = cg.client_id AND g.id = cg.group_id
More on joining:
What is the difference between Left, Right, Outer and Inner Joins?

When I add a LEFT OUTER JOIN, the query returns only a few rows

The original query returns 160k rows. When I add the LEFT OUTER JOIN:
LEFT OUTER JOIN Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
the query returns only 150 rows. I'm not sure what I'm doing wrong.
All I need to do is add a column to the query, which will bring back a code from a different table. The code could be a number or a NULL. I still have to display NULL, hence the reason for the LEFT join. They should join on the "id" columns.
SELECT <lots of stuff> + the new column that I need (called "code").
FROM
dbo.Table_A A WITH (NOLOCK)
INNER JOIN
dbo.Table_B B WITH (NOLOCK) ON A.Id = B.Id AND A.version = B.version
--this is where I added the LEFT OUTER JOIN. with it, the query returns 150 rows, without it, 160k rows.
LEFT OUTER JOIN
Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
LEFT OUTER JOIN
Table_E E WITH (NOLOCK) ON A.agent = E.agent
LEFT OUTER JOIN
Table_D D WITH (NOLOCK) ON E.location = D.location
AND E.type = 'Organization'
AND D.af_type = 'agent_location'
INNER JOIN
(SELECT X , MAX(Version) AS MaxVersion
FROM LocalTable WITH (NOLOCK)
GROUP BY agemt) P ON E.agent = P.location AND E.Version = P.MaxVersion
Does anyone have any idea what could be causing the issue?
When you perform a LEFT OUTER JOIN between tables A and E, you are maintaining your original set of data from A. That is to say, there is no data, or lack of data, in table E that can reduce the number of rows in your query.
However, when you then perform an INNER JOIN between E and P at the bottom, you are indeed opening yourself up to the possibility of reducing the number of rows returned. This will treat your subsequent LEFT OUTER JOINs like INNER JOINs.
Now, without your exact schema and a set of data to test against, this may or may not be the exact issue you are experiencing. Still, as a general rule, always put your INNER JOINs before your OUTER JOINs. It can make writing queries like this much, much easier. Your most restrictive joins come first, and then you won't have to worry about breaking any of your outer joins later on.
As a quick fix, try changing your last join to P to a LEFT OUTER JOIN, just to see if the Z join works.
You have to be very careful once you start with LEFT JOINs.
Let's suppose this model: You have tables Products, Orders and Customers. Not all products necessarily have been ordered, but every order must have customer entered.
Task: Show all products, and if the product was ordered, list the ordering customers; i.e., product without orders will be shown as one row, product with 10 orders will have 10 rows in the resultset. This calls for a query designed around FROM Products LEFT JOIN Orders.
Now someone could think "OK, Customer is always entered into orders, so I can make inner join from orders to customers". Wrong. Since the table Customers is joined through left-joined table Orders, it has to be left-joined itself... otherwise the inner join will propagate into the previous level(s) and as a result, you will lose all products that have no orders.
That is, once you join any table using LEFT JOIN, any subsequent tables that are joined through this table, need to keep LEFT JOINs. But it does not mean that once you use LEFT JOIN, all joins have to be of that type... only those that are dependent on the first performed LEFT JOIN. It would be perfectly fine to INNER JOIN the table Products with another table Category for example, if you only want to see Products which have a category set.
(Answer is based on this answer: http://www.sqlservercentral.com/Forums/Topic247971-8-1.aspx -> last entry)

Mysql statement (syntax error on FULL JOIN)

What is wrong with my sql statement, it says that the problem is near the FULL JOIN, but I'm stumped:
SELECT `o`.`name` AS `offername`, `m`.`name` AS `merchantName`
FROM `offer` AS `o`
FULL JOIN `offerorder` AS `of` ON of.offerId = o.id
INNER JOIN `merchant` AS `m` ON o.merchantId = m.id
GROUP BY `of`.`merchantId`
Please be gentle, as I am not a sql fundi
MySQL doesn't offer full join, you can either use
a pair of LEFT+RIGHT and UNION; or
use a triplet of LEFT, RIGHT and INNER and UNION ALL
The query is also very wrong, because you have a GROUP BY but your SELECT columns are not aggregates.
After you convert this properly to LEFT + RIGHT + UNION, you still have the issue of getting an offername and merchantname from any random record per each distinct of.merchantid, and not even necessarily from the same record.
Because you have an INNER JOIN condition against o.merchant, the FULL JOIN is not necessary since "offerorder" records with no match in "offer" will fail the INNER JOIN. That turns it into a LEFT JOIN (optional). Because you are grouping on of.merchantid, any missing offerorder records will be grouped together under "NULL" as merchantid.
This is a query that will work, for each merchantid, it will show just one offer that the merchant made (the one with the first name when sorted in lexicographical order).
SELECT MIN(o.name) AS offername, m.name AS merchantName
FROM offer AS o
LEFT JOIN offerorder AS `of` ON `of`.offerId = o.id
INNER JOIN merchant AS m ON o.merchantId = m.id
GROUP BY `of`.merchantId, m.name
Note: The join o.merchantid = m.id is highly suspect. Did you mean of.merchantid = m.id? If that is the case, change the LEFT to RIGHT join.

Combining OUTER JOIN and WHERE

I'm trying to fetch some data from a database.
I want to select an employee, and if available, all appointments and other data related to that employee.
This is the query:
SELECT
TA.id,
TEI.displayname,
TA.threatment_id,
TTS.appointment_date,
TEI.displayname
FROM
tblemployee AS TE
LEFT OUTER Join tblappointment AS TA ON TE.employeeid = TA.employee_id
Inner Join tblthreatment AS T ON TA.threatment_id = T.threatmentid
Inner Join tblappointments AS TTS ON TTS.id = TA.appointments_id AND
TTS.appointment_date = '2009-09-28'
INNER Join tblemployeeinfo AS TEI ON TEI.employeeinfoid = TE.employeeinfoid
Inner Join tblcustomercard AS TCC ON TCC.customercardid = TTS.customercard_id
WHERE
TE.employeeid = 4
The problem is, it just returns null for all fields selected when there are no appointments. What am I not getting here?
Edit:
For clearity, i removed some of the collumns. I removed one too many. TEI.displayname should at least be displayed.
Looking at the list of columns returned by your query, you will notice that they all come from the "right" side of the LEFT OUTER JOIN. You do not include any columns from the "left" side of the join. Therefore, the expected result is the one you are observing — NULL values supplied for all right-hand columns in the result set for those rows that have no right-hand rows returned.
To see data even for those rows, include some columns from TE (tblemployee) in the result set.
Looking at your query I'm guessing that the situation is a bit more complex and that some of those tables on the right-hand side of the join should be moved to the left-hand side and, furthermore, that some of the other tables might possibly require their own OUTER joins to participate correctly in the query.
Edited w/ response to questioner's comment:
You have an odd situation (maybe not odd at all, depending on your application) in which you have an employee table and a separate employee information (employeeinfo) table.
Because you are joining the employeeinfo to the appointments table with an INNER join you can effectively think of them as a single table in terms of how they contribute to the final result set. Because this combined table REQUIRES a record in the appointments table and because this combined table is joined into the main result set with a LEFT OUTER join, the effect is that the employeeinfo record is not found if there's no appointment to link it to.
If you move the employeeinfo table to the left side of the join, or replace the employee table w/ the employeeinfo table, you should get the results you want.
In your query, you LEFT OUTER JOIN to the tblappointment table, but then you INNER JOIN to the tblthreatment and tblappointments tables.
You should try and structure your query in the order that you expect data to be there. Then in most simple queries, once you perform an OUTER join, most tables after that will be an OUTER join. This is by NO MEANS a rule and complex queries can vary, but in the marjority of simple queries its a good practice.
Try something like this for your query.
SELECT
TA.id,
TEI.displayname,
TA.threatment_id,
TTS.appointment_date
FROM
tblemployee AS TE
INNER Join
tblemployeeinfo AS TEI
ON
TEI.employeeinfoid = TE.employeeinfoid
LEFT OUTER Join
tblappointment AS TA
ON
TE.employeeid = TA.employee_id
LEFT OUTER JOIN
tblthreatment AS T
ON
TA.threatment_id = T.threatmentid
LEFT OUTER JOIN
tblappointments AS TTS
ON
TTS.id = TA.appointments_id
AND
TTS.appointment_date = '2009-09-28'
LEFT OUTER JOIN
tblcustomercard AS TCC
ON
TCC.customercardid = TTS.customercard_id
WHERE
TE.employeeid = 4
The issue is that the way you're joining (most of everything is joining to your left outer-joined table) whenever you're joining off of that, if the value in the outer joined table is nothing, there is nothing for the other fields to join to. Try to re-adjust your query so everything is joining off of your employeeID. I normally use left joined tables after I've limited everything down as much as possible with inner joins.
So my query would be something like:
SELECT
TA.id,
TEI.displayname,
TA.threatment_id,
TTS.appointment_date
FROM
tblemployee AS TE
INNER Join tblemployeeinfo AS TEI ON TEI.employeeinfoid = TE.employeeinfoid
Inner Join tblthreatment AS T ON TA.threatment_id = T.threatmentid
Inner Join tblappointments AS TTS ON TTS.id = TA.appointments_id AND
TTS.appointment_date = '2009-09-28'
Inner Join tblcustomercard AS TCC ON TCC.customercardid = TTS.customercard_id
LEFT OUTER Join tblappointment AS TA ON TE.employeeid = TA.employee_id
WHERE
TE.employeeid = 4
where the last outer join just gives me one column worth of information, not using it all to join more things onto. For speed, you also want to try to limit your information down as fast as possible with your first few inner joins, and then you do the outer joins last to join possible null values on to the smallest dataset you can. I hope this helps, if it's confusing, I'm sorry... I haven't had my caffeine yet.
The query is performing as it should.
A left out join will select all records from one table, join them with the records in another, and produce nulls where no records in the second table are found that match the join condition.
If you're looking for a separate behavior, you may want to think about two separate queries.