How does one implement SQL joins without using the JOIN keyword?
This is not really necessary, but I thought that by doing this I could better understand what joins actually do.
The basic INNER JOIN is easy to implement.
The following:
SELECT L.XCol, R.YCol
FROM LeftTable AS L
INNER JOIN RightTable AS R
ON L.IDCol=R.IDCol;
is equivalent to:
SELECT L.XCol, R.YCol
FROM LeftTable AS L, RightTable AS R
WHERE L.IDCol=R.IDCol;
In order to extend this to a LEFT/RIGHT/FULL OUTER JOIN, you only need to UNION the rows with no match, along with NULL in the correct columns, to the previous INNER JOIN.
For a LEFT OUTER JOIN, add:
UNION ALL
SELECT L.XCol, NULL /* cast the NULL as needed */
FROM LeftTable AS L
WHERE NOT EXISTS (
SELECT * FROM RightTable AS R
WHERE L.IDCol=R.IDCol)
For a RIGHT OUTER JOIN, add:
UNION ALL
SELECT NULL, R.YCol /* cast the NULL as needed */
FROM RightTable AS R
WHERE NOT EXISTS (
SELECT * FROM LeftTable AS L
WHERE L.IDCol=R.IDCol)
For a FULL OUTER JOIN, add both of the above.
There is an older deprecated SQL syntax that allows you to join without using the JOIN keyword.. but I personally find it more confusing than any permutation of the JOIN operator I've ever seen. Here's an example:
SELECT A.CustomerName, B.Address1, B.City, B.State, B.Zip
FROM dbo.Customers A, dbo.Addresses B
WHERE A.CustomerId = B.CustomerId
In the older way of doing it, you join by separating the tables with a comma and specifying the JOIN conditions in the WHERE clause. Personally, I would prefer the JOIN syntax:
SELECT A.CustomerName, B.Address1, B.City, B.State, B.Zip
FROM dbo.Customers A
JOIN dbo.Addresses B
ON A.CustomerId = B.CustomerId
The reason you should shy away from this old style of join is clarity and readability. When you are simply joining one table to another, it's pretty easy to figure out what's going on. When you're combining multiple types of joins across a half dozen (or more) tables, this older syntax becomes very challenging to manage.
The best way to get a handle on the JOIN operator is working with it. Here's a decent visual example of what the different JOINs do:
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
Some more info:
https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-old-style-joins
http://www.sqlservercentral.com/blogs/brian_kelley/2009/09/30/the-old-inner-join-syntax-vs-the-new-inner-join-syntax/
When SQL was an infant we didn't have "inner join" "left outer join" etc. All we did was list the tables like this:
FROM table1, table2, table3, .... tablen
Then we had a where clause that was like a novel in length, some of the conditions were for filtering the data, many of the conditions were to join tables, like this
FROM table1, table2, table2, .... tablen
WHERE table1.code = 'x' and table1.id = table3.fk and table2.name like 'a%' and table2.id = table1.fk and tablen.fk = table3.id and table2.dt >= '2014-01-01'
from this we hoped like heck we had all the tables nicely related and we crossed our fingers. The worst case scenario - which happened a lot - was that we forgot to include a table at all in the where clause. This was not nice because what we get when we do that is a "Cartesian product" (basically a multiplication of all rows by the number of rows in the table we missed).
Then came ANSI standard join syntax, and life was better. We now place the join conditions on the join - not in the where clause - and as a bonus the where clause is easier to understand.
I don't think you will find it easier to understand this ancient syntax, for example an outer join was join = bizarre(+) or maybe it was (+)bizarre = join (I try not to remember).
Try http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
Related
I'm confused by a bit of a query i'm working with.
select *
from Table1
inner join Table2
on Table1.id1 = Table2.id1
right outer join Table3
right outer join Table4
inner join Table5
on Table4.id1 = Table5.id1
on Table3.id1 = Table5.id2
on Table1.id2 = Table5.id3
I tried to keep the query as close to what i'm working with as I could.
I don't understand the joins without the ON and then the join with multiple ONs.
Are tables 3 and 4 not actually being joined until after table 5 is joined?
The following doesn't work as Table5.id1 and Table5.id2 receive 'multi-part identifier "Table5.id_" could not be bound
select *
from Table1
inner join Table2
on Table1.id1 = Table2.id1
right outer join Table3
on Table3.id1 = Table5.id2
right outer join Table4
on Table4.id1 = Table5.id1
inner join Table5
on Table1.id2 = Table5.id3
Additionally, this bit does process because table 5 is joined first and solves the bounding error, but I receive about 27k more records than is wanted
select *
from Table1
inner join Table2
on Table1.id1 = Table2.id1
inner join Table5
on Table1.id2 = Table5.id3
right outer join Table3
on Table3.id1 = Table5.id2
right outer join Table4
on Table4.id1 = Table5.id1
So at this point it's obvious the original query is built the way it is for a reason, but I still don't understand the logic behind it or what is actually happening.
Any help would be much appreciated.
What you have here are multiple nested joins. Before I explain, I'm going to reformat the query a bit to make it easier to see what's going on.
select *
from Table1
inner join Table2 on Table1.id1 = Table2.id1
right outer join Table3 -- Join B
right outer join Table4 -- Join A
inner join Table5 on Table4.id1 = Table5.id1
on Table3.id1 = Table5.id2 -- ON clause for Join A
on Table1.id2 = Table5.id3 -- ON clause for Join B
Nested joins let you join two tables together then join that result to another set of records. Initially that doesn't sound terribly useful. That's just a regular join, right? Kinda. The difference is that only if the inner-most join succeeds does it attempt to join that row to the outer table. This isn't really useful at all if all you are doing is using inner joins. It becomes a lot more interesting if you are mixing inner and outer joins (more on that shortly).
I'll attempt to explain what's going on with this query, both in prose and in comments so hopefully between the two it will make sense.
First, the inner-most join here is an inner join between tables 4 and 5. Those tables are joined together first. That will give you a result set where each row in Table4 has at least one matching row in Table5 (according to whatever criteria exists in the on clause, in this case that Table4.id1 = Table5.id1). This implicitly filters out any rows from both Table4 and Table5 that don't have a match in the other table.
Then that result is then right joined to Table3 (on Table3.id1 = Table5.id2). Meaning you will get all records from Table3 joined with their corresponding match in the Table4/5 join set (if present).
Then we do a right join on that whole result set with Table1 (on Table3.id1 = Table5.id2). Meaning we will end up with everything in Table3 joined to the Table4/5 combo and then to a Table 1/2 combo.
The ultimate result set is everything from Table3 joined with 0 or more rows that match with Table1 and Table2 (if Table1 doesn't have a matching Table2 record, neither will be joined to Table3). Same for Table4/5. I believe this is correct (too much staring at this without the ability to run the query means I may have confused myself, but the basic idea is correct).
So why this crazy syntax? Alternatives are kind of a pain too. You could use CTEs or apply statements, both of which are their own kind of fun (not necessarily hard, just not your vanilla SQL. I tried converting your query using those and I think I got reasonably close, then I confused myself into a corner because of poor naming of things then I gave up). So why do this? Well it means you can ensure that you can outer join two tables to a third table only if there are matches in the first two tables. Maybe a more concrete example would help?
Say you have 4 tables Person, Order, OrderItem and OrderItemDiscount. You are tasked with getting back a result set that shows every order and to highlight orders that contain a Figlewubbit and where a discount code was used on it. So you write this:
select *
from Person p
left join Order o on o.PersonId = p.PersonId
left join OrderItem oi on oi.OrderId = o.OrderId
and oi.ItemName = 'Figlewubbit'
left join OrderItemDiscount oid on oid.OrderItemId = oi.OrderItemId
Another way to write it would be this:
select *
from Person p
left join Order o on o.PersonId = p.PersonId
left join OrderItem oi
inner join OrderItemDiscount oid on oid.OrderItemId = oi.OrderItemId
on oi.OrderId = o.OrderId
and oi.ItemName = 'Figlewubbit'
The execution plan here will change. OrderItem and OrderItemdDiscount will get joined together then that set will get fed into the left join to Order. Each OrderItem and OrderItemdDiscount joined row is effectively treated as a combined entity for the other joins. You won't get one without the other.
(I apologize if this example seems contrived. Nested joins are a weird beast. They have their uses (I've needed them once or twice). But coming up with a simple example that requires their use is quite hard. They are a very specialized tool that usually requires an equally specialized (and complicated) requirement to warrant their use. I highly recommend researching this some more and using simple versions of them first. Combining right joins and multiple nested joins even gives me a headache trying to parse it.)
Actually, tables 4 and 5 are joined first, then table 3 is joined. Here's execution plan:
I'm a complete novice teaching myself SQL by writing and modifying a few queries and reports at work.
I've got something of a handle on the various types of JOINs and I've used INNER JOIN a few times with decent success.
What I'm stuck on should be a simple task, but my Google-Fu must be weak. Here's what I'm trying to do.
Say I have 3 tables, Table_A, Table_B, and Table_C, and each table has a column called [Serial_Number].
What I'm wanting to select is 3 of the other columns if A.Serial_Number = B.Serial_Number OR C.Serial_Number.
I've tried doing:
SELECT
*
FROM
Table_A AS A
INNER JOIN Table_B AS B ON A.Serial_Number = B.Serial_Number
INNER JOIN Table_C AS C ON A.Serial_Number = C.Serial_Number
But this always yields 0 results as the nature of the data dictates that if A matches B, it will never match C and vice versa. I also tried a LEFT OUTER JOIN as the second clause, but this just includes NULLs from Table_C that have already matched on Table_B.
All the searches I have done relating to JOINs on multiple tables seem to be about using JOINS to further exclude records, where I'm actually wanting to INCLUDE more records.
Like I said, I'm sure this is really simple, just needing a nudge in right direction.
Thanks!
The use of two inner joins here is akin to saying
If A.Serial_Number = B.Serial_Number AND
A.Serial_Number = C.Serial_Number
Using left outer join on the second clause - by which i presume you mean second join - would perform a left join on a result set already filtered by A.Serial_Number = B.Serial_Number by the first inner join. Given that B.Serial_Number doesn't relate to C.Serial_Number you wouldn't expect the an equijoin to return any result from tablec.
What you want is a left outer join like you tried but for both tableb and tablec.
Select *
From tablea
Left join tableb on tableb.Serial_Number = tablea.Serial_Number
Left join tablec on tablec.Serial_Number = tablea.Serial_Number
This way regardless of whether tablea.Serial_Number is in tableb it will still be returned and thus available to be joined to tablec
Agreed. Your output for your inner joins is producing NULLs which is why it is resulting in 0. I would suggest modifying your INNER JOIN.
Can you tell me if inner join and equi-join are the same or not ?
An 'inner join' is not the same as an 'equi-join' in general terms.
'equi-join' means joining tables using the equality operator or equivalent. I would still call an outer join an 'equi-join' if it only uses equality (others may disagree).
'inner join' is opposed to 'outer join' and determines how to join two sets when there is no matching value.
Simply put: an equi-join is a possible type of inner-joins
For a more in-depth explanation:
An inner-join is a join that returns only rows from joined tables where a certain condition is met. This condition may be of equality, which means we would have an equi-join; if the condition is not that of equality - which may be a non-equality, greater than, lesser than, between, etc. - we have a nonequi-join, called more precisely theta-join.
If we do not want such conditions to be necessarily met, we can have
outer joins (all rows from all tables returned), left join (all rows
from left table returned, only matching for right table), right join
(all rows from right table returned, only matching for left table).
The answer is NO.
An equi-join is used to match two columns from two tables using explicit operator =:
Example:
select *
from table T1, table2 T2
where T1.column_name1 = T2.column_name2
An inner join is used to get the cross product between two tables, combining all records from both tables. To get the right result you can use a equi-join or one natural join (column names between tables must be the same)
Using equi-join (explicit and implicit)
select *
from table T1 INNER JOIN table2 T2
on T1.column_name = T2.column_name
select *
from table T1, table2 T2
where T1.column_name = T2.column_name
Or Using natural join
select *
from table T1 NATURAL JOIN table2 T2
The answer is No,here is the short and simple for readers.
Inner join can have equality (=) and other operators (like <,>,<>) in the join condition.
Equi join only have equality (=) operator in the join condition.
Equi join can be an Inner join,Left Outer join, Right Outer join
If there has to made out a difference then ,I think here it is .I tested it with DB2.
In 'equi join'.you have to select the comparing column of the table being joined , in inner join it is not compulsory you do that . Example :-
Select k.id,k.name FROM customer k
inner join dealer on(
k.id =dealer.id
)
here the resulted rows are only two columns rows
id name
But I think in equi join you have to select the columns of other table too
Select k.id,k.name,d.id FROM customer k,dealer d
where
k.id =d.id
and this will result in rows with three columns , there is no way you cannot have the unwanted compared column of dealer here(even if you don't want it) , the rows will look like
id(from customer) name(from Customer) id(from dealer)
May be this is not true for your question.But it might be one of the major difference.
The answer is YES, But as a resultset. So here is an example.
Consider three tables:
orders(ord_no, purch_amt, ord_date, customer_id, salesman_id)
customer(customer_id,cust_name, city, grade, salesman_id)
salesman(salesman_id, name, city, commission)
Now if I have a query like this:
Find the details of an order.
Using INNER JOIN:
SELECT * FROM orders a INNER JOIN customer b ON a.customer_id=b.customer_id
INNER JOIN salesman c ON a.salesman_id=c.salesman_id;
Using EQUI JOIN:
SELECT * FROM orders a, customer b,salesman c where
a.customer_id=b.customer_id and a.salesman_id=c.salesman_id;
Execute both queries. You will get the same output.
Coming to your question There is no difference in output of equijoin and inner join. But there might be a difference in inner executions of both the types.
If you are going to join multiple tables in a SQL query, where do you think is a better place to put the join statement: in the FROM clause or the WHERE clause?
If you are going to do it in the FROM clause, how do you format it so that it is clear and readable? (I'm talking about indents, newlines, whitespace in general.)
Are there any advantages/disadvantages to each?
I tend to use the FROM clause, or rather the JOIN clause itself, indenting like this (and using aliases):
SELECT t1.field1, t2.field2, t3.field3
FROM table1 t1
INNER JOIN table2 t2
ON t1.id1 = t2.id1
INNER JOIN table3 t3
ON t1.id1 = t3.id3
This keeps the join condition close to where the join is made. I find it easier to understand this way then trying to look through the WHERE clause to figure out what exactly is joined how.
When making OUTER JOINs (ANSI-89 or ANSI-92), filtration location matters because criteria specified in the ON clause is applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause is applied after the JOIN is made. This can produce very different result sets.
In comparison, it doesn't matter for INNER JOINs if the criteria is provided in the ON or WHERE clauses -- the result will be the same. That said, I strive to keep the WHERE clause clean -- anything related to JOINed tables will be in their respective ON clause. Saves hunting through the WHERE clause, which is why ANSI-92 syntax is more readable.
I prefer the FROM clause if for no other reason that it distinguishes between filtering results (from a Cartesian product) merely between foreign key relationships and between a logical restriction. For example:
SELECT * FROM Products P JOIN ProductPricing PP ON P.Id = PP.ProductId
WHERE PP.Price > 10
As opposed to
SELECT * FROM Products P, ProductPricing PP
WHERE P.Id = PP.ProductID AND Price > 10
I can look at the first one and instantly know that the only logical restriction I'm placing is the price, as opposed to the implicit machinery of joining tables together on the relationship key.
I almost always use the ANSI 92 joins because it makes it clear that these conditions are for JOINING.
Typically I write it this way
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
sometimes I write it this way when it trivial
FROM
foo f
INNER JOIN bar b ON f.id = b.id
INNER JOIN baz b2 ON b.id = b2.id
When its not trivial I do the first way
e.g.
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
and b.type = 1
or
FROM
foo f
INNER JOIN (
SELECT max(date) date, id
FROM foo
GROUP BY
id) lastF
ON f.id = lastF.id
and f.date = lastF.Date
Or really the weird (not sure if I got the parens correctly but its supposed to be an LEFT join to table bar but bar needs an inner join to baz)
FROM
foo f
LEFT JOIN (bar b
INNER JOIN baz b2
ON b.id = b2.id
)ON f.id = b.id
You should put joins in Join clauses which means the From clause. A different question could be had about where to put filtering statements.
With respect to indenting, there are many styles. My preference is to indent related joins and keep main clauses like Select, From, Where, Group By, Having and Order By indented at the same level. In addition, I put each of these main attributes and the first line of an On clause on its own line.
Select ..
From Table1
Join Table2
On Table2.FK = Table1.PK
And Table2.OtherCol = '12345'
And Table2.OtherCol2 = 9876
Left Join (Table3
Join Table4
On Table4.FK = Table3.PK)
On Table3.FK = Table2.PK
Where ...
Group By ...
Having ...
Order By ...
Use the FROM clause to be compliant with ANSI-92 standards.
This:
select *
from a
inner join b
on a.id = b.id
where a.SomeColumn = 'x'
Not this:
select *
from a, b
where a.id = b.id
and a.SomeColumn = 'x'
I definitely always do my JOINS (of whatever type) in my FROM clause.
The way I indent them is this:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
In fact, I'll generally go a step farther and move as much of my constraining logic from the WHERE clause to the FROM clause, because this (at least in MS SQL) front-loads the constraint, meaning that it reduces the size of the recordset sooner in the query construction (I've seen documentation that contradicts this, but my execution plans are invariably more efficient when I do it this way).
For example, if I wanted to only select things in the above query where t3.id = 3, you could but that in the WHERE clause, or you could do it this way:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
AND
t3.id = 3
I personally find queries laid out in this way to be very readable and maintainable, but this is certainly a matter of personal preference, so YMMV.
Regardless, I hope this helps.
ANSI joins. I omit any optional keywords from the SQL as they only add noise to the equation. There's no such thing as a left inner join, is there? And by default, a simple join is an inner join, so there's no particular point to saying 'inner join'.
Then I column align things as much as possible.
The point being that a large complex SQL query can be very difficult to comprehend, so the more order that is imposed on it to make it more readable, the better. Any body looking at the query to fix, modify or tune it, needs to be able to answer a few things off right off the bat:
what tables/views are involved in the query?
what are the criteria for each join? What's the cardinality of each join?
what/how many columns are returned by the query
I like to write my queries so they look something like this:
select PatientID = rpt.ipatientid ,
EventDate = d.dEvent ,
Side = d.cSide ,
OutsideHistoryDate = convert(nchar, d.devent,112) ,
Outcome = p.cOvrClass ,
ProcedureType = cat.ctype ,
ProcedureCategoryMajor = cat.cmajor ,
ProcedureCategoryMinor = cat.cminor
from dbo.procrpt rpt
join dbo.procd d on d.iprocrptid = rpt.iprocrptid
join dbo.proclu lu on lu.iprocluid = d.iprocluid
join dbo.pathlgy p on p.iProcID = d.iprocid
left join dbo.proccat cat on cat.iproccatid = lu.iproccatid
where procrpt.ipatientid = #iPatientID
I'm creating a really complex dynamic sql, it's got to return one row per user, but now I have to join against a one to many table. I do an outer join to make sure I get at least one row back (and can check for null to see if there's data in that table) but I have to make sure I only get one row back from this outer join part if there's multiple rows in this second table for this user.
So far I've come up with this: (sybase)
SELECT a.user_id
FROM table1 a
,table2 b
WHERE a.user_id = b.user_id
AND a.sub_id = (
SELECT min(c.sub_id)
FROM table2 c
WHERE b.sub_id = c.sub_id
)
The subquery finds the min value in the one to many table for that particular user.
This works but I fear nastiness from doing correlated subqueries when table 1 and 2 get very large.
Is there a better way? I'm trying to dream up a way to get joins to do it, but I'm not seeing it.
Also saying "where rowcount=1" or "top 1" doesn't help me, because I'm not trying to fix the above query, I'm ADDING the above to an already complex query.
In MySql you can ensure that any query returns at most X rows using
select *
from foo
where bar = 1
limit X;
Unfortunately, I'm fairly sure this is a MySQL-specific extension to SQL. However, a Google search for something like "mysql sybase limit" might turn up an equivalent for Sybase.
A few quick points:
You need to have definitive business rules. If the query returns more than one row then you need to think about why (beyond just "it's a 1:many relationship - WHY is it a 1:many relationship?). You should come up with the business solution rather than just use "min" because it gives you 1 row. The business solution might simply be "take the first one", in which case min might be the answer, but you need to make sure that's a conscious decision.
You should really try to use the ANSI syntax for joins. Not just because it's standard, but because the syntax that you have isn't really doing what you think it's doing (it's not an outer join) and some things are simply impossible to do with the syntax that you have.
Assuming that you end up using the MIN solution, here's one possible solution without the subquery. You should test it with various other solutions to make sure that they are equivalent in outcome and to see which performs the best.
SELECT
a.user_id, b.*
FROM
dbo.Table_1 a
LEFT OUTER JOIN dbo.Table_2 b ON b.user_id = a.user_id AND b.sub_id = a.sub_id
LEFT OUTER JOIN dbo.Table_2 c ON c.user_id = a.user_id AND c.sub_id < b.sub_id
WHERE
c.user_id IS NULL
You'll need to test this to see if it's really giving what you want and you might need to tweak it, but the basic idea is to use the second LEFT OUTER JOIN to ensure that there are no rows that exist with a lower sub_id than the one found in the first LEFT OUTER JOIN (if any is found). You can adjust the criteria in the second LEFT OUTER JOIN depending on the final business rules.
How about:
select a.user_id
from table1 a
where exists (select null from table2 b
where a.user_id = b.user_id
)
Maybe your example is too simplified, but I'd use a group by:
SELECT
a.user_id
FROM
table1 a
LEFT OUTER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
I fear the only other way would be using nested queries:
The difference between this query and your example is a 'sub table' is only generated once, however in your example you generate a 'sub table' for each row in table1 (but may depend on the compiler, so you might want to use query analyser to check performance).
SELECT
a.user_id,
b.sub_id
FROM
table1 a
LEFT OUTER JOIN (
SELECT
user_id,
min(sub_id) as sub_id,
FROM
table2
GROUP BY
user_id
) b ON (a.user_id = b.user_id)
Also, if your query is getting quite complex I'd use temporary tables to simplify the code, it might cost a little more in processing time, but will make your queries much easier to maintain.
A Temp Table example would be:
SELECT
user_id
INTO
#table1
FROM
table1
WHERE
.....
SELECT
a.user_id,
min(b.sub_id) as sub_id,
INTO
#table2
FROM
#table1 a
INNER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
a.user_id
SELECT
a.*,
b.sub_id
from
#table1 a
LEFT OUTER JOIN #table2 b ON (a.user_id = b.user_id)
First of all, I believe the query you are trying to write as your example is:
select a.user_id
from table1 a, table2 b
where a.user_id = b.user_id
and b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Except you wanted an outer join (which I think someone edited out the Oracle syntax).
select a.user_id
from table1 a
left outer join table2 b on a.user_id = b.user_id
where b.sub_id = (select min(c.sub_id)
from table2 c
where b.user_id = c.user_id)
Well, you already have a query that works. If you are concerned about the speed you could
Add a field to table2 which
identifies which sub_id is the
'first one' or
Keep track of table2's primary key in table1, or in another table