SQL Implicit Join joining 3 tables - sql

I understand implicit JOINS are outdated but it's for a homework assignment. All his examples for implicit joins only join 2 tables, and I can't find an example anywhere that joins 3.
SELECT Name
FROM Employee
JOIN EmployeeSkills
ON (EmployeeID=E.ID)
JOIN Skill ON (SkillID=S.ID)
WHERE Title=’DBA’;
Here is the explicit version of what I want. How would I write this implicitly?
Thanks

Here's how I'd write it:
SELECT E.Name FROM Employee E,EmployeeSkills ES,Skill S
WHERE E.ID = ES.EmployeeID
AND ES.SkillID = S.ID
AND S.Title=’DBA’;
Pretty much the same answer as the first you received, but, for clarity, once I'd start using aliases I'd use them throughout, and make sure you define them. In the above example, both the employee and the skill could have a Name, and the Title could be an employee's title or the name of a skill. Using the table name (or alias) makes it easy to see what's going on even if you're not familiar with the schema.
Also, might have been the database I was using years ago, but it had a performance hit (very slight, but big difference on big data) if you didn't write the joins in the order you introduced the tables.

It's pretty much the same as the two table examples:
SELECT a.Name
FROM Employee a,EmployeeSkills b ,Skill c
WHERE a.EmployeeID = b.ID
AND b.SkillID = c.ID
AND Title=’DBA’;
Edit: Good point made by Everett Warren to alias, best practice is not using implicit joins as they were deprecated long ago.

Related

Clarification on PostgreSQL update join statement?

I can't seem to find any documentation that clearly elaborate how certain update-join statements work in PostgreSQL. Suppose there are three tables in a database: professors, classes, and classrooms. In the professors table, one of the attributes is a class_id, a foreign key referencing the classes table. In the classes table, there is a classroom_id, referencing the classrooms table. This is the command I'm interested in:
UPDATE classes c SET year = 2
FROM classes cl
JOIN professors on cl.class_id = professors.class_id
JOIN classrooms on cl.classroom_id = classrooms.classroom_id
WHERE cl.class_id = c.class_id
This seems to calculate X = inner join(inner join(classes, professors), classrooms) and updates year = 2 for every class that is included in X. Is this correct? Furthermore, I don't understand how the WHERE clause accomplishes this. Why does this work, and why don't I have to use the IN keyword to accomplish this task?
I would appreciate a simple and systematic explanation of what an UPDATE... FROM... JOIN... WHERE statement does in PostgreSQL. Thank you so much!
I don't get your data model, because it seems that a professor could teach more than one course.
That said, your query is
UPDATE classes c
SET year = 2
FROM classes c2 JOIN
professors p
ON c2.class_id = p.class_id JOIN
classrooms cr
ON c2.classroom_id = cr.classroom_id
WHERE c2.class_id = c.class_id;
The FROM clause is doing a JOIN, just as you would expect. You could check the results by using SELECT:
SELECT *
FROM . . . <the FROM clause here>
What else is happening? Well, the UPDATE is telling Postgres to update the table classes. Ignore the fact that classes is in the FROM clause (that is a different reference).
How does it know which records to update? Well, the WHERE clause is saying to update the rows that match the FROM. In this case, the JOINs are doing the filtering.
You can also express this logic with IN or EXISTS:
UPDATE classes c
SET year = 2
WHERE EXISTS (SELECT 1 FROM professors p WHERE c.class_id = p.class_id) AND
EXISTS (SELECT 1 FROM classrooms cr WHERE c.classroom_id = cr.classroom_id);
Personally I think the intention is clearer with this version.
I'm not sure this explanation is really systematic, but here goes.
When you UPDATE classes, it already knows what table you are dealing with. In practice I don't generally reference the table again after FROM but I'm sure there are cases where it might be preferable to do a join than use WHERE.
The WHERE clause in your example simply makes sure that both instances of the classes table are dealing with the same rows. Otherwise I guess it would be an outer join.
In your example, there seems no point to the other JOINS. Often there is, when you might want to constrain the update to certain professors, as in WHERE professors.professor='Einstein'.
Finally, as I mentioned, I don't usually use the updated table a second time, rather I'd use.
UPDATE classes c SET year = 2
FROM professors, classrooms
WHERE on c.class_id = professors.class_id
AND c.classroom_id = classrooms.classroom_id
Works fine for me, but there may be cases where the other syntax is preferable, especially if you want a left join or some such.

SQL Question: Does the order of the WHERE/INNER JOIN clause when interlinking table matter?

Exam Question (AQA A-level Computer Science):
[Primary keys shown by asterisks]
Athlete(*AthleteID*, Surname, Forename, DateOfBirth, Gender, TeamName)
EventType(*EventTypeID*, Gender, Distance, AgeGroup)
Fixture(*FixtureID*, FixtureDate, LocationName)
EventAtFixture(*FixtureID*, *EventTypeID*)
EventEntry(*FixtureID*, *EventTypeID*, *AthleteID*)
A list is to be produced of the names of all athletes who are competing in the fixture
that is taking place on 17/09/18. The list must include the Surname, Forename and
DateOfBirth of these athletes and no other details. The list should be presented in
alphabetical order by Surname.
Write an SQL query to produce the list.
I understand that you could do this two ways, one using a WHERE clause and the other using the INNER JOIN clause. However, I am wondering if the order matters when linking the tables.
First exemplar solution:
SELECT Surname, Forename, DateOfBirth
FROM Athlete, EventEntry, Fixture
WHERE FixtureDate = "17/09/2018"
AND Athlete.AthleteID = EventEntry.AthleteID
AND EventEntry.FixtureID = Fixture.FixtureID
ORDER BY Surname
Here is the first exemplar solution, would it still be correct if I was to switch the order of the tables in the WHERE clause, for example:
WHERE FixtureDate = "17/09/2018"
AND EventEntry.AthleteID = Athlete.AthleteID
AND Fixture.FixtureID = EventEntry.FixtureID
I have the same question for the INNER JOIN clause to, here is the second exemplar solution:
SELECT Surname, Forename, DateOfBirth
FROM Athlete
INNER JOIN EventEntry ON Athlete.AthleteID = EventEntry.AthleteID
INNER JOIN Fixture ON EventEntry.FixtureID = Fixture.FixtureID
WHERE FixtureDate = "17/09/2018"
ORDER BY Surname
Again, would it be correct if I used this order instead:
INNER JOIN EventEntry ON Fixture.FixtureID = EventEntry.FixtureID
If the order does matter, could somebody explain to me why it is in the order shown in the examples?
Some advice:
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Use table aliases that are abbreviations for the table names.
Use standard date formats!
Qualify all column names.
Then, the order of the comparisons doesn't matter for equality. I would recommend using a canonical ordering.
So, the query should look more like:
SELECT a.Surname, a.Forename, a.DateOfBirth
FROM Athlete a INNER JOIN
EventEntry ee
ON a.AthleteID = ee.AthleteID INNER JOIN
Fixture f
ON ee.FixtureID = f.FixtureID
WHERE a.FixtureDate = '2018-09-17'
ORDER BY a.Surname;
I am guessing that all the columns in the SELECT come from Athlete. If that is not true, then adjust the table aliases.
There are lots of stylistic conventions for SQL and #gordonlinoff's answer mentions some of the perennial ones.
There are a few answers to your question.
The most important is that (notionally) SQL is a declarative language - you tell it what you want it to do, not how to do it. In a procedural language (like C, or Java, or PHP), the order of execution really matters - the sequence of instructions is part of the procedure. In a declarative language, the order doesn't really matter.
This wasn't always totally true - older query optimizers seemed to like the more selective where clauses earlier on in the statement for performance reasons. I haven't seen that for a couple of decades now, so assume that's not really a thing.
Because order doesn't matter, but correctly understanding the intent of a query does, many SQL developers emphasize readability. That's why we like explicit join syntax, and meaningful aliases. And for readability, the sequence of instructions can help. I favour starting with the "most important" table, usually the one from which you're selecting most columns, and then follow a logical chain of joins from one table to the next. This makes it easier to follow the logic.
When you use inner joins order does not matter as long as the prerequisite table is above/before. At your example both joins start from table Athlete so order doesn't matter. If however this very query is found starting from EventEntry (for any reason), then you must join at Athlete at the first inner else you cannot join to Fixture. As recommended, it is best to use standard join syntax and preferable place all inner joins before all lefts. If you cant then you need to review because the left you need to put inside the group of inner joins will probably behave like an inner join. That is because an inner below uses the left table else you could place it below the inner block. So when it comes to null the left will be ok but the inner below will cut the record.
When however the above cases do not exist/affect order and all inner joins can be placed at any order, only performance matters. Usually table with high cardinality on top perform better while there are cases where the opposite works better. So if the order is free you may try higher to lower cardinality tables ordering or the opposite - whatever works faster.
Clarifying: As prerequisite table i call the table needed by the joined table by condition: ... join B on [whatever] join C on c.id=b.cid - here table B is prerequisite for table C.
I mention left joins because while the question is about inner order, when joins are mixed (inners and lefts)then order of joins alone is important (to be all above) as may affect query logic:
... join B on [whatever] left join C on c.id=b.cid join D on D.id = C.did
At the above example the left join sneaks into the inner joins order. We cannot order it after D because it is prerequisite for D. For records however where condition c.id=b.cid is not true the entire B table row turns null and then the entire result row (B+C+D) turns off the results because of D.id = C.did condition of the following inner join. This example needs review as the purpose of left join evaporates by the following (next on order) inner join. Concluding, the order of inner joins when mixed with lefts is better to be on top without any left joins interfering.

Using "From Multiple Tables" or "Join" performance difference [duplicate]

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?
The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system:
Company
Department
Employee
Each table contain numerous rows, linked together. You got multiple companies, and each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, by throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out what parts goes together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.
The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+)
Or
WHERE a.x *= b.x
Or
WHERE a.x = b.x or a.x not in (select x from b)
Basically, when your FROM clause lists tables like so:
SELECT * FROM
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.
The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.
The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.
Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.
I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.
The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.
When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.
I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.
To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.
Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.
If both are inner joins there is no difference in the semantics or the execution of the SQL or performance. Both are ANSI Standard SQL It is purely a matter of preference, of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

Natural Join Explanation

I am a student trying to learn Microsoft SQL and I am frustrated that I cannot understand how natural join works. I have a problem with a solution but I cannot understand how this solution was created through natural join. My friend sent my his old solution but cant remember how he got the result. I really want to figure out how this natural join works. Can someone please explain how this answer was achieved?
The question taken from "Database Design, Application, and Administration 6th Ed.:
Show the result of a natural join that combines the Customer and OrderTbl tables.
Edit: The book doesn't give a query statement on how to get the results for the natural join in the chapters I have read so far. Which makes things much more confusing for me as I cannot simply add the table and send a query.
Edit2: The reason why some entries in the order table have null values is because the question is implying that some orders were processed through the internet. A friend of mine told me that was the reason why he got the question right opposed to some people who argued against it. :S
If I read your question correctly, you are trying to learn two concepts at the same time. The first is INNER JOIN (sometimes called equijoin). The second is natural join.
It's easier to explain natural join, assuming you already know how INNER JOIN works. A natural join is simply an equijoin where the column names indicate to you what the join condition ought to be. In your case, the fact that CustNo appears in both tables is the only clue you need in order to devise the correct join condition. You also include the join field only once in the result.
Column names are actually quite arbitrary, and could have been made very different in this case. for example, if the column Customer.CustNo had been named Customer.ID instead, you wouldn't be able to do a natural join.
for a correct solution in your case, see the answer provided by JamieC.
If you simply want the query which will result in the final table in your question here it is,
SELECT
o.OrdNo,
o.OrdDate,
o.EmpNo,
o.CustNo,
c.CustFirstName
c.CustLastName,
c.CustCity,
c.CustSatate,
c.CustZip,
c.CustBal
FROM OrderTbl o
INNER JOIN Customer c
ON o.CustNo = c.CustNo
ORDER BY c.CustNo
So, by way of explanation; this query selects all data from Customer and OrderTbl joining the two using CustNo which is the primary key (presumably) in Customer and a foreign key in OrderTbl. The ordering of the result is a little more tricky, and based almost purely on guesswork, I suspect the result is ordered by CustNo as well.
The Employee table does not feature at all in the result, however as the OrderTbl table has some blanks for EmpNo, you would almost certainly want a LEFT JOIN/RIGHT JOIN (as appropriate) if you wanted to retrieve any information about the employee from the orders table.
MS SQL Server doesn't support NATURAL JOIN. However, if you were using a platform that would support it, a simple:
SELECT * FROM Customer NATURAL JOIN OrderTbl;
should do the trick.
https://en.wikipedia.org/wiki/Join_(SQL)#Natural_join is quite good.

SQL query help (ambiguous column error) using select for multiple table queries

I am having trouble with an 'Ambiguous column name' will really appreciate your help with this as I am stuck on this and don't understand. I am learning SQL
Basically I am trying to generate a student detail who fail the exam using multiple tables.
select
examDate, examLevel, examSubject, examResult,
studentID, studentFirstName
from
completedExams, exams, student
where
(completedExams.examNo = exams.examNo)
and (completedExams.examResult = 'Fail')
Error:
Msg 209, Level 16, State 1, Line 2
Ambiguous column name 'studentID'.
First off your method for joining is hideously outdated. People were urged to stop doing this when ANSI came out with new standards back in 92. Your query should start off looking like this.
select
examDate,
examLevel,
examSubject,
examResult,
studentID,
studentFirstName
from completedExams
inner join exams
on completedExams.examNo = exams.examNo
and completedExams.examResult = 'Fail'
inner join student
on student.studentID = CompletedExams.StudentID
Next lets talk about aliases. It is incredibly helpful to give your table an alias so you are not constantly typing completedexams. over and over. All you do is simply place the new name after the table name and you're golden.
select
examDate,
examLevel,
examSubject,
examResult,
studentID,
studentFirstName
from completedExams as c
inner join exams as e
on c.examNo = e.examNo
and c.examResult = 'Fail'
inner join student as s
on c.studentID = s.StudentID
Now if the column exists in both databases you're going to have to apply that alias before the column that way the database engine knows which column you actually want. I don't know exactly which go where because I'm not connected to your database however here's an example anyway.
select
c.examDate,
c.examLevel,
e.examSubject,
e.examResult,
s.studentID,
s.studentFirstName
from completedExams as c
inner join exams as e
on c.examNo = e.examNo
and c.examResult = 'Fail'
inner join student as s
on s.studentID = c.StudentID
Well I hope that helps. This has been introduction to sql or sql 101 with Zane.
your completedExams probably has a studentID as well.
so when you join both student and completedExam you will get two studentIDs in the join table.
You could try alliassing your tables to select the propper id
select student.studentId, ... from ...
As suggested in the comments as well, this is not a good way to join your tables. You should consider specifying the join column for the three tables
select student.studentId,... from completedExams, exams, student
where student.studentId=completedExams.studentId
and completedExams.examId=exams.id
I would suggest you explicitly define the type of join relationships you are establishing between tables. This will ensure you have the best performance possible from the queries and help ensure the database engine uses the best execution plan possible. As it is written the performance of your query will suffer especially if you have a large number of records. While it is true that the database engine may be able to correctly interpret the join based on the where clause this can lead to scale problems later down the road.
Also, the use of table aliases will go a long way to avoid the ambiguous column problem.
Here are some reference materials....
http://technet.microsoft.com/en-us/library/ms187455(v=SQL.105).aspx
http://technet.microsoft.com/en-us/library/ms191517(v=SQL.105).aspx