I'm trying to show each student's first name, last name, SID, major, and minor, as well as the courses, CIDs, and meeting times for each student. The bridge table is confusing me, though. I don't know whether to use a UNION, a multi-table JOIN, or something else. Please help!

Since this is homework, instead of answering your question directly I will provide you some tips.
You should really consider reviewing UNION and JOINs.
A UNION is used when you have multiple queries whose results you want returned in a single dataset. The queries must have the same number of fields, with compatible datatypes in each position. So for this particular question, does it make sense to use a UNION? Probably not, since you want the student and their course in a single row.
For this type of query you want a JOIN. There is a really helpful visual explanation of JOINs available online. I suggest keeping it handy while you are studying SQL.
Since you need the student data and their course info, the only way to connect the two is through tblRegistration. If you include this table in your query, you should get the results you are looking for.
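To make the bridge-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names come from the question; the column names other than SID and CID, and the sample rows, are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names besides SID and CID are assumed for this sketch.
    CREATE TABLE tblStudent (SID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tblCourse (CID INTEGER PRIMARY KEY, CourseName TEXT);
    -- Bridge table: one row per (student, course) enrollment.
    CREATE TABLE tblRegistration (SID INTEGER, CID INTEGER);

    INSERT INTO tblStudent VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO tblCourse VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO tblRegistration VALUES (1, 10), (1, 20), (2, 10);
""")

# Walk from student to course through the bridge table.
student_courses = conn.execute("""
    SELECT s.FirstName, s.LastName, c.CourseName
    FROM tblStudent AS s
    JOIN tblRegistration AS r ON r.SID = s.SID
    JOIN tblCourse AS c ON c.CID = r.CID
    ORDER BY s.SID, c.CID
""").fetchall()

for row in student_courses:
    print(row)
```

Each student appears once per course they are registered for, which is the "student and their course in a single row" shape described above.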
A commenter has pointed out that the second drawing is a UNION, not a JOIN. I disagree, and here is an example to show why.
A UNION as stated above combines two queries into a single result set. Here is a sqlfiddle with a UNION demo:
A FULL OUTER JOIN specifies that a row from either the left or right table that does not meet the join condition is included in the result set, and output columns that correspond to the other table are set to NULL. This is in addition to all rows typically returned by the INNER JOIN. Here is a sqlfiddle with a FULL OUTER JOIN demo:
As you can see, the results of the two queries are not the same, so the FULL OUTER JOIN in the diagram is not a UNION.
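If the sqlfiddle links are unavailable, the difference in shape can be reproduced locally. This is a rough sketch with Python's sqlite3 module; the tables t1 and t2 and their rows are invented for the demo, and a LEFT JOIN stands in for the FULL OUTER JOIN because older SQLite builds do not support the latter. The point is the same: a UNION stacks rows, while a join places columns from both tables side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical demo tables.
    CREATE TABLE t1 (id INTEGER, label TEXT);
    CREATE TABLE t2 (id INTEGER, label TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (2, 'x'), (3, 'y');
""")

# UNION: two columns, rows from both queries stacked on top of each other.
union_rows = conn.execute("""
    SELECT id, label FROM t1
    UNION
    SELECT id, label FROM t2
    ORDER BY id, label
""").fetchall()

# JOIN: four columns, one row per t1 row, NULLs where t2 has no match.
join_rows = conn.execute("""
    SELECT t1.id, t1.label, t2.id, t2.label
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

print(union_rows)
print(join_rows)
```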

SELECT s.FirstName -- add the other columns you need
FROM tblStudent AS s
LEFT OUTER JOIN tblRegistration AS r
ON s.SID = r.SID
LEFT OUTER JOIN tblCourse AS c
ON c.CID = r.CID

Oracle Syntax:
SELECT stu.FirstName -- add the other columns you need
FROM tblStudent stu,
tblRegistration reg,
tblCourse cou
WHERE stu.SID = reg.SID(+)
AND reg.CID = cou.CID(+)


SQL Question: Does the order of the WHERE/INNER JOIN clauses matter when linking tables?

Exam Question (AQA A-level Computer Science):
[Primary keys shown by asterisks]
Athlete(*AthleteID*, Surname, Forename, DateOfBirth, Gender, TeamName)
EventType(*EventTypeID*, Gender, Distance, AgeGroup)
Fixture(*FixtureID*, FixtureDate, LocationName)
EventAtFixture(*FixtureID*, *EventTypeID*)
EventEntry(*FixtureID*, *EventTypeID*, *AthleteID*)
A list is to be produced of the names of all athletes who are competing in the fixture
that is taking place on 17/09/18. The list must include the Surname, Forename and
DateOfBirth of these athletes and no other details. The list should be presented in
alphabetical order by Surname.
Write an SQL query to produce the list.
I understand that you could do this two ways, one using a WHERE clause and the other using the INNER JOIN clause. However, I am wondering if the order matters when linking the tables.
First exemplar solution:
SELECT Surname, Forename, DateOfBirth
FROM Athlete, EventEntry, Fixture
WHERE FixtureDate = "17/09/2018"
AND Athlete.AthleteID = EventEntry.AthleteID
AND EventEntry.FixtureID = Fixture.FixtureID
ORDER BY Surname
That is the first exemplar solution. Would it still be correct if I switched the order of the tables in the WHERE clause, for example:
WHERE FixtureDate = "17/09/2018"
AND EventEntry.AthleteID = Athlete.AthleteID
AND Fixture.FixtureID = EventEntry.FixtureID
I have the same question for the INNER JOIN clause too. Here is the second exemplar solution:
SELECT Surname, Forename, DateOfBirth
FROM Athlete
INNER JOIN EventEntry ON Athlete.AthleteID = EventEntry.AthleteID
INNER JOIN Fixture ON EventEntry.FixtureID = Fixture.FixtureID
WHERE FixtureDate = "17/09/2018"
ORDER BY Surname
Again, would it be correct if I used this order instead:
INNER JOIN EventEntry ON EventEntry.AthleteID = Athlete.AthleteID
INNER JOIN Fixture ON Fixture.FixtureID = EventEntry.FixtureID
If the order does matter, could somebody explain to me why it is in the order shown in the examples?
Some advice:
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Use table aliases that are abbreviations for the table names.
Use standard date formats!
Qualify all column names.
Then, the order of the comparisons doesn't matter for equality. I would recommend using a canonical ordering.
So, the query should look more like:
SELECT a.Surname, a.Forename, a.DateOfBirth
FROM Athlete a INNER JOIN
EventEntry ee
ON a.AthleteID = ee.AthleteID INNER JOIN
Fixture f
ON ee.FixtureID = f.FixtureID
WHERE f.FixtureDate = '2018-09-17'
ORDER BY a.Surname;
I am guessing that all the columns in the SELECT come from Athlete. If that is not true, then adjust the table aliases.
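One way to convince yourself that neither the operand order nor the join order changes the result is to run both variants against the same data. The sketch below uses Python's sqlite3 module with a cut-down version of the exam schema (only the columns the query touches) and made-up rows, so treat it as an illustration rather than a proof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified versions of the exam tables; the sample rows are invented.
    CREATE TABLE Athlete (AthleteID INTEGER PRIMARY KEY, Surname TEXT,
                          Forename TEXT, DateOfBirth TEXT);
    CREATE TABLE Fixture (FixtureID INTEGER PRIMARY KEY, FixtureDate TEXT);
    CREATE TABLE EventEntry (FixtureID INTEGER, EventTypeID INTEGER, AthleteID INTEGER);

    INSERT INTO Athlete VALUES (1, 'Owen', 'Cai', '2004-01-05'),
                               (2, 'Abbot', 'Dee', '2003-07-19'),
                               (3, 'Moss', 'Eli', '2004-03-02');
    INSERT INTO Fixture VALUES (100, '2018-09-17'), (200, '2018-10-01');
    INSERT INTO EventEntry VALUES (100, 1, 1), (100, 1, 2), (200, 1, 3);
""")

# Order used in the exemplar: Athlete, then EventEntry, then Fixture.
original_order = """
    SELECT a.Surname, a.Forename, a.DateOfBirth
    FROM Athlete a
    INNER JOIN EventEntry ee ON a.AthleteID = ee.AthleteID
    INNER JOIN Fixture f ON ee.FixtureID = f.FixtureID
    WHERE f.FixtureDate = '2018-09-17'
    ORDER BY a.Surname
"""

# Tables joined in the reverse order, with the operands of each "=" swapped.
swapped_order = """
    SELECT a.Surname, a.Forename, a.DateOfBirth
    FROM Fixture f
    INNER JOIN EventEntry ee ON f.FixtureID = ee.FixtureID
    INNER JOIN Athlete a ON ee.AthleteID = a.AthleteID
    WHERE f.FixtureDate = '2018-09-17'
    ORDER BY a.Surname
"""

rows_original = conn.execute(original_order).fetchall()
rows_swapped = conn.execute(swapped_order).fetchall()
print(rows_original == rows_swapped)  # True
```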
There are lots of stylistic conventions for SQL and @gordonlinoff's answer mentions some of the perennial ones.
There are a few answers to your question.
The most important is that (notionally) SQL is a declarative language - you tell it what you want it to do, not how to do it. In a procedural language (like C, or Java, or PHP), the order of execution really matters - the sequence of instructions is part of the procedure. In a declarative language, the order doesn't really matter.
This wasn't always totally true - older query optimizers seemed to like the more selective where clauses earlier on in the statement for performance reasons. I haven't seen that for a couple of decades now, so assume that's not really a thing.
Because order doesn't matter, but correctly understanding the intent of a query does, many SQL developers emphasize readability. That's why we like explicit join syntax, and meaningful aliases. And for readability, the sequence of instructions can help. I favour starting with the "most important" table, usually the one from which you're selecting most columns, and then follow a logical chain of joins from one table to the next. This makes it easier to follow the logic.
With inner joins, the order does not matter as long as each table's prerequisite table appears before it. In your example the chain runs Athlete, EventEntry, Fixture, so EventEntry must be joined before Fixture, because Fixture's ON condition refers to EventEntry. If the query started from EventEntry instead, Athlete and Fixture could be joined in either order. As recommended, it is best to use standard join syntax, and preferably to place all inner joins before all left joins. If you can't, review the query, because a left join placed among the inner joins will probably behave like an inner join: a later inner join that depends on the left-joined table will discard the rows where that table's columns are NULL.
When none of the above applies and the inner joins can be placed in any order, only performance matters. Often putting the higher-cardinality table first performs better, though there are cases where the opposite works better. So if the order is free, you may try ordering the tables from higher to lower cardinality or the reverse, whichever runs faster.
Clarifying: by prerequisite table I mean the table that the joined table's condition refers to: ... join B on [whatever] join C on c.id=b.cid - here table B is a prerequisite for table C.
I mention left joins because, while the question is about the order of inner joins, when inner and left joins are mixed the order of the joins becomes important (the inner joins should all come first), as it may affect the query logic:
... join B on [whatever] left join C on c.id=b.cid join D on D.id = C.did
In the example above, the left join sneaks in among the inner joins. We cannot move it after D because C is a prerequisite for D. However, for rows where the condition c.id=b.cid is not true, the C columns are NULL, and then the whole result row (B+C+D) drops out of the results because the D.id = C.did condition of the following inner join cannot be satisfied. This example needs review, because the purpose of the left join evaporates at the next inner join. In conclusion, when inner joins are mixed with left joins, it is better to put the inner joins first, without any left joins in between.
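The effect can be seen with three tiny tables. This is only a sketch using Python's sqlite3 module; the tables B, C and D mirror the names in the example above, and the rows are made up so that one B row has no match in C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER, cid INTEGER);
    CREATE TABLE C (id INTEGER, did INTEGER);
    CREATE TABLE D (id INTEGER);
    INSERT INTO B VALUES (1, 10), (2, 99);  -- B row 2 has no matching C row
    INSERT INTO C VALUES (10, 100);
    INSERT INTO D VALUES (100);
""")

# The left join on its own keeps the unmatched B row, with NULL for C.
left_only = conn.execute("""
    SELECT B.id, C.id
    FROM B
    LEFT JOIN C ON C.id = B.cid
    ORDER BY B.id
""").fetchall()

# An inner join that depends on C throws the unmatched row away again.
left_then_inner = conn.execute("""
    SELECT B.id, C.id, D.id
    FROM B
    LEFT JOIN C ON C.id = B.cid
    JOIN D ON D.id = C.did
    ORDER BY B.id
""").fetchall()

print(left_only)        # [(1, 10), (2, None)]
print(left_then_inner)  # [(1, 10, 100)]
```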

Using COUNT (DISTINCT..) when also using INNER JOIN to join 3 tables but Postgres keeps erroring

I need to use INNER JOINs to get a series of information and then I need to COUNT this info. I need to be able to "View all courses and the instructor taking them, the capacity of the course, and the number of members currently booked on the course."
To get all the info I have done the following query:
SELECT C.coursename, Instructors.fname, Instructors.lname, C.maxNo, Membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
but no matter where I put the COUNT, it always tells me that whatever is outside the COUNT should be in the GROUP BY.
I have worked out how to COUNT/GROUP BY the necessary info e.g.:
FROM Membercourse AS MC
but I don't know how to combine the two!
I think what you're looking for is a subquery. I'm a SQL-Server guy (not postgresql) but the concept looks to be almost identical after some crash-course postgresql googling.
Anyway, basically, when you write a SELECT statement, you can use a subquery instead of an actual table. So your SQL would look something like:
select count(*)
from (
select stuff from table
inner join someOtherTable
) mySubQuery
... hopefully that makes sense. Instead of trying to write one big query where you're doing both the inner join and count, you're writing two: an inner one that gets your inner-join'ed data, and then an outer one to actually count the rows.
EDIT: To help explain a bit more on the thought process behind subqueries.
Subqueries are a way of logically breaking down the steps/processes on the data. Instead of trying to do everything in one big step, you do it in steps.
In this case, what's step one? It's to get a combined data source for your combined, inner-join'ed data.
Step 1: Write the Inner Join query
SELECT C.coursename, Instructors.fname, Instructors.lname, C.maxNo, Membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
Okay, now, what next?
Well, let's say we want to get a count of how many entries there are for each 'memno' in that result above.
Instead of trying to figure out how to modify that query above, we instead use it as a data source, like it was a table itself.
Step 2 - Make it A Subquery
select * from
(
SELECT C.coursename, Instructors.fname, Instructors.lname, C.maxNo, Membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID
) mySubQuery
Step 3 - Modify your outer query to get the data you want.
Well, we wanted to group by 'memno', and get the count, right? So...
select memno, count(*)
from (
-- all that same subquery stuff
) mySubQuery
group by memno
... make sense? Once you've got your subquery written out, you don't need to worry about it any more - you just treat it like a table you're working with.
This is actually incredibly important, and makes it much easier to read more intricate queries - especially since you can name your subqueries in a way that explains what the subquery represents data-wise.
There are many ways to solve this, such as using window functions and so on. But you can also achieve it using a simple subquery:
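SELECT C.coursename, Instructors.fname, Instructors.lname, C.maxNo,
(SELECT COUNT(*) FROM Membercourse WHERE
C.courseID = Membercourse.courseID) AS members
FROM
Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo;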
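For comparison, here is a sketch of the two usual ways to get a per-course member count: a GROUP BY that lists every non-aggregated column, and the correlated subquery shown above. It runs on SQLite through Python's sqlite3 module rather than on Postgres, and the column types and sample rows are assumptions; the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Types and rows are invented for the demo.
    CREATE TABLE Instructors (instructorNo INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE Courses (courseID INTEGER PRIMARY KEY, coursename TEXT,
                          maxNo INTEGER, instructorNo INTEGER);
    CREATE TABLE Membercourse (courseID INTEGER, memno INTEGER);

    INSERT INTO Instructors VALUES (1, 'Sam', 'Hill');
    INSERT INTO Courses VALUES (10, 'Yoga', 12, 1), (20, 'Spin', 8, 1);
    INSERT INTO Membercourse VALUES (10, 501), (10, 502);  -- nobody booked on Spin
""")

# Option 1: join, then GROUP BY every column that is not inside COUNT().
# The LEFT JOIN keeps courses with no bookings, and COUNT(MC.memno) skips NULLs.
grouped_rows = conn.execute("""
    SELECT C.coursename, I.fname, I.lname, C.maxNo, COUNT(MC.memno) AS members
    FROM Courses AS C
    INNER JOIN Instructors AS I ON C.instructorNo = I.instructorNo
    LEFT JOIN Membercourse AS MC ON C.courseID = MC.courseID
    GROUP BY C.courseID, C.coursename, I.fname, I.lname, C.maxNo
    ORDER BY C.courseID
""").fetchall()

# Option 2: a correlated subquery counts the bookings for each course row.
subquery_rows = conn.execute("""
    SELECT C.coursename, I.fname, I.lname, C.maxNo,
           (SELECT COUNT(*) FROM Membercourse AS MC
            WHERE MC.courseID = C.courseID) AS members
    FROM Courses AS C
    INNER JOIN Instructors AS I ON C.instructorNo = I.instructorNo
    ORDER BY C.courseID
""").fetchall()

print(grouped_rows)
print(subquery_rows)
```

Both forms return one row per course, including a count of 0 for a course with no bookings.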

Using "From Multiple Tables" or "Join" performance difference

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?
The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system: Company, Department, and Employee.
Each table contains numerous rows, linked together. You have multiple companies, each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
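The difference between putting the extra condition in the ON clause and putting it in the WHERE clause is easy to try out. The sketch below uses Python's sqlite3 module and leaves the Employee table out to keep it short; the column names beyond ID, CompanyID and Name, and all the rows, are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Department (ID INTEGER PRIMARY KEY, CompanyID INTEGER, Name TEXT);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Department VALUES (10, 1, 'XRay'), (20, 2, 'Sales');
""")

# Condition in the ON clause: it only decides which departments match,
# so every company is still listed.
filter_in_on = conn.execute("""
    SELECT Company.Name, Department.Name
    FROM Company
    LEFT JOIN Department
      ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
    ORDER BY Company.ID
""").fetchall()

# Condition in the WHERE clause: it filters whole result rows,
# so the company without an X department disappears.
filter_in_where = conn.execute("""
    SELECT Company.Name, Department.Name
    FROM Company
    LEFT JOIN Department ON Company.ID = Department.CompanyID
    WHERE Department.Name LIKE '%X%'
    ORDER BY Company.ID
""").fetchall()

print(filter_in_on)     # [('Acme', 'XRay'), ('Globex', None)]
print(filter_in_where)  # [('Acme', 'XRay')]
```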
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, throwing a bunch of tables at the query optimizer with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out which parts go together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.
The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large number of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+) -- Oracle
WHERE a.x *= b.x -- SQL Server (legacy) / Sybase
WHERE a.x = b.x or a.x not in (select x from b)
Basically, when your FROM clause lists tables like so:
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.
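Here is a small sketch of that cross-product-then-restrict view, run on SQLite via Python's sqlite3 module. The names tableA, tableB and a_id follow the fragments above; the rows are invented. It also shows what happens when the WHERE condition is forgotten: the query still runs and quietly returns the full cross product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER);
    CREATE TABLE tableB (id INTEGER, a_id INTEGER);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
""")

# No join condition at all: every A row paired with every B row (3 x 3).
cross_count = conn.execute("SELECT COUNT(*) FROM tableA, tableB").fetchone()[0]

# Old style: restrict the cross product in the WHERE clause.
where_rows = conn.execute("""
    SELECT tableA.id, tableB.id
    FROM tableA, tableB
    WHERE tableA.id = tableB.a_id
    ORDER BY tableB.id
""").fetchall()

# Explicit JOIN: the same condition, kept next to the table it applies to.
join_rows = conn.execute("""
    SELECT tableA.id, tableB.id
    FROM tableA
    JOIN tableB ON tableA.id = tableB.a_id
    ORDER BY tableB.id
""").fetchall()

print(cross_count)              # 9
print(where_rows == join_rows)  # True
```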
The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the second method is preferred because it is the SQL standard according to the ANSI and ISO standards committees.
The second is preferred because it is far less likely to result in an accidental cross join from forgetting the join condition in the WHERE clause. In most databases a join with no ON clause will fail the syntax check, whereas an old-style join with no WHERE clause will not fail; it will do a cross join.
Additionally, when you later have to add a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992; it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.
I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.
The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guaranteed that what you write won't be querying what you think you wrote.
When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.
I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.
To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.
Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.
If both are inner joins there is no difference in the semantics, the execution of the SQL, or the performance. Both are ANSI standard SQL. It is purely a matter of preference, or of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

Converting SQL query to Hive query

I am having some trouble converting my SQL queries into Hive queries.
Relational schema:
Suppliers(sid, sname, address)
Parts(pid, pname, color)
Catalog(sid, pid, cost)
Query 1: Find the pnames of parts for which there is some supplier.
I have attempted the conversion for query 1 and I think it is correct. If someone can let me know whether it is correct or incorrect I would really appreciate it. The two seem to be the same to me based on the info I have looked up for Hive.
Query 1: SQL
SELECT pname
FROM Parts, Catalog
WHERE Parts.pid = Catalog.pid
Query 1: Converted to Hive
SELECT pname
FROM Parts, Catalog
WHERE Parts.pid = Catalog.pid;
Query 2: Find the sids of suppliers who supply only red parts.
For the second query I am having trouble. Mainly I am having trouble with the "not exists" part and the defining what color we want part. Can someone help me figure this out? I need to put the SQL into a Hive query.
Query 2: SQL
SELECT DISTINCT C.sid
FROM Catalog C
WHERE NOT EXISTS (SELECT *
FROM Parts P
WHERE P.pid = C.pid AND P.color <> 'Red')
If someone can help me get these into the correct Hive format I would really appreciate it.
Thank you.
Although I have never used HiveQL, from looking up some of its documentation it appears to support outer joins written in plain SQL. In that case this should work (an outer join where there is no match):
select distinct c.sid
from catalog c
left outer join parts p
on (c.pid = p.pid
and p.color <> 'Red')
where p.pid is null
Edit -- enclosed on clause in () , this is not normally required of any major databases but seems to be needed in hiveql -- (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins).
Regarding your first query, I don't think it would work in Hive, based on the example queries in those docs. In any case, as another commenter mentioned, it is best practice to use explicit joins via the join clause rather than implicit joins in the where clause. Think of the where clause as where you filter based on various conditions (except for join conditions), and the join clause as where you put all join conditions. It helps organize your query's logic. In addition, you can only imply inner joins (in the where clause). The join clause is needed any time you want to work with outer joins (as in the case of your second query, above).
This is the same as your first query but with explicit join syntax:
select pname
from parts p
join catalog c
on (p.pid = c.pid)
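Without a Hive cluster at hand, one way to sanity-check the rewrite is to run the NOT EXISTS form and the outer-join form side by side on another engine. The sketch below does that on SQLite through Python's sqlite3 module, with invented rows; it says nothing about Hive-specific syntax support, only that the two queries select the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Schema from the question; the rows are made up for the check.
    CREATE TABLE Parts (pid INTEGER PRIMARY KEY, pname TEXT, color TEXT);
    CREATE TABLE Catalog (sid INTEGER, pid INTEGER, cost REAL);
    INSERT INTO Parts VALUES (1, 'bolt', 'Red'), (2, 'nut', 'Green');
    INSERT INTO Catalog VALUES (100, 1, 5.0), (200, 2, 3.0),
                               (300, 1, 4.0), (300, 2, 6.0);
""")

# Original form with a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT DISTINCT C.sid
    FROM Catalog C
    WHERE NOT EXISTS (SELECT * FROM Parts P
                      WHERE P.pid = C.pid AND P.color <> 'Red')
    ORDER BY C.sid
""").fetchall()

# Rewritten form: outer join on the "non-red" match, keep rows with no match.
anti_join_rows = conn.execute("""
    SELECT DISTINCT c.sid
    FROM Catalog c
    LEFT OUTER JOIN Parts p
      ON (c.pid = p.pid AND p.color <> 'Red')
    WHERE p.pid IS NULL
    ORDER BY c.sid
""").fetchall()

print(not_exists_rows)
print(anti_join_rows)
```

With this sample data both queries return suppliers 100 and 300. Note that supplier 300 also has a green part in the catalog: both forms test each Catalog row on its own rather than the supplier as a whole, so it is worth checking the result against what "only red parts" is meant to cover.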

Translating Oracle SQL to Access Jet SQL, Left Join

There must be something I'm missing here. I have this nice, pretty Oracle SQL statement in Toad that gives me back a list of all active personnel with the IDs that I want:
This works very nicely. Where I run into problems is when I put it into Access. I know, I know, Access Sucks. Sometimes one needs to use it, especially if one has multiple database types that they just want to store a few queries in, and especially if one's boss only knows Access. Anyway, I was having trouble with the ANDs inside the FROM, so I moved those to the WHERE, but for some odd reason, Access isn't doing the LEFT JOINs, returning only those personnel with EID, IDTWO, and LICENSENO's. Not everybody has all three of these.
Best shot in Access so far is:
I think that part of the problem could be that I'm using the same alias (lookup) table for all three joins. Maybe there's a more efficient way of doing this? Still new to SQL land, so any tips as far as that goes would be great. I feel like these should be equivalent, but the Toad query gives me back many many tens of thousands of imperfect rows, and Access gives me fewer than 500. I need to find everybody so that nobody is left out. It's almost as if the LEFT JOINs aren't working at all in Access.
To understand what you are doing, let's look at simplified version of your query:
If the LEFT JOIN finds match, your row might look like this:
Person_ID EID
12345 JDB
If it doesn't find a match, (disregard the WHERE clause for a second), it could look like:
Person_ID EID
12345 NULL
When you add the WHERE clauses above, you are telling it to only find records in the PERSONNEL_ALIAS table that meet the condition, but if no records are found, then the values are considered NULL, so they will never satisfy the WHERE condition and no records will come back...
As Joe Stefanelli said in his comment, adding a WHERE clause to a LEFT JOIN'ed table makes it act as an INNER JOIN instead...
Further to @Sparky's answer, to get the equivalent of what you're doing in Oracle, you need to filter rows from the tables on the "outer" side of the joins before you join them. One way to do this might be:
Step 1: For each table on the "outer" side of a join that you need to filter rows from (that is, the three instances of PERSONNEL_ALIAS), create a query that filters the rows you want. For example, the first query (say, named PA_EID) might look something like this:
SELECT PERSONNEL_ALIAS.* FROM PERSONNEL_ALIAS WHERE PERSONNEL_ALIAS.PERSONNEL_ALIAS_TYPE_CD = 1086 AND PERSONNEL_ALIAS.ALIAS_POOL_CD = 3796547
Step 2: In your "best shot in Access so far" query in the original post: a) replace each instance of PERSONNEL_ALIAS with the corresponding query created in Step 1, and, b) remove the corresponding conditions (on PA_EID, PA_IDTWO, and PA_LIC) from the WHERE clause.
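The idea of filtering first and joining afterwards can be tried out on a toy version of the data. This sketch uses SQLite through Python's sqlite3 module, not Access; the PERSONNEL table name, the ALIAS column and the rows are assumptions, and an inline derived table named PA_EID stands in for the saved Access query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down tables for illustration only.
    CREATE TABLE PERSONNEL (PERSON_ID INTEGER PRIMARY KEY);
    CREATE TABLE PERSONNEL_ALIAS (PERSON_ID INTEGER, ALIAS TEXT,
                                  PERSONNEL_ALIAS_TYPE_CD INTEGER);
    INSERT INTO PERSONNEL VALUES (12345), (67890);
    INSERT INTO PERSONNEL_ALIAS VALUES (12345, 'JDB', 1086),
                                       (67890, 'K-77', 9999);  -- no EID-type alias
""")

# Filter in the WHERE clause: the person without an EID alias is dropped.
where_filtered = conn.execute("""
    SELECT p.PERSON_ID, pa.ALIAS AS EID
    FROM PERSONNEL p
    LEFT JOIN PERSONNEL_ALIAS pa ON pa.PERSON_ID = p.PERSON_ID
    WHERE pa.PERSONNEL_ALIAS_TYPE_CD = 1086
    ORDER BY p.PERSON_ID
""").fetchall()

# Filter the outer-side table first, then LEFT JOIN to the filtered set.
prefiltered = conn.execute("""
    SELECT p.PERSON_ID, PA_EID.ALIAS AS EID
    FROM PERSONNEL p
    LEFT JOIN (SELECT * FROM PERSONNEL_ALIAS
               WHERE PERSONNEL_ALIAS_TYPE_CD = 1086) AS PA_EID
      ON PA_EID.PERSON_ID = p.PERSON_ID
    ORDER BY p.PERSON_ID
""").fetchall()

print(where_filtered)  # [(12345, 'JDB')]
print(prefiltered)     # [(12345, 'JDB'), (67890, None)]
```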