Contradiction Between Multiple Left Joins

I am trying to understand the following query which is automatically produced by some software library:
SELECT DISTINCT `t`.* FROM `teacher` AS `t`
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319 AND `rel_profile`.`item_id` = `t`.`id`
LEFT JOIN `teacher_info` AS `profile`
ON `profile`.`id` = `rel_profile`.`related_item_id`
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id`
WHERE `rel_profile_city`.`item_id` = 1
There are three left joins. I understand the first and second ones. What I don't understand is the third left join:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id`
WHERE `rel_profile_city`.`item_id` = 1
The table rel has already been used in the first left join:
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319
Now, the same table is left joined again but this time the value of the joined field is different:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320
How come these two joins do not contradict each other?

The query is using aliases:
`rel` AS `rel_profile`
This tells the database to treat the table rel as if it were a table called rel_profile. That alias is then used throughout the rest of the query. I'm not sure about MySQL, but on some other database systems it's an error to refer to the table as rel from then onwards(*) (unless another join re-introduces the table without an alias).
And joining to the same table multiple times is allowed - provided that the names (or aliases) are unique. This is useful when you're trying to construct a result that relies on the content of multiple rows from the same table, where the result should occupy a single row.
(*) "Then onwards" being in the order in which the clauses are processed, not the text order. E.g. you should use the alias in the SELECT clause because, even though it occurs earlier textually, it's (conceptually) processed after the FROM clause.
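To see that the two aliases of rel don't conflict, here is a small sketch using Python's built-in sqlite3 module (SQLite, not MySQL, so backticks are dropped). The rows are invented for illustration; only the table and column names come from the question. Each alias is matched against a different row of rel, so one output row can carry both field_id 2319 and field_id 2320:

```python
import sqlite3

# In-memory database with made-up rows; table/column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE teacher_info (id INTEGER PRIMARY KEY);
    CREATE TABLE rel (field_id INTEGER, item_id INTEGER, related_item_id INTEGER);

    INSERT INTO teacher VALUES (10, 'Ada'), (11, 'Ben');
    INSERT INTO teacher_info VALUES (1), (2);
    -- field 2319 links a teacher to a profile
    INSERT INTO rel VALUES (2319, 10, 1), (2319, 11, 2);
    -- field 2320 links a profile onward; only profile 1 has such a row
    INSERT INTO rel VALUES (2320, 1, 500);
""")

query = """
SELECT DISTINCT t.*,
       rel_profile.field_id      AS first_field,
       rel_profile_city.field_id AS second_field
FROM teacher AS t
LEFT JOIN rel AS rel_profile
    ON rel_profile.field_id = 2319 AND rel_profile.item_id = t.id
LEFT JOIN teacher_info AS profile
    ON profile.id = rel_profile.related_item_id
LEFT JOIN rel AS rel_profile_city
    ON rel_profile_city.field_id = 2320 AND rel_profile_city.item_id = profile.id
WHERE rel_profile_city.item_id = 1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(10, 'Ada', 2319, 2320)]
```

Ben drops out: his profile (2) has no rel row with field_id 2320, so rel_profile_city is NULL for him and the WHERE test fails.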

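This query will only show teacher rows whose profile (found through a rel row with field_id = 2319) also has a rel row with field_id = 2320, and only when that profile's id is 1. Because the WHERE clause tests rel_profile_city.item_id, rows where the last LEFT JOIN found no match (and therefore produced NULLs) are filtered out, so here the LEFT JOINs effectively behave like INNER JOINs. The conditions never contradict each other because each alias is matched against a different row of rel.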

They are not "contradicting" each other. Imagine you have a table of users, which holds the demographic and personal data of your users, and another table with the "relation" between users. In this "relations" table, you have columns UserId1 and UserId2. If you want a query that returns the data of both users, you'll need to join the Users table twice, once for each user column. This doesn't mean the two joins contradict each other.
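The users example can be sketched with Python's sqlite3 module. The Users and Relations tables and their rows are hypothetical, following the description above; the aliases u1 and u2 keep the two copies of Users apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Relations (UserId1 INTEGER, UserId2 INTEGER);
    INSERT INTO Users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Relations VALUES (1, 2), (3, 1);
""")

# Users is joined twice, once per user column in Relations.
rows = conn.execute("""
    SELECT u1.Name AS first_user, u2.Name AS second_user
    FROM Relations AS r
    JOIN Users AS u1 ON u1.UserId = r.UserId1
    JOIN Users AS u2 ON u2.UserId = r.UserId2
    ORDER BY r.UserId1
""").fetchall()
print(rows)  # [('Alice', 'Bob'), ('Carol', 'Alice')]
```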

Related

Semi-join vs Subqueries

What is the difference between semi-joins and subqueries? I am currently taking a course on this on DataCamp and I'm having a hard time distinguishing between the two.
Thanks in advance.
A join or a semi-join is used whenever you want to combine the records of two or more entities based on some common attributes.
A subquery, by contrast, is used whenever you want a lookup or reference into the same table or other tables.
In short, when you need additional reference columns added to an existing table's attributes, use a join; when you want to look up records from the same table or other tables but keep the existing columns as the output, use a subquery.
Also, a semi-join often acts like (or is written as) a subquery: most of the time we don't actually join the right table, but instead keep a check via a subquery to limit the records in the existing table. Hence the name semi-join, but it is not a subquery by itself.
I don't really think of a subquery and a semi-join as anything similar. A subquery is nothing more interesting than a query that is used inside another query:
select * -- this is often called the "outer" query
from (
select columnA -- this is the subquery inside the parentheses
from mytable
where columnB = 'Y'
) as sub -- many databases require an alias on a derived table
A semi-join is a concept based on join. Of course, joining tables will combine both tables and return the combined rows based on the join criteria. From there you select the columns you want from either table based on further where criteria (and of course whatever else you want to do). The concept of a semi-join is when you want to return rows from the first table only, but you need the 2nd table to decide which rows to return. Example: you want to return the people in a class:
select p.FirstName, p.LastName, p.DOB
from people p
inner join classes c on c.pID = p.pID
where c.ClassName = 'SQL 101'
group by p.pID, p.FirstName, p.LastName, p.DOB
This accomplishes the concept of a semi-join. We are only returning columns from the first table (people). The use of the group by is necessary for the concept of a semi-join because a true join can return duplicate rows from the first table (depending on the join criteria). The above example is not often referred to as a semi-join, and is not the most typical way to accomplish it. The following query is a more common method of accomplishing a semi-join:
select FirstName, LastName, DOB
from people
where pID in (select pID
from classes
where ClassName = 'SQL 101'
)
There is no formal join here. But we're using the 2nd table to determine which rows from the first table to return. It's a lot like saying if we did join the 2nd table to the first table, what rows from the first table would match?
EXISTS is often recommended for performance, although many modern optimizers execute IN and EXISTS as the same semi-join:
select FirstName, LastName, DOB
from people p
where exists (select pID
from classes c
where c.pID = p.pID
and c.ClassName = 'SQL 101'
)
In my opinion, this is the most direct way to understand the semi-join. There is still no formal join, but you can see the idea of a join hinted at by the usage of directly matching the first table's pID column to the 2nd table's pID column.
Final note. The last 2 queries above each use a subquery to accomplish the concept of a semi-join.
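The difference between a plain join and a semi-join is easiest to see on data where one person matches twice. The sketch below uses Python's sqlite3 module with invented rows (the people and classes tables follow the examples above); the plain join repeats Ann, while the IN and EXISTS forms return each person once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people  (pID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, DOB TEXT);
    CREATE TABLE classes (pID INTEGER, ClassName TEXT);
    INSERT INTO people VALUES
        (1, 'Ann', 'Lee', '1990-01-01'),
        (2, 'Raj', 'Das', '1985-06-15'),
        (3, 'Zoe', 'Kim', '2000-03-30');
    -- Ann has two 'SQL 101' rows (e.g. two sections); Zoe has none.
    INSERT INTO classes VALUES (1, 'SQL 101'), (1, 'SQL 101'), (2, 'SQL 101'), (3, 'Python 1');
""")

plain_join = conn.execute("""
    SELECT p.FirstName FROM people p
    JOIN classes c ON c.pID = p.pID
    WHERE c.ClassName = 'SQL 101'
    ORDER BY p.pID
""").fetchall()

with_in = conn.execute("""
    SELECT FirstName FROM people
    WHERE pID IN (SELECT pID FROM classes WHERE ClassName = 'SQL 101')
    ORDER BY pID
""").fetchall()

with_exists = conn.execute("""
    SELECT FirstName FROM people p
    WHERE EXISTS (SELECT 1 FROM classes c
                  WHERE c.pID = p.pID AND c.ClassName = 'SQL 101')
    ORDER BY pID
""").fetchall()

print(plain_join)   # [('Ann',), ('Ann',), ('Raj',)]  duplicate caused by the join
print(with_in)      # [('Ann',), ('Raj',)]
print(with_exists)  # [('Ann',), ('Raj',)]
```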

Access SQL subquery access field from parent

I have a SQL query that works in Access 2016:
SELECT
Count(*) AS total_tests,
Sum(IIf(score>=securing_threshold And score<mastering_threshold,1,0)) AS total_securing,
Sum(IIf(score>=mastering_threshold,1,0)) AS total_mastering,
total_securing/Count(*) AS percent_securing,
total_mastering/Count(*) AS percent_mastering,
(Count(*)-total_securing-total_mastering)/Count(*) AS percent_below,
subjects.subject,
students.year_entered,
IIf(Month(Date())<9,Year(Date())-students.year_entered+6,Year(Date())-students.year_entered+7) AS current_form,
groups.group
FROM
((subjects
INNER JOIN tests ON subjects.ID = tests.subject)
INNER JOIN (students
INNER JOIN test_results ON students.ID = test_results.student) ON tests.ID = test_results.test)
LEFT JOIN
(SELECT * FROM group_membership LEFT JOIN groups ON group_membership.group = groups.ID) As g
ON students.ID = g.student
GROUP BY subjects.subject, students.year_entered, groups.group;
However, I wish to filter out irrelevant groups before joining them to my table. The table groups has a column subject which is a foreign key.
When I try changing ON students.ID = g.student to ON students.ID = g.student And subjects.ID = g.subject I get the error 'JOIN expression not supported'.
Alternatively, when I try adding WHERE subjects.ID = groups.subject to the subquery, it asks me for the parameter value of subjects.ID, although it is a column in the parent query.
Googling reveals many similar errors but they were all resolved by changing the brackets. That didn't help.
Just in case the table relationships help:
Thank you.
EDIT: Sample database at https://www.dropbox.com/s/yh80oooem6gsni7/student%20tracker.ACCDB?dl=0
MS Access queries with many joins are difficult to edit in SQL alone because, unlike in other RDBMSs, parenthesis pairings are required, and these pairings must follow a specific order; some pairings can even be nested. Hence, beginners are advised to build queries with many complex joins using the query design GUI in the MS Office application and let it generate the SQL.
For a simple filter, you could filter subject inside the derived table g (although you likely want all subjects):
...
(SELECT * FROM group_membership
LEFT JOIN groups ON group_membership.group = groups.ID
WHERE groups.subject='Earth Science') As g
...
So, for all subjects, consider rebuilding the query from scratch in the GUI, with a layout that closely mirrors your table relationships (the GUI auto-links joins based on those relationships). Then drop unneeded tables.
Usually you want to begin with the join table or set like groups and group_membership or tests and test_results. In fact, consider saving the g derived table as its own query.
Then add the distinct record primary source tables like students and subjects.
You may even need to experiment with the order of the FROM and JOIN clauses to get the desired results, and maybe even add the same table to the query more than once. Be careful when adding join tables like group_membership (two one-to-many links) to GROUP BY queries, as they cause duplicate records to be aggregated. So you may need to build separate aggregate queries and join them by subject.
Unless you can post the contents of all tables, it is difficult to help further from here.
Your subquery g uses a LEFT JOIN, but there is an enforced 1:n relation between the two tables, so there will always be a matching group. Use an INNER JOIN instead.
With g.subject you are trying to join on a column that is on the right side of a left join, that cannot really work.
Also you shouldn't use SELECT * on a join of tables with identical column names. Include only the qualified column names that you need.
LEFT JOIN
(SELECT group_membership.student, groups.group, groups.subject
FROM group_membership INNER JOIN groups
ON group_membership.group = groups.ID) As g
ON (students.ID = g.student AND subjects.ID = g.subject)
I would call the columns in group_membership group_ID and student_ID to avoid confusion.
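The intended result of that compound ON clause can be checked outside Access. The sketch below uses SQLite through Python's sqlite3 module, so it only verifies the logic, not whether Access accepts this exact syntax (Access's own rules on join expressions and parentheses still apply). All rows are invented, and "group" and "groups" are quoted because GROUP is a reserved word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subjects (ID INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE tests (ID INTEGER PRIMARY KEY, subject INTEGER);
    CREATE TABLE students (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test_results (student INTEGER, test INTEGER, score INTEGER);
    CREATE TABLE "groups" (ID INTEGER PRIMARY KEY, "group" TEXT, subject INTEGER);
    CREATE TABLE group_membership (student INTEGER, "group" INTEGER);

    INSERT INTO subjects VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO tests VALUES (100, 1), (200, 2);
    INSERT INTO students VALUES (1, 'Sam');
    INSERT INTO test_results VALUES (1, 100, 70), (1, 200, 55);
    -- Sam belongs to a Maths group only.
    INSERT INTO "groups" VALUES (10, 'Maths set 1', 1);
    INSERT INTO group_membership VALUES (1, 10);
""")

rows = conn.execute("""
    SELECT subjects.subject, students.name, g.group_name
    FROM subjects
    JOIN tests ON subjects.ID = tests.subject
    JOIN test_results ON tests.ID = test_results.test
    JOIN students ON students.ID = test_results.student
    LEFT JOIN (SELECT gm.student, gr."group" AS group_name, gr.subject
               FROM group_membership AS gm
               JOIN "groups" AS gr ON gm."group" = gr.ID) AS g
        ON students.ID = g.student AND subjects.ID = g.subject
    ORDER BY subjects.ID
""").fetchall()
print(rows)  # [('Maths', 'Sam', 'Maths set 1'), ('Physics', 'Sam', None)]
```

Sam's Physics result is kept, but no unrelated (Maths) group is attached to it, which is the filtering the question asks for.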
I don't have the database to test, but I would use the subjects table as a subquery:
(SELECT * FROM subjects WHERE <condition that removes the subjects you don't need>) AS Subj
Then INNER JOIN this new Subj table in your query, which would exclude irrelevant groups.
Also, I would never write a join in the WHERE clause (WHERE subjects.ID = groups.subject). Conceptually, that describes a Cartesian product (every combination of subjects.ID and groups.subject) that is then filtered down to the records that satisfy the join. Most engines optimize it into an ordinary join, but an explicit JOIN states the intent more clearly, and if the optimizer does not rewrite it, large data can take forever or crash.
Regarding the "JOIN expression not supported" error: do the data types match in those fields?
I solved it by (a lot of trial and error and) taking the advice here to build queries in the GUI and join them. The end result is 4 queries deep! If the database were bigger, performance would be awful... but now it's working I can tweak it eventually.
Thank you everybody!

SQL Server View tables not joining

I am incredibly new to SQL and am trying to create a view for a pizza store database. The sides ordered table and the sides names table have to be separate but need a view that combines them.
This is the code I have entered,
CREATE VIEW ordered_sides_view
AS
SELECT
ordered_side_id, side.side_id, side_name, number_ordered,
SUM(number_ordered * price) AS 'total_cost'
FROM
ordered_side
FULL JOIN
side ON ordered_side.side_id = side.side_id
GROUP BY
ordered_side_id, side.side_id, side_name, number_ordered;
The problem is that this is the resulting table.
Screenshot of view table:
How do I get the names to match the ordered sides?
It looks like you're unsure what FULL JOIN and INNER JOIN operations do.
FULL JOIN returns every pair of rows that matches the ON clause, plus every unmatched row from either table, with NULLs filling in the columns of the missing side.
INNER JOIN returns only the pairs of rows that match the ON clause.
LEFT (or RIGHT) OUTER JOIN returns the matching pairs plus the unmatched rows from the left (or right) table, padded with NULLs.
In your picture, you can clearly see that there are no rows that match from the tables ordered_side and side...
That is why switching to an INNER JOIN returns zero rows...there are no matches on the COLUMNS YOU CHOSE TO USE.
Why does your SELECT list have this:
SELECT ordered_side_id, side.side_id, side_name, number_ordered,
while your ON clause has this:
side ON ordered_side.side_id = side.side_id
ordered_side_id !=ordered_side.side_id
Investigate your columns and fix your JOIN clause to match the correct columns.
P.S. I like how you structure your queries. Very nice, and it's what an expert does! It makes reading MUCH, MUCH easier. :)
One suggestion I might add is to put each column in the SELECT statement on its own row:
SELECT ordered_side_id
, side.side_id
, side_name
, number_ordered
, SUM(number_ordered * price) AS Total_Cost --or written [Total_Cost]/'Total_Cost'
FROM ordered_side
FULL JOIN side ON ordered_side.ordered_side_id = side.side_id
GROUP BY ordered_side_id
, side.side_id
, side_name
, number_ordered;
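Here is a small sketch of the join types described above, using Python's sqlite3 module (SQLite, not SQL Server). It assumes that ordered_side.side_id is the column referencing side.side_id, which is what the column names suggest; check this against your own schema. Prices and rows are invented. The INNER JOIN pairs each ordered side with its name, while the LEFT JOIN from side also keeps a side that was never ordered, with NULL (None) for the missing order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE side (side_id INTEGER PRIMARY KEY, side_name TEXT, price REAL);
    CREATE TABLE ordered_side (ordered_side_id INTEGER PRIMARY KEY,
                               side_id INTEGER REFERENCES side(side_id),
                               number_ordered INTEGER);
    INSERT INTO side VALUES (1, 'Garlic bread', 3.5), (2, 'Wedges', 2.0), (3, 'Coleslaw', 1.5);
    INSERT INTO ordered_side VALUES (10, 1, 2), (11, 2, 3), (12, 1, 1);
""")

# One row per ordered side, so no SUM/GROUP BY is needed for a per-row total.
rows = conn.execute("""
    SELECT os.ordered_side_id, s.side_id, s.side_name, os.number_ordered,
           os.number_ordered * s.price AS total_cost
    FROM ordered_side AS os
    INNER JOIN side AS s ON os.side_id = s.side_id
    ORDER BY os.ordered_side_id
""").fetchall()

# LEFT JOIN keeps every side, even one nobody ordered.
left_rows = conn.execute("""
    SELECT s.side_name, os.ordered_side_id
    FROM side AS s
    LEFT JOIN ordered_side AS os ON os.side_id = s.side_id
    ORDER BY s.side_id, os.ordered_side_id
""").fetchall()

print(rows)       # [(10, 1, 'Garlic bread', 2, 7.0), (11, 2, 'Wedges', 3, 6.0), (12, 1, 'Garlic bread', 1, 3.5)]
print(left_rows)  # [('Garlic bread', 10), ('Garlic bread', 12), ('Wedges', 11), ('Coleslaw', None)]
```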

3 Table SQL JOIN not returning data

The following query will not return any data. Cannot figure out why. Removing one of the joins works but then I can't get data from one table.
$productInfo = "SELECT stock.*, s_list.*, c_list.*
FROM stock
INNER JOIN s_list
ON stock.s_compo_id = s_list.id
INNER JOIN c_list
ON stock.c_compo_id = c_list.id
WHERE batch_id = '$productID'";
This is your query:
SELECT stock.*, s_list.*, c_list.*
FROM stock INNER JOIN
s_list
ON stock.s_compo_id = s_list.id INNER JOIN
c_list
ON stock.c_compo_id = c_list.id
WHERE batch_id = '$productID'
Here are some reasons that I readily think of that you would get no data:
batch_id and $productID don't match. The names are different, so why should I think they refer to the same thing?
Either s_list or c_list (or both) have no matching records. You're doing inner joins, so no matching records would mean no rows are returned.
You are getting rows, but the columns have the same names in the two tables. For instance, you will likely see one id column in your output, and it is not clear which table it comes from. Explicitly list the columns you want and give them unique aliases.
And, less likely because the naming looks right:
The join conditions for one or both joins are not correct, so nothing matches.
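The second reason above (a lookup table with no matching row) is easy to reproduce. This sketch uses Python's sqlite3 module with invented data; the s_name and c_name columns are hypothetical. The stock row points at a c_list id that doesn't exist, so the INNER JOIN version returns nothing while the LEFT JOIN version returns the row with NULL (None) for the missing side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s_list (id INTEGER PRIMARY KEY, s_name TEXT);
    CREATE TABLE c_list (id INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE stock  (batch_id TEXT, s_compo_id INTEGER, c_compo_id INTEGER);
    INSERT INTO s_list VALUES (1, 'resistor');
    INSERT INTO c_list VALUES (2, 'capacitor');
    -- c_compo_id 7 has no row in c_list
    INSERT INTO stock VALUES ('B1', 1, 7);
""")

def fetch(join_kind):
    # join_kind is 'INNER' or 'LEFT' and comes from this script, never from user input;
    # the batch id itself is passed as a parameter.
    sql = f"""
        SELECT stock.batch_id, s_list.s_name, c_list.c_name
        FROM stock
        {join_kind} JOIN s_list ON stock.s_compo_id = s_list.id
        {join_kind} JOIN c_list ON stock.c_compo_id = c_list.id
        WHERE stock.batch_id = ?
    """
    return conn.execute(sql, ("B1",)).fetchall()

print(fetch("INNER"))  # []
print(fetch("LEFT"))   # [('B1', 'resistor', None)]
```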
And, an obligatory note that you should not be putting variables directly into query strings. Use parameters. Not only is it safer, but it gives the engine the opportunity to cache the query plan, saving effort when the query is called multiple times.
Try the query below, which uses LEFT JOINs so that stock rows are still returned when s_list or c_list has no matching record:
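To show why parameters matter, here is a sketch in Python's sqlite3 module, whose placeholder is ?; in PHP, PDO or mysqli prepared statements play the same role. The stock rows and the hostile input value are invented. Splicing the value into the SQL text changes the query itself, while a bound parameter is compared as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (batch_id TEXT, qty INTEGER);
    INSERT INTO stock VALUES ('B1', 5), ('B2', 8);
""")

product_id = "x' OR '1'='1"  # hostile input

# Unsafe: the quote in the value ends the string literal and adds an OR condition.
unsafe_sql = f"SELECT batch_id FROM stock WHERE batch_id = '{product_id}' ORDER BY batch_id"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound separately and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT batch_id FROM stock WHERE batch_id = ? ORDER BY batch_id", (product_id,)
).fetchall()

print(unsafe_rows)  # [('B1',), ('B2',)]  every row leaks
print(safe_rows)    # []
```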
$productInfo = "SELECT stock.*, s_list.*, c_list.*
FROM stock
LEFT JOIN s_list
ON stock.s_compo_id = s_list.id
LEFT JOIN c_list
ON stock.c_compo_id = c_list.id
WHERE batch_id = '$productID'";

Translating Oracle SQL to Access Jet SQL, Left Join

There must be something I'm missing here. I have this nice, pretty Oracle SQL statement in Toad that gives me back a list of all active personnel with the IDs that I want:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS EID,
PA_IDTWO.ALIAS IDTWO,
PA_LIC.ALIAS LICENSENO
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS PA_EID
ON PERSONNEL.PERSON_ID = PA_EID.PERSON_ID
AND PA_EID.PERSONNEL_ALIAS_TYPE_CD = 1086
AND PA_EID.ALIAS_POOL_CD = 3796547
AND PERSONNEL.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_IDTWO
ON PERSONNEL.PERSON_ID = PA_IDTWO.PERSON_ID
AND PA_IDTWO.PERSONNEL_ALIAS_TYPE_CD = 3839085
AND PA_IDTWO.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_LIC
ON PERSONNEL.PERSON_ID = PA_LIC.PERSON_ID
AND PA_LIC.PERSONNEL_ALIAS_TYPE_CD = 1087
AND PA_LIC.ALIAS_POOL_CD = 683988
AND PA_LIC.ACTIVE_IND = 1
WHERE PERSONNEL.ACTIVE_IND = 1 AND PERSONNEL.PHYSICIAN_IND = 1;
This works very nicely. Where I run into problems is when I put it into Access. I know, I know, Access sucks. Sometimes one needs to use it, especially if one has multiple database types that they just want to store a few queries in, and especially if one's boss only knows Access. Anyway, I was having trouble with the ANDs inside the FROM, so I moved those to the WHERE, but for some odd reason, Access isn't doing the LEFT JOINs, returning only those personnel with EID, IDTWO, and LICENSENO values. Not everybody has all three of these.
Best shot in Access so far is:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS AS EID,
PA_IDTWO.ALIAS AS ID2,
PA_LIC.ALIAS AS LICENSENO
FROM ((PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_IDTWO ON PERSONNEL.PERSON_ID=PA_IDTWO.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_LIC ON PERSONNEL.PERSON_ID=PA_LIC.PERSON_ID
WHERE (((PERSONNEL.ACTIVE_IND)=1)
AND ((PERSONNEL.PHYSICIAN_IND)=1)
AND ((PA_EID.PRSNL_ALIAS_TYPE_CD)=1086)
AND ((PA_EID.ALIAS_POOL_CD)=3796547)
AND ((PA_IDTWO.PRSNL_ALIAS_TYPE_CD)=3839085)
AND ((PA_IDTWO.ACTIVE_IND)=1)
AND ((PA_LIC.PRSNL_ALIAS_TYPE_CD)=1087)
AND ((PA_LIC.ALIAS_POOL_CD)=683988)
AND ((PA_LIC.ACTIVE_IND)=1));
I think that part of the problem could be that I'm using the same alias (lookup) table for all three joins. Maybe there's a more efficient way of doing this? Still new to SQL land, so any tips as far as that goes would be great. I feel like these should be equivalent, but the Toad query gives me back many many tens of thousands of imperfect rows, and Access gives me fewer than 500. I need to find everybody so that nobody is left out. It's almost as if the LEFT JOINs aren't working at all in Access.
To understand what you are doing, let's look at a simplified version of your query:
SELECT PERSONNEL.PERSON_ID,
PA_EID.ALIAS AS EID
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID
WHERE PERSONNEL.ACTIVE_IND=1
AND PERSONNEL.PHYSICIAN_IND=1
AND PA_EID.PRSNL_ALIAS_TYPE_CD=1086
AND PA_EID.ALIAS_POOL_CD=3796547
If the LEFT JOIN finds a match, your row might look like this:
Person_ID EID
12345 JDB
If it doesn't find a match, (disregard the WHERE clause for a second), it could look like:
Person_ID EID
12345 NULL
When you add the WHERE conditions above on PA_EID's columns, you only keep rows where a PERSONNEL_ALIAS record was found and meets those conditions. If no record was found, those columns are NULL, NULL never satisfies a comparison such as = 1086, and so the row is discarded...
As Joe Stefanelli said in his comment, adding a WHERE condition on a LEFT JOINed table makes it act like an INNER JOIN instead...
Further to @Sparky's answer, to get the equivalent of what you're doing in Oracle, you need to filter rows from the tables on the "outer" side of the joins before you join them. One way to do this might be:
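The effect described above can be reproduced with Python's sqlite3 module (SQLite, not Access or Oracle). The table and column names follow the question; the rows are invented. With the alias-type condition in WHERE, person 2 (who has no EID) disappears; with the same condition in the ON clause, person 2 is kept with a NULL (None) alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSONNEL (PERSON_ID INTEGER PRIMARY KEY, ACTIVE_IND INTEGER);
    CREATE TABLE PERSONNEL_ALIAS (PERSON_ID INTEGER, ALIAS TEXT,
                                  PERSONNEL_ALIAS_TYPE_CD INTEGER);
    INSERT INTO PERSONNEL VALUES (1, 1), (2, 1);
    -- Only person 1 has an EID alias (type 1086).
    INSERT INTO PERSONNEL_ALIAS VALUES (1, 'JDB', 1086);
""")

filter_in_where = conn.execute("""
    SELECT P.PERSON_ID, PA_EID.ALIAS
    FROM PERSONNEL AS P
    LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON P.PERSON_ID = PA_EID.PERSON_ID
    WHERE P.ACTIVE_IND = 1 AND PA_EID.PERSONNEL_ALIAS_TYPE_CD = 1086
    ORDER BY P.PERSON_ID
""").fetchall()

filter_in_on = conn.execute("""
    SELECT P.PERSON_ID, PA_EID.ALIAS
    FROM PERSONNEL AS P
    LEFT JOIN PERSONNEL_ALIAS AS PA_EID
        ON P.PERSON_ID = PA_EID.PERSON_ID
       AND PA_EID.PERSONNEL_ALIAS_TYPE_CD = 1086
    WHERE P.ACTIVE_IND = 1
    ORDER BY P.PERSON_ID
""").fetchall()

print(filter_in_where)  # [(1, 'JDB')]  person 2 vanished
print(filter_in_on)     # [(1, 'JDB'), (2, None)]
```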
1. For each table on the "outer" side of a join that you need to filter rows from (that is, the three instances of PERSONNEL_ALIAS), create a query that filters the rows you want. For example, the first query (say, named PA_EID) might look something like this:
SELECT PERSONNEL_ALIAS.*
FROM PERSONNEL_ALIAS
WHERE PERSONNEL_ALIAS.PERSONNEL_ALIAS_TYPE_CD = 1086
AND PERSONNEL_ALIAS.ALIAS_POOL_CD = 3796547
2. In your "best shot in Access so far" query in the original post: a) replace each instance of PERSONNEL_ALIAS with the corresponding query created in Step 1, and b) remove the corresponding conditions (on PA_EID, PA_IDTWO, and PA_LIC) from the WHERE clause.
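The two steps can be sketched with Python's sqlite3 module, using SQLite views as stand-ins for Access saved queries. The view names PA_EID_Q and PA_LIC_Q are hypothetical, only two alias types are shown, the rows are invented, and the Access-specific parentheses around the joins are omitted because SQLite doesn't need them. Every active person comes back, whether they have an EID, a licence number, or neither:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSONNEL (PERSON_ID INTEGER PRIMARY KEY, ACTIVE_IND INTEGER);
    CREATE TABLE PERSONNEL_ALIAS (PERSON_ID INTEGER, ALIAS TEXT,
                                  PERSONNEL_ALIAS_TYPE_CD INTEGER);
    INSERT INTO PERSONNEL VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO PERSONNEL_ALIAS VALUES
        (1, 'JDB', 1086),     -- person 1: EID only
        (2, 'LIC-42', 1087);  -- person 2: licence only; person 3 has neither
""")

# Step 1: one filtering "saved query" per alias type (views here).
conn.executescript("""
    CREATE VIEW PA_EID_Q AS
        SELECT * FROM PERSONNEL_ALIAS WHERE PERSONNEL_ALIAS_TYPE_CD = 1086;
    CREATE VIEW PA_LIC_Q AS
        SELECT * FROM PERSONNEL_ALIAS WHERE PERSONNEL_ALIAS_TYPE_CD = 1087;
""")

# Step 2: LEFT JOIN the filtered queries; no alias conditions remain in WHERE.
rows = conn.execute("""
    SELECT P.PERSON_ID, PA_EID.ALIAS AS EID, PA_LIC.ALIAS AS LICENSENO
    FROM PERSONNEL AS P
    LEFT JOIN PA_EID_Q AS PA_EID ON P.PERSON_ID = PA_EID.PERSON_ID
    LEFT JOIN PA_LIC_Q AS PA_LIC ON P.PERSON_ID = PA_LIC.PERSON_ID
    WHERE P.ACTIVE_IND = 1
    ORDER BY P.PERSON_ID
""").fetchall()
print(rows)  # [(1, 'JDB', None), (2, None, 'LIC-42'), (3, None, None)]
```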