Using ON clause to JOIN tables with same column name - sql

I wanted to ask about the condition of an ON clause while joining tables:
SELECT c_User.ID
FROM c_User
WHERE EXISTS (
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
)
I know that tables c_Member and c_Group have one column with the same name, Directory_ID. What I expected was c_Member and c_Group to join on that column using something like:
c_Group JOIN c_Member ON (c_Group.Directory_ID = c_Member.Directory_ID)
WHERE c_Group.Group_Name like 'mcp%'
How is this condition able to match the rows?
c_Member ON (c_Group.Group_Name LIKE 'mcp%')
Is this is a shorter way of referring to two tables joining on a column with the same name, while applying the LIKE condition?
If so, then can such a style work for a table with multiple column names that are the same?

This is your correlated subquery:
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
This subquery works, but the way it is spelled makes it quite unclear:
The join condition (c_Group.Group_Name LIKE 'mcp%') is not actually not related to the table being joined (c_Member) ; what it actually does is apply filter on table c_Group that makes it filter on (there is no magic such as shorter way of referring to two tables joining on a column with the same name, while applying the LIKE condition). It would make more sense to move it to the WHERE clause (this would still be functionaly equivalent).
On the other hand, the WHERE clause contains conditions that relate to the tables being joined (for example: c_Group.Name = c_Member.Parent_Name). A more sensible option would be to put them in the ON clause of the JOIN.
Other remarks:
when using NOT EXISTS, you usually would prefer SELECT 1 instead of SELECT *, (most RDBMS will optimize this under the hood for you, but this makes the intent clearer).
table aliases can be used to make the query more readable
I would suggest the following syntax for the query (which is basically syntaxically equivalent to the original, but a lot clearer):
SELECT u.ID
FROM c_User u
WHERE EXISTS (
SELECT 1
FROM c_Group g
JOIN c_Member m ON g.Name = m.Parent_Name AND m.Child_Name = u.Lower_User_Name
WHERE g.Group_Name LIKE 'mcp%'
)

Related

Semi-join vs Subqueries

What is the difference between semi-joins and a subquery? I am currently taking a course on this on DataCamp and i'm having a hard time making a distinction between the two.
Thanks in advance.
A join or a semi join is required whenever you want to combine two or more entities records based on some common conditional attributes.
Unlike, Subquery is required whenever you want to have a lookup or a reference on same table or other tables
In short, when your requirement is to get additional reference columns added to existing tables attributes then go for join else when you want to have a lookup on records from the same table or other tables but keeping the same existing columns as o/p go for subquery
Also, In case of semi join it can act/used as a subquery because most of the times we dont actually join the right table instead we mantain a check via subquery to limit records in the existing hence semijoin but just that it isnt a subquery by itself
I don't really think of a subquery and a semi-join as anything similar. A subquery is nothing more interesting than a query that is used inside another query:
select * -- this is often called the "outer" query
from (
select columnA -- this is the subquery inside the parentheses
from mytable
where columnB = 'Y'
)
A semi-join is a concept based on join. Of course, joining tables will combine both tables and return the combined rows based on the join criteria. From there you select the columns you want from either table based on further where criteria (and of course whatever else you want to do). The concept of a semi-join is when you want to return rows from the first table only, but you need the 2nd table to decide which rows to return. Example: you want to return the people in a class:
select p.FirstName, p.LastName, p.DOB
from people p
inner join classes c on c.pID = p.pID
where c.ClassName = 'SQL 101'
group by p.pID
This accomplishes the concept of a semi-join. We are only returning columns from the first table (people). The use of the group by is necessary for the concept of a semi-join because a true join can return duplicate rows from the first table (depending on the join criteria). The above example is not often referred to as a semi-join, and is not the most typical way to accomplish it. The following query is a more common method of accomplishing a semi-join:
select FirstName, LastName, DOB
from people
where pID in (select pID
from class
where ClassName = 'SQL 101'
)
There is no formal join here. But we're using the 2nd table to determine which rows from the first table to return. It's a lot like saying if we did join the 2nd table to the first table, what rows from the first table would match?
For performance, exists is typically preferred:
select FirstName, LastName, DOB
from people p
where exists (select pID
from class c
where c.pID = p.pID
and c.ClassName = 'SQL 101'
)
In my opinion, this is the most direct way to understand the semi-join. There is still no formal join, but you can see the idea of a join hinted at by the usage of directly matching the first table's pID column to the 2nd table's pID column.
Final note. The last 2 queries above each use a subquery to accomplish the concept of a semi-join.

Selecting ambiguous column from subquery with postgres join inside

I have the following query:
select x.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
Since id0 is in both sessions and clicked_products, I get the expected error:
column reference "id0" is ambiguous
However, to fix this problem in the past I simply needed to specify a table. In this situation, I tried:
select sessions.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
However, this results in the following error:
missing FROM-clause entry for table "sessions"
How do I return just the id0 column from the above query?
Note: I realize I can trivially solve the problem by getting rid of the subquery all together:
select sessions.id0
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0;
However, I need to do further aggregations and so do need to keep the subquery syntax.
The only way you can do that is by using aliases for the columns returned from the subquery so that the names are no longer ambiguous.
Qualifying the column with the table name does not work, because sessions is not visible at that point (only x is).
True, this way you cannot use SELECT *, but you shouldn't do that anyway. For a reason why, your query is a wonderful example:
Imagine that you have a query like yours that works, and then somebody adds a new column with the same name as a column in the other table. Then your query suddenly and mysteriously breaks.
Avoid SELECT *. It is ok for ad-hoc queries, but not in code.
select x.id from
(select sessions.id0 as id, clicked_products.* from sessions
inner join
clicked_products on
sessions.id0 = clicked_products.session_id0 ) x;
However, you have to specify other columns from the table sessions since you cannot use SELECT *
I assume:
select x.id from (select sessions.id0 id
from sessions
inner join clicked_products
on sessions.id0 = clicked_products.session_id0 ) x;
should work.
Other option is to use Common Table Expression which are more readable and easier to test.
But still need alias or selecting unique column names.
In general selecting everything with * is not a good idea -- reading all columns is waste of IO.

ORA-00904 Invalid Identifier for a query involving inner join

I have a very simple query that I am trying to execute:
select *
from submissions
inner join (
select *
from hackers
inner join challenges
on hackers.hacker_id = challenges.hacker_id
) dtable
on submissions.challenge_id = dtable.challenge_id
and submissions.hacker_id = dtable.hacker_id;
Oracle rejects it with:
ORA-00904: "DTABLE"."HACKER_ID": invalid identifier.
I have kept the alias dtable visible by keeping it outside the brackets. Why is Oracle rejecting my query?
SELECT * in your sub-query is a problem.
You have the same column name in both tables that are being joined. This means that you're trying to create an inline-view called dtable where at least two columns have the same name (in this case both tables have a hacker_id column and your use of * is essentially saying "use them both".). You can't do that.
You're going to need SELECT hackers.a, hackers.b, challenges.x, challenges.y, etc, etc in your sub-query. By being explicit in this way you can ensure that no two columns have the same name.
An alternative could be SELECT hackers.*, challenges.a AS c_a, challenges.b AS c_b, etc.
Either way you are being explicit about which fields to pick up, what their positions and names are, etc. The end result being that you can then avoid columns having the same name as other columns.
You do not need subqueries for this. Your query is not actually "simple". The simple form looks more like this:
select . . .
from submissions s join
hackers h
on s.hacker_id = h.hacker_id join
challenges c
on s.challenge_id = c.challenge_id;
Note that I removed the condition between challenge and hackers on hacker_id. That extra join condition doesn't really make sense to me (although it might make sense if you provided sample data).
As others have said: The sub-select selects two different columns hacker_id from two different tables. This confuses Oracle.
But there is no need for a sub-select
select * from
submissions
inner join challenges
on submissions.challenge_id = challenges.challenge_id
inner join hackers
on submissions.hacker_id = hackers.hacker_id;

SQL optimization (inner join or selects)

I have a dilema, i had a teacher that thought me basically that inner joins are hell (he reproved me because I missed the delivery of the final proyect by 3 mins...), now i have another that tells me that using just selects is inefficient, so I don't know what is white nor black... could someone enlighten me with their knowledge?
Joins
SELECT
NombreP AS Nombre,
Nota
FROM Lleva
INNER JOIN Estudiante ON CedEstudiante = Estudiante.Cedula
WHERE
Lleva.SiglaCurso='CI1312';
No Joins
SELECT
NombreP AS Nombre,
Nota
FROM (
SELECT
Nota,
CedEstudiante
FROM Lleva
WHERE
SiglaCurso='CI1312'
) AS Lleva, (
SELECT
NombreP,
Cedula
FROM Estudiante
) AS Estudiante
WHERE
CedEstudiante = Estudiante.Cedula;
So wich one is more efficient?
Lets re-write the code so it's easier to understand:
SELECT E.NombreP AS Nombre
,L.Nota
FROM Lleva L INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
WHERE
L.SiglaCurso='CI1312';
A table using a subquery may look like this:
SELECT L.Nota
,(SELECT E.Nombre
FROM Estudiante E
WHERE E.Cedula = L.CedEstudiante
)
FROM Lleva L
WHERE
L.SiglaCurso='CI1312'
What you actually did in your original query was an implicit join. This is similar to inner join, without declaring the exact joining conditions. Implicit joins will attempt to join on similarly named columns between tables. Most programmers do not advise or use implicit join.
As for join versus subquery, they are applied in different situations.
They are not equivalent. Notice what I have put in bold below:
A subquery will guarantee 1 returned value or return NULL; if there are multiple values returned in the subquery you will get an error for returning more than 1 value and have to solve the problem with an aggregation perhaps (max value, top 1 value). Subqueries that return no match will return NULL without affecting the rest of the row.
The JOIN (INNER JOIN) operates differently. It can match rows and get the single value you're looking for just like a subquery. But a join can multiply returned rows if the joining conditions are not distinct (singular/non-repeating). This is why joins are usually done on Primary Keys (PK's). PK's are distinct by definition. The INNER JOIN will also remove rows if a joining condition doesn't occur between tables. This may be what your first professor was trying to explain- an INNER JOIN can work in many cases - similar to a subquery- but can also can return additional rows or remove rows from the output.

Difference visibility in subquery join and where

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I had was ORA-00904: "WO"."ID_WORKER": not valid identifier.
Then I decided to move the union of tables from join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfect.
Why is not possible to make it in the join? The table should be visible like it is in the where clause. Am I missing something?
In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. Names from the outer query are not visible to it.
The WHERE clause (and its cousin HAVING is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.