SQL optimization (inner join or selects) - sql

I have a dilemma. I had a teacher who basically taught me that inner joins are hell (he failed me because I missed the delivery of the final project by 3 minutes...), and now I have another who tells me that using just selects is inefficient, so I don't know what is black or white... Could someone enlighten me with their knowledge?
Joins
SELECT
NombreP AS Nombre,
Nota
FROM Lleva
INNER JOIN Estudiante ON CedEstudiante = Estudiante.Cedula
WHERE
Lleva.SiglaCurso='CI1312';
No Joins
SELECT
NombreP AS Nombre,
Nota
FROM (
SELECT
Nota,
CedEstudiante
FROM Lleva
WHERE
SiglaCurso='CI1312'
) AS Lleva, (
SELECT
NombreP,
Cedula
FROM Estudiante
) AS Estudiante
WHERE
CedEstudiante = Estudiante.Cedula;
So which one is more efficient?

Let's rewrite the code so it's easier to understand:
SELECT E.NombreP AS Nombre
,L.Nota
FROM Lleva L INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
WHERE
L.SiglaCurso='CI1312';
A query using a subquery may look like this:
SELECT L.Nota
,(SELECT E.NombreP
FROM Estudiante E
WHERE E.Cedula = L.CedEstudiante
) AS Nombre
FROM Lleva L
WHERE
L.SiglaCurso='CI1312'
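What you actually did in your original query was an implicit join: the tables (here, derived tables) are listed in the FROM clause separated by commas, and the joining condition goes in the WHERE clause. With that condition it returns the same rows as the INNER JOIN; without it you would get every combination of rows (a cross join). Joining automatically on similarly named columns is a different feature, NATURAL JOIN. Most programmers advise against implicit joins.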
As for join versus subquery, they are applied in different situations.
They are not equivalent. Note the following differences:
A scalar subquery in the SELECT list is guaranteed to return one value or NULL. If the subquery returns more than one row, you will get an error and will have to solve the problem, perhaps with an aggregation (max value, top 1 value). Subqueries that find no match return NULL without affecting the rest of the row.
The JOIN (INNER JOIN) operates differently. It can match rows and get the single value you're looking for, just like a subquery. But a join can multiply the returned rows if the joining columns are not distinct (singular/non-repeating). This is why joins are usually done on primary keys (PKs), which are distinct by definition. The INNER JOIN will also remove rows if no match for the joining condition exists between the tables. This may be what your first professor was trying to explain: an INNER JOIN can work in many cases, similar to a subquery, but it can also return additional rows or remove rows from the output.
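The difference is easier to see with data. The following is a minimal sketch, assuming a cut-down version of the two tables from the question; the column types and the sample rows (including an enrollment whose student is missing) are made up for illustration.

```sql
-- Hypothetical cut-down schema and sample rows, for illustration only.
CREATE TABLE Estudiante (Cedula INT PRIMARY KEY, NombreP VARCHAR(50));
CREATE TABLE Lleva (CedEstudiante INT, SiglaCurso VARCHAR(10), Nota INT);

INSERT INTO Estudiante VALUES (1, 'Ana');
INSERT INTO Estudiante VALUES (2, 'Luis');
INSERT INTO Lleva VALUES (1, 'CI1312', 90);
INSERT INTO Lleva VALUES (2, 'CI1312', 75);
INSERT INTO Lleva VALUES (3, 'CI1312', 60);  -- no student with Cedula = 3

-- Scalar subquery: every Lleva row is kept; the unmatched one gets NULL.
-- Returns 3 rows: (90, 'Ana'), (75, 'Luis'), (60, NULL).
SELECT L.Nota,
       (SELECT E.NombreP
        FROM Estudiante E
        WHERE E.Cedula = L.CedEstudiante) AS Nombre
FROM Lleva L
WHERE L.SiglaCurso = 'CI1312';

-- INNER JOIN: the unmatched row is removed.
-- Returns 2 rows: ('Ana', 90), ('Luis', 75).
SELECT E.NombreP AS Nombre, L.Nota
FROM Lleva L
INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
WHERE L.SiglaCurso = 'CI1312';
```

Because Cedula is a primary key in this sketch, the join cannot multiply rows. Using a LEFT JOIN instead of the INNER JOIN would return the same three rows as the subquery version.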

Related

where clause conditions in SQL

Below is a join based on where clause:
SELECT a.* FROM TEST_TABLE1 a,TEST_TABLE2 b,TEST_TABLE3 c
WHERE a.COL11 = b.COL11
AND b.COL12 = c.COL12
AND a.COL3 = c.COL13;
I have been learning SQL from online resources and am trying to convert it to use joins.
Two issues:
The original query is confusing. The outer joins (with the (+) suffix) are made irrelevant by the last where condition. Because of that condition, the query should only return records where there is an actual matching c record. So the original query is the same as if there were no such (+) suffixes.
Your query joins TEST_TABLE3 twice, while the first query only joins it once, and there are two conditions that determine how it is joined there. You should not split those conditions over two separate joins.
BTW, it is surprising that the SQL Fiddle site does not show an error, as it makes no sense to use the same alias twice. See for example how MySQL returns the error with the same query on dbfiddle (on all available versions of MySQL):
Not unique table/alias: 'C'
So to get the same result using the standard join notation, all joins should be inner joins:
SELECT *
FROM TEST_TABLE1 A
INNER JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
INNER JOIN TEST_TABLE3 C
ON B.COL12 = C.COL12
AND A.COL3 = C.COL13;
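As a quick check that the WHERE-based form and the inner-join form agree, here is a small sketch with made-up data. The column types and rows are assumptions; only the table and column names come from the question.

```sql
-- Hypothetical column types and sample rows, for illustration only.
CREATE TABLE TEST_TABLE1 (COL11 INT, COL3 INT);
CREATE TABLE TEST_TABLE2 (COL11 INT, COL12 INT);
CREATE TABLE TEST_TABLE3 (COL12 INT, COL13 INT);

INSERT INTO TEST_TABLE1 VALUES (1, 10);
INSERT INTO TEST_TABLE1 VALUES (2, 20);
INSERT INTO TEST_TABLE2 VALUES (1, 100);
INSERT INTO TEST_TABLE2 VALUES (2, 200);
INSERT INTO TEST_TABLE3 VALUES (100, 10);  -- matches the TEST_TABLE1 row (1, 10)
INSERT INTO TEST_TABLE3 VALUES (200, 99);  -- COL13 does not match the row (2, 20)

-- Comma-separated tables with all conditions in WHERE.
-- Returns one row: (1, 10).
SELECT a.*
FROM TEST_TABLE1 a, TEST_TABLE2 b, TEST_TABLE3 c
WHERE a.COL11 = b.COL11
  AND b.COL12 = c.COL12
  AND a.COL3 = c.COL13;

-- Same conditions as INNER JOINs; TEST_TABLE3 is joined once, with both conditions.
-- Returns the same single row: (1, 10).
SELECT a.*
FROM TEST_TABLE1 a
INNER JOIN TEST_TABLE2 b ON a.COL11 = b.COL11
INNER JOIN TEST_TABLE3 c ON b.COL12 = c.COL12
                        AND a.COL3 = c.COL13;
```

With this data, replacing the inner joins by LEFT JOINs (and adding no further WHERE condition) would also keep the row (2, 20), so the choice of join type matters when reproducing the original result.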
@tricot correctly pointed out that it's strange to have two aliases with the same name without getting an error. Also, to answer your question:
In the first query, we conceptually perform a cross join between all three tables by listing all the table names, and then filter the resulting rows using the conditions in the WHERE clause.
In the second query, you need to join TEST_TABLE3 only once. Since you now have all the required aliases A, B and C, as in the first query, you can specify both conditions after the last join, as below:
SELECT A.* FROM TEST_TABLE1 A
LEFT JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
LEFT JOIN TEST_TABLE3 C
ON B.COL12 = C.COL12 AND A.COL3 = C.COL13;

Select statement with columns that are select statement but not subqueries

SQL Masters,
I don't understand part of this query. In the select statement there are what look like independent 'select statements', almost like a function. This code is vendor-written (Blackbaud CRM). Taken as independent code, there is no join for the info they bring into the data set, as you can see in the from clause. One last odd item is that in the column aliased SPOUSE_ID, the column SPOUSE.RECIPROCALCONSTITUENTID does not even exist in the table referred to. Any BBCRM people out there that can explain this?
Thanks
select
CONSTITUENT.ID,
CONSTITUENT.ISORGANIZATION,
CONSTITUENT.KEYNAME,
CONSTITUENT.FIRSTNAME,
CONSTITUENT.MIDDLENAME,
CONSTITUENT.MAIDENNAME,
CONSTITUENT.NICKNAME,
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID
and SPOUSE.ISSPOUSE = 1) as [SPOUSE_ID],
(select MARITALSTATUSCODE.DESCRIPTION
from dbo.MARITALSTATUSCODE
where MARITALSTATUSCODE.ID = CONSTITUENT.MARITALSTATUSCODEID) as [MARITALSTATUSCODEID_TRANSLATION]
From
dbo.constituent
left join
dbo.ORGANIZATIONDATA on ORGANIZATIONDATA.ID = CONSTITUENT.ID
where
(CONSTITUENT.ISCONSTITUENT = 1)
These are correlated subqueries. Although there is no explicit JOIN, there is a link to the outer table which behaves like a join (although more constrained than explicit JOINs):
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID AND
-------^ correlation clause connecting to outer table
SPOUSE.ISSPOUSE = 1
) as [SPOUSE_ID],
This behaves like a LEFT JOIN. If no rows match, then the result is NULL.
Note that in this context, the correlated subquery is also a scalar subquery. That means that it returns exactly one column and at most one row.
If the query returned more than one column, you would get a compile-time error on the query. If the query returns more than one row, you will get a run-time error on the query.
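To see that behavior side by side, here is a rough sketch that rewrites the two scalar subqueries as LEFT JOINs. It assumes cut-down stand-ins for the vendor tables with only the columns used in the question; the INT types and the sample rows are invented for illustration and are not the real Blackbaud CRM schema. Note that SPOUSE in the question is just an alias for the RELATIONSHIP table, which is why no table literally named SPOUSE exists.

```sql
-- Hypothetical, cut-down stand-ins for the vendor tables (not the real schema).
CREATE TABLE CONSTITUENT (ID INT PRIMARY KEY, KEYNAME VARCHAR(50), MARITALSTATUSCODEID INT);
CREATE TABLE MARITALSTATUSCODE (ID INT PRIMARY KEY, DESCRIPTION VARCHAR(50));
CREATE TABLE RELATIONSHIP (RELATIONSHIPCONSTITUENTID INT, RECIPROCALCONSTITUENTID INT, ISSPOUSE INT);

INSERT INTO MARITALSTATUSCODE VALUES (10, 'Married');
INSERT INTO CONSTITUENT VALUES (1, 'Smith', 10);
INSERT INTO CONSTITUENT VALUES (2, 'Jones', NULL);
INSERT INTO RELATIONSHIP VALUES (1, 2, 1);

-- Correlated scalar subqueries, as in the vendor query.
-- Returns (1, 2, 'Married') and (2, NULL, NULL).
SELECT C.ID,
       (SELECT S.RECIPROCALCONSTITUENTID
        FROM RELATIONSHIP S
        WHERE S.RELATIONSHIPCONSTITUENTID = C.ID
          AND S.ISSPOUSE = 1) AS SPOUSE_ID,
       (SELECT M.DESCRIPTION
        FROM MARITALSTATUSCODE M
        WHERE M.ID = C.MARITALSTATUSCODEID) AS MARITALSTATUSCODEID_TRANSLATION
FROM CONSTITUENT C;

-- LEFT JOIN version: returns the same two rows for this data.
SELECT C.ID,
       S.RECIPROCALCONSTITUENTID AS SPOUSE_ID,
       M.DESCRIPTION AS MARITALSTATUSCODEID_TRANSLATION
FROM CONSTITUENT C
LEFT JOIN RELATIONSHIP S
       ON S.RELATIONSHIPCONSTITUENTID = C.ID AND S.ISSPOUSE = 1
LEFT JOIN MARITALSTATUSCODE M
       ON M.ID = C.MARITALSTATUSCODEID;
```

The two forms diverge once a constituent has two spouse rows: the LEFT JOIN version returns that constituent twice, while the scalar subquery is not allowed to return more than one row.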

Semi-join vs Subqueries

What is the difference between semi-joins and a subquery? I am currently taking a course on this on DataCamp and I'm having a hard time making a distinction between the two.
Thanks in advance.
A join or a semi-join is required whenever you want to combine records from two or more entities based on some common attributes.
A subquery, by contrast, is required whenever you want a lookup or a reference on the same table or on other tables.
In short, when you need additional reference columns added to the existing table's attributes, go for a join; when you want a lookup on records from the same table or other tables while keeping the same existing columns as output, go for a subquery.
Also, a semi-join can be written as a subquery, because most of the time we don't actually join the right table; instead we maintain a check via a subquery to limit the records in the existing table. Hence it is a semi-join, even though a semi-join is not a subquery by itself.
I don't really think of a subquery and a semi-join as anything similar. A subquery is nothing more interesting than a query that is used inside another query:
select * -- this is often called the "outer" query
from (
select columnA -- this is the subquery inside the parentheses
from mytable
where columnB = 'Y'
)
A semi-join is a concept based on join. Of course, joining tables will combine both tables and return the combined rows based on the join criteria. From there you select the columns you want from either table based on further where criteria (and of course whatever else you want to do). The concept of a semi-join is when you want to return rows from the first table only, but you need the 2nd table to decide which rows to return. Example: you want to return the people in a class:
select p.FirstName, p.LastName, p.DOB
from people p
inner join class c on c.pID = p.pID
where c.ClassName = 'SQL 101'
group by p.pID, p.FirstName, p.LastName, p.DOB
This accomplishes the concept of a semi-join. We are only returning columns from the first table (people). The use of the group by is necessary for the concept of a semi-join because a true join can return duplicate rows from the first table (depending on the join criteria). The above example is not often referred to as a semi-join, and is not the most typical way to accomplish it. The following query is a more common method of accomplishing a semi-join:
select FirstName, LastName, DOB
from people
where pID in (select pID
from class
where ClassName = 'SQL 101'
)
There is no formal join here. But we're using the 2nd table to determine which rows from the first table to return. It's a lot like saying if we did join the 2nd table to the first table, what rows from the first table would match?
For performance, exists is often preferred, although many optimizers treat in and exists the same way:
select FirstName, LastName, DOB
from people p
where exists (select pID
from class c
where c.pID = p.pID
and c.ClassName = 'SQL 101'
)
In my opinion, this is the most direct way to understand the semi-join. There is still no formal join, but you can see the idea of a join hinted at by the usage of directly matching the first table's pID column to the 2nd table's pID column.
Final note. The last 2 queries above each use a subquery to accomplish the concept of a semi-join.
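The duplicate-row point is easy to check with a tiny data set. This sketch assumes minimal people and class tables; the column types and the rows (including a deliberately repeated enrollment) are made up for illustration.

```sql
-- Hypothetical minimal tables and rows, for illustration only.
CREATE TABLE people (pID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50));
CREATE TABLE class (pID INT, ClassName VARCHAR(50));

INSERT INTO people VALUES (1, 'Ann', 'Lee');
INSERT INTO people VALUES (2, 'Bob', 'Ray');
INSERT INTO class VALUES (1, 'SQL 101');
INSERT INTO class VALUES (1, 'SQL 101');  -- Ann is listed twice
INSERT INTO class VALUES (2, 'Math 101');

-- Plain join without GROUP BY: one output row per matching class row.
-- Returns ('Ann', 'Lee') twice.
SELECT p.FirstName, p.LastName
FROM people p
INNER JOIN class c ON c.pID = p.pID
WHERE c.ClassName = 'SQL 101';

-- Semi-join with EXISTS: each person appears at most once.
-- Returns ('Ann', 'Lee') once.
SELECT p.FirstName, p.LastName
FROM people p
WHERE EXISTS (SELECT 1
              FROM class c
              WHERE c.pID = p.pID
                AND c.ClassName = 'SQL 101');
```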

SQL subquery multiple times error

I am making a subquery but I am getting a strange error
The column 'RealEstateID' was specified multiple times for 'NotSold'.
here is my code
SELECT *
FROM
(SELECT *
FROM RealEstatesInfo AS REI
LEFT JOIN Purchases AS P
ON P.RealEstateID=REI.RealEstateID
WHERE DateBought IS NULL) AS NotSold
INNER JOIN OwnerEstate AS OE
ON OE.RealEstateID=NotSold.RealEstateID
It's on SQL server by the way.
That's because there will be two RealEstateID columns in your subquery. You need to change it to explicitly list the columns from both tables and include only one RealEstateID. It doesn't matter which one, as you use it for your join.
If you're very lazy, you can select REI.* and name only the P columns other than RealEstateID.
By the way, select * is probably never a good idea in subqueries, derived tables, or CTEs.
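As a sketch of that fix, the derived table below lists its columns explicitly so that RealEstateID appears only once. The table definitions, the OwnerID column, and the sample rows are assumptions added to make the example self-contained; the syntax targets SQL Server, as in the question.

```sql
-- Hypothetical minimal tables; OwnerID is an invented column for illustration.
CREATE TABLE RealEstatesInfo (RealEstateID INT PRIMARY KEY);
CREATE TABLE Purchases (RealEstateID INT, DateBought DATE);
CREATE TABLE OwnerEstate (RealEstateID INT, OwnerID INT);

INSERT INTO RealEstatesInfo VALUES (1);
INSERT INTO RealEstatesInfo VALUES (2);
INSERT INTO Purchases VALUES (1, '2020-01-01');  -- estate 1 was sold
INSERT INTO OwnerEstate VALUES (1, 100);
INSERT INTO OwnerEstate VALUES (2, 200);

-- The derived table exposes a single RealEstateID, so the alias NotSold is valid.
-- Returns one row: (2, 200).
SELECT NotSold.RealEstateID, OE.OwnerID
FROM (SELECT REI.RealEstateID
      FROM RealEstatesInfo AS REI
      LEFT JOIN Purchases AS P
             ON P.RealEstateID = REI.RealEstateID
      WHERE P.DateBought IS NULL) AS NotSold
INNER JOIN OwnerEstate AS OE
        ON OE.RealEstateID = NotSold.RealEstateID;
```

Any further columns needed from RealEstatesInfo or Purchases would be added to the inner select list by name, keeping each output column name unique.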

Difference visibility in subquery join and where

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I had was ORA-00904: "WO"."ID_WORKER": not valid identifier.
Then I decided to move the condition on the outer table from the join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfectly.
Why is it not possible to do this in the join? The table should be visible there, as it is in the where clause. Am I missing something?
In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. Names from the outer query are not visible to it.
The WHERE clause (and its cousin HAVING) is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.