Difference in visibility between JOIN and WHERE in a subquery - sql

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I got was ORA-00904: "WO"."ID_WORKER": invalid identifier.
Then I decided to move the condition on wo from the join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfectly.
Why is it not possible to do this in the join? The table should be visible there just as it is in the where clause. Am I missing something?

In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. As the ORA-00904 error shows, Oracle does not make names from the outer query visible to it here.
The WHERE clause (and its cousin HAVING) is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.
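As a minimal sketch of the working form, the query with the correlation in WHERE can be run against a few invented rows. This uses Python's built-in sqlite3 module as a stand-in for Oracle; the sample data and the name column are assumptions for illustration only. SQLite's name resolution is not Oracle's, so the sketch does not try to reproduce the ORA-00904 error and runs only the form that worked.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worker (id_worker INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE working_place (id_working_place INTEGER PRIMARY KEY);
    CREATE TABLE working_place_worker (id_working_place INTEGER, id_worker INTEGER);
    INSERT INTO worker VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO working_place VALUES (10), (20);
    INSERT INTO working_place_worker VALUES (10, 1), (20, 3);
""")

assigned = conn.execute("""
    SELECT wo.id_worker
    FROM worker wo
    WHERE EXISTS (
        SELECT 1
        FROM working_place wp
        JOIN working_place_worker wpw
          ON wp.id_working_place = wpw.id_working_place  -- relates wp and wpw only
        WHERE wpw.id_worker = wo.id_worker               -- correlation to the outer query
    )
    ORDER BY wo.id_worker
""").fetchall()

print(assigned)  # workers 1 and 3 have a working place, worker 2 does not
```

The ON clause carries only the condition that relates wp to wpw, and the single reference to wo sits in the subquery's WHERE clause.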

Related

BigQuery : WITH clause behavior in multiple JOIN conditions

For readability, I have defined an "org_location_ext" WITH clause in the query as follows.
This "org_location_ext" is first used to join with the main fact-table "LOCATION_SALES".
It is used in other JOIN conditions as well.
According to the BigQuery documentation : https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#with_clause
The WITH clause contains one or more named subqueries which execute
every time a subsequent SELECT statement references them
I want to know the behavior for this case.
Does this query execute the "org_location_ext" WITH clause multiple times?
Or, when the SELECT query gets executed, is a temporary table created for "org_location_ext" and used for all the JOINs?
Basically, after the first JOIN with the fact table, do later joins use that "filtered" result, or do they rerun the WITH clause?
WITH org_location_ext AS (
SELECT *
FROM ORG_LOC_MASTER AS loc_master
JOIN LOC_REGN1 as regn1 ON loc_master.id = regn1.id
JOIN ...
JOIN ...
)
SELECT
..
org_location_ext.store_class,
org_location_ext.country,
org_location_ext.
..
..
FROM LOCATION_SALES AS sales
JOIN org_location_ext ON org_location_ext.area_id = sales.area_id AND org_location_ext.date = sales.date
JOIN ....
JOIN ....
JOIN COUNTRY_VAT AS vat ON vat.key1 =TBL_Y.key1 AND vat.country_code = org_location_ext.country_code
It depends on the query plan. Check the query plan for your query: it shows how many times each table is accessed.

Select statement with columns that are select statement but not subqueries

SQL Masters,
I don't understand part of this query. In the select statement there are what look like independent 'select statements', almost like functions. This code is vendor-written Blackbaud CRM. As independent code, there is no join in the from clause for the info they bring into the data set, as you can see. One last odd item: in the column aliased SPOUSE_ID, the column SPOUSE.RECIPROCALCONSTITUENTID does not even exist in the table referred to. Can any BBCRM people out there explain this?
Thanks
select
CONSTITUENT.ID,
CONSTITUENT.ISORGANIZATION,
CONSTITUENT.KEYNAME,
CONSTITUENT.FIRSTNAME,
CONSTITUENT.MIDDLENAME,
CONSTITUENT.MAIDENNAME,
CONSTITUENT.NICKNAME,
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID
and SPOUSE.ISSPOUSE = 1) as [SPOUSE_ID],
(select MARITALSTATUSCODE.DESCRIPTION
from dbo.MARITALSTATUSCODE
where MARITALSTATUSCODE.ID = CONSTITUENT.MARITALSTATUSCODEID) as [MARITALSTATUSCODEID_TRANSLATION]
From
dbo.constituent
left join
dbo.ORGANIZATIONDATA on ORGANIZATIONDATA.ID = CONSTITUENT.ID
where
(CONSTITUENT.ISCONSTITUENT = 1)
These are correlated subqueries. Although there is no explicit JOIN, there is a link to the outer table which behaves like a join (although more constrained than explicit JOINs):
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID AND
-------^ correlation clause connecting to outer table
SPOUSE.ISSPOUSE = 1
) as [SPOUSE_ID],
This behaves like a LEFT JOIN. If no rows match, then the result is NULL.
Note that in this context, the correlated subquery is also a scalar subquery. That means that it returns exactly one column and at most one row.
If the query returned more than one column, you would get a compile-time error on the query. If the query returns more than one row, you will get a run-time error on the query.
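The LEFT JOIN comparison can be checked with a small sketch. It uses Python's sqlite3 module rather than SQL Server, and the three constituents and their relationship rows are invented for illustration. It covers only the case where each constituent has at most one spouse row; the multiple-row error is not demonstrated, because engines differ in how they handle it.

```python
import sqlite3

# Hypothetical sample data: constituents 1 and 2 are spouses, 3 has no spouse row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE constituent (id INTEGER PRIMARY KEY, keyname TEXT);
    CREATE TABLE relationship (
        relationshipconstituentid INTEGER,
        reciprocalconstituentid   INTEGER,
        isspouse                  INTEGER
    );
    INSERT INTO constituent VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO relationship VALUES (1, 2, 1), (2, 1, 1), (3, 1, 0);
""")

# Correlated scalar subquery in the SELECT list, as in the vendor query.
scalar = conn.execute("""
    SELECT c.id,
           (SELECT s.reciprocalconstituentid
            FROM relationship s
            WHERE s.relationshipconstituentid = c.id   -- correlation to the outer row
              AND s.isspouse = 1) AS spouse_id
    FROM constituent c
    ORDER BY c.id
""").fetchall()

# The same result written as an explicit LEFT JOIN.
left_join = conn.execute("""
    SELECT c.id, s.reciprocalconstituentid AS spouse_id
    FROM constituent c
    LEFT JOIN relationship s
           ON s.relationshipconstituentid = c.id AND s.isspouse = 1
    ORDER BY c.id
""").fetchall()

print(scalar)     # constituent 3 gets None (NULL) because no spouse row matches
print(left_join)
```

Both forms keep every constituent row and fill in NULL where no spouse row matches.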

Using ON clause to JOIN tables with same column name

I wanted to ask about the condition of an ON clause while joining tables:
SELECT c_User.ID
FROM c_User
WHERE EXISTS (
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
)
I know that tables c_Member and c_Group have one column with the same name, Directory_ID. What I expected was c_Member and c_Group to join on that column using something like:
c_Group JOIN c_Member ON (c_Group.Directory_ID = c_Member.Directory_ID)
WHERE c_Group.Group_Name like 'mcp%'
How is this condition able to match the rows?
c_Member ON (c_Group.Group_Name LIKE 'mcp%')
Is this a shorter way of referring to two tables joining on a column with the same name, while applying the LIKE condition?
If so, then can such a style work for a table with multiple column names that are the same?
This is your correlated subquery:
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
This subquery works, but the way it is written makes it quite unclear:
The join condition (c_Group.Group_Name LIKE 'mcp%') is not actually related to the table being joined (c_Member); all it does is apply a filter on table c_Group (there is no magic such as a shorter way of referring to two tables joining on a column with the same name while applying the LIKE condition). It would make more sense to move it to the WHERE clause (this would still be functionally equivalent).
On the other hand, the WHERE clause contains conditions that relate to the tables being joined (for example: c_Group.Name = c_Member.Parent_Name). A more sensible option would be to put them in the ON clause of the JOIN.
Other remarks:
when using EXISTS, you would usually prefer SELECT 1 instead of SELECT * (most RDBMS will optimize this under the hood for you, but this makes the intent clearer).
table aliases can be used to make the query more readable
I would suggest the following syntax for the query (which is semantically equivalent to the original, but a lot clearer):
SELECT u.ID
FROM c_User u
WHERE EXISTS (
SELECT 1
FROM c_Group g
JOIN c_Member m ON g.Name = m.Parent_Name AND m.Child_Name = u.Lower_User_Name
WHERE g.Group_Name LIKE 'mcp%'
)
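One way to convince yourself that the two spellings return the same rows is to run both against the same data. The sketch below does that with Python's sqlite3 module; the three users, two groups and three memberships are invented for illustration, and SQLite is only a stand-in for whatever RDBMS the original query runs on.

```python
import sqlite3

# Hypothetical sample data: alice and carol belong to a group named 'mcp-admins'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_User   (ID INTEGER PRIMARY KEY, Lower_User_Name TEXT);
    CREATE TABLE c_Group  (Name TEXT, Group_Name TEXT);
    CREATE TABLE c_Member (Parent_Name TEXT, Child_Name TEXT);
    INSERT INTO c_User   VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO c_Group  VALUES ('g1', 'mcp-admins'), ('g2', 'other');
    INSERT INTO c_Member VALUES ('g1', 'alice'), ('g2', 'bob'), ('g1', 'carol');
""")

# Original spelling: filter in ON, join and correlation conditions in WHERE.
original = conn.execute("""
    SELECT c_User.ID
    FROM c_User
    WHERE EXISTS (
        SELECT *
        FROM c_Group
        JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
        WHERE c_Group.Name = c_Member.Parent_Name
          AND c_Member.Child_Name = c_User.Lower_User_Name
    )
    ORDER BY c_User.ID
""").fetchall()

# Suggested spelling: join and correlation conditions in ON, filter in WHERE.
rewritten = conn.execute("""
    SELECT u.ID
    FROM c_User u
    WHERE EXISTS (
        SELECT 1
        FROM c_Group g
        JOIN c_Member m ON g.Name = m.Parent_Name AND m.Child_Name = u.Lower_User_Name
        WHERE g.Group_Name LIKE 'mcp%'
    )
    ORDER BY u.ID
""").fetchall()

print(original, rewritten)  # both list users 1 and 3
```

For an inner join, the conditions in ON and WHERE are combined with AND either way, which is why moving them between the two clauses does not change the result.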

SQL optimization (inner join or selects)

I have a dilemma. I had a teacher who taught me, basically, that inner joins are hell (he failed me because I missed the delivery of the final project by 3 minutes...), and now I have another who tells me that using just selects is inefficient, so I don't know what is white and what is black... Could someone enlighten me with their knowledge?
Joins
SELECT
NombreP AS Nombre,
Nota
FROM Lleva
INNER JOIN Estudiante ON CedEstudiante = Estudiante.Cedula
WHERE
Lleva.SiglaCurso='CI1312';
No Joins
SELECT
NombreP AS Nombre,
Nota
FROM (
SELECT
Nota,
CedEstudiante
FROM Lleva
WHERE
SiglaCurso='CI1312'
) AS Lleva, (
SELECT
NombreP,
Cedula
FROM Estudiante
) AS Estudiante
WHERE
CedEstudiante = Estudiante.Cedula;
So which one is more efficient?
Let's rewrite the code so it's easier to understand:
SELECT E.NombreP AS Nombre
,L.Nota
FROM Lleva L INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
WHERE
L.SiglaCurso='CI1312';
A query using a subquery may look like this:
SELECT L.Nota
,(SELECT E.NombreP
FROM Estudiante E
WHERE E.Cedula = L.CedEstudiante
) AS Nombre
FROM Lleva L
WHERE
L.SiglaCurso='CI1312'
What you actually did in your original query was an implicit join: the tables are listed in FROM separated by commas, and the joining condition (CedEstudiante = Estudiante.Cedula) is written in the WHERE clause. With that condition in place it returns the same rows as the INNER JOIN; without it you would get every combination of rows from the two tables. Most programmers do not advise or use implicit joins.
As for join versus subquery, they are applied in different situations.
They are not equivalent:
A subquery will guarantee 1 returned value or return NULL; if there are multiple values returned in the subquery you will get an error for returning more than 1 value and have to solve the problem with an aggregation perhaps (max value, top 1 value). Subqueries that return no match will return NULL without affecting the rest of the row.
The JOIN (INNER JOIN) operates differently. It can match rows and get the single value you're looking for just like a subquery. But a join can multiply returned rows if the joining conditions are not distinct (singular/non-repeating). This is why joins are usually done on Primary Keys (PK's). PK's are distinct by definition. The INNER JOIN will also remove rows if a joining condition doesn't occur between tables. This may be what your first professor was trying to explain: an INNER JOIN can work in many cases, similar to a subquery, but it can also return additional rows or remove rows from the output.
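The difference in row counts can be seen in a small sketch. It uses Python's sqlite3 module with invented rows for Lleva and Estudiante; the student numbers, names and grades are assumptions for illustration. The scalar subquery is not run against the duplicated data, because engines differ in how they treat a subquery that returns more than one row.

```python
import sqlite3

# Hypothetical sample data: enrollment '999' has no matching student row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lleva (CedEstudiante TEXT, SiglaCurso TEXT, Nota INTEGER);
    CREATE TABLE Estudiante (Cedula TEXT, NombreP TEXT);
    INSERT INTO Lleva VALUES ('111', 'CI1312', 90), ('222', 'CI1312', 75), ('999', 'CI1312', 60);
    INSERT INTO Estudiante VALUES ('111', 'Ana'), ('222', 'Luis');
""")

JOIN_SQL = """
    SELECT E.NombreP AS Nombre, L.Nota
    FROM Lleva L INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
    WHERE L.SiglaCurso = 'CI1312'
    ORDER BY L.CedEstudiante
"""
SUBQUERY_SQL = """
    SELECT (SELECT E.NombreP
            FROM Estudiante E
            WHERE E.Cedula = L.CedEstudiante) AS Nombre,
           L.Nota
    FROM Lleva L
    WHERE L.SiglaCurso = 'CI1312'
    ORDER BY L.CedEstudiante
"""

join_rows = conn.execute(JOIN_SQL).fetchall()          # unmatched enrollment is dropped
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()  # unmatched enrollment keeps a NULL name

# A non-distinct joining column multiplies rows in the join.
conn.execute("INSERT INTO Estudiante VALUES ('111', 'Ana (duplicate)')")
join_rows_with_duplicate = conn.execute(JOIN_SQL).fetchall()

print(len(join_rows), len(subquery_rows), len(join_rows_with_duplicate))
```

With this data the INNER JOIN returns two rows, the subquery form returns three (one with a NULL name), and the join grows to three rows once a duplicate Cedula appears.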

Group by in SQL Server giving wrong count

I have a query which works, goes like this:
Select
count(InsuranceOrderLine.AntallPotensiale) as potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name
This query returns what I need, but when I try to add one more table (InsuranceOrder) to be able to get the RegardingUser column, all the count values are far too high.
Select
count(InsuranceOrderLine.AntallPotensiale) as Potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori,
RegardingUser
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory, InsuranceSalesLead
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name,RegardingUser
Thanks in advance
You're adding one more table to your FROM clause, but you don't specify any JOIN condition for that table, so your previous result set will be CROSS JOINed (cartesian product) with your new table! Of course you'll get duplication of data....
That's one of the reasons that I recommend never using that old, legacy style JOIN: do not simply list a comma-separated bunch of tables in your FROM clause.
Always use the new ANSI standard JOIN syntax with INNER JOIN, LEFT OUTER JOIN and so on:
SELECT
count(iol.AntallPotensiale) as Potensiale,
COUNT(iol.AntallSolgt) as Solgt,
ip.Name,
ipc.Name as Kategori,
isl.RegardingUser
FROM
dbo.InsuranceOrderLine iol
INNER JOIN
dbo.InsuranceProduct ip ON iol.FKInsuranceProductId = ip.InsuranceProductID
INNER JOIN
dbo.InsuranceProductCategory ipc ON ip.FKInsuranceProductCategory = ipc.InsuranceProductCategoryID
INNER JOIN
dbo.InsuranceSalesLead isl ON ???????? -- JOIN condition missing here !!
When you do this, you first of all see right away that you're missing a JOIN condition here - how is this new table InsuranceSalesLead linked to any of the other tables already used in this SQL statement??
And secondly, your intent is much clearer, since the JOIN conditions linking the tables are where they belong - right with the JOIN - and don't clutter up your WHERE clauses ...
It looks like the table you added is joined in a way that multiplies the number of rows. Make sure that you are joining the table properly, and be careful with aggregate functions over several joined tables: joins very often lead to duplicates.
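The inflated counts are easy to reproduce in a sketch. The example below uses Python's sqlite3 module with cut-down versions of the tables from the question; the column lists and the sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

# Hypothetical, simplified tables: 3 order lines, 2 products, 3 sales leads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InsuranceProduct (InsuranceProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE InsuranceOrderLine (FKInsuranceProductId INTEGER, AntallSolgt INTEGER);
    CREATE TABLE InsuranceSalesLead (RegardingUser TEXT);
    INSERT INTO InsuranceProduct VALUES (1, 'Car'), (2, 'Home');
    INSERT INTO InsuranceOrderLine VALUES (1, 1), (1, 1), (2, 1);
    INSERT INTO InsuranceSalesLead VALUES ('u1'), ('u2'), ('u3');
""")

# Two tables, joined through the WHERE clause: the counts are correct.
before = conn.execute("""
    SELECT ip.Name, COUNT(iol.AntallSolgt) AS Solgt
    FROM InsuranceOrderLine iol, InsuranceProduct ip
    WHERE iol.FKInsuranceProductId = ip.InsuranceProductID
    GROUP BY ip.Name
    ORDER BY ip.Name
""").fetchall()

# A third table with no join condition: every row is paired with every sales lead.
after = conn.execute("""
    SELECT ip.Name, COUNT(iol.AntallSolgt) AS Solgt
    FROM InsuranceOrderLine iol, InsuranceProduct ip, InsuranceSalesLead isl
    WHERE iol.FKInsuranceProductId = ip.InsuranceProductID
    GROUP BY ip.Name
    ORDER BY ip.Name
""").fetchall()

print(before)  # counts per product
print(after)   # each count multiplied by the number of sales leads (3)
```

Each count is multiplied by the number of rows in the unjoined table, which is the same effect as the "way high" values in the question.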