Select statement with columns that are select statement but not subqueries - sql

SQL Masters,
I don't understand part of this query. In the select statement there are what look like independent 'select statements'almost like a function. This code is vendor written Blackbaud CRM. As independent code there is no join in the code for the info they bring into the data set as you can see in the from clause. One last odd item is that in the column aliased Spouse_id the column SPOUSE.RECIPROCALCONSTITUENTID dose not even exist in the table referred to. Any BBCRM people out there that can explain this?
Thanks
select
CONSTITUENT.ID,
CONSTITUENT.ISORGANIZATION,
CONSTITUENT.KEYNAME,
CONSTITUENT.FIRSTNAME,
CONSTITUENT.MIDDLENAME,
CONSTITUENT.MAIDENNAME,
CONSTITUENT.NICKNAME,
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID
and SPOUSE.ISSPOUSE = 1) as [SPOUSE_ID],
(select MARITALSTATUSCODE.DESCRIPTION
from dbo.MARITALSTATUSCODE
where MARITALSTATUSCODE.ID = CONSTITUENT.MARITALSTATUSCODEID) as [MARITALSTATUSCODEID_TRANSLATION]
From
dbo.constituent
left join
dbo.ORGANIZATIONDATA on ORGANIZATIONDATA.ID = CONSTITUENT.ID
where
(CONSTITUENT.ISCONSTITUENT = 1)

These are correlated subqueries. Although there is no explicit JOIN, there is a link to the outer table which behaves like a join (although more constrained than explicit JOINs):
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID AND
-------^ correlation clause connecting to outer table
SPOUSE.ISSPOUSE = 1
) as [SPOUSE_ID],
This behaves like a LEFT JOIN. If no rows match, then the result is NULL.
Note that in this context, the correlated subquery is also a scalar subquery. That means that it returns exactly one column and at most one row.
If the query returned more than one column, you would get a compile-time error on the query. If the query returns more than one row, you will get a run-time error on the query.

Related

How did this old SQL query work without a join in the subquery

Here is the T-SQL. The code has been around for years and it was handed to me to migrate to another SQL server. It apparently works, but I don't know why. The execution plan doesn't show any predicates being used, so how does it know which rows to exclude. If I run the subquery I get 1146 rows with the value 1
SELECT EM.PERSON_ID
FROM EMP_BEN_ELECTS EBE, EMPLOYEE_MAP EM
WHERE EBE.BW_ID = EM.BW_ID
AND CHANGE_BENEFIT_EVENT_DATE IS NULL
AND OPTION_ID <> 'WAIVE'
AND NOT EXISTS (SELECT 1 FROM EMPLOYEE_BILLING WHERE BILLING_GROUPING_ID
IN('HWMONTHLY','HWINDIVIDUALBILLED') AND END_DATE IS NULL)
I plan rewrite it without the subquery and use a left join instead, but this just boggled me that it works. The only time I seen code written like this without the join being qualified was when I seen code coming from an Oracle developer.
The subquery of NOT EXISTS is not used to return any (of the 1146) rows.
It is used to check if at least 1 row exists in the table EMPLOYEE_BILLING with the specified conditions:
BILLING_GROUPING_ID IN('HWMONTHLY','HWINDIVIDUALBILLED') AND END_DATE IS NULL
If there is such a row, then NOT EXISTS returns FALSE and since all the conditions in the WHERE clause of the main query are linked with the operator AND, then the final result is WHERE FALSE, making the query to not return any rows.
Don't rewrite the query with a LEFT join.
EXISTS and NOT EXISTS provide usually better performance than joins.
What you must change though, is that archaic join syntax with the ,.
Change it to a proper INNER join with an ON clause:
SELECT EM.PERSON_ID
FROM EMP_BEN_ELECTS EBE INNER JOIN EMPLOYEE_MAP EM
ON EBE.BW_ID = EM.BW_ID
WHERE CHANGE_BENEFIT_EVENT_DATE IS NULL
AND OPTION_ID <> 'WAIVE'
AND NOT EXISTS (
SELECT 1
FROM EMPLOYEE_BILLING
WHERE BILLING_GROUPING_ID IN('HWMONTHLY','HWINDIVIDUALBILLED')
AND END_DATE IS NULL
)
Also, you should qualify all the column names with the table's name/alias they belong to (CHANGE_BENEFIT_EVENT_DATE and OPTION_ID which I left unqualified because I don't know which alias to use).

SQL optimization (inner join or selects)

I have a dilema, i had a teacher that thought me basically that inner joins are hell (he reproved me because I missed the delivery of the final proyect by 3 mins...), now i have another that tells me that using just selects is inefficient, so I don't know what is white nor black... could someone enlighten me with their knowledge?
Joins
SELECT
NombreP AS Nombre,
Nota
FROM Lleva
INNER JOIN Estudiante ON CedEstudiante = Estudiante.Cedula
WHERE
Lleva.SiglaCurso='CI1312';
No Joins
SELECT
NombreP AS Nombre,
Nota
FROM (
SELECT
Nota,
CedEstudiante
FROM Lleva
WHERE
SiglaCurso='CI1312'
) AS Lleva, (
SELECT
NombreP,
Cedula
FROM Estudiante
) AS Estudiante
WHERE
CedEstudiante = Estudiante.Cedula;
So wich one is more efficient?
Lets re-write the code so it's easier to understand:
SELECT E.NombreP AS Nombre
,L.Nota
FROM Lleva L INNER JOIN Estudiante E ON L.CedEstudiante = E.Cedula
WHERE
L.SiglaCurso='CI1312';
A table using a subquery may look like this:
SELECT L.Nota
,(SELECT E.Nombre
FROM Estudiante E
WHERE E.Cedula = L.CedEstudiante
)
FROM Lleva L
WHERE
L.SiglaCurso='CI1312'
What you actually did in your original query was an implicit join. This is similar to inner join, without declaring the exact joining conditions. Implicit joins will attempt to join on similarly named columns between tables. Most programmers do not advise or use implicit join.
As for join versus subquery, they are applied in different situations.
They are not equivalent. Notice what I have put in bold below:
A subquery will guarantee 1 returned value or return NULL; if there are multiple values returned in the subquery you will get an error for returning more than 1 value and have to solve the problem with an aggregation perhaps (max value, top 1 value). Subqueries that return no match will return NULL without affecting the rest of the row.
The JOIN (INNER JOIN) operates differently. It can match rows and get the single value you're looking for just like a subquery. But a join can multiply returned rows if the joining conditions are not distinct (singular/non-repeating). This is why joins are usually done on Primary Keys (PK's). PK's are distinct by definition. The INNER JOIN will also remove rows if a joining condition doesn't occur between tables. This may be what your first professor was trying to explain- an INNER JOIN can work in many cases - similar to a subquery- but can also can return additional rows or remove rows from the output.

Join within an "exists" subquery

I am wondering why when you join two tables on a key within an exists subquery the join has to happen within the WHERE clause instead of the FROM clause.
This is my example:
Join within FROM clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2 join Population1
On Population2.payer_id = Population1.payer_id)
Join within WHERE clause:
SELECT payer_id
FROM Population1
WHERE NOT EXISTS
(Select *
From Population2
WHERE Population2.payer_id = Population1.payer_id)
The first query gives me 0 results, which I know is incorrect, while the second query gives the the thousands of results I am expecting to see.
Could someone just explain to me why where the join happens in an EXISTS subquery matters? If you take the subqueries without the parent query and run them they literally give you the same result.
It would help me a lot to remember to not continue to make this mistake when using exists.
Thanks in advance.
You need to understand the distinction between a regular subquery and a correlated subquery.
Using your examples, this should be easy. The first where clause is:
where not exists (Select 1
from Population2 join
Population1
on Population2.payer_id = Population1.payer_id
)
This condition does exactly what it says it is doing. The subquery has no connection to the outer query. So, the not exists will either filter out all rows or keep all rows.
In this case, the engine runs the subquery and determines that at least one row is returned. Hence, not exists returns false in all cases, and the nothing is returned.
In the second case, the subquery is a correlated subquery. So, for each row in population1 the subquery is run using the value of Population1.payer_id. In some cases, matching rows exist in Population2; these are filtered out. In other cases, matching rows do not exist; these are in the result set.
The first example is not actually reffering to the base table which creates a logic that is unpredictable.
Another way to do the same logic would be:
SELECT payer_id
FROM Population1 P1
LEFT JOIN Population2 P2 ON
P2.Payer_Id = P1.Payer_Id
WHERE
P2.Payer_Id IS NULL
You qry return ROW EXISTS always if exist even if there is one result row.
Select *
from Population2
join Population1 on Population2.payer_id = Population1.payer_id
If exist at least one row from this join (and for sure there exists), you can imagine your subqry looks like:
select 'ROW EXISTS'
And result of:
select *
from Population1
where not exists (select 'ROW EXISTS')
So your anti-semijoin return:
payer_id 1 --> some ROW EXISTS -> dont't return this row
payer_id 2 --> some ROW EXISTS -> dont't return this row

Difference visibility in subquery join and where

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I had was ORA-00904: "WO"."ID_WORKER": not valid identifier.
Then I decided to move the union of tables from join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfect.
Why is not possible to make it in the join? The table should be visible like it is in the where clause. Am I missing something?
In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. Names from the outer query are not visible to it.
The WHERE clause (and its cousin HAVING is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.

How does Subquery in select statement work in oracle

I have looked all over for an explanation, to how does the subquery in a select statement work and still I cannot grasp the concept because of very vague explanations.
I would like to know how do you use a subquery in a select statement in oracle and what exactly does it output.
For example, if i had a query that wanted to display the names of employees and the number of profiles they manage from these tables
Employee(EmpName, EmpId)
Profile(ProfileId, ..., EmpId)
how do I use the subquery?
I was thinking a subquery is needed in the select statement to implement the group by function to count the number of profiles being managed for each employee, but I am not too sure.
It's simple-
SELECT empname,
empid,
(SELECT COUNT (profileid)
FROM profile
WHERE profile.empid = employee.empid)
AS number_of_profiles
FROM employee;
It is even simpler when you use a table join like this:
SELECT e.empname, e.empid, COUNT (p.profileid) AS number_of_profiles
FROM employee e LEFT JOIN profile p ON e.empid = p.empid
GROUP BY e.empname, e.empid;
Explanation for the subquery:
Essentially, a subquery in a select gets a scalar value and passes it to the main query. A subquery in select is not allowed to pass more than one row and more than one column, which is a restriction. Here, we are passing a count to the main query, which, as we know, would always be only a number- a scalar value. If a value is not found, the subquery returns null to the main query. Moreover, a subquery can access columns from the from clause of the main query, as shown in my query where employee.empid is passed from the outer query to the inner query.
Edit:
When you use a subquery in a select clause, Oracle essentially treats it as a left join (you can see this in the explain plan for your query), with the cardinality of the rows being just one on the right for every row in the left.
Explanation for the left join
A left join is very handy, especially when you want to replace the select subquery due to its restrictions. There are no restrictions here on the number of rows of the tables in either side of the LEFT JOIN keyword.
For more information read Oracle Docs on subqueries and left join or left outer join.
In the Oracle RDBMS, it is possible to use a multi-row subquery in the select clause as long as the (sub-)output is encapsulated as a collection. In particular, a multi-row select clause subquery can output each of its rows as an xmlelement that is encapsulated in an xmlforest.