Do subselects do an implicit join? - sql

I have a sql query that seems to work but I dont really understand why. Therefore I would very much appreciate if someone could help explain whats going on:
THE QUERY RETURNS: All organisations that dont have any comments that were not created by the consultant who created the organisation record.
SELECT \"organisations\".*
FROM \"organisations\"
WHERE \"organisations\".\"id\" NOT IN
(SELECT \"comments\".\"commentable_id\"
FROM \"comments\"
WHERE \"comments\".\"commentable_type\" = 'Organisation'
AND (comments.author_id != organisations.consultant_id)
ORDER BY \"comments\".\"created_at\" ASC
)
It seems to do so correctly.
The part I dont understand is why (comments.author_id != organisations.consultant_id) is working!? I dont understand how postgres even knows what "organisations" is inside that subselect? It is not defined in here.
If this was written as a join where I had joined comments to organisations then I would totally understand how you could do something like this but in this case its a subselect. How does it know how to map the comments and organisations table and exclude the ones where (comments.author_id != organisations.consultant_id)

That subselect happens in a row so it can see all columns of that row. You will probably get better performance with this
select organisations.*
from organisations
where not exists (
select 1
from comments
where
commentable_type = 'organisation' and
author_id != organisations.consultant_id
)
Notice that it is not necessary to qualify commentable_type since the one in comments has priority over any other outside the subselect. And if comments does not have a consultant_id column then it would be possible to take its qualifier out, although not recommended for better legibility.
The order by in your query buys you nothing, just added cost.

You are running a correlated subquery. http://technet.microsoft.com/en-us/library/ms187638(v=sql.105).aspx
This is commonly used in all databases. A subquery in the WHERE clause can refer to tables used in the parent query, and they often do.
That being said, your current query could likely be written better.
Here is one way, using an outer join with comments, where no matches are found based on your criteria -
select o.*
from organizations o
left join comments c
on c.commentable_type <> 'Organisation'
and c.author_id = o.consultant_id
where c.commentable_id is null

Related

Natural Join Explanation

I am a student trying to learn Microsoft SQL and I am frustrated that I cannot understand how natural join works. I have a problem with a solution but I cannot understand how this solution was created through natural join. My friend sent my his old solution but cant remember how he got the result. I really want to figure out how this natural join works. Can someone please explain how this answer was achieved?
The question taken from "Database Design, Application, and Administration 6th Ed.:
Show the result of a natural join that combines the Customer and OrderTbl tables.
Edit: The book doesn't give a query statement on how to get the results for the natural join in the chapters I have read so far. Which makes things much more confusing for me as I cannot simply add the table and send a query.
Edit2: The reason why some entries in the order table have null values is because the question is implying that some orders were processed through the internet. A friend of mine told me that was the reason why he got the question right opposed to some people who argued against it. :S
If I read your question correctly, you are trying to learn two concepts at the same time. The first is INNER JOIN (sometimes called equijoin). The second is natural join.
It's easier to explain natural join, assuming you already know how INNER JOIN works. A natural join is simply an equijoin where the column names indicate to you what the join condition ought to be. In your case, the fact that CustNo appears in both tables is the only clue you need in order to devise the correct join condition. You also include the join field only once in the result.
Column names are actually quite arbitrary, and could have been made very different in this case. for example, if the column Customer.CustNo had been named Customer.ID instead, you wouldn't be able to do a natural join.
for a correct solution in your case, see the answer provided by JamieC.
If you simply want the query which will result in the final table in your question here it is,
SELECT
o.OrdNo,
o.OrdDate,
o.EmpNo,
o.CustNo,
c.CustFirstName
c.CustLastName,
c.CustCity,
c.CustSatate,
c.CustZip,
c.CustBal
FROM OrderTbl o
INNER JOIN Customer c
ON o.CustNo = c.CustNo
ORDER BY c.CustNo
So, by way of explanation; this query selects all data from Customer and OrderTbl joining the two using CustNo which is the primary key (presumably) in Customer and a foreign key in OrderTbl. The ordering of the result is a little more tricky, and based almost purely on guesswork, I suspect the result is ordered by CustNo as well.
The Employee table does not feature at all in the result, however as the OrderTbl table has some blanks for EmpNo, you would almost certainly want a LEFT JOIN/RIGHT JOIN (as appropriate) if you wanted to retrieve any information about the employee from the orders table.
MS SQL Server doesn't support NATURAL JOIN. However, if you were using a platform that would support it, a simple:
SELECT * FROM Customer NATURAL JOIN OrderTbl;
should do the trick.
https://en.wikipedia.org/wiki/Join_(SQL)#Natural_join is quite good.

Intersystems Cache coding query

SELECT Distinct visitid As Visit_ID,
AreaId->FacilityID As Facility_ID,
visitid-PatientSecondaryNumber As Patient_MRN,
visitid->PatientName As Patient_Name,
visitid-statustext As visit_Status,
visitid->LastVisitTypeID->shortname As visit_Type,
visitid-LastVisitActivationTime As Last_Visit_Activation,
(SELECT VisitConversionID->VisitTypeID-shortname
FROM qcpr_arf_OC.VisitActivationTime
WHERE visitid = qcpr_arf_RG.AreaBedHistoryEventTime.visitid AND
VisitConversionID->VisitTypeID-shortname LIKE 'Emergency%' ) AS Last_Visit FROM qcpr_arf_rg.AreaBed INNER JOIN qcpr_arf_RG.AreaBedHistoryEventTime ON
qcpr_arf_rg.AreaBed.AreaBedID = qcpr_arf_RG.AreaBedHistoryEventTime.AreaBedID
WHERE AreaBedHistoryEventTimeSubID LIKE 'Ç910%' AND visitid <> ''
Hi the above query Have been retain by a previous employee and I'm trying to figure out what "->" means could anyone please help me out.
To expand on #Ben's answer, and to include more information in this thread rather than just the external link he supplied.
-> syntax is a Cache SQL shorthand that represents an implicit LEFT OUTER JOIN in cases where a property is a reference to another table.
As an example your SQL query includes the following column in the SELECT clause:
AreaId->FacilityID As Facility_ID
This expression is equivalent to a LEFT OUTER JOIN with the table that AreaId references using ON {table.ROWID} = AreaID, and returning that FacilityID if such an AreaId exists, or NULL if it does not.
At first glance, the syntax may not make much sense, but it can reduce the amount of SQL in a query. That said, this query might be easier to follow if you made the JOIN on visitid explicit.
I am including #Ben's link to the InterSystems documentation: http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_specialfeatures#GSQL_specialfeatures_impjoin
The focus of the documentation describes the behaviour of the feature from a more OO perspective, as well as giving some basic query rewrites illustrating the feature.
The -> syntax is an implicit join. See http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_specialfeatures#GSQL_specialfeatures_impjoin for a full explanation.

SQL and implicit join

w3schools.com has the following SQL statement example:
SELECT
Product_Orders.OrderID, Persons.LastName, Persons.FirstName
FROM
Persons,
Product_Orders
WHERE
Persons.LastName='Hansen' AND Persons.FirstName='Ola'
What is not clear to me is how the product order table is joined with the persons table. Is this SQL proper, and what does it imply about the result set sent back?
It's technically proper, though I would avoid using that type of syntax for a couple reasons which I won't go into right here.
That syntax is colloquially called the "old style" join syntax and is equivalent to:
SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName
FROM Persons
CROSS JOIN Product_Orders
WHERE Persons.LastName='Hansen' AND Persons.FirstName='Ola'
The answer of how they "join" together is that they don't. The result of the query is the cross product of the entire Product_Orders table with the "Ola Hanson" row(s) in the Persons table. It doesn't make much sense in most of the use cases I can think of, to be quite honest with you.
The really strange thing about the results you'll get here is that the OrderID from the Product_Orders table will not necessarily align with the person on the order. It looks to be a mistake of omission on the page -- though to be fair the example intends to demonstrate aliasing, not joins.
W3Schools doesn't have a good reputation around here.
Firs of all you have to understand how Joins works. From you query:
A cross product is done between the two tables supplied in the SQL statement, you can see this from this FROM Persons, Product_Orders.
For every record in Persons table is joined to each record in the Product_Orders table. This will give you a huge temp table if both tables are big.
Then your filter i.e. the qualifying condition will be applied to this temp table to give you your result.
I hope this helps.

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a a query result of all the columns in the table.
Now, if the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, a.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, don't think any of them answer the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "buy why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name, the bigger problem is inefficiency when not all columns are required in the resultset (network traffic, cpu and memory load).
Of course if you're adding columns from other tables (as is the case in this example it can be dangerous as these tables may over time have columns with matching names, select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata is requiring the same. As both are quite old (maybe better call it mature :-) DBMSes this might be historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complainin: "only the first 1000 rows might be retrieved", this second approach would solve that problem?
3) What is the scope of the alias in the second subquery?... does the alias only lives in the parenthesis?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
Yes
Yes
Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
Alias declared inside subquery lives inside subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. Logically if that is the case, then SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not same, the results may be different when you use the two...
With IN, usually its a Full Table Scan of the inner table once without removing NULLs (so if you have NULLs in your inner table, IN will not remove NULLS by default)... While EXISTS removes NULL and in case of correlated subquery, it runs inner query for every row from outer query.
Assuming there are no NULLS and its a simple query (with no correlation), EXIST might perform better if the row you are finding is not the last row. If it happens to be the last row, EXISTS may need to scan till the end like IN.. so similar performance...
But IN and EXISTS are not interchangeable...