SQL ANSI joins and the order of tables in it

The following query is automatically translated from the "old" syntax to ANSI syntax and gives an error:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
join ods_stg_pnb_stuf_pers_adr pas
on (pas.soort_adres = lst.soort_adres)
right outer join ods_stg_pnb_stuf_pers_nat nat
on (prs.id = nat.prs_id) -- prs.id: invalid identifier
join ods_stg_pnb_stuf_adr adr
on (adr.id = pas.adr_id)
join ods_stg_pnb_stuf_np prs
on (prs.id = pas.prs_id)
I guess this is because table prs is referenced before it has been declared. Moving the prs join up in the query solves the problem:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
join ods_stg_pnb_stuf_pers_adr pas
on (pas.soort_adres = lst.soort_adres)
join ods_stg_pnb_stuf_np prs -- this first
on (prs.id = pas.prs_id)
right outer join ods_stg_pnb_stuf_pers_nat nat
on (prs.id = nat.prs_id) -- now prs.id is known
join ods_stg_pnb_stuf_adr adr
on (adr.id = pas.adr_id)
where lst.persoonssoort = 'PERSOON'
and pas.einddatumrelatie is null
Is there a way to write this query so that the order is less restrictive, still using the ANSI syntax?

If the broken query was generated by a tool from the old non-ANSI syntax, the tool is generating broken code. However, using ANSI-style joins should yield the same result regardless of the order of tables in the from clause. That is,
select *
from t1
join t2 on t2.id = t1.id
left join t3 on t3.id = t1.id
will give you the same results (albeit a different ordering of columns in the result set) as
select *
from t1
left join t3 on t3.id = t1.id
join t2 on t2.id = t1.id
Note that the from clause can't be reordered in a way that breaks the dependencies implied by the join criteria. However, you may also restate/refactor the from clause to express the query in a different way that yields the same result set. For instance, the above query is equivalent to
select *
from t3
right join t1 on t1.id = t3.id
join t2 on t2.id = t1.id
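As a quick check of the claim above, here is a minimal sketch using Python's built-in sqlite3 module. The three tiny tables and their contents are made up for illustration, and SQLite merely stands in for whatever database you actually run. It compares only the inner-then-left and left-then-inner forms.

```python
import sqlite3

# Hypothetical sample data: t1 has ids 1..3, t2 matches 1 and 2, t3 matches only 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    CREATE TABLE t3 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2);
    INSERT INTO t3 VALUES (2), (4);
""")

# Inner join first, then the left join.
inner_then_left = conn.execute("""
    SELECT t1.id, t2.id, t3.id
    FROM t1
    JOIN t2 ON t2.id = t1.id
    LEFT JOIN t3 ON t3.id = t1.id
""").fetchall()

# Left join first, then the inner join.
left_then_inner = conn.execute("""
    SELECT t1.id, t2.id, t3.id
    FROM t1
    LEFT JOIN t3 ON t3.id = t1.id
    JOIN t2 ON t2.id = t1.id
""").fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted copies.
same_rows = sorted(inner_then_left, key=repr) == sorted(left_then_inner, key=repr)
print(same_rows, sorted(inner_then_left, key=repr))
```

Both join conditions refer only to t1, which is why the two joins can swap places here. A condition that referenced t3 from the t2 join could not be moved ahead of the t3 join.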

You simply cannot reference a table unless it has been in the join list earlier. That is normal and expected behavior. Why is this a problem?

A normal ("INNER") JOIN
SELECT ...
FROM a
JOIN b ON (a.x = b.y)
is equivalent to a SELECT with two tables and an appropriate WHERE clause
SELECT ...
FROM a, b
WHERE a.x = b.y
For left/right/outer joins, you are still handicapped by "the asymmetric" join syntax.
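To make the equivalence concrete, here is a small sketch with Python's sqlite3 module. The tables a and b and their values are invented for the demo, and SQLite is just an assumption standing in for your database. It also shows the row that only an outer join keeps.

```python
import sqlite3

# Hypothetical data: a.x in {1, 2, 3}, b.y in {2, 3, 4}.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Explicit inner join.
explicit = conn.execute(
    "SELECT a.x, b.y FROM a JOIN b ON (a.x = b.y) ORDER BY a.x"
).fetchall()

# Two tables plus a WHERE clause.
implicit = conn.execute(
    "SELECT a.x, b.y FROM a, b WHERE a.x = b.y ORDER BY a.x"
).fetchall()

# A left join additionally keeps the unmatched row from a (b.y comes back as NULL/None).
left = conn.execute(
    "SELECT a.x, b.y FROM a LEFT JOIN b ON (a.x = b.y) ORDER BY a.x"
).fetchall()

print(explicit == implicit, left)
```

The asymmetry is visible in the last query: swapping a and b around the LEFT JOIN would keep b's unmatched row instead of a's.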

I think the original SQL code probably looked something like this:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
, ods_stg_pnb_stuf_pers_adr pas
, ods_stg_pnb_stuf_pers_nat nat
, ods_stg_pnb_stuf_adr adr
, ods_stg_pnb_stuf_np prs
where
pas.soort_adres = lst.soort_adres
and prs.id(+) = nat.prs_id
and adr.id = pas.adr_id
and prs.id = pas.prs_id
and lst.persoonssoort = 'PERSOON'
and pas.einddatumrelatie is null
Here ods_stg_pnb_stuf_np prs is at the end of the from clause, which is valid with Oracle proprietary joins.
When converting this to ANSI SQL syntax, however, table prs must be joined before it is referenced. This is a common mistake when converting Oracle proprietary joins to ANSI SQL syntax.
Some other issues come up when converting Oracle proprietary joins to ANSI SQL syntax:
an additional join condition goes missing.
a condition in the where clause breaks after some conditions are moved to the join clause.
If your colleague needs to rewrite Oracle proprietary joins in ANSI SQL syntax, the demos (in both Java and C#) listed in this article should be helpful.

Related

Joins with WHERE - splitting WHERE clauses

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?
Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember that on properly indexed tables a standard JOIN is almost always faster than a sub-query. You'll find that queries using JOIN can be orders of magnitude faster than the ones using sub-queries, except in rare cases.
You can use subqueries:
SELECT ch.name,sh.name
FROM (
SELECT ch.id, ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
but I don't think it makes sense; the subquery may be materialized as a temporary table.
The first query will be optimized by the database server, which will likely read only the rows it needs from the character table.
JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.
A SELECT can only have 1 WHERE clause.
And it comes after the JOIN's.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criterion that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example, the queries below would return the same results:
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar'
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar'
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo'
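Here is a rough way to confirm that the three forms agree, sketched with Python's sqlite3 module. The column layout and sample rows are assumptions made up for the demo (only the names Table1, Table2, ID, table2ID and Col1 come from the queries above), and it selects two ID columns instead of * to keep the comparison simple.

```python
import sqlite3

# Hypothetical rows: only Table1.ID = 1 is 'foo' AND points at a 'bar' row in Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, table2ID INTEGER, Col1 TEXT);
    CREATE TABLE Table2 (ID INTEGER, Col1 TEXT);
    INSERT INTO Table1 VALUES (1, 10, 'foo'), (2, 20, 'foo'), (3, 10, 'baz');
    INSERT INTO Table2 VALUES (10, 'bar'), (20, 'qux');
""")

# Both filters in the single WHERE clause.
all_in_where = conn.execute("""
    SELECT t1.ID, t2.ID
    FROM Table1 AS t1
    JOIN Table2 AS t2 ON t2.ID = t1.table2ID
    WHERE t1.Col1 = 'foo'
      AND t2.Col1 = 'bar'
""").fetchall()

# Table1 filtered inside a sub-query that has its own WHERE.
filtered_subquery = conn.execute("""
    SELECT t1.ID, t2.ID
    FROM (SELECT * FROM Table1 WHERE Col1 = 'foo') AS t1
    JOIN Table2 AS t2 ON t2.ID = t1.table2ID
    WHERE t2.Col1 = 'bar'
""").fetchall()

# The Table2 filter moved into the ON clause of the inner join.
filter_in_on = conn.execute("""
    SELECT t1.ID, t2.ID
    FROM Table1 AS t1
    JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
    WHERE t1.Col1 = 'foo'
""").fetchall()

print(all_in_where, filtered_subquery, filter_in_on)
```

This holds because the join is an inner join; with an outer join, moving a filter between ON and WHERE can change the result.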

what operation does "select from table1, table2 " imply? [duplicate]

I know different joins, but I wanted to know which of them is being used when we run queries like this:
select * from table1 t1, table2 t2
is it full outer join or natural join for example?
Also does it have a unique meaning among different databases or all do the same?
UPDATE: what if we add where clause ? will it be always inner join?
The comma in the from clause -- by itself -- is equivalent to cross join in almost all databases. So:
from table1 t1, table2 t2
is functionally equivalent to:
from table1 t1 cross join table2 t2
They are not exactly equivalent, because the scoping rules within the from clause are slightly different. So:
from table1 t1, table2 t2 join
table3 t3
on t1.x = t3.x
generates an error, whereas the equivalent query with cross join works.
In general, join conditions in the WHERE clause result in an INNER JOIN. However, some databases have extended the syntax to support outer joins in the WHERE clause.
I can think of one exception where the comma does not mean CROSS JOIN. Google's BigQuery originally used the comma for UNION ALL. However, that is only in Legacy SQL and they have removed that in Standard SQL.
Commas in the FROM clause have been out of fashion since the 1900s. They are the "original" form of joining tables in SQL, but explicit JOIN syntax is much better.
To me, they also mean someone who learned SQL decades ago and refused to learn about outer joins, or someone who has learned SQL from ancient materials -- and doesn't know a lot of other things that SQL does.
demo: db<>fiddle
This is a CROSS JOIN (cartesian product). So both of the following queries are equal
SELECT * FROM table1, table2 -- implicit CROSS JOIN
SELECT * FROM table1 CROSS JOIN table2 -- explicit CROSS JOIN
Concerning the UPDATE:
A WHERE clause turns the general CROSS JOIN into an INNER JOIN. An INNER JOIN can be written in three ways:
SELECT * FROM table1, table2 WHERE table1.id = table2.id -- implicit CROSS JOIN notation
SELECT * FROM table1 CROSS JOIN table2 WHERE table1.id = table2.id -- really unusual!: explicit CROSS JOIN notation
SELECT * FROM table1 INNER JOIN table2 ON (table1.id = table2.id) -- explicit INNER JOIN NOTATION
Further reading (wikipedia)
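If you want to see the Cartesian product for yourself, here is a small sketch using Python's sqlite3 module. The id column and the sample rows are assumptions for the demo, and SQLite is only one of the databases where the comma behaves this way.

```python
import sqlite3

# Hypothetical data: 3 rows in table1, 2 rows in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3);
""")

# Comma and CROSS JOIN both pair every row of table1 with every row of table2.
cross_comma = conn.execute("SELECT * FROM table1, table2").fetchall()
cross_explicit = conn.execute("SELECT * FROM table1 CROSS JOIN table2").fetchall()

# Adding an equality condition leaves only the matching pairs.
comma_where = conn.execute(
    "SELECT * FROM table1, table2 WHERE table1.id = table2.id ORDER BY table1.id"
).fetchall()
inner_on = conn.execute(
    "SELECT * FROM table1 INNER JOIN table2 ON (table1.id = table2.id) ORDER BY table1.id"
).fetchall()

print(len(cross_comma), sorted(cross_comma) == sorted(cross_explicit), comma_where == inner_on)
```

The unfiltered product has 3 x 2 rows, which is why a forgotten join condition in the comma style can blow up the result size without raising any error.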

'WHERE' syntax equivalent of LEFT OUTER JOIN in PostgreSQL

I know there is a WHERE representation of LEFT JOIN in Oracle that has syntax like:
FROM t1, t2
WHERE t1.id = t2.id(+)
instead of:
FROM t1 LEFT JOIN t2
ON t1.id = t2.id
Is there anything similar in PostgreSQL? I searched for documentation, but failed to find such feature.
There is no such operator in Postgres (or standard SQL).
The only way to write an outer join in Postgres is to use an ANSI explicit JOIN syntax:
select *
from table t1
left join table t2 on t1.id = t2.id;
(or it might be the other way round - it has been ages since I last used the Oracle (+) operator)
More details in the manual: http://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-FROM
You shouldn't be using the (+) operator in Oracle in the first place. Oracle has supported ANSI joins since 9i, and Oracle recommends that you stop using the (+) operator (the above statement will work just fine in Oracle as well).

Inner join vs join [duplicate]

Both these joins will give me the same results:
SELECT * FROM table JOIN otherTable ON table.ID = otherTable.FK
vs
SELECT * FROM table INNER JOIN otherTable ON table.ID = otherTable.FK
Is there any difference between the statements in performance or otherwise?
Does it differ between different SQL implementations?
They are functionally equivalent, but INNER JOIN can be a bit clearer to read, especially if the query has other join types (e.g. LEFT or RIGHT or CROSS) included in it.
No, there is no difference, pure syntactic sugar.
INNER JOIN = JOIN
INNER JOIN is the default if you don't specify the type when you use the word JOIN.
You can also use LEFT OUTER JOIN or RIGHT OUTER JOIN, in which case the word OUTER is optional, or you can specify CROSS JOIN.
OR
For an INNER JOIN, the syntax is:
SELECT ...
FROM TableA
[INNER] JOIN TableB
(In other words, the INNER keyword is optional--results are the same with or without it.)
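A quick way to convince yourself is to run each spelling against the same data. The sketch below uses Python's sqlite3 module, with made-up ID and FK columns on TableA and TableB. It shows only what SQLite does, so treat it as an illustration, not proof for every engine.

```python
import sqlite3

# Hypothetical data: TableA.ID = 1 has no match, ID = 3 matches twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableB (FK INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (3), (3);
""")

def rows(join_keyword):
    """Run the same query with a different spelling of the join keyword."""
    sql = f"""
        SELECT TableA.ID, TableB.FK
        FROM TableA
        {join_keyword} TableB ON TableA.ID = TableB.FK
        ORDER BY TableA.ID
    """
    return conn.execute(sql).fetchall()

plain_join = rows("JOIN")
inner_join = rows("INNER JOIN")
left_join = rows("LEFT JOIN")
left_outer_join = rows("LEFT OUTER JOIN")

print(plain_join == inner_join, left_join == left_outer_join)
```

The helper only swaps the keyword text, so any difference in the output would have to come from the keyword itself.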
Does it differ between different SQL implementations?
Yes, Microsoft Access doesn't allow just join. It requires inner join.
Similarly with OUTER JOINs, the word "OUTER" is optional. It's the LEFT or RIGHT keyword that makes the JOIN an "OUTER" JOIN.
However for some reason I always use "OUTER" as in LEFT OUTER JOIN and never LEFT JOIN, but I never use INNER JOIN, but rather I just use "JOIN":
SELECT ColA, ColB, ...
FROM MyTable AS T1
JOIN MyOtherTable AS T2
ON T2.ID = T1.ID
LEFT OUTER JOIN MyOptionalTable AS T3
ON T3.ID = T1.ID
As the other answers already state there is no difference in your example.
The relevant bit of grammar is documented here
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
Showing that all are optional. The page further clarifies that
INNER Specifies all matching pairs of rows are returned. Discards
unmatched rows from both tables. When no join type is specified, this
is the default.
The grammar also indicates that there is one case where INNER is required, though: when specifying a join hint.
See the example below
CREATE TABLE T1(X INT);
CREATE TABLE T2(Y INT);
-- Fails: a join hint needs an explicit join type
SELECT *
FROM T1
LOOP JOIN T2
ON X = Y;
-- Works: INNER is spelled out
SELECT *
FROM T1
INNER LOOP JOIN T2
ON X = Y;

SQL style question: INNER JOIN in FROM clause or WHERE clause?

If you are going to join multiple tables in a SQL query, where do you think is a better place to put the join statement: in the FROM clause or the WHERE clause?
If you are going to do it in the FROM clause, how do you format it so that it is clear and readable? (I'm talking about indents, newlines, whitespace in general.)
Are there any advantages/disadvantages to each?
I tend to use the FROM clause, or rather the JOIN clause itself, indenting like this (and using aliases):
SELECT t1.field1, t2.field2, t3.field3
FROM table1 t1
INNER JOIN table2 t2
ON t1.id1 = t2.id1
INNER JOIN table3 t3
ON t1.id1 = t3.id3
This keeps the join condition close to where the join is made. I find it easier to understand this way than trying to look through the WHERE clause to figure out what exactly is joined how.
When making OUTER JOINs (ANSI-89 or ANSI-92), filtration location matters because criteria specified in the ON clause is applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause is applied after the JOIN is made. This can produce very different result sets.
In comparison, it doesn't matter for INNER JOINs if the criteria is provided in the ON or WHERE clauses -- the result will be the same. That said, I strive to keep the WHERE clause clean -- anything related to JOINed tables will be in their respective ON clause. Saves hunting through the WHERE clause, which is why ANSI-92 syntax is more readable.
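The outer-join difference is easy to reproduce. Below is a minimal sketch with Python's sqlite3 module; the customers and orders tables, their columns and their rows are all hypothetical, chosen so that one customer has only a non-matching order.

```python
import sqlite3

# Hypothetical data: Ann has an open order, Bob only a closed one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 'open'), (2, 2, 'closed');
""")

# Criterion in the ON clause: applied while matching, so Bob survives with NULLs.
filter_in_on = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
    ORDER BY c.id
""").fetchall()

# Criterion in the WHERE clause: applied after the join, so Bob's row is dropped.
filter_in_where = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
    ORDER BY c.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

In effect the WHERE version turns the left join back into an inner join, because the NULL-extended row for Bob can never satisfy o.status = 'open'.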
I prefer the FROM clause, if for no other reason than that it distinguishes between filtering results (from a Cartesian product) merely on foreign key relationships and filtering on a logical restriction. For example:
SELECT * FROM Products P JOIN ProductPricing PP ON P.Id = PP.ProductId
WHERE PP.Price > 10
As opposed to
SELECT * FROM Products P, ProductPricing PP
WHERE P.Id = PP.ProductID AND Price > 10
I can look at the first one and instantly know that the only logical restriction I'm placing is the price, as opposed to the implicit machinery of joining tables together on the relationship key.
I almost always use the ANSI 92 joins because it makes it clear that these conditions are for JOINING.
Typically I write it this way
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
Sometimes I write it this way when it is trivial
FROM
foo f
INNER JOIN bar b ON f.id = b.id
INNER JOIN baz b2 ON b.id = b2.id
When it's not trivial I do it the first way,
e.g.
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
and b.type = 1
or
FROM
foo f
INNER JOIN (
SELECT max(date) date, id
FROM foo
GROUP BY
id) lastF
ON f.id = lastF.id
and f.date = lastF.Date
Or the really weird case (not sure if I got the parens right, but it's supposed to be a LEFT join to table bar, where bar needs an inner join to baz)
FROM
foo f
LEFT JOIN (bar b
INNER JOIN baz b2
ON b.id = b2.id
)ON f.id = b.id
You should put joins in Join clauses which means the From clause. A different question could be had about where to put filtering statements.
With respect to indenting, there are many styles. My preference is to indent related joins and keep main clauses like Select, From, Where, Group By, Having and Order By indented at the same level. In addition, I put each of these main attributes and the first line of an On clause on its own line.
Select ..
From Table1
Join Table2
On Table2.FK = Table1.PK
And Table2.OtherCol = '12345'
And Table2.OtherCol2 = 9876
Left Join (Table3
Join Table4
On Table4.FK = Table3.PK)
On Table3.FK = Table2.PK
Where ...
Group By ...
Having ...
Order By ...
Use the FROM clause to be compliant with ANSI-92 standards.
This:
select *
from a
inner join b
on a.id = b.id
where a.SomeColumn = 'x'
Not this:
select *
from a, b
where a.id = b.id
and a.SomeColumn = 'x'
I definitely always do my JOINS (of whatever type) in my FROM clause.
The way I indent them is this:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
In fact, I'll generally go a step farther and move as much of my constraining logic from the WHERE clause to the FROM clause, because this (at least in MS SQL) front-loads the constraint, meaning that it reduces the size of the recordset sooner in the query construction (I've seen documentation that contradicts this, but my execution plans are invariably more efficient when I do it this way).
For example, if I wanted to select only things in the above query where t3.id = 3, you could put that in the WHERE clause, or you could do it this way:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
AND
t3.id = 3
I personally find queries laid out in this way to be very readable and maintainable, but this is certainly a matter of personal preference, so YMMV.
Regardless, I hope this helps.
ANSI joins. I omit any optional keywords from the SQL as they only add noise to the equation. There's no such thing as a left inner join, is there? And by default, a simple join is an inner join, so there's no particular point to saying 'inner join'.
Then I column align things as much as possible.
The point being that a large, complex SQL query can be very difficult to comprehend, so the more order that is imposed on it to make it readable, the better. Anybody looking at the query to fix, modify or tune it needs to be able to answer a few things right off the bat:
what tables/views are involved in the query?
what are the criteria for each join? What's the cardinality of each join?
what/how many columns are returned by the query
I like to write my queries so they look something like this:
select PatientID = rpt.ipatientid ,
EventDate = d.dEvent ,
Side = d.cSide ,
OutsideHistoryDate = convert(nchar, d.devent,112) ,
Outcome = p.cOvrClass ,
ProcedureType = cat.ctype ,
ProcedureCategoryMajor = cat.cmajor ,
ProcedureCategoryMinor = cat.cminor
from dbo.procrpt rpt
join dbo.procd d on d.iprocrptid = rpt.iprocrptid
join dbo.proclu lu on lu.iprocluid = d.iprocluid
join dbo.pathlgy p on p.iProcID = d.iprocid
left join dbo.proccat cat on cat.iproccatid = lu.iproccatid
where rpt.ipatientid = @iPatientID