join without a join in PostgreSQL - sql

I have code with the following psql query that I am trying to understand (it was simplified for the question):
select id, time, t.type, t.sub_type from TABLE_A r,TABLE_B t,TABLE_C l where r.type = t.type and t.type = l.type
Is it the same as doing a join, or am I missing something here?

You are doing an old school implicit join. This is the syntax used before the ANSI-92 SQL standard which started recommending explicit joins like this version of your query:
select id, time, t.type, t.sub_type -- but please always use aliases here
from TABLE_A r
inner join TABLE_B t
on r.type = t.type
inner join TABLE_C l
on t.type = l.type;
The above version is considered the "correct" way of expressing your logic. Explicit joins solved a number of problems inherent with implicit joins. Implicit joins place both the join logic and the filter criteria into the same WHERE clause, potentially making it difficult to tease apart what is happening there. In addition, it was harder to express left/right/inner/cross join using the implicit syntax, with different vendors sometimes having different syntax. The ANSI-92 explicit join is pretty much the same across all the major SQL vendors at this point.
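To see that the two spellings really describe the same query, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The table contents are invented for illustration, and the column layout is only a guess at what the simplified question implies.

```python
import sqlite3

# In-memory SQLite database; the sample rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, time TEXT, type TEXT);
    CREATE TABLE table_b (type TEXT, sub_type TEXT);
    CREATE TABLE table_c (type TEXT);
    INSERT INTO table_a VALUES (1, '09:00', 'x'), (2, '10:00', 'y'), (3, '11:00', 'z');
    INSERT INTO table_b VALUES ('x', 'x1'), ('y', 'y1');
    INSERT INTO table_c VALUES ('x'), ('z');
""")

# Implicit (comma) join: join conditions live in the WHERE clause.
implicit_rows = conn.execute("""
    SELECT r.id, r.time, t.type, t.sub_type
    FROM table_a r, table_b t, table_c l
    WHERE r.type = t.type AND t.type = l.type
    ORDER BY r.id
""").fetchall()

# Explicit join: the same conditions moved into ON clauses.
explicit_rows = conn.execute("""
    SELECT r.id, r.time, t.type, t.sub_type
    FROM table_a r
    INNER JOIN table_b t ON r.type = t.type
    INNER JOIN table_c l ON t.type = l.type
    ORDER BY r.id
""").fetchall()

print(implicit_rows)
print(explicit_rows)
```

Only type 'x' exists in all three tables, so both queries return the same single row.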

Related

SQL INNER JOIN implemented as implicit JOIN

Recently, I came across an SQL query which looked like this:
SELECT * FROM A, B WHERE A.NUM = B.NUM
To me, it seems as if this will return exactly the same as an INNER JOIN:
SELECT * FROM A INNER JOIN B ON A.NUM = B.NUM
Is there any sane reason why anyone would use a CROSS JOIN here? Edit: it seems as if most SQL applications will automatically use an INNER JOIN here.
The database is HSQLDB
The older syntax is a SQL antipattern. It should be replaced with an inner join anytime you see it. Part of why it is an antipattern is that it is impossible to tell whether a cross join was intended if the WHERE clause is omitted. This causes many accidental cross joins, especially in complex queries. Further, in some databases (especially SQL Server) the implicit outer joins do not work correctly, so people try to combine explicit and implicit joins and get bad results without even realizing it. All in all, it is poor practice to even consider using an implicit join.
Yes, both of your statements will return the same result. Which one to use is a matter of taste. Every sane database system will use a join for both if possible; no sane optimizer will really use a cross product in the first case.
But note that your first syntax is not a cross join. It is just an implicit notation for a join which does not specify which kind of join to use. Instead, the optimizer must check the WHERE clauses to determine whether to use an inner join or a cross join: If an applicable join condition is found in the WHERE clause, this will result in an inner join. If no such clause is found it will result in a cross join. Since your first example specifies an applicable join condition (WHERE A.NUM = B.NUM) this results in an INNER JOIN and thus exactly equivalent to your second case.
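A small sketch of that point, run against SQLite through Python's sqlite3 module instead of HSQLDB (the tables A and B and their rows are invented here): with the condition in WHERE the comma syntax behaves like an inner join, and without it you get the full cross product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (num INTEGER);
    CREATE TABLE b (num INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4), (5);
""")

# Comma join with a join condition in WHERE: acts as an inner join.
with_condition = conn.execute(
    "SELECT COUNT(*) FROM a, b WHERE a.num = b.num"
).fetchone()[0]

# Explicit inner join with the same condition.
explicit_inner = conn.execute(
    "SELECT COUNT(*) FROM a INNER JOIN b ON a.num = b.num"
).fetchone()[0]

# Comma join with the condition left out: every row of a paired with every row of b.
without_condition = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()[0]

print(with_condition, explicit_inner, without_condition)
```

With 3 rows in a and 4 in b, the unconditioned query yields 3 * 4 = 12 rows, while both conditioned queries yield the 2 matching pairs.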

What if I don't use the JOIN keyword in a query?

I have a query where I am retrieving data from more than two tables. I am using the filter criteria in the WHERE clause but not using any JOIN keyword:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d, h_claim_hdr hh, car_svc s
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
and
d.audit_nbr=hh.audit_nbr
and
s.car_svc_nbr=hh.aut_nbr
Is there a better way of writing this?
Although you are not using a JOIN keyword, your query does perform a JOIN.
A more "modern" way of writing your query (i.e. one following the ANSI SQL standard) would be as follows:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d
join
h_claim_hdr hh on d.audit_nbr=hh.audit_nbr
join
car_svc s on s.car_svc_nbr=hh.aut_nbr
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
Note that this is simply a modern syntax. It expresses the same query, and it will not impact the performance.
Note that in order for a row to appear in the output of this query, the corresponding rows must exist in all three tables (i.e. it's an inner join). If you would like to return rows of claim_detail for which no h_claim_hdr and / or car_svc existed, use left outer join instead.
A comma in the from clause is essentially the same as a cross join. You really don't want to use a cross join, unless you really know what you are doing.
Proper join syntax has several advantages. The most important of which is the ability to express other types of joins easily and compatibly across databases.
Most people would probably find this version easier to follow and maintain:
select d.proc_code, d.dos, s.svc_type
from claim_detail d join
h_claim_hdr hh
on d.bu_id = hh.bu_id and d.audit_nbr = hh.audit_nbr join
car_svc s
on d.bu_id = s.bu_id and s.car_svc_nbr = hh.aut_nbr
where d.bu_id = "$inp_bu_id" and
d.audit_nbr = "$inp_audit_nbr";
Using the WHERE clause instead of the JOIN keyword is essentially a different syntax for doing a join. I believe it is called Theta syntax, where using the JOIN clause is called ANSI syntax.
I believe ANSI syntax is almost universally recommended, and some databases require ANSI syntax for outer JOINs.
If you do not use JOIN it will be an implicit inner join, as in your example with the join criteria in your WHERE clause. So you could be missing records. Let's say you want all records from the first table even if there is no corresponding record in the second. Your current code would only return the records from the first table that have a matching record in the second.

Recommended way to write database query

Is there anything wrong with this database query?
select
abstract_author.name,
title,
affiliation_number,
af_name
from
abs_affiliation_name,
abstract_affiliation,
abstracts_item,
abstract_author,
authors_abstract
where
abstracts_item._id = authors_abstract.abstractsitem_id and
abstract_author._id = authors_abstract.abstractauthor_id and
abstract_affiliation._id = abstract_author._id and
abs_affiliation_name._id = abstracts_item._id
I'm getting my expected result, but someone said it's not the recommended way or good practice. Would you please tell me the recommended way to write my query (I mean one that uses joins)?
It's not recommended to do your joins in the where clause. Instead it's better to use explicit JOIN conditions. So your query would be
SELECT
abstract_author.name
, title
, affiliation_number
, af_name
FROM abstracts_item
JOIN authors_abstract ON abstracts_item._id = authors_abstract.abstractsitem_id
JOIN abstract_author ON abstract_author._id = authors_abstract.abstractauthor_id
JOIN abstract_affiliation ON abstract_affiliation._id = abstract_author._id
JOIN abs_affiliation_name ON abs_affiliation_name._id = abstracts_item._id
I'd highly recommend using aliases on your tables though, as you'll avoid confusion. In this example, if you introduced a title field to one of the other tables, the query would most likely break as it would not know which table to target. I'd do something like
SELECT
au.name
, af.title
, af.affiliation_number
, af.af_name
FROM abstracts_item ai
JOIN authors_abstract aa ON ai._id = aa.abstractsitem_id
JOIN abstract_author au ON au._id = aa.abstractauthor_id
JOIN abstract_affiliation af ON af._id = au._id
JOIN abs_affiliation_name an ON an._id = ai._id
You'll need to change the aliases in the select bit though, as I've guessed which tables they're from.
I recommend using joins and aliases, as below (each table is joined before any ON clause refers to it):
select aath.name, /*alias*/title, /*alias*/affiliation_number, /*alias*/af_name
from abs_affiliation_name aan
join abstracts_item ai on aan._id = ai._id
join authors_abstract aAbs on ai._id = aAbs.abstractsitem_id
join abstract_author aath on aath._id = aAbs.abstractauthor_id
join abstract_affiliation aa on aa._id = aath._id
No, there is nothing wrong with your query. It is personal preference. The ANSI-89 implicit joins you have used are, however, over 20 years out of date; they were replaced in ANSI-92 with explicit JOIN syntax.
Aaron Bertrand has written a compelling article on why in most instances it is preferable to use the newer join syntax, and the potential pitfalls of using ANSI-89 joins. In most cases the execution plan for both methods will be exactly the same (assuming you haven't accidentally cross joined with implicit joins). It is worth noting though that on occasion Oracle will produce different execution plans and the ANSI-89 join syntax can produce the more efficient of the two. (I have seen an example of this posted in response to one of my answers but I can't find it at the moment, so you'll have to take my word for it for now.) I would not, however, use this as a reason to always use ANSI-89 joins. Another key reason to use the ANSI-92 join syntax is that outer joins can be achieved with ANSI syntax, whereas the outer join syntax on implicit joins varies by DBMS.
e.g. on Oracle
SELECT *
FROM a, b
WHERE a.id = b.id(+)
On SQL-Server (deprecated)
SELECT *
FROM a, b
WHERE a.id *= b.id
However, the following works on both:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
If you always use explicit joins you end up with more consistent (and in my opinion more readable) queries.

Queries that implicit SQL joins can't do?

I've never learned how joins work but just using select and the where clause has been sufficient for all the queries I've done. Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN? If so, could someone please provide examples? Thanks.
Implicit joins are more than 20 years out-of-date. Why would you even consider writing code with them?
Yes, they can create problems that explicit joins don't have. Speaking about SQL Server, the left and right join implicit syntaxes are not guaranteed to return the correct results. Sometimes, they return a cross join instead of an outer join. This is a bad thing. This was true even back to SQL Server 2000 at least, and they are being phased out, so using them is an all around poor practice.
The other problem with implicit joins is that it is easy to accidentally do a cross join by forgetting one of the where conditions, especially when you are joining many tables. By using explicit joins, you will get a syntax error if you forget to put in a join condition and a cross join must be explicitly specified as such. Again, this results in queries that return incorrect values or are fixed by using distinct to get rid of the cross join, which is inefficient at best.
Moreover, if you have a cross join, the maintenance developer who comes along in a year to make a change doesn't know if it was intended or not when you use implicit joins.
I believe some ORMs also now require explicit joins.
Further, if you are using implied joins because you don't understand how joins operate, chances are high that you are writing code that, in fact, does not return the correct result because you don't know how to evaluate what the correct result would be since you don't understand what a join is meant to do.
If you write SQL code of any flavor, there is no excuse for not thoroughly understanding joins.
Yes. When doing outer joins. You can read this simple article on joins. Joins are not hard to understand at all so you should start learning (and using them where appropriate) right away.
Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN?
Any time your query involves two or more tables, a join is being used. This link is great for showing the differences in joins with pictures as well as sample result sets.
If the join criteria are in the WHERE clause, then the ANSI-89 join syntax is being used. The reason for the newer JOIN syntax in the ANSI-92 format is that it made LEFT JOIN more consistent across various databases. For example, Oracle used (+) on the side that was optional, while in SQL Server you had to use *=.
Implicit join syntax by default uses inner joins. It is sometimes possible to modify the implicit join syntax to specify outer joins, but it is vendor dependent in my experience (I know Oracle has the (+) notation, and I believe SQL Server used *=). So, I believe your question can be boiled down to understanding the differences between inner and outer joins.
We can look at a simple example of an inner vs. outer join using a simple query.
The implicit INNER join:
select a.*, b.*
from table a, table b
where a.id = b.id;
The above query will bring back ONLY rows where the 'a' row has a matching row in 'b' for its 'id' field.
The explicit OUTER JOIN:
select * from
table a LEFT OUTER JOIN table b
on a.id = b.id;
The above query will bring back EVERY row in a, whether or not it has a matching row in 'b'. If no match exists for 'b', the 'b' fields will be null.
In this case, if you wanted to bring back EVERY row in 'a' regardless of whether it had a corresponding 'b' row, you would need to use the outer join.
Like I said, depending on your database vendor, you may still be able to use the implicit join syntax and specify an outer join type. However, this ties you to that vendor. Also, any developers not familiar with that specialized syntax may have difficulty understanding your query.
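The inner vs. outer difference is easy to check for yourself. The following is a rough sketch using SQLite through Python's sqlite3 module; the tables a and b and their rows are invented, standing in for the placeholder tables in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, label TEXT);
    CREATE TABLE b (id INTEGER, note TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'has match'), (3, 'has match too');
""")

# Implicit inner join: rows of a without a partner in b are dropped.
inner_rows = conn.execute("""
    SELECT a.id, b.note
    FROM a, b
    WHERE a.id = b.id
    ORDER BY a.id
""").fetchall()

# Explicit left outer join: every row of a is kept, with NULL for missing b columns.
outer_rows = conn.execute("""
    SELECT a.id, b.note
    FROM a LEFT OUTER JOIN b
    ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

Row 2 of a has no match in b, so it only shows up in the outer join result, with None (SQL NULL) in the note column.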
Any time you want to combine the results of two tables you'll need to join them. Take for example:
Users table:
ID
FirstName
LastName
UserName
Password
and Addresses table:
ID
UserID
AddressType (residential, business, shipping, billing, etc)
Line1
Line2
City
State
Zip
where a single user could have his home AND his business address listed (or a shipping AND a billing address), or no address at all. Using a simple WHERE clause won't fetch a user with no addresses because the addresses are in a different table. In order to fetch a user's addresses now, you'll need to do a join as:
SELECT *
FROM Users
LEFT OUTER JOIN Addresses
ON Users.ID = Addresses.UserID
WHERE Users.UserName = "foo"
See http://www.w3schools.com/Sql/sql_join.asp for a little more in depth definition of the different joins and how they work.
Using Joins :
SELECT a.MainID, b.SubValue AS SubValue1, b.SubDesc AS SubDesc1, c.SubValue AS SubValue2, c.SubDesc AS SubDesc2
FROM MainTable AS a
LEFT JOIN SubValues AS b ON a.MainID = b.MainID AND b.SubTypeID = 1
LEFT JOIN SubValues AS c ON a.MainID = c.MainID AND c.SubTypeID = 2
Off-hand, I can't see a way of getting the same results as that by using a simple WHERE clause to join the tables.
Also, the syntax commonly used in WHERE clauses to do left and right joins (*= and =*) is being phased out.
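For what it's worth, here is a sketch of what that double LEFT JOIN against SubValues produces, run in SQLite via Python's sqlite3 module. The sample rows are invented, the SubDesc columns are left out of the SELECT for brevity, and the second join is assumed to filter on c.SubTypeID = 2 (the alias of the second copy of the table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MainTable (MainID INTEGER);
    CREATE TABLE SubValues (MainID INTEGER, SubTypeID INTEGER, SubValue TEXT, SubDesc TEXT);
    INSERT INTO MainTable VALUES (1), (2), (3);
    INSERT INTO SubValues VALUES
        (1, 1, 'v1a', 'first'),
        (1, 2, 'v1b', 'second'),
        (2, 1, 'v2a', 'first only');
""")

# Each LEFT JOIN keeps its type filter in the ON clause, so a missing
# sub-value yields NULL instead of removing the MainTable row.
rows = conn.execute("""
    SELECT a.MainID, b.SubValue AS SubValue1, c.SubValue AS SubValue2
    FROM MainTable AS a
    LEFT JOIN SubValues AS b ON a.MainID = b.MainID AND b.SubTypeID = 1
    LEFT JOIN SubValues AS c ON a.MainID = c.MainID AND c.SubTypeID = 2
    ORDER BY a.MainID
""").fetchall()

print(rows)
```

All three MainTable rows come back, including the one with only a type-1 value and the one with no sub-values at all.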
Oracle supports LEFT JOIN and RIGHT JOIN using their special join operator (+) (and SQL Server used to support *= and =* on join predicates, but no longer does). But a simple FULL JOIN can't be done with implicit joins alone:
SELECT f.title, a.first_name, a.last_name
FROM film f
FULL JOIN film_actor fa ON f.film_id = fa.film_id
FULL JOIN actor a ON fa.actor_id = a.actor_id
This produces all films and their actors including all the films without actor, as well as the actors without films. To emulate this with implicit joins only, you'd need unions.
-- Inner join part
SELECT f.title, a.first_name, a.last_name
FROM film f, film_actor fa, actor a
WHERE f.film_id = fa.film_id
AND fa.actor_id = a.actor_id
-- Left join part
UNION ALL
SELECT f.title, null, null
FROM film f
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.film_id = f.film_id
)
-- Right join part
UNION ALL
SELECT null, a.first_name, a.last_name
FROM actor a
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.actor_id = a.actor_id
)
This will quickly become very unwieldy to write as well as inefficient from a performance perspective.
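As a sanity check, the union-based emulation can be run on a toy dataset. This sketch uses SQLite through Python's sqlite3 module (only the emulation is executed, since older SQLite versions lack FULL JOIN), and the film, actor, and film_actor rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER, title TEXT);
    CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (film_id INTEGER, actor_id INTEGER);
    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO actor VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO film_actor VALUES (1, 1);
""")

rows = conn.execute("""
    -- Inner join part: films with their actors
    SELECT f.title, a.first_name, a.last_name
    FROM film f, film_actor fa, actor a
    WHERE f.film_id = fa.film_id
    AND fa.actor_id = a.actor_id
    -- Left join part: films without any actor
    UNION ALL
    SELECT f.title, null, null
    FROM film f
    WHERE NOT EXISTS (
        SELECT 1 FROM film_actor fa WHERE fa.film_id = f.film_id
    )
    -- Right join part: actors without any film
    UNION ALL
    SELECT null, a.first_name, a.last_name
    FROM actor a
    WHERE NOT EXISTS (
        SELECT 1 FROM film_actor fa WHERE fa.actor_id = a.actor_id
    )
""").fetchall()

# UNION ALL does not promise an order, so compare as a set.
print(set(rows))
```

The result contains the matched pair, the film with no actor, and the actor with no film, which is what the FULL JOIN version is meant to return for this data.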

SQL JOIN: ON vs Equals

Is there any significant difference between the following?
SELECT a.name, b.name FROM a, b WHERE a.id = b.id AND a.id = 1
AND
SELECT a.name, b.name FROM a INNER JOIN b ON a.id = b.id WHERE a.id = 1
Do SO users have a preference of one over the other?
There is no difference, but the readability of the second is much better when you have a big multi-join query with extra where clauses for filtering.
Separating the join clauses and the filter clauses is a Good Thing :)
The former is ANSI 89 syntax, the latter is ANSI 92.
For that specific query there is no difference. However, with the former you lose the ability to separate a filter from a join condition in complex queries, and the syntax to specify LEFT vs RIGHT vs INNER is often confusing, especially if you have to go back and forth between different db vendors. There are also certain kinds of join that cannot be written with the old syntax.
In fact, the former syntax has been obsolete for more than 30 years now, and should not be used for new development.
There is no difference to the sql query engine.
For readability, the latter is much easier to read if you use linebreaks and indentation.
For INNER JOINs, it does not matter if you put "filters" and "joins" in the ON or WHERE clause; the query optimizer should decide what to do first anyway (it may choose to do a filter first, a join later, or vice versa).
For OUTER JOINs however, there is a difference, and sometimes you'll want to put the condition in the ON clause, sometimes in the WHERE. Putting a condition in the WHERE clause for an OUTER JOIN can turn it into an INNER JOIN (because of how NULLs work).
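A quick sketch of the OUTER JOIN case, using SQLite through Python's sqlite3 module. The customer and orders tables and their rows are invented (the table is called orders here only because ORDER is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, orderdate TEXT);
    INSERT INTO customer VALUES (1, 'Anna'), (2, 'Alan');
    INSERT INTO orders VALUES (10, 1, '2003-05-01'), (11, 2, '2002-12-31');
""")

# Date condition in ON: customers without a qualifying order are kept,
# with NULL in the order columns.
condition_in_on = conn.execute("""
    SELECT c.customer_name, o.order_id
    FROM customer c
    LEFT JOIN orders o
      ON o.customer_id = c.customer_id
     AND o.orderdate >= '2003-01-01'
    ORDER BY c.customer_id
""").fetchall()

# Same condition in WHERE: the NULL-extended rows fail the comparison
# and are filtered out, so the result looks like an inner join.
condition_in_where = conn.execute("""
    SELECT c.customer_name, o.order_id
    FROM customer c
    LEFT JOIN orders o
      ON o.customer_id = c.customer_id
    WHERE o.orderdate >= '2003-01-01'
    ORDER BY c.customer_id
""").fetchall()

print(condition_in_on)
print(condition_in_where)
```

Alan's only order is from 2002, so he survives with a NULL order when the date test sits in ON, and disappears entirely when it sits in WHERE.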
For example, check the readability between the two following samples:
SELECT c.customer_no, o.order_no, a.article_no, r.price
FROM customer c, order o, orderrow r, article a
WHERE o.customer_id = c.customer_id
AND r.order_id = o.order_id
AND a.article_id = r.article_id
AND o.orderdate >= '2003-01-01'
AND o.orderdate < '2004-01-01'
AND c.customer_name LIKE 'A%'
ORDER BY r.price DESC
vs
SELECT c.customer_no, o.order_no, a.article_no, r.price
FROM customer c
INNER JOIN order o
ON o.customer_id = c.customer_id
AND o.orderdate >= '2003-01-01'
AND o.orderdate < '2004-01-01'
INNER JOIN orderrow r
ON r.order_id = o.order_id
INNER JOIN article a
ON a.article_id = r.article_id
WHERE c.customer_name LIKE 'A%'
ORDER BY r.price DESC
Whilst you can perform most tasks using both, and in your case there is no difference whatsoever, I always use the second:
It's the current supported standard
It keeps joins in the FROM clause and filters in the WHERE clause
It makes more complex LEFT, RIGHT, FULL OUTER joins much easier
MSSQL Help is all based around that syntax therefore much easier to get help on your problem queries
While there is no difference technically, you need to be extra careful about doing joins using the first method. If you get it wrong by accident, you could end up doing a cartesian join between your a and b tables (a very long, memory & cpu intensive query - it will match each single row in a with all rows in b. Bad if a and b are large tables to begin with). Using an explicit INNER JOIN is both safer and easier to read.
No difference. I find the first format more readable and use the second format only when doing other types of joins (OUTER, LEFT INNER, etc).
The second form is SQL92 compliant syntax. This should mean that it is supported by all current and future database vendors. However, the truth is that the first form is so pervasive that it is also guaranteed to be around for longer than we care.
Otherwise they are same in all respects in how databases treat the two.