SQL JOIN where to place the WHERE condition? - sql

I have the following two examples.
1. Example (WHERE)
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t2.field = true
2. Example (JOIN AND)
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id AND t2.field = true
Which is faster in terms of performance? Which do you prefer?

If a filter is functionally part of a join condition (i.e. it is an actual join condition, not just a filter), it must appear in the ON clause of that join.
Worth noting:
If you place it in the WHERE clause instead, performance is the same for an INNER join, but may differ otherwise. As mentioned in the comments, this does not really matter, since the outcome is different anyway.
Placing the filter in the WHERE clause when it really is an OUTER JOIN condition implicitly cancels the OUTER nature of the join ("join even when there are no records"), as such filters imply there must be existing records in the first place. Example:
... table1 t1 LEFT JOIN table2 t2 ON ... AND t2.column = 5
is correct, while
... table1 t1 LEFT JOIN table2 t2 ON ...
WHERE t2.column = 5
is incorrect, as t2.column = 5 tells the engine that records from t2 are expected, which goes against the outer join. The exception to this is an IS NULL filter, such as WHERE t2.column IS (NOT) NULL (which is in fact a convenient way to build conditional outer joins).
LEFT and RIGHT joins are implicitly OUTER joins.
Hope it helped.
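To make the LEFT JOIN behaviour above concrete, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in engine; the column name col and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: table1 has ids 1..3, table2 only matches ids 1 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER, col INTEGER);
    INSERT INTO table1 (id) VALUES (1), (2), (3);
    INSERT INTO table2 (id, col) VALUES (1, 5), (2, 7);
""")

# Filter in ON: every table1 row is kept; rows without a match get NULLs.
filter_in_on = conn.execute("""
    SELECT t1.id, t2.col
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id AND t2.col = 5
    ORDER BY t1.id
""").fetchall()

# Filter in WHERE: rows where t2.col is NULL are discarded,
# so the query behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT t1.id, t2.col
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.col = 5
    ORDER BY t1.id
""").fetchall()

# IS NULL in WHERE: keeps only the table1 rows that have no match at all.
no_match = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
""").fetchall()

print(filter_in_on)     # [(1, 5), (2, None), (3, None)]
print(filter_in_where)  # [(1, 5)]
print(no_match)         # [(3,)]
```

The third query is the IS NULL exception mentioned above: it relies on the outer join producing NULLs and then selects exactly those rows.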

JOIN conditions should normally be independent from filter conditions. You define rules of your join (the how) with ON. You filter what you want with WHERE. Performance wise, there's no general rule across all engines and designs, so your mileage will vary greatly.

I think the faster way is to put the filter in the WHERE clause, because the engine will process that filter first and then the join clause, so there will be no need to reorder the filters.
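Rather than guessing which form is faster, you can check what a given engine does with both queries. The sketch below assumes SQLite via Python's sqlite3 module and invented sample rows; it uses field = 1 in place of field = true. Other engines expose their plans differently, so treat it only as an illustration of the approach.

```python
import sqlite3

# Hypothetical sample data for the two queries from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, field INTEGER);
    INSERT INTO table1 (id) VALUES (1), (2), (3);
    INSERT INTO table2 (id, field) VALUES (1, 1), (2, 0), (4, 1);
""")

where_sql = """
    SELECT t1.id FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.field = 1
    ORDER BY t1.id
"""
on_sql = """
    SELECT t1.id FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id AND t2.field = 1
    ORDER BY t1.id
"""

# For an INNER JOIN both forms return the same rows.
where_rows = conn.execute(where_sql).fetchall()
on_rows = conn.execute(on_sql).fetchall()
print(where_rows == on_rows, where_rows)

# Print the plan SQLite picks for each form; the last column holds the description.
for label, sql in (("WHERE", where_sql), ("ON", on_sql)):
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, plan)
```

If the two plans come out identical on your engine, the placement is a matter of readability rather than speed.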

Related

joining three tables between each other

Is it possible to join three tables in this way?
select T1.[...],T2.[...],T3.[...]
from T1
full outer join T2 on T1.[key]=T2.[key]
full outer join T3 on T1.[key]=T3.[key]
full outer join T2 on T2.[key]=T3.[key]
My question is: is this a valid form?
And if not, is there a way to perform such an operation?
It is "valid", but the full joins are not correct. The ON conditions will change them to some other type of join.
Your query has other errors, but I speculate that you want:
select T1.[...], T2.[...], T3.[...]
from T1 full join
     T2
     on T2.[key] = T1.[key] full join
     T3
     on T3.[key] = coalesce(T2.[key], T1.[key]);
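For comparison, here is a sketch of a different technique that gives the same rows as the chained full joins when the key is unique in each table: collect every key with UNION, then LEFT JOIN each table to that key list. It is not the query above, just an alternative for engines or versions without FULL JOIN. The sketch assumes SQLite via Python's sqlite3 module; the column names k, a, b, c and the rows are invented.

```python
import sqlite3

# Hypothetical tables whose keys only partly overlap.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (k INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE T2 (k INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE T3 (k INTEGER PRIMARY KEY, c TEXT);
    INSERT INTO T1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO T2 VALUES (2, 'b2'), (3, 'b3');
    INSERT INTO T3 VALUES (3, 'c3'), (4, 'c4');
""")

# all_keys holds each key that appears in any table exactly once;
# every table is then attached with a LEFT JOIN, leaving NULLs where it has no row.
rows = conn.execute("""
    WITH all_keys AS (
        SELECT k FROM T1
        UNION
        SELECT k FROM T2
        UNION
        SELECT k FROM T3
    )
    SELECT ak.k, T1.a, T2.b, T3.c
    FROM all_keys ak
    LEFT JOIN T1 ON T1.k = ak.k
    LEFT JOIN T2 ON T2.k = ak.k
    LEFT JOIN T3 ON T3.k = ak.k
    ORDER BY ak.k
""").fetchall()

for row in rows:
    print(row)
# (1, 'a1', None, None)
# (2, 'a2', 'b2', None)
# (3, None, 'b3', 'c3')
# (4, None, None, 'c4')
```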
It is possible to join three tables, and your example could run with some changes, but you have syntax and scoping errors in the FROM clause.
Even those aside, I don't think it will do what you intend it to do. You'll probably want to use GROUP BY
See the examples / discussion here :
Multiple FULL OUTER JOIN on multiple tables
I also used this site as a source, as it's been a while since I've touched SQL. It may be helpful to you too:
https://learnsql.com/blog/how-to-join-3-tables-or-more-in-sql/

Joining multiple tables: where to filter efficiently

I have a number of tables, around four, that I wish to join together. To make my code cleaner and more readable (to me), I wish to join them all at once and then filter at the end:
SELECT f1, f2, ..., fn
FROM t1 INNER JOIN t2 ON t1.field = t2.field
INNER JOIN t3 ON t2.field = t3.field
INNER JOIN t4 ON t3.field = t4.field
WHERE -- filters here
But I suspect that placing each table in subqueries and filtering in each scope would make performance better.
SELECT f1, f2, ..., fn
FROM (SELECT t1_f1, t1_f2, ..., t1_fi FROM t1 WHERE /* filter here */) AS a
INNER JOIN
(SELECT t2_f1, t2_f2, ..., t2_fj FROM t2 WHERE /* filter here */) AS b
ON -- and so on
Kindly advise which would lead to better performance and/or whether my hunch is correct. I am willing to sacrifice some performance for readability.
If filtering in each subquery is indeed more efficient, does the architecture of the database platform make any difference, or does this hold true for all RDBMS SQL flavors?
I'm using both SQL Server and Postgres.
The query optimizer will always attempt to find the best plan it can for your SQL.
You should concentrate more on writing readable, maintainable code and then by analyzing the execution plan find the inefficient parts of your query (and more likely) the inefficient parts of your database and indexing design.
Moving your filtering around from the where clause to the join clause without any meaningful analysis is likely to be wasted effort.
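A sketch of that kind of analysis: run both shapes of the query, confirm they return the same rows, and compare the plans. It assumes SQLite via Python's sqlite3 module as a stand-in (the question is about SQL Server and Postgres, which have their own plan tools), and the status and amount columns, the index, and the rows are made up for illustration.

```python
import sqlite3

# Hypothetical two-table version of the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (field INTEGER, status TEXT);
    CREATE TABLE t2 (field INTEGER, amount INTEGER);
    CREATE INDEX t2_field ON t2 (field);
    INSERT INTO t1 VALUES (1, 'open'), (2, 'closed'), (3, 'open');
    INSERT INTO t2 VALUES (1, 10), (1, 500), (2, 700), (3, 900);
""")

# Join everything, then filter in one WHERE clause.
flat_sql = """
    SELECT t1.field, t2.amount
    FROM t1 INNER JOIN t2 ON t1.field = t2.field
    WHERE t1.status = 'open' AND t2.amount > 100
    ORDER BY t1.field, t2.amount
"""
# Filter each table in its own subquery, then join.
nested_sql = """
    SELECT a.field, b.amount
    FROM (SELECT field FROM t1 WHERE status = 'open') AS a
    INNER JOIN (SELECT field, amount FROM t2 WHERE amount > 100) AS b
        ON a.field = b.field
    ORDER BY a.field, b.amount
"""

def plan(sql):
    """Return the description column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

flat_rows = conn.execute(flat_sql).fetchall()
nested_rows = conn.execute(nested_sql).fetchall()
print(flat_rows == nested_rows, flat_rows)
print("flat:  ", plan(flat_sql))
print("nested:", plan(nested_sql))
```

With realistic data volumes and indexes, the plan output (not the shape of the SQL text) is what tells you where the time goes.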
Your first approach is fine: for inner joins the engine is free to evaluate the WHERE conditions before performing the joins, so rows are filtered out early whenever such conditions are available.
SELECT f1, f2, ..., fn
FROM t1 INNER JOIN t2 ON t1.field = t2.field
INNER JOIN t3 ON t2.field = t3.field
INNER JOIN t4 ON t3.field = t4.field
WHERE -- filters here
Joins generally perform better when the join columns are properly indexed.

which way is better performance (select nested tables or join)? [duplicate]

SELECT * FROM dbo.table1,
dbo.table2 AS T2,
dbo.table3 AS T3,
dbo.table4 AS T4
WHERE dbo.table1.ID = T2.ID
AND T2.ID = T3.ID
AND T3.ID = T4.ID
(OR)
SELECT
*
FROM dbo.table1 T1
INNER JOIN dbo.table2 T2 ON T1.ID = T2.ID
INNER JOIN dbo.table3 T3 ON T2.ID = T3.ID
INNER JOIN dbo.table4 T4 ON T3.ID = T4.ID
There is no difference between the two. It is better to stay away from "comma joins" because a) the ANSI join syntax is more expressive and you're going to use it anyway for LEFT JOIN, and mixing styles is asking for trouble, so you might as well just use one style; b) ANSI style is clearer.
Both will take the same time to execute; there is no performance difference.
Without the JOIN keyword, the comma-separated tables behave as a cross join, which produces results consisting of every combination of rows from two or more tables. That means if table2 has 6 rows and table3 has 3 rows, a cross join will result in 18 rows. There is no relationship established between the two tables; you literally just produce every possible combination.
With an inner join, column values from one row of a table are combined with column values from another row of another (or the same) table to form a single row of data.
If a WHERE clause is added to a cross join, it behaves as an inner join, as the WHERE imposes a limiting factor.
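The row counts described above can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module, with made-up rows: 6 in table2 and 3 in table3.

```python
import sqlite3

# Hypothetical data: table2 has 6 rows, table3 has 3 rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (ID INTEGER);
    CREATE TABLE table3 (ID INTEGER);
    INSERT INTO table2 VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO table3 VALUES (1), (2), (3);
""")

# Comma-separated tables with no WHERE clause: every combination of rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table2 T2, table3 T3"
).fetchone()[0]

# Adding the WHERE condition makes the comma form behave as an inner join.
comma_rows = conn.execute(
    "SELECT T2.ID FROM table2 T2, table3 T3 WHERE T2.ID = T3.ID ORDER BY T2.ID"
).fetchall()

# The explicit INNER JOIN form of the same query.
join_rows = conn.execute(
    "SELECT T2.ID FROM table2 T2 INNER JOIN table3 T3 ON T2.ID = T3.ID ORDER BY T2.ID"
).fetchall()

print(cross_count)              # 18
print(comma_rows == join_rows)  # True
print(join_rows)                # [(1,), (2,), (3,)]
```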
In both of the cases you mentioned above, there won't be any difference in the way the SQL engine executes them in the background. The only thing that affects performance is how effective your indexes are: on the joining columns in the case of a join, and on the WHERE clause columns in the case of comma-separated table names.
So just make sure you have proper indexes, up-to-date statistics, etc.
One more important thing: you are using SELECT *. If possible, select only the columns you are interested in.
Both are joins. The first is implicit and will perform a cross join, as pointed out in the previous answer; the latter uses the explicit inner join notation. It should not make a difference in terms of performance, though.

Difference between "and" and "where" in joins

What's the difference between
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
and table2.Id IN (2728)
and
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
where table2.Id IN (2728)
Both return the same result and both have the same EXPLAIN output.
Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.
In this case, the WHERE clause is almost certainly what you mean so you should use it.
Secondly, there is actually a difference in the result if you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row if the condition fails: you will get values from the left table and NULLs for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.
Here is an example to demonstrate this.
Query 1 (WHERE):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);
Result:
field1
200
Query 2 (AND):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);
Result:
field1
100
200
Test data used:
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');
SQL comes from relational algebra.
One way to look at the difference is that JOINs are operations on sets that can produce more or fewer records in the result than you had in the original tables. WHERE, on the other hand, will always restrict the number of results.
The rest of the text is extra explanation.
For an overview of join types, see the article again.
When I said that the where condition will always restrict the results, you have to take into account that when we are talking about queries on two (or more) tables you have to somehow pair records from these tables even if there is no JOIN keyword.
So in SQL if the tables are simply separated by a comma, you are actually using a CROSS JOIN (cartesian product) which returns every row from one table for each row in the other.
And since this is a maximum number of combinations of rows from two tables then the results of any WHERE on cross joined tables can be expressed as a JOIN operation.
But hold on, there are exceptions to this maximum when you introduce LEFT, RIGHT and FULL OUTER joins.
A LEFT JOIN joins records from the left table with records from the right table on the given criteria, BUT if the join criteria for a row from the left table are not satisfied by any record in the right table, the LEFT JOIN will still return that row from the left table, with NULLs in the columns that would come from the right table (RIGHT JOIN works similarly but from the other side; FULL OUTER works like both at the same time).
Since the default cross join does NOT return those records, you cannot express these join criteria with a WHERE condition and are forced to use JOIN syntax (Oracle was an exception to this, with an extension to the SQL standard and to the = operator, but this was accepted by neither other vendors nor the standard).
Also, joins usually, but not always, coincide with existing referential integrity and suggest relationships between entities. I would not put too much weight on that, though, since WHERE conditions can do the same (except in the case mentioned above), and to a good RDBMS it will not make a difference where you specify your criteria.
The join is used to reflect the entity relations; the WHERE clause filters down the results.
So the join clauses are 'static' (unless the entity relations change), while the WHERE clauses are use-case specific.
For an inner join, there is no difference. "ON" is like a synonym for "WHERE", so the second kind of reads like:
JOIN table2 WHERE cd.Company = table2.Name AND table2.Id IN (2728)
There is no difference when the query optimisation engine breaks it down to its relevant query operators.

TABLE1 T1, TABLE2 T2 WHERE T1.Blah = T2.Blah - VS - INNER JOIN

Provided that the tables could essentially be inner joined, since the where clause excludes all records that don't match, just exactly how bad is it to use the first of the following 2 query statement syntax styles:
SELECT {COLUMN LIST}
FROM TABLE1 t1, TABLE2 t2, TABLE3 t3, TABLE4 t4 (etc)
WHERE t1.uid = t2.foreignid
AND t2.uid = t3.foreignid
AND t3.uid = t4.foreignid
etc
instead of
SELECT {COLUMN LIST}
FROM TABLE1 t1
INNER JOIN TABLE2 t2 ON t1.uid = t2.foreignid
INNER JOIN TABLE3 t3 ON t2.uid = t3.foreignid
INNER JOIN TABLE4 t4 ON t3.uid = t4.foreignid
I'm not sure if this is limited to Microsoft SQL Server, or even to a particular version, but my understanding is that the first scenario does a full outer join to make all possible correlations accessible.
I've used the first approach in the past to optimise queries that access two significantly large stores of data that each have peripheral tables joined to them, with the product of those joins coming together late in the query. By allowing each of the "larger" tables to join to its respective lookup tables, and only combining a specific subset of each of the larger tables, I found that there were notable speed improvements over introducing the large tables to each other prior to specific filtering.
Under normal (simple joins) circumstance, would it not be far better to use the second scenario? I find it to be more easily readable and it seems like it'll be much faster.
INNER JOIN ON vs WHERE clause
Maybe the best way to answer this is to take a look at how the database handles the query internally. If you're on SQL Server, use Profiler to see how many reads etc. each query takes and the query plan to see what route is being taken through the data. Statistics, skewing etc. will also most likely play a role.
The first query doesn't produce a full OUTER join (which is the union of both LEFT and RIGHT joins). Essentially, unless there are some [internal] parser-specific optimizations, both queries are equivalent.
Personally, I would never use the first syntax. It may be the same performance-wise, but it is harder to maintain and far more subject to accidental cross joins when things get complex. If you miss an ON condition, it will fail the syntax check; if you miss one of the WHERE conditions that is the equivalent of an ON condition, it will happily do a cross join. It is also a syntax that is 17 years out of date, for goodness' sake!
Further, the left and right join syntax in the old style is broken in SQL Server and does NOT always return the correct results (it can sometimes interpret the query as a cross join instead of an outer join). It has been deprecated and will not be usable at all in the next version. If you need to change one of the queries to use an outer join, you can be looking at a major rewrite, as it is especially bad to try to mix the two kinds of syntax.
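The accidental cross join is easy to reproduce. The sketch below assumes SQLite via Python's sqlite3 module and invented rows for three of the question's tables. It only shows the comma-syntax half of the argument: SQLite itself accepts a JOIN without ON, so the syntax-check protection described above depends on the engine.

```python
import sqlite3

# Hypothetical data: each TABLE1 row has one TABLE2 child, which has one TABLE3 child.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (uid INTEGER PRIMARY KEY);
    CREATE TABLE TABLE2 (uid INTEGER PRIMARY KEY, foreignid INTEGER);
    CREATE TABLE TABLE3 (uid INTEGER PRIMARY KEY, foreignid INTEGER);
    INSERT INTO TABLE1 VALUES (1), (2);
    INSERT INTO TABLE2 VALUES (10, 1), (20, 2);
    INSERT INTO TABLE3 VALUES (100, 10), (200, 20);
""")

# All join conditions present: one result row per chain t1 -> t2 -> t3.
complete_count = conn.execute("""
    SELECT COUNT(*)
    FROM TABLE1 t1, TABLE2 t2, TABLE3 t3
    WHERE t1.uid = t2.foreignid
      AND t2.uid = t3.foreignid
""").fetchone()[0]

# The t2 -> t3 condition is forgotten: the query still runs,
# but every t1/t2 pair is combined with every TABLE3 row.
forgotten_count = conn.execute("""
    SELECT COUNT(*)
    FROM TABLE1 t1, TABLE2 t2, TABLE3 t3
    WHERE t1.uid = t2.foreignid
""").fetchone()[0]

print(complete_count)   # 2
print(forgotten_count)  # 4
```

On two-row tables the result merely doubles; on large tables the same mistake multiplies the row count by the size of the unconstrained table.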