SQL join performance operation order - sql

I am trying to work out how to order a join query to improve its performance.
Let's say we have two tables to join, to which some filters must be applied.
Is doing this:
table1_result = select * from table1 where field1 = 'A';
table2_result = select * from table2 where field1 = 'A';
result = select * from table1 as one inner join table2 as two on one.field1 = two.field1;
the same as doing this:
result = select * from table1 as one inner join table2 as two on one.field1 = two.field1
where one.field1 = 'A' and two.field1 = 'A';
or even doing this:
result = select * from table1 as one inner join table2 as two on one.field1 = two.field1 and one.field1 = 'A';
Thank you so much!!

Here are some common optimization techniques to improve your queries:
Index the columns used in joins. If they are foreign keys, some databases such as MySQL already index them.
Index the columns used in conditions or the WHERE clause.
Avoid * and explicitly select only the columns that you really need.
In most cases the order of joining won't matter, because database engines are intelligent enough to decide that themselves.
So it's better to analyze the structure of both joined tables and have indexes in place.
If anyone is interested in how changing the order of conditions can help performance, I have a detailed answer here: mysql Slow query issue.
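To make the point about join order concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. The table contents and index names are invented for illustration, and other engines may plan the queries differently. It checks that putting the filter in WHERE or in ON returns the same rows for an inner join, then prints the plan the engine chose.

```python
import sqlite3

# In-memory database with made-up sample data (assumption: not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, field1 TEXT);
    -- Index the column used for joining and filtering.
    CREATE INDEX idx_table1_field1 ON table1 (field1);
    CREATE INDEX idx_table2_field1 ON table2 (field1);
    INSERT INTO table1 (field1) VALUES ('A'), ('A'), ('B');
    INSERT INTO table2 (field1) VALUES ('A'), ('B'), ('C');
""")

filter_in_where = """
    SELECT one.id, two.id
    FROM table1 AS one
    INNER JOIN table2 AS two ON one.field1 = two.field1
    WHERE one.field1 = 'A' AND two.field1 = 'A'
"""
filter_in_on = """
    SELECT one.id, two.id
    FROM table1 AS one
    INNER JOIN table2 AS two ON one.field1 = two.field1 AND one.field1 = 'A'
"""

rows_where = sorted(conn.execute(filter_in_where).fetchall())
rows_on = sorted(conn.execute(filter_in_on).fetchall())
print(rows_where == rows_on)

# Ask the engine how it intends to run the query; the output format is engine-specific.
for step in conn.execute("EXPLAIN QUERY PLAN " + filter_in_where):
    print(step)
```

On a real database, the equivalent of EXPLAIN (or the graphical execution plan) is the reliable way to see whether a rewrite changed anything.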

Related

Joins with WHERE - splitting WHERE clauses

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?
Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember that on properly indexed tables standard JOINs are almost always faster than sub-queries. Except in rare cases, you'll find that queries using JOIN are much faster than the ones using sub-queries.
You can use a subquery:
SELECT ch.name,sh.name
FROM (
SELECT ch.id, ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
but I don't think it makes sense. The subquery may create a temporary table.
Your first query will be optimized by the database server, which will likely read only the rows it needs from the character table.
JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.
A SELECT can only have one WHERE clause.
And it comes after the JOINs.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criterion that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example, the queries below would return the same results:
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar'
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar'
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo'
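As a quick check of that claim, the sketch below runs the three forms against a tiny in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and the queries select two ID columns instead of * only to keep the output short.

```python
import sqlite3

# Made-up sample data (assumption: not part of the original answer).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Col1 TEXT);
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, table2ID INTEGER, Col1 TEXT);
    INSERT INTO Table2 (ID, Col1) VALUES (1, 'bar'), (2, 'baz');
    INSERT INTO Table1 (ID, table2ID, Col1) VALUES
        (10, 1, 'foo'), (11, 2, 'foo'), (12, 1, 'qux');
""")

queries = {
    # Both filters in the single WHERE clause.
    "filter_in_where": """
        SELECT t1.ID, t2.ID
        FROM Table1 AS t1
        JOIN Table2 AS t2 ON t2.ID = t1.table2ID
        WHERE t1.Col1 = 'foo' AND t2.Col1 = 'bar'
    """,
    # Table1 filtered inside a derived table before the join.
    "derived_table": """
        SELECT t1.ID, t2.ID
        FROM (SELECT * FROM Table1 WHERE Col1 = 'foo') AS t1
        JOIN Table2 AS t2 ON t2.ID = t1.table2ID
        WHERE t2.Col1 = 'bar'
    """,
    # Table2 filter moved into the ON condition.
    "filter_in_on": """
        SELECT t1.ID, t2.ID
        FROM Table1 AS t1
        JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
        WHERE t1.Col1 = 'foo'
    """,
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

Note that this equivalence holds for inner joins. With a LEFT JOIN, moving a condition on the right-hand table from WHERE to ON changes which rows come back.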

Which is better for performance, selecting all the columns or selecting only the required columns while performing a join?

I have been asked to do performance tuning of a SQL Server query which has many joins in it.
For example
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
There are almost 25 columns in vw_BILLABLE_CENSUS_R, but we only want to use 3 of them. So I wanted to know whether it helps to select only the required columns, instead of all the columns from the view or table, and then perform the join like this:
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
Will this improve the performance or not?
The important part is the columns you actually use in the outermost SELECT, not the ones you select in the joined subquery. The SQL Server engine is smart enough to realize that it does not need to retrieve all columns from the referenced table (or view) if they are not needed.
So the following 2 queries should yield the exact same query execution plan:
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
There would be a difference only if you actually use the selected columns (in a WHERE condition or by retrieving their values), as here:
SELECT
A.SomeColumn,
X.* -- * has all X columns
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.*
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
SELECT
A.SomeColumn,
X.* -- * has only X's SomeColumn
FROM
MyTable AS A
LEFT JOIN (
SELECT
B.SomeColumn
FROM
OtherTable AS B) AS X ON A.SomeColumn = X.SomeColumn
I would rather use this approach:
LEFT JOIN
vw_BILLABLE_CENSUS_R CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
than this
LEFT JOIN (SELECT [Column_1], [Column_2], [Column_3]
FROM vw_BILLABLE_CENSUS_R) CEN ON DE.Client = CEN.Client
AND CAL.REPORTING_MONTH = CEN.REPORTING_MONTH
because in this case:
you make your query simpler,
you do not have to rely on the query optimizer being smart enough to eliminate unnecessary columns and rows,
and finally, you can select as many columns in the outer SELECT as necessary without using derived tables.
In some cases derived tables are welcome, for example when you want to eliminate duplicates on the fly in a table you want to join, but, imho, not in your case.
It depends on how many records are stored, but generally it will improve performance.
In this case, read @LukStorms' comments; I think he is right.

sql left join explanation

I found this code example online regarding SQL left joins and I want to make sure I understand it correctly (since I am no expert):
SELECT table1.column1, table2.column2...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ... AND table2.common_field IS NULL
My question is about the AND table2.common_field IS NULL part and how it affects the ON above.
To me it seems that the join result will contain only the rows that exist in table1 but not in table2, based on the common_field.
Is that correct? Can it be written more simply, since the above seems confusing to me?
The first step in any SQL development is to check the data that is actually stored in the tables you intend to use in your query.
How the data is stored will affect the results of the query, particularly when filtering for NULLs or checking for the existence of a row.
Using EXISTS or NOT EXISTS to check for existence/non-existence of one or more rows is very effective, providing the WHERE clause within the EXISTS sub-query doesn't have conflicting logic (e.g. NOT EXISTS and <> are used together), which can be confusing and produce results that are difficult to test.
Does table2.common_field contain any NULLs? If it does, it would be wise to filter on those in a nested query, CTE or view first, then use the results of that in the main query.
If table2.common_field doesn't contain NULLs or has a NOT NULL constraint, then perhaps you are using table2.common_field IS NULL to filter the results of the LEFT JOIN down to the rows where there is no match on the join criteria for table2. If this is the case and you want to stick with LEFT JOIN, I recommend nesting your query and filtering on the NULL in the outer query.
Here are a couple of options:
Option 1: Use LEFT JOIN, filter on NULL in the outer query.
Note the careful use of an alias for table2.common_field which is important.
SELECT
result.*
FROM
(
SELECT table1.column1, table2.column2, table2.common_field as table2_common_field...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ...
) result
WHERE result.table2_common_field IS NULL;
Option 2 (recommended): Use NOT EXISTS.
SELECT table1.column1...
FROM table1
WHERE NOT EXISTS (
select 1
from table2
where table2.common_field = table1.common_field
AND table2.common_field_2 = table1.common_field_2
)
AND table1.column3 = ...
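To check that the LEFT JOIN ... IS NULL pattern and NOT EXISTS pick the same rows, here is a small sketch using Python's sqlite3 module. The sample rows are invented, and the elided table1.column3 filter is left out because the original does not say what it compares against.

```python
import sqlite3

# Made-up sample data (assumption); table2.common_field contains no NULLs here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, common_field INTEGER, common_field_2 INTEGER);
    CREATE TABLE table2 (column2 TEXT, common_field INTEGER, common_field_2 INTEGER);
    INSERT INTO table1 VALUES ('matched', 1, 1), ('unmatched', 2, 2), ('half match', 3, 3);
    INSERT INTO table2 VALUES ('x', 1, 1), ('y', 3, 99);
""")

# Anti-join written as LEFT JOIN plus an IS NULL test on the right-hand table.
left_join_is_null = """
    SELECT table1.column1
    FROM table1
    LEFT JOIN table2
      ON table1.common_field = table2.common_field
     AND table1.common_field_2 = table2.common_field_2
    WHERE table2.common_field IS NULL
"""

# The same anti-join written with NOT EXISTS.
not_exists = """
    SELECT table1.column1
    FROM table1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table2
        WHERE table2.common_field = table1.common_field
          AND table2.common_field_2 = table1.common_field_2
    )
"""

rows_left_join = sorted(conn.execute(left_join_is_null).fetchall())
rows_not_exists = sorted(conn.execute(not_exists).fetchall())
print(rows_left_join)
print(rows_left_join == rows_not_exists)
```

The row that matches on only one of the two fields still counts as unmatched, because both conditions are part of the join criteria.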

SQL subquery to joins

Is it possible to remove the subquery from this SQL?
The table has two attributes, "id" and "field".
Many fields can have the same id.
The table has many records with the same id and different values.
I need to get all the rows that share an id, using one of their values as the filter.
select *
from Table
where id = (select id from Table where value = 'someValue')
I think it could be really easy but I don't know how to do it.
A self join can do this:
select T.Id,T.Field
from Table T
INNER JOIN Table TT
ON T.ID = TT.ID
AND TT.Value = 'someValue'
Not sure if you over simplified your example too much but you could make this a little simpler.
select *
from Table
where value = 'someValue'
This should work
select T1.* from Table T1 JOIN Table T2 ON T1.id = T2.id AND T2.value = 'someValue'
Edited (Correct Answer):
What I assume your problem is:
You have a value. Let's pretend it's "testValue". Now you want to get the id of this value and find all other rows with the same id.
What needs to be made clear is that "ID" is not the primary key and is not unique.
You should be able to solve this by a simple self join:
select t.* from Table t right join Table tt on tt.id = t.id where tt.value = 'someValue';
So because of the join you will get a result that returns simply the table. With the where clause you shrink the result to your value. You should get the set of ids.
Old Answer:
This should do the trick:
select * from Table a inner join Table2 b on a.id = b.id where b.value = 'someValue';
You mentioned only one table in your question. I think this must be a mistake. If not, you only have to change Table2 in my query. But that would make no sense, as you could do a simple query too:
select * from Table where value = 'someValue';
this would be the result of the first query with a self join.
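To compare the approaches in this thread, here is a small sketch using Python's sqlite3 module. The table name items and its rows are hypothetical (the question's Table is a reserved word), and the subquery uses IN rather than = so that it also works when several ids match.

```python
import sqlite3

# Hypothetical table and data: id is neither a primary key nor unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, value TEXT);
    INSERT INTO items VALUES (1, 'someValue'), (1, 'other'), (2, 'unrelated');
""")

# Original approach: look up the id in a subquery.
with_subquery = """
    SELECT id, value
    FROM items
    WHERE id IN (SELECT id FROM items WHERE value = 'someValue')
"""

# Self join: TT finds the row holding the value, T returns every row with that id.
with_self_join = """
    SELECT T.id, T.value
    FROM items T
    INNER JOIN items TT
        ON T.id = TT.id
       AND TT.value = 'someValue'
"""

# Plain filter on value, for comparison.
plain_filter = "SELECT id, value FROM items WHERE value = 'someValue'"

rows_subquery = sorted(conn.execute(with_subquery).fetchall())
rows_self_join = sorted(conn.execute(with_self_join).fetchall())
rows_plain = sorted(conn.execute(plain_filter).fetchall())

print(rows_subquery)
print(rows_self_join)
print(rows_plain)
```

With this data the self join returns the same rows as the subquery, while the plain filter returns only the row that holds the value itself. If the value could appear more than once for the same id, the self join would repeat rows, so a DISTINCT may be needed.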

In clause versus OR clause performance wise

I have a query as below:
select *
from table_1
where column_name in ('value1','value2','value3');
Considering that such a table may hold millions of rows, will either of the restructurings below perform better?
select *
from table_1 where
column_name = 'value1'
or column_name = 'value2'
or column_name ='value3';
or
select *
from table_1
where column_name = any ('value1','value2','value3');
I need to know performance benefits also if possible.
Thanks in advance
The form of the query doesn't matter much when only 3 values are being checked.
Oracle will rewrite the query anyway to match the best option available.
If there were more values, and dynamic ones at that, then the IN clause or an inner join could be better.
It's best to leave the query as it currently is.
There is a third way, which can be faster than 'IN' or multiple 'OR' conditions:
select *
from table_1 as tb1
inner join table_2 as tb2
on tb1.column_name = tb2.column_name
Here table_2 (or a query) would hold the required values that were listed in the 'IN' and 'OR' conditions in your example.
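For reference, this sketch uses Python's sqlite3 module to confirm that the IN, OR, and join forms return the same rows. The data is invented, table_2 is a hypothetical lookup table holding the wanted values, and the = any (...) form is omitted because SQLite does not support it. It says nothing about relative speed, which depends on the engine and the data; compare execution plans on the real database for that.

```python
import sqlite3

# Made-up sample data; table_2 is a hypothetical lookup table of wanted values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, column_name TEXT);
    CREATE TABLE table_2 (column_name TEXT PRIMARY KEY);
    INSERT INTO table_1 (column_name) VALUES ('value1'), ('value2'), ('value4'), ('value3');
    INSERT INTO table_2 (column_name) VALUES ('value1'), ('value2'), ('value3');
""")

with_in = """
    SELECT id FROM table_1
    WHERE column_name IN ('value1', 'value2', 'value3')
"""
with_or = """
    SELECT id FROM table_1
    WHERE column_name = 'value1'
       OR column_name = 'value2'
       OR column_name = 'value3'
"""
with_join = """
    SELECT tb1.id
    FROM table_1 AS tb1
    INNER JOIN table_2 AS tb2 ON tb1.column_name = tb2.column_name
"""

ids_in = sorted(row[0] for row in conn.execute(with_in))
ids_or = sorted(row[0] for row in conn.execute(with_or))
ids_join = sorted(row[0] for row in conn.execute(with_join))
print(ids_in, ids_or, ids_join)
```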