SQL JOIN and Performance - sql

I have a question about JOINs.
Do SQL JOINs reduce the performance of a query?
I have a query with many JOINs in it. Can I say that the bad performance comes from these JOINs? If yes, what should I use instead of JOINs in a query?
Here is a piece of my query:
......
FROM (((((tb_Pinnummern
INNER JOIN tb_Fahrzeug ON tb_Pinnummern.SG = tb_Fahrzeug.Motor_SG)
INNER JOIN tb_bauteile ON tb_Pinnummern.Bauteil = tb_bauteile.ID)
LEFT JOIN Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Plus ON Fehlercodes_akt_Liste_FC_Plus.ID = tb_bauteile.[FC_Plus])
LEFT JOIN Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Minus ON Fehlercodes_akt_Liste_FC_Minus.ID = tb_bauteile.[FC_Minus])
LEFT JOIN Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Unterbrechung ON Fehlercodes_akt_Liste_FC_Unterbrechung.ID = tb_bauteile.[FC_Unterbrechung])
LEFT JOIN Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Aderschl ON Fehlercodes_akt_Liste_FC_Aderschl.ID = tb_bauteile.[FC_Aderschl]
WHERE (((tb_Fahrzeug.ID)=[forms]![frm_fahrzeug]![id]));

Yes, they do. Increasing the number of records and the number of joins among tables increases execution time. A LEFT/RIGHT JOIN is not faster than an INNER JOIN. Indexing the right columns of the tables will improve query performance.
If you have too many joins in your query and it is executed frequently, consider an alternative such as a SQL VIEW or a materialized view (if you are using Oracle). Note that a plain view only stores the query text; it is the materialized view that stores precomputed results.

Well, joins obviously need to be processed, and this processing consumes CPU, memory and I/O.
On top of this, joins can perform really, really badly if the right indexes etc. are not in place.
However, a SQL join with the correct supporting indexes will usually produce the result you require faster than any other method.
Just consider what you would need to do to calculate the same result as your SQL above: read the first table and sort it into the correct order, then read the second table and sort it, then merge the two result sets before proceeding to the third table, and so on.
Or read all the rows from the first table and, for each row, issue SQL to retrieve the matching rows.
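To make the first manual alternative concrete, here is a minimal Python sketch of the sort-then-merge approach. The function name sort_merge_join and the sample tuples are made up for illustration; a real database engine does this far more efficiently and chooses the strategy for you.

```python
def sort_merge_join(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of rows by sorting both and merging them."""
    left_sorted = sorted(left_rows, key=left_key)
    right_sorted = sorted(right_rows, key=right_key)
    joined = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_key(left_sorted[i])
        rk = right_key(right_sorted[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right_sorted) and right_key(right_sorted[j_end]) == lk:
                j_end += 1
            # Pair every left row that has this key with that run.
            while i < len(left_sorted) and left_key(left_sorted[i]) == lk:
                for right_row in right_sorted[j:j_end]:
                    joined.append((left_sorted[i], right_row))
                i += 1
            j = j_end
    return joined


# Hypothetical sample data: (component_id, pin) and (id, name) pairs.
pins = [(2, "pin A"), (4, "pin B"), (2, "pin C"), (1, "pin D")]
components = [(2, "sensor"), (1, "relay"), (3, "valve")]

result = sort_merge_join(pins, components, lambda p: p[0], lambda c: c[0])
for pin_row, component_row in result:
    print(pin_row, component_row)
```

Even this small version needs two sorts and careful handling of duplicate keys, which is the work a JOIN hands over to the database.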

Joins will definitely degrade the performance of the SQL query that you are executing.
You should generate the execution plan for the SQL that you have written and look at ways to reduce its cost. Any query analysis tool should help you with that.
From what I understand of the query above, you are trying to fetch all rows from the tables in the inner joins and get specific columns (if present) from the tables in the left joins.
That being the case, a query written in the format below should help:
select (select Fehlercodes_akt_Liste_FC_Plus.column1 from Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Plus where Fehlercodes_akt_Liste_FC_Plus.ID=tb_bauteile.[FC_Plus]),
(select Fehlercodes_akt_Liste_FC_Minus.column2 from Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Minus where Fehlercodes_akt_Liste_FC_Minus.ID=tb_bauteile.[FC_Minus]),
(select Fehlercodes_akt_Liste_FC_Unterbrechung.column3 from Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Unterbrechung where Fehlercodes_akt_Liste_FC_Unterbrechung.ID=tb_bauteile.[FC_Unterbrechung]),
(select Fehlercodes_akt_Liste_FC_Aderschl.column4 from Fehlercodes_akt_Liste AS Fehlercodes_akt_Liste_FC_Aderschl where Fehlercodes_akt_Liste_FC_Aderschl.ID=tb_bauteile.[FC_Aderschl]),
<other columns>
FROM
(tb_Pinnummern INNER JOIN tb_Fahrzeug ON tb_Pinnummern.SG = tb_Fahrzeug.Motor_SG)
INNER JOIN tb_bauteile ON tb_Pinnummern.Bauteil = tb_bauteile.ID
WHERE tb_Fahrzeug.ID=[forms]![frm_fahrzeug]![id];

SQL joins do not reduce performance at all. On the contrary, very often they speed up a query dramatically, assuming of course that the underlying database model is well implemented. Indexes are very important in this matter.

Related

SQL select Query performance is very slow with joins

Please help me correct the query below to improve back-end performance. I am using an Oracle database.
Query execution is very slow:
SELECT
A.USER_PROFILE_ID,
B.LAST_NAME||','||B.FIRST_NAME||' - '||B.USER_PROFILE_ID AS EXPR1,
A.DEPARTMENT_CODE_ID,
C.NAME AS EXPR2,
A.EFFECTIVE_DATE,
A.EFFECTIVE_STATUS,
A.INSRT_USER,
A.INSRT_TS,
A.MOD_USER,
A.MOD_TS
FROM
"USER_PROFILE_DEPARTMENT" A,
"USER_PROFILE" B, "DEPARTMENT_CODE" C
WHERE
A.USER_PROFILE_ID = B.USER_PROFILE_ID
AND A.DEPARTMENT_CODE_ID = C.DEPARTMENT_CODE_ID
ORDER BY
EXPR1
I couldn't find any improvement. Please help.
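Avoid using the WHERE clause to join tables. Instead, use the JOIN keyword to explicitly specify the relationship between the tables being joined. The optimizer normally produces the same plan for both forms, so this is mainly about readability and avoiding accidental cross joins.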
For example, instead of writing:
SELECT *
FROM "USER_PROFILE_DEPARTMENT" A, "USER_PROFILE" B, "DEPARTMENT_CODE" C
WHERE A.USER_PROFILE_ID = B.USER_PROFILE_ID AND A.DEPARTMENT_CODE_ID = C.DEPARTMENT_CODE_ID
You can use:
SELECT *
FROM "USER_PROFILE_DEPARTMENT" A
INNER JOIN "USER_PROFILE" B ON A.USER_PROFILE_ID = B.USER_PROFILE_ID
INNER JOIN "DEPARTMENT_CODE" C ON A.DEPARTMENT_CODE_ID = C.DEPARTMENT_CODE_ID
Some tips when it comes to performance in SQL:
Use appropriate indexes on the columns used in the JOIN and WHERE clauses. This can speed up the process of finding matching rows in the tables being joined.
Consider using sub-queries or derived tables to reduce the data before joining multiple large tables. This can sometimes improve performance by reducing the amount of data that needs to be scanned and processed, although the optimizer often does this on its own.
Limit the number of rows returned by the query if you only need a small subset of the data. Oracle has no LIMIT keyword; use FETCH FIRST n ROWS ONLY (Oracle 12c and later) or a ROWNUM filter instead. This can reduce the amount of data that needs to be processed and returned.
If the query is still slow after these changes, inspect the execution plan to understand where the bottlenecks are. In Oracle this is done with EXPLAIN PLAN FOR followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); EXPLAIN and EXPLAIN ANALYZE are the PostgreSQL and MySQL equivalents.
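As a rough illustration of the indexing and plan-inspection tips, the sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, so the plan command is SQLite's EXPLAIN QUERY PLAN rather than Oracle's EXPLAIN PLAN FOR. The cut-down tables, the generated rows and the index names are assumptions made for the example, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_profile (user_profile_id INTEGER, last_name TEXT);
CREATE TABLE user_profile_department (user_profile_id INTEGER, department_code_id INTEGER);
""")
# Made-up rows, only there so the query has something to join.
conn.executemany("INSERT INTO user_profile VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])
conn.executemany("INSERT INTO user_profile_department VALUES (?, ?)",
                 [(i, i % 10) for i in range(1000)])

QUERY = """
SELECT a.user_profile_id, b.last_name
FROM user_profile_department a
INNER JOIN user_profile b ON a.user_profile_id = b.user_profile_id
WHERE a.department_code_id = 3
ORDER BY a.user_profile_id
"""


def plan(connection):
    # The last column of each plan row is a readable description of one step.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]


rows_before = conn.execute(QUERY).fetchall()
plan_before = plan(conn)

# Index the join column and the filter column (hypothetical index names).
conn.execute("CREATE INDEX idx_user_profile_id ON user_profile (user_profile_id)")
conn.execute("CREATE INDEX idx_upd_department ON user_profile_department (department_code_id)")

rows_after = conn.execute(QUERY).fetchall()
plan_after = plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

The result rows are the same before and after; only the plan changes. Comparing the two printed plans is the same exercise you would do with the Oracle plan output.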

Optimizing OUTER JOIN queries using filters from the WHERE clause (Query Planner)

I am writing a distributed SQL query planner (query engine). Data will be fetched from RDBMS (PostgreSQL) nodes, which involves network I/O.
I want to optimize JOIN queries.
The logical order of execution is:
Do the JOIN (making use of the ON clause).
Apply the WHERE clause to the joined result.
I was thinking about applying the filters (the WHERE conditions specific to a single table) first, and then doing the join.
In what cases would that produce wrong results?
Example:
SELECT *
FROM tableA
LEFT JOIN tableB ON(tableA.col1 = tableB.col1)
LEFT JOIN tableC ON(tableB.col2 = tableC.col1)
WHERE tableA.colY < 100 AND tableB.colX > 50
Logical Execution:
joinResult = (tableA left join tableB ON() ) left join tableC ON()
Filter joinResult using given WHERE clause.
Proposed Execution:
filteredA = tableA WHERE tableA.colY < 100
filteredB = tableB WHERE tableB.colX > 50
Result = (filteredA left join filteredB ON(..))left join tableC ON(..)
Can I optimize any query like this, that is, by filtering each table first and then applying the join on top?
Edit:
Some people are getting confused and discussing this specific example. I am not asking about this specific example query; I am writing a query planner and I want to handle all types of queries.
Please note that each of the tables is sharded and stored on different machines, and the current execution model is to fetch each of the tables and then do the join locally. So it would be better if I could apply the WHERE filter before fetching.
This is actually a complex topic.
We can filter the table first in some cases. We can also reorder outer joins and then push the filter quals inside.
I was going through a research paper on this, but I haven't finished it yet (and may not finish it).
So for now, for those who are looking for answers, you could go through this research paper, particularly section 2.2: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.2531&rep=rep1&type=pdf
For now I'm relying on PostgreSQL's planner: I take its output and reconstruct the query for my requirements.
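A small experiment shows one case where filtering first goes wrong. The sketch below uses Python's sqlite3 module with made-up rows (tableC is left out to keep it short). It compares the logical order with two pushdown variants: pushing both filters, and pushing only the filter on the preserved (left) side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (col1 INTEGER, colY INTEGER);
CREATE TABLE tableB (col1 INTEGER, colX INTEGER);
INSERT INTO tableA VALUES (1, 10), (2, 20), (3, 500);
INSERT INTO tableB VALUES (1, 60), (2, 5);
""")

# Logical order: join first, then apply WHERE to the joined rows.
logical = conn.execute("""
SELECT a.col1, b.colX
FROM tableA a LEFT JOIN tableB b ON a.col1 = b.col1
WHERE a.colY < 100 AND b.colX > 50
ORDER BY a.col1
""").fetchall()

# Proposed order: filter both tables first, then left join.
pushed_both = conn.execute("""
SELECT a.col1, b.colX
FROM (SELECT * FROM tableA WHERE colY < 100) a
LEFT JOIN (SELECT * FROM tableB WHERE colX > 50) b ON a.col1 = b.col1
ORDER BY a.col1
""").fetchall()

# Push only the filter on the preserved side; keep the other in WHERE.
pushed_a_only = conn.execute("""
SELECT a.col1, b.colX
FROM (SELECT * FROM tableA WHERE colY < 100) a
LEFT JOIN tableB b ON a.col1 = b.col1
WHERE b.colX > 50
ORDER BY a.col1
""").fetchall()

print(logical)        # [(1, 60)]
print(pushed_both)    # [(1, 60), (2, None)]
print(pushed_a_only)  # [(1, 60)]
```

Pushing the tableB filter below the LEFT JOIN turns "row removed by WHERE" into "row kept with NULLs", so an extra row appears. In this particular query the WHERE condition on tableB rejects NULLs, so the join behaves like an inner join and both filters could be pushed if the planner also switches the join type; that rewrite is not valid for outer joins in general.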

SQL is evaluated Right to Left or Left to Right?

I have a SQL query like the one below:
A LEFT JOIN B LEFT JOIN C LEFT JOIN D
Say table A is a big table whereas tables B, C and D are small.
Will Spark execute the joins like this:
A with B, and the result is then joined with C and then D,
or
will Spark automatically optimize, i.e. join B, C and D first and then join the result with A?
My question is: what is the order of execution or join evaluation? Does it go left to right or right to left?
Spark can optimize the join order if it has access to information about the cardinalities of those joins.
For example, if those are Parquet tables or cached dataframes, then it has estimates of the total row counts of the tables and can reorder the joins to make them less expensive. If a "table" is a JDBC dataframe, Spark may not have information on row counts.
The Spark query optimizer can also choose a different join type if it has statistics (e.g. it can broadcast all the smaller tables and run a broadcast hash join instead of a sort merge join).
If statistics aren't available, it will just follow the order in the SQL query, i.e. from left to right.
Update:
I originally missed that all the joins in your query are OUTER joins (LEFT is equivalent to LEFT OUTER).
Normally outer joins can't be reordered, because this would change the result of the query. I said "normally" because sometimes the Spark optimizer can convert an outer join to an inner join (e.g. if you have a WHERE clause that filters out NULLs - see conversion logic here).
For completeness, reordering of joins is driven by two different code paths, depending on whether the Spark CBO is enabled (spark.sql.cbo.enabled first appeared in Spark 2.2 and is off by default). If spark.sql.cbo.enabled=true and spark.sql.cbo.joinReorder.enabled=true (also off by default), and statistics are available or were collected manually through ANALYZE TABLE .. COMPUTE STATISTICS, then reordering is based on the estimated cardinality of the joins, as mentioned above.
Proof that reordering only works for INNER JOINs is here (using the CBO as an example).
Update 2: Reordering outer joins can produce different results, which is why outer joins are not reordered.
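As an illustration only (this is not the original sample, and it runs on SQLite through Python's sqlite3 module rather than on Spark), the sketch below swaps the two inputs of a join. The tables a and b and their rows are made up. For an inner join the swap makes no difference; for a left join it changes which unmatched rows survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (3), (4);
""")


def run(sql):
    # Compare results as sets so row order does not matter.
    return set(conn.execute(sql).fetchall())


inner_ab = run("SELECT a.id, b.id FROM a INNER JOIN b ON a.id = b.id")
inner_ba = run("SELECT a.id, b.id FROM b INNER JOIN a ON a.id = b.id")

left_ab = run("SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id")
left_ba = run("SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id")

print(inner_ab == inner_ba)  # True: inner join inputs can be swapped
print(left_ab)               # keeps every row of a
print(left_ba)               # keeps every row of b
```

Here left_ab contains (1, None) and (2, None), while left_ba contains (None, 4) instead, so an optimizer cannot freely swap the inputs of an outer join the way it can for an inner join.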
The order of interpretation of joins does not matter for inner joins. However, it can matter for outer joins.
Your logic is equivalent to:
FROM ((A LEFT JOIN
       B
       ON . . .
      ) LEFT JOIN
      C
      ON . . .
     ) LEFT JOIN
     D
     ON . . .
The simplest way to think about chains of LEFT JOIN is that they keep all rows in the first table and columns from matching rows in the subsequent tables.
Note that this is the interpretation of the code. The SQL optimizer is free to rearrange the JOINs in any order to arrive at the same result set (although with outer joins this is generally less likely than with inner joins).

Left join Ignore

I have recently noticed that SQL Server 2016 appears to ignore left joins if no column from the joined table is used in the SELECT or WHERE clause. The joined tables also do not appear in the actual execution plan.
This is good, because an extra join that someone added does not affect performance.
I have a query that takes 9 seconds if I add a column from the left-joined tables to the SELECT clause, but only 1 second without it.
Can anyone please check and tell me whether that is true or not?
Query with actual execution plan: you can see that no table from the left joins appears in the execution plan.
I'm not 100% sure what the question is asking, but a SQL optimizer can ignore a left join. Consider this type of query:
select a.*
from a left join
b
on a.b_id = b.id;
If b.id is declared as unique (or equivalently a primary key) then the above query returns exactly the same result set as:
select a.*
from a;
I am not per se aware that SQL Server started putting this optimization in 2016. But the optimization is perfectly valid and (I believe) other optimizers do implement it.
Remember, SQL is a declarative language, not a procedural language. The SQL query describes the result set, not how it is produced.
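To see why the uniqueness of b.id matters, here is a small sketch using Python's sqlite3 module. The tables and rows are made up; b_dupes is a hypothetical copy of b without a unique id. The sketch only compares result sets and does not show what any particular optimizer does internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE b_dupes (id INTEGER, label TEXT);  -- no uniqueness on id
CREATE TABLE a (a_id INTEGER PRIMARY KEY, b_id INTEGER);
INSERT INTO b VALUES (1, 'x'), (2, 'y');
INSERT INTO b_dupes VALUES (1, 'x'), (1, 'x2'), (2, 'y');
INSERT INTO a VALUES (10, 1), (11, 2), (12, NULL), (13, 99);
""")

plain = conn.execute("SELECT a.* FROM a ORDER BY a.a_id").fetchall()

# Left join on a unique key: each row of a matches at most one row of b.
joined_unique = conn.execute("""
SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id ORDER BY a.a_id
""").fetchall()

# Left join on a non-unique key: a row of a can match several rows.
joined_dupes = conn.execute("""
SELECT a.* FROM a LEFT JOIN b_dupes d ON a.b_id = d.id ORDER BY a.a_id
""").fetchall()

print(joined_unique == plain)          # True
print(len(plain), len(joined_dupes))   # 4 5
```

With the unique key the join can neither drop nor duplicate rows of a, so removing it is safe. Without uniqueness, row (10, 1) appears twice, and the join has to be executed even though no column of the joined table is selected.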
If you have a left join whose matching condition returns no data from the joined table, and the SELECT statement contains no columns from the right-hand table, the optimizer can leave that table out. This applies not only to SQL Server 2016 but to most databases.
A left join reduces the performance of the query if the joined tables hold a large amount of data.

What are the differences between these?

What are the differences between the two queries?
SELECT CountryMaster.Id
FROM Districts INNER JOIN
CountryMaster ON Districts.CountryId = CountryMaster.Id
SELECT CountryMaster.Id
FROM CountryMaster INNER JOIN
Districts ON Districts.CountryId = CountryMaster.Id
I know the output will be the same, but I want to know whether there are any drastic effects if I ignore the positions of tables and columns in complex queries, or with tables holding lots of data, like hundreds of thousands of rows.
No difference whatsoever. The order of the tables in an inner join is irrelevant. The query optimizer inside the database engine will decide on a merge plan to actually process the records from the two tables based on the stored statistics for the data in those tables.
In fact, in many cases, the query optimizer will generate exactly the same plan for a query phrased using joins as it would for a query phrased with a correlated sub-query.
The lesson I have learned here is:
Always start with the syntax, or representation, that most clearly expresses the meaning of the process you are trying to create, and trust the query optimizer to do its job. Having said that, the query optimizer is not perfect, so if there is a performance issue, look at the query plan for alternate constructions and see whether it improves.
One quick comment on the performance of inner vs. outer joins. It is simply not true that inner joins are intrinsically faster than outer joins. The relative performance depends entirely on which of the three types of join processing is used by the query engine:
1. Nested Loop Join, 2. Merge Join, or 3. Hash Join.
The Nested Loop join, for example, is used when the set of records on one side of the join is very much smaller than on the other side, and the larger set is indexed on the join column[s]. In this case, if the smaller set is the "outer" side, then an outer join will be faster. The reason is that the nested loop join takes the entire set of records from that smaller set, and iterates through each one, finding the records from the larger set that match. An inner join has to perform a second step of removing rows from the smaller side when no matches were found in the larger set. The outer join does not do this second step.
Each of the three possible types of join processes has its own characteristic behavior patterns. See Nested Loop Joins, Merge Joins and Hash Joins for the details.
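For readers who have not seen one, here is a minimal Python sketch of an index-backed nested loop join as described above. It is a toy model, not how any particular engine implements it; the helper names build_index and nested_loop_join, the keep_unmatched flag and the sample rows are all made up. The dictionary stands in for the index on the larger table.

```python
from collections import defaultdict


def build_index(rows, key):
    """Group rows by join key, standing in for an index on the larger table."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loop_join(small_rows, small_key, large_index, keep_unmatched=False):
    """Loop over the smaller input and probe the index of the larger one."""
    joined = []
    for small_row in small_rows:
        matches = large_index.get(small_row[small_key], [])
        for large_row in matches:
            joined.append((small_row, large_row))
        if not matches and keep_unmatched:
            # Outer join: keep the row, with None for the missing side.
            joined.append((small_row, None))
    return joined


countries = [{"Id": 1, "Name": "X"}, {"Id": 2, "Name": "Y"}]
districts = [
    {"CountryId": 1, "Name": "North"},
    {"CountryId": 1, "Name": "South"},
    {"CountryId": 9, "Name": "Orphan"},
]
district_index = build_index(districts, "CountryId")

inner = nested_loop_join(countries, "Id", district_index)
outer = nested_loop_join(countries, "Id", district_index, keep_unmatched=True)

print(len(inner), len(outer))  # 2 3
```

In this sketch the inner and outer variants differ only in how unmatched rows from the smaller input are handled; the cost is dominated by one index probe per row of the smaller input.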
As written they are identical. Excellent answer from Charles.
If you want to know if they will have different execution plans then simply display the execution plan in SSMS.
As for speed, have the columns used in the join indexed.
Maintain the indexes - a fragmented index is not nearly as effective.
The query plan will not always be the same.
The query optimizer keeps statistics and as the profile of the data changes the optimal plan may change.
Thousands of rows is not a lot.
Once you get into millions then tune indexes and syntax (with hints).
Sometimes you have to get into millions before you have enough data to tune.
There is also a UNION operator that is equivalent and sometimes faster.
The LOOP join hint is not symmetric, so in that case the query plans for the following two queries differ, but they still return the same results.
If one is a PK table I always put it first.
In this case the first is twice as fast as the second.
select top 10 docSVsys.sID, docMVtext.fieldID
from docSVsys
inner loop join docMVtext
on docMVtext.sID = docSVsys.sID
where docSVsys.sID < 100
order by docSVsys.sID, docMVtext.fieldID
select top 10 docSVsys.sID, docMVtext.fieldID
from docMVtext
inner loop join docSVsys
on docMVtext.sID = docSVsys.sID
where docSVsys.sID < 100
order by docSVsys.sID, docMVtext.fieldID
Advanced Query Tuning Concepts