Subquery VS join with respect to performance - sql

Which is better in performance [Subquery] or [join]?
I have 3 tables related to each other, and i need to select data from one table that has some fields related to the other 2 tables, which one from the following 2 SQL statements is better from the view of performance :
select Table1.City, Table1.State, Table2.Name, Table1.Code, Table3.ClassName
from Table1 inner join Table2 on Table1.EmpId = Table2.Id inner join Table3
on Table1.ClassId = Table3.Id where Table.Active = 1
OR
select City, State, (select Name from Table2 where Id = Table1.EmpId) as Name, Code,
(select ClassName from Table3 where Id = Table1.ClassId) as ClassName from Table1
where Active = 1
I have tried the execution plan but its statistics is not expressive to me because the current data is a test data not real one, so i can't imagine the amount of data when tables are live of course they will be more than the test one.
Note : The Id field in Table2 and Table3 is primary key
Thanks in advance

The first approach, with joins, is by far faster. In second the query will be executed for each row. Some databases optimize nested queries into joins though.
Join vs. sub-query

I use subqueries often if I expect large joins with big tables or many joins.
Especially with left joins it can happen that the query exceeds the size of the join cache.

Related

SQL: Except Join on multiple queries

Getting a bit stuck trying to build this query. (SQL SERVER)
I'm trying to join two tables on similar rows, but then stack the unique rows from both table 1 and table 2 on the result set. I was first shooting for a full outer join, but it leaves my key fields blank when the data comes from only one of the tables.
Example: Full Outer Join
Here's what I would like for the query to be able to do:
Essentially, I would like to have a result table where the key fields (Part and Operation) are all returned in two columns (so like a union), but the Estimated and Actual Rate columns returned side by side where there is a matching row between table 1 and table 2.
I've also been trying to inner join the two tables to make a subquery, then using that inner join for except clause on each of the tables, then stacking the original inner join with the two except unions.
Current Attempt: One Join, Two Excepts, Two Unions
UPDATE: I got the current attempt to return values! It's a bit complicated though, Appreciate any advice or feedback though! Great answers below thanks, I will need to do some comparisons
Thanks
SELECT ISNULL(t1.part,t2.part) AS Part,
ISNULL(t1.operation,t2.operation) AS Operation,
ISNULL('Estimated Rate',0) AS 'Estimated Rate',
ISNULL('Actual Rate',0) AS 'Actual Rate'
FROM table1 t1
FULL OUTER JOIN table2 t2
ON t1.part = t2.part
AND t1.operation = t2.operation
I would do this as a union all and group by:
select part, operation,
sum(estimatedrate) as estimatedrate, sum(actualrate) as actualrate
from ((select part, operation, estimatedrate, 0 as actualrate
from table1
) union all
(select part, operation, 0 as estimatedrate, 0 actualrate
from table1
)
) er
group by part, operation;

INNER JOIN with complex condition dramatically increases the execution time

I have 2 tables with several identical fields needed to be linked in JOIN condition. E.g. in each table there are fields: P1, P2. I want to write the following join query:
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
OR Table1.P2 = Table2.P2
OR Table1.P1 = Table2.P2
OR Table1.P2 = Table2.P1
In the case I have huge tables this request is executing a lot of time.
I tried to test how long will be the request of a query with one condition only. First, I have modified the tables in such way all data from P2 & P1 where copied as new rows into Table1 & Table2. So my query is simple:
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P = Table2.P
The result was more then surprised: the execution time from many hours (the 1st case) was reduced to 2-3 seconds!
Why is it so different? Does it mean the complex conditions are always reduce performance? How can I improve the issue? May be P1,P2 indexing will help? I want to remain the 1st DB schema and not to move to one field P.
The reason the queries are different is because of the join strategies being used by the optimizer. There are basically four ways that two tables can be joined:
"Hash join": Creates a hash table on one of the tables which it uses to look up the values in the second.
"Merge join": Sorts both tables on the key and then readsthe results sequentially for the join.
"Index lookup": Uses an index to look up values in one table.
"Nested Loop": Compars each value in each table to all the values in the other table.
(And there are variations on these, such as using an index instead of a table, working with partitions, and handling multiple processors.) Unfortunately, in SQL Server Management Studio both (3) and (4) are shown as nested loop joins. If you look more closely, you can tell the difference from the parameters in the node.
In any case, your original join is one of the first three -- and it goes fast. These joins can basically only be used on "equi-joins". That is, when the condition joining the two tables includes an equality operator.
When you switch from a single equality to an "in" or set of "or" conditions, the join condition has changed from an equijoin to a non-equijoin. My observation is that SQL Server does a lousy job of optimization in this case (and, to be fair, I think other databases do pretty much the same thing). Your performance hit is the hit of going from a good join algorithm to the nested loops algorithm.
Without testing, I might suggest some of the following strategies.
Build an index on P1 and P2 in both tables. SQL Server might use the index even for a non-equijoin.
Use the union query suggested in another solution. Each query should be correctly optimized.
Assuming these are 1-1 joins, you can also do this as a set of multiple joins:
from table1 t1 left outer join
table2 t2_11
on t1.p1 = t2_11.p1 left outer join
table2 t2_12
on t1.p1 = t2_12.p2 left outer join
table2 t2_21
on t1.p2 = t2_21.p2 left outer join
table2 t2_22
on t1.p2 = t2_22.p2
And then use case/coalesce logic in the SELECT to get the value that you actually want. Although this may look more complicated, it should be quite efficient.
you can use 4 query and Union there result
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P2
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P2
Does using CTEs help performance?
;WITH Table1_cte
AS
(
SELECT
...
[P] = P1
FROM Table1
UNION
SELECT
...
[P] = P2
FROM Table1
)
, Table2_cte
AS
(
SELECT
...
[P] = P1
FROM Table2
UNION
SELECT
...
[P] = P2
FROM Table2
)
SELECT ... FROM Table1_cte x
INNER JOIN
Table2_cte y
ON x.P = y.P
I suspect, as far as the processor is concerned, the above is just different syntax for the same complex conditions.

What's the best way to join on the same table twice?

This is a little complicated, but I have 2 tables. Let's say the structure is something like this:
*Table1*
ID
PhoneNumber1
PhoneNumber2
*Table2*
PhoneNumber
SomeOtherField
The tables can be joined based on Table1.PhoneNumber1 -> Table2.PhoneNumber, or Table1.PhoneNumber2 -> Table2.PhoneNumber.
Now, I want to get a resultset that contains PhoneNumber1, SomeOtherField that corresponds to PhoneNumber1, PhoneNumber2, and SomeOtherField that corresponds to PhoneNumber2.
I thought of 2 ways to do this - either by joining on the table twice, or by joining once with an OR in the ON clause.
Method 1:
SELECT t1.PhoneNumber1, t1.PhoneNumber2,
t2.SomeOtherFieldForPhone1, t3.someOtherFieldForPhone2
FROM Table1 t1
INNER JOIN Table2 t2
ON t2.PhoneNumber = t1.PhoneNumber1
INNER JOIN Table2 t3
ON t3.PhoneNumber = t1.PhoneNumber2
This seems to work.
Method 2:
To somehow have a query that looks a bit like this -
SELECT ...
FROM Table1
INNER JOIN Table2
ON Table1.PhoneNumber1 = Table2.PhoneNumber OR
Table1.PhoneNumber2 = Table2.PhoneNumber
I haven't gotten this to work yet and I'm not sure if there's a way to do it.
What's the best way to accomplish this? Neither way seems simple or intuitive... Is there a more straightforward way to do this? How is this requirement generally implemented?
First, I would try and refactor these tables to get away from using phone numbers as natural keys. I am not a fan of natural keys and this is a great example why. Natural keys, especially things like phone numbers, can change and frequently so. Updating your database when that change happens will be a HUGE, error-prone headache. *
Method 1 as you describe it is your best bet though. It looks a bit terse due to the naming scheme and the short aliases but... aliasing is your friend when it comes to joining the same table multiple times or using subqueries etc.
I would just clean things up a bit:
SELECT t.PhoneNumber1, t.PhoneNumber2,
t1.SomeOtherFieldForPhone1, t2.someOtherFieldForPhone2
FROM Table1 t
JOIN Table2 t1 ON t1.PhoneNumber = t.PhoneNumber1
JOIN Table2 t2 ON t2.PhoneNumber = t.PhoneNumber2
What i did:
No need to specify INNER - it's implied by the fact that you don't specify LEFT or RIGHT
Don't n-suffix your primary lookup table
N-Suffix the table aliases that you will use multiple times to make it obvious
*One way DBAs avoid the headaches of updating natural keys is to not specify primary keys and foreign key constraints which further compounds the issues with poor db design. I've actually seen this more often than not.
The first is good unless either Phone1 or (more likely) phone2 can be null. In that case you want to use a Left join instead of an inner join.
It is usually a bad sign when you have a table with two phone number fields. Usually this means your database design is flawed.
You could use UNION to combine two joins:
SELECT Table1.PhoneNumber1 as PhoneNumber, Table2.SomeOtherField as OtherField
FROM Table1
JOIN Table2
ON Table1.PhoneNumber1 = Table2.PhoneNumber
UNION
SELECT Table1.PhoneNumber2 as PhoneNumber, Table2.SomeOtherField as OtherField
FROM Table1
JOIN Table2
ON Table1.PhoneNumber2 = Table2.PhoneNumber
The first method is the proper approach and will do what you need. However, with the inner joins, you will only select rows from Table1 if both phone numbers exist in Table2. You may want to do a LEFT JOIN so that all rows from Table1 are selected. If the phone numbers don't match, then the SomeOtherFields would be null. If you want to make sure you have at least one matching phone number you could then do WHERE t2.PhoneNumber IS NOT NULL OR t3.PhoneNumber IS NOT NULL
The second method could have a problem: what happens if Table2 has both PhoneNumber1 and PhoneNumber2? Which row will be selected? Depending on your data, foreign keys, etc. this may or may not be a problem.
My problem was to display the record even if no or only one phone number exists (full address book). Therefore I used a LEFT JOIN which takes all records from the left, even if no corresponding exists on the right. For me this works in Microsoft Access SQL (they require the parenthesis!)
SELECT t.PhoneNumber1, t.PhoneNumber2, t.PhoneNumber3
t1.SomeOtherFieldForPhone1, t2.someOtherFieldForPhone2, t3.someOtherFieldForPhone3
FROM
(
(
Table1 AS t LEFT JOIN Table2 AS t3 ON t.PhoneNumber3 = t3.PhoneNumber
)
LEFT JOIN Table2 AS t2 ON t.PhoneNumber2 = t2.PhoneNumber
)
LEFT JOIN Table2 AS t1 ON t.PhoneNumber1 = t1.PhoneNumber;
SELECT
T1.ID
T1.PhoneNumber1,
T1.PhoneNumber2
T2A.SomeOtherField AS "SomeOtherField of PhoneNumber1",
T2B.SomeOtherField AS "SomeOtherField of PhoneNumber2"
FROM
Table1 T1
LEFT JOIN Table2 T2A ON T1.PhoneNumber1 = T2A.PhoneNumber
LEFT JOIN Table2 T2B ON T1.PhoneNumber2 = T2B.PhoneNumber
WHERE
T1.ID = 'FOO';
LEFT JOIN or JOIN also return same result. Tested success with PostgreSQL 13.1.1 .

Difference between "and" and "where" in joins

Whats the difference between
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
and table2.Id IN (2728)
and
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
where table2.Id IN (2728)
both return the same result and both have the same explain output
Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.
In this case, the WHERE clause is almost certainly what you mean so you should use it.
Secondly there is actually difference in the result in the case that you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row if the condition fails - you will get values from the left table and nulls for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.
Here is an example to demonstrate this.
Query 1 (WHERE):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);
Result:
field1
200
Query 2 (AND):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);
Result:
field1
100
200
Test data used:
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');
SQL comes from relational algebra.
One way to look at the difference is that JOINs are operations on sets that can produce more records or less records in the result than you had in the original tables. On the other side WHERE will always restrict the number of results.
The rest of the text is extra explanation.
For overview of join types see article again.
When I said that the where condition will always restrict the results, you have to take into account that when we are talking about queries on two (or more) tables you have to somehow pair records from these tables even if there is no JOIN keyword.
So in SQL if the tables are simply separated by a comma, you are actually using a CROSS JOIN (cartesian product) which returns every row from one table for each row in the other.
And since this is a maximum number of combinations of rows from two tables then the results of any WHERE on cross joined tables can be expressed as a JOIN operation.
But hold, there are exceptions to this maximum when you introduce LEFT, RIGHT and FULL OUTER joins.
LEFT JOIN will join records from the left table on a given criteria with records from the right table, BUT if the join criteria, looking at a row from the left table is not satisfied for any records in the right table the LEFT JOIN will still return a record from the left table and in the columns that would come from the right table it will return NULLs (RIGHT JOIN works similarly but from the other side, FULL OUTER works like both at the same time).
Since the default cross join does NOT return those records you can not express these join criteria with WHERE condition and you are forced to use JOIN syntax (oracle was an exception to this with an extension to SQL standard and to = operator, but this was not accepted by other vendors nor the standard).
Also, joins usually, but not always, coincide with existing referential integrity and suggest relationships between entities, but I would not put as much weight into that since the where conditions can do the same (except in the before mentioned case) and to a good RDBMS it will not make a difference where you specify your criteria.
The join is used to reflect the entity relations
the where clause filters down results.
So the join clauses are 'static' (unless the entity relations change), while the where clauses are use-case specific.
There is no difference. "ON" is like a synonym for "WHERE", so t he second kind of reads like:
JOIN table2 WHERE cd.Company = table2.Name AND table2.Id IN (2728)
There is no difference when the query optimisation engine breaks it down to its relevant query operators.

Nested sql joins process explanation needed

I want to understand the process of nested join clauses in sql queries. Can you explain this example with pseudo codes? (What is the order of joining tables?)
FROM
table1 AS t1 (nolock)
INNER JOIN table2 AS t2 (nolock)
INNER JOIN table3 as t3 (nolock)
ON t2.id = t3.id
ON t1.mainId = t2.mainId
In SQl basically we have 3 ways to join two tables.
Nested Loop ( Good if one table has small number of rows),
Hash Join (Good if both table has very large rows, it does expensive hash formation in memory)
Merge Join (Good when we have sorted data to join).
From your question it seems that you want for Nested Loop.
Let us say t1 has 20 rows, t2 has 500 rows.
Now it will be like
For each row in t1
Find rows in t2 where t1.MainId = t2.MainId
Now out put of that will be joined to t3.
Order of Joining depends on Optimizer, Expected Row count etc.
Try EXPLAIN query.
It tells you exactly what's going on. :)
Of course that doesn't work in SQL Server. For that you can try Razor SQLServer Explain Plan
Or even SET SHOWPLAN_ALL
If you're using SQL Server Query Analyzer, look for "Show Execution Plan" under the "Query" menu, and enable it.