Altering Order of tables in JOIN condition - sql

Given the below scenario:
Table A has 1000 rows and Table B has 5000 rows.
Q1: Select * from Table_A Left Outer Join Table_B
ON condition
Q2: Select * from Table_B Left Outer Join Table_A
ON condition
Does this make any difference ? Would there be any performance difference in these situations?

Yes, it makes a big difference for a LEFT JOIN. The two statements are not the same, and the execution paths are likely to be different.
The first query keeps all rows in Table A, plus any matching values from Table B. So this version returns at least 1000 rows.
The second keeps all rows in Table B, plus any matching values from Table A. This is not the same thing. This version returns at least 5000 rows.
For an INNER JOIN (or FULL OUTER JOIN), the order of the tables in the FROM clause does not affect which rows are returned. However, depending on the optimizer, it could affect how the joins are processed (I am thinking of long chains of joins where optimizers take short-cuts).
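A minimal sketch of the point above, using Python's built-in sqlite3 module and two tiny hypothetical tables (3 and 5 rows stand in for the 1000 and 5000 in the question; the table and column names are assumptions made for illustration):

```python
import sqlite3

def join_counts():
    """Return row counts for both LEFT JOIN orders and both INNER JOIN orders."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_a (id INTEGER PRIMARY KEY);
        CREATE TABLE table_b (id INTEGER PRIMARY KEY);
        INSERT INTO table_a (id) VALUES (1), (2), (3);            -- 3 rows
        INSERT INTO table_b (id) VALUES (2), (3), (4), (5), (6);  -- 5 rows
    """)
    queries = {
        "a_left_b": "SELECT COUNT(*) FROM table_a a LEFT JOIN table_b b ON a.id = b.id",
        "b_left_a": "SELECT COUNT(*) FROM table_b b LEFT JOIN table_a a ON a.id = b.id",
        "a_inner_b": "SELECT COUNT(*) FROM table_a a INNER JOIN table_b b ON a.id = b.id",
        "b_inner_a": "SELECT COUNT(*) FROM table_b b INNER JOIN table_a a ON a.id = b.id",
    }
    counts = {name: con.execute(sql).fetchone()[0] for name, sql in queries.items()}
    con.close()
    return counts

print(join_counts())
```

Swapping the sides changes the LEFT JOIN count (3 versus 5 here), while the INNER JOIN count is the same in both orders.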

Does this make any difference ?
Yes it does. LEFT JOIN definition: returns every row from the left table, together with the matching rows from the right table. Where a left row has no match, the right table's columns are NULL.
So in your case, the number of rows returned can be very different.
Q1: Select * from Table_A Left Outer Join Table_B ON condition
In this case at least 1000 rows are returned (since your Table_A has 1000 rows and is on the left side of the JOIN). You get exactly 1000 if each Table_A row matches at most one Table_B row, and more if some rows match several.
Q2: Select * from Table_B Left Outer Join Table_A ON condition
In this case at least 5000 rows are returned (since your Table_B has 5000 rows and is on the left side of the JOIN), again with extra rows wherever a Table_B row matches several Table_A rows.
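The row counts for Q1 and Q2 can be checked on a small scale. The sketch below (Python's sqlite3, hypothetical tables with a single column k) counts a LEFT JOIN result and splits it into matched pairs and unmatched left rows:

```python
import sqlite3

def left_join_row_count():
    """Return (left join rows, matched pairs, unmatched left rows)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_a (k INTEGER);
        CREATE TABLE table_b (k INTEGER);
        INSERT INTO table_a VALUES (1), (2), (3);
        INSERT INTO table_b VALUES (1), (1), (1), (2);  -- k = 1 appears three times
    """)
    total = con.execute(
        "SELECT COUNT(*) FROM table_a a LEFT JOIN table_b b ON a.k = b.k"
    ).fetchone()[0]
    matched_pairs = con.execute(
        "SELECT COUNT(*) FROM table_a a JOIN table_b b ON a.k = b.k"
    ).fetchone()[0]
    unmatched_left = con.execute(
        "SELECT COUNT(*) FROM table_a a "
        "WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.k = a.k)"
    ).fetchone()[0]
    con.close()
    return total, matched_pairs, unmatched_left

print(left_join_row_count())
```

With three rows on the left, the LEFT JOIN returns five rows: four matched pairs plus the one left row that has no partner.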
See the visual representation of the same [Image taken from This CodeProject Post]:

The two queries will return different results.
See W3 Schools Left Join
and go to the Try It Yourself page. The SQL can be edited for a LEFT OUTER JOIN.

Related

How to swap the sides of a left join in SQL?

I have two tables that I would like to query, tableA has ~53_000 rows while tableB has ~530M rows.
SELECT
b.some_field AS field,
a.*
FROM tableA a -- 53_462
LEFT JOIN tableB b -- 527_795_032
ON a.user_id = b.user_id
AND a.numeric_field >= b.numeric_field
AND a.numeric_field <= b.other_numeric_field
;
This kills the query engine because the right hand side is much bigger than the left, so I think for every row on the left it has to query the right.
In such a case (the right-hand side being much bigger than the left), what is the best thing to do?
I am thinking about two possibilities:
1) switching the sides and using a right join
2) creating a potentially much smaller table by querying the rows that exist in the right hand side and joining that table
I have ended up creating a temp table that has only the matching rows.
CREATE TABLE b_temp
....
IN (
SELECT
DISTINCT user_id
FROM
a
)
This allows me to filter down b table to a reasonable size.
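A minimal sketch of that pre-filtering idea, using Python's sqlite3 and a few made-up rows. The table names table_a and table_b, the sample data, and the CREATE TEMP TABLE ... AS SELECT form are assumptions; the original statement is elided above and may have looked different. The sketch only checks that joining against the filtered copy returns the same rows as joining against the full table:

```python
import sqlite3

JOIN_SQL = """
    SELECT b.some_field, a.user_id, a.numeric_field
    FROM table_a a
    LEFT JOIN {right_table} b
           ON a.user_id = b.user_id
          AND a.numeric_field >= b.numeric_field
          AND a.numeric_field <= b.other_numeric_field
    ORDER BY a.user_id, b.some_field
"""

def prefilter_demo():
    """Return (rows joined against table_b, rows joined against b_temp, size of b_temp)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_a (user_id INTEGER, numeric_field INTEGER);
        CREATE TABLE table_b (user_id INTEGER, numeric_field INTEGER,
                              other_numeric_field INTEGER, some_field TEXT);
        INSERT INTO table_a VALUES (1, 10), (2, 50), (3, 5);
        INSERT INTO table_b VALUES (1, 5, 15, 'x'), (1, 20, 30, 'y'),
                                   (2, 60, 70, 'z'), (9, 0, 100, 'w'), (8, 0, 100, 'v');
        -- Keep only the rows of the big table whose user_id occurs in the small one.
        CREATE TEMP TABLE b_temp AS
            SELECT * FROM table_b
            WHERE user_id IN (SELECT DISTINCT user_id FROM table_a);
    """)
    direct = con.execute(JOIN_SQL.format(right_table="table_b")).fetchall()
    prefiltered = con.execute(JOIN_SQL.format(right_table="b_temp")).fetchall()
    b_temp_rows = con.execute("SELECT COUNT(*) FROM b_temp").fetchone()[0]
    con.close()
    return direct, prefiltered, b_temp_rows

print(prefilter_demo())
```

Rows of the big table whose user_id never occurs on the left cannot satisfy the ON condition, so dropping them first does not change the LEFT JOIN result. Whether it actually helps depends on the engine and the available indexes.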

SQL Server JOINS: Are 'JOIN' Statements 'LEFT OUTER' Associated by Default in SQL Server? [duplicate]

I have about 6 months novice experience in SQL, TSQL, SSIS, ETL. As I find myself using JOIN statements more and more in my intern project I have been experimenting with the different JOIN statements. I wanted to confirm my findings. Are the following statements accurate pertaining to the conclusion of JOIN statements in SQL Server?:
1) I did a LEFT OUTER JOIN query and did the same query using JOIN which yielded the same results; are all JOIN statements LEFT OUTER associated in SQL Server?
2) I did a LEFT OUTER JOIN WHERE 2nd table PK (joined to) IS NOT NULL and did the same query using an INNER JOIN which yielded the same results; is it safe to say that the INNER JOIN statement will yield only matched records, and is the same as a LEFT OUTER JOIN where the joined records are NOT NULL?
The reason I'm asking is because I have been only using LEFT OUTER JOINS because that is what I was comfortable with. However, I want to eliminate as much code as possible when writing queries to be more efficient. I just wanted to make sure my observations are correct.
Also, are there any tips that you could provide on easily figuring out which JOIN statement is appropriate for specific queries? For instance, what JOIN would you use if you wanted to yield non-matching records?
Thanks.
A join or inner join (same thing) between table A and table B on, for instance, field1, would narrow in on all rows of table A and B sharing the same field1 value.
A left outer join between A and B, on field1, would show all rows of table A, and only those rows of table B that have a field1 existing in table A.
Where a row of table A has a field1 value that doesn't exist in table B, the table B columns would show null, but the row of table A would be retained because it is an outer join. These are rows that wouldn't show up in a plain join, which is an implied inner join.
If you get the same results doing a join between table A and table B as you do with a left outer join between table A and B, then every row of table A has at least one match in table B on the fields you're joining on. Table B may still hold values that don't exist in table A; those rows are dropped by both kinds of join.
It is also possible you're putting criteria into the where clause that belongs in the on clause of the outer join, which may be causing your confusion. In my example above of tables A and B, where A is being left outer joined with B, you would put any criteria related to table B in the on clause, not the where clause, otherwise you would essentially be turning the outer join into an inner join. For example if you had b.field4 = 12 in the WHERE clause, and table B didn't have a match with A, it would be null and that criteria would fail, and it'd no longer come back even though you used a left outer join. That may be what you are referring to.
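The b.field4 = 12 case can be tried on a small scale. This is a sketch with Python's sqlite3; the two tables and their rows are made up for illustration, and only the column names field1 and field4 come from the answer:

```python
import sqlite3

def on_versus_where():
    """Return (rows with the filter in ON, rows with the filter in WHERE)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (field1 INTEGER);
        CREATE TABLE b (field1 INTEGER, field4 INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (1, 12), (2, 99);
    """)
    filter_in_on = con.execute("""
        SELECT a.field1, b.field4
        FROM a LEFT OUTER JOIN b ON a.field1 = b.field1 AND b.field4 = 12
        ORDER BY a.field1
    """).fetchall()
    filter_in_where = con.execute("""
        SELECT a.field1, b.field4
        FROM a LEFT OUTER JOIN b ON a.field1 = b.field1
        WHERE b.field4 = 12
        ORDER BY a.field1
    """).fetchall()
    con.close()
    return filter_in_on, filter_in_where

print(on_versus_where())
```

With the filter in the ON clause all three rows of a survive, two of them with NULL for b.field4. With the filter in the WHERE clause only the single matching row is left, which is what an inner join would return.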
A bare JOIN is an INNER JOIN by default.
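A short sketch that checks both observations from the question and also shows one common way to get the non-matching records. It uses Python's sqlite3 rather than SQL Server, with hypothetical tables a and b, so treat it as an illustration of standard join semantics only:

```python
import sqlite3

def join_equivalences():
    """Return rows for JOIN, INNER JOIN, LEFT JOIN + IS NOT NULL, and an anti-join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY);
        CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (10, 2), (11, 3), (12, 3);
    """)
    queries = {
        "join": "SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id ORDER BY a.id, b.id",
        "inner": "SELECT a.id, b.id FROM a INNER JOIN b ON b.a_id = a.id ORDER BY a.id, b.id",
        "left_not_null": "SELECT a.id, b.id FROM a LEFT OUTER JOIN b ON b.a_id = a.id "
                         "WHERE b.id IS NOT NULL ORDER BY a.id, b.id",
        # Rows of a that have no partner in b (often called an anti-join).
        "non_matching": "SELECT a.id FROM a LEFT OUTER JOIN b ON b.a_id = a.id "
                        "WHERE b.id IS NULL ORDER BY a.id",
    }
    rows = {name: con.execute(sql).fetchall() for name, sql in queries.items()}
    con.close()
    return rows

print(join_equivalences())
```

The IS NULL test should be made on a column of the right table that cannot be NULL in a real row, such as its primary key.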

SQL Join between tables with conditions

I'm thinking about which should be the best way (considering the execution time) of doing a join between 2 or more tables with some conditions. I got these three ways:
FIRST WAY:
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
where
B.PARAM=VALUE
SECOND WAY
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
and B.PARAM=VALUE
THIRD WAY
select * from
TABLE A inner join (Select * from TABLE B where B.PARAM=VALUE) J ON A.KEY=J.KEY
Consider that the tables have more than 1 million rows.
What is your opinion? Which is the right way, if there is one?
Putting the condition in the WHERE clause or in the join condition usually makes no noticeable difference for inner joins.
With outer joins the two placements are not equivalent, so this is first a question of which result you want rather than a pure tuning choice. A condition in the WHERE clause of a left outer join is applied after the join, so rows that do not meet it are removed and the result set becomes smaller, which can also shorten query time.
A condition in the join clause of a left outer join only decides which rows count as a match: no rows of the left table are removed, and the result set is bigger than with the same condition in the WHERE clause.
For more clarification, follow the example.
create table A
(
ano NUMBER,
aname VARCHAR2(10),
rdate DATE
);
-- A data
insert into A
select 1,'Amand',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 2,'Alex',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 3,'Angel',to_date('20130201','yyyymmdd') from dual;
commit;
create table B
(
bno NUMBER,
bname VARCHAR2(10),
rdate DATE
);
-- B data
insert into B
select 3,'BOB',to_date('20130201','yyyymmdd') from dual;
commit;
insert into B
select 2,'Br',to_date('20130101','yyyymmdd') from dual;
commit;
insert into B
select 1,'Bn',to_date('20130101','yyyymmdd') from dual;
commit;
First of all, we have a normal query which joins the 2 tables with each other:
select * from a inner join b on a.ano=b.bno
The result set has 3 records.
Now please run the queries below:
select * from a inner join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
select * from a inner join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
As you can see, the row counts of the two results do not differ, and in my experience there is no noticeable performance difference for large data volumes either.
Please run the queries below:
select * from a left outer join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
In this case, the count of output records equals the number of records in table A (3 records); the record of A that fails the date condition is kept, with NULL in the columns of B.
select * from a left outer join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
In this case, the records of A which did not meet the condition are removed from the result set and, as I said, the result set has fewer records (in this case 2 records).
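The four queries above can be replayed without an Oracle instance. The sketch below ports the example to SQLite through Python's sqlite3. Storing the dates as ISO text and dropping the bname/aname length limits are simplifications of mine, not part of the original example:

```python
import sqlite3

def on_where_counts():
    """Return the row count of each of the four queries from the example."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (ano INTEGER, aname TEXT, rdate TEXT);
        CREATE TABLE b (bno INTEGER, bname TEXT, rdate TEXT);
        INSERT INTO a VALUES (1, 'Amand', '2013-01-01'),
                             (2, 'Alex',  '2013-01-01'),
                             (3, 'Angel', '2013-02-01');
        INSERT INTO b VALUES (3, 'BOB', '2013-02-01'),
                             (2, 'Br',  '2013-01-01'),
                             (1, 'Bn',  '2013-01-01');
    """)
    queries = {
        "inner_on": "SELECT COUNT(*) FROM a INNER JOIN b "
                    "ON a.ano = b.bno AND a.rdate = '2013-01-01'",
        "inner_where": "SELECT COUNT(*) FROM a INNER JOIN b "
                       "ON a.ano = b.bno WHERE a.rdate = '2013-01-01'",
        "left_on": "SELECT COUNT(*) FROM a LEFT OUTER JOIN b "
                   "ON a.ano = b.bno AND a.rdate = '2013-01-01'",
        "left_where": "SELECT COUNT(*) FROM a LEFT OUTER JOIN b "
                      "ON a.ano = b.bno WHERE a.rdate = '2013-01-01'",
    }
    counts = {name: con.execute(sql).fetchone()[0] for name, sql in queries.items()}
    con.close()
    return counts

print(on_where_counts())
```

Only the left outer join reacts to where the condition is placed: 3 rows with the condition in the join clause, 2 rows with it in the WHERE clause.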
From the examples above we can draw the following conclusions:
1-In case of using inner joins,
there is no special difference between putting the condition in the WHERE clause or the join clause. Cost-based optimizers normally choose the join order themselves; where yours does not, try to order the tables in the FROM clause so that the intermediate result row counts stay small:
(http://www.dba-oracle.com/art_dbazine_oracle10g_dynamic_sampling_hint.htm)
2-In case of using outer joins, put the condition in the WHERE clause only when you do not need the rows it removes (that is, you do not care about losing records of table A that have no paired records in table B and would show NULL in the fields of table B). The result set is then smaller, which can improve query time.
But in some cases you HAVE TO put the condition in the join part. For example, if you want the result row count to equal the row count of table 'A' (this case is common in ETL processes), you HAVE TO put the condition in the join clause.
3-Avoiding subqueries is a common recommendation. A subquery can increase query time when the optimizer cannot merge it into the main query, so check the execution plan, or use a subquery only when its result data set is small.
I hope this will be useful:)
1M rows really isn't that much - especially if you have sensible indexes. I'd start off with making your queries as readable and maintainable as possible, and only start optimizing if you notice a performance problem with the query (and as Gordon Linoff said in his comment - it's doubtful there would even be a difference between the three).
It may be a matter of taste, but to me, the third way seems clumsy, so I'd cross it out. Personally, I prefer using JOIN syntax for the joining logic (i.e., how A and B's rows are matched) and WHERE for filtering (i.e., once matched, which rows interest me), so I'd go for the first way. But again, it really boils down to personal taste and preferences.
You need to look at the execution plans for the queries to judge which is the most computationally efficient. As pointed out in the comments, you may find they are equivalent. Here is some information on Oracle execution plans. Depending on what editor / IDE you use, there may be a shortcut for this, e.g. F5 in PL/SQL Developer.
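As a sketch of that workflow, the snippet below runs the three variants from the question and collects a plan for each. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for Oracle's plan tooling. The tables, the column names k and param, and the data are hypothetical, and the plan text differs between engines and versions, so it is printed rather than interpreted:

```python
import sqlite3

def compare_three_ways():
    """Return (results per variant, plan detail lines per variant)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (k INTEGER PRIMARY KEY);
        CREATE TABLE b (k INTEGER PRIMARY KEY, param TEXT);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (1, 'x'), (2, 'y'), (3, 'x');
    """)
    ways = {
        "where": "SELECT a.k, b.param FROM a INNER JOIN b ON a.k = b.k "
                 "WHERE b.param = 'x' ORDER BY a.k",
        "on": "SELECT a.k, b.param FROM a INNER JOIN b ON a.k = b.k "
              "AND b.param = 'x' ORDER BY a.k",
        "subquery": "SELECT a.k, j.param FROM a INNER JOIN "
                    "(SELECT * FROM b WHERE param = 'x') j ON a.k = j.k ORDER BY a.k",
    }
    results, plans = {}, {}
    for name, sql in ways.items():
        results[name] = con.execute(sql).fetchall()
        # The human-readable plan step is the fourth column of each plan row.
        plans[name] = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]
    con.close()
    return results, plans

results, plans = compare_three_ways()
for name in results:
    print(name, results[name], plans[name])
```

All three variants return the same rows for an inner join, so the plans are the only place where they could differ.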

SQL Need help joining tables

I have a select SQL query which is really big and it should be pulling in about 5000 records. But when I use the JOIN It cuts the number of records to say 1000 because it only shows records where a value exists on the joined value, how would I go about pulling all records no matter whether the Join finds that a value exists or NOT?
Left outer join : MSDN Outer Joins
Instead of performing an inner join, perform a left outer join

Left join or select from multiple table using comma (,) [duplicate]

I'm curious as to why we need to use LEFT JOIN since we can use commas to select multiple tables.
What are the differences between LEFT JOIN and using commas to select multiple tables.
Which one is faster?
Here is my code:
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT no as nonvs,
owner,
owner_no,
vocab_no,
correct
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
...and:
SELECT *
FROM vocab_stats vs,
mst_words mw
WHERE mw.no = vs.vocab_no
AND vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
First of all, to be completely equivalent, the first query should have been written
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT *
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
So that mw.* and nvs.* together produce the same set as the 2nd query's singular *. The query as you have written can use an INNER JOIN, since it includes a filter on nvs.correct.
The general form
TABLEA LEFT JOIN TABLEB ON <CONDITION>
attempts to find TableB records based on the condition. If that fails, the results from TABLEA are kept, with all the columns from TableB set to NULL. In contrast
TABLEA INNER JOIN TABLEB ON <CONDITION>
also attempts to find TableB records based on the condition. However, when that fails, the particular record from TableA is removed from the output result set.
The ANSI standard for CROSS JOIN produces a Cartesian product between the two tables.
TABLEA CROSS JOIN TABLEB
-- # or in older syntax, simply using commas
TABLEA, TABLEB
The intention of the syntax is that EACH row in TABLEA is joined to EACH row in TABLEB. So 4 rows in A and 3 rows in B produces 12 rows of output. When paired with conditions in the WHERE clause, it sometimes produces the same behaviour of the INNER JOIN, since they express the same thing (condition between A and B => keep or not). However, it is a lot clearer when reading as to the intention when you use INNER JOIN instead of commas.
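The 4 x 3 = 12 arithmetic and the equivalence with INNER JOIN can be checked with a small sketch (Python's sqlite3; tablea, tableb and their columns are hypothetical):

```python
import sqlite3

def cross_join_demo():
    """Return row counts for CROSS JOIN and comma syntax, plus the filtered rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tablea (id INTEGER PRIMARY KEY);
        CREATE TABLE tableb (id INTEGER PRIMARY KEY, a_id INTEGER);
        INSERT INTO tablea VALUES (1), (2), (3), (4);        -- 4 rows
        INSERT INTO tableb VALUES (10, 1), (11, 2), (12, 2); -- 3 rows
    """)
    cross_count = con.execute(
        "SELECT COUNT(*) FROM tablea CROSS JOIN tableb").fetchone()[0]
    comma_count = con.execute(
        "SELECT COUNT(*) FROM tablea, tableb").fetchone()[0]
    comma_where = con.execute(
        "SELECT a.id, b.id FROM tablea a, tableb b "
        "WHERE a.id = b.a_id ORDER BY a.id, b.id").fetchall()
    inner = con.execute(
        "SELECT a.id, b.id FROM tablea a INNER JOIN tableb b "
        "ON a.id = b.a_id ORDER BY a.id, b.id").fetchall()
    con.close()
    return cross_count, comma_count, comma_where, inner

print(cross_join_demo())
```

Both spellings of the Cartesian product give 12 rows, and adding the join condition in the WHERE clause leaves the same rows as the INNER JOIN.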
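Performance-wise, there is no general rule that a LEFT JOIN is processed faster than an INNER JOIN; an INNER JOIN usually leaves the optimizer more freedom to reorder the tables, so compare execution plans if it matters. The comma notation makes it easy to forget a join condition and end up with an accidental Cartesian product, and on some database systems it can lead to a worse query plan - so another plus for SQL92 notation.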
Why do we need LEFT JOIN? If the explanation of LEFT JOIN above is still not enough (keep records in A without matches in B), then consider that to achieve the same, you would need a complex UNION between two sets using the old comma-notation to achieve the same effect. But as previously stated, this doesn't apply to your example, which is really an INNER JOIN hiding behind a LEFT JOIN.
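To make the UNION remark concrete, here is one way such an emulation could look in comma notation, checked against a real LEFT JOIN. This is a sketch with Python's sqlite3 and hypothetical tables; the NOT EXISTS branch is my choice for picking up the unmatched rows, not something taken from the answer:

```python
import sqlite3

def left_join_via_union():
    """Return (rows from LEFT JOIN, rows from the comma-plus-UNION emulation)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tablea (id INTEGER PRIMARY KEY);
        CREATE TABLE tableb (id INTEGER PRIMARY KEY, a_id INTEGER);
        INSERT INTO tablea VALUES (1), (2), (3), (4);
        INSERT INTO tableb VALUES (10, 1), (11, 2), (12, 2);
    """)
    left_join = con.execute("""
        SELECT a.id, b.id
        FROM tablea a LEFT JOIN tableb b ON a.id = b.a_id
        ORDER BY 1, 2
    """).fetchall()
    emulated = con.execute("""
        SELECT a.id, b.id
        FROM tablea a, tableb b
        WHERE a.id = b.a_id            -- the matched pairs
        UNION ALL
        SELECT a.id, NULL
        FROM tablea a                  -- the rows of A without any match
        WHERE NOT EXISTS (SELECT 1 FROM tableb b WHERE b.a_id = a.id)
        ORDER BY 1, 2
    """).fetchall()
    con.close()
    return left_join, emulated

print(left_join_via_union())
```

The emulation needs two passes and one NULL per column of the right table, which is the clumsiness the LEFT JOIN syntax removes.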
Notes:
The RIGHT JOIN is the same as LEFT, except that it starts with TABLEB (right side) instead of A.
RIGHT and LEFT JOINS are both OUTER joins. The word OUTER is optional, i.e. it can be written as LEFT OUTER JOIN.
The third type of OUTER join is FULL OUTER join, but that is not discussed here.
Separating the JOIN from the WHERE makes it easy to read, as the join logic cannot be confused with the WHERE conditions. Compared with emulating an outer join through a UNION of two comma-style queries, it can also be faster, as the server does not need to conduct two separate queries and combine the results.
The two examples you've given are not really equivalent, as you have included a sub-query in the first example. This is a better example:
SELECT vs.*, mw.*
FROM mst_words mw
LEFT JOIN vocab_stats vs ON mw.no = vs.vocab_no
WHERE vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111