This question might seem quite trivial, but being new to SQL programming I'm having some trouble understanding left joins.
To illustrate, I have the following scenario -
I have to perform left joins on the following tables -
from T1.id to T2.id
from T2.Oi to T3.Oi
from T1.Pi to T4.Pi
from T4.Si to T5.Si
from T6.Ki to T7.Ki
I'm trying the following method, but I'm not sure if it's the correct approach and, if so, whether it's an efficient one:
select /*(whatever I want)*/
from
T1 left join T2 on T1.id = T2.id
left join T4 on T1.Pi = T4.Pi
left join T5 on T4.Si = T5.Si
left join T3 on T2.Oi = T3.Oi
(Getting stuck on joining T6 and T7)
Can someone help me understand whether my approach above is right, and how to go about joining T6 and T7?
Cheers!
Joining tables T1 through T5 should look like this:
SELECT *
FROM T1
LEFT JOIN T2
ON T2.ID=T1.ID
LEFT JOIN T3
ON T3.OI=T2.OI
LEFT JOIN T4
ON T4.PI=T1.PI
LEFT JOIN T5
ON T5.SI=T4.SI
I don't know what you have in those tables, so watch out for an unintended Cartesian product: if a join key is not unique in the joined table, the matching rows are multiplied (of course, that can be the desired result in some cases).
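To make the row-multiplication point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and only T1 and T2 are modelled; the same effect applies to every further join in the chain.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, PI INTEGER);
    CREATE TABLE T2 (ID INTEGER, OI INTEGER);
    INSERT INTO T1 VALUES (1, 10), (2, 20);
    -- ID 1 appears twice in T2; ID 2 does not appear at all.
    INSERT INTO T2 VALUES (1, 100), (1, 101);
""")

rows = conn.execute("""
    SELECT T1.ID, T2.OI
    FROM T1
    LEFT JOIN T2 ON T2.ID = T1.ID
    ORDER BY T1.ID, T2.OI
""").fetchall()

# T1 row 1 is returned once per matching T2 row;
# T1 row 2 is kept with NULL (None) because nothing matched.
print(rows)
```

T1 has two rows but the result has three, which is the kind of growth to look for when a join key is not unique.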
I don't know how tables T6 and T7 relate to the others. If their records have the same form, you may want to use UNION (also consider the UNION ALL operator, which keeps duplicate rows):
SELECT *
FROM T1
LEFT JOIN T2
ON T2.ID=T1.ID
LEFT JOIN T3
ON T3.OI=T2.OI
LEFT JOIN T4
ON T4.PI=T1.PI
LEFT JOIN T5
ON T5.SI=T4.SI
UNION
SELECT *
FROM T6
LEFT JOIN T7
ON T7.KI=T6.KI
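As a rough illustration of the UNION versus UNION ALL difference, the sketch below uses Python's sqlite3 module with two small hypothetical tables, A and B (stand-ins, not the real T1..T7). Note that both sides of a UNION must return the same number of columns, so with real tables you would normally list the columns explicitly instead of using SELECT *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with the same shape; the row (2, 'y') is in both.
    CREATE TABLE A (K INTEGER, V TEXT);
    CREATE TABLE B (K INTEGER, V TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (2, 'y'), (3, 'z');
""")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT K, V FROM A UNION SELECT K, V FROM B ORDER BY K"
).fetchall()

# UNION ALL keeps every row from both sides.
union_all_rows = conn.execute(
    "SELECT K, V FROM A UNION ALL SELECT K, V FROM B ORDER BY K"
).fetchall()

print(union_rows)
print(union_all_rows)
```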
I am currently trying to join a few tables together (and maybe 2 more if possible), but with how my query is written right now, I can't even see the results with 3 tables:
select t1.x,
t1.y,
t1.z,
t4.a,
t4.b,
t4.c,
t4.d
from t1
left join t2 on t1.id=t2.id
left join t3 on t2.id=t3.id
left join t4 on t1.id2=t4.id
where t1.date between 'x' and 'x'
and t1.city not in ('x')
and t3.column = x;
Is there a way to optimize this code to run faster and perhaps make it able to add more tables to it?
Thank you in advance!
Your query has some logic issues; fixing them might help with the speed.
t2 is joined to t1 when they have the same id value.
t3 is then pulled in if, and only if, there is a matching row in t2 and the t3 row has that same id value.
Finally, in your where clause, t3.column has to be x, otherwise the row is filtered out.
This means a row in t3 has to exist. Every t1 record that doesn't have both a t2 record and a t3 record will be filtered out by that WHERE clause. Thus you don't need left joins to t2 and t3; you need INNER joins.
select t1.x,
t1.y,
t1.z,
t4.a,
t4.b,
t4.c,
t4.d
from t1
inner join t2 on t1.id=t2.id
inner join t3 on t2.id=t3.id
left join t4 on t1.id2=t4.id
where t1.date between 'x' and 'x'
and t1.city not in ('x')
and t3.column = x;
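To see why the WHERE condition on t3 makes the left joins behave like inner joins, here is a small sketch using Python's sqlite3 module. The table contents and the column name col (standing in for t3.column) are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: t1 has ids 1..3, t2 has 1..2, t3 has 1..2.
    CREATE TABLE t1 (id INTEGER, x TEXT);
    CREATE TABLE t2 (id INTEGER);
    CREATE TABLE t3 (id INTEGER, col TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1), (2);
    INSERT INTO t3 VALUES (1, 'keep'), (2, 'drop');
""")

left_sql = """
    SELECT t1.id, t1.x
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    LEFT JOIN t3 ON t2.id = t3.id
    WHERE t3.col = 'keep'
    ORDER BY t1.id
"""
# Same query with both LEFT JOINs turned into INNER JOINs.
inner_sql = left_sql.replace("LEFT JOIN", "INNER JOIN")

left_rows = conn.execute(left_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()

# t1 row 3 has no t2/t3 match, so t3.col is NULL there and the WHERE
# condition rejects it; the LEFT JOIN version keeps nothing extra.
print(left_rows, inner_rows)
```

On this data both versions return the same single row, because any NULL-extended row fails the t3.col comparison.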
In some DBMSs you can move the t3.column condition into the join's ON clause, which can help filter out rows earlier in the plan.
select t1.x,
t1.y,
t1.z,
t4.a,
t4.b,
t4.c,
t4.d
from t1
inner join t2 on t1.id=t2.id
inner join t3 on t2.id=t3.id and t3.column = x
left join t4 on t1.id2=t4.id
where t1.date between 'x' and 'x'
and t1.city not in ('x');
My final advice is to take a close look at t2 to see if you really need it. Ask yourself: is there a reason a row has to exist in t2 in order for me to get the right results? If t1.id = t2.id and t2.id = t3.id, then t1.id = t3.id, and you can join t1 to t3 directly and eliminate the t2 table completely.
Bit of a novice question: I am running a query with left joins and wanted to know whether it makes a difference, in terms of performance, where you specify a filter. E.g. below, in the first query I filter straight after the first join, and in the second I do all the joins and then filter:
Select t1.*,t2.* from t1 t1
left join t2 t2
on t1.key = t2.key
and t1.date < today
left join t3 t3
on t2.key2 = t3.key
vs
Select t1.*,t2.* from t1 t1
left join t2 t2
on t1.key = t2.key
left join t3 t3
on t2.key2 = t3.key
and t1.date < today
Learn what LEFT JOIN ON returns: INNER JOIN ON rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN.
In general your queries have different inner join & null-extended rows for the 1st left joins & then further differences due to more joining. Unless certain constraints hold, the 2 queries return different functions of their inputs. So comparing their performance seems moot.
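A small sketch of that point, using Python's sqlite3 module. The column names k, k2 and d, the sample rows, and the fixed cutoff date (standing in for "today") are all assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: row 1 is before the cutoff, row 2 is after it.
    CREATE TABLE t1 (k INTEGER, d TEXT);
    CREATE TABLE t2 (k INTEGER, k2 INTEGER);
    CREATE TABLE t3 (k INTEGER);
    INSERT INTO t1 VALUES (1, '2020-01-01'), (2, '2999-01-01');
    INSERT INTO t2 VALUES (1, 10), (2, 20);
    INSERT INTO t3 VALUES (10), (20);
""")

# Date condition in the ON clause of the first left join.
filter_in_first_join = conn.execute("""
    SELECT t1.k, t2.k2, t3.k
    FROM t1
    LEFT JOIN t2 ON t1.k = t2.k AND t1.d < '2024-01-01'
    LEFT JOIN t3 ON t2.k2 = t3.k
    ORDER BY t1.k
""").fetchall()

# Same condition moved to the ON clause of the second left join.
filter_in_second_join = conn.execute("""
    SELECT t1.k, t2.k2, t3.k
    FROM t1
    LEFT JOIN t2 ON t1.k = t2.k
    LEFT JOIN t3 ON t2.k2 = t3.k AND t1.d < '2024-01-01'
    ORDER BY t1.k
""").fetchall()

print(filter_in_first_join)
print(filter_in_second_join)
```

Neither query drops the t1 row that fails the date condition; they differ in which columns end up NULL for it, so they are different queries rather than two spellings of the same one.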
I have 3 tables:
table1: col1(id), col2(segment), col3(sector), col4(year)
mapping table2:
col1(segment1) => values are the same as from table1.col2,
col2(segment2) => values are the same as from table3.col2
table3: col1(id), col2(segment), col3(sector), col4(year)
Now, I'm doing a FULL OUTER JOIN:
select t1.id, t3.id
from table1 t1
full outer join table3 t3 on
t1.year = t3.year and....
But I also need to join by COL2 - SEGMENT, with using mapping table.
How do I do this correctly?
If I understood you correctly, you just need to add another full outer join:
select t1.id, t3.id
from table1 t1
full outer join mapping t2 on( t1.col2= t2.col1)
full outer join table3 t3 on(t1.year = t3.year and t2.col2 = t3.col2)
Just to make sure: a full outer join keeps all the records from both tables being joined, whether or not there is a match! I've added another full outer join, but change it to whichever kind of join you need if it shouldn't be full.
I was given a query that uses some very weird syntax for a join and I need to understand how this join is being used:
SELECT
T1.Acct#
, T2.New_Acct#
, T3.Pool#
FROM DB.dbo.ACCT_TABLE T1
LEFT JOIN DB.dbo.CROSSREF_TABLE T2
INNER JOIN DB.dbo.POOL_TABLE T3
ON T2.Pool# = T3.Pool#
ON T1.Acct# = T2.Prev_Acct#
T1 is a distinct account list
T2 is a distinct account list for each Pool #
T3 is a distinct pool list (group of accounts)
I need to return the previous account number held in T2 for each record in T1. I also need the T3 Pool# returned for each pool.
What I'm trying to understand is why someone would write the code this way. It doesn't make sense to me.
A little indenting will show you better what was intended:
SELECT
    T1.Acct#
    , T2.New_Acct#
    , T3.Pool#
FROM DB.dbo.ACCT_TABLE T1
LEFT JOIN DB.dbo.CROSSREF_TABLE T2
    INNER JOIN DB.dbo.POOL_TABLE T3
        ON T2.Pool# = T3.Pool#
    ON T1.Acct# = T2.Prev_Acct#
This is valid syntax that forces the join order a bit. Basically it is asking for only the records in table T2 that are also in table T3, and then left joining them to T1. I don't like it personally, as it is confusing for maintenance. I would prefer a derived table, as I find those much clearer and much easier to change when I need to do maintenance six months later:
SELECT
T1.Acct#
, T4.New_Acct#
, T4.Pool#
FROM DB.dbo.ACCT_TABLE T1
LEFT JOIN (select T2.Prev_Acct#, T2.New_Acct#, T3.Pool#
FROM DB.dbo.CROSSREF_TABLE T2
INNER JOIN DB.dbo.POOL_TABLE T3
ON T2.Pool# = T3.Pool#) T4
ON T1.Acct# = T4.Prev_Acct#
An OUTER APPLY would be clearer here:
SELECT
T1.Acct#,
T4.New_Acct#,
T4.Pool#
FROM DB.dbo.ACCT_TABLE T1
OUTER APPLY
(
SELECT
T2.New_Acct#,
T3.Pool#
FROM DB.dbo.CROSSREF_TABLE T2
INNER JOIN DB.dbo.POOL_TABLE T3 ON T2.Pool# = T3.Pool#
WHERE T1.Acct# = T2.Prev_Acct#
) T4
I should probably know this by now, but what, if any is the difference between the two statements below?
The nested join:
SELECT
t1.*
FROM
table1 t1
INNER JOIN table2 t2
LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
ON t2.table2_ID = t1.table1_ID
The more traditional join:
SELECT
t1.*
FROM
table1 t1
INNER JOIN table2 t2 ON t2.table2_ID = t1.table1_ID
LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
Well, it's the order of operations..
SELECT
t1.*
FROM
table1 t1
INNER JOIN table2 t2
LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
ON t2.table2_ID = t1.table1_ID
could be rewritten as:
SELECT
t1.*
FROM
table1 t1 -- inner join t1
INNER JOIN
(table2 t2 LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID) -- with this
ON t2.table2_ID = t1.table1_ID -- on this condition
So basically, first you LEFT JOIN t2 with t3, based on the join condition: table3_ID = table2_ID, then you INNER JOIN t1 with t2 on table2_ID = table1_ID.
In your second example you first INNER JOIN t1 with t2, and then LEFT JOIN the result of that inner join with table t3 on the condition table3_ID = table2_ID.
SELECT
t1.*
FROM
table1 t1
INNER JOIN table2 t2 ON t2.table2_ID = t1.table1_ID
LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
could be rewritten as:
SELECT
t1.*
FROM
(table1 t1 INNER JOIN table2 t2 ON t2.table2_ID = t1.table1_ID) -- first inner join
LEFT JOIN -- then left join
table3 t3 ON t3.table3_ID = t2.table2_ID -- the result with this
EDIT
I apologize. My first remark was wrong. The two queries will produce the same results, but there may be a difference in performance: the first query may be slower than the second in some instances (when table1 contains only a subset of the elements in table2), because the LEFT JOIN will be executed first and only then intersected with table1. The second query, by contrast, allows the query optimizer to do its job.
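A quick way to check the "same results" claim on sample data is sketched below with Python's sqlite3 module. The rows are hypothetical, and the nested form is written as an aliased derived table (alias j), which is an equivalent rewrite chosen for portability rather than the exact syntax from the question. Agreement on one data set illustrates the claim; it does not prove it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: table2 row 2 has no partner in table3.
    CREATE TABLE table1 (table1_ID INTEGER);
    CREATE TABLE table2 (table2_ID INTEGER);
    CREATE TABLE table3 (table3_ID INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1);
""")

# "Nested" order: LEFT JOIN table2 with table3 first, then INNER JOIN to table1.
nested_rows = conn.execute("""
    SELECT t1.table1_ID, j.table3_ID
    FROM table1 t1
    INNER JOIN (
        SELECT t2.table2_ID, t3.table3_ID
        FROM table2 t2
        LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
    ) j ON j.table2_ID = t1.table1_ID
    ORDER BY t1.table1_ID
""").fetchall()

# "Traditional" order: INNER JOIN first, then LEFT JOIN.
flat_rows = conn.execute("""
    SELECT t1.table1_ID, t3.table3_ID
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.table2_ID = t1.table1_ID
    LEFT JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
    ORDER BY t1.table1_ID
""").fetchall()

print(nested_rows)
print(flat_rows)
```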
For your specific example, I don't think there should be any difference in the query plans generated, but there's definitely a difference in readability. Your 2nd example is MUCH easier to follow.
If you were to reverse the types of joins in the example, you could end up with much different results.
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table2_ID = t1.table1_ID
INNER JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
-- may not produce the same results as...
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2
INNER JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
ON t2.table2_ID = t1.table1_ID
Because the order of the joins DOES matter in many cases, careful thought should go into how you write your join syntax. If you find that the 2nd example is what you're really trying to accomplish, I'd consider rewriting the query so that you can put more emphasis on the order of your joins:
SELECT t1.*
FROM table2 t2
INNER JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
RIGHT JOIN table1 t1 ON t2.table2_ID = t1.table1_ID
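The difference between the two orderings above can be reproduced on a tiny data set. This is a sketch using Python's sqlite3 module with made-up rows; the nested form is expressed as an aliased derived table (alias j) instead of the nested ON syntax, as an equivalent rewrite for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: only ID 1 exists in all three tables.
    CREATE TABLE table1 (table1_ID INTEGER);
    CREATE TABLE table2 (table2_ID INTEGER);
    CREATE TABLE table3 (table3_ID INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1);
""")

# LEFT JOIN first, then INNER JOIN: the inner join discards every row
# whose table2 side is NULL or has no partner in table3.
left_then_inner = conn.execute("""
    SELECT t1.table1_ID
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.table2_ID = t1.table1_ID
    INNER JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
    ORDER BY t1.table1_ID
""").fetchall()

# INNER JOIN table2 and table3 first, then LEFT JOIN that result to
# table1: every table1 row survives.
inner_then_left = conn.execute("""
    SELECT t1.table1_ID
    FROM table1 t1
    LEFT JOIN (
        SELECT t2.table2_ID
        FROM table2 t2
        INNER JOIN table3 t3 ON t3.table3_ID = t2.table2_ID
    ) j ON j.table2_ID = t1.table1_ID
    ORDER BY t1.table1_ID
""").fetchall()

print(left_then_inner)
print(inner_then_left)
```

The first form returns one row and the second returns all three table1 rows, which is the behavior the nested syntax (or the RIGHT JOIN rewrite) is meant to preserve.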
The best way to see what is different in these two queries is to compare the Query Plan for both these queries.
There is no difference in the result sets for these IF there are always rows in table3 for a given row in table2.
I tried it on my database and the difference in the query plans was that
1. For the first query, the optimizer chose to do the join on table2 and table3 first.
2. For the second query, the optimizer chose to join table1 and table2 first.
You should see no difference at all between the two queries, provided your DBMS' optimizer is up to scratch. That, however, even for big-iron, high-cost platforms, is not an assumption I'd be confident in making, so I'd be fairly unsurprised to discover that query plans (and consequently execution times) varied.