Left outer join on multiple tables - sql

I have the following sql statement:
select
a.desc
,sum(bdd.amount)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination def a on (a.id =bds.sId)
where c.repId=1000000134
group by a.desc;
When I run it I get the following result:
desc amount
NW 12.00
SW 10
When I try to add another left outer join to get another set of values:
select
a.desc
,sum(bdd.amount)
,sum(i.amt)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination def a on (a.id =bdd.sId)
left outer join t_ind i on (i.id=c.id)
where c.repId=1000000134
group by a.desc;
It basically doubles the amount field like:
desc amount amt
NW 24.00 234.00
SE 20.00 234.00
While the result should be:
desc amount amt
NW 12.00 234.00
SE 10.00 NULL
How do I fix this?

If you really need to receive the data as you mentioned, you can use sub-queries to perform the needed calculations. In this case your code may look like the following:
select x.[desc], x.amount, y.amt
from
(
select
c.[desc]
, sum (bdd.amount) as amount
, c.id
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination_def bdd on (bdd.id = bds.sId)
where c.repId=1000000134
group by c.id, c.[desc]
) x
left join
(
select t.id, sum (t.amt) as amt
from t_ind t
inner join t_main c
on t.id = c.id
where c.repID = 1000000134
group by t.id
) y
on x.id = y.id
The first sub-select returns the aggregated data for the first two columns, desc and amount, grouped as you need.
The second sub-select returns the needed amt value for each id of the first set.
A left join between those two results gives the needed result. The t_main table was added to the second sub-select for performance reasons: it restricts the aggregation to the ids that belong to the requested repId.
Another solution can be the following:
select
c.[desc]
, sum (bdd.amount) as amount
, amt = (select sum (amt) from t_ind where id = c.id)
from t_main c
left outer join t_direct bds on (bds.repid=c.id)
left outer join tm_defination_def bdd on (bdd.id = bds.sId)
where c.repId = 1000000134
group by c.id, c.[desc]
The result will be the same. Instead of using nested selects, the amt sum is calculated inline for each row of the joined result. On large tables the second solution will likely perform worse than the first one.

Your new left outer join is most likely forcing some rows to be returned in the result set several times because of multiple matching rows. Remove your SUM, review the returned rows, and work out exactly which ones you require (maybe restrict it to a certain type of t_ind record, if that is applicable), then adjust your query accordingly.

Left Outer Join - Driving Table Row Count
A left outer join may return more rows than there are in the driving table if there are multiple matches on the join clause.
Using MS SQL-Server:
DECLARE @t1 TABLE ( id INT )
INSERT INTO @t1 VALUES ( 1 ),( 2 ),( 3 ),( 4 ),( 5 );
DECLARE @t2 TABLE ( id INT )
INSERT INTO @t2 VALUES ( 2 ),( 2 ),( 3 ),( 10 ),( 11 ),( 12 );
SELECT * FROM @t1 t1
LEFT OUTER JOIN @t2 t2 ON t2.id = t1.id
This gives:
1 NULL
2 2
2 2
3 3
4 NULL
5 NULL
There are 5 rows in the driving table (t1), but 6 rows are returned because there are multiple matches for id 2.
So if an aggregate function such as SUM() is used, grouped by the driving table column(s), the duplicated rows are aggregated more than once and the result is wrong.
To fix this, use derived tables or sub-queries to calculate the aggregate values, as already stated.
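The inflated sum and the derived-table fix can be reproduced with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the table and column names (main_t, direct_t, ind_t, main_id) are hypothetical stand-ins, not the tables from the question.

```python
import sqlite3

# In-memory database with hypothetical tables (not the ones from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t   (id INTEGER PRIMARY KEY);
    CREATE TABLE direct_t (main_id INTEGER, amount REAL);
    CREATE TABLE ind_t    (main_id INTEGER, amt REAL);
    INSERT INTO main_t   VALUES (1);
    INSERT INTO direct_t VALUES (1, 12.0);
    -- Two matching rows: every direct_t row is repeated once per ind_t row.
    INSERT INTO ind_t    VALUES (1, 100.0), (1, 134.0);
""")

# Naive version: both detail tables are joined before aggregating.
naive = conn.execute("""
    SELECT m.id, SUM(d.amount), SUM(i.amt)
    FROM main_t m
    LEFT JOIN direct_t d ON d.main_id = m.id
    LEFT JOIN ind_t    i ON i.main_id = m.id
    GROUP BY m.id
""").fetchall()

# Fixed version: each detail table is aggregated in its own derived table first.
fixed = conn.execute("""
    SELECT m.id, d.amount, i.amt
    FROM main_t m
    LEFT JOIN (SELECT main_id, SUM(amount) AS amount
               FROM direct_t GROUP BY main_id) d ON d.main_id = m.id
    LEFT JOIN (SELECT main_id, SUM(amt) AS amt
               FROM ind_t GROUP BY main_id) i ON i.main_id = m.id
""").fetchall()

print(naive)  # [(1, 24.0, 234.0)]  amount is doubled
print(fixed)  # [(1, 12.0, 234.0)]
```

Because each derived table returns at most one row per main_id, the outer joins can no longer multiply rows.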
Left Outer Join - Multiple Tables
Where there are left outer joins over multiple tables, or any join for that matter, the query generates a series of derived tables in the order of joins.
SELECT * FROM t1
LEFT OUTER JOIN t2 ON t2.col2 = <...>
LEFT OUTER JOIN t3 ON t3.col3 = <...>
This is equivalent to:
SELECT * FROM
(
SELECT * FROM t1
LEFT OUTER JOIN t2 ON t2.col2 = <...>
) dt1
LEFT OUTER JOIN t3 ON t3.col3 = <...>
Here, for both queries, the results of the 1st left outer join are put into a derived table (dt1) which is then left outer joined to the 3rd table (t3).
For left outer joins over multiple tables, the order of the tables in the join clauses is critical.
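As a rough check of the equivalence described above, the following sketch runs the chained form and the derived-table form against the same data and compares the results. It uses Python's sqlite3 module, and the column names (t1_id, t2_id) are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, t1_id INTEGER);
    CREATE TABLE t3 (id INTEGER, t2_id INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (10, 1);
    INSERT INTO t3 VALUES (100, 10);
""")

# Chained left outer joins, evaluated left to right.
chained = conn.execute("""
    SELECT t1.id, t2.id, t3.id
    FROM t1
    LEFT JOIN t2 ON t2.t1_id = t1.id
    LEFT JOIN t3 ON t3.t2_id = t2.id
    ORDER BY t1.id
""").fetchall()

# Same query with the first join made explicit as derived table dt1.
nested = conn.execute("""
    SELECT dt1.t1_id, dt1.t2_id, t3.id
    FROM (
        SELECT t1.id AS t1_id, t2.id AS t2_id
        FROM t1
        LEFT JOIN t2 ON t2.t1_id = t1.id
    ) dt1
    LEFT JOIN t3 ON t3.t2_id = dt1.t2_id
    ORDER BY dt1.t1_id
""").fetchall()

print(chained)  # [(1, 10, 100), (2, None, None)]
print(nested)   # [(1, 10, 100), (2, None, None)]
```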

Related

Most efficient way to join two tables on multiple fields?

I'm working with an Oracle SQL DB and attempting to join 2 tables together. My issue is that there are 3 different dimensions (4 total fields) upon which the two tables may be joined and I'm looking to identify all records where any one of those methods delivers a match and then pull in a certain field from that 2nd table in those instances.
My current plan is as follows:
SELECT a.*,
CASE
WHEN b.field_1 IS NOT NULL THEN b.field_5
WHEN c.field_2 IS NOT NULL THEN c.field_5
WHEN d.field_3 IS NOT NULL THEN c.field_5
END AS match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field3 AND a.field_4 = d.field4
I believe this will give me the results I'm looking for, but I imagine this isn't the most efficient way to accomplish that. Any thoughts on a better approach?
[TL;DR] Your query is fine.
You need to use JOINs to correlate the relationships between the four tables.
If you want to include rows from the driving table even when there are no rows in the related tables, then the join needs to be an OUTER JOIN.
If you put the driving table first, then it will be a LEFT OUTER JOIN (or just LEFT JOIN).
You do not have many other options here.
If you want to get the field_5 values then you either want:
SELECT a.*,
b.field_5 AS b_match,
c.field_5 AS c_match,
d.field_5 AS d_match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field3 AND a.field_4 = d.field4
If you want all the matches.
Or, you want to use your query:
SELECT a.*,
CASE
WHEN b.field_1 IS NOT NULL THEN b.field_5
WHEN c.field_2 IS NOT NULL THEN c.field_5
WHEN d.field_3 IS NOT NULL THEN c.field_5 -- Should this be d.field_5?
END AS match
FROM table_1 a
LEFT JOIN table_2 b ON a.field_1 = b.field_1
LEFT JOIN table_3 c ON a.field_2 = c.field_2
LEFT JOIN table_4 d ON a.field_3 = d.field3 AND a.field_4 = d.field4
If you want to get a single match in preference order of tables b, c and then d.
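The preference order of the CASE expression can be tried out with a reduced sketch. It uses Python's sqlite3 module instead of Oracle, only two lookup tables instead of three, invented sample values, and the alias matched (MATCH is a keyword in SQLite); all of these are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, field_1 TEXT, field_2 TEXT);
    CREATE TABLE table_2 (field_1 TEXT, field_5 TEXT);
    CREATE TABLE table_3 (field_2 TEXT, field_5 TEXT);
    -- Row 1 matches both lookups, row 2 only table_3, row 3 neither.
    INSERT INTO table_1 VALUES (1, 'x', 'p'), (2, 'none', 'q'), (3, 'none', 'none');
    INSERT INTO table_2 VALUES ('x', 'from_b');
    INSERT INTO table_3 VALUES ('p', 'from_c_p'), ('q', 'from_c_q');
""")

rows = conn.execute("""
    SELECT a.id,
           CASE
               WHEN b.field_1 IS NOT NULL THEN b.field_5  -- first preference
               WHEN c.field_2 IS NOT NULL THEN c.field_5  -- fallback
           END AS matched
    FROM table_1 a
    LEFT JOIN table_2 b ON a.field_1 = b.field_1
    LEFT JOIN table_3 c ON a.field_2 = c.field_2
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 'from_b'), (2, 'from_c_q'), (3, None)]
```

Row 1 matches both lookup tables, and the first WHEN branch wins; row 3 matches neither and yields NULL.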
If you are using Oracle 12 or later, a third alternative could be to use UNION ALL in a LATERAL join:
SELECT a.*, l.field_5
FROM table_1 a
LEFT OUTER JOIN LATERAL (
SELECT 1 AS priority, b.field_5
FROM table_2 b
WHERE a.field_1 = b.field_1
UNION ALL
SELECT 2 AS priority, c.field_5
FROM table_3 c
WHERE a.field_2 = c.field_2
UNION ALL
SELECT 3 AS priority, d.field_5
FROM table_4 d
WHERE a.field_3 = d.field_3
AND a.field_4 = d.field_4
ORDER BY priority ASC
FETCH FIRST ROW WITH TIES
) l
ON (1 = 1)
This may reduce the number of duplicate rows, because it avoids the multiple JOINs (whose extra rows you are potentially ignoring with your CASE expression), but you should test whether it returns your desired results and whether it is more or less performant.

Combining 2 select statements

I have 2 select statements having a common column POL.SP_NUM which I wish to combine. I am new to SQL and haven't the slightest clue how to go about with the same.
Query 1:
select POL.SP_NUM POL#
, POL.ASSET_NUM COV#
, count(distinct(POLX.ATTRIB_06)) COUNT_ADDENDA
, count(distinct(POLX.ATTRIB_07)) COUNT_CERT
, sum(POL.QTY) SI
from S_ASSET POL
, S_ASSET_X POLX
Where POL.ROW_ID = POLX.ROW_ID
and POL.SP_NUM in ('000','111','222')
group by
POL.SP_NUM
, POL.ASSET_NUM
Query 1 output:
POL# COV# COUNT_ADDENDA COUNT_CERT SI
000 856 2 0 1000
111 123 0 0 500
222 567 0 1 2000
Query 2:
select POL#, sum(DOCI)
from (
select POL.SP_NUM POL#, sum(Q.AMT + POL.AMT) DOCI
from S_ASSET POL
, S_QUOTE_ITEM Q
where POL.X_QUOTE_ID = Q.ROW_ID
and POL.SP_NUM in ('000','111','222')
group by POL.SP_NUM
UNION ALL
select POL.SP_NUM POL#, sum(QXM.AMT) DOCI
from S_ASSET POL
, S_QUOTE_ITEM Q
, S_QUOTE_ITEM_XM QXM
where POL.X_QUOTE_ID = Q.ROW_ID
and Q.ROW_ID = QXM.PAR_ROW_ID
and POL.SP_NUM in ('000','111','222')
group by POL.SP_NUM
)
group by POL#
Query 2 output:
POL# sum(DOCI)
000 90
111 0
222 10
Desired output:
POL# COV# COUNT_ADDENDA COUNT_CERT SI sum(DOCI)
000 856 2 0 1000 90
111 123 0 0 500 0
222 567 0 1 2000 10
Is there a better way to code this? Suggestions are welcome.
This is not an answer to the question, but an answer to the request made in the comments section to explain the join types.
INNER JOIN (or short: JOIN)
select * from t1 join t2 on t1.colx = t2.coly
only gives you matches. This is the most common join. You could replace the ON clause with a USING clause if the columns in the ON clause have the same names in both tables. That is sometimes useful for writing a query quickly, but I would generally not recommend USING.
LEFT OUTER JOIN (or short: LEFT JOIN)
select * from t1 left join t2 on t1.colx = t2.coly
gives you all t1 records, no matter whether they have a match in t2. So when there is one match or more for a t1 record, you join these just as with an inner join, but when a t1 record has no match in t2, you get the t1 record along with an empty t2 record (all columns are NULL, even the columns you used in the ON clause, which is t2.coly in the above example). In other words: you get all records you'd get with an inner join plus all t1 records that have no match in t2.
You can also use a RIGHT JOIN so you'd keep t2 records when there is no t1 match:
select * from t1 right join t2 on t1.colx = t2.coly
but many people regard this as less readable, so it is better to avoid right outer joins and simply swap the tables instead:
select * from t2 left join t1 on t1.colx = t2.coly
FULL OUTER JOIN (or short: FULL JOIN)
select * from t1 full outer join t2 on t1.colx = t2.coly
This gives you all records from both t1 and t2, no matter whether they have a match in the other table or not. Again: you get all records you'd get with an inner join plus all t1 records with no t2 match plus all t2 records with no t1 match.
When having several full outer joins the USING clause can come in handy:
select product, sum(p1.amount), sum(p2.amount), sum(p3.amount)
from p1
full outer join p2 using (product)
full outer join p3 using (product);
CROSS JOIN
A cross join joins a table without any criteria, so as to combine each of its records with each of the records already present. This is used to get all combinations and usually followed by a left outer join:
select products.product_id, regions.region_id, count(sales.product_id)
from products
cross join regions
left join sales on sales.product_id = products.product_id
and sales.region_id = regions.region_id
group by products.product_id, regions.region_id
order by products.product_id, regions.region_id;
This gives you all possible combinations of products and regions and counts the sales therein. So you get a result record even for product / region combinations where nothing was sold (i.e. no entry in table sales).
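A minimal sketch of this pattern, run through Python's sqlite3 module with invented sample rows, shows one result row per product / region combination. It counts a column of the outer-joined sales table, COUNT(s.product_id), so that combinations without sales report 0; COUNT(*) would report 1 for them, because the outer join still produces one row per combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER);
    CREATE TABLE regions  (region_id INTEGER);
    CREATE TABLE sales    (product_id INTEGER, region_id INTEGER);
    INSERT INTO products VALUES (1), (2);
    INSERT INTO regions  VALUES (10), (20);
    -- Product 1 sold twice in region 10, product 2 once in region 20.
    INSERT INTO sales    VALUES (1, 10), (1, 10), (2, 20);
""")

rows = conn.execute("""
    SELECT p.product_id, r.region_id, COUNT(s.product_id) AS sales_count
    FROM products p
    CROSS JOIN regions r
    LEFT JOIN sales s ON s.product_id = p.product_id
                     AND s.region_id  = r.region_id
    GROUP BY p.product_id, r.region_id
    ORDER BY p.product_id, r.region_id
""").fetchall()

print(rows)  # [(1, 10, 2), (1, 20, 0), (2, 10, 0), (2, 20, 1)]
```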
NATURAL JOIN
looks at common column names to magically join tables. My simple advice: never use this join type.
ANTI JOIN
This is not a join type actually, but a usage of a join, namely an outer join. Here you want to get all records from a table except the matches. You achieve this by outer-joining the tables and then removing matches in the where clause.
select t1.*
from t1
left join t2 on t1.colx = t2.coly
where t2.coly is null;
This looks odd, because we have EXISTS (and IN) to check for existence:
select *
from t1
where not exists (select * from t2 where t2.coly = t1.colx);
So why would one obfuscate things and use the anti join pattern instead? It is a trick used on weak DBMS. When a DBMS is written, joins are the most important thing, and the developers of the DBMS put all their effort into making them fast. They may neglect EXISTS and IN at first and only later care about their performance, so it may help to use a join technique (the anti join) instead. My recommendation: only use the anti join pattern when running into performance issues with a straightforward query. So far I've never had to use it in more than twenty years. (It's good to have that option, though, and it's good to know about it, so as not to be confused when stumbling upon such a query some time :-)
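The two formulations can be compared side by side. This sketch uses Python's sqlite3 module and made-up values for t1.colx and t2.coly; it only checks that both queries return the same rows and says nothing about performance on any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (colx INTEGER);
    CREATE TABLE t2 (coly INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (2);
""")

# Anti join: outer join, then keep only the rows without a match.
anti_join = conn.execute("""
    SELECT t1.colx
    FROM t1
    LEFT JOIN t2 ON t1.colx = t2.coly
    WHERE t2.coly IS NULL
    ORDER BY t1.colx
""").fetchall()

# Straightforward formulation with NOT EXISTS.
not_exists = conn.execute("""
    SELECT t1.colx
    FROM t1
    WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.coly = t1.colx)
    ORDER BY t1.colx
""").fetchall()

print(anti_join)   # [(1,), (3,)]
print(not_exists)  # [(1,), (3,)]
```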
You can join the queries:
select *
from (your query 1 here) query1
join (your query 2 here) query2 on query2.pol# = query1.pol#;
The same with WITH clauses:
with query1 as (your query 1 here),
query2 as (your query 2 here)
select *
from query1
join query2 on query2.pol# = query1.pol#;

Why am I getting duplicate records for this query? I am performing a Left Outer Join

I am entering the following query below and getting duplicate values. I thought if I did a Left Outer Join that it wouldn't do that. I want T0. data for 2 of the 3 columns. The one column that I want T1. data is for the related customer name to the customer code. But it seems to want to populate the record twice.
Here is the code that I am attempting to use:
SELECT T0.CardCode
,T1.CardName
,T0.State
FROM CRD1 T0 LEFT OUTER JOIN OCRD T1 ON T0.CardCode=T1.CardCode
Try using distinct keyword.
SELECT distinct
T0.CardCode
,T1.CardName
,T0.State
FROM CRD1 T0 LEFT OUTER JOIN OCRD T1 ON T0.CardCode=T1.CardCode
Usually this means you have multiple matches on the join predicate in the related table. The left outer join ensures you keep all rows from the left table regardless of match or not, but doesn't prevent multiple matches if they happen to exist. Example:
with _left (id)
as (
select 3 union all
select 4 union all
select 5
)
,_right(id)
as (
select 3 union all
select 3
)
select *
from _left l
left join _right r on l.id = r.id
Result:
id id
3 3
3 3
4 NULL
5 NULL
Use the keyword distinct in your select query. Then you will get only single records in the output grid.
Thanks.

Sql NOT IN optimization

I'm having trouble optimizing a query. Here are two example tables I am working with:
Table 1:
UID
A
B
Table 2:
UID Parent
A 2
B 2
C 3
D 2
E 3
F 2
Here is what I am doing now:
Select Table1.UID
FROM Table1 R
INNER JOIN Table2 T ON
R.UID = T.UID
INNER JOIN Table2 E ON
T.PARENT = E.PARENT
AND E.UID NOT IN (SELECT UID FROM Table1)
I'm trying to avoid using the NOT IN clause because of obvious hindrances in performance for large numbers of records.
I know the typical ways to avoid NOT IN clauses like the LEFT JOIN where the other table is null, but can't seem to get what I want with all of the other Joins going on.
I will continue working and post if I find a solution.
EDIT: Here is what I am trying to end up with
After the first Inner Join I would have
A
B
After the second Inner join I would have:
A D
A F
B D
B F
The second column above is just to represent that it is matching to the other UIDs with the same parent, but I still need the As and Bs as the UID.
EDIT: RDBMS is SQL server 2005, 2008r2, 2012
Table1 is declared in the query with no index
DECLARE @Table1 TABLE ( [UNIQUE_ID] INT PRIMARY KEY )
Table2 has a clustered index on Unique ID
The general approach to this is to use a LEFT JOIN with a where clause that only selects the non-matching rows:
Select R.UID
FROM Table1 R
JOIN Table2 T ON R.UID = T.UID
JOIN Table2 E ON T.PARENT = E.PARENT
LEFT JOIN Table1 E2 ON E2.UID = E.UID
WHERE E2.UID IS NULL
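To check the LEFT JOIN ... IS NULL rewrite against the sample data from the question, here is a small sketch using Python's sqlite3 module instead of SQL Server. It joins back to Table1 under the alias E2 to exclude UIDs that already exist there, and it also selects E.UID (an addition for the illustration) so that the pairs listed in the question's edit are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (UID TEXT PRIMARY KEY);
    CREATE TABLE Table2 (UID TEXT PRIMARY KEY, PARENT INTEGER);
    INSERT INTO Table1 VALUES ('A'), ('B');
    INSERT INTO Table2 VALUES
        ('A', 2), ('B', 2), ('C', 3), ('D', 2), ('E', 3), ('F', 2);
""")

rows = conn.execute("""
    SELECT R.UID, E.UID
    FROM Table1 R
    JOIN Table2 T ON R.UID = T.UID
    JOIN Table2 E ON T.PARENT = E.PARENT
    -- Replaces "E.UID NOT IN (SELECT UID FROM Table1)".
    LEFT JOIN Table1 E2 ON E2.UID = E.UID
    WHERE E2.UID IS NULL
    ORDER BY R.UID, E.UID
""").fetchall()

print(rows)  # [('A', 'D'), ('A', 'F'), ('B', 'D'), ('B', 'F')]
```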
SELECT Table2.*
FROM Table2
INNER JOIN (
SELECT UID FROM Table2
EXCEPT
SELECT UID FROM Table1
) AS Filter ON (Table2.UID = Filter.UID)

SQL joining 4 tables issue

I have four tables:
T1
ID ID1 TITLE
1 100 TITLE1
2 100 TITLE2
3 100 TITLE3
T2
ID TEXT
1 LONG1
2 LONG2
T3
ID1 ID2
100 200
T4
ID4 ID2 SUBJECT
1 200 A
2 200 B
3 200 C
4 200 D
5 200 E
I want output in this result format:
TITLE TEXT SUBJECT
TITLE1 LONG1 A
TITLE2 LONG2 B
TITLE3 null C
null null D
null null E
So I wrote this query, but it returns many more rows than it should. For example, titles are displayed more than once.
SELECT
t1.title,
t2.text,
t4.subject
FROM t1
LEFT OUTER JOIN t2 ON t1.id=t2.id
INNER JOIN t3 ON t1.id1=t3.id1
LEFT OUTER JOIN t4 ON t4.id2=t3.id2
WHERE
t1.id1=100
Thanks for help
Disclaimer: I don't work with DB2. After some browsing through documentation I have found that DB2 supports row_number() and full outer join, but I might easily be wrong.
To get rid of the n:m relationship, one has to build an additional key. In this case a simple solution is to add a row number to each record in t1 and t4 and use it as a join condition. Row_number does just that: it produces ascending numbers within the groups of data defined by partition by, in the order defined by order by.
As the number of records in t1 and t4 differs, and it is unknown which one always has more records, I use a full outer join to join them.
You can see the test (Sql Server version) at Sql Fiddle.
select t1_rn.title,
t2.[text],
t4_rn.subject
from
(
select t1.id,
t1.title,
t1.id1,
t3.id2,
row_number() over(partition by t1.id1
order by id) rn
from t1
inner join t3
on t1.id1 = t3.id1
) t1_rn
full outer join
(
select t4.subject,
t3.id1,
t4.id2,
row_number() over(partition by t4.id2
order by id4) rn
from t4
inner join t3
on t4.id2 = t3.id2
) t4_rn
on t1_rn.id1 = t4_rn.id1
and t1_rn.id2 = t4_rn.id2
and t1_rn.rn = t4_rn.rn
left join t2
on t1_rn.id = t2.id
This kind of work should definitely be done on the presentation side of an application, but I believe that the software you are using requires already prepared data.
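The effect of joining on a generated row number with a full outer join is a pairing of rows by position, where the shorter side is padded with NULLs. The sketch below illustrates that idea outside SQL, in plain Python with itertools.zip_longest; it is not the query itself, and it assumes a single ID1/ID2 group as in the sample data, so no partitioning is needed.

```python
from itertools import zip_longest

# Sample data from the question.
t1 = [(1, 100, "TITLE1"), (2, 100, "TITLE2"), (3, 100, "TITLE3")]   # ID, ID1, TITLE
t2 = {1: "LONG1", 2: "LONG2"}                                        # ID -> TEXT
t4 = [(1, 200, "A"), (2, 200, "B"), (3, 200, "C"),
      (4, 200, "D"), (5, 200, "E")]                                  # ID4, ID2, SUBJECT

rows = []
# Sorting by the first tuple element plays the role of ORDER BY id / id4;
# zip_longest pads the shorter list with None, like a full outer join on rn.
for t1_row, t4_row in zip_longest(sorted(t1), sorted(t4)):
    title = t1_row[2] if t1_row else None
    text = t2.get(t1_row[0]) if t1_row else None   # left join to t2 on id
    subject = t4_row[2] if t4_row else None
    rows.append((title, text, subject))

for row in rows:
    print(row)
```

With the sample data this yields the five rows of the desired output, with None in place of null.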
try this :
select t1.title,t2.text,t4.subject
from t4
left join t3
on t4.id2=t3.id2
left join t1
on t1.id1=t3.id1
left join t2
on t1.id=t2.id
where t1.id1=100
You should change your tables. Your last join is what does that to your output; just analyze your query: for every record from T1 you get every record from T4.
Outer joins do not prevent rows from being replicated: every matching pair of rows is returned, not only the ones you need. You may want to look at this:
http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/
To understand what the join types are, and how you can use them.
You are looking for a list of subjects, with associated text and title, but this may not be unique; more than one null exists for each of the titles. You want to drive the join from table 4, and get a list of subjects, with associated titles for each.
Looking at your output, it appears you want all subjects displayed. Knowing this, you should first build everything off this table.
SELECT columns
FROM T4
Next build up your inner joins.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
When happy with them, add on your optional columns with the outer join.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
LEFT OUTER JOIN T2 textTable
ON textTable.ID = subjectTable.ID4
LEFT OUTER JOIN T1 titleTable
ON titleTable.ID1 = mapTable.ID1
WHERE
mapTable.ID1 = 100;