What's the SQL Equivalent of the SPSS Partial Outer Join? - sql

I'm working on a project to convert an old SPSS system to Oracle. The current code makes heavy use of partial outer joins, and I'm not sure what the best way to replicate that in SQL would be.
Here's an example of a typical query:
SELECT
TA.A
TB.B
TC.C
FROM
TABLE_A TA
TABLE_B TB
TABLE_C TC
PARTIAL OUTER JOIN
TA.FIELD = TB.FIELD
AND TB.FIELD = TC.FIELD
From what I've read in the IBM Knowledge Base, a partial outer join is essentially a left/right outer join that merges multiple tables. IBM provides the following Venn diagram:
It seems like the best way to replicate this would be to do a full outer join between tables A and C, and then left outer joins from A to C and B to C.
Is that correct? Is there a better solution?
Thanks!

I asked around the office and one of the senior devs pointed me to this manual which defines a partial outer join as
Returns all records from the primary file (first file
in the FROM clause) whether or not there is a match in the secondary file(s)
In other words, take the first table in the FROM clause and do a left outer join to all the other tables.
So the original query:
SELECT
TA.A
TB.B
TC.C
FROM
TABLE_A TA
TABLE_B TB
TABLE_C TC
PARTIAL OUTER JOIN
TA.FIELD = TB.FIELD
AND TB.FIELD = TC.FIELD
Should become this:
SELECT
TA.A
TB.B
TC.C
FROM
TABLE_A TA LEFT OUTER JOIN TABLE_B TB ON TA.FIELD = TB.FIELD
LEFT OUTER JOIN TABLE_C TC ON TA.FIELD = TC.FIELD

You can do this in SQL using:
SELECT
TA.A,
TB.B,
TC.B
FROM
TABLE_A TA
FULL OUTER JOIN
TABLE_B TB
ON (TA.FIELD = TB.FIELD)
LEFT OUTER JOIN
TABLE_C TC
ON (TA.FIELD = TC.FIELD)
OR (TB.FIELD = TC.FIELD);
Every record in tables A and B will return, even if they do not match. Then only records from table C that match one in A or B will return.

The documentation is not at all clear. I'm not sure what SPSS does when there are rows in A and B that match different rows in C. One possibility is
select a.a, b.b, c.c
from a full join
b
on a.? = b.? left join
c
on a.? = c.? or b.? = c.?;

Related

What is a good way to make multiple full outer join?

I'd like to know if anyone would know an elegant and scalable method to full outer join multiple tables, given that I might want to regularly add new tables to the join?
For now my method consists in full joining table A with table B, store the result as a cte, then full joining the cte to table C, store the result as a cte2, full joining cte2 to table D... you got it.
Creating a new cte every time i want to add another table to the join is not very practical, but every other solutions i found so far have the same issue, there's always some kind of infinite looping either on ctes or in selects (like SELECT blabla FROM (SELECT blabla2 FROM..)).
Is there any way that i don't know that would help me perform this multiple full join without falling in an infinite recursive loop of ctes?
Thanks
EDIT: Sorry it seems it wasn't clear enough
When i perform a multiple full join in one query like:
SELECT
a.*, b.*, c.*
FROM
tableA a
FULL JOIN
tableB b
ON
a.id = b.id
FULL JOIN
tableC c
ON
a.id = c.id
If the id is present in tableB and tableC but not tableA, my result will create two lines where there should be one, because i joined b to a and c to a but not b to c. That's why i need to full join the result of the full join of a and b to c.
So if i have let's say five table instead of three, i need to full join the result of the full join of the result of the full join of the result of the full join... x)
This fiddle illustrates the problem.
If you want the rows from tables B and C to join, you need to accomodate the fact that maybe the data comes from table B and not A. The easiest is probably to use COALESCE.
Your join should therefore look like:
SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id
Most databases that support FULL JOIN also support USING, which is the simplest way to do what you want:
SELECT *
FROM tableA a FULL JOIN
tableB b
USING (id) FULL JOIN
tableC c
USING (id);
The semantics of USING mean that only non-NULL values are used, if such a value is available.

Getting a field from a table that is not in the left outer join in the same query

I have 4 tables, 3 of which are joined, but I need to get a field from the forth table (Table_D). For the sake of simplicity lets say they are Tables A, B, C, D.
Select Distinct A.Field_1, B.Field_2, C.Field_3
From Table_A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join C on A.Field_z= C.Field_z
where A.Field_z in (1111);
This seems to work but I need a field in Table_D that is only connected to Table_A through Table_C.
How can I add it to the join? or can I?
Thanks!
WB
The on clause has essentially all the features of the where clause. on clauses for any part of the join can refer to any attribute of any entity in the from clause, I suspect your confusion comes from thinking that you can only join subsequent entities to the first entity in the from clause. If Table_D is related to Table_A through Table_C, you might see a from/join/on construct like:
select a.thing, b.things, c.thang, d.stuff
from Table_A a
left join Table_B b
on a.id = b.a_id
left join Table_C c
on a.id = c.a_id
left join Table D d
on d.id = c.d_id
where
a.this = b.that
Note that the word "outer" is implied in a left join, and is not necessary for syntactic completion.
Of course you can add it. You don't specify the logic, but something like this:
Select Distinct A.Field_1, B.Field_2, C.Field_3, D.??
From Table_A a Left outer join
B
on A.Field_z= B.Field_z Left Outer Join
C
on A.Field_z= C.Field_z Left Outer Join
D
on . . .
where A.Field_z in (1111);
Just fill in the conditions that you want. You want a left join, because the condition between c and d would otherwise turn the outer join to c into an inner join.
If I understand you correctly, I believe this would work:
Left Outer Join D on C.Field_z= D.Field_z
You can try this way. It will help i guess
Select Distinct A.Field_1, B.Field_2, C.Field_3, c.Field_4
From Table_A
Left outer join B on A.Field_z= B.Field_z
Left Outer Join ( select C.field_3, d.Field_4, c.Field_z from c inner join d on c.field_z = d.field_z ) c on A.Field_z= C.Field_z
where A.Field_z in (1111);

Joining multiple tables in SQL

Can sombody Explains me about joins?
Inner join selects common data based on where condition.
Left outer join selects all data from left irrespective of common but takes common data from right table and vice versa for Right outer.
I know the basics but question stays when it comes to join for than 5, 8, 10 tables.
Suppose I have 10 tables to join. If I have inner join with the first 5 tables and now try to apply a left join with the 6th table, now how the query will work?
I mean to say now the result set of first 5 tables will be taken as left table and the 6th one will be considerded as Right table? Or only Fifth table will be considered as left and 6th as right? Please help me regarding this.
When joining multiple tables the output of each join logically forms a virtual table that goes into the next join.
So in the example in your question the composite result of joining the first 5 tables would be treated as the left hand table.
See Itzik Ben-Gan's Logical Query Processing Poster for more about this.
The virtual tables involved in the joins can be controlled by positioning the ON clause. For example
SELECT *
FROM T1
INNER JOIN T2
ON T2.C = T1.C
INNER JOIN T3
LEFT JOIN T4
ON T4.C = T3.C
ON T3.C = T2.C
is equivalent to (T1 Inner Join T2) Inner Join (T3 Left Join T4)
It's helpful to think of JOIN's in sequence, so the former is correct.
SELECT *
FROM a
INNER JOIN b ON b.a = a.id
INNER JOIN c ON c.b = b.id
LEFT JOIN d ON d.c = c.id
LEFT JOIN e ON e.d = d.id
Would be all the fields from a and b and c where all the ON criteria match, plus the values from d where its criteria match plus all the contents of e where all its criteria match.
I know RIGHT JOIN is perfectly acceptable, but I've found in my experience that it's unnecessary - I almost always just join things from left to right.
> Simple INNER JOIN VIEW code...
CREATE VIEW room_view
AS SELECT a.*,b.*
FROM j4_booking a INNER JOIN j4_scheduling b
on a.room_id = b.room_id;
You can apply join like this..
select a.*,b.*,c.*,d.*,e.*
from [DatabaseName].[Table_a] a
INNER JOIN [DatabaseName].[Table_b] b ON a.id = b.id
INNER JOIN [DatabaseName].[Table_c] c ON b.id=c.id
INNER JOIN [DatabaseName].[Table_d] d on c.id=d.id
INNER JOIN [DatabaseName].[Table_e] e on d.id=e.id where a.con=5 and
b.con=6
Here, at place of a.* and in where condition, you can show column(filed) which you like and according condition in where condition. You can insert more table and database as per your choice. But mind that you need to mention database name and alias if you work in different database.
Just tried the following from the Example DataBase given in W3School. Worked Fine for me.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate, Products.ProductName, Products.ProductID
FROM Orders
INNER JOIN Products
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Join used to combine rows from two or more tables, based on a related column between them. This example from Adventure works:
SELECT a.[EmailAddress],b.[FirstName],b.[LastName],c.[PhoneNumber],d.[Name]
FROM [Person].[EmailAddress] a
INNER JOIN [Person].[Person] b
ON a.BusinessEntityID = b.BusinessEntityID
INNER JOIN [Person].[PersonPhone] c
ON b.BusinessEntityID = c.BusinessEntityID
INNER JOIN [Person].[PhoneNumberType] d
ON c.phoneNumberTypeID = d.phoneNumberTypeID

Do I have to do a LEFT JOIN after a RIGHT JOIN?

Say I have three tables in SQL server 2008 R2
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
LEFT JOIN Table_C c ON b.id = c.id
or
SELECT a.*, b.*, c.*
FROM
Table_A a
RIGHT JOIN Table_B b ON a.id = b.id
JOIN Table_C c ON b.id = c.id
also, does it matter if I use b.id or a.id on joining c?
i.e. instead of JOIN Table_C c ON b.id = c.id, use JOIN Table_C c ON a.id = c.id
Thank you!
If it doesn't change the semantics of the query, the database server can reorder the joins to run in whichever way it thinks is more efficient.
Usually, if you want to force a certain order, you can use inline view subqueries, as in
SELECT a.*, x.*
FROM
Table_A a
RIGHT JOIN
(
SELECT *, b.id as id2 FROM Table_B b
LEFT JOIN Table_C c ON b.id = c.id
) x
ON a.id = x.id2
According to the definitions:
JOIN
: Return rows when there is at least one match in both tables
LEFT JOIN Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN Return all rows from the right table, even if there are no matches in the left table
The first option would include all raws from the 1st Join on Tables a and b even if there are no matching ones in table c, while the second statement would show only raws which match ones in table c.
regarding the second question i guess it would make a difference, since the 1st join includes all ids from table b, even though there are no matching ones in table a, so once you change your Join creterium to a.id you will get a different set of ids than b.id.
Yes, you do need a LEFT JOIN after a RIGHT JOIN
See
http://sqlfiddle.com/#!3/2c079/5/0
http://sqlfiddle.com/#!3/2c079/6/0
If you don't, the (inner) JOIN at the end will cancel out the effect of your RIGHT JOIN.
That wouldn't make any sense to have a RIGHT JOIN if you don't care. And if you care, you will have to add a LEFT JOIN after it.

INNER and LEFT OUTER join help

Say I have 3 tables. TableA, TableB, TableC
I need to operate the recordset that is available after a INNER JOIN.
Set 1 -> TableA INNER JOIN TableB
Set 2 -> TableC INNER JOIN TableB
I need the Set 1 irrespective of if Set 2 is empty or not (LEFT OUTER JOIN) comes to mind.
So essentially, I am trying to write a query and have come this far
SELECT *
FROM TableA
INNER JOIN TableB ON ...
LEFT OUTER JOIN (TableC INNER JOIN TableB)
How would I write in SQL Server?
EDIT: In reality, what I am trying to do is to join multiple tables. How would your response change if I need to join multiple tables ex: OUTER JOIN OF (INNER JOIN of TableA and TableB) and (INNER JOIN OF TableC and TableD) NOTE: There is a new TableD in the equation
SELECT * FROM TableA
INNER JOIN TableB ON TableB.id = TableA.id
LEFT JOIN TABLEC ON TABLEC.id = TABLEB.id
I Don't know what columns you are trying to use but it is just that easy
Edit:
Looking at your edit it seems that you are confused about what Joins actually do. In the example I have written above you will recieve the following results.
Columns -> You will get all of the columns for TableA,TableB and TableC
Rows-> You will start off with all of the rows from tableA. Next you will remove all rows from TableA that do not have a matching "id" in Table B.(You will have duplicates if it is not a 1:1 relationship between TableA and TableB).
Now if you take the results from above you will match any records from TableC that match the TableB.id column. Any rows from above that do not have a matching TableC record will get a null value for all of the columns from TableC in the results.
ADVICE- I am betting that only part of this made sense to you but my advice is that you start writing some queries, predict the results and then see if your predictions are correct to see if you understand what it is doing.
What you want isn't a JOIN but a UNION.
SELECT * FROM TableA INNER JOIN TableB ON ...
UNION
SELECT * FROM TableC INNER JOIN TableD ON ...
You can actually add an ordering to your joins just like in a math equation where you might do this: (5 + 4) * (3 + 1).
Given the second part of your question, give this a try:
SELECT
<your columns>
FROM
(TableA INNER JOIN Table B ON <join criteria for A to B>)
LEFT OUTER JOIN
(TableC INNER JOIN Table D ON <join criteria for C to D>) ON
<join criteria for AxB to CxD>
Select * from ((((TableA a inner join TableB b on a.id = b.id)
left outer join TableC c on b.id = c.id)
full outer join TableD d on c.id = d.id)
right outer join TableE e on e.id = d.id)
/* etc, etc... */
You can lose the brackets if you want.
try this..
SELECT *
FROM TableA a
INNER JOIN TableB b ON a.id=b.id
LEFT OUTER JOIN (SELECT *
FROM TableC c
INNER JOIN TableD d on c.id=d.id
) dt on b.id=dt.id
You didn't give your join conditions or explain how the tables are intended to be related, so it's not obvious how this might be simplified.
SELECT a.a_id, b1.b_id b1_id, b2_id, bc.c_id
FROM TableA a JOIN TableB b1 on a.b_id = b1.b_id
LEFT JOIN (SELECT c.c_id, b2.b_id b2_id
FROM TableC c JOIN TableB b2 ON c.b_id = b2.b_id
) bc ON bc.c_id = a.c_id;
Looking at your latest edit, you can do something along the lines of:
SELECT <columns>
FROM (SELECT <columns> FROM TableA JOIN TableB ON <A-B join conditions>)
LEFT JOIN
(SELECT <columns> FROM TableC JOIN TableD ON <C-D join conditions>)
ON <AB-CD join conditions>
Although you don't actually need the inner projections, and can do:
SELECT <columns>
FROM (TableA a JOIN TableB b ON <A-B join conditions>)
LEFT JOIN
(TableC c JOIN TableD d ON <C-D join conditions>)
ON <AB-CD join conditions>
Where the AB-CD join conditions are written in terms of columns of a, b, c, d etc directly.
Since you're using Sql Server, why not create views that help you? Stuffing everything in a gigantic Sql statement can become hard to read. An example view might look like:
create view AandB
as
select *
from A
inner join B on B.aid = A.aid
And the same for CandD. Then you can retrieve the optional join with simple Sql:
select *
from AndB
left outer join CandD on AndB.cid = CandD.cid
If you're interested in rows from both sets, you can do a full join:
select *
from AndB
full outer join CandD on AndB.cid = CandD.cid
Assuming I Understand your question, I think this is what you're asking for:
SELECT *
FROM TableA INNER JOIN TableB on TableA.JoinColumn = TableB.JoinColumn
LEFT OUTER JOIN TableC on TableB.JoinColum = TableC.JoinColumn
INNER JOIN TableD on TableC.JoinColumn = TableD.JoinColumn
Note that the JoinColumn used to join A & B doesn't necesarilly have to be the same column as the one used to join B & C, and so on for C & D.
SELECT *
FROM TableA A
INNER JOIN TableB B ON B.?? = A.?? AND ...
LEFT JOIN TableC C ON C.?? = B.?? AND ...
LEFT JOIN TableB B2 ON B2.?? = C.?? AND ...
LEFT JOIN TableD D ON D.?? = C.?? AND ...
So here's the thing: logically, joins aren't actually between specific tables, they are between a table and the rest of the "set" (of joins and tables). So while you know that there is a 1-to-1 relationship between C and B2 or between C and D, you can't INNER JOIN to C because C could be null from it's LEFT JOIN to B, which will eliminate those rows, effectively undoing your LEFT join.
So basically, any joins to a table that's LEFT outer joined must also be LEFT outer joined. Does this make sense?