What's the best way to join on the same table twice? - sql

This is a little complicated, but I have 2 tables. Let's say the structure is something like this:
*Table1*
ID
PhoneNumber1
PhoneNumber2
*Table2*
PhoneNumber
SomeOtherField
The tables can be joined based on Table1.PhoneNumber1 -> Table2.PhoneNumber, or Table1.PhoneNumber2 -> Table2.PhoneNumber.
Now, I want to get a resultset that contains PhoneNumber1, SomeOtherField that corresponds to PhoneNumber1, PhoneNumber2, and SomeOtherField that corresponds to PhoneNumber2.
I thought of 2 ways to do this - either by joining on the table twice, or by joining once with an OR in the ON clause.
Method 1:
SELECT t1.PhoneNumber1, t1.PhoneNumber2,
t2.SomeOtherFieldForPhone1, t3.someOtherFieldForPhone2
FROM Table1 t1
INNER JOIN Table2 t2
ON t2.PhoneNumber = t1.PhoneNumber1
INNER JOIN Table2 t3
ON t3.PhoneNumber = t1.PhoneNumber2
This seems to work.
Method 2:
To somehow have a query that looks a bit like this -
SELECT ...
FROM Table1
INNER JOIN Table2
ON Table1.PhoneNumber1 = Table2.PhoneNumber OR
Table1.PhoneNumber2 = Table2.PhoneNumber
I haven't gotten this to work yet and I'm not sure if there's a way to do it.
What's the best way to accomplish this? Neither way seems simple or intuitive... Is there a more straightforward way to do this? How is this requirement generally implemented?

First, I would try and refactor these tables to get away from using phone numbers as natural keys. I am not a fan of natural keys and this is a great example why. Natural keys, especially things like phone numbers, can change and frequently so. Updating your database when that change happens will be a HUGE, error-prone headache. *
Method 1 as you describe it is your best bet though. It looks a bit terse due to the naming scheme and the short aliases but... aliasing is your friend when it comes to joining the same table multiple times or using subqueries etc.
I would just clean things up a bit:
SELECT t.PhoneNumber1, t.PhoneNumber2,
t1.SomeOtherFieldForPhone1, t2.someOtherFieldForPhone2
FROM Table1 t
JOIN Table2 t1 ON t1.PhoneNumber = t.PhoneNumber1
JOIN Table2 t2 ON t2.PhoneNumber = t.PhoneNumber2
What i did:
No need to specify INNER - it's implied by the fact that you don't specify LEFT or RIGHT
Don't n-suffix your primary lookup table
N-Suffix the table aliases that you will use multiple times to make it obvious
*One way DBAs avoid the headaches of updating natural keys is to not specify primary keys and foreign key constraints which further compounds the issues with poor db design. I've actually seen this more often than not.

The first is good unless either Phone1 or (more likely) phone2 can be null. In that case you want to use a Left join instead of an inner join.
It is usually a bad sign when you have a table with two phone number fields. Usually this means your database design is flawed.

You could use UNION to combine two joins:
SELECT Table1.PhoneNumber1 as PhoneNumber, Table2.SomeOtherField as OtherField
FROM Table1
JOIN Table2
ON Table1.PhoneNumber1 = Table2.PhoneNumber
UNION
SELECT Table1.PhoneNumber2 as PhoneNumber, Table2.SomeOtherField as OtherField
FROM Table1
JOIN Table2
ON Table1.PhoneNumber2 = Table2.PhoneNumber

The first method is the proper approach and will do what you need. However, with the inner joins, you will only select rows from Table1 if both phone numbers exist in Table2. You may want to do a LEFT JOIN so that all rows from Table1 are selected. If the phone numbers don't match, then the SomeOtherFields would be null. If you want to make sure you have at least one matching phone number you could then do WHERE t2.PhoneNumber IS NOT NULL OR t3.PhoneNumber IS NOT NULL
The second method could have a problem: what happens if Table2 has both PhoneNumber1 and PhoneNumber2? Which row will be selected? Depending on your data, foreign keys, etc. this may or may not be a problem.

My problem was to display the record even if no or only one phone number exists (full address book). Therefore I used a LEFT JOIN which takes all records from the left, even if no corresponding exists on the right. For me this works in Microsoft Access SQL (they require the parenthesis!)
SELECT t.PhoneNumber1, t.PhoneNumber2, t.PhoneNumber3
t1.SomeOtherFieldForPhone1, t2.someOtherFieldForPhone2, t3.someOtherFieldForPhone3
FROM
(
(
Table1 AS t LEFT JOIN Table2 AS t3 ON t.PhoneNumber3 = t3.PhoneNumber
)
LEFT JOIN Table2 AS t2 ON t.PhoneNumber2 = t2.PhoneNumber
)
LEFT JOIN Table2 AS t1 ON t.PhoneNumber1 = t1.PhoneNumber;

SELECT
T1.ID
T1.PhoneNumber1,
T1.PhoneNumber2
T2A.SomeOtherField AS "SomeOtherField of PhoneNumber1",
T2B.SomeOtherField AS "SomeOtherField of PhoneNumber2"
FROM
Table1 T1
LEFT JOIN Table2 T2A ON T1.PhoneNumber1 = T2A.PhoneNumber
LEFT JOIN Table2 T2B ON T1.PhoneNumber2 = T2B.PhoneNumber
WHERE
T1.ID = 'FOO';
LEFT JOIN or JOIN also return same result. Tested success with PostgreSQL 13.1.1 .

Related

Joining multiple tables with single join clause (sqlite)

So I'm learning SQL (sqlite flavour) and looking through the sqlite JOIN-clause documentation, I figure that these two statements are valid:
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
(or even, but that's beside the point:
SELECT *
FROM table1
JOIN (table 2 JOIN table3 USING id) USING id
)
Now I've seen the second one (chained join) a lot in SO questions on JOIN clauses, but rarely the first (grouped table-query). Both querys execute in SQLiteStudio for the non-simplified case.
A minimal example is provided here based on this code
CREATE TABLE table1 (
id INTEGER PRIMARY KEY,
field1 TEXT
)
WITHOUT ROWID;
CREATE TABLE table2 (
id INTEGER PRIMARY KEY,
field2 TEXT
)
WITHOUT ROWID;
CREATE TABLE table3 (
id INTEGER PRIMARY KEY,
field3 TEXT
)
WITHOUT ROWID;
INSERT INTO table1 (field1, id)
VALUES ('FOO0', 0),
('FOO1', 1),
('FOO2', 2),
('FOO3', 3);
INSERT INTO table2 (field2, id)
VALUES ('BAR0', 0),
('BAR2', 1),
('BAR3', 3);
INSERT INTO table3 (field3, id)
VALUES ('PIP0', 0),
('PIP1', 1),
('PIP2', 2);
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id);
Could someone explain why one would use one over the other and if they are not equivalent for certain input data, provide an example? The first certainly looks cleaner (at least less redundant) to me.
INNER JOIN ON vs WHERE clause has been suggested as a possible duplicate. While it touches on the use of , as a join operator, I feel the questions and especially the answers are more focussed on the readability aspect and use of WHERE vs JOIN. My question is more about the general validity and possible differences in outcome (given the necessary input to induce the difference).
SQLite does not enforce a proper join syntax. It sees the join operator ([INNER] JOIN, LEFT [OUTER] JOIN, etc., even the comma of the outdated 1980s join syntax) separate from the condition (ON, USING). That is not good, because it makes joins more prone to errors. The SQLite docs are hence a very bad reference for learning joins. (And SQLite itself a bad system for learning them, because the DBMS doesn't detect standard SQL join violations.)
Stick to the syntax defined by the SQL standard (and don't ever use comma-separated joins):
FROM table [alias]
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
...
(Hope, I've got this right :-) And I also hope this is readable enough :-| I've omitted NATURAL JOIN and RIGHT [OUTER] JOIN here, because I don't recommend using them at all.)
For table you can place some table name or view or a subquery (the latter including parentheses, e.g. (select * from mytable)). Columns in USING have to be surrounded by parentheses (e.g. USING (a, b, c)). (You can of couse use parentheses around ON conditions as well, if you find this more readable.)
In your case, a properly written query would be:
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
or
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
JOIN table3 t3 ON t3.id = t1.id
for instance. The example suggests three 1:1 related tables, though. In real life these are extremely rare and a more typical example would be
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.t1_id = t1.id
JOIN table3 t3 ON t3.t2_id = t2.id
After fixing syntax, these are not the same for all tables, read the syntax & definitions of the join operators in the manual. Comma is cross join with lower precedence than join keyword joins. Different DBMS's SQLs have syntax variations. Read the manual. Some allow naked join for cross join.
using returns only one column for each specified column name & natural is using for all common columns; but other joins are based on cross join & return a column for every input column. So since here tables 2 & 3 have id columns the comma returns a table with 2 id columns. Then using (id) doesn't make sense since one operand has 2 id columns.
If only tables 1 & 3 have an id column, clearly the 2nd query can't join 1 & 2 using id.
There are always many ways to express things. In particular SQL DBMSs execute many different expressions the same way. Research re relational query implementation/optimization in general, in SQL & in your DBMS manual. Generally no simple query variations like these make a difference in execution for the simplest query engine. (We see that in SQLite cross join "is handled differently by the query optimizer".)
First learn to write straightforward queries & learn what the operators do & what their syntax & restrictions are.

Query data depends on data from other table

I'm trying to get data from a table which is only related through two different tables. For example, I have information "dataA" and need to get information "dataD".
How do I write a query to display dataA and dataD as they relate? I don't want to display all instances of dataD, just the ones that related to dataB and then to dataA. I'm sorry if this doesn't make enough sense.
You'll use JOINs to do this. Specifically, in this case, a LEFT OUTER JOIN is your best bet. Something like:
SELECT
Table1.dataA,
Table3.dataD
FROM
Table1
LEFT OUTER JOIN Table2 ON Table1.dataB = Table2.dataB
LEFT OUTER JOIN Table3 ON Table2.dataC = Table3.dataC
Working on the assumption that the "data" entries you put under each table name represent columns in the table, this is fairly simple:
select dataA
, dataD
from Table1 t1
join Table2 t2
on t1.dataB = t2.dataB
join Table3 t3
on T2.dataC = t3.dataC
If you need all the dataA regardless of whether there's a matching dataD (or other intervening column) you would change where I wrote 'join' to 'left join'.
And your question makes plenty of sense. :) I was just a little unclear from the diagram what you had in mind.

How do I put multiple criteria for a column in a where clause?

I have five results to retrieve from a table and I want to write a store procedure that will return all desired rows.
I can write the query like that temporarily:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I supposed I need to receive a list of Ids to split, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TALBE2)
This translates into what is your attempt. And judging by your attempt, this might be the simplest version for you to understand. Although, in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.

How to get a related row if one (another) row exists?

I'm aware that this question's title might be a little bit inaccurate but I couldn't come up with anything better. Sorry.
I have to fetch 2 different fields, one is always there, the other isn't. That means I'm looking at a LEFT JOIN. Good so far.
But the row I want shown is not the row whose existence is uncertain.
I would like to do something like:
Show name and picture, but only show the picture if that name has a picture_id. Otherwise show nothing for the picture, but I still want the names regardless(left join).
I know this might be a little confusing but there's some clever guys out here so I guess somebody will understand it.
I tried some approaches but I couldn't quite say what I want in SQL.
P.S.: solutions specific to Oracle are good too.
------------------------------------------------------------------------------------------------------------------------------------
EDIT I've tried some queries but the main problem I found is that, inside the ON clause, I am only able to reference the last table mentioned, in other words:
There are four tables from which I'm retrieving data, but I can only mention the last (third table) inside the on clause of the LEFT JOIN(which is the 4th table). I'll describe the tables hopefully that'll help. Try not to delve too much on the names, because they are in Portuguese:
There are 4 tables. The fields I want to retrieve are :TB395.dsclaudo and TB397.dscrecomendacao, for a given TB392.nronip. The tables are as follows:
TB392(laudoid,nronip,codlaudo) // laudoid is PK, references TB395
TB395(codlaudo,dsclaudo) //codlaudo is PK
TB398(laudoid,codrecomendacao) //the pair laudoid,codrecomendacao is PK , references TB397
TB397(codrecomendacao,dscrecomendacao) // codrecomendacao is PK
Fields with the same name are foreign keys.
The problem is that there's no guarantee that, for a given laudoid,there will be one codrecomendacao. But, if there is, I want the dscrecomendacao field returned, that's what I don't know how to do. But even if there isn't a corresponding codrecomendacao for the laudoid, I still want the dsclaudo field, that's why I think a LEFT JOIN applies.
Sounds like you want your primary row source to be the join of TB392 and TB395; then you want an outer join to TB398, and when that gets a match, you want to lookup the corresponding value in TB397.
I would suggest coding the primary join as one inline view; the join between the two extra tables as a second inline view; and then doing an outer join between them. Something like:
SELECT ... FROM
(SELECT ... FROM TB392 JOIN TB395 ON ...) join1
LEFT JOIN
(SELECT ... FROM TB398 JOIN TB397 ON ...) join2
ON ...
It would be nice if you could specify what your tables are, which columns are on which tables, and what columns they join on. Its not clear if you have two tables or only one. I guess you have two tables because you are talking about a LEFT JOIN, and seem to imply that the join is on the name column. So you can use the NVL2 function to accomplish waht you want. So guessing what I can from your question, maybe something like:
SELECT T1.name
, NVL2( T2.picture_id, T1.picture, NULL )
FROM table1 T1
LEFT JOIN
table2 T2
ON T1.name = T2.name
If you only have one table, then its even simpler
SELECT T1.name
, NVL2( T1.picture_id, T1.picture, NULL )
FROM table1 T1
I think you need:
SELECT ...
FROM
TB395
JOIN
TB392
ON ...
LEFT JOIN --- this should be a LEFT JOIN
TB398
ON ...
LEFT JOIN --- and this as well, so the previous is not cancelled
TB397
ON ...
The details may be not accurate:
SELECT
a.dsclaudo
, b.laudoid
, c.codrecomendacao
, d.dscrecomendacao
FROM
TB395 a
JOIN
TB392 b
ON b.codlaudo = a.codlaudo
LEFT JOIN
TB398 c
ON c.laudoid = b.laudoid
LEFT JOIN
TB397 d
ON d.codrecomendacao = c.codrecomendacao
Create two views and then do your left join on the views. For example:
Create View view392_395
as
SELECT
t1.laudoid,
t1.nronip,
t1.codlaudo,
t2.dsclaudo
FROM TB392 t1
INNER JOIN TB395 t2
ON t1.codlaudo
= t2.codlaudo
Create View view398_397
as
SELECT
t1.laudoid,
t1.codrecomendacao,
t2.dscrecomendacao
FROM TB398 t1
INNER JOIN TB397 t2
ON t1.codrecomendacao
= t2.codrecomendacao
SELECT
v1.laudoid,
v1.nronip,
v1.codlaudo,
v1.dsclaudo,
v2.codrecomendacao,
v2.dscrecomendacao
FROM view392_395 v1
LEFT OUTER JOIN view398_397 v2
ON v1.laudoid
= v2.laudoid
In my opinion, views are always under used. Views are your friend. They can simplify some of the most complicated queries.

Difference between "and" and "where" in joins

Whats the difference between
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
and table2.Id IN (2728)
and
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
where table2.Id IN (2728)
both return the same result and both have the same explain output
Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.
In this case, the WHERE clause is almost certainly what you mean so you should use it.
Secondly there is actually difference in the result in the case that you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row if the condition fails - you will get values from the left table and nulls for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.
Here is an example to demonstrate this.
Query 1 (WHERE):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);
Result:
field1
200
Query 2 (AND):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);
Result:
field1
100
200
Test data used:
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');
SQL comes from relational algebra.
One way to look at the difference is that JOINs are operations on sets that can produce more records or less records in the result than you had in the original tables. On the other side WHERE will always restrict the number of results.
The rest of the text is extra explanation.
For overview of join types see article again.
When I said that the where condition will always restrict the results, you have to take into account that when we are talking about queries on two (or more) tables you have to somehow pair records from these tables even if there is no JOIN keyword.
So in SQL if the tables are simply separated by a comma, you are actually using a CROSS JOIN (cartesian product) which returns every row from one table for each row in the other.
And since this is a maximum number of combinations of rows from two tables then the results of any WHERE on cross joined tables can be expressed as a JOIN operation.
But hold, there are exceptions to this maximum when you introduce LEFT, RIGHT and FULL OUTER joins.
LEFT JOIN will join records from the left table on a given criteria with records from the right table, BUT if the join criteria, looking at a row from the left table is not satisfied for any records in the right table the LEFT JOIN will still return a record from the left table and in the columns that would come from the right table it will return NULLs (RIGHT JOIN works similarly but from the other side, FULL OUTER works like both at the same time).
Since the default cross join does NOT return those records you can not express these join criteria with WHERE condition and you are forced to use JOIN syntax (oracle was an exception to this with an extension to SQL standard and to = operator, but this was not accepted by other vendors nor the standard).
Also, joins usually, but not always, coincide with existing referential integrity and suggest relationships between entities, but I would not put as much weight into that since the where conditions can do the same (except in the before mentioned case) and to a good RDBMS it will not make a difference where you specify your criteria.
The join is used to reflect the entity relations
the where clause filters down results.
So the join clauses are 'static' (unless the entity relations change), while the where clauses are use-case specific.
There is no difference. "ON" is like a synonym for "WHERE", so t he second kind of reads like:
JOIN table2 WHERE cd.Company = table2.Name AND table2.Id IN (2728)
There is no difference when the query optimisation engine breaks it down to its relevant query operators.