What kind of join does the following SQL statement actually perform?
select *
from table1 tbl1, table2 tbl2
where tbl1.id = tbl2.id
Does it only return a row when both ids match?
This is an inner join.
Yes, only records that have matching IDs will be returned.
This is the same as:
select *
from table1 tbl1
inner join table2 tbl2
on tbl1.id = tbl2.id
Personally, I prefer the explicit notation of INNER JOIN.
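If you want to check the equivalence yourself, here is a small sketch using Python's built-in sqlite3 module. The tables and rows are made up for illustration (only the names table1/table2 come from the question), and SQLite stands in for whatever DBMS you are actually using. It runs the comma-style query and the INNER JOIN query and compares the results.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a TEXT);
    CREATE TABLE table2 (id INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO table2 VALUES (2, 'p'), (3, 'q'), (4, 'r');
""")

# Old-style (comma) join: the join condition lives in WHERE.
implicit = conn.execute("""
    SELECT tbl1.id, tbl1.a, tbl2.b
    FROM table1 tbl1, table2 tbl2
    WHERE tbl1.id = tbl2.id
    ORDER BY tbl1.id
""").fetchall()

# Explicit INNER JOIN: the join condition lives in ON.
explicit = conn.execute("""
    SELECT tbl1.id, tbl1.a, tbl2.b
    FROM table1 tbl1
    INNER JOIN table2 tbl2 ON tbl1.id = tbl2.id
    ORDER BY tbl1.id
""").fetchall()

print(implicit)              # [(2, 'y', 'p'), (3, 'z', 'q')]  (ids 1 and 4 have no match)
print(implicit == explicit)  # True
```

Rows whose id exists in only one table (1 and 4 here) are dropped by both forms.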
Yes, that is ANSI-89 syntax for an inner join. ANSI-92 defines the [INNER, LEFT, etc.] JOIN keywords.
I was trying to run a query that uses multiple joins in Hive.
Example:
SELECT *
FROM table1
LEFT JOIN table2 -- the table resulting from the inner join should be left joined to table1
INNER JOIN table3 -- this inner join should happen first between table2 and table3
ON table3.id = table2.id
ON table2.id = table1.id
I think this is perfectly valid in other SQL DBMSs, but Hive gives me an error. Is this kind of join (I really don't know what it's called, so I can't google it) illegal in Hive?
A workaround would be subqueries or unions, but I am more interested in learning about this kind of syntax.
Thanks!
This is valid SQL syntax and should be parsed as:
FROM table1 LEFT JOIN
(table2 INNER JOIN
table3
ON table3.id = table2.id
)
ON table2.id = table1.id
By convention, ON clauses are interleaved with JOINs, so each condition sits right where its JOIN is specified. However, the syntax allows for this construct as well.
I don't use such syntax -- and I strongly discourage using it without parentheses -- but I thought pretty much all databases supported it.
If parentheses don't work, you have two options. One is a subquery:
FROM table1 LEFT JOIN
(SELECT table2.id, . . . -- other columns you want
FROM table2 INNER JOIN
table3
ON table3.id = table2.id
) t23
ON t23.id = table1.id
Or using a RIGHT JOIN:
FROM table2 INNER JOIN
table3
ON table3.id = table2.id RIGHT JOIN
table1
ON table2.id = table1.id
In this case, the RIGHT JOIN should be equivalent. But it can be complicated getting exactly the same semantics when multiple joins are involved (and without using parentheses).
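To see why the join order matters, here is a sketch in Python's sqlite3 (SQLite only because it ships with Python; this does not test Hive, and the data is hypothetical). It compares the conventional interleaved-ON version, where the LEFT JOIN runs first and the INNER JOIN then filters, with the derived-table (subquery) workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1);
""")

# Interleaved ON clauses: table1 LEFT JOIN table2 happens first, then the
# INNER JOIN to table3 discards rows where table2.id is NULL or unmatched.
naive = conn.execute("""
    SELECT table1.id, table2.id
    FROM table1
    LEFT JOIN table2 ON table2.id = table1.id
    INNER JOIN table3 ON table3.id = table2.id
    ORDER BY table1.id
""").fetchall()

# Derived-table workaround: inner-join table2/table3 first, then
# left-join that result to table1 so every table1 row survives.
workaround = conn.execute("""
    SELECT table1.id, t23.id
    FROM table1
    LEFT JOIN (
        SELECT table2.id AS id
        FROM table2
        INNER JOIN table3 ON table3.id = table2.id
    ) t23 ON t23.id = table1.id
    ORDER BY table1.id
""").fetchall()

print(naive)       # [(1, 1)]
print(workaround)  # [(1, 1), (2, None), (3, None)]
```

The naive form silently turns the LEFT JOIN into an inner join; the derived table keeps all three table1 rows.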
Say, I have the following query:
SELECT * FROM TABLE1
JOIN TABLE2 ON ...
LEFT JOIN TABLE3 ON ...
JOIN TABLE3_1 ON ...
JOIN TABLE3_2 ON ...
JOIN TABLE3_3 ON ...
What I want to achieve is for TABLE3, TABLE3_1, TABLE3_2 and TABLE3_3 to be inner joined among themselves (I only need the data that matches across all of them; the rest can go). Then TABLE1 and TABLE2 should be inner joined too. But some rows of the TABLE1 + TABLE2 result won't have corresponding entries in TABLE3, and that's okay: I still want them.
If I run the above pseudo-code as-is, it obviously will not achieve that result.
Use parentheses to force the join order, something like:
SELECT *
FROM (
TABLE1
JOIN TABLE2 ON ...)
LEFT JOIN (
TABLE3
JOIN TABLE3_1 ON ...
JOIN TABLE3_2 ON ...
JOIN TABLE3_3 ON ...) ON ...
Check this answer.
@Serg's answer is correct, but you do not need parentheses if you specify the ON condition at the end of the statement. Instead of:
SELECT * FROM TABLE1
JOIN TABLE2 ON ...
LEFT JOIN TABLE3 ON ThisConditionShouldBeAtTheEnd
JOIN TABLE3_1 ON ...
JOIN TABLE3_2 ON ...
JOIN TABLE3_3 ON ...
you can rewrite it like this:
SELECT * FROM TABLE1
JOIN TABLE2 ON ...
LEFT JOIN TABLE3
JOIN TABLE3_1 ON ...
JOIN TABLE3_2 ON ...
JOIN TABLE3_3 ON ...
ON ThisConditionShouldBeAtTheEnd
See also this article for more explanation. The reason is that JOIN conditions are evaluated from left to right (top-down), and you need the LEFT JOIN condition to be evaluated after the inner joins among TABLE3, TABLE3_1, TABLE3_2 and TABLE3_3.
Disclaimer: I didn't have an Oracle DB at hand to check, but hopefully this gives you some ideas.
Solution 1: You could use parentheses to express the intermediate joined table of the TABLE3 group (TABLE3 x N). Pseudo-code:
select *
FROM TABLE1
inner join TABLE2 on (condition)
left join (
table3
inner join table3_1 on (condition)
inner join table3_2 on (condition)
inner join table3_3 on (condition)
) ON (table3.id = table2.id)
It works on MSSQL, at least. I cannot verify that it works in Oracle as well, but you could try. I consider this syntax very explicit and easy to follow and maintain.
Solution 2: An alternative is to use the same left-to-right order that's troubling you to your advantage, with a RIGHT JOIN. Pseudo-code:
select *
from table3
inner join table3_1 on (condition)
inner join table3_2 on (condition)
inner join table3_3 on (condition)
right join table2 on (condition)
inner join table1 on (condition)
This syntax probably works, but IMHO using RIGHT JOINs makes the query harder to reason about.
An alternative to the other answers is a CTE (common table expression). It has one query for the inner-joined table3 group and one for the inner-joined table1/table2 group, and those two groups are outer joined in the main query. For me (and obviously this is subjective), this would be easier to understand if I came across it in someone else's code.
WITH
t3_group AS
(SELECT *
FROM table3
INNER JOIN table3_1 ON ...
INNER JOIN table3_2 ON ...
INNER JOIN table3_3 ON ... ),
t1_t2_group AS
(SELECT *
FROM table1
INNER JOIN table2 ON ...)
SELECT *
FROM t1_t2_group
LEFT JOIN t3_group ON ...
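A runnable sketch of the CTE approach, using Python's sqlite3 with hypothetical data and assuming every table joins on a shared id column (the real ON conditions are elided in the question). The CTEs select an explicit column rather than SELECT *, because SELECT * over joined tables that share an id column produces duplicate column names, which several databases (SQL Server, for example) reject in a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1   (id INTEGER);
    CREATE TABLE table2   (id INTEGER);
    CREATE TABLE table3   (id INTEGER);
    CREATE TABLE table3_1 (id INTEGER);
    CREATE TABLE table3_2 (id INTEGER);
    CREATE TABLE table3_3 (id INTEGER);
    INSERT INTO table1   VALUES (1), (2), (3), (4);
    INSERT INTO table2   VALUES (1), (2), (3);
    INSERT INTO table3   VALUES (1), (2);
    INSERT INTO table3_1 VALUES (1), (2);
    INSERT INTO table3_2 VALUES (1), (2);
    INSERT INTO table3_3 VALUES (1);
""")

rows = conn.execute("""
    WITH
    t3_group AS (
        -- only ids present in all four table3* tables
        SELECT table3.id AS id
        FROM table3
        INNER JOIN table3_1 ON table3_1.id = table3.id
        INNER JOIN table3_2 ON table3_2.id = table3.id
        INNER JOIN table3_3 ON table3_3.id = table3.id
    ),
    t1_t2_group AS (
        -- only ids present in both table1 and table2
        SELECT table1.id AS id
        FROM table1
        INNER JOIN table2 ON table2.id = table1.id
    )
    SELECT t1_t2_group.id, t3_group.id
    FROM t1_t2_group
    LEFT JOIN t3_group ON t3_group.id = t1_t2_group.id
    ORDER BY t1_t2_group.id
""").fetchall()

print(rows)  # [(1, 1), (2, None), (3, None)]
```

Id 4 is dropped by the table1/table2 inner join, while ids 2 and 3 survive the outer join with NULLs for the table3 group.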
I appreciate this might be very simple for you guys but sometimes the logic behind JOIN can be difficult for beginners. I want to select "ID" from table1 but only those "ID"s which do NOT appear in table2."ID". I tested LEFT and RIGHT but cannot get it to work the way I need to. I am using dashDB.
You can use NOT IN with a subquery:
Select * from table1 where id NOT IN (select id from table2);
try this...
SELECT *
FROM table1
LEFT JOIN table2 ON table1.ID = table2.ID
WHERE table2.ID IS NULL
I always prefer NOT EXISTS to do this
Select * from table1 a
where NOT EXISTS (select 1 from table2 b where a.id = b.id);
Here is an excellent article by Aaron Bertrand that compares the performance of all the methods:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
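Besides performance, the methods differ in one important case: NULLs. Here is a sketch in Python's sqlite3 (hypothetical data; dashDB is not tested here) that runs the three queries from the answers above, then adds a NULL to table2. Because `x NOT IN (..., NULL)` evaluates to unknown rather than true, the NOT IN version then returns nothing, while LEFT JOIN ... IS NULL and NOT EXISTS are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER);
    CREATE TABLE table2 (ID INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2);
""")

QUERIES = {
    "not_in": "SELECT ID FROM table1 WHERE ID NOT IN (SELECT ID FROM table2) ORDER BY ID",
    "left_join": """SELECT t1.ID FROM table1 t1
                    LEFT JOIN table2 t2 ON t1.ID = t2.ID
                    WHERE t2.ID IS NULL ORDER BY t1.ID""",
    "not_exists": """SELECT a.ID FROM table1 a
                     WHERE NOT EXISTS (SELECT 1 FROM table2 b WHERE a.ID = b.ID)
                     ORDER BY a.ID""",
}

def run_all():
    """Run every anti-join variant and return {name: rows}."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}

before = run_all()
print(before)  # every variant returns [(1,), (3,)]

# A single NULL in table2 makes NOT IN return no rows at all.
conn.execute("INSERT INTO table2 VALUES (NULL)")
after = run_all()
print(after)   # not_in: [], left_join and not_exists: [(1,), (3,)]
```

If table2.ID is nullable, NOT EXISTS or the LEFT JOIN form is the safer choice.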
Use the below script.
SELECT t1.ID
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
WHERE t2.ID IS NULL
I am trying to write a query that joins Table2 only if the RepID from Table1 exists in Table2; if it does not, Table2 should not be joined. With the query below, I get no rows from either table when the RepID does not exist in Table2. How can I do this? I am using SQL Server 2005. Thank you in advance!
Select * from Table1
inner join Table2 on Table1.RepID = Table2.RepID
where Table1.Date = @Date
order by Table1.Date desc
An inner join will only return a row if matches are found in both sides of the join. If you're looking for something that will return all rows from Table1 but only records from Table2 when a match is found, you want a left outer join:
select * from Table1 as t1
left outer join Table2 as t2
on t1.RepID = t2.RepID
where t1.Date = @Date
order by t1.Date desc
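A quick sketch of the difference, using Python's sqlite3 instead of SQL Server 2005. The rows, the RepName column and the date value are hypothetical, the `?` placeholder stands in for @Date, and the results are ordered by RepID (not Date) only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (RepID INTEGER, Date TEXT);
    CREATE TABLE Table2 (RepID INTEGER, RepName TEXT);
    INSERT INTO Table1 VALUES (10, '2009-01-05'), (20, '2009-01-05'), (30, '2009-01-06');
    INSERT INTO Table2 VALUES (10, 'Alice');
""")

the_date = "2009-01-05"  # stands in for the @Date parameter

# INNER JOIN: RepID 20 has no Table2 row, so it disappears.
inner = conn.execute("""
    SELECT t1.RepID, t2.RepName FROM Table1 AS t1
    INNER JOIN Table2 AS t2 ON t1.RepID = t2.RepID
    WHERE t1.Date = ?
    ORDER BY t1.RepID
""", (the_date,)).fetchall()

# LEFT OUTER JOIN: every Table1 row is kept; Table2 columns are NULL when unmatched.
left = conn.execute("""
    SELECT t1.RepID, t2.RepName FROM Table1 AS t1
    LEFT OUTER JOIN Table2 AS t2 ON t1.RepID = t2.RepID
    WHERE t1.Date = ?
    ORDER BY t1.RepID
""", (the_date,)).fetchall()

print(inner)  # [(10, 'Alice')]
print(left)   # [(10, 'Alice'), (20, None)]
```

Note that the WHERE clause only filters on a Table1 column; adding a condition on a Table2 column there would discard the NULL rows again.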
Try "LEFT JOIN" instead of "INNER JOIN".
The word "LEFT" means "Always include every record from the table on the LEFT of the join," in this case Table1, since you will write: Table1 LEFT JOIN Table2, and "Table1" is on the left of that pair! :-)
It sounds like what you actually want is a left outer join, doesn't it?
SELECT *
FROM Table1
LEFT JOIN Table2
ON Table1.RepID = Table2.RepID
WHERE Table1.Date = @Date
ORDER BY Table1.Date DESC;
That's what outer joins are for.
Select * from Table1
left outer join Table2 on Table1.RepID = Table2.RepID
where Table1.Date = @Date
order by Table1.Date desc
Are the following select statements SQL92 compliant?
SELECT table1.id, table2.id,*
FROM table1, table2
WHERE table1.id = table2.id
SELECT table1.Num, table2.id,*
FROM table1, table2
WHERE table1.Num = table2.id
Following on from StingyJack...
SELECT
table1.id,
table2.id,
*
FROM
table1
INNER JOIN
table2 ON table1.id = table2.id
WHERE
table1.column = 'bob'
SELECT table1.id, table2.id,* FROM table1, table2 WHERE table1.id = table2.id and table1.column = 'bob'
Where's the JOIN? Where's the filter?
JOIN also enforces some discipline and basic checking: it makes accidental cross joins or partial cross joins easier to avoid.
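The risk with the comma syntax is that a forgotten join predicate is not an error. Here is a sketch in Python's sqlite3 with hypothetical data: dropping the predicate silently produces a Cartesian product, and a filter on just one table gives a partial cross join. (In SQL Server, an INNER JOIN without ON is a syntax error; SQLite itself is lenient and treats it as a cross join, so this sketch only shows the comma case.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2), (3);
""")

# Intended query: join predicate present.
joined = conn.execute(
    "SELECT COUNT(*) FROM table1, table2 WHERE table1.id = table2.id"
).fetchone()[0]

# Predicate forgotten: no error, just a full Cartesian product (3 x 3 rows).
forgotten = conn.execute(
    "SELECT COUNT(*) FROM table1, table2"
).fetchone()[0]

# Only a filter on table1: one table1 row paired with every table2 row.
partial = conn.execute(
    "SELECT COUNT(*) FROM table1, table2 WHERE table1.id = 1"
).fetchone()[0]

print(joined, forgotten, partial)  # 3 9 3
```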
Yes, the queries you show use SQL92 compliant syntax. My copy of "Understanding the New SQL: A Complete Guide" by Jim Melton & Alan R. Simon confirms it.
SQL92 still supports joins using the comma syntax, for backward compatibility with SQL89.
As far as I know, all SQL implementations support both comma syntax and JOIN syntax joins.
For inner joins, both forms are identical in semantics (that is, they produce the same result), and in most cases the optimizer treats them the same way, so performance is identical too.
I might be wrong, but my understanding is that the SQL92 convention is to join tables using the JOIN statement (e.g. FROM table1 INNER JOIN table2).
Unfortunately, I believe they are, but that join syntax is more difficult to read and maintain.
I know that with MSSQL there is no performance difference between these two join methods, but which one is easier to understand?
SELECT table1.id, table2.id,*
FROM table1, table2
WHERE table1.id = table2.id
SELECT
table1.id,
table2.id,
*
FROM table1
INNER JOIN table2
ON table1.id = table2.id