Join evaluation order in HIVE

I was trying to run a query that uses multiple joins in HIVE.
Example:
SELECT *
FROM table1
LEFT JOIN table2 -- the result of the inner join should be left-joined to table1
INNER JOIN table3 -- this inner join between table2 and table3 should happen first
ON table3.id = table2.id
ON table2.id = table1.id
I think this is perfectly valid in other SQL DBMSs, but HIVE gives me an error. Is this kind of join (I really don't know what it's called, so I can't google it) illegal in HIVE?
A workaround would be subqueries or unions, but I am more interested in learning about this kind of syntax.
Thanks!

This is valid SQL syntax and should be parsed as:
FROM table1 LEFT JOIN
(table2 INNER JOIN
table3
ON table3.id = table2.id
)
ON table2.id = table1.id
By convention, ON clauses are interleaved with JOINs, so each condition appears right after the JOIN it belongs to. However, the syntax allows this nested construct as well.
I don't use such syntax, and I strongly discourage using it without parentheses, but I thought pretty much all databases supported it.
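To see why the evaluation order matters, here is a minimal sketch that runs both readings against SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in (it accepts the parenthesized form; Hive's parser may behave differently), and the three one-column tables and their rows are hypothetical sample data. The nested form keeps every table1 row, while the plain left-to-right chain drops the table1 rows that have no table3 match.

```python
import sqlite3

# Hypothetical sample data: table1 has ids 1..3, table2 has ids 1 and 2,
# and table3 only has id 1, so id 2 reaches table2 but not table3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1);
""")

# Parenthesized form: table2 INNER JOIN table3 is evaluated as a unit,
# and that result is LEFT JOINed to table1, so every table1 row survives.
nested = conn.execute("""
    SELECT table1.id, table2.id, table3.id
    FROM table1 LEFT JOIN
         (table2 INNER JOIN table3 ON table3.id = table2.id)
         ON table2.id = table1.id
    ORDER BY table1.id
""").fetchall()

# Left-to-right form: table1 LEFT JOIN table2 happens first, then the
# INNER JOIN with table3 discards table1 rows that have no table3 match.
left_to_right = conn.execute("""
    SELECT table1.id, table2.id, table3.id
    FROM table1
    LEFT JOIN table2 ON table2.id = table1.id
    INNER JOIN table3 ON table3.id = table2.id
    ORDER BY table1.id
""").fetchall()

print(nested)         # [(1, 1, 1), (2, None, None), (3, None, None)]
print(left_to_right)  # [(1, 1, 1)]
```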
If parentheses don't work, you have two options. One is a subquery:
FROM table1 LEFT JOIN
(SELECT table2.id, . . . -- other columns you want
FROM table2 INNER JOIN
table3
ON table3.id = table2.id
) t23
ON t23.id = table1.id
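A minimal sketch, again using SQLite via Python's sqlite3 as a stand-in for Hive and hypothetical sample tables, checking that the derived-table workaround returns the same rows as the parenthesized nested join:

```python
import sqlite3

# Hypothetical sample data standing in for the Hive tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1);
""")

# Derived-table workaround: the inner join is wrapped in a subquery (t23),
# so it is evaluated as a unit before the LEFT JOIN.
via_subquery = conn.execute("""
    SELECT table1.id, t23.id
    FROM table1 LEFT JOIN
         (SELECT table2.id
          FROM table2 INNER JOIN table3 ON table3.id = table2.id) t23
         ON t23.id = table1.id
    ORDER BY table1.id
""").fetchall()

# Parenthesized nested join, for comparison.
via_nested = conn.execute("""
    SELECT table1.id, table2.id
    FROM table1 LEFT JOIN
         (table2 INNER JOIN table3 ON table3.id = table2.id)
         ON table2.id = table1.id
    ORDER BY table1.id
""").fetchall()

print(via_subquery)  # [(1, 1), (2, None), (3, None)]
assert via_subquery == via_nested
```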
Or using a RIGHT JOIN:
FROM table2 INNER JOIN
table3
ON table3.id = table2.id RIGHT JOIN
table1
ON table2.id = table1.id
In this case, the RIGHT JOIN should be equivalent. But getting exactly the same semantics can be complicated when multiple joins are involved (and parentheses are not used).

Related

SQL Join tables with empty values

I have 2 tables
Let's call them Table1 and Table2.
They share one key column (id).
What I'm looking for is a way to combine them on that key: where Table2 has rows with no match in Table1, I want the Table1 columns to be empty, and where Table1 has rows with no match in Table2, I want the Table2 columns to be empty.
I tried a lot of different joins, but most of the time I end up with a lot of duplicate values as it tries to fill in both sides.
I tried FULL OUTER JOIN, FULL JOIN, etc.
You are looking for full join:
select t1.*, t2.*
from t1 full join
t2
on t1.id = t2.id;
The above code from Gordon is right. However, since you have not specified the database and its version, here is an alternative for MySQL, which does not support FULL JOIN; it should also work in other databases.
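Without duplicates (UNION removes the matched rows returned by both halves, but it also collapses any genuinely identical rows):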
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id
With duplicates (UNION ALL keeps everything, so each matched row appears twice):
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION ALL
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id
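A minimal sketch of an emulation that avoids both problems, run on SQLite through Python's sqlite3 with hypothetical sample tables (the columns `a` and `b` are made up): take the LEFT JOIN, then UNION ALL only the Table2 rows that have no partner in Table1 (an anti-join). The second half is written as `Table2 LEFT JOIN Table1` with explicit columns instead of a RIGHT JOIN, because older SQLite versions do not support RIGHT JOIN. The `naive` query reproduces the UNION ALL variant above for comparison.

```python
import sqlite3

# Hypothetical sample data: id 2 exists in both tables,
# id 1 only in Table1, id 3 only in Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, a TEXT);
    CREATE TABLE Table2 (id INTEGER, b TEXT);
    INSERT INTO Table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO Table2 VALUES (2, 'p'), (3, 'q');
""")

# FULL OUTER JOIN emulation: every Table1 row (matched or not), plus the
# Table2 rows with no partner in Table1 (anti-join via IS NULL).
full = conn.execute("""
    SELECT Table1.id, Table1.a, Table2.id, Table2.b
    FROM Table1 LEFT JOIN Table2 ON Table1.id = Table2.id
    UNION ALL
    SELECT Table1.id, Table1.a, Table2.id, Table2.b
    FROM Table2 LEFT JOIN Table1 ON Table1.id = Table2.id
    WHERE Table1.id IS NULL
""").fetchall()

# Same shape without the anti-join filter: matched rows come back twice.
naive = conn.execute("""
    SELECT Table1.id, Table1.a, Table2.id, Table2.b
    FROM Table1 LEFT JOIN Table2 ON Table1.id = Table2.id
    UNION ALL
    SELECT Table1.id, Table1.a, Table2.id, Table2.b
    FROM Table2 LEFT JOIN Table1 ON Table1.id = Table2.id
""").fetchall()

print(len(full), len(naive))  # 3 4
```

Because the two halves of the emulation cannot overlap, UNION ALL is safe there, and genuinely duplicated source rows are kept.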

Exactly when to use INNER JOIN or an alternative query

SELECT .... FROM TABLE1 T1, TABLE2 T2, TABLE3 T3
WHERE T1.NAME = 'ABC' AND T1.ID = T2.COL_ID AND T2.COL1 = T3.COL2
vs
SELECT .... FROM TABLE1 T1
WHERE T1.NAME = 'ABC'
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
Two questions:
In terms of performance, which will perform better, and why?
If Option 2 performs better, when should Option 1 be used? (And vice versa if Option 1 performs better.)
The second query is not correct. It should be:
SELECT .... FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.ID = T2.COL_ID
INNER JOIN TABLE3 T3 ON T2.COL1 = T3.COL2
WHERE T1.NAME = 'ABC'
This is the right way to write your join condition. The first one is also accepted; logically it is a Cartesian product filtered by the WHERE clause. All modern databases handle both queries and interpret them the same way, so performance should be the same. Still, you should use the second one, because it is more readable and gives you a single way to write joins, whether inner, left, or full outer.
The answer is easy: don't use comma-separated joins (first query). We used these in the 1980s for lack of anything better, but in 1992 the new syntax (second query) was introduced[1], because the old syntax was error-prone (it was easy to forget to apply join criteria), harder to maintain (was a missing join criterion intended or not?), and had no standard syntax for outer joins.
[1] Oracle was a little late in adopting the new syntax: it introduced ANSI joins in Oracle 9i, in 2001.
In terms of performance: There should be no difference in speed, because DBMS optimizers see that this is essentially the same query.
Your second query is syntactically incorrect by the way. The query's WHERE clause belongs after the complete FROM clause, i.e. after all the joins:
SELECT ....
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.col_id
INNER JOIN table3 t3 ON t2.col1 = t3.col2
WHERE t1.name = 'ABC';
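A quick check of the equivalence claim, sketched with SQLite via Python's sqlite3; the table contents and the `label` column are hypothetical. Both forms return the same rows; comparing execution plans on your own database would be the way to check the performance claim there.

```python
import sqlite3

# Hypothetical sample data matching the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (col_id INTEGER, col1 INTEGER);
    CREATE TABLE table3 (col2 INTEGER, label TEXT);
    INSERT INTO table1 VALUES (1, 'ABC'), (2, 'XYZ');
    INSERT INTO table2 VALUES (1, 10), (2, 20);
    INSERT INTO table3 VALUES (10, 'ten'), (20, 'twenty');
""")

# Option 1: comma-separated tables, join criteria mixed into WHERE.
comma = conn.execute("""
    SELECT t1.id, t3.label
    FROM table1 t1, table2 t2, table3 t3
    WHERE t1.name = 'ABC' AND t1.id = t2.col_id AND t2.col1 = t3.col2
""").fetchall()

# Option 2 (corrected): explicit joins, filter after the FROM clause.
explicit = conn.execute("""
    SELECT t1.id, t3.label
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.col_id
    INNER JOIN table3 t3 ON t2.col1 = t3.col2
    WHERE t1.name = 'ABC'
""").fetchall()

print(comma, explicit)  # [(1, 'ten')] [(1, 'ten')]
```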

Outer Join 3 Tables in Rails

I have 3 models. Table1 belongs to Table2, and Table2 belongs to Table3.
I want to get an ActiveRecord::Relation that includes all of the fields from all 3 tables, including nulls (outer joining to get all of Table1), with a WHERE clause on Table1 and an order by a column in Table3.
What I want in SQL is:
SELECT * FROM Table1 LEFT OUTER JOIN Table2 ON Table2.id = Table1.table2_id
LEFT OUTER JOIN Table3 ON Table3.id = Table2.table3_id
WHERE Table1.column1 = 'example'
ORDER BY Table3.table3_column
However, I have been trying for hours to do this in Rails and getting nowhere. Is it possible?
@records = Table1.joins(:table2).joins(:table3).where(:column1 => "example").order("table3_column")
This, for example, gets me nowhere, because it looks for an association between Table1 and Table3, which doesn't exist except through Table2. I need to join once, then join on top of that. Not to mention that it is an inner join. I've also tried this form:
@records = Table1.joins("LEFT OUTER JOIN Table2 ON Table2.id = Table1.table2_id LEFT OUTER JOIN Table3 ON Table3.id = Table2.table3_id")
But I get nil from that.
Thanks for any help with this.
Try find_by_sql
Table1.find_by_sql("SELECT * FROM Table1
LEFT OUTER JOIN Table2 ON Table2.id = Table1.table2_id
LEFT OUTER JOIN Table3 ON Table3.id = Table2.table3_id
WHERE Table1.column1 = 'example'
ORDER BY Table3.table3_column")
See the Rails documentation for find_by_sql.
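For reference, a minimal sketch that runs the SQL from the question against SQLite through Python's sqlite3, with hypothetical rows chosen to cover each case: a full Table1 → Table2 → Table3 chain, a Table2 row without a Table3 parent, a Table1 row without a Table2 parent, and a row excluded by the WHERE clause. The string literal uses single quotes, the standard SQL form; in standard SQL, double quotes denote identifiers.

```python
import sqlite3

# Hypothetical schema and rows mirroring the question's three models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table3 (id INTEGER PRIMARY KEY, table3_column TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, table3_id INTEGER);
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, table2_id INTEGER, column1 TEXT);
    INSERT INTO Table3 VALUES (1, 'b'), (2, 'a');
    INSERT INTO Table2 VALUES (1, 1), (2, 2), (3, NULL);
    INSERT INTO Table1 VALUES
        (1, 1, 'example'),    -- reaches Table3 row 'b'
        (2, 2, 'example'),    -- reaches Table3 row 'a'
        (3, 3, 'example'),    -- Table2 row has no Table3 parent
        (4, NULL, 'example'), -- no Table2 parent at all
        (5, 1, 'other');      -- filtered out by the WHERE clause
""")

# Both joins are outer, so Table1 rows 3 and 4 survive with NULLs.
rows = conn.execute("""
    SELECT Table1.id, Table3.table3_column
    FROM Table1
    LEFT OUTER JOIN Table2 ON Table2.id = Table1.table2_id
    LEFT OUTER JOIN Table3 ON Table3.id = Table2.table3_id
    WHERE Table1.column1 = 'example'
    ORDER BY Table3.table3_column, Table1.id
""").fetchall()

print(rows)  # [(3, None), (4, None), (2, 'a'), (1, 'b')]
```

SQLite sorts NULLs first in ascending order; other databases may place them last, so check your database's NULL ordering if it matters.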

Selecting records based on two other tables

I have a query with an inner join to another table, and I also want to include records selected by the value of another column.
Example:
select name, address from table1
inner join table2 on table1.id = table2.id
In addition, I want to include rows where table1.recno is 1, 2, or 4.
How could I write a query for that?
One option I know is to use the IN keyword instead of the join, but our client doesn't want to use the IN keyword.
Use a left join and then use the WHERE clause to filter out the rows that you need.
select name, address
from table1
left join table2 on table1.id = table2.id
where
table2.id IS NOT NULL OR table1.recno IN (1,2,4)
Or if you want to avoid an innocuous IN for silly reasons, use:
select name, address
from table1
left join table2 on table1.id = table2.id
where
table2.id IS NOT NULL
OR table1.recno = 1
OR table1.recno = 2
OR table1.recno = 4
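A minimal sketch of the LEFT JOIN plus OR filter on SQLite via Python's sqlite3, using `table1.recno` as described in the question; the rows are hypothetical. It also confirms that the IN form returns the same result.

```python
import sqlite3

# Hypothetical sample data covering each case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, recno INTEGER, name TEXT, address TEXT);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES
        (10, 1, 'Ann', 'A St'),   -- no table2 match, but recno 1
        (20, 3, 'Bob', 'B St'),   -- matched in table2
        (30, 4, 'Cat', 'C St'),   -- matched, and recno 4
        (40, 5, 'Dan', 'D St');   -- neither: excluded
    INSERT INTO table2 VALUES (20), (30);
""")

# LEFT JOIN keeps every table1 row; WHERE keeps matched rows or listed recnos.
rows = conn.execute("""
    SELECT name, address
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NOT NULL
       OR table1.recno = 1 OR table1.recno = 2 OR table1.recno = 4
    ORDER BY name
""").fetchall()

# Equivalent filter written with IN.
with_in = conn.execute("""
    SELECT name, address
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NOT NULL OR table1.recno IN (1, 2, 4)
    ORDER BY name
""").fetchall()

print(rows)  # [('Ann', 'A St'), ('Bob', 'B St'), ('Cat', 'C St')]
```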

Are the following select statements SQL92 compliant?

SELECT table1.id, table2.id,*
FROM table1, table2
WHERE table1.id = table2.id
SELECT table1.Num, table2.id,*
FROM table1, table2
WHERE table1.Num = table2.id
Following on from StingyJack...
SELECT
table1.id,
table2.id,
*
FROM
table1
INNER JOIN
table2 ON table1.id = table2.id
WHERE
table1.column = 'bob'
SELECT table1.id, table2.id,* FROM table1, table2 WHERE table1.id = table2.id and table1.column = 'bob'
Where's the JOIN? Where's the filter?
JOIN also enforces some discipline and basic checking: it makes it easier to avoid accidental cross joins or partial cross joins.
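To illustrate the point about cross joins, a minimal sketch on SQLite via Python's sqlite3 with hypothetical data (the `owner` column stands in for `column`): leaving the join predicate out of the comma form silently yields a product of the tables, while the explicit JOIN keeps the predicate next to the table it joins.

```python
import sqlite3

# Hypothetical sample data; "owner" stands in for the answer's "column".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, owner TEXT);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'bob'), (2, 'bob'), (3, 'amy');
    INSERT INTO table2 VALUES (1), (2), (3), (4);
""")

# Join predicate forgotten: only the filter is left in the WHERE clause,
# so each 'bob' row in table1 pairs with every row in table2.
forgot = conn.execute("""
    SELECT table1.id, table2.id FROM table1, table2
    WHERE table1.owner = 'bob'
""").fetchall()

# Explicit JOIN: the join predicate sits in ON, the filter in WHERE.
intended = conn.execute("""
    SELECT table1.id, table2.id
    FROM table1 INNER JOIN table2 ON table1.id = table2.id
    WHERE table1.owner = 'bob'
""").fetchall()

print(len(forgot), sorted(intended))  # 8 [(1, 1), (2, 2)]
```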
Yes, the queries you show use SQL92 compliant syntax. My copy of "Understanding the New SQL: A Complete Guide" by Jim Melton & Alan R. Simon confirms it.
SQL92 still supports joins using the comma syntax, for backward compatibility with SQL89.
As far as I know, all SQL implementations support both comma syntax and JOIN syntax joins.
The two forms are identical in semantics (that is, they produce the same result), and in most cases the SQL implementation optimizes them the same way, so performance is identical too.
I might be wrong, but my understanding is that the SQL92 convention is to join tables using the JOIN keyword (e.g. FROM table1 INNER JOIN table2).
Unfortunately, I believe they are compliant, but that join syntax is more difficult to read and maintain.
I know that with MSSQL there is no performance difference between these two join methods, but which one is easier to understand?
SELECT table1.id, table2.id,*
FROM table1, table2
WHERE table1.id = table2.id
SELECT
table1.id,
table2.id,
*
FROM table1
INNER JOIN table2
ON table1.id = table2.id