Hive - Where and OR clause error - hive

Hi I am trying to run this query in Hive, but get the error 10249 (Unsupported query expression - only 1 subquery is supported...)
select count(*) from
(
select * from tableA
union all
select * from tableB
) a
where a.field1 in (select fieldA in tableC)
or a.field2 in (select fieldA in tableC)
or a.field3 in (select fieldA in tableC);
Would anybody know how I can write this so that Hive supports this query (works fine in SQL server)

Since you do not need fields from tableC, you can use left semi join instead of in:
select count(*) from
(
select * from tableA
union all
select * from tableB
) a
left semi join tableC c1 on a.field1=c1.fieldA
left semi join tableC c2 on a.field2=c2.fieldA
left semi join tableC c3 on a.field3=c3.fieldA
;
left semi join is half join, the result set contains fields only from one of joined tables, only joined rows returned, similar to inner join but will not create duplicates if right table contains multiple matching rows.
LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way.

Covert you sub query in CTE , left join and use or condition in where clause.
IE.
with temp as
(
select * from tableA
union all
select * from tableB
)
select COUNT(a.*)
from temp a left join tableC a1 on a.field1 =a1.fieldA
left join tableC a2 on a.field2 =a2.fieldA
left join tableC a3 on a.field3 =a3.fieldA
where a1.fieldA is not null
or a3.fieldA is not null
or a3.fieldA is not null

Related

Compare two tables via the three tables SQL

I plan to compare two tables via the three table. my query is as the following
Select count(*)
from tableA a
join tableB b
on a.A_ID =b.A_ID
full join tableC c
on c.B_ID=b.B_ID
where a.A_ID is null or c.B_ID is null
if the count is zore, then the tableA and TableC match, otherwise, these two tables do not match
It takes a long time to run the query. Do we have a way to compare tableA and tableC fast?
Question: How to compare tableA and tableC?
You don't need a FULL JOIN here, you can just join the first two tables, then EXCEPT the third.
SELECT COUNT(*)
FROM (
SELECT b.B_ID, a.A_Name
FROM tableA a
JOIN tableB b ON a.A_ID = b.A_ID
EXCEPT
SELECT c.B_ID, c.B_Name
FROM tableC c
) t;
Note that EXCEPT implies DISTINCT so if you want the number of rows extra in table_A then you need to use WHERE NOT EXISTS
SELECT b.B_ID, a.A_Name
FROM tableA a
JOIN tableB b ON a.A_ID = b.A_ID
WHERE NOT EXISTS (SELECT 1
FROM tableC c
WHERE c.B_ID = b.B_ID AND c.B_Name = a.A_Name
);
If you want to find rows that are in one table but not the other you can use the EXCEPT union in standard SQL
-- Show in A not in B
SELECT * FROM TableA EXCEPT SELECT * FROM TableB
-- Show in A not in C
SELECT * FROM TableA EXCEPT SELECT * FROM TableC
-- Show in B not in A
SELECT * FROM TableB EXCEPT SELECT * FROM TableA
-- Show in B not in C
SELECT * FROM TableB EXCEPT SELECT * FROM TableC
-- Show in C not in A
SELECT * FROM TableC EXCEPT SELECT * FROM TableA
-- Show in C not in B
SELECT * FROM TableC EXCEPT SELECT * FROM TableB

SQL condition with NULL

I have the following working select:
SELECT
TableA.FullName
FROM
TableA,
TableB
WHERE
TableA.Contact = TableB.Contact
But I have some lines in TableB that have NULL in the TableB.Contact column, and I would like to show it. I tried:
WHERE
(TableA.Contact = TableB.Contact OR TableB.Contact IS NULL)
Without lucky.
You need to use a LEFT JOIN
SELECT
a.FullName, a.Contact, b.Contact
FROM TableA a
LEFT JOIN TableB b ON b.Contact = a.Contact
WHERE
(a.Contact = b.Contact OR b.Contact IS NULL)
The WHERE clause is redundant here - is does the same as the LEFT JOIN
You are using an outdated JOIN format. Try changing it to this:
SELECT column1, column2, column3 FROM
TableA as a
LEFT JOIN TableB as b
ON a.contact = b.contact
Using a left join will pull all records from A regardless if there is a match in B. This is a good explanation of how the various joins work:
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
Also, notice that I am specifically pulling only the columns I want instead of *. This is a good habit to get into with SQL.
UPDATE:
After your comments, it seems like you are looking for a FULL OUTER JOIN?
SELECT column1, column2
FROM TableA as a
FULL OUTER JOIN TableB as b
ON a.contact = b.contact
Try changing the join condition:
SELECT *
FROM TableA JOIN
TableB
ON TableA.Contact = TableB.Contact OR
TableA.Contact IS NULL AND TableB.Contact IS NULL;
A left outer join won't do exactly what you want. It keeps all rows in TableA, but doesn't check for matching values.
For performance reasons, the following might work better:
SELECT *
FROM TableA JOIN
TableB
ON TableA.Contact = TableB.Contact
UNION ALL
SELECT *
FROM TableA CROSS JOIN
TableB
WHERE TableA.Contact IS NULL AND TableB.Contact IS NULL;

In SQL, What's the difference a ON condition following a Join vs at the end of multiple JOINS

I have been having a hard time googling an answer for this, but....
can someone explain to me the difference between putting the ON condition of a JOIN with the the JOIN itself vs putting the ON at the end of all the other JOINs.
here is an example http://sqlfiddle.com/#!3/e0a0f/3
CREATE TABLE TableA (Email VARCHAR(100), SomeNameA VARCHAR(100))
CREATE TABLE Tableb (Email VARCHAR(100), SomeNameB VARCHAR(100))
CREATE TABLE Tablec (Email VARCHAR(100), SomeNameC VARCHAR(100))
INSERT INTO TableA SELECT 'joe#test.com', 'JoeA'
INSERT INTO TableA SELECT 'jan#test.com', 'JaneA'
INSERT INTO TableA SELECT 'dave#test.com', 'DaveA'
INSERT INTO TableB SELECT 'joe#test.com', 'JoeB'
INSERT INTO TableB SELECT 'dave#test.com', 'DaveB'
INSERT INTO TableC SELECT 'joe#test.com', 'JoeC'
INSERT INTO TableC SELECT 'dave#test.com', 'DaveC'
SELECT TOP 2 a.*,
b.*,
c.*
FROM TableA a
LEFT OUTER JOIN TableB b
ON a.email = b.email
INNER JOIN TableC c
ON c.Email = b.email;
SELECT TOP 2 a.*,
b.*,
c.*
FROM TableA a
LEFT OUTER JOIN TableB b
INNER JOIN TableC c
ON c.Email = b.email
ON a.email = b.email;
I don't understand why these two SELECT statements produce different results.
What matters is orders of joins. Treat your expressions as if every join produced temporary "virtual" table.
So when you write
FROM TableA a
LEFT OUTER JOIN TableB b ON a.email = b.email
INNER JOIN TableC c ON c.Email = b.email ;
then order is as follows:
TableA is left joined to TableB producing temporary relation V1
V1 is inner joined to TableC.
Meanhwile when you write:
FROM TableA a
LEFT OUTER JOIN TableB b
INNER JOIN TableC c ON c.Email = b.email ON a.email = b.email;
then order is as follows:
TableB is inner joined to TableC producing temporary relation V1.
TableA is left joined to V1.
Thus results are different. It is generally recommended to use parenthesis in such situations to improve readability of the query:
FROM TableA a
LEFT OUTER JOIN
(TableB b INNER JOIN TableC c ON c.Email = b.email)
ON a.email = b.email;
In your second example, the part ON a.email = b.email belongs to the LEFT JOIN.
If written like this, it means the following:
INNER JOIN TableC with TableB and LEFT OUTER JOIN the result with TableA.
The result will be all rows from TableA joined with those rows from TableB that also have an entry in TableC.
The first example means the following:
LEFT OUTER JOIN TableB with TableA and INNER JOIN TableC with the result. This is equivalent to using an INNER JOIN for TableB.
Explanation: When you LEFT OUTER JOIN TableA with TableB you will get all rows from TableA and for matching rows in TableB you will get that data, too. In your result set you will have rows with b.email = NULL and this will now be INNER JOINed with TableC. As long as there is no entry in TableC with email = NULL you will get the results you observed.

SELECT * FROM tableA, tableB WHERE Conditions [+]

I have the following query
SELECT *
FROM tableA, tableB
WHERE Conditions [+]
What does this keyword Conditions[+] Stands for?
How this query behaves as a outer join?
That is old Oracle Join syntax.
SELECT *
FROM tableA, tableB
WHERE Conditions [+] -- this should be tableA (+) = tableB
The positioning of the + sign denotes the JOIN syntax.
If you query was:
SELECT *
FROM tableA, tableB
WHERE tableA.id (+) = tableB.Id
Then it would be showing a RIGHT OUTER JOIN so the equivalent is:
SELECT *
FROM tableA
RIGHT OUTER JOIN tableB
ON tableB.id = tableA.Id
If the + sign was on the other side then it would be a LEFT OUTER JOIN
SELECT *
FROM tableA, tableB
WHERE tableA.id = tableB.Id (+)
is equivalent to
SELECT *
FROM tableA
LEFT OUTER JOIN tableB
ON tableA.id = tableB.Id
I would advise using standard join syntax though.
If you do not specify a + sign then it will be interpreted as an INNER JOIN
SELECT *
FROM tableA, tableB
WHERE tableA = tableB
it's equivalent is:
SELECT *
FROM tableA
INNER JOIN tableB
ON tableA.id = tableB.id
A FULL OUTER JOIN would be written using two SELECT statements and a UNION:
SELECT *
FROM tableA, tableB
WHERE tableA.id = tableB.Id (+)
UNION
SELECT *
FROM tableA, tableB
WHERE tableA.id (+) = tableB.Id
It's equivalent is:
SELECT *
FROM tableA
FULL OUTER JOIN tableB
ON tableA.id = tableB.id
Here is a tutorial that explains a lot of these:
Old Outer Join Syntax
It is not important how that behaves. You should use the standard syntax for outer joins:
select *
from tableA left outer join
tableB
on . . .
The "(+)" syntax was introduced by Oracle before the standard syntax, and it is highly out of date.
'Conditions' here just means what you're using to filter all this data.
LIke here's an example:
SELECT *
FROM tableA, tableB
WHERE Name like '%Bob%'
would return names that have "Bob" anywhere inside.
About outer joins, actually you'd use that in the FROM clause:
So maybe
SELECT *
FROM tableA ta
OUTER JOIN tableB tb
ON ta.name = tb.name
WHERE ta.age <> 10
and there where here is optional, by the way
I hate to just copy & paste an answer, but this sort of thing can be found pretty easily if you do a little searching...
An outer join returns rows for one table, even when there are no
matching rows in the other. You specify an outer join in Oracle by
placing a plus sign (+) in parentheses following the column names from
the optional table in your WHERE clause. For example:
SELECT ut.table_name, uc.constraint_name
FROM user_tables ut, user_constraints uc
WHERE ut.table_name = uc.table_name(+);
The (+) after uc.table_name makes the user_constraint table optional.
The query returns all tables, and where there are no corresponding
constraint records, Oracle supplies a null in the constraint name
column.

sql, outer join

I have two tables, linked with an outer join. The relationship between the primary and secondary table is a 1 to [0..n]. The secondary table includes a timestamp column indicating when the record was added. I only want to retrieve the most recent record of the secondary table for each row in the primary. I have to use a group by on the primary table due to other tables also part of the SELECT. There's no way to use a 'having' clause though since this secondary table is not part of the group.
How can I do this without doing multiple queries?
For performance, try to touch the table least times
Option 1, OUTER APPLY
SELECT *
FROM
table1 a
OUTER APPY
(SELECT TOP 1 TimeStamp FROM table2 b
WHERE a.somekey = b.somekey ORDER BY TimeStamp DESC) x
Option 2, Aggregate
SELECT *
FROM
table1 a
LEFT JOIN
(SELECT MAX(TimeStamp) AS maxTs, somekey FROM table2
GROUP BY somekey) x ON a.somekey = x.somekey
Note: each table is mentioned once, no correlated subqueries
Something like:
SELECT a.id, b.*
FROM table1 a
INNER JOIN table2 b ON b.parentid = a.id
WHERE b.timestamp = (SELECT MAX(timestamp) FROM table2 c WHERE c.parentid = a.id)
Use LEFT JOIN instead of INNER JOIN if you want to show rows for IDs in table1 without any matches in table2.
select *
from table1 left outer join table2 a on
table1.id = a.table1_id
where
not exists (select 1 from table2 b where a.table1_id = b.table1_id and b.timestamp > a.timestamp)
The quickest way I know of is this:
SELECT
A.*,
B.SomeField
FROM
Table1 A
INNER JOIN (
SELECT
B1.A_ID,
B1.SomeField
FROM
Table2 B1
LEFT JOIN Table2 B2 ON (B1.A_ID=B2.A_ID) AND (B1.TimeStmp < B2.TimeStmp)
WHERE
B2.A_ID IS NULL
) B ON B.A_ID = A.ID