NOT EXISTS / NOT IN clause in Hive

I have tried to find out how to use a NOT EXISTS / NOT IN clause in Hive and have not found a solution that Hive offers.
I basically need to find ids that exist in one table and do not exist in a second table, and I can't find a way around it. Please advise.

In Hive you can use a left join to emulate a NOT EXISTS type clause. If you share your SQL, I can be more precise. But here is a hint.
select
a.id
from
a
left outer join b on a.id = b.id
left outer join c on a.id = c.id
where
b.id is null -- make sure data doesn't exist in b
and c.id is not null -- make sure data exists in c
The left joins and the where clause make sure the data exists in table c but doesn't exist in b.
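The pattern above can be tried out on a small data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the sample ids in a, b and c are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (2), (4);
    INSERT INTO c VALUES (1), (2);
""")

# Ids in a that have no match in b (the NOT EXISTS / NOT IN equivalent).
not_in_b = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a
    LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""")]

# Ids in a that are missing from b but present in c.
in_c_not_b = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a
    LEFT OUTER JOIN b ON a.id = b.id
    LEFT OUTER JOIN c ON a.id = c.id
    WHERE b.id IS NULL
      AND c.id IS NOT NULL
    ORDER BY a.id
""")]

print(not_in_b)    # [1, 3]
print(in_c_not_b)  # [1]
```

Unmatched rows from the left join carry NULL in b.id, so filtering on b.id IS NULL keeps exactly the ids that b lacks.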

Related

Oracle join's order

I have a SQL statement.
select a.*,b.*,c.*
from a
inner join b
on a.id=b.id
left join c
on b.id=c.id
But I don't know whether it executes the inner join first, creates a temporary table (say temp), and finally left joins temp to c. Do inner join, left join and right join have the same level of precedence? Thank you!
SQL is not a procedural language. A SQL query describes the result set being produced. When interpreting a query, the join order is from left to right. So, in your query, the result set is the one produced by an inner join on a and b with that result being left joined to c.
You can add parentheses to avoid ambiguity:
from (a inner join
b
on a.id = b.id
) left join
c
on b.id = c.id
But that is unnecessary.
That is logically how the query is processed. The optimization engine can choose many different ways of executing the query, some of which might be pretty unrecognizable as relating to this particular query. The only guarantee is what the result set looks like.
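One way to check the logical left-to-right grouping is to compare the chained query with a version that builds the inner join first as a derived table. The sketch below uses Python's sqlite3 module rather than Oracle, and the sample rows and the alias ab are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (2), (3);
""")

# The query as written: joins are grouped from left to right.
chained = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    INNER JOIN b ON a.id = b.id
    LEFT JOIN c ON b.id = c.id
    ORDER BY a.id
""").fetchall()

# The same logic spelled out: inner join first, then left join the result to c.
stepwise = conn.execute("""
    SELECT ab.a_id, ab.b_id, c.id
    FROM (SELECT a.id AS a_id, b.id AS b_id
          FROM a INNER JOIN b ON a.id = b.id) AS ab
    LEFT JOIN c ON ab.b_id = c.id
    ORDER BY ab.a_id
""").fetchall()

print(chained)   # [(1, 1, None), (2, 2, 2)]
print(stepwise)  # [(1, 1, None), (2, 2, 2)]
```

Both forms return the same rows; how the engine physically executes either one is up to the optimizer.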

MS-Access SQL : Join expression not supported

I have 2 tables A and B. I want to create a third one, C. C must contain each record that is in A but not in B, and each record that is in A and B.
I've tried the following :
SELECT A.* INTO C FROM (A INNER JOIN B ON A.Id = B.Id) LEFT JOIN B ON A.Id = B.Id WHERE B.Id IS NULL;
But it gives me the error message : JOIN expression not supported.
When there's only the INNER JOIN or the LEFT JOIN, it works perfectly. But for some reason when I combine both with the brackets, it doesn't work.
I believe I am using MS-Access 2013, if that helps.
By the way, I'm an Access and an SQL newbie.
The correct logic is:
SELECT A.* INTO C
FROM A LEFT JOIN
B
ON A.Id = B.Id
WHERE B.Id IS NULL;
You do not need two joins. My guess is that the problem with your query is that B appears twice in the FROM clause, without a table alias. MS Access doesn't know what the second B refers to.

Having problems with SQL Joins

Table A
Table B
I tried to use LEFT OUTER JOIN but it doesn't seem to work.
I want the query to extract all data from Table A with 0 as average score if there is no data yet for the specified parameter. Meaning, in Figure 3, it should have shown ID 2 with 0 on s. Can anyone help me figure out the solution?
You have the table names switched in the join. To keep all of Table A, it needs to be the table listed on the left side of the left join. Also, any condition that should only affect the rows from table B, rather than filter the entire result, should be moved into the left join's ON clause. It should be:
SELECT a.id,
Avg(Isnull(b.score, 0)) AS s
FROM a
LEFT OUTER JOIN b
ON a.id = b.id
AND b.kind = 'X'
GROUP BY a.id
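The difference between filtering in the ON clause and filtering in the WHERE clause can be seen on a small example. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, uses COALESCE in place of Isnull (which SQLite does not provide), and the rows are invented for illustration.

```python
import sqlite3

# SQLite stand-in; table contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, kind TEXT, score REAL);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 'X', 80), (1, 'X', 90), (2, 'Y', 70);
""")

# Filter in the ON clause: every row of a survives, unmatched ids average to 0.
filter_in_on = conn.execute("""
    SELECT a.id, AVG(COALESCE(b.score, 0)) AS s
    FROM a
    LEFT OUTER JOIN b
      ON a.id = b.id
     AND b.kind = 'X'
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Filter in the WHERE clause: rows with no 'X' match are removed after the join.
filter_in_where = conn.execute("""
    SELECT a.id, AVG(COALESCE(b.score, 0)) AS s
    FROM a
    LEFT OUTER JOIN b
      ON a.id = b.id
    WHERE b.kind = 'X'
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(filter_in_on)     # [(1, 85.0), (2, 0.0)]
print(filter_in_where)  # [(1, 85.0)]
```

With the condition in WHERE, id 2 disappears because its only b row has kind 'Y', which effectively turns the outer join back into an inner join.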

Need help with a sql query that has an inner and outer join

I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.
Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.
Table A - Primary key named id1
Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
Table C - Contains two foreign keys, TableA.id1 and Foo.id2, additional interesting columns.
Constraints:
1. The SQL gets a set of id1s passed in as an argument.
2. It must return a list of Table C rows.
3. It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
4. A row MUST be returned for every id1 passed in, even if no Table C row exists.
At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.
Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.
So, here's what I came up with.
SELECT
a.id1,
c.*
FROM
TableB b
INNER JOIN
TableC c USING (id1,id2)
RIGHT OUTER JOIN
TableA a USING (id1)
WHERE
a.id1 in (x,y,z)
I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.
So, three questions in one.
Is this correct?
Did I use the Right Outer Join correctly?
Is there a cleaner way to achieve the same thing?
EDIT: DB is MySQL
You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:
SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...
to this:
SELECT ...
FROM a
LEFT OUTER JOIN (
b INNER JOIN c ON ...
) ON ...
You can use an EXISTS clause, which sometimes works better
SELECT
a.id1,
c.*
FROM TableA a
LEFT JOIN TableC c
ON c.id1 = a.id1 AND EXISTS (
select *
from TableB b
where b.id1=c.id1 and b.id2=c.id2)
WHERE
a.id1 in (x,y,z)
As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could join all records in B and C (B x C) before right joining to A.
The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).
Q's
Yes your query is correct
Yes
EXISTS should work better
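The EXISTS variant can be checked against the stated constraints on a toy data set. This sketch runs on SQLite through Python's sqlite3 module instead of MySQL; the column note and all rows are hypothetical, and the ids 1, 2, 3 stand in for the x, y, z passed as arguments.

```python
import sqlite3

# SQLite stand-in for MySQL; all rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id1 INTEGER PRIMARY KEY);
    CREATE TABLE TableB (id1 INTEGER, id2 INTEGER);
    CREATE TABLE TableC (id1 INTEGER, id2 INTEGER, note TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 10), (2, 20);
    INSERT INTO TableC VALUES (1, 10, 'kept'),
                              (1, 11, 'no B match'),
                              (3, 30, 'no B match');
""")

# LEFT JOIN to C, keeping a C row only when a matching B row exists.
rows = conn.execute("""
    SELECT a.id1, c.id2, c.note
    FROM TableA a
    LEFT JOIN TableC c
      ON c.id1 = a.id1
     AND EXISTS (SELECT 1
                 FROM TableB b
                 WHERE b.id1 = c.id1 AND b.id2 = c.id2)
    WHERE a.id1 IN (1, 2, 3)
    ORDER BY a.id1
""").fetchall()

print(rows)  # [(1, 10, 'kept'), (2, None, None), (3, None, None)]
```

Every requested id1 appears once, and only the Table C row that has a Table B counterpart is returned; the other ids come back with NULLs.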
Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.
Select a.id1,c.*
From
TableA a
Left Join TableB b on a.id1=b.id1
Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)
You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.

Meaning of (+) in SQL queries

I've come across some SQL queries in Oracle that contain '(+)' and I have no idea what that means. Can someone explain its purpose or provide some examples of its use?
Thanks
It's Oracle's synonym for OUTER JOIN.
SELECT *
FROM a, b
WHERE b.id(+) = a.id
gives same result as
SELECT *
FROM a
LEFT OUTER JOIN b
ON b.id = a.id
The (+) is a shortcut for OUTER JOIN; depending on which side you put it on, it indicates a LEFT or RIGHT OUTER JOIN.
Check the second entry in this forum post for some examples
You use this to assure that the table you're joining doesn't reduce the amount of records returned. So it's handy when you're joining to a table that may not have a record for every key you're joining on.
For example, if you were joining a Customer and Purchase table:
To list all customers and all their purchases, do an outer join (+) on the Purchase table so customers that haven't purchased anything still show up in your report.
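The Customer and Purchase case might look like the sketch below. SQLite does not accept Oracle's (+) notation, so the equivalent ANSI LEFT OUTER JOIN is used here through Python's sqlite3 module; the column names and rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stand-in for Oracle; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT);
    CREATE TABLE purchase (customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES (1, 'book'), (1, 'pen');
""")

# Outer join: customers without purchases still appear, with NULL for the item.
outer_rows = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    LEFT OUTER JOIN purchase p ON p.customer_id = c.id
    ORDER BY c.id, p.item
""").fetchall()

# Inner join for comparison: customers without purchases are dropped.
inner_rows = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    INNER JOIN purchase p ON p.customer_id = c.id
    ORDER BY c.id, p.item
""").fetchall()

print(outer_rows)  # [('Ann', 'book'), ('Ann', 'pen'), ('Bob', None)]
print(inner_rows)  # [('Ann', 'book'), ('Ann', 'pen')]
```

In Oracle's older notation the same outer join would put (+) on the purchase side of the join condition.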
IIRC, the + is used in older versions of Oracle to indicate an outer join in the pre-ANSI SQL join syntax. In other words:
select foo,bar
from a, b
where a.id = b.id(+)
is the equivalent of
select foo,bar
from a left outer join b
on a.id = b.id
NOTE: this may be backwards/slightly incorrect, as I've never used the pre-ANSI SQL syntax.