How to avoid cartesian product with NOT IN condition? - hive

My query looks like:
SELECT * FROM a WHERE a.col NOT IN (SELECT col FROM B)
When I execute the query I get the following error:
FAILED: SemanticException [Error 10052]: In strict mode, cartesian product is not allowed. If you really want to perform the operation, set hive.mapred.mode=nonstrict
Where is the cartesian product in my query and how can I avoid this error?

You can bypass NOT IN completely and use EXCEPT (if your Hive version supports it):
SELECT * FROM a EXCEPT SELECT * FROM a WHERE a.col IN (SELECT col FROM B)

You can achieve this with a left join:
select a.* from a left outer join b on a.col = b.col where b.col is NULL
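To check that the left-join rewrite returns the same rows as NOT IN, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive (an assumption: SQLite has no strict mode, so it only illustrates the result sets, not the Hive error). The sample rows are made up. It also shows one difference worth knowing: once b.col contains a NULL, NOT IN returns no rows at all, while the anti-join still returns the unmatched rows.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
""")

NOT_IN = "SELECT col FROM a WHERE col NOT IN (SELECT col FROM b) ORDER BY col"
ANTI_JOIN = """
    SELECT a.col
    FROM a
    LEFT OUTER JOIN b ON a.col = b.col
    WHERE b.col IS NULL
    ORDER BY a.col
"""

def run(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Without NULLs in b.col, both forms return the same rows.
not_in_rows = run(NOT_IN)
anti_join_rows = run(ANTI_JOIN)

# With a NULL in b.col, "x NOT IN (..., NULL)" is never true,
# so NOT IN filters out every row; the anti-join is unaffected.
conn.execute("INSERT INTO b VALUES (NULL)")
not_in_with_null = run(NOT_IN)
anti_join_with_null = run(ANTI_JOIN)
```

If b.col can be NULL and you need the exact NOT IN behaviour, the two queries are not interchangeable.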

Related

Standard SQL: LEFT JOIN by two conditions using BETWEEN

I have the following query in BigQuery:
#Standard SQL
SELECT *
FROM `Table_1`
LEFT JOIN `Table_2` ON (timestamp BETWEEN TimeStampStart AND TimeStampEnd)
But I get the following Error:
Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
If I use JOIN instead of LEFT JOIN, it works, but I want to keep all the rows from Table_1 (so also the ones which aren't matched to Table_2)
How to achieve this?
This is absolutely stupid... but the same query will work if you add a condition that matches a column from table1 with a column from table2:
WITH Table_1 AS (
SELECT CAST('2018-08-15' AS DATE) AS Timestamp, 'Foo' AS Foo
UNION ALL
SELECT CAST('2018-09-15' AS DATE), 'Foo'
), Table_2 AS (
SELECT CAST('2018-08-14' AS DATE) AS TimeStampStart, CAST('2018-08-16' AS DATE) AS TimeStampEnd, 'Foo' AS Bar
)
SELECT *
FROM Table_1
LEFT JOIN Table_2 ON Table_1.Foo = Table_2.Bar AND Table_1.Timestamp BETWEEN Table_2.TimeStampStart AND Table_2.TimeStampEnd
See if you have additional matching criteria that you can use (like another column that links table1 and table2 on equality).
A LEFT JOIN is always equivalent to the UNION of :
the INNER JOIN between the same two arguments on the same join predicate, and
the set of rows from the first argument for which no matching row is found (and properly extended with null values for all columns retained from the second argument)
That latter portion can be written as
SELECT T1.*, null as T2_C1, null as T2_C2, ...
FROM T1
WHERE NOT EXISTS (SELECT * FROM T2 WHERE <join predicate>)
So if you spell out the UNION you should be able to get there.
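As a rough check of that equivalence, the sketch below runs both forms against the sample data from the earlier answer, using Python's sqlite3 module as a stand-in for BigQuery (an assumption; SQLite accepts the BETWEEN-only LEFT JOIN directly, so it can compare the two). The table and column names t1, t2, ts, ts_start, ts_end are illustrative, and UNION ALL is used instead of UNION so that duplicate rows would be preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (ts TEXT, foo TEXT);
    CREATE TABLE t2 (ts_start TEXT, ts_end TEXT, bar TEXT);
    INSERT INTO t1 VALUES ('2018-08-15', 'Foo'), ('2018-09-15', 'Foo');
    INSERT INTO t2 VALUES ('2018-08-14', '2018-08-16', 'Foo');
""")

# The query we would like to write.
LEFT_JOIN = """
    SELECT t1.ts, t1.foo, t2.ts_start, t2.ts_end, t2.bar
    FROM t1
    LEFT JOIN t2 ON t1.ts BETWEEN t2.ts_start AND t2.ts_end
    ORDER BY 1
"""

# The same result spelled out: matched rows from the inner join,
# plus unmatched t1 rows padded with NULLs for the t2 columns.
SPELLED_OUT = """
    SELECT t1.ts, t1.foo, t2.ts_start, t2.ts_end, t2.bar
    FROM t1
    JOIN t2 ON t1.ts BETWEEN t2.ts_start AND t2.ts_end
    UNION ALL
    SELECT t1.ts, t1.foo, NULL, NULL, NULL
    FROM t1
    WHERE NOT EXISTS (
        SELECT 1 FROM t2 WHERE t1.ts BETWEEN t2.ts_start AND t2.ts_end
    )
    ORDER BY 1
"""

left_rows = conn.execute(LEFT_JOIN).fetchall()
union_rows = conn.execute(SPELLED_OUT).fetchall()
```

Both queries keep the unmatched 2018-09-15 row, with NULLs in the columns that come from the second table.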
Interesting. This works for me in standard SQL:
select *
from (select 1 as x) a left join
(select 2 as a, 3 as b) b
on a.x between b.a and b.b
I suspect you are using legacy SQL. If so, switch to standard SQL. (And drop the parentheses around the BETWEEN condition.)
The problem is:
#(Standard SQL)#
This doesn't do anything. Use:
#StandardSQL
Hi, as per the documentation, "(" has a special meaning, so please try without the brackets:
SELECT * FROM Table_1
LEFT JOIN Table_2 ON Table_1.timestamp >= Table_2.TimeStampStart AND Table_1.timestamp <= Table_2.TimeStampEnd
Documentation here

Is it possible to JOIN with a CTE in PostgreSQL?

I'm trying to write a query like this
WITH a AS (SELECT key FROM table)
SELECT *
FROM a
JOIN b;
which generates a syntax error in PostgreSQL 10.4.
Why does this produce an error?
It looks like I will be creating a view instead. Is there a better solution?
You are missing the JOIN condition:
WITH a AS (SELECT key FROM table)
SELECT *
FROM a
JOIN b ON a.key = b.key;
The problem is not the CTE, it is a simple syntax error:
SELECT *
FROM a
JOIN b
-- something missing here
Here, JOIN defaults to an INNER JOIN, which requires a condition saying which rows should be joined, generally either ON a.key = b.key or USING (key). The same would be true of a LEFT OUTER JOIN or RIGHT OUTER JOIN.
If you wanted all the possible combinations (rare, but occasionally useful), you would use CROSS JOIN:
SELECT *
FROM a
CROSS JOIN b;
Or the similar comma operator:
SELECT *
FROM a, b;
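To make the difference concrete, here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite is more lenient about a missing join condition, so this only illustrates the row counts). The tables and the column name k are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER);
    CREATE TABLE b (k INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3);
""")

# Every combination of rows: 3 rows in a times 2 rows in b.
cross_rows = conn.execute(
    "SELECT a.k, b.k FROM a CROSS JOIN b ORDER BY 1, 2"
).fetchall()

# The comma operator produces the same combinations.
comma_rows = conn.execute(
    "SELECT a.k, b.k FROM a, b ORDER BY 1, 2"
).fetchall()

# An inner join with a condition keeps only the matching pairs.
inner_rows = conn.execute(
    "SELECT a.k, b.k FROM a JOIN b ON a.k = b.k ORDER BY 1, 2"
).fetchall()
```

Here the cross join yields six rows, while the inner join on k yields only the two matching pairs.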

Main query results in Subquery

How can we reference the result set of the main query as a source table in a subquery?
Table A, Table C
Select
(Select * From a)
From
(Select tabA.*
From A tabA
Join C tabC
On tabA.id = tabC.id) as a
I get an "invalid object a" error here.
Presumably, you want a common table expression (CTE):
with a as (
select tabA.*
from A tabA Join
C tabC
on tabA.id = tabC.id
)
Select (Select * from a)
From a;
That said, your query makes no sense. The scalar subquery is probably going to be returning an error, either because of the number of rows or number of columns.
If you are using SQL Server, modify your query along the lines of the query below.
select * from
(select A.* from TableA A inner join TableB B on A.EmployeeID = B.EmployeeID ) a

SQL JOIN with OPENQUERY

How do I correctly join a query with an OPENQUERY?
Here is how my query is laid out right now. The query that is part of the OPENQUERY works by itself.
Select d.* from db.dbo.table d
left join (select * from OPENQUERY(otherSource,'
--working query
SELECT...
left join...
inner join..') OQ
ON d.col1 = OQ.col1
I am getting the error 'Incorrect syntax near 'ON'.
This syntax worked for me:
select
a.id, b.ItemId, a.Name, b.[Description]
from
[A_Database]..tblA a
inner join
openquery([linkedServerDbName], 'select * from [B_Database]..[TableToJoin]') b
ON
a.id = b.ItemId
You may have to reverse it, do your SELECT from the OPENQUERY. So something like:
SELECT * FROM OPENQUERY(remotesource,'SELECT blahblah from tableA') A
RIGHT JOIN tableB B ON B.col1 = A.col1

How to overcome the error on join union tables with other table

When we try to join a union of tables on one side with another table on the other side,
SELECT A.x,B.y FROM ([DataSet.Liad],[DataSet.Livne]) AS A INNER JOIN [DataSet.Names] AS B ON A.ID = B.ID LIMIT 10
we get this error:
Error: 2.1 - 0.0: JOIN cannot be applied directly to a table union or to a table wildcard function. Consider wrapping the table union or table wildcard function in a subquery (e.g., SELECT *).
To solve this error I suggest using a view.
Save the union query as a view, DataSet.LiadLivne:
SELECT * FROM [DataSet.Liad],[DataSet.Livne]
Then execute the original query using the view:
SELECT A.x,B.y FROM [DataSet.LiadLivne] AS A INNER JOIN [DataSet.Names] AS B ON A.ID = B.ID LIMIT 10
Enjoy
You need to write as:
SELECT A.x,
B.y
FROM
(SELECT x, ID
FROM [DataSet.Liad],[DataSet.Livne]) AS A
INNER JOIN [DataSet.Names] AS B ON A.ID = B.ID LIMIT 10