How to use result of LATERAL VIEW OUTER EXPLODE in a subquery?

I am trying to make a query like this:
select * from tbl
LATERAL VIEW OUTER explode(column) temp_tbl as the_col
WHERE (the_col IN (select column from tbl2))
and it gives this error:
Unsupported SubQuery Expression: Correlating expression cannot contain
unqualified column references
I looked at this answer and changed the query to:
select * from tbl
LATERAL VIEW OUTER explode(column) temp_tbl as the_col
WHERE (tbl.the_col IN (select column from tbl2))
and now I get this error:
FAILED: SemanticException Line XX:XX Invalid column reference
'the_col' in definition of SubQuery sq_1
What's going on here, and how can I fix it?

Try wrapping the lateral view in a subquery and filtering on the subquery's alias:
SELECT * FROM (
SELECT * FROM tbl
LATERAL VIEW OUTER explode(column) temp_tbl as the_col ) a
WHERE a.the_col IN (select column from tbl2);

Related

SemanticException Can not find <TABLE-NAME> in genColumnStatsTask (state=42000,code=40000)

I'm getting an exception for the following SQL query in Hive:
create table temp as
select t.type
from temp1 LATERAL VIEW posexplode(c1.array_of_struct_field1) dummy as alias1, t
union all
select t.type
from temp1 LATERAL VIEW posexplode(c1.array_of_struct_field2) dummy as alias1, t;
The query fails with this exception:
Error: Error while compiling statement: FAILED: SemanticException Can
not find database1.temp in genColumnStatsTask
(state=42000,code=40000)

Wrap the UNION ALL query in a subquery, as below:
create table temp as
select a.type from
(select t.type
from temp1 LATERAL VIEW posexplode(c1.array_of_struct_field1) dummy as alias1, t
union all
select t.type
from temp1 LATERAL VIEW posexplode(c1.array_of_struct_field2) dummy as alias1, t ) a;

Alternatively, try disabling automatic column statistics gathering before running the query:
set hive.stats.column.autogather=false;

Why select invalid field in subquery could run in BigQuery?

Consider the following SQL:
CREATE or replace TABLE
temp.t1 ( a STRING)
;
insert into temp.t1 values ('val_a');
CREATE or replace TABLE
temp.t2 (b STRING)
;
insert into temp.t2 values ('val_b');
create or replace table `temp.a1` as
select distinct b
from temp.t2
;
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`)
;
Since there is no column a in temp.a1, I expected an error here. However, BigQuery outputs:
Row a
1 val_a
Why does this happen?
On the other hand, running select distinct a from temp.a1; on its own fails with the error Unrecognized name: a.

Your query is:
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`);
You think this should be:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct a1.a from `temp.a1` a1);
And hence generate an error. However, the rules of SQL interpret this as:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct t1.a from `temp.a1` a1);
This is because SQL's scoping rules say that if a is not found in the subquery, the name is looked up in the outer query.
That is how SQL is defined.
The solution? Always qualify column references. Qualify means to include the table alias in the reference.
Also note that select distinct is meaningless in the subquery for an in, because in does not create duplicates. You should get rid of the distinct in the subquery.
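The scoping rule can be reproduced outside BigQuery. The following is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in. That is an assumption made for illustration: SQLite is not BigQuery, but it resolves unqualified names in a subquery against the outer query in the same way. The tables t1 and a1 mirror the question, without the temp dataset prefix.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT);
    INSERT INTO t1 VALUES ('val_a');
    CREATE TABLE a1 (b TEXT);
    INSERT INTO a1 VALUES ('val_b');
""")

# Unqualified: a1 has no column "a", so "a" resolves to the outer t1.a
# and the condition effectively becomes t1.a IN (t1.a).
unqualified = conn.execute(
    "SELECT DISTINCT a FROM t1 WHERE a IN (SELECT a FROM a1)"
).fetchall()

# Qualified: a1.a does not exist, so the mistake surfaces as an error.
try:
    conn.execute(
        "SELECT DISTINCT t1.a FROM t1 WHERE t1.a IN (SELECT a1.a FROM a1)"
    ).fetchall()
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)       # [('val_a',)]
print(qualified_error)   # an error message about the missing column
```

With qualified references the typo is rejected instead of silently matching every row.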

Hive Subquery in SELECT

I have a query like
SELECT name, salary/ (SELECT max(money) from table_sal) FROM table_a;
I get an error saying
Unsupported SubQuery Expression Invalid subquery. Subquery in SELECT could only be top-level expression
Is there a way to resolve this?

Does this work with a CROSS JOIN?
SELECT name, salary / s.max_money
FROM table_a CROSS JOIN
(SELECT max(money) as max_money from table_sal) s

You can also do it with a join on a constant dummy column, as below. Please let me know if it works for you.
SELECT t1.name,
       t1.salary / t2.max_money
FROM (SELECT name, salary, 1 AS dummy
      FROM table_a) t1
JOIN (SELECT max(money) AS max_money, 1 AS dummy
      FROM table_sal) t2
ON t1.dummy = t2.dummy;
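Both answers move the scalar aggregate out of the SELECT list and into the FROM clause. As a rough check that the two forms agree, here is a small sketch run against SQLite via Python's sqlite3 module. SQLite is only a stand-in for Hive here, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT, salary REAL);
    INSERT INTO table_a VALUES ('ann', 50.0), ('bob', 100.0);
    CREATE TABLE table_sal (money REAL);
    INSERT INTO table_sal VALUES (100.0), (200.0);
""")

# Form 1: CROSS JOIN against a one-row derived table holding the maximum.
cross_join_rows = conn.execute("""
    SELECT name, salary / s.max_money
    FROM table_a CROSS JOIN
         (SELECT max(money) AS max_money FROM table_sal) s
    ORDER BY name
""").fetchall()

# Form 2: equi-join on a constant dummy column.
dummy_join_rows = conn.execute("""
    SELECT t1.name, t1.salary / t2.max_money
    FROM (SELECT name, salary, 1 AS dummy FROM table_a) t1
    JOIN (SELECT max(money) AS max_money, 1 AS dummy FROM table_sal) t2
      ON t1.dummy = t2.dummy
    ORDER BY t1.name
""").fetchall()

print(cross_join_rows)  # [('ann', 0.25), ('bob', 0.5)]
print(cross_join_rows == dummy_join_rows)  # True
```

The dummy-column join is mainly useful on engines or versions that reject a join without an equality condition.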

Nested SELECT with a WHERE clause in Spark

I have a problem with running a Spark SQL query which uses a nested select with a "where in" clause. In the query below table1 represents a temporary table which comes from a more complicated query. In the end I want to substitute table1 with this query.
select * from (select * from table1) as table2
where (product, price)
in (select product, min(price) from table2 group by product)
The Spark error I get says:
AnalysisException: 'Table or view not found: table2;
How could I possibly change the query to make it work as intended?

The derived table (select * from table1) as table2 is not needed. Its alias is only visible to the query it is defined in; it cannot be used as a table name in the FROM clause of the subquery inside the IN condition. You can use a correlated subquery instead:
select t1.*
from table1 t1
where t1.price = (select min(t2.price) from table1 t2 where t2.product = t1.product);
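To see what the correlated subquery returns, here is a minimal sketch that runs the same statement on SQLite through Python's sqlite3 module. SQLite is used only as a stand-in for Spark SQL, and the product and price values are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Spark SQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (product TEXT, price REAL);
    INSERT INTO table1 VALUES
        ('apple', 3.0), ('apple', 1.0), ('pear', 2.0), ('pear', 5.0);
""")

# For each outer row, the subquery computes the minimum price of that
# row's product; only rows at that minimum pass the filter.
cheapest = conn.execute("""
    SELECT t1.*
    FROM table1 t1
    WHERE t1.price = (SELECT min(t2.price)
                      FROM table1 t2
                      WHERE t2.product = t1.product)
    ORDER BY t1.product
""").fetchall()

print(cheapest)  # [('apple', 1.0), ('pear', 2.0)]
```

If several rows of a product tie at the minimum price, all of them are returned.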

Standard SQL: LEFT JOIN by two conditions using BETWEEN

I have the following query in BigQuery:
#Standard SQL
SELECT *
FROM `Table_1`
LEFT JOIN `Table_2` ON (timestamp BETWEEN TimeStampStart AND TimeStampEnd)
But I get the following Error:
Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
If I use JOIN instead of LEFT JOIN, it works, but I want to keep all the rows from Table_1 (including the ones that aren't matched to Table_2).
How can I achieve this?

This is absolutely stupid... but the same query will work if you add a condition that matches a column from table1 with a column from table2:
WITH Table_1 AS (
SELECT CAST('2018-08-15' AS DATE) AS Timestamp, 'Foo' AS Foo
UNION ALL
SELECT CAST('2018-09-15' AS DATE), 'Foo'
), Table_2 AS (
SELECT CAST('2018-08-14' AS DATE) AS TimeStampStart, CAST('2018-08-16' AS DATE) AS TimeStampEnd, 'Foo' AS Bar
)
SELECT *
FROM Table_1
LEFT JOIN Table_2 ON Table_1.Foo = Table_2.Bar AND Table_1.Timestamp BETWEEN Table_2.TimeStampStart AND Table_2.TimeStampEnd
See if you have additional matching criteria that you can use (like another column that links table1 and table2 on equality).

A LEFT JOIN is always equivalent to the UNION of:
the INNER JOIN between the same two arguments on the same join predicate, and
the set of rows from the first argument for which no matching row is found (and properly extended with null values for all columns retained from the second argument)
That latter portion can be written as
SELECT T1.*, null as T2_C1, null as T2_C2, ...
FROM T1
WHERE NOT EXISTS (SELECT * FROM T2 WHERE <join predicate>)
So if you spell out the UNION you should be able to get there.
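The decomposition can be sketched as follows, using SQLite through Python's sqlite3 module as a stand-in for BigQuery. The table and column names (t1.ts, t2.ts_start, t2.ts_end, t2.bar) are hypothetical, and the sketch uses UNION ALL rather than UNION so that duplicate rows, if any, are preserved the way a LEFT JOIN would preserve them.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (ts TEXT);
    INSERT INTO t1 VALUES ('2018-08-15'), ('2018-09-15');
    CREATE TABLE t2 (ts_start TEXT, ts_end TEXT, bar TEXT);
    INSERT INTO t2 VALUES ('2018-08-14', '2018-08-16', 'Foo');
""")

# Reference result: a plain LEFT JOIN on the range condition.
left_join_rows = conn.execute("""
    SELECT t1.ts, t2.ts_start, t2.ts_end, t2.bar
    FROM t1 LEFT JOIN t2
      ON t1.ts BETWEEN t2.ts_start AND t2.ts_end
    ORDER BY 1
""").fetchall()

# Spelled out: matched rows from the INNER JOIN, plus unmatched t1 rows
# padded with NULLs for the t2 columns.
union_rows = conn.execute("""
    SELECT t1.ts, t2.ts_start, t2.ts_end, t2.bar
    FROM t1 JOIN t2
      ON t1.ts BETWEEN t2.ts_start AND t2.ts_end
    UNION ALL
    SELECT t1.ts, NULL, NULL, NULL
    FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2
                      WHERE t1.ts BETWEEN t2.ts_start AND t2.ts_end)
    ORDER BY 1
""").fetchall()

print(union_rows)
print(union_rows == left_join_rows)  # True
```

The same predicate has to appear in both the INNER JOIN and the NOT EXISTS branch, otherwise rows are lost or duplicated.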

Interesting. This works for me in standard SQL:
select *
from (select 1 as x) a left join
(select 2 as a, 3 as b) b
on a.x between b.a and b.b
I suspect you are using legacy SQL. If so, switch to standard SQL. (And drop the parentheses around the BETWEEN condition.)
The problem is:
#(Standard SQL)#
This doesn't do anything. Use:
#StandardSQL

According to the documentation, "(" has a special meaning, so please try it without the parentheses:
SELECT * FROM Table_1
LEFT JOIN Table_2 ON Table_1.timestamp >= Table_2.TimeStampStart AND Table_1.timestamp <= Table_2.TimeStampEnd
Documentation here
