Impala raises "AnalysisException: Syntax error" when using ROW_NUMBER() OVER

I have a query like this:
SELECT MONTH_ID, 'Total' AS cola, colb
FROM
(
SELECT A.*, ROW_NUMBER()OVER(PARTITION BY MONTH_ID,col3 ORDER BY col4 DESC) AS ROWN
FROM
(
SELECT A.*, B.col3
FROM table1 A
LEFT JOIN table2 B
ON A.col1 = B.col1
) A
)
WHERE ROWN=1
GROUP BY MONTH_ID
If I create an intermediate table from the subqueries, this query works. But when I run the entire thing, Impala raises: "AnalysisException: Syntax error in line 12:undefined: WHERE ROWN = 1 ^ Encountered: WHERE Expected: AS, DEFAULT, IDENTIFIER CAUSED BY: Exception: Syntax error"
I tried running this in Hive, and a different error shows: "Error while compiling statement: FAILED: ParseException line 20:4 cannot recognize input near 'WHERE' 'ROWN' '=' in subquery source"
Then I tried the same query in Oracle, and it works...
Could anyone explain why this is happening and how to solve this?
Thank you for your help ;)

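Impala and Hive require every subquery in the FROM clause to have an alias, whereas Oracle does not. Give the outer subquery an alias like this (see the comments in the code):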
SELECT MONTH_ID, 'Total' AS cola, colb
FROM
(
SELECT A.*, ROW_NUMBER()OVER(PARTITION BY MONTH_ID,col3 ORDER BY col4 DESC) AS ROWN
FROM
(
SELECT A.*, B.col3
FROM table1 A
LEFT JOIN table2 B
ON A.col1 = B.col1
) A
) B -- an alias is required here
WHERE ROWN=1
GROUP BY MONTH_ID, colb -- every selected column that is neither aggregated nor a constant must be in GROUP BY

Related

Hive Subquery in SELECT

I have a query like
SELECT name, salary/ (SELECT max(money) from table_sal) FROM table_a;
I get an error saying
Unsupported SubQuery Expression Invalid subquery. Subquery in SELECT could only be top-level expression
Is there a way to resolve this?
Does this work with a CROSS JOIN?
SELECT name, salary / s.max_money
FROM table_a CROSS JOIN
(SELECT max(money) as max_money from table_sal) s
You can also join on a dummy constant column, as below. Please let me know if it works for you.
Select t1.name
, t1.salary/t2.max_money
from
(SELECT name
, salary, 1 as dummy
from table_a ) t1
Join
(SELECT max(money) as max_money
, 1 as dummy
from table_sal) t2
on t1.dummy = t2.dummy ;
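The CROSS JOIN approach above can be tried out quickly with the following minimal sketch. It runs the query through Python's built-in sqlite3 module rather than Hive, so it only illustrates the shape of the join, not Hive's subquery restrictions. The sample rows are hypothetical; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT, salary REAL);
    CREATE TABLE table_sal (money REAL);
    INSERT INTO table_a VALUES ('ann', 50.0), ('bob', 100.0);
    INSERT INTO table_sal VALUES (100.0), (200.0);
""")

# The aggregate subquery yields exactly one row, so the CROSS JOIN
# attaches max_money to every row of table_a.
rows = conn.execute("""
    SELECT name, salary / s.max_money AS ratio
    FROM table_a
    CROSS JOIN (SELECT MAX(money) AS max_money FROM table_sal) s
    ORDER BY name
""").fetchall()

print(rows)
```

Because the derived table has a single row, the cross join does not multiply the rows of table_a.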

UPDATE statement in SQLite with nested FROM and JOIN

I currently have an SQL statement:
UPDATE table_1 SET
property_1=b.value_1,
property_2=b.value_2,
property_3=b.value_3
FROM (
SELECT a.property_4, a.property_5, b.value_2, b.value_3
FROM (
SELECT id1 AS property_4, MAX(id2) AS property_5
FROM table_2
WHERE
id1 IN (...) AND
id2 NOT IN (...)
) a
JOIN table_3 b ON
a.property_5 = b.id
) a
WHERE
table_1.id = a.property_4
which works fine on our production PostgreSQL db. However, the syntax for UPDATE is different in SQLite (which we use in test), and I am finding myself quite stuck as to how to convert it. The error I receive is Error: syntax error near FROM. If anyone is a SQLite whiz I would greatly appreciate some guidance.
SQLite versions before 3.33.0 don't support UPDATE with a FROM/JOIN clause. As an alternative, you can use a CTE and correlated subqueries:
WITH cte AS (
SELECT a.property_4, b.value_1, b.value_2, b.value_3
FROM (
SELECT id1 AS property_4, MAX(id2) AS property_5
FROM table_2
WHERE
id1 IN (...) AND
id2 NOT IN (...)
) a
JOIN table_3 b ON
a.property_5 = b.id
)
UPDATE table_1 SET
property_1=(select value_1 from cte where cte.property_4 = id),
property_2=(select value_2 from cte where cte.property_4 = id),
property_3=(select value_3 from cte where cte.property_4 = id)
WHERE
id IN (select property_4 from cte)
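A minimal sketch of the CTE-plus-correlated-subquery pattern, run through Python's built-in sqlite3 module. The sample rows are hypothetical, and the inner query uses GROUP BY id1 in place of the elided IN (...) filters, which is an assumption made only so the example is self-contained.

```python
import sqlite3

# Hypothetical sample data; names follow the question where possible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, property_1 TEXT, property_2 TEXT);
    CREATE TABLE table_2 (id1 INTEGER, id2 INTEGER);
    CREATE TABLE table_3 (id INTEGER PRIMARY KEY, value_1 TEXT, value_2 TEXT);
    INSERT INTO table_1 VALUES (1, 'old', 'old'), (2, 'old', 'old'), (3, 'old', 'old');
    INSERT INTO table_2 VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO table_3 VALUES (10, 'a10', 'b10'), (11, 'a11', 'b11'), (20, 'a20', 'b20');
""")

# Each SET expression looks up its value in the CTE for the row being updated.
# The assignments are separated by commas, and the WHERE clause keeps rows
# without a match in the CTE from being overwritten with NULL.
conn.execute("""
    WITH cte AS (
        SELECT a.property_4, b.value_1, b.value_2
        FROM (
            SELECT id1 AS property_4, MAX(id2) AS property_5
            FROM table_2
            GROUP BY id1  -- assumption: stands in for the elided filters
        ) a
        JOIN table_3 b ON a.property_5 = b.id
    )
    UPDATE table_1 SET
        property_1 = (SELECT value_1 FROM cte WHERE cte.property_4 = table_1.id),
        property_2 = (SELECT value_2 FROM cte WHERE cte.property_4 = table_1.id)
    WHERE id IN (SELECT property_4 FROM cte)
""")

updated = conn.execute(
    "SELECT id, property_1, property_2 FROM table_1 ORDER BY id"
).fetchall()
print(updated)
```

Row 3 has no match in the CTE and keeps its old values, which is what the final WHERE clause is for.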

Written a subquery that can return more than one field without using the Exists

The query below is supposed to pull records for fields with the max date.
I am getting an error
You have written a subquery that can return more than one field without using EXISTS reserved word in the Main query's FROM clause. Revise the SELECT statement of the subquery to request only one column.
Code:
SELECT *
FROM TableName
WHERE (((([Project_Name], [Date])) IN (SELECT Project_Name, MAX(Date)
FROM TableName
GROUP BY Project)));
You're probably thinking of a nested subquery used as a table, like the one below:
select a.*, b.col1, b.col2
from FirstTable A
join (Select Id, firstcolumn as col1, secondcolumn as col2
from SecondTable) B on b.ID = a.ID
It works pretty much like a regular join, except that you are joining to a subquery. Hope this helps:
SELECT A.*
FROM TableName A
INNER JOIN (select Project_Name, max(Date) MaxDate
from TableName
group by Project_Name) B
ON A.[Project_Name] = B.[Project_Name]
AND A.[Date] = B.MaxDate
A version using EXISTS() looks like this:
SELECT *
FROM TableName AS A
WHERE EXISTS(
SELECT * FROM (
SELECT B.Project_Name, MAX( B.Date ) AS MaxDate
FROM TableName AS B
GROUP BY B.Project_Name ) AS C
WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A.Date
);
That said, I suspect this will perform worse than a JOIN, because the GROUP BY subquery might be re-evaluated for every row that the EXISTS check is applied to...
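To check that the JOIN and EXISTS forms above select the same rows, here is a minimal sketch using Python's built-in sqlite3 module instead of the original database. The Note column and the sample rows are hypothetical, and "Date" is quoted to avoid any clash with the date() function.

```python
import sqlite3

# Hypothetical sample data; the Note column is added for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableName (Project_Name TEXT, "Date" TEXT, Note TEXT);
    INSERT INTO TableName VALUES
        ('alpha', '2020-01-01', 'first'),
        ('alpha', '2020-03-01', 'latest'),
        ('beta',  '2020-02-01', 'only');
""")

# Join each row to the per-project maximum date and keep the matches.
join_rows = conn.execute("""
    SELECT A.Project_Name, A."Date", A.Note
    FROM TableName A
    INNER JOIN (
        SELECT Project_Name, MAX("Date") AS MaxDate
        FROM TableName
        GROUP BY Project_Name
    ) B
    ON A.Project_Name = B.Project_Name AND A."Date" = B.MaxDate
    ORDER BY A.Project_Name
""").fetchall()

# Same filter expressed with EXISTS over the grouped subquery.
exists_rows = conn.execute("""
    SELECT A.Project_Name, A."Date", A.Note
    FROM TableName AS A
    WHERE EXISTS (
        SELECT 1 FROM (
            SELECT B.Project_Name, MAX(B."Date") AS MaxDate
            FROM TableName AS B
            GROUP BY B.Project_Name
        ) AS C
        WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A."Date"
    )
    ORDER BY A.Project_Name
""").fetchall()

print(join_rows)
print(exists_rows)
```

Both queries group by Project_Name, the same column that the subquery selects.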

ParseException - mismatched input in subquery source - error in Hive

I am running the following query in Hive:
SELECT COUNT(*)
FROM
(
SELECT *
FROM
(SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a,
(SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
WHERE a.id=b.cid
)
WHERE count_p_id < count_q_cid;
I keep getting an error like
ParseException line 1:103 mismatched input ',' expecting ) near 'a' in subquery source
What is the problem with the code? I can't see any.
Implicit join notation is supported starting with Hive 0.13.0. This allows the FROM clause to join a comma-separated list of tables, omitting the JOIN keyword. For example:
SELECT *
FROM table1 t1, table2 t2
WHERE t1.id = t2.id
I assume you are using a Hive version earlier than 0.13.0. If so, try this: use JOIN ... ON rather than a comma-separated table list with WHERE.
SELECT COUNT(*)
FROM
(
SELECT *
FROM
(SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a JOIN
(SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
ON a.id=b.cid
) c -- Hive also requires an alias on a subquery in FROM
WHERE count_p_id < count_q_cid;

"with... as" in SQL Navigator

The following query works:
select count(*) from everything where num not in (select num from sometable)
The following query is supposed to be equivalent to the above, but results in an "invalid identifier" error:
with unwanted as (select num from sometable)
select count(*) from everything where num not in unwanted
What is wrong with the second query?
The CTE name has to be queried like a table, so the syntax is like this:
with unwanted as (select num from sometable)
select count(*) from everything where num not in (select * from unwanted)
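Obviously, this only makes sense if the "select num from sometable" part is more complex or is used several times later...
You can also use a LEFT JOIN as an anti-join, which may perform better. Note that the two forms differ when sometable.num contains NULL: NOT IN then matches no rows at all, while the anti-join still returns the unmatched rows.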
WITH unwanted
AS
(
SELECT num
FROM sometable
)
SELECT COUNT(*)
FROM everything a
LEFT JOIN unwanted b
ON a.num = b.num
WHERE b.num IS NULL
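The NOT IN form and the LEFT JOIN form give the same count only while the subquery returns no NULLs. The following minimal sketch shows this using Python's built-in sqlite3 module rather than Oracle. The sample rows are hypothetical; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE everything (num INTEGER);
    CREATE TABLE sometable (num INTEGER);
    INSERT INTO everything VALUES (1), (2), (3);
    INSERT INTO sometable VALUES (2);
""")

NOT_IN_SQL = """
    WITH unwanted AS (SELECT num FROM sometable)
    SELECT COUNT(*) FROM everything
    WHERE num NOT IN (SELECT num FROM unwanted)
"""

ANTI_JOIN_SQL = """
    WITH unwanted AS (SELECT num FROM sometable)
    SELECT COUNT(*)
    FROM everything a
    LEFT JOIN unwanted b ON a.num = b.num
    WHERE b.num IS NULL
"""

def counts():
    """Return (NOT IN count, anti-join count) for the current data."""
    return (
        conn.execute(NOT_IN_SQL).fetchone()[0],
        conn.execute(ANTI_JOIN_SQL).fetchone()[0],
    )

before_null = counts()  # no NULLs in sometable: both forms agree

# A single NULL makes every NOT IN comparison evaluate to unknown,
# so the NOT IN query counts nothing; the anti-join is unaffected.
conn.execute("INSERT INTO sometable VALUES (NULL)")
after_null = counts()

print(before_null, after_null)
```

If sometable.num can be NULL, filtering the NULLs out inside the CTE keeps the NOT IN form consistent with the join.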