ParseException - mismatched input in subquery source - error in Hive - sql

I am running the following query in Hive:
SELECT COUNT(*)
FROM
(
SELECT *
FROM
(SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a,
(SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
WHERE a.id=b.cid
)
WHERE count_p_id < count_q_cid;
I keep getting an error like:
ParseException line 1:103 mismatched input ',' expecting ) near 'a' in subquery source
What is the problem with the code? I can't see any.

Implicit join notation is supported starting with Hive 0.13.0. This allows the FROM clause to join a comma-separated list of tables, omitting the JOIN keyword. For example:
SELECT *
FROM table1 t1, table2 t2
WHERE t1.id = t2.id
I suspect you are using a version earlier than 0.13.0. If so, you have to use JOIN ... ON instead of a comma-separated table list with WHERE.
Try this:
SELECT COUNT(*)
FROM
(
SELECT *
FROM
(SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a JOIN
(SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
ON a.id=b.cid
) c -- alias added: Hive expects a name for a subquery in FROM
WHERE count_p_id < count_q_cid;
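As a rough check that the JOIN ... ON rewrite above is equivalent to the comma-separated form, the two queries can be compared on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite accepts both forms, so it cannot reproduce the Hive ParseException, only the matching results). The sample rows and the alias c are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE palladion (id INTEGER);
    CREATE TABLE operations (cid INTEGER);
    INSERT INTO palladion VALUES (1), (2), (2), (3);
    INSERT INTO operations VALUES (1), (1), (2), (2), (3), (3), (3);
""")

# Comma-separated (implicit) join, as in the question.
implicit_join = """
SELECT COUNT(*) FROM (
    SELECT * FROM
        (SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a,
        (SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
    WHERE a.id = b.cid
) c
WHERE count_p_id < count_q_cid
"""

# Explicit JOIN ... ON, as in the answer.
explicit_join = """
SELECT COUNT(*) FROM (
    SELECT * FROM
        (SELECT id, COUNT(*) AS count_p_id FROM palladion GROUP BY id) a
        JOIN
        (SELECT cid, COUNT(*) AS count_q_cid FROM operations GROUP BY cid) b
        ON a.id = b.cid
) c
WHERE count_p_id < count_q_cid
"""

implicit_count = conn.execute(implicit_join).fetchone()[0]
explicit_count = conn.execute(explicit_join).fetchone()[0]
print(implicit_count, explicit_count)
```

Both queries count the ids whose row count in palladion is smaller than in operations, so the two numbers should match.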

Related

multiple subquery error "invalid identifier"

I'm working on this code, but I keep getting an invalid identifier error for t2.nabp_num:
with t1 as (query1),
t2 as (query2)
Select t1.*, t2.device_count,
d.* from t1
inner join t2 on
t1.nabp_num = t2.nabp_num and
t1.dt = t2.dt and
t1.d_member = t2.d_member
inner join drug_product d on
t1.d_product_id = d.product_id
order by claim_count desc;
I get an invalid identifier error.
As already commented, you can't reference something that doesn't exist.
It would certainly help if you posted the actual query instead of an invalid one:
with t1 as (query1 --> what is "query1"?
Order by ...
Anyway: as NABP_NUM is referenced here:
inner join t2 on t1.nabp_num = t2.nabp_num
it means that it has to be part of both t1 and t2. However, as the t2 CTE's result is derived from t1, maybe you don't need t1 at all ...
If you add all columns that are currently missing in either the select column list or the group by clause, the query would look like this (see comments within code):
WITH
t1
AS
(SELECT d_member_id,
dt,
device_type,
claim_id,
nabp_num, --> add NABP_NUM
d_member_hq_id, --> add D_MEMBER_HQ_ID
d_drug_product_id --> add D_DRUG_PRODUCT_ID
FROM some_table --> which table?
), --> remove ORDER BY, it is useless here
t2
AS
( SELECT d_member_id,
dt,
nabp_num, --> add NABP_NUM
d_member_hq_id, --> add D_MEMBER_HQ_ID
COUNT (DISTINCT device_type) AS device_count,
COUNT (DISTINCT claim_ID) AS claim_count
FROM t1
GROUP BY d_member_id, dt, nabp_num, d_member_hq_id) --> add NABP_NUM and D_MEMBER_HQ_ID
SELECT t1.*, t2.device_count, d.*
FROM t1
INNER JOIN t2
ON t1.nabp_num = t2.nabp_num
AND t1.dt = t2.dt
AND t1.d_member_hq_id = t2.d_member_hq_id
INNER JOIN vmd_drug_product d
ON t1.d_drug_product_id = d.d_drug_product_id
ORDER BY t2.claim_count DESC;
Even though this shouldn't return any syntax errors any more (presuming columns used here really exist in some_table), I can't tell whether this will - or will not - return desired result.

Impala raise " AnalysisException: Syntax error" when using ROW_NUMBER() OVER

I have a query like this:
SELECT MONTH_ID, 'Total' AS cola, colb
FROM
(
SELECT A.*, ROW_NUMBER()OVER(PARTITION BY MONTH_ID,col3 ORDER BY col4 DESC) AS ROWN
FROM
(
SELECT A.*, B.col3
FROM table1 A
LEFT JOIN table2 B
ON A.col1 = B.col1
) A
)
WHERE ROWN=1
GROUP BY MONTH_ID
If I create an intermediate table from the subqueries, this query works. But when I run the entire thing, Impala raises: "AnalysisException: Syntax error in line 12:undefined: WHERE ROWN = 1 ^ Encountered: WHERE Expected: AS, DEFAULT, IDENTIFIER CAUSED BY: Exception: Syntax error"
I tried running this in Hive, and a different error shows: "Error while compiling statement: FAILED: ParseException line 20:4 cannot recognize input near 'WHERE' 'ROWN' '=' in subquery source"
Then I tried the same query in Oracle, and it works...
Could anyone explain why this is happening and how to solve this?
Thank you for your help ;)
The subquery must have an alias, like this (see the comments in the code):
SELECT MONTH_ID, 'Total' AS cola, colb
FROM
(
SELECT A.*, ROW_NUMBER()OVER(PARTITION BY MONTH_ID,col3 ORDER BY col4 DESC) AS ROWN
FROM
(
SELECT A.*, B.col3
FROM table1 A
LEFT JOIN table2 B
ON A.col1 = B.col1
) A
) B -- the alias is required
WHERE ROWN=1
GROUP BY MONTH_ID, colb -- all columns that are neither aggregated nor constants must be in GROUP BY
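To see what the corrected query returns, here is a small sketch that runs the same shape of query through Python's sqlite3 module. This is an assumption-laden stand-in: SQLite (3.25 or later for window functions) does not require the subquery alias, so it only illustrates the result, not the Impala or Hive error. The sample rows and the ORDER BY on the outer query are additions for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up rows (needs SQLite 3.25+ for ROW_NUMBER).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (MONTH_ID INTEGER, col1 TEXT, col4 INTEGER, colb TEXT);
    CREATE TABLE table2 (col1 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES (1, 'k1', 10, 'x'), (1, 'k1', 20, 'y'), (2, 'k2', 5, 'z');
    INSERT INTO table2 VALUES ('k1', 'g1'), ('k2', 'g2');
""")

query = """
SELECT MONTH_ID, 'Total' AS cola, colb
FROM (
    SELECT A.*,
           ROW_NUMBER() OVER (PARTITION BY MONTH_ID, col3 ORDER BY col4 DESC) AS ROWN
    FROM (
        SELECT A.*, B.col3
        FROM table1 A
        LEFT JOIN table2 B ON A.col1 = B.col1
    ) A
) B  -- named subquery, as Impala and Hive require
WHERE ROWN = 1
GROUP BY MONTH_ID, colb
ORDER BY MONTH_ID
"""

# Keeps the row with the highest col4 in each (MONTH_ID, col3) partition.
top_rows = conn.execute(query).fetchall()
print(top_rows)
```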

Hive Subquery in SELECT

I have a query like
SELECT name, salary/ (SELECT max(money) from table_sal) FROM table_a;
I get an error saying
Unsupported SubQuery Expression Invalid subquery. Subquery in SELECT could only be top-level expression
Is there a way to resolve this?
Does this work with a CROSS JOIN?
SELECT name, salary / s.max_money
FROM table_a CROSS JOIN
(SELECT max(money) as max_money from table_sal) s
You can also do this as below, please let me know if it works for you.
Select t1.name
, t1.salary/T2.max_money
from
(SELECT name
, salary, 1 as dummy
from table_a ) t1
Join
(SELECT max(money) as max_money
, 1 as dummy
from table_sal) t2
on t1.dummy = t2.dummy ;
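Both suggestions above attach the single aggregated row to every row of table_a, so they should give identical results. A minimal sketch using Python's sqlite3 module as a stand-in for Hive (an assumption, since SQLite would also accept the original scalar subquery); the sample names and amounts are invented:

```python
import sqlite3

# In-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT, salary REAL);
    CREATE TABLE table_sal (money REAL);
    INSERT INTO table_a VALUES ('ann', 50.0), ('bob', 100.0);
    INSERT INTO table_sal VALUES (120.0), (200.0);
""")

# CROSS JOIN against the one-row aggregate.
cross_join_rows = conn.execute("""
    SELECT name, salary / s.max_money AS ratio
    FROM table_a
    CROSS JOIN (SELECT MAX(money) AS max_money FROM table_sal) s
    ORDER BY name
""").fetchall()

# Inner join on a constant dummy key, which matches every pair of rows.
dummy_key_rows = conn.execute("""
    SELECT t1.name, t1.salary / t2.max_money AS ratio
    FROM (SELECT name, salary, 1 AS dummy FROM table_a) t1
    JOIN (SELECT MAX(money) AS max_money, 1 AS dummy FROM table_sal) t2
      ON t1.dummy = t2.dummy
    ORDER BY t1.name
""").fetchall()

print(cross_join_rows)
print(dummy_key_rows)
```

The dummy-key variant is mainly useful on engines or versions that reject CROSS JOIN; otherwise the CROSS JOIN form is shorter.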

SQL: Unable to use 'with as' to save selected result

I have an inner join result that I want to save by using WITH ... AS, but I received an error. I'm using Snowflake.
My code:
with t as (select *
from
(select ID, PRICE from DB.TABLE1
WHERE PRICE IS NOT NULL and ID = '1111') A
inner join
(select ID, BID, ACCEPTED from DB.TABLE2
WHERE BID IS NOT NULL and ID = '1111') B
ON A.ID = B.ID);
Error: SQL compilation error: syntax error line 8 at position 25 unexpected ';'.
If I only run the inner join
select *
from
(select ID, PRICE from DB.TABLE1
WHERE PRICE IS NOT NULL and ID = '1111') A
inner join
(select ID, BID, ACCEPTED from DB.TABLE2
WHERE BID IS NOT NULL and ID = '1111') B
ON A.ID = B.ID
I got this result
ID, PRICE,ID,BIDS,ACCEPTED
1111,180,1111,200,FALSE
1111,180,1111,180,FALSE
1111,180,1111,180,FALSE
1111,180,1111,100,TRUE
Any idea why I got the error message?
You use with to essentially create an alias (called a common table expression) for the query that can then be used in that specific query. All you've done is create the alias without using it. You need something like:
with t as (select *
from
(select ID, PRICE from DB.TABLE1
WHERE PRICE IS NOT NULL and ID = '1111') A
inner join
(select ID, BID, ACCEPTED from DB.TABLE2
WHERE BID IS NOT NULL and ID = '1111') B
ON A.ID = B.ID)
select * from t
Although obviously you'd usually do more complex work than that or else you'd just write the base query without using with
WITH is syntax used to introduce a common table expression. This is an expression used within a single query. It is a lot like a subquery in the FROM clause, except that it can be referenced more than once.
So a correct usage would be:
with t as (
select . . .
)
select count(*)
from t;
In other words, you need to follow the with with something that uses the CTE. Otherwise, you want to store the results in a real table -- temporary or otherwise.
To use CTEs here, define each table expression first and then join them in the main query:
with t as
(select ID, PRICE from DB.TABLE1
WHERE PRICE IS NOT NULL and ID = '1111') ,
t1 as
(select ID, BID, ACCEPTED from DB.TABLE2
WHERE BID IS NOT NULL and ID = '1111')
select *
from t
inner join
t1
on t.ID = t1.ID;
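The point made in the answers above, that a WITH clause is not a statement by itself, can be reproduced in most engines. The sketch below uses Python's sqlite3 module rather than Snowflake (an assumption: the error text differs, but the rule is the same), with table names shortened to table1 and table2 and rows modelled on the result shown in the question.

```python
import sqlite3

# In-memory SQLite database; rows mirror the sample output in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT, price INTEGER);
    CREATE TABLE table2 (id TEXT, bid INTEGER, accepted INTEGER);
    INSERT INTO table1 VALUES ('1111', 180);
    INSERT INTO table2 VALUES ('1111', 200, 0), ('1111', 180, 0),
                              ('1111', 180, 0), ('1111', 100, 1);
""")

cte = """
WITH t AS (
    SELECT a.id, a.price, b.bid, b.accepted
    FROM (SELECT id, price FROM table1
          WHERE price IS NOT NULL AND id = '1111') a
    INNER JOIN (SELECT id, bid, accepted FROM table2
                WHERE bid IS NOT NULL AND id = '1111') b
    ON a.id = b.id
)
"""

# The WITH clause alone is an incomplete statement and is rejected.
try:
    conn.execute(cte)
    cte_alone_ok = True
except sqlite3.Error:
    cte_alone_ok = False

# Followed by a query that uses t, the same text is valid.
row_count = conn.execute(cte + "SELECT COUNT(*) FROM t").fetchone()[0]
print(cte_alone_ok, row_count)
```

To keep the result beyond a single statement, a real or temporary table is needed instead of a CTE, as one of the answers notes.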

BigQuery Subtract Counts of Two Tables?

In MySQL I can do SELECT (SELECT COUNT(*) FROM table1) - (SELECT COUNT(*) FROM table2) to get the difference in counts between two tables. When I try this in BigQuery, I get: Subselect not allowed in SELECT clause. How do I run a query like this in BigQuery?
2019 update:
The syntax from the original question is now supported with #standardSQL:
SELECT (SELECT COUNT(*) c FROM `publicdata.samples.natality`)
- (SELECT COUNT(*) c FROM `publicdata.samples.shakespeare`)
In legacy SQL, where subselects are not supported inside the SELECT clause, I would use a CROSS JOIN for this specific query:
SELECT a.c - b.c
FROM
(SELECT COUNT(*) c FROM [publicdata:samples.natality]) a
CROSS JOIN
(SELECT COUNT(*) c FROM [publicdata:samples.shakespeare]) b
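The two forms discussed above compute the same number. A minimal sketch that compares them with Python's sqlite3 module (an assumption: SQLite stands in for BigQuery here, and the tables table1 and table2 with their rows are invented for illustration):

```python
import sqlite3

# In-memory SQLite database: table1 has 5 rows, table2 has 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER);
    CREATE TABLE table2 (x INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO table2 VALUES (1), (2), (3);
""")

# Scalar subqueries in the SELECT list, as in the question.
scalar_diff = conn.execute("""
    SELECT (SELECT COUNT(*) FROM table1) - (SELECT COUNT(*) FROM table2)
""").fetchone()[0]

# CROSS JOIN of two one-row aggregates, as in the answer.
cross_diff = conn.execute("""
    SELECT a.c - b.c
    FROM (SELECT COUNT(*) AS c FROM table1) a
    CROSS JOIN (SELECT COUNT(*) AS c FROM table2) b
""").fetchone()[0]

print(scalar_diff, cross_diff)
```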