I want to get the latest records from a Hive table using the following query:
WITH lot as (select *
from to_burn_in as a where a.rel_lot='${Rel_Lot}')
select a.* from lot AS a
where not exists (select 1 from lot as b
where a.Rel_Lot=b.Rel_Lot and a.SerialNum=b.SerialNum and a.Test_Stage=b.Test_Stage
and cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP))
order by a.SerialNum
This query throws the following error:
Error while compiling statement: FAILED: SemanticException line 0:undefined:-1 Unsupported SubQuery Expression 'Test_Datetime': SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition.
I tried replacing the less-than operator in the subquery with an equals operator, and the query runs fine. I read the Hive documentation at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
and couldn't figure out why it throws an error, since subqueries in the WHERE clause are supported.
What might be the problem here?
EXISTS actually works the same way as a join. Non-equality join conditions are not supported in Hive prior to Hive 2.2.0 (see HIVE-15211, HIVE-15251).
It seems you are trying to get the records with the latest timestamp per Rel_Lot, SerialNum, Test_Stage. Your query can be rewritten using the dense_rank() or rank() function:
WITH lot as (select *
from to_burn_in as a where a.rel_lot='${Rel_Lot}'
)
select * from
(
select a.*,
dense_rank() over(partition by Rel_Lot,SerialNum,Test_Stage order by cast(a.test_datetime as TIMESTAMP) desc) as rnk
from lot AS a
)s
where rnk=1
order by s.SerialNum
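One thing to keep in mind with the rewrite above: dense_rank() and rank() keep every row that ties for the latest timestamp, while row_number() keeps exactly one row per group. Here is a minimal sketch of the difference. It uses Python's built-in sqlite3 as a stand-in for Hive (an assumption, and it needs SQLite 3.25+ for window functions); the sample rows are made up, and the timestamps are ISO-formatted strings so they sort correctly without a cast.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE to_burn_in "
    "(Rel_Lot TEXT, SerialNum TEXT, Test_Stage TEXT, test_datetime TEXT)"
)
# Hypothetical sample data; ISO timestamps compare correctly as text.
rows = [
    ("L1", "S1", "T0", "2020-01-01 10:00:00"),
    ("L1", "S1", "T0", "2020-01-02 10:00:00"),  # latest for S1
    ("L1", "S2", "T0", "2020-01-03 09:00:00"),  # tied latest for S2
    ("L1", "S2", "T0", "2020-01-03 09:00:00"),  # tied latest for S2
    ("L1", "S2", "T0", "2020-01-01 09:00:00"),
    ("L2", "S9", "T0", "2020-02-01 00:00:00"),  # different lot, filtered out
]
conn.executemany("INSERT INTO to_burn_in VALUES (?, ?, ?, ?)", rows)

# Same shape as the query above; {fn} is swapped between ranking functions.
QUERY = """
WITH lot AS (SELECT * FROM to_burn_in WHERE Rel_Lot = ?)
SELECT SerialNum, test_datetime
FROM (
    SELECT a.*,
           {fn}() OVER (PARTITION BY Rel_Lot, SerialNum, Test_Stage
                        ORDER BY test_datetime DESC) AS rnk
    FROM lot AS a
) s
WHERE rnk = 1
ORDER BY SerialNum
"""


def latest(fn):
    """Return the rnk = 1 rows for lot 'L1' using the given ranking function."""
    return conn.execute(QUERY.format(fn=fn), ("L1",)).fetchall()


dense = latest("dense_rank")   # keeps both tied S2 rows
single = latest("row_number")  # keeps exactly one row per group
print(len(dense), len(single))
```

If duplicates on the latest timestamp are possible and you want a single row per group, row_number() is the one to use; with it, which of the tied rows survives is arbitrary unless you add a tie-breaker to the ORDER BY.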
I want to pull a slice of rows from Hive, for example the rows with row_number between 772001 and 773000.
My SQL is as below:
select * from (
select *, row_number() over (order by `name`) as row_dsa
from `jck_bonc_demo`.`frjc_jbxx`
)tmp_table where row_dsa between 772001 and 773000
and I get the following error:
[Cloudera][Hardy] (80) Syntax or semantic analysis error thrown in
server while executing query. Error message from server: Error while
compiling statement: FAILED: SemanticException Failed to breakup
Windowing invocations into Groups. At least 1 group must only depend
on input columns. Also check for circular dependencies.
What can I do about this error? Can anyone help?
I think this is the syntax you want:
select *
from (select *, row_number() over (order by `name`) as row_dsa
from `jck_bonc_demo`.`frjc_jbxx`
) x
where row_dsa between 772001 and 773000;
You need a subquery to use row_dsa in a where clause.
If you want to select all columns from the table plus one more calculated column, use select s.*, ... (with a table alias) rather than select *. Also, there is no need to back-quote non-reserved words:
select *
from (select s.*, row_number() over (order by name) as row_dsa
from jck_bonc_demo.frjc_jbxx s
) x
where row_dsa between 772001 and 773000;
There was a bug in my program: name is not a column of the specified table. The error message is weird, though. Thanks for your answers #Gordon Linoff #eftjoin
I need to query all columns of a table for all customers, keeping only the latest version for each customer.
My table:
My Query:
SELECT DISTINCT ON(code)
code,
namefile,
versioncol,
status
FROM table_A
ORDER BY versioncol desc
Error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON(code)
Postgres' error message is trying to tell you what to do:
DISTINCT ON expressions must match initial ORDER BY expressions
Actually that's quite clear: to make your code a valid DISTINCT ON query, you just need to add code (that's the DISTINCT ON expression) as the first sorting criterion of the query (i.e. as the initial ORDER BY expression).
SELECT DISTINCT ON(code) a.*
FROM table_A a
ORDER BY code, versioncol DESC
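DISTINCT ON keeps the first row of each code group according to the ORDER BY, so the query above returns the highest versioncol per code. For comparison, here is a rough sketch of the same "latest version per code" result written with row_number(), which also works on engines that have no DISTINCT ON. It runs on Python's built-in sqlite3 (an assumption, SQLite 3.25+), and the sample rows are hypothetical since the table contents are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_A "
    "(code TEXT, namefile TEXT, versioncol INTEGER, status TEXT)"
)
# Hypothetical rows: customer A has two versions, customer B has one.
conn.executemany(
    "INSERT INTO table_A VALUES (?, ?, ?, ?)",
    [
        ("A", "a_v1", 1, "old"),
        ("A", "a_v2", 2, "ok"),
        ("B", "b_v1", 1, "ok"),
    ],
)

# Number the rows of each code from newest to oldest, then keep number 1.
latest_rows = conn.execute(
    """
    SELECT code, namefile, versioncol, status
    FROM (
        SELECT a.*,
               row_number() OVER (PARTITION BY code
                                  ORDER BY versioncol DESC) AS rn
        FROM table_A a
    ) s
    WHERE rn = 1
    ORDER BY code
    """
).fetchall()
print(latest_rows)
```

The PARTITION BY column plays the role of the DISTINCT ON expression, and the window's ORDER BY plays the role of the remaining sort keys.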
Edit: I am using Apache Hive (version 3.1.0.3.1.5.0-152)
When I run the following query:
insert into delta_table (select * from batch_table where loaddate=(select max(loaddate) from batch_table));
I get this error:
Unsupported SubQuery Expression 'loaddate': Only SubQuery expressions
that are top level conjuncts are allowed
We have a table that is written to in daily batches, with a loaddate column that is unique for each batch. The purpose of the query is to get all the records from the most recent batch without knowing what its load date is.
I suspect the issue is because I am using a subquery inside a subquery. Is there a way to change this query to do the same thing, but without the last subquery?
It depends on which version of Hive you have, but you can use a WITH clause to avoid the second subquery. Note that the CTE has to be joined in the FROM clause before its column can be referenced:
with max_load as ( select max(loaddate) as loaddate from batch_table)
insert into delta_table
select a.* from batch_table a join max_load m on a.loaddate=m.loaddate;
It looks like the error was because the table was created incorrectly and for some reason this caused the query to fail. I recreated the table and it now works
An analytic function plus a filter will be more efficient than a self-join or a subquery, both of which need one more table scan to find the max date:
insert into delta_table
select col1, col2, ... coln --list columns here
from
(
select t.*, rank() over(order by loaddate desc) rnk
from batch_table t
)s
where rnk=1;
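To check that the rank() version selects the same rows as the original max(loaddate) subquery, here is a small sketch using Python's built-in sqlite3 as a stand-in for Hive (an assumption, SQLite 3.25+). The id column and the sample rows are hypothetical. The columns are listed explicitly in the outer select because s.* would also include the helper column rnk, which delta_table does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_table (id INTEGER, loaddate TEXT)")
conn.execute("CREATE TABLE delta_table (id INTEGER, loaddate TEXT)")
# Hypothetical batches: one row loaded on 03-01, two rows on 03-02.
conn.executemany(
    "INSERT INTO batch_table VALUES (?, ?)",
    [(1, "2021-03-01"), (2, "2021-03-02"), (3, "2021-03-02")],
)

# rank() over the whole table: every row of the newest batch gets rnk = 1.
conn.execute(
    """
    INSERT INTO delta_table
    SELECT id, loaddate
    FROM (
        SELECT t.*, rank() OVER (ORDER BY loaddate DESC) AS rnk
        FROM batch_table t
    ) s
    WHERE rnk = 1
    """
)
via_rank = conn.execute(
    "SELECT id, loaddate FROM delta_table ORDER BY id"
).fetchall()

# Reference result from the original formulation with a scalar subquery.
via_max = conn.execute(
    """
    SELECT id, loaddate FROM batch_table
    WHERE loaddate = (SELECT max(loaddate) FROM batch_table)
    ORDER BY id
    """
).fetchall()
print(via_rank == via_max)
```

rank() is the right choice here rather than row_number(), because all rows of the latest batch share the same loaddate and all of them should be kept.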
I am receiving a syntax error when running the following code
42601: syntax error at or near "."
I think it has something to do with the alias but I cannot see where the problem is.
SELECT * FROM (
SELECT
m.shipment_id
m.route_id,
m.leg_sequence_id,
m.leg_warehouse_id,
m.leg_ship_method,
row_number() over (partition by m.route_id order by m.leg_sequence_id desc) as rn
FROM posimorders.sc_execution_eu.o_detailed_routes_v2 m
)
WHERE rn=1
LIMIT 100;
Your code includes:
SELECT * FROM (
SELECT
m.shipment_id
m.route_id,
...
You are missing a comma after m.shipment_id so it is trying to interpret m.route_id as a column alias for the shipment ID, which isn't what you intended; and an alias is a single identifier rather than a dot-separated hierarchy. Hence the error you are seeing, though that isn't coming from Oracle itself - your client seems to be parsing it first.
Oracle also doesn't support LIMIT, but from 12c it has a row-limiting clause you can use instead:
SELECT * FROM (
SELECT
m.shipment_id,
m.route_id,
m.leg_sequence_id,
m.leg_warehouse_id,
m.leg_ship_method,
row_number() over (partition by m.route_id order by m.leg_sequence_id desc) as rn
FROM posimorders.sc_execution_eu.o_detailed_routes_v2 m
)
WHERE rn=1
FETCH FIRST 100 ROWS ONLY;
or WITH TIES if you prefer.
The three levels in posimorders.sc_execution_eu.o_detailed_routes_v2 look wrong too though... see the docs.
When I run the following query:
select
(
select t.person_uid
from table1 t
where t.CELL_PH_NUM = table2.CELL_PH_NUM
and rownum<2
order by t.created desc
)
from temp table2 ;
...Oracle returns the following error:
ORA-00907: missing right parenthesis
I cannot understand where the error is:
If I remove the order by, the error is not returned and the query runs correctly (but doesn't return what I need)
If I run the subquery standalone (replacing table2.CELL_PH_NUM with a fixed value), the error is not returned and the query runs correctly (but doesn't return what I need)
Where is the error?
Your query doesn't actually do what you want, because the where (including rownum<2) is applied before the order by. So you are not necessarily getting the most recent row.
Unfortunately, Oracle doesn't allow you to use another level of subqueries, because the correlation clause won't work. But there is a solution:
select (select max(t.person_uid) keep (dense_rank first order by t.created desc)
from table1 t
where t.CELL_PH_NUM = table2.CELL_PH_NUM
)
from temp table2 ;
In the version of the query in your question, table1 is not defined. That might be related to the error you are getting.