Scroll through command history in hive cli containing multi-line queries - hive

I just ran a query
with mydata as (
select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
union all
select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap --Key13 also exist in first row
)
select * from mydata lateral view outer explode (mymap) m;
The actual query does not matter. The question is that when I want to edit or re-run the query by hitting the up-arrow, I see only its last line:
> select * from mydata lateral view outer explode (mymap) m;
That of course is not what I am looking for; I want to see the entire SQL again. Other command-line tools, including ipython, support editing multi-line commands, I think via the readline library. Is this supported by the Hive CLI?

Cross joining to an unnested mapping field in HQL (works in athena, not in Hive)

So I have two (mapping) fields I need to unpack and break out into rows. In Athena, I can use the following approach to unpack either of them:
SELECT
unique_id,
key,
value
FROM
(
select
unique_id,
purchase_history
from table
)
CROSS JOIN unnest(purchase_history) t(key,value)
This works perfectly in athena, I get 1 row for each purchase along with their unique identifier. However, when I try to test it in Hive it doesn't work. Is there anything specific in here that doesn't fly in HQL? I think cross joins are allowed, but perhaps the way I am calling the field isn't working? Or is it the "unnest"? Please let me know if you need further explanation.
You can do the same in Hive using lateral view explode. If purchase_history is of type map, this will work:
SELECT
s.unique_id,
t.key,
t.value
FROM
(
select
unique_id,
purchase_history
from table
) s --alias for sub-queries is a must in Hive
lateral view explode(s.purchase_history) t as key,value
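To make the row expansion concrete, here is a minimal Python sketch of what lateral view explode does to a map column: each (key, value) entry becomes its own output row, paired with the id of the row it came from. This is not Hive code; the function name explode_map and the sample data are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Tuple[int, Optional[Dict[str, str]]]


def explode_map(rows: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Yield one (unique_id, key, value) tuple per map entry."""
    for unique_id, purchase_history in rows:
        # A null or empty map yields no rows, which mirrors
        # lateral view without the OUTER keyword.
        for key, value in (purchase_history or {}).items():
            yield (unique_id, key, value)


# Hypothetical sample data standing in for the real table.
rows = [
    (1, {"2020-01-01": "book", "2020-02-01": "pen"}),
    (2, {"2020-03-01": "lamp"}),
]
exploded = list(explode_map(rows))
for row in exploded:
    print(row)
```

If rows with an empty or null map must be kept, Hive offers lateral view outer, which emits one row with NULL key and value instead of dropping the row.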

Query result as variable in another using jdbc

Because I want to optimize a query, I want to get rid of a join. Because of that, I need to declare a variable before the main query, but I can't find a way to use it in a JDBC statement.
Original query:
SELECT
d.orders,
SUM(price * qty) / d.orders
FROM main_table
INNER JOIN (
SELECT SUM(qty) AS orders FROM main_table
WHERE status = 1) d
WHERE status = 1
GROUP BY d.orders
formatted query:
SET #orders = SELECT SUM(qty) AS orders FROM main_table WHERE status = 1
SELECT
#orders,
SUM(price * qty) / #orders
FROM main_table
WHERE status = 1
I can't find a way to execute this formatted query correctly using JDBC. Because it groups by a variable, I'm not sure it will run correctly. Also, I don't want to split it into 2 separate executions, using the result of the first in the second, because that would increase the execution time and the query count.
Part of the JDBC code:
val statement = conn.prepareStatement(query)
val rset = statement.executeQuery()
if (rset.next()) {
// read results
}
This is executed every 10 seconds because it is used in a realtime dashboard. The database is Impala on Kudu (I'm thinking of building the queries as stored procedures, but I'm afraid Kudu doesn't support them). The app is written in Scala but uses JDBC from Java to query the database.
I already removed some functions (decimal casts) from the query to optimize it and keep it as simple as possible, but I still want to remove some unnecessary joins. This is not the only one; I have other similar queries, so a small improvement can have a huge benefit.
thanks
I suggest using:
SELECT
SUM(qty),
SUM(price * qty)
FROM main_table
WHERE status = 1
When you get the result, just divide the second value by the first value in your Java code.
Or even better:
SELECT
SUM(price * qty) / SUM(qty)
FROM main_table
WHERE status = 1
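As a quick sanity check that the single-pass query returns the same numbers as the join version, here is a small sketch using Python's built-in sqlite3 module. SQLite only stands in for Impala here, and the table contents are invented for the example; the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (price REAL, qty INTEGER, status INTEGER)")
# Invented sample rows: two with status = 1, one that must be ignored.
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?, ?)",
    [(10.0, 2, 1), (5.0, 4, 1), (99.0, 1, 0)],
)

# Original shape: join against a subquery that computes the total quantity.
joined = conn.execute(
    """
    SELECT d.orders, SUM(price * qty) / d.orders
    FROM main_table
    INNER JOIN (
        SELECT SUM(qty) AS orders FROM main_table WHERE status = 1
    ) d
    WHERE status = 1
    GROUP BY d.orders
    """
).fetchone()

# Suggested shape: one pass over the table, division done by the client.
total_qty, total_value = conn.execute(
    "SELECT SUM(qty), SUM(price * qty) FROM main_table WHERE status = 1"
).fetchone()

# SUM returns NULL (None) when no row matches, so guard before dividing.
average = total_value / total_qty if total_qty else None

print(joined)              # result of the join version
print(total_qty, average)  # result of the single-pass version
```

Dividing on the client side also leaves room to handle the case where no row has status = 1, which the pure SQL division would turn into a NULL or a division error depending on the engine.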

recursive query with select * raises ORA-01789

This is a minimized version of complex recursive query. The query works when columns in recursive member (second part of union all) of recursive CTE are listed explicitly:
with t (c,p) as (
select 2,1 from dual
), rec (c,p) as (
select c,p from t
union all
select t.c,t.p from rec join t on rec.c = t.p
)
select * from rec
I don't get why the error ORA-01789: query block has incorrect number of result columns is raised when t.* is specified instead.
with t (c,p) as (
select 2,1 from dual
), rec (c,p) as (
select c,p from t
union all
select t.* from rec join t on rec.c = t.p
)
select * from rec
Why is t.* not equivalent to t.c,t.p here? Could you please point me to documentation that explains the reasoning?
UPDATE: reproducible on 11g and 18 (dbfiddle).
I finally asked on the AskTom forum. According to the response from Oracle expert Connor McDonald, this behavior complies with the documentation, namely the sentence "The number of column aliases following WITH query_name and the number of columns in the SELECT lists of the anchor and recursive query blocks must be the same", which can be found in this paragraph.
The point is that the star expression is expanded only after the check that the numbers of columns match. Hence the columns must be listed explicitly; shortening them to a star is not possible.
Seems like there could be some kind of bug to me. I modified the query slightly just to test various cases and am now able to reproduce an ORA-00600 error in my Oracle 19.6.0.0.0 database! Running the problematic query on apex.oracle.com or on livesql.oracle.com (which is running 19.8.0.0.0) also results in errors. Reporting it to Oracle now!

Hdp, Hive, Lateral view and null: disappearing rows

Since the upgrade from HDP 3.1.0 to 3.1.4, I have had some issues in Hive that I do not understand. Note that I am only using ORC transactional tables.
For instance this query:
with cte as (
select
e.id
, '{}' as json
from event e
)
-- select count(*) from cte
select
id
, lv.customfield
from cte
lateral view outer
json_tuple(cte.json, 'customfield') lv AS `customfield`
It worked perfectly before the upgrade.
Now, even if the CTE returns a certain number of rows, using the lateral view will just drop rows from the resultset, without any error, whereas there is no extra where clause outside the CTE (in my real example, the query returns 66 rows without the lateral view, but only 19 with).
In my case I have:
select count(*) gives me 66 rows
when the lateral view on a static string is added, I only get 19 rows.
I tried quite a few variations:
if I replace the event table by a static CTE (select stack(1, ...)) I have the result I expect
if I remove the lateral view, I have the number of rows I expect (as long as I do not use is distinct from)
if instead of a CTE I create and use a temporary table, the outcome does not change.
if I put json_tuple(cte.json, 'customfield') in the select part outside the CTE (and nothing else as it would not be valid), without the lateral view, I have the number of expected rows,
If I use get_json_object in the select part outside the CTE (and no lateral view) I have the expected results.
of course, there is nothing in the hive (server or metastore) logs.
as a side note, since the upgrade a merge statement keeps generating duplicates, whereas it worked perfectly before.
Another extremely surprising thing is that inside the CTE there is an if statement, for instance: if(is_deleted is null, 'true', 'false').
If I replace the is null with is not distinct from null, which should be perfectly valid, no rows are returned by the CTE.
I am completely at a loss: I have no idea why this happens or how I can trust Hive.
I cannot replicate the error by generating manual data so I cannot give a (not) working example.
I do not understand the actual reason yet, but I was able to isolate the problem and submit a bug report: https://issues.apache.org/jira/browse/HIVE-22500
In short, a less-than-or-equal comparison with an implicit string-to-timestamp conversion fails if a sort by (implicit or explicit) is involved.
-- valid result
select count(*) from ( select * from opens where load_ts <= '2019-11-13 09:07:00') t;
-- invalid result
select count(*) from ( select * from opens where load_ts <= '2019-11-13 09:07:00' sort by id) t;
You can see the bug report for full set up or other examples. The workaround is to explicitly cast the string to a timestamp.

How can I store a select query's result in a dataset and later run another select query on it

I want to store a select query's result in a dataset so that I can write a new query against it and get a new dataset. How can I do that?
Original query:
MyDATASET=(
select x, y,z from table1
union all
select k,l,m from table2
)
I want to do this: select * from this.MyDATASET
Well, you could perhaps create a CTE, UDF or view? But it really isn't clear what you are trying to do...
CREATE VIEW MyView AS
select x, y,z from table1
union all
select k,l,m from table2
GO
SELECT * FROM MyView
SELECT * FROM MyView WHERE x = 0
etc
Assuming you want to cache data for re-use later...
Use a temp table or table variable if it's contained within one bit of code.
If you want to refer to the same data in several processes or calls, then use a temp table. Use a local one for many calls, but don't close the connection; use a global one for many different processes/connections.
If it's just one big select where you want to re-use the same data, then use a CTE.
A view also works but the data may change between executions.
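The difference between a view and a temp table can be seen in a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the actual database, with the table and column names from the question; the temp table name MySnapshot and the sample rows are made up. The view is re-evaluated on every query, while the temp table keeps the rows it was created with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE table2 (k INTEGER, l INTEGER, m INTEGER);
    INSERT INTO table1 VALUES (0, 1, 2);
    INSERT INTO table2 VALUES (3, 4, 5);

    -- The view stores only the query, not the rows.
    CREATE VIEW MyView AS
        SELECT x, y, z FROM table1
        UNION ALL
        SELECT k, l, m FROM table2;

    -- The temp table stores a copy of the rows as they are right now.
    CREATE TEMP TABLE MySnapshot AS SELECT * FROM MyView;
    """
)

# Change the underlying data after both objects exist.
conn.execute("INSERT INTO table1 VALUES (0, 7, 8)")

view_count = conn.execute("SELECT COUNT(*) FROM MyView").fetchone()[0]
snapshot_count = conn.execute("SELECT COUNT(*) FROM MySnapshot").fetchone()[0]
print(view_count, snapshot_count)  # the view sees the new row, the snapshot does not
```

Which behavior is wanted depends on the use case: the snapshot gives repeatable results for follow-up queries, the view always reflects the current tables.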