I installed Apache Hive 0.9.0, started executing some basic commands, and found one abnormal behavior in the select * command. In a SELECT statement, Hive accepts any random characters after *, but an RDBMS does not allow this. I am not sure whether this is expected behavior or a bug in Hive. Could someone please confirm?
In the query below, "abcdef" is a string of random characters.
In an RDBMS (Oracle):
select *abcdef from mytable;
Output:
ERROR prepare() failed with: ORA-00923: FROM keyword not found where expected
In Hive:
select *abcdef from mytable;
Output:
The query worked fine and displayed all the records of mytable.
In an RDBMS, SELECT is followed by column names or *, but in Hive it may be followed by column names, *, or any regular expression (e.g. for partition columns).
Go through this link, which provides a full description:
http://www.qubole.com/5-tips-for-efficient-hive-queries/
I got a response from the Apache Hive committers: it is indeed a bug in Hive. They filed an improvement ticket for this issue.
Jira Ticket: https://issues.apache.org/jira/browse/HIVE-8155
Related
I have a table in Hive. When I run the following, I always get 0 returned:
select count(*) from <table_name>;
However, if I run something like:
select * from <table_name> limit 10;
I get data returned.
I am on Hive 1.1.0.
I believe the following two issues are related:
https://issues.apache.org/jira/browse/HIVE-11266
https://issues.apache.org/jira/browse/HIVE-7400
Is there anything I can do to work around this issue?
The root cause is outdated table statistics. Try issuing this command, which should solve the problem:
ANALYZE TABLE <table_name> COMPUTE STATISTICS;
When you first import the table, there are various reasons why Hive services may not update the statistics. I am still looking for options and properties to make this work correctly.
I'm new to LINQPad and would like to run two simple SQL statements at the same time so I can see the values in two tables. If I run the following individually, it works, but when I run them at the same time, I get an "invalid character" error.
Select * From Table1; Select * From Table2;
I found this article, which suggests this format, but it's not working for me:
How to run multiple SQL queries?
BTW, I'm currently using the free version of LINQPad 5.00.08.
I know this is old, but I found it while searching for the same problem (using a SQL Server Compact database). I got mine to work by adding GO after each query:
SELECT * FROM Table1;
GO
SELECT * FROM Table2;
GO
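Why GO helps: GO is not a SQL statement. SQL Server client tools such as sqlcmd and SSMS treat a line containing only GO as a batch separator and send the text between separators to the server as separate batches. Presumably LINQPad does something similar, which would explain why the answer above works (an assumption; I haven't checked LINQPad's implementation). The Ruby sketch below only illustrates that splitting; `split_batches` is a hypothetical helper, and it ignores refinements such as the `GO n` repeat count.

```ruby
# Minimal sketch: split a SQL script into batches at lines that contain only "GO".
# This mimics what batch-aware client tools do; it is NOT LINQPad's actual code.
def split_batches(script)
  batches = [[]]
  script.each_line do |line|
    if line.strip.casecmp?("GO")   # separator line (matched case-insensitively here)
      batches << []                # start a new batch
    else
      batches.last << line         # accumulate SQL text into the current batch
    end
  end
  # Join each batch's lines and drop empty batches (e.g. after a trailing GO).
  batches.map { |b| b.join.strip }.reject(&:empty?)
end

script = <<~SQL
  SELECT * FROM Table1;
  GO
  SELECT * FROM Table2;
  GO
SQL

split_batches(script).each { |batch| puts batch }
```

Each resulting string would be sent on its own, so the server never sees the two statements as one piece of text.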
You need to use the Dump function:
Table1.Dump();
Table2.Dump();
I have an HQL (Hive Query Language) file that contains code like:
select * ,'(submit_date)?+.+' from test
Table test has several other fields after submit_date, all of which are returned in the output of this query, but I don't understand how this works. Does anyone have any idea? I couldn't find any documentation related to this syntax.
This is documented as the REGEX column specification:
A SELECT statement can take regex-based column specification.
We use Java regex syntax. Try http://www.fileformat.info/tool/regex.htm for testing purposes.
The following query selects all columns except ds and hr.
SELECT `(ds|hr)?+.+` FROM sales
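To see why `(ds|hr)?+.+` excludes exactly ds and hr, it helps to run the pattern over a list of column names. The Ruby sketch below is a hedged emulation, not Hive's code: it assumes Hive requires the pattern to match the whole column name (Java `Matcher#matches` semantics), which is what the documented behavior implies, and it uses hypothetical column names. Ruby's regex engine also supports the possessive quantifier `?+`.

```ruby
# Minimal sketch of Hive-style regex column selection over hypothetical column names.
# Assumption: the pattern must match the *whole* column name, so it is anchored
# here with \A and \z.
def select_columns(columns, pattern)
  regex = /\A(?:#{pattern})\z/
  columns.select { |name| regex.match?(name) }
end

# Hypothetical columns of the sales table.
columns = %w[id amount ds hr region]

# (ds|hr)?+ grabs a leading "ds" or "hr" possessively (no backtracking), so .+ has
# nothing left to match for columns named exactly "ds" or "hr"; every other name
# matches because the optional group can also match zero characters.
p select_columns(columns, "(ds|hr)?+.+")
# prints ["id", "amount", "region"]
```

By the same logic, `(submit_date)?+.+` from the question selects every column except submit_date. Note that a column such as `dsx` would still be selected, since only the exact names are excluded.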
All of a sudden, I cannot get table synonyms working in BigQuery. A query like the following works fine:
select id as id, value as value
from pos_dw_api.test
But a query like the following fails:
select a.id as id, a.value as value
from pos_dw_api.test a
I ran this from the web console, and the error returned is:
Query Failed
Error: Unknown field: a.id
Synonyms were working just fine last week. The example table I'm using for this SELECT is 387047224813.pos_dw_api.test.
Has the syntax for synonyms changed? Is this a bug?
Table synonyms generally only work when you're doing a JOIN. I don't know of anything that would have caused this to change. I realize that this is kind of strange, and I've filed an internal bug to fix it.
Is there a programmatic way to validate HiveQL statements for errors like basic syntax mistakes? I'd like to check statements before sending them off to Elastic MapReduce in order to save debugging time.
Yes there is!
It's pretty easy actually.
Steps:
1. Get a Hive Thrift client for your language.
I'm using Ruby, so I use this wrapper: https://github.com/forward/rbhive (gem install rbhive)
If you're not using Ruby, you can download the Hive source and run Thrift on the included Thrift definition files to generate client code for most languages.
2. Connect to Hive on port 10001 and run a describe query.
In Ruby, this looks like:
RBHive.connect(host, port) do |connection|
  connection.fetch("describe select * from categories limit 10")
end
If the query is invalid, the client will throw an exception with details of why the syntax is invalid. If the syntax IS valid, describe will return a query tree (which you can ignore in this case).
Hope that helps.
"describe select * from categories limit 10" didn't work for me.
Maybe this is related to the Hive version one is using.
I'm using Hive 0.8.1.4
After doing some research I found a similar solution to the one Matthew Rathbone provided:
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED] query
So for everyone who's also using rbhive:
RBHive.connect(host, port) do |c|
  c.execute("explain select * from categories limit 10")
end
Note that you have to replace c.fetch with c.execute: explain won't return any results if it succeeds, so with fetch rbhive will throw an exception whether or not your syntax is correct.
execute will throw an exception if you have a syntax error or if the table/column you are querying doesn't exist. If everything is fine, no exception is thrown, but you also receive no results, which is not a problem.
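The EXPLAIN approach can be wrapped in a small helper that turns the exception into a result. This is a minimal sketch: `check_hiveql` and `FakeConnection` are hypothetical names, and it assumes only that the connection object responds to `execute` and raises on a syntax error or an unknown table/column, as described above. It rescues `StandardError` because the exact exception class rbhive raises is not assumed here.

```ruby
# Returns [true, nil] if Hive accepts the query, or [false, error_message] if not.
# Assumes `connection` responds to #execute and raises when EXPLAIN fails
# (e.g. an RBHive connection, per the answer above).
def check_hiveql(connection, query)
  connection.execute("EXPLAIN #{query}")
  [true, nil]
rescue StandardError => e
  # The specific exception class raised by rbhive is deliberately not assumed.
  [false, e.message]
end

# Hypothetical stand-in for a real connection, used only to exercise the helper
# without a Hive server: it rejects any statement containing the typo "selec ".
class FakeConnection
  def execute(sql)
    raise ArgumentError, "simulated syntax error near 'selec'" if sql.include?("selec ")
    nil # a successful EXPLAIN returns nothing useful for this check
  end
end

conn = FakeConnection.new
p check_hiveql(conn, "select * from categories limit 10")
p check_hiveql(conn, "selec * from categories limit 10")
```

Against a real server you would call it inside the `RBHive.connect(host, port) do |c| ... end` block, e.g. `ok, error = check_hiveql(c, query)`.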
In the latest version, Hive 2.0 comes with the HPL/SQL tool (hplsql), which allows us to validate Hive commands without actually running them.
Configuration:
Add the XML below to the hive/conf folder and restart Hive:
https://github.com/apache/hive/blob/master/hplsql/src/main/resources/hplsql-site.xml
To run hplsql and validate the query, use the commands below.
To validate a single query:
hplsql -offline -trace -e 'select * from sample'
(or)
To validate an entire file:
hplsql -offline -trace -f samplehql.sql
If the query syntax is correct, the response from hplsql will be something like this:
Ln:1 SELECT // type
Ln:1 select * from sample // command
Ln:1 Not executed - offline mode set // execution status
If the query syntax is wrong, the syntax issue in the query will be reported.
If your Hive version is older, you need to manually place the hplsql JARs inside hive/lib before proceeding.