Is there a programmatic way to validate HiveQL statements for errors like basic syntax mistakes? I'd like to check statements before sending them off to Elastic MapReduce in order to save debugging time.
Yes, there is!
It's actually pretty easy.
Steps:
1. Get a Hive Thrift client in your language.
I'm in Ruby, so I use this wrapper: https://github.com/forward/rbhive (gem install rbhive).
If you're not in Ruby, you can download the Hive source and run Thrift on the included Thrift definition files to generate client code in most languages.
2. Connect to Hive on port 10001 and run a describe query.
In Ruby this looks like this:
RBHive.connect(host, port) do |connection|
  connection.fetch("describe select * from categories limit 10")
end
If the query is invalid, the client will throw an exception with details of why the syntax is invalid. If the syntax IS valid, describe will return a query tree (which you can ignore in this case).
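To turn that into a quick pass/fail check, something like the following sketch should work (host, port and the statement are placeholders, as above):

require 'rbhive'

begin
  RBHive.connect(host, port) do |connection|
    # describe raises if Hive cannot parse the statement
    connection.fetch("describe select * from categories limit 10")
  end
  puts "statement is valid"
rescue StandardError => e
  # the exception message contains Hive's explanation of the problem
  puts "statement is invalid: #{e.message}"
end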
Hope that helps.
"describe select * from categories limit 10" didn't work for me.
Maybe this is related to the Hive version you are using; I'm on Hive 0.8.1.4.
After doing some research I found a similar solution to the one Matthew Rathbone provided:
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED] query
So for everyone who's also using rbhive:
RBHive.connect(host, port) do |c|
  c.execute("explain select * from categories limit 10")
end
Note that you have to substitute c.fetch with c.execute, since explain won't return any results even when it succeeds, so with fetch rbhive would throw an exception no matter whether your syntax is correct or not.
execute will throw an exception if you've got a syntax error or if the table / column you are querying doesn't exist. If everything is fine, no exception is thrown, but you'll also receive no results, which is not a bad thing for validation purposes.
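Putting that together, here is a rough sketch of a validator built on explain (the helper name is made up, not part of rbhive):

require 'rbhive'

# Illustrative helper: returns true when Hive accepts the statement,
# false when execute raises because of a syntax error or a missing
# table / column.
def explain_valid?(host, port, hql)
  RBHive.connect(host, port) do |c|
    c.execute("explain #{hql}")
  end
  true
rescue StandardError => e
  puts "Rejected by Hive: #{e.message}"
  false
end

# explain_valid?('localhost', 10_000, 'select * from categories limit 10')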
Hive 2.0 comes with the hplsql tool, which allows you to validate Hive commands without actually running them.
Configuration:
Add the XML below to the hive/conf folder and restart Hive:
https://github.com/apache/hive/blob/master/hplsql/src/main/resources/hplsql-site.xml
To run hplsql and validate the query, use the commands below.
To validate a single query:
hplsql -offline -trace -e 'select * from sample'
(or)
To validate an entire file:
hplsql -offline -trace -f samplehql.sql
If the query syntax is correct, the response from hplsql will look something like this:
Ln:1 SELECT // type
Ln:1 select * from sample // command
Ln:1 Not executed - offline mode set // execution status
If the query syntax is wrong, the syntax issue in the query will be reported.
If your Hive version is older, you need to manually place the hplsql jars inside hive/lib and proceed.
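If you want to drive this from code, here is a rough sketch that shells out to hplsql (it assumes hplsql is on the PATH; treating a non-zero exit status or non-empty stderr as a failure is an assumption, so check what your hplsql version actually reports):

require 'open3'

# Rough sketch: run hplsql in offline mode against a file and surface problems.
# Error detection via exit status / stderr is an assumption, not a documented
# guarantee; adjust it to the output your hplsql version produces.
def hplsql_check(file)
  stdout, stderr, status = Open3.capture3('hplsql', '-offline', '-trace', '-f', file)
  puts stdout
  warn stderr unless stderr.empty?
  status.success? && stderr.empty?
end

# hplsql_check('samplehql.sql')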
Related
Using Hive 2.3.7 on AWS EMR (5.33.1), I have created a database which shows correctly when calling show databases;. I then create a table, which seems to work correctly (no exceptions). When I call describe <table>;, it correctly returns the name and schema of the table. However, when I run show tables;, the following error is returned:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException
Exception thrown when executing query :
SELECT A0.TBL_NAME,A0.TBL_NAME AS NUCORDER0 FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON
A0.DB_ID = B0.DB_ID WHERE B0.`NAME` = ? AND LOWER(A0.TBL_NAME) LIKE '_%' ESCAPE '\' ORDER BY NUCORDER0)
If anyone can shed any light on this issue it would be really appreciated.
I have googled around and found nothing of any use.
EDIT: show tables in <schema>; returned the same result
EDIT 2: This issue was solved by updating the EMR to emr-6.4.0. I have no great insight into the issue beyond what is mentioned here.
I think your metadata database has been corrupted or has bad data. I would take a backup, and then see if you can restore a previous backup. I would also connect to the database directly, look at those tables, and see if anything looks out of the ordinary. If you find a bad table entry, don't delete it directly; try using drop table commands (via Hive) to remove it and keep integrity. If you have to, you can delete entries in the database by hand, since you have a backup and could restore the tables.
The Hive metastore uses DataNucleus (https://www.datanucleus.org/) for all CRUD operations on the metastore database. DataNucleus generates \\ to escape the backslash itself, but with NO_BACKSLASH_ESCAPES set MariaDB interprets \\ as a plain string literal rather than an escape sequence, so the generated ESCAPE clause breaks.
You can see the sql_mode settings here: https://mariadb.com/kb/en/sql-mode/#sql_mode-values.
Get rid of NO_BACKSLASH_ESCAPES from the mode and it should be all right.
Try providing the schema whose tables you want to see:
show tables in schema_name;
Couldn't find anything on the Internet for this at all and wondered if someone could help me.
Using Apache Derby with a query in a Spring Boot app:
The query starts like this:
WITH dbName AS {
(Then goes on to select data from tables)
}
When running my test cases I get an exception which reads:
java.sql.SQLSyntaxErrorException: Syntax error: Encountered "with" at line 1, column 1
Does Apache Derby support WITH? I can't see any other reason why this shouldn't work.
After some more research, I can confirm that Derby does not support the WITH clause (common table expressions), so recursive queries are not possible either.
I found this issue: https://issues.apache.org/jira/browse/DERBY-11
I recently discovered that DBeaver can connect to MongoDB. My next discovery was that DBeaver expects SQL-like queries instead of the JavaScript-like queries I use with the mongo command line client. I've been unable to find any good documentation on the syntax I should be using, so I've been learning by trial and error. I need some help filtering query results by date.
I have a collection named tasks. Each object in the collection has a startedAt attribute that holds a timestamp.
This query gives me lots of results using the command line client: db.tasks.find({startedAt:{$gt:ISODate("2017-03-03")}});
I'm guessing the syntax in DBeaver should be something like this: select * from tasks where startedAt > '2017-03-03';
But I'm doing something wrong, because I don't get any results in DBeaver unless I drop the where clause. What's the right way?
I'm new to Apache Drill.
For performance testing purposes, I'm trying to measure the time it takes to execute a query, and I do not need to print the result.
In Oracle SQL*Plus, there is set autotrace traceonly. This setting does the following (quoting from the Oracle web site):
Similar to SET AUTOTRACE ON, but suppresses the printing of the user's query output, if any. If STATISTICS is enabled, query data is still fetched, but not printed.
In Apache Drill's sqlline, I got the error like the following: Error: PARSE ERROR: Encountered "traceonly" at line 1, column 15...
Do you have any ideas for alternatives?
Thanks,
P.S.
I also read this answered question: Any command in mysql equivalent to Oracle's autotrace for performance tuning
Unfortunately, it doesn't work on Apache Drill.
You could put your queries in a text file (e.g. query.sql), then run sqlline and forward the output to /dev/null:
bin/sqlline -u jdbc:drill:zk=localhost:2181 -f query.sql > /dev/null
It will still display some data, but only minimal:
1/1 select * from cp.`employee.json`;
1,155 rows selected (0.65 seconds)
Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
I installed Apache Hive 0.9.0 and started executing some basic commands, and I found one abnormal behavior with the select * command. In a select statement, any random characters after * are allowed in Hive, but in an RDBMS they are not. I am not sure whether this is expected behavior or a bug in Hive. Could someone please confirm?
In the queries below, "abcdef" is a string of random characters.
In an RDBMS (Oracle):
select *abcdef from mytable;
Output:
ERROR prepare() failed with: ORA-00923: FROM keyword not found where expected
In Hive:
select *abcdef from mytable;
Output:
The query worked fine and displayed all the records of mytable.
In an RDBMS, select is followed by a column name or *, but in Hive it may be a column name, *, or a regular expression matching column (or partition) names.
Go through this link; it provides the whole description:
http://www.qubole.com/5-tips-for-efficient-hive-queries/
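For contrast, here is what a deliberate regex column selection looks like, reusing the rbhive client from the first answer (a sketch only: the sales table is made up, the (ds|hr)?+.+ pattern is the usual illustration from the Hive documentation, and newer Hive versions may require hive.support.quoted.identifiers=none for this to work):

RBHive.connect(host, port) do |connection|
  # selects every column except ds and hr via Hive's regex column specification
  connection.fetch("select `(ds|hr)?+.+` from sales")
end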
I got a response from the Apache Hive committers. It is indeed a bug in Hive, and they filed an improvement ticket for the issue.
JIRA ticket: https://issues.apache.org/jira/browse/HIVE-8155