I am trying the Hive on Spark execution engine. I am using Hadoop 2.6.0, Hive 1.2.1, and Spark 1.6.0. Hive runs successfully on the MapReduce engine, and each component works properly on its own. Now I am trying Hive on Spark. In Hive I set these properties:
set hive.execution.engine=spark;
set spark.master=spark://INBBRDSSVM294:7077;
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
I also added the spark-assembly JAR to Hive's lib directory, and I am trying this command:
select count(*) from sample;
I am getting this:
Starting Spark Job = b1410161-a414-41a9-a45a-cb7109028fff
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Am I missing any other required settings? Please guide me.
I think the problem may be that you are using incompatible versions. If you check the version compatibility table in Hive on Spark: Getting Started, you'll see that these two particular versions are not guaranteed to work together.
I advise you to switch to one of the recommended compatible version pairs. I had the same problem and solved it by changing to the compatible versions.
I have written a PySpark script to execute a SQL file. It worked perfectly fine on the latest Spark version, but the target machine has 2.3.1, and it throws this exception:
pyspark.sql.utils.AnalysisException: u"Undefined function: 'array_remove'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
It seems these functions are not present in the older versions. Can anyone suggest something? I have searched a lot, but in vain.
The piece of SQL that is failing is:
SELECT NIEDC.*, array_join(array_remove(split(groupedNIEDC.appearedIn,'-'), StudyCode),'-') AS subjects_assigned_to_other_studies
The array_remove and array_join functions were only added in Spark 2.4. You can write a UDF and register it for use in your query instead, as sketched below.
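Here is a minimal PySpark sketch of that workaround for Spark 2.3. The names array_remove_udf and array_join_udf are made up for this example; only split is a builtin you already have:

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.appName("udf-backport").getOrCreate()

def array_remove_py(arr, elem):
    # mimic array_remove: drop every element equal to elem
    return [x for x in arr if x != elem] if arr is not None else None

def array_join_py(arr, sep):
    # mimic array_join: concatenate the elements with sep
    return sep.join(arr) if arr is not None else None

spark.udf.register("array_remove_udf", array_remove_py, ArrayType(StringType()))
spark.udf.register("array_join_udf", array_join_py, StringType())

# the failing SELECT can then call the UDFs instead of the 2.4 builtins:
# SELECT NIEDC.*, array_join_udf(array_remove_udf(
#     split(groupedNIEDC.appearedIn, '-'), StudyCode), '-')
#   AS subjects_assigned_to_other_studies ...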
I am upgrading CrateDB from version 2.x to 3.1.6. When I try to upgrade tables created in version 2.x as suggested in this document,
https://crate.io/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-upgraded
in step 5 I run this query:
ALTER CLUSTER SWAP TABLE transactions2 TO transactions;
I am getting the error SQLActionException[SQLParseException: line 1:15: mismatched input 'SWAP' expecting 'REROUTE'].
I am not sure what the correct query to resolve this would be.
You are following the latest documentation instead of the documentation for your version, e.g. 2.3 (https://crate.io/docs/crate/reference/en/2.3/admin/system-information.html#tables-need-to-be-recreated).
Support for the SWAP SQL command was only added in version 3.2; see https://crate.io/docs/crate/reference/en/latest/appendices/release-notes/3.2.0.html#database-administration.
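For reference, here is a hedged sketch of the recreate-copy-rename flow the 2.3 documentation describes, using the crate Python client (pip install crate). It assumes transactions2 already exists with the new schema, that its data has not been copied over yet, and that your CrateDB line supports ALTER TABLE ... RENAME TO; check the release notes for your exact version before relying on it:

from crate import client

conn = client.connect('localhost:4200')  # adjust to your HTTP endpoint
cursor = conn.cursor()

# copy the rows into the newly created table, then rename it into place
cursor.execute("INSERT INTO transactions2 (SELECT * FROM transactions)")
cursor.execute("DROP TABLE transactions")
cursor.execute("ALTER TABLE transactions2 RENAME TO transactions")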
With Apache Drill I can get the version through a JDBC connection by dispatching the query: SELECT version FROM sys.version. Is there an analogous way to determine the Hive version?
I know I can use hive --version from a machine where Hive is running and available via the command line. However, a query-based approach would fit my use case a little better, as JDBC connections may be made from anywhere inside my network.
It's easy if you have a JDBC Connection.
Connection conn = ...; // get it from somewhere
DatabaseMetaData md = conn.getMetaData();
System.out.println(md.getDatabaseMajorVersion() + "." + md.getDatabaseMinorVersion());
I don't know if you can get the information from a SQL/HiveQL query.
You can try this query:
select version();
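If you want this over a connection rather than the CLI, here is a minimal sketch using PyHive (pip install pyhive); it assumes HiveServer2 on the default port 10000 and a Hive new enough to ship the version() UDF, with a placeholder hostname:

from pyhive import hive

conn = hive.connect(host='hive-server.example.com', port=10000)
cursor = conn.cursor()
cursor.execute('SELECT version()')
print(cursor.fetchone()[0])  # e.g. "1.2.1 r<build revision>"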
I have created a job scheduler using pgAgent in PostgreSQL (my setup and the failed status are shown in screenshots, not reproduced here).
I set it up like this to update a name field in my database at a certain time, but when I check, the job is failing.
What did I do wrong? How can I correct it?
I faced exactly the same problem. By trial and error, I changed the Connection Type from Local to Remote and gave the following connection string
user=some_user password=some_password host=localhost port=5432 dbname=some_database
in the properties of the step, and it worked. So the trick is to treat even the local server as a remote server.
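If you want to sanity-check the exact connection string before pasting it into the step properties, psycopg2 accepts the same libpq key=value format, so a quick sketch like this (credentials are placeholders) will tell you whether the string itself is valid:

import psycopg2

# same key=value string as above; user/password/dbname are placeholders
conn = psycopg2.connect("user=some_user password=some_password host=localhost port=5432 dbname=some_database")
print(conn.server_version)  # if this prints, pgAgent should accept the string too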
Is there a programmatic way to validate HiveQL statements for errors like basic syntax mistakes? I'd like to check statements before sending them off to Elastic Map Reduce in order to save debugging time.
Yes there is!
It's pretty easy actually.
Steps:
1. Get a Hive Thrift client in your language.
I'm in Ruby, so I use this wrapper - https://github.com/forward/rbhive (gem install rbhive).
If you're not in Ruby, you can download the Hive source and run Thrift on the included Thrift configuration files to generate client code in most languages.
2. Connect to Hive on port 10001 and run a describe query.
In Ruby this looks like this:
RBHive.connect(host, port) do |connection|
connection.fetch("describe select * from categories limit 10")
end
If the query is invalid, the client will throw an exception with details of why the syntax is invalid. If the syntax IS valid, describe will return a query tree (which you can ignore in this case).
Hope that helps.
"describe select * from categories limit 10" didn't work for me.
Maybe this is related to the Hive version one is using.
I'm using Hive 0.8.1.4
After doing some research I found a similar solution to the one Matthew Rathbone provided:
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED] query
So for everyone who's also using rbhive:
RBHive.connect(host, port) do |c|
c.execute("explain select * from categories limit 10")
end
Note that you have to substitute c.fetch with c.execute, since explain won't return any results if it succeeds, so rbhive's fetch would throw an exception regardless of whether your syntax is correct.
execute will throw an exception if you've got a syntax error or if the table/column you are querying doesn't exist. If everything is fine, no exception is thrown, but you'll also receive no results, which is not an evil thing.
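For anyone not in Ruby: the same EXPLAIN trick works from Python too. Here is a sketch using PyHive (pip install pyhive) against HiveServer2, with host and port as placeholder defaults:

from pyhive import hive

def validate_hql(statement, host='localhost', port=10000):
    cursor = hive.connect(host=host, port=port).cursor()
    try:
        # EXPLAIN parses and plans the statement without running it
        cursor.execute('EXPLAIN ' + statement)
        return True, None
    except Exception as exc:  # PyHive raises OperationalError on bad HQL
        return False, str(exc)

ok, error = validate_hql('select * from categories limit 10')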
The latest version, Hive 2.0, comes with the hplsql tool, which allows us to validate Hive commands without actually running them.
Configuration:
Add the XML below to the hive/conf folder and restart Hive:
https://github.com/apache/hive/blob/master/hplsql/src/main/resources/hplsql-site.xml
To run hplsql and validate a query, use the commands below.
To validate a single query:
hplsql -offline -trace -e 'select * from sample'
(or)
To validate an entire file:
hplsql -offline -trace -f samplehql.sql
If the query syntax is correct, the response from hplsql will look something like this:
Ln:1 SELECT // type
Ln:1 select * from sample // command
Ln:1 Not executed - offline mode set // execution status
If the query syntax is wrong, the syntax issue in the query will be reported.
If the Hive version is older, we need to manually place the hplsql JARs inside hive/lib and proceed.
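If you want to wire the offline check into a script or CI job, a small wrapper around the hplsql CLI shown above could look like the sketch below. It assumes hplsql is on the PATH; the exit-code and output handling may need adjusting for your version:

import subprocess

def validate_hql_file(path):
    result = subprocess.run(
        ["hplsql", "-offline", "-trace", "-f", path],
        capture_output=True, text=True,
    )
    # on success each statement is reported as "Not executed - offline mode set"
    return result.returncode == 0, result.stdout + result.stderr

ok, log = validate_hql_file("samplehql.sql")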