Setting Hive configuration properties through hive --service jar

Can somebody tell me if I can pass hiveconf properties through the CLI? I want to run a jar using hive --service jar, and in that command I want to set some properties. I have tried the following commands, but they did not work:
hive --service jar myjar.jar my.example.jar.MyMainClass -hiveconf x=y
hive --service jar myjar.jar my.example.jar.MyMainClass HIVE_OPTS x=y
Thanks in advance

No. As far as I have seen, it does not work with hive --service jar. --hiveconf can be used to set configuration when launching the CLI or the Thrift server.
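For reference, this is how --hiveconf is normally passed when launching the CLI or HiveServer2; x=y is just the placeholder property from the question:
hive --hiveconf x=y
hive --service hiveserver2 --hiveconf x=y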

Related

I am getting a ClassNotFound exception while trying to configure the metastore in Hive on Ubuntu

I am trying to install Hive on Ubuntu, but I am getting this error when I try to create the Derby metastore:
schematool -dbType derby -initSchema
I have configured HIVE_HOME in .bashrc, and I have also set it in bin/hive-config.sh.
What am I doing wrong here? Please help me with this.
Thanks in advance.
I have tried different versions of Hive, and also tried putting the HIVE_HOME variables on different lines. Hadoop was running while I configured these things.
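For comparison, a minimal .bashrc setup usually looks like the following; the install paths here are assumptions and need to match your own layout:
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
After editing, run source ~/.bashrc (or open a new terminal) before calling schematool again.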

Not able to start hiveserver2 for Apache Hive

Could anyone help resolve the problem below? I'm trying to start hiveserver2. I have configured hive-site.xml and the configuration file for the Hadoop directory path, and the jar file hive-service-rpc-2.1.1.jar is available in the lib directory. I am able to start hive, but not hiveserver2:
$ hive --service hiveserver2
Exception in thread "main" java.lang.ClassNotFoundException: /home/directory/Hadoop/Hive/apache-hive-2/1/1-bin/lib/hive-service-rpc-2/1/1/jar
export HIVE_HOME=/usr/local/hive-1.2.1/
export HIVE_HOME=/usr/local/hive-2.1.1
I am glad that I solved this problem. In my case, I had two different Hive versions installed (the two HIVE_HOME exports above): my command was using 1.2.1, but it was picking up the jar from 2.1.1.
You can use the command which hive (or which hiveserver2) to find out where your command comes from.
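A quick sanity check along those lines (assuming a standard shell environment):
which hive
echo $HIVE_HOME
ls $HIVE_HOME/lib | grep hive-service-rpc
If the hive binary and HIVE_HOME point at different installations, remove the stale export from your shell profile.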

How to access custom UDFs through Spark Thrift Server?

I am running Spark Thrift Server on EMR. I start up the Spark Thrift Server by:
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar
Notice that I have a custom UDF jar and I want to add it to the Thrift Server classpath, so I added --jars /opt/lib/custom-udfs.jar to the above command.
Once I am on the EMR cluster, I issue the following to connect to the Spark Thrift Server.
beeline -u jdbc:hive2://localhost:10000/default
Then I was able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup script would also add the jar as a Hive resource.
The only way I can access the custom UDF now is by adding the custom UDF jar as a Hive resource:
add jar /opt/lib/custom-udfs.jar
and then creating a function for the UDF.
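For reference, the manual per-session steps look roughly like this; the function name my_udf and the class name com.example.udfs.MyUdf are placeholders, not the actual contents of the jar:
add jar /opt/lib/custom-udfs.jar;
create temporary function my_udf as 'com.example.udfs.MyUdf';
select my_udf(some_column) from some_table limit 10;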
Question:
Is there a way to configure the custom UDF jar automatically, without adding the jar to the Spark session each time?
Thanks!
The easiest way is to edit the file start-thriftserver.sh and, at the end:
wait until the server is ready, then
execute a setup SQL query.
You could also post a proposal on JIRA; "execute setup code at start up" would be a very good feature.
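A minimal sketch of what such an addition to start-thriftserver.sh could look like; the port, the wait loop, and the setup-udfs.sql file name are assumptions:
# wait until the Thrift Server accepts JDBC connections
until beeline -u jdbc:hive2://localhost:10000/default -e 'show databases;' > /dev/null 2>&1; do
  sleep 5
done
# run a setup script that registers the custom UDFs once per server start
beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/setup-udfs.sql
Whether a function registered this way stays visible to later beeline sessions depends on the Thrift Server's session configuration, so treat this as a sketch rather than a guaranteed fix.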
The problem here seems to be that --jars needs to be positioned correctly: it should be the first argument. I too had trouble getting the jars to work properly. This worked for me:
# if your spark installation is in /usr/lib/
# --properties-file is only needed if you want to customize the Spark configuration;
# the file looks similar to spark-defaults.conf
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh \
  --jars /path/to/jars/jar1.jar,/path/to/jars/jar2.jar \
  --properties-file ./spark-thrift-sparkconf.conf \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

Logs for a Hive query executed via beeline

I am running the Hive command below from beeline. Can someone please tell me where I can see the MapReduce logs for it?
0: jdbc:hive2://<servername>:10003/> select a.offr_id offerID , a.offr_nm offerNm , b.disp_strt_ts dispStartDt , b.disp_end_ts dispEndDt , vld_strt_ts validStartDt, vld_end_ts validEndDt from gcor_offr a, gcor_offr_dur b where a.offr_id = b.offr_id and b.disp_end_ts > '2016-09-13 00:00:00';
When using beeline, MapReduce logs are part of HiveServer2 log4j logs.
If your Hive install was configured by Cloudera Manager (CM), then it will typically be in /var/log/hive/hadoop-cmf-HIVE-1-HIVESERVER2-*.out on the node where HiveServer2 is running (which may or may not be the node you are running beeline from).
A few other scenarios:
Your Hive install was not configured by CM? You will need to manually create a log4j config file:
Create a hive-log4j.properties config file in the directory specified by the HIVE_CONF_DIR environment variable (this makes it accessible to the HiveServer2 JVM classpath).
In this file, the log location is specified by log.dir and log.file. See conf/hive-log4j.properties.template in your distribution for an example template.
You run beeline in "embedded HS2 mode" (i.e. beeline -u jdbc:hive2:// user password) ?:
You will customize beeline log4j (as opposed to HiveServer2 log4j).
Beeline log4j properties file is strictly called beeline-log4j2.properties (in versions prior to Hive 2.0, it is called beeline-log4j.properties). Needs to be created and made accessible to beeline JVM classpath via HIVE_CONF_DIR. See HIVE-10502 and HIVE-12020 for further discussion on this.
You want to customize which HiveServer2 logs get printed on beeline stdout?
This can be configured at the HiveServer2 level using the hive.server2.logging.operation.enabled and hive.server2.logging.operation.level configs.
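As an illustration of that last scenario, these are the relevant properties with their common values (set on the HiveServer2 side, e.g. in hive-site.xml):
set hive.server2.logging.operation.enabled=true;
set hive.server2.logging.operation.level=EXECUTION;
With operation logging enabled, beeline streams the per-query HiveServer2 logs, including MapReduce job progress, to its stdout while the query runs.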
Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead captured to a log file specified by Hive's log4j properties file. By default, Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
To disable asynchronous logging (the default from Hive 2.1.0 onwards):
set hive.async.log.enabled=false
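Along the same lines, the logging level can be raised for a single run; DEBUG here is just an example level:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console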

How to connect Spark-Notebook to Hive metastore?

This is a cluster with Hadoop 2.5.0, Spark 1.2.0, and Scala 2.10, provided by CDH 5.3.2. I used a compiled spark-notebook distribution.
It seems Spark-Notebook cannot find the Hive metastore by default.
How to specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?
Here is what I tried:
link all files from /etc/hive/conf, with hive-site.xml included, to the current directory
specify SPARK_CONF_DIR variable in bash
When you start the notebook, set the environment variable EXTRA_CLASSPATH to the path where you have located hive-site.xml. This works for me:
EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook
I have also passed the jar of my MySQL connector because my Hive metastore uses MySQL.
I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351
Using the CDH 5.5.0 Quickstart VM, the solution is the following: you need to make hive-site.xml, which provides the access information for the Hive metastore, visible to the notebook. By default, spark-notebook uses an internal metastore.
You can then define the following environment variable in ~/.bash_profile:
HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CONF_DIR
(Make sure you execute source ~/.bash_profile if you do not open a new terminal.)
(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)
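A quick way to confirm the setting took effect, using the CDH paths mentioned above:
source ~/.bash_profile
echo $HADOOP_CONF_DIR
ls /etc/hive/conf.cloudera.hive/hive-site.xml
If hive-site.xml shows up at that path and the variable includes its directory, the notebook should pick up the external metastore on its next start.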