How to connect Spark-Notebook to Hive metastore? - hive

This is a cluster with Hadoop 2.5.0, Spark 1.2.0, Scala 2.10, provided by CDH 5.3.2. I used a compiled spark-notebook distro
It seems Spark-Notebook cannot find the Hive metastore by default.
How to specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?
Here is what I tried:
link all files from /etc/hive/conf, with hive-site.xml included, to the current directory
specify SPARK_CONF_DIR variable in bash

When you start the notebook set the environment variable EXTRA_CLASSPATH with the path where you have located the hive-site.xml,
this works for me:EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook
I have also passed the jar of my mysqlconnector because I have Hive with MySql.
I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351

Using CDH 5.5.0 Quickstart VM, the solution is the following: You need the reference hive-site.xmlto the notebook which provides the access information to the hive metastore. By default, spark-notebooks uses an internal metastore.
You can the define the following environmental variable in ~/.bash_profile:
HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CON_DIR
(Make sure you execute source ~/.bash_profile if you do not open a new terminal the terminal)
(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)

Related

Hive with HBase (both Kerberos) java.net.SocketTimeoutException .. on table 'hbase:meta'

Error
Receiving Timeout errors when trying to query HBase from Hive using HBaseStorageHandler.
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68199: row 'phoenix_test310,,'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase-master.example.com,16020,1583728693297, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
... 3 more
I tried to follow what documentation I could and have some hbase configuraiton options added to hive-site.xml based on this Cloudera link
Environment:
Hadoop 2.9.2
HBase 1.5
Hive 2.3.6
Zookeeper 3.5.6
First, the Cloudera link should be ignored, Hive detects the presence of HBase through environment variables and then automatically reads the hbase-site.xml configuration settings.
There is no need to duplicate HBase settings within hive-site.xml
Configuring Hive for HBase
Modify your hive-env.sh as folllows:
# replace <hbase-install> with your installation path /etc/hbase for example
export HBASE_BIN="<hbase-install>/bin/hbase"
export HBASE_CONF_DIR="<hbase-install>/conf"
Separately you should ensure HADOOP_* environment variables are set as well in hive-env.sh,
and that the hbase lib directory is added to HADOOP_CLASSPATH.
We solved this error,by adding this property hbase.client.scanner.timeout.period=600000
hbase 1.2
https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hbase_scanner_heartbeat.html#concept_xsl_dz1_jt

Not able to start hiveserver2 for Apache Hive

Could any one help to resolve below problem, I'm trying to start hserver2 and I configured hive_site.xml and configuration file for Hadoop Directory path as well and jar file hive-service-rpc-2.1.1.jar also available at directory lib. And I am able to start using hive but not hiveserver2
$ hive --service hiveserver2 Exception in thread "main" java.lang.ClassNotFoundException: /home/directory/Hadoop/Hive/apache-hive-2/1/1-bin/lib/hive-service-rpc-2/1/1/jar
export HIVE_HOME=/usr/local/hive-1.2.1/
export HIVE_HOME=/usr/local/hive-2.1.1
I am glad that I solve it's problem. Here is my question ,I have different version hive ,and My command use 1.2.1, but it find it's jar form 2.1.1.
you can user command which hive server 2 ,find where is you command from .

Logs for hive query executed via. beeline

i am running below hive coomand from beeline . Can someone please tell where can I see Map reudce logs for this ?
0: jdbc:hive2://<servername>:10003/> select a.offr_id offerID , a.offr_nm offerNm , b.disp_strt_ts dispStartDt , b.disp_end_ts dispEndDt , vld_strt_ts validStartDt, vld_end_ts validEndDt from gcor_offr a, gcor_offr_dur b where a.offr_id = b.offr_id and b.disp_end_ts > '2016-09-13 00:00:00';
When using beeline, MapReduce logs are part of HiveServer2 log4j logs.
If your Hive install was configured by Cloudera Manager (CM), then it will typically be in /var/log/hive/hadoop-cmf-HIVE-1-HIVESERVER2-*.out on the node where HiveServer2 is running (may or may not be the same as where you are running beeline from)
Few other scenarios:
Your Hive install was not configured by CM ? You will need to manually create log4j config file:
Create hive-log4j.properties config file in directory specified by HIVE_CONF_DIR environment variable. (This makes it accessible to HiveServer2 JVM classpath)
In this file, log location is specified by log.dir and log.file. See conf/hive-log4j.properties.template in your distribution for an example template for this file.
You run beeline in "embedded HS2 mode" (i.e. beeline -u jdbc:hive2:// user password) ?:
You will customize beeline log4j (as opposed to HiveServer2 log4j).
Beeline log4j properties file is strictly called beeline-log4j2.properties (in versions prior to Hive 2.0, it is called beeline-log4j.properties). Needs to be created and made accessible to beeline JVM classpath via HIVE_CONF_DIR. See HIVE-10502 and HIVE-12020 for further discussion on this.
You want to customize what HiveServer2 logs get printed on beeline stdout ?
This can be configured at HiveServer2 level using hive.server2.logging.operation.enabled and hive.server2.logging.operation configs.
Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead captured to a log file specified by Hive's log4j properties file. By default, Hive will use hive-log4j.default in the conf/ directory of the Hive installation which writes out logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
set hive.async.log.enabled=false

Apache oozie sharedlib is showing a blank list

Relatively new to Apache OOZIE and did an installation on Ubuntu 14.04, Hadoop 2.6.0, JDK 1.8. I was able to install oozie and the web console is visible at the 11000 port of my server.
Now while i copied the examples bundled with oozie and tried to run them i am running into an error which says no sharedlib exists.
Installed the sharedlib as below-
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
(my namenode is running on localhost 54310 and JT on localhost 54311)
hadoop fs -ls /user/hduser/share/lib is showing shared library created as per the oozie-site.xml file. However when i check the shared library using the command -
oozie admin -oozie http://localhost:11000/oozie -shareliblist the list is blank and also jobs are failing for the same reason.
Any clues on how should i approach this problem?
Thanks.
The sharelib create command looks fine.
If you havent done so already copy the core-site.xml from your hadoop installation folder into $OOZIE_HOME/conf/hadoop-conf/.
There might already be a "placeholder" core-site.xml in the hadoop-conf folder, delete or rename that one. Oozie doesnt get its hadoop configuration directly from your hadoop install (like hive for example) but from the core-site.xml you place in that hadoop-conf folder.
Okay i got a solution for this.
So when i was trying to create the sharedlib directory it was doing on HDFS but while running the job local path was being refereed. So i extracted the oozie-sharedlib tar.gz file in my local /user/hduser/share/lib directory and its working now.
But did not get the reason so its still an open question.
I have encountered the same issue and it turned out that
oozie was not able to communicate with hdfs, as it was not able to find the location for core-site.xml or any other hadoop configuration which has to be declared inside oozie-site.xml.
Corresponding property in oozie-site.xml is oozie.service.HadoopAccessorService.hadoop.configurations
this property was defined wrongly in my case.
changed it to point to where my Hadoop configuration xmls are present and then it started communicating with hdfs and hence was able to locate the sharelib on hdfs

Sqoop - Could not find or load main class org.apache.sqoop.Sqoop

I installed Hadoop, Hive, HBase, Sqoop and added them to the PATH.
When I try to execute sqoop command, I'm getting this error:
Error: Could not find or load main class org.apache.sqoop.Sqoop
Development Environment:
OS : Ubuntu 12.04 64-bit
Hadoop Version: 1.0.4
Hive Version: 0.9.0
Hbase Version: 0.94.5
Sqoop Version: 1.4.3
make sure you have sqoop-1.4.3.jar under your SQOOP HOME directory.
Note : May be because you had downloaded wrong distribution under Sqoop Distribution
I have resolved this issue on CentOS 6.3.
I have Hadoop-1.0.4, hbase-0.94.6, hive-0.10.0, pig-0.11.1, sqoop-1.4.3.bin__hadoop-1.0.0, zookeeper-3.4.5 installed.
I was also running same problem at sqoop: Error - Could not find the main class: org.apache.sqoop.Sqoop.
To resolve this issue I have copied the jar file: sqoop-1.4.3.jar from $SQOOP_HOME/ into the $HADOOP_HOME/lib/.
Hope this would help someone who struggling sqoop to be work with hadoop.
Unfortunately, I didn't find a complete answer for my problems. Current sqoop installation version I used was 1.4.6 . I am not sure about sqoop-1.4.6.tar.gz if one has to compile the source code, I was able to beat the same error Error - Could not find the main class: org.apache.sqoop.Sqoop using following instructions:
Instead I downloaded sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz from apache sqoop and installed it at /home/ubuntu/SQOOP/ renamed sqoop-1.4.6.bin__hadoop-2.0.4-alpha to sqoop. I wanted to use with Yarn.
Then export and set $SQOOP_HOME
I used this
export SQOOP_HOME=/home/ubuntu/SQOOP/sqoop/
export PATH=$PATH:$SQOOP_HOME/bin
Now if one go to $SQOOP_HOME/bin and try
./sqoop help
It should work without any issue.
The problem in my case was that hadoop-env.sh file has this line in it:
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
It seems that when you call sqoop it internally calls configure-sqoop which sets the HADOOP_CLASSPATH correctly but then when it (sqoop) calls hadoop, hadoop ignores that variable and reset it back to what is in hadooop-env.sh
The fix was to change the hadoop-env.sh to have this line instead:
export HADOOP_CLASSPATH="${JAVA_HOME}/lib/tools.jar:$HADOOP_CLASSPATH"
#user225003 solution magically worked and I looked into some of the files and here is what happens under the hood when you execute "sqoop" script.
The "sqoop" script essentially executes "hadoop" script from $HADOOP_COMMON_HOME/bin/ directory. While configuring sqoop, in "sqoop-env.sh" we set the $HADOOP_COMMON_HOME to hadoop installation directory. If your sqoop and hadoop installations are not in regular location /usr/local, I believe sqoop-x.x.x.jar is not in the hadoop script's classpath.