Zeppelin Hive interpreter throws ClassNotFoundException

I have deployed Zeppelin 0.6 and configured Hive under the JDBC interpreter.
I tried executing:
%hive
show databases
It throws:
java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:264)
org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:220)
org.apache.zeppelin.jdbc.JDBCInterpreter.getStatement(JDBCInterpreter.java:233)
org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:292)
org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:398)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:383)
org.apache.zeppelin.scheduler.Job.run(Job.java:176)
org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

I just ran into this issue this morning. I'm not sure whether this is the recommended fix, but I downloaded the binary packages for Hive 1.2 and Hadoop 2.6.4, copied the following jars to ./interpreter/jdbc/, and reloaded Zeppelin with ./bin/zeppelin-daemon.sh reload:
cp ~/Dev/Hadoop/apache-hive-1.2.1-bin/lib/hive-jdbc-1.2.1-standalone.jar ./interpreter/jdbc/
cp ~/Dev/Hadoop/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar ./interpreter/jdbc/
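As an alternative to copying jars by hand, the interpreter settings in Zeppelin also accept Maven coordinates as dependencies and will download them for you. A sketch of coordinates that would match the jars above (versions are illustrative; match them to your cluster):
org.apache.hive:hive-jdbc:1.2.1
org.apache.hadoop:hadoop-common:2.6.4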

1)
You could download just the Hive JDBC driver instead of the whole Hive jar set, for example the one from Cloudera:
http://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-17.html
2)
Starting with 0.14, Hive ships a standalone jar for the JDBC part:
hive-jdbc-standalone.jar
but until https://issues.apache.org/jira/browse/HIVE-9600 is resolved, you need to put two more jars on the classpath along with hive-jdbc-standalone.jar:
hadoop-common.jar
hadoop-auth.jar
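For example, with a local Hive 1.2.x and Hadoop 2.6.x install, the copy step could look like this (a sketch; paths and versions are illustrative, adjust to your layout):
cp $HIVE_HOME/lib/hive-jdbc-1.2.1-standalone.jar ./interpreter/jdbc/
cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.4.jar ./interpreter/jdbc/
cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.6.4.jar ./interpreter/jdbc/
./bin/zeppelin-daemon.sh restart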

The top-rated answer given here fixes the issue.
However, instead of copying individual jars, I added the HADOOP_HOME common directory to the classpath in interpreter.sh, so the jars under share/hadoop/common are picked up automatically.
Below are the lines I added to bin/interpreter.sh inside Zeppelin:
HADOOP_HOME=/opt/hadoop-2.6.2/
addJarInDirForIntp "${HADOOP_HOME}/share/hadoop/common"

java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()

I'm using Hadoop 3.2.1 and Hive 2.3.6.
When I run show databases, I get the following error:
hive> show databases;
Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
What does this mean, and why do I get this error? Thanks in advance.
According to the downloads page [1], Hive 2.3.x works with Hadoop 2.x.y (not 3.x.y), so if you want to run with Hadoop 3.2.1, try a newer Hive version.
Other than that, the error looks like a classpath problem related to Guava: most likely one Guava version is coming from Hive and another from Hadoop. Try removing one of them. For instance:
cd apache-hive-2.3.6-bin/lib
rm guava*
Even if you solve this particular problem, you will most likely bump into another, so it is better to choose versions that are compatible.
[1] https://hive.apache.org/downloads.html
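A commonly used variant of this fix is to replace Hive's Guava with the newer one shipped by Hadoop, so both components resolve the same version (a sketch, assuming standard HIVE_HOME and HADOOP_HOME layouts):
rm $HIVE_HOME/lib/guava-*.jar
cp $HADOOP_HOME/share/hadoop/common/lib/guava-*.jar $HIVE_HOME/lib/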
Please upgrade to apache-hive-3.1.2 if that is an option for you; I had the exact same issue and it was resolved by the upgrade. Another option would be to compare the lib folder of Hive 2.3.6 with that of Hive 3.1.2, since this issue is primarily due to incompatible jars; see the sketch below.
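A quick way to spot the jars that differ between the two distributions, Guava in particular (illustrative paths, assuming both binary packages are unpacked side by side):
diff <(ls apache-hive-2.3.6-bin/lib) <(ls apache-hive-3.1.2-bin/lib) | grep -i guava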

NiFi org.apache.thrift.transport.TTransportException

I have NiFi 1.4.0 and Hive 2.3.0. The Metastore service is running fine, but for some reason NiFi can't execute the PutHiveStreaming processor.
Below is the full stack trace. Any idea what is going on?
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:527)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$392/1467727491.apply(Unknown Source)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError$$Lambda$394/2094052256.apply(Unknown Source)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:545)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$389/23901131.apply(Unknown Source)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$12(PutHiveStreaming.java:677)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$383/664174107.process(Unknown Source)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2174)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2144)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:631)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:555)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$379/701280946.execute(Unknown Source)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:555)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {m
Caused by: org.apache.thrift.transport.TTransportException: null
The Hive processors in Apache NiFi 1.4.0 are built against Apache Hive 1.2.1, so they are not guaranteed to work with Apache Hive 2.3.0. There may also be incompatibilities between Apache NiFi and a vendor-specific version of Hive. For example, the Hortonworks Data Platform (HDP) ships a Hive version based on 1.2.x but closer to 2.0; Apache NiFi's Hive processors are not compatible with HDP 2.5+, so you would likely want the NiFi-only Hortonworks Data Flow (HDF) package, whose NiFi is built against the HDP Hive 1.2.x version.
Having said that, none of the above solutions is guaranteed to work against Hive 2.x (whether Apache or vendor-specific), as the NiFi baseline is still Hive 1.2.x.

Looking for a solution to an issue (org.apache.hive.service.cli.thrift.TCLIService$Iface) while connecting Talend Open Studio with Hive

I am facing this issue while connecting Talend Open Studio with Hive. Below is the error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at mtn_project.hive_test_0_1.hive_test.tHiveConnection_1Process(hive_test.java:353)
at mtn_project.hive_test_0_1.hive_test.runJobInTOS(hive_test.java:674)
at mtn_project.hive_test_0_1.hive_test.main(hive_test.java:523)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.cli.thrift.TCLIService$Iface
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
[statistics] disconnected
NoClassDefFoundError usually indicates that certain libraries are missing from your environment.
See for example Connect Hive through Java JDBC.
In your case it is also possible that you need the Big Data edition of Talend.
I had the same error message, and using the following jars helped me.
They're located in the $SPARK_HOME/jars folder:
commons-logging-1.1.3.jar
hadoop-common-3.0.0.jar
hive-jdbc-1.2.1.spark2.jar
hive-metastore-1.2.1.spark2.jar
httpclient-4.5.2.jar
libthrift-0.9.3.jar
guava-14.0.1.jar
hive-exec-1.2.1.spark2.jar
hive-service-1.2.2.jar
httpcore-4.4.4.jar
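If you are unsure which jar actually provides the missing class, a quick shell check can locate it (a sketch, assuming unzip and grep are available; the single quotes keep the shell from expanding $Iface):
for j in "$SPARK_HOME"/jars/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hive/service/cli/thrift/TCLIService$Iface.class' && echo "$j"
done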

Apache Zeppelin error

I have started Apache Zeppelin and it is running successfully on the configured port.
When I execute simple Spark commands like
sc.version
println(zeppelin)
I just see an "ERROR" label near the run button, with no error output on my console.
My error log:
ERROR [2016-10-21 22:38:05,837] ({pool-2-thread-6} Job.java[run]:189) - Job failed
org.apache.spark.SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:492)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:490)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:490)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:478)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:478)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_1(SparkInterpreter.java:440)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:354)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:137)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:743)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I have also set the Spark and Java paths in the zeppelin-env.sh file.
I know this is quite a late answer, but for anyone who faces a similar issue:
I had a similar problem after adding an external dependency (com.twitter:algebird-core_2.11:0.11.0) in the interpreter settings for Spark. Whenever a dependency is added for any component, there is a chance it is not compatible with your Zeppelin version or does not exist on the remote repository. Once I removed that dependency, I started seeing the actual errors on the Zeppelin console.
Also make sure "zeppelin.spark.printREPLOutput" is set to true; it generally is by default, but it is worth checking.
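For the specific SparkException in the log above, the message itself names the fix: SPARK_CLASSPATH is deprecated and conflicts with spark.driver.extraClassPath. A minimal sketch, assuming SPARK_CLASSPATH was exported in conf/zeppelin-env.sh or your shell environment (the jar path is illustrative):
# drop the deprecated variable
unset SPARK_CLASSPATH
# carry the same jars via spark-submit options, which Zeppelin reads from zeppelin-env.sh
export SPARK_SUBMIT_OPTIONS="--driver-class-path /path/to/extra-jars/*"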

Install Spark on an existing Hadoop cluster (issue with Hive)

I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
Here are the details, any help would be great.
I have already installed the following packages:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configured ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When I start Shark with "./shark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark cannot find some libs to connect to the Hive metastore, but I have been stuck here for a couple of days and don't know how to solve it. BTW, I use MySQL for the Hive metadata, and everything works well in Hive itself.
Any help is appreciated. Thanks in advance.
You may need to add the MySQL connector jar before you start Spark.
In my case, I appended it to the classpath in $SPARK_HOME/bin/compute-classpath.sh like below:
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar
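That said, the NoSuchFieldError: METASTOREINTERVAL above usually signals a Hive version mismatch rather than a missing connector: Shark was compiled against a specific patched Hive version, so the Hive 0.13 jars on the classpath are likely too new for it. As a quick check of which Hive jars end up on the classpath (compute-classpath.sh prints the assembled classpath in Spark 1.0; treat this as a sketch):
$SPARK_HOME/bin/compute-classpath.sh | tr ':' '\n' | grep -i hive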