Oozie hive action fails - hive

I am creating oozie workflow for hive create table command.
I have added hive-site.xml in hdfs location.
I am getting below error:-
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, com/facebook/fb303/FacebookService$Iface
java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface
at java.lang.ClassLoader.defineClass1(Native Method)

This might be because you are missing Thrift jar or version mismatch.
Refer the following
Error while executing program with Hive JDBC

Related

Reading files from S3 to kafka topic

I have a situation wherein all the event data is getting stored in an s3 bucket and I need to fetch that from S3 to Kafka topic on ec2. I am using CamelAWSS3Connector and am facing issues of the connector not working.
Following is the error I am facing
[2023-01-06 10:11:21,048] ERROR Failed to create job for config/s3_connect.properties (org.apache.kafka.connect.cli.ConnectStandalone:107)
[2023-01-06 10:11:21,053] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
I was expecting the publisher to push msg to topic from s3 to kafka
Following is my properties files
name=CamelAwss3SourceConnector
connector.class=org.apache.camel.kafkaconnector.aws2s3.CamelAws2s3SourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
camel.source.maxPollDuration=10000
topics=mytopic
camel.component.aws-s3.access-key=XXXXXXXX
camel.component.aws-s3.region=ap-south-1
camel.source.path.bucketNameOrArn=poc-s3-kafkatopic
camel.source.endpoint.autocloseBody=true
camel.source.endpoint.deleteAfterRead=true
After using export command and adding jars location before calling the publisher following is the error
[2023-01-11 06:43:05,528] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117) java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114) Caused by: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
Make sure you have added plugin.path=/path/to/extracted-camel-connector to the connect-standalone.properties file.
And if that doesn't work, you'll need to export CLASSPATH environment variable to include the jar files in that path.

Apache Hive query on Tez FileNotFoundException

I'm receiving this exception when executing a Hive query on Tez with Hive 2.3.6 and Tez 0.9.2
I know Tez is configured correctly because I can manually run map-reduce jobs via Hadoop.
Dag submit failed due to java.io.FileNotFoundException: Path is not a file: /tmp/hive/root/_tez_session_dir/f4f4b17c-0657-41fa-8674-df83fa3ad362/lib
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1829)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:709)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
This error is seen on Hive 2.2+ or Hive 2.3.x+ when
hive.aux.jars.path in hive-site.xml is configured with an invalid path.
or
HIVE_AUX_JARS_PATH environment variable is configured improperly (usually in hive-env.sh)

spark.read.parquet not working in Colab

Py4JJavaError: An error occurred while calling o188.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
I tried adding the missing hadoop-aws jar file using spark-submit to the classpath but was unable to add it. This is what I tried:
!spark-submit --jars /content/hadoop-aws-2.7.1.jar
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
os.environ['PYSPARK_SUBMIT_ARGS'] = "--packages=org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell"

java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails Exception

I am trying to read a table that is on a azure blob storage via pyspark and the below exception is raised even though I have added the below jars in the pyspark --jars.
azure-storage-2.0.0.jar
hadoop-azure-2.7.0.jar
Exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o38.showString.
: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails
Caused by: java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails
Any idea as which specific jar needs to be added to resolve the issue and read azure tables in spark?
My suggestion is that as below.
Please download the jar files of the newest version of Azure Storage Java Client & Hadoop Azure Support instead of their old version.
Check whether the path of these jars were added into the SPARK_CLASSPATH environment variable in the conf/spark-env file, or you can programmatically add the jar path via code SparkContext.addJar("Path to jar created from maven [hint: mvn package]").
Hope it helps.

Install Spark on existing Hadoop cluster (ISSUE with HIVE)

I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
Here are the details, any help would be great.
I already installed the following package:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configure ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When start spark with "./spark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark can not find some libs ton connect metastore in Hive, but I have been stacked here for a couple days and don't know how to solve it. BTW, I use MYSQL for hive metadata, and everything works well in hive.
Any help is appreciated. Thanks in advance.
You may need to add mysql connector jar file before you start spark...
In my case, I added mysql connector jar like below.
$SPARK_HOME/bin/compute-classpath.sh
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar