Apache Flume :- Data Pipe line - Kafka-Memory-HDFS - apache

I have installed Apache Flume 1.6.0 version.
And I wanted to create a data pipeline from Kafka->MemoryChannel->HDFS.
But I got an null pointer exception :- java.lang.NullPointerException which got resolved - https://issues.apache.org/jira/browse/FLUME-2672
but now i am getting a new exception :- java.lang.AbstractMethodError: org.apache.flume.source.kafka.KafkaSource.getBackOffSleepIncrement()J
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:136)
at java.lang.Thread.run(Thread.java:745)

Related

Reading files from S3 to kafka topic

I have a situation wherein all the event data is getting stored in an s3 bucket and I need to fetch that from S3 to Kafka topic on ec2. I am using CamelAWSS3Connector and am facing issues of the connector not working.
Following is the error I am facing
[2023-01-06 10:11:21,048] ERROR Failed to create job for config/s3_connect.properties (org.apache.kafka.connect.cli.ConnectStandalone:107)
[2023-01-06 10:11:21,053] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
I was expecting the publisher to push msg to topic from s3 to kafka
Following is my properties files
name=CamelAwss3SourceConnector
connector.class=org.apache.camel.kafkaconnector.aws2s3.CamelAws2s3SourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
camel.source.maxPollDuration=10000
topics=mytopic
camel.component.aws-s3.access-key=XXXXXXXX
camel.component.aws-s3.region=ap-south-1
camel.source.path.bucketNameOrArn=poc-s3-kafkatopic
camel.source.endpoint.autocloseBody=true
camel.source.endpoint.deleteAfterRead=true
After using export command and adding jars location before calling the publisher following is the error
[2023-01-11 06:43:05,528] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117) java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114) Caused by: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
Make sure you have added plugin.path=/path/to/extracted-camel-connector to the connect-standalone.properties file.
And if that doesn't work, you'll need to export CLASSPATH environment variable to include the jar files in that path.

spark.read.parquet not working in Colab

Py4JJavaError: An error occurred while calling o188.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
I tried adding the missing hadoop-aws jar file using spark-submit to the classpath but was unable to add it. This is what I tried:
!spark-submit --jars /content/hadoop-aws-2.7.1.jar
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
os.environ['PYSPARK_SUBMIT_ARGS'] = "--packages=org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell"

IllegalAccessError for RequestHedgingRMFailoverProxyProvider while launching Apache Twill Application in hadoop cluster after HDP upgrade

I'm trying to launch Apache Twill application from hadoop cluster, the cluster is recently upgraded from HDP 2.2 to HDP 2.5 but I'm getting llegalAccessError for RequestHedgingRMFailoverProxyProvider class . This class is part of org.apache.hadoop.yarn.client package. I'm getting this error in the Application Master. The job status goes directy to 'not running state' right after 'accepted state'.
Exception in thread "Hadoop22YarnAMClient STARTING" Exception in thread "YarnAMClientService STARTING" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider
at org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider.init(RequestHedgingRMFailoverProxyProvider.java:75)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:163)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.startUp(Hadoop21YarnAMClient.java:77)
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
at java.lang.Thread.run(Thread.java:745)
com.google.common.util.concurrent.ExecutionError: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider
at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1008)
at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
at org.apache.twill.internal.appmaster.ApplicationMasterMain$YarnAMClientService.startUp(ApplicationMasterMain.java:221)
at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider
at org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider.init(RequestHedgingRMFailoverProxyProvider.java:75)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:163)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.twill.internal.yarn.Hadoop21YarnAMClient.startUp(Hadoop21YarnAMClient.java:77)
... 2 more
In general when you see IllegalAccessError this means you have a runtime incompatibility between compiled and runtime code. In this case,
the getProxyInternal() method of ConfiguredRMFailoverProxyProvider is now private. You need to recompile your client code and/or use updated hadoop client libraries to connect to your cluster.

Oozie hive action fails

I am creating oozie workflow for hive create table command.
I have added hive-site.xml in hdfs location.
I am getting below error:-
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, com/facebook/fb303/FacebookService$Iface
java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface
at java.lang.ClassLoader.defineClass1(Native Method)
This might be because you are missing Thrift jar or version mismatch.
Refer the following
Error while executing program with Hive JDBC

Install Spark on existing Hadoop cluster (ISSUE with HIVE)

I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
Here are the details, any help would be great.
I already installed the following package:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configure ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When start spark with "./spark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark can not find some libs ton connect metastore in Hive, but I have been stacked here for a couple days and don't know how to solve it. BTW, I use MYSQL for hive metadata, and everything works well in hive.
Any help is appreciated. Thanks in advance.
You may need to add mysql connector jar file before you start spark...
In my case, I added mysql connector jar like below.
$SPARK_HOME/bin/compute-classpath.sh
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar