NullPointerException when starting Apache Flume

I am trying to run Flume and I am getting a NullPointerException:
.jar:/usr/local/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/local/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/local/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/home/training/Downloads/hive-0.10.0/lib/*'
-Djava.library.path=:/usr/local/hadoop/libexec/../lib/native/Linux-i386-32
org.apache.flume.node.Application --name agent
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/training/Downloads/apache-flume-1.6.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/training/Downloads/hive-0.10.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
15/10/13 05:47:28 ERROR node.Application: A fatal error occurred while running. Exception follows.
java.lang.NullPointerException
at java.io.File.<init>(File.java:251)
at org.apache.flume.node.Application.main(Application.java:302)
The command I used to start flume is as follows:
flume:
./flume-ng agent --conf /home/training/Downloads/apache-flume-1.6.0-bin/conf/flume-conf.properties.template --name agent
The flume config file is as follows:
agent.sources=seqGenSrc agent.channels=memoryChannel
agent.sinks=loggerSink
agent.sources.seqGenSrc.type=exec agent.sources.seqGenSrc.command=tail
-F /home/training/Desktop/log.txt agent.sources.seqGenSrc.channels=memoryChannel
agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.channel=memoryChannel
agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.capacity=100
agent.sinks.loggerSink.type=hdfs
agent.sinks.loggerSink.hdfs.path=hdfs://localhost:54310/user/training/logs
agent.sinks.loggerSink.hdfs.fileType=DataStream
Could you please let me know what I am missing.
Thanks in advance for your response.

Two things I notice:
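In your command you pass the --conf argument but omit the --conf-file argument. --conf should point to the configuration directory and --conf-file to the properties file itself; the NullPointerException raised in the java.io.File constructor is consistent with the config file path being missing. For example, I start Flume like this: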
./bin/flume-ng agent -Dflume.root.logger=INFO,console --conf ./conf --conf-file ./conf/test.conf --name agent
In your config you first set the type of the sink loggerSink to logger and then set it to hdfs. I don't think setting the type of the same sink twice is what you want (or what Flume wants).
By the way, I reformatted your config:
agent.sources=seqGenSrc
agent.channels=memoryChannel
agent.sinks=loggerSink
agent.sources.seqGenSrc.type=exec
agent.sources.seqGenSrc.command=tail -F /home/training/Desktop/log.txt
agent.sources.seqGenSrc.channels=memoryChannel
agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.channel=memoryChannel
agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.capacity=100
agent.sinks.loggerSink.type=hdfs
agent.sinks.loggerSink.hdfs.path=hdfs://localhost:54310/user/training/logs
agent.sinks.loggerSink.hdfs.fileType=DataStream
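A key that is assigned twice, like agent.sinks.loggerSink.type here, is easy to overlook in a longer file. The following is a minimal Python sketch for spotting such keys; the function name find_duplicate_keys is hypothetical, and it only handles the simple key=value form used in this config, not the full Java properties syntax (no ':' separators, no line continuations).

```python
from collections import defaultdict


def find_duplicate_keys(properties_text):
    """Map each key assigned more than once to its values, in file order."""
    values_by_key = defaultdict(list)
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values_by_key[key.strip()].append(value.strip())
    return {key: values for key, values in values_by_key.items() if len(values) > 1}


# Sink-related lines taken from the config above.
config = """\
agent.sinks=loggerSink
agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.channel=memoryChannel
agent.sinks.loggerSink.type=hdfs
"""

for key, values in find_duplicate_keys(config).items():
    print(f"{key} is set {len(values)} times: {values}")
```

Running it against the full config file reports only the sink type, which suggests the logger line is the one to remove if the HDFS sink is the intended one.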

Error while deploying Flink custom JAR file in AWS EMR

Basically I want to deploy a Flink custom JAR file to a new AWS EMR cluster. Here is a summary of what I did. I created a new AWS EMR cluster.
Step 1: Software and steps -
Created an AWS EMR cluster (EMR release version 5.17.0) with Flink 1.5.2 selected as the software configuration.
Entered the configuration JSON:
[
  {
    "Classification": "flink-conf",
    "Properties": {
      "jobmanager.heap.mb": "3072",
      "taskmanager.heap.mb": "51200",
      "taskmanager.numberOfTaskSlots": "2",
      "taskmanager.memory.preallocate": "false",
      "parallelism.default": "1"
    }
  }
]
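The configuration field expects well-formed JSON: a list of objects, each with a "Classification" and a "Properties" key. It can be worth checking the text locally before creating the cluster. The following is a minimal Python sketch using only the standard json module; the function name check_emr_configuration is hypothetical, and the sample is shortened to a single property.

```python
import json


def check_emr_configuration(text):
    """Return (ok, detail) for a configuration JSON string."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where the parser gave up, e.g. at an unbalanced brace.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(parsed, list):
        return False, "top-level value must be a list of classifications"
    return True, f"{len(parsed)} classification(s)"


sample = """
[
  {
    "Classification": "flink-conf",
    "Properties": {
      "parallelism.default": "1"
    }
  }
]
"""

print(check_emr_configuration(sample))
```

This only checks syntax and the top-level shape; it does not validate property names or values.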
Step 2: Hardware - No change in the hardware configuration. By default we have 1 master, 2 core and 0 task instances, all of type m3.xlarge.
Step 3: General Cluster Settings - No change here.
Step 4: Security - Provided my EC2 key pair.
Once the cluster was ready, I SSHed into the EC2 machine and tried to deploy the custom JAR file. Below are the errors I got each time I tried to deploy it via the CLI.
1)
flink run -m yarn-cluster -yn 2 -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-09 06:30:36,766 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-IPADDRESS.ec2.internal/IPADDRESS:8032
2018-10-09 06:30:36,909 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-10-09 06:30:37,168 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
2)
flink run -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterRetrieveException: Couldn't retrieve standalone cluster
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:51)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:31)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:253)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.util.ConfigurationException: Config parameter 'Key: 'jobmanager.rpc.address' , default: null (deprecated keys: [])' is missing (hostname/address of JobManager to connect to).
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getJobManagerAddress(HighAvailabilityServicesUtils.java:141)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:81)
at org.apache.flink.client.program.ClusterClient.<init>(ClusterClient.java:158)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:183)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:156)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:49)
... 10 more
I also tried to deploy via the AWS web UI, and the JAR failed to deploy there as well.
Basically, I want to deploy the custom JAR to the Flink YARN cluster. I am not sure what I am missing in the Flink YARN configuration or elsewhere. Thanks in advance for any help.
You should reduce the memory allocation for the task manager. taskmanager.heap.mb is set to 51200, so you are asking for about 50 GB of memory per task manager, whereas a single m3.xlarge machine has only 15 GB of memory, which is 30 GB in total for the 2 core machines in the cluster.
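The mismatch is easy to confirm with a few lines of arithmetic. The following is a minimal Python sketch using the 15 GB per m3.xlarge figure from the answer; the helper name fits_on_node is hypothetical, and in practice YARN makes less than the full machine memory available to containers, so the real limit is lower still.

```python
def fits_on_node(requested_heap_mb, node_memory_gb):
    """Return True if the requested heap (in MB) fits in one node's memory (in GB)."""
    node_memory_mb = node_memory_gb * 1024
    return requested_heap_mb <= node_memory_mb


taskmanager_heap_mb = 51200  # taskmanager.heap.mb from the question
jobmanager_heap_mb = 3072    # jobmanager.heap.mb from the question
node_memory_gb = 15          # m3.xlarge memory, as stated in the answer

print(f"task manager heap: {taskmanager_heap_mb / 1024:.0f} GB requested")
print("task manager fits on one node:", fits_on_node(taskmanager_heap_mb, node_memory_gb))
print("job manager fits on one node:", fits_on_node(jobmanager_heap_mb, node_memory_gb))
```

Any taskmanager.heap.mb value has to stay below the per-node figure, since a single task manager container cannot span machines.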

"Configuring hive with derby"

hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
[user1@slave3 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/home/user1/hadoop-2.9.0/bin:/usr/local/jdk1.8.0_161/bin:/home/user1/hadoop-2.9.0/sbin:/home/user1/sqoop-1.4.7.bin__hadoop-2.6.0/bin:/home/user1/apache-hive-2.3.2-bin/bin:/usr/local/derby/bin:/home/user1/.local/bin:/home/user1/bin:usr/local/jdk1.8.0_161/bin:/home/user1/hadoop-2.9.0/bin:/usr/local/jdk1.8.0_161/bin:/home/user1/hadoop-2.9.0/sbin:/home/user1/sqoop-1.4.7.bin__hadoop-2.6.0/bin:/home/user1/apache-hive-2.3.2-bin/bin:/usr/local/derby/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user1/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user1/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/user1/apache-hive-2.3.2-bin/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
I'm trying to install Hive, but it throws a SemanticException. Can anyone help?
Have you started the metastore? The metastore must be up and running before you run queries in Hive.
Run the command below to initialize the metastore schema:
schematool -dbType derby -initSchema
Then start the metastore:
hive --service metastore &
I got the same error. In my case it was caused by the two SLF4J bindings on the classpath; only one binding is used, so you just need to delete one of the two files.
I deleted this one from my Hive installation and it worked. All you have to do is delete the file
log4j-slf4j-impl-2.6.2.jar
from
/home/user1/apache-hive-2.3.2-bin/lib/
and you are good to go.
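To see which jars contribute a binding, the "Found binding in" lines of the startup output can be parsed. The following is a minimal Python sketch, assuming the log format shown in the question; the function name find_slf4j_bindings is hypothetical.

```python
import re

# Captures the jar path between "jar:file:" and the "!" that starts the in-jar path.
BINDING_RE = re.compile(r"SLF4J: Found binding in \[jar:file:([^!\]]+)!")


def find_slf4j_bindings(log_text):
    """Return the jar paths reported by SLF4J as providing a binding."""
    return BINDING_RE.findall(log_text)


log_text = """\
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user1/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user1/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
"""

for jar in find_slf4j_bindings(log_text):
    print(jar)
```

The sketch only lists the jars; which one to remove, if any, remains a judgment call for the installation at hand.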

FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.NoSuchMethodError:

I am trying to run a MapReduce job on an EMR cluster. The version of Hadoop on EMR is 2.7.3.
The code reads HFiles residing in an S3 bucket, but every time I run it, it fails with the error below.
2018-02-22 20:02:11,641 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskLog.createLogSyncer()Ljava/util/concurrent/ScheduledExecutorService;
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:250)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:233)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1472)
2018-02-22 20:02:12,188 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
End of LogType:syslog
The code was originally designed to read files from HDFS, which worked fine on CDH-based clusters where the Hadoop version is 2.6.0. However, there was a requirement to read the HFiles from an S3 bucket on an EMR-based cluster in AWS. I made a few changes to the code so that it can read from any file system. Below is a snippet of the change:
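...
// Output path taken from the second command-line argument.
Path JSONOutputjob2 = new Path( args[1] );
// Resolve the FileSystem from the path's own URI and recursively delete any existing output.
FileSystem.get(JSONOutputjob2.toUri(), conf2).delete(JSONOutputjob2, true);
...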
I pass the path as an argument; these are the path forms I have tried:
s3n://emr-ip/path/to/the/file
s3a://emr-ip/path/to/the/file
s3://emr-ip/path/to/the/file
This error is really driving me crazy. I have updated my pom.xml to use the Hadoop version available on the cluster and rebuilt the project. The build was successful, but the job still does not work. Any suggestions or help would be much appreciated.
Edit:
I have updated my pom to use the AWS Hadoop version, i.e. 2.7.3, which did not fix the issue.

Error in installation of Hive 2.1.0 on Hadoop 2.7.2 - Pseudo distributed mode

I followed the Apache Hadoop installation guides and was able to install it along with Pig. Both are working fine.
Following is the configuration:
Hadoop: 2.7.2
Hive: 2.1.0
Machine: Ubuntu 14.04 LTS 64-bit
Java: Version 9
Now I tried to install Apache Hive 2.1.0 according to this link [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingfromaTarball].
... and started a test run of the Hive CLI, but every time it throws the following error and exits.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ClassCastException: jdk.internal.loader.ClassLoaders$AppClassLoader (in module: java.base) cannot be cast to java.net.URLClassLoader (in module: java.base)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:374)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:350)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:663)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:533)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But there is a catch: if I invoke the Beeline CLI, it works fine.
Could you please help with the following:
a. Are the Beeline CLI and the Hive CLI the same, or is there a specific difference?
b. How do I install/configure Hive on my machine?
A: Beeline CLI vs. Hive CLI: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_dataintegration/content/beeline-vs-hive-cli.html
B: According to
http://openjdk.java.net/projects/jigsaw/talks/prepare-for-jdk9-j1-2015.pdf
Java 9 no longer uses java.net.URLClassLoader for the application class loader, which is why the cast in SessionState fails.
However, I was able to solve the issue by pointing Hive to JDK 8.
** I have only begun using Hive/Hadoop. Perhaps someone could provide a better explanation or a workaround so that we are able to use JDK 9.
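Before launching the Hive CLI it can help to confirm which major Java version is in use. The following is a minimal Python sketch that maps a Java version string to its major version; the function name java_major_version is hypothetical. It handles the legacy 1.x scheme (1.8.0_161 is Java 8) as well as the newer scheme used from Java 9 onwards (9-ea, 11.0.2).

```python
import re


def java_major_version(version):
    """Return the major Java version for strings like '1.8.0_161' or '9-ea'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    # Up to Java 8 the version was written as 1.<major>.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


for version in ("1.8.0_161", "9-ea", "11.0.2"):
    print(version, "->", java_major_version(version))
```

For the Hive 2.1.0 setup discussed here, a result of 9 or higher would reproduce the ClassCastException, while 8 matches the configuration that worked for the answerer.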

Show databases command not working in hive?

I connected to Hive, and when I try to show all databases using the command below, I get the following error:
techgene@slaveone:~/apps/hive-0.12.0$ hive
Logging initialized using configuration in jar:file:/home/techgene/apps/hive-0.12.0/lib/hive-common-0.12.0.jar!/hive-log4j.properties
hive> show databases;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Can you please provide a solution for this?
This problem usually occurs when a Hive CLI session was not ended properly. In that case, kill the improperly closed Hive CLI session as follows, then launch a fresh Hive CLI.
ramisetty@aspire:~$ jps
3710 SecondaryNameNode
4103 RunJar -------------------------> hive CLI instance.
4019 TaskTracker
3467 DataNode
3242 NameNode
4366 Jps
3788 JobTracker
ramisetty@aspire:~$ kill -9 4103
ramisetty@aspire:~$
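Picking the RunJar entry out of the jps listing can also be scripted. The following is a minimal Python sketch that parses jps output and returns the PIDs of RunJar processes; the function name find_runjar_pids is hypothetical, and the sketch only lists the PIDs without killing anything. Note that RunJar is the generic launcher for jars started through Hadoop, so a match is not necessarily a Hive CLI session.

```python
def find_runjar_pids(jps_output):
    """Return the PIDs of all RunJar entries in the output of `jps`."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line has the form "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "RunJar":
            pids.append(int(parts[0]))
    return pids


# Listing taken from the session above.
jps_output = """\
3710 SecondaryNameNode
4103 RunJar
4019 TaskTracker
3467 DataNode
3242 NameNode
4366 Jps
3788 JobTracker
"""

print(find_runjar_pids(jps_output))
```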
If the problem still persists, follow the solutions available at: FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient