I want to deploy a custom Flink JAR file to a new AWS EMR cluster. Here is a summary of what I did to create the cluster.
Step 1: Software and Steps -
Created an AWS EMR cluster with Flink as the service (EMR release version 5.17.0) and selected Flink 1.5.2 as the software configuration.
Entered the configuration JSON:
[
  {
    "Classification": "flink-conf",
    "Properties": {
      "jobmanager.heap.mb": "3072",
      "taskmanager.heap.mb": "51200",
      "taskmanager.numberOfTaskSlots": "2",
      "taskmanager.memory.preallocate": "false",
      "parallelism.default": "1"
    }
  }
]
Step 2: Hardware - No change in the hardware configuration. By default we have 1 master, 2 core and 0 task instances. All are of type m3.xlarge.
Step 3: General Cluster Settings - No change here.
Step 4: Security - Provided my EC2 key pair.
Once the cluster was ready, I SSHed into the EC2 machine and tried to deploy the custom JAR file. Below are the errors I got each time I tried to deploy it via the CLI.
1)
flink run -m yarn-cluster -yn 2 -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-09 06:30:36,766 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-IPADDRESS.ec2.internal/IPADDRESS:8032
2018-10-09 06:30:36,909 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-10-09 06:30:37,168 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
2)
flink run -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterRetrieveException: Couldn't retrieve standalone cluster
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:51)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:31)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:253)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.util.ConfigurationException: Config parameter 'Key: 'jobmanager.rpc.address' , default: null (deprecated keys: [])' is missing (hostname/address of JobManager to connect to).
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getJobManagerAddress(HighAvailabilityServicesUtils.java:141)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:81)
at org.apache.flink.client.program.ClusterClient.<init>(ClusterClient.java:158)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:183)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:156)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:49)
... 10 more
I also tried to deploy via the AWS web UI; the JAR failed to deploy there too.
So basically I want to deploy the custom JAR to the Flink YARN cluster. I am not sure what I am missing in the Flink YARN configuration or elsewhere. Thanks in advance for any help.
You should reduce the memory allocation for the task manager. Currently you are trying to allocate 51200 MB (roughly 50 GB) of memory per task manager, whereas a single m3.xlarge machine has only 15 GB of memory, which gives 30 GB in total for the 2-machine cluster.
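As a rough sanity check of the numbers above, the requested heap sizes can be compared against the memory of one node. This is only a sketch: the 15 GB figure is the one quoted above, and the check ignores whatever the operating system, YARN and other daemons reserve, so the real per-container limit is lower still.

```python
# Values taken from the flink-conf classification in the question
# and from the instance memory quoted in the answer above.
TASKMANAGER_HEAP_MB = 51200   # taskmanager.heap.mb
JOBMANAGER_HEAP_MB = 3072     # jobmanager.heap.mb
NODE_MEMORY_MB = 15 * 1024    # one m3.xlarge, as stated above
CORE_NODES = 2


def fits_on_one_node(heap_mb: int, node_mb: int = NODE_MEMORY_MB) -> bool:
    """A single container cannot span machines, so it must fit on one node."""
    return heap_mb <= node_mb


print(fits_on_one_node(TASKMANAGER_HEAP_MB))               # False
print(fits_on_one_node(JOBMANAGER_HEAP_MB))                # True
print(TASKMANAGER_HEAP_MB > CORE_NODES * NODE_MEMORY_MB)   # True
```

The last line shows that the requested task manager heap exceeds even the combined memory of both core nodes.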
I use Docker Compose to start 3 services in their respective containers:
ZooKeeper, Kafka broker, and minio-connector
The three services start and connect successfully when I use the following configuration in minio-connector to consume from Kafka and dump records in JSON format to MinIO:
startup command:
root@e1d1294c6fe6:/opt/bitnami/kafka/bin# ./connect-standalone.sh /plugins/connector.properties /plugins/s3-sink.properties
connector.properties file:
bootstrap.servers=kafka:9092
plugin.path=/plugins
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
s3-sink.properties file:
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=202208.minio.connector.test
s3.region=us-east-1
s3.bucket.name=minioUsr
s3.part.size=5242880
flush.size=1
store.url=https://minio.kube.url
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.compatibility=NONE
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
Now I'd like the connector to consume records and dump them to MinIO in Parquet format.
The Kafka and ZooKeeper services remain the same. I modified connector.properties and s3-sink.properties, but the connector cannot start.
new connector.properties file:
bootstrap.servers=kafka:9092
plugin.path=/plugins
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=true
key.converter.schema.registry.url=https://registry...:1443
value.converter.schema.registry.url=https://registry...:1443
new s3-sink.properties:
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=202208.minio.connector.test
s3.region=us-east-1
s3.bucket.name=minioUsr
s3.part.size=5242880
flush.size=1
store.url=https://minio.kube.url
storage.class=io.confluent.connect.s3.storage.S3Storage
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
enhanced.avro.schema.support=true
My questions are:
With the above configuration, the connector fails to start with an exception:
[2022-08-29 15:21:10,610] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130)
org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.avro.AvroConverter for configuration value.converter: Class io.confluent.connect.avro.AvroConverter could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:728)
at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:474)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:467)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:108)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:129)
at org.apache.kafka.connect.runtime.WorkerConfig.<init>(WorkerConfig.java:385)
at org.apache.kafka.connect.runtime.standalone.StandaloneConfig.<init>(StandaloneConfig.java:42)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:81)
I installed the connector by downloading the confluentinc-kafka-connect-s3-10.1.0 zip file, manually unzipping it and copying all jars to /plugins/lib. I found the following jars related to Parquet:
root@e1d1294c6fe6:/opt/bitnami/kafka/bin# ls -1 /plugins/lib/*parquet*
/plugins/lib/parquet-avro-1.11.1.jar
/plugins/lib/parquet-column-1.11.1.jar
/plugins/lib/parquet-common-1.11.1.jar
/plugins/lib/parquet-encoding-1.11.1.jar
/plugins/lib/parquet-format-structures-1.11.1.jar
/plugins/lib/parquet-hadoop-1.11.1.jar
What is missing in the installation?
Is any further change required in the configuration?
I had to manually install the Avro converter using confluent-hub:
Download the confluent-hub client:
https://docs.confluent.io/5.5.1/connect/managing/confluent-hub/client.html
Use confluent-hub to install the Avro converter, following the answers in:
Kafka Connect Confluent S3 Sink Connector: Class io.confluent.connect.avro.AvroConverter could not be found
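To confirm whether the converter class is actually present before restarting the worker, one option is to scan the jars under the plugin path for it. The helper below is a hypothetical sketch, not part of Kafka Connect or confluent-hub; it only looks inside jar files and assumes the /plugins directory from the question.

```python
import zipfile
from pathlib import Path
from typing import List


def find_class_in_jars(plugin_dir: str, class_name: str) -> List[str]:
    """Return the paths of all jars under plugin_dir that contain class_name."""
    # A class a.b.C is stored in a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(plugin_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return hits


if __name__ == "__main__":
    # /plugins is the plugin.path used in the question.
    for path in find_class_in_jars("/plugins", "io.confluent.connect.avro.AvroConverter"):
        print(path)
```

An empty result for io.confluent.connect.avro.AvroConverter would match the "could not be found" error in the question.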
I am trying to run a MapReduce job on an EMR cluster. The version of Hadoop on EMR is 2.7.3.
The code reads HFiles residing in an S3 bucket, but every time I run it, it fails with the error below.
2018-02-22 20:02:11,641 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskLog.createLogSyncer()Ljava/util/concurrent/ScheduledExecutorService;
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:250)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.<init>(MRAppMaster.java:233)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1472)
2018-02-22 20:02:12,188 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
End of LogType:syslog
The code was originally designed to read files from HDFS and worked fine on CDH-based clusters, where the Hadoop version is 2.6.0. However, there is now a requirement to read the HFiles from an S3 bucket on an EMR cluster in AWS. I made a few changes to the code so that it can read from any file system. Below is a snippet of the change:
...
Path JSONOutputjob2 = new Path( args[1] );
FileSystem.get(JSONOutputjob2.toUri(), conf2).delete(JSONOutputjob2, true);
...
I am passing the path as an argument and here are the options that I have tried with the file path.
s3n://emr-ip/path/to/the/file
s3a://emr-ip/path/to/the/file
s3://emr-ip/path/to/the/file
This error is really driving me crazy. I have updated my pom.xml to use the Hadoop version available on the cluster and rebuilt the project. The build succeeded, but the job still does not work. Any suggestions or help are much appreciated.
Edit:
I have updated my pom to use the AWS Hadoop version, i.e. 2.7.3, which did not fix the issue.
We are migrating from JBoss 4.0.3SP1 to WildFly 10.1.0. Our applications are bundled in separate sars which contain standard as well as Dynamic MBeans.
We are getting "javax.management.ReflectionException: The MBean class could not be loaded by the default loader repository" exception caused by "java.lang.ClassNotFoundException: com.xxx.ccr.common.adapter.PEAdapterLCM". Stack Trace is added in the end. PEAdapter class is a Dynamic MBean and is present in one of the jars in the common modules under /opt/coreservices/wildfly-10.1.0.Final/modules.
We are using standalone-full.xml to start our Wildfly instance (ccr2)
[root@puiqr710dev08 CCR]# ps -eaf|grep ccr2
xxxiq 28948 7529 21 15:17 ? 00:07:10 /opt/Xxx/CCR/jre/bin/java -verbose:class -Ddss.port=31002 -DCONTAINER_UUID=14132c5860baba870160bac5d84e084a -Dlcm.host= xxx.xxxx.xxx.xxx -server -XX:NewRatio=1 -XX:+UseG1GC -XX:+UseLargePages -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=10 -XX:+DisableExplicitGC -Dsun.nio.ch.disableSystemWideOverlappingFileLockCheck=true -Ddss.thread_pool_size=24 -Xss250k -Xms256m -Xmx4096m -DUSE_DELAY1=60000 -DUSE_DELAY2=60000 -Dlog4j.configuration=file:/opt/Xxx/CCR/appserver/jboss-boot.log4j.properties -DLOG_FILE_PREFIX=DataProcessingJBoss_puiqdevdads07 -Ddss.message_lifetime=180 -Djava.awt.headless=true -Dorg.jboss.logging.Log4jService.catchSystemErr=false -classpath /opt/Xxx/CCR/jre/lib/tools.jar -jar /opt/coreservices/wildfly-10.1.0.Final/jboss-modules.jar -mp /opt/coreservices/wildfly-10.1.0.Final/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/opt/coreservices/wildfly-10.1.0.Final -Djboss.server.base.dir=/opt/coreservices/wildfly-10.1.0.Final/ccr2 -Djboss.bind.address=xxx.xxxx.xxx.xxx -Djboss.bind.address.management= xxx.xxxx.xxx.xxx -c standalone-full.xml
· We have four sars that are deployed in this instance. All sars are deployed successfully.
· We have packaged our sars as follows:
--lib (contains all jars)
--META-INF
|--jboss-deployment-structure.xml
|--jboss-service.xml
· Each sar contains some standard MBeans and a few Dynamic MBeans.
· We have defined the standard MBeans in jboss-service.xml. They are created properly and can be seen in JConsole.
· We can NOT include definition of Dynamic MBeans in jboss-service.xml as their name is constructed at runtime.
· PEAdapterLCM for which we are getting ClassNotFoundException is a dynamic bean and we are getting this error for all Dynamic MBeans.
· We have created modules for all common jars that were part of “server/lib” folder in JBOSS 4.0.3SP1.
We tried the following to fix this error, with no luck:
1) Packaged the contents of all sars in one single sar and deployed it.
2) Added a global module entry in standalone-full.xml for the module which contains the Dynamic MBeans.
3) Added the definition of a Dynamic MBean in jboss-service.xml (just to see if this makes any difference). With this change, we were able to see the MBean in JConsole, but still got the same error.
So what should be done to fix this error? Are there any changes in how Dynamic MBeans are implemented and deployed in Wildfly?
Is there any way in Wildfly to explicitly specify the MBean server class at startup, as was done in JBOSS 4.0.3SP1 using "-Djavax.management.builder.initial=org.jboss.system.server.jmx.MBeanServerBuilderImpl -Djboss.platform.mbeanserver -Dcom.sun.management.jmxremote"?
Here is complete stack trace:
15:01:30,324 ERROR [com.xxx.coreservice.lifecycle.jmx.PEController] (MSC service thread 1-6) Exception in PEController:PEControllerJmxJBossService start for peID 14132c58609742440160974fe69307e0: : javax.management.ReflectionException: The MBean class could not be loaded by the default loader repository
at com.sun.jmx.mbeanserver.MBeanInstantiator.findClassWithDefaultLoaderRepository(MBeanInstantiator.java:104)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.createMBean(DefaultMBeanServerInterceptor.java:268)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.createMBean(DefaultMBeanServerInterceptor.java:206)
at com.sun.jmx.mbeanserver.JmxMBeanServer.createMBean(JmxMBeanServer.java:326)
at com.avaya.lifecycle.jmx.agent.PEControllerJmx._start(PEControllerJmx.java:55)
at com.xxx.coreservice.lifecycle.jmx.PEController.start(PEController.java:397)
at com.xxx.coreservice.lifecycle.jmx.PEController.startAll(PEController.java:314)
at com.xxx.coreservice.lifecycle.jmx.PEController.startup(PEController.java:238)
at com.xxx.coreservice.lifecycle.jmx.PEController.postRegister(PEController.java:700)
at com.sun.jmx.mbeanserver.MBeanSupport.postRegister(MBeanSupport.java:182)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.postRegister(DefaultMBeanServerInterceptor.java:1024)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:974)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.registerMBean(PluggableMBeanServerImpl.java:1527)
at org.jboss.as.jmx.PluggableMBeanServerImpl.registerMBean(PluggableMBeanServerImpl.java:871)
at org.jboss.as.jmx.MBeanRegistrationService.start(MBeanRegistrationService.java:101)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1948)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1881)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.xxx.ccr.common.adapter.PEAdapterLCM
at com.sun.jmx.mbeanserver.ClassLoaderRepositorySupport.loadClass(ClassLoaderRepositorySupport.java:232)
Thanks for help in advance.
I followed the Apache Hadoop installation guides and was able to install it along with Pig. Both are working fine.
Following is the configuration:
Hadoop: 2.7.2
Hive: 2.1.0
Machine: Ubuntu 14.04 LTS 64-bit
Java: Version 9
Now I tried to install Apache Hive 2.1.0 according to this link [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingfromaTarball].
... and started a test run of the Hive CLI, but every time it throws the following error and exits.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ClassCastException: jdk.internal.loader.ClassLoaders$AppClassLoader (in module: java.base) cannot be cast to java.net.URLClassLoader (in module: java.base)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:374)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:350)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:663)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:533)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But there is a catch: if I invoke the Beeline CLI, it works fine.
Could you please help with the following:
a. Are the Beeline CLI and the Hive CLI the same, or is there a specific difference?
b. How do I install/configure Hive on my machine?
A: Beeline CLI vs Hive CLI: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_dataintegration/content/beeline-vs-hive-cli.html
B: According to:
http://openjdk.java.net/projects/jigsaw/talks/prepare-for-jdk9-j1-2015.pdf
in Java 9 the application class loader is no longer a java.net.URLClassLoader, so the cast to URLClassLoader in Hive's SessionState fails.
However, I was able to solve the issue by pointing Hive to JDK 8.
** I have only begun using Hive/Hadoop... Perhaps someone could provide a better explanation or a workaround so that we are able to use JDK 9...
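Since the workaround depends on which JDK Hive is started with, a launcher script could check the major Java version first. The function below is a hypothetical helper, not part of Hive; it only parses a version string such as the one printed by java -version, covering the legacy "1.x" scheme and the scheme used from Java 9 onward.

```python
import re


def java_major_version(version: str) -> int:
    """Map a Java version string to its major version number.

    Handles the legacy "1.x" scheme ("1.8.0_181" is Java 8) and the
    scheme introduced with Java 9 ("9-ea" is Java 9, "11.0.2" is Java 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        # Legacy scheme: the major version is the second component.
        return int(second)
    return int(first)


print(java_major_version("1.8.0_181"))  # 8
print(java_major_version("9-ea"))       # 9
```

A wrapper could refuse to start the Hive 2.1.0 CLI when this returns 9 or higher, given the ClassCastException shown above.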
I am trying to run Flume and I am getting a NullPointerException:
.jar:/usr/local/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/local/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/local/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/home/training/Downloads/hive-0.10.0/lib/*'
-Djava.library.path=:/usr/local/hadoop/libexec/../lib/native/Linux-i386-32
org.apache.flume.node.Application --name agent
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/training/Downloads/apache-flume-1.6.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/training/Downloads/hive-0.10.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
15/10/13 05:47:28 ERROR node.Application: A fatal error occurred while running. Exception follows.
java.lang.NullPointerException
at java.io.File.<init>(File.java:251)
at org.apache.flume.node.Application.main(Application.java:302)
The command I used to start flume is as follows:
flume:
./flume-ng agent --conf /home/training/Downloads/apache-flume-1.6.0-bin/conf/flume-conf.properties.template --name agent
The flume config file is as follows:
agent.sources=seqGenSrc agent.channels=memoryChannel
agent.sinks=loggerSink
agent.sources.seqGenSrc.type=exec agent.sources.seqGenSrc.command=tail
-F /home/training/Desktop/log.txt agent.sources.seqGenSrc.channels=memoryChannel
agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.channel=memoryChannel
agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.capacity=100
agent.sinks.loggerSink.type=hdfs
agent.sinks.loggerSink.hdfs.path=hdfs://localhost:54310/user/training/logs
agent.sinks.loggerSink.hdfs.fileType=DataStream
Could you please let me know what I am missing.
Thanks in advance for your response.
Two things I notice:
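In your command you use the --conf argument, but you omit the --conf-file argument. The NullPointerException thrown from the java.io.File constructor in Application.main is consistent with the config file path being null. For example, I start flume like this: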
./bin/flume-ng agent -Dflume.root.logger=INFO,console --conf ./conf --conf-file ./conf/test.conf --name agent
In your config you first set the type of loggerSink to logger and then you set it to hdfs. I don't think setting the type of the sink twice is what you want (or what Flume wants).
btw, I reformatted your config:
agent.sources=seqGenSrc
agent.channels=memoryChannel
agent.sinks=loggerSink
agent.sources.seqGenSrc.type=exec
agent.sources.seqGenSrc.command=tail -F /home/training/Desktop/log.txt
agent.sources.seqGenSrc.channels=memoryChannel
agent.sinks.loggerSink.type=logger
agent.sinks.loggerSink.channel=memoryChannel
agent.channels.memoryChannel.type=memory
agent.channels.memoryChannel.capacity=100
agent.sinks.loggerSink.type=hdfs
agent.sinks.loggerSink.hdfs.path=hdfs://localhost:54310/user/training/logs
agent.sinks.loggerSink.hdfs.fileType=DataStream