I am trying to write a Spark DataFrame to Elasticsearch as follows:
df.write.format("es").save("db/test")
Unfortunately, I receive the following error:
Py4JJavaError: An error occurred while calling o50.save.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0
in stage 3.0 (TID 8, localhost, executor driver):
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data
nodes with HTTP-enabled available; node discovery is disabled and none
of nodes specified fit the criterion [XX.XXX.XX.X:XXXX]
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:152)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:549)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am using Spark 2.1.0, Scala 2.11, and Elasticsearch 2.4.5 from a Jupyter notebook, which I start with the following command:
sudo PYSPARK_DRIVER_PYTHON=jupyter-notebook $SPARK_HOME/bin/pyspark --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.3.1 --conf spark.es.nodes="52.XXX.XX.XX" --conf spark.es.port="XXX" --conf spark.es.nodes.discovery=false --conf spark.es.net.http.auth.user="user" --conf spark.es.net.http.auth.pass="password"
With spark.es.nodes.discovery=true, I receive a different error:
Py4JJavaError: An error occurred while calling o50.save.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 3.0
in stage 3.0 (TID 11, localhost, executor driver):
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection
error (check network and/or proxy settings)- all nodes failed; tried
[[127.0.0.1:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:469)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:537)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:543)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:412)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:580)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:568)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Could anybody help?
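One thing that may be worth checking is how the connector settings reach the writer. The sketch below collects them in a plain dictionary and passes them to the DataFrame writer directly instead of through --conf flags. The helper names build_es_options and write_to_es are hypothetical, and enabling es.nodes.wan.only is an assumption: elasticsearch-hadoop documents that option for clusters that are only reachable through a single public address, but whether it applies here depends on the setup. Host, port, and credentials are the placeholders from the question.

```python
def build_es_options(nodes, port, user, password, wan_only=True):
    """Collect elasticsearch-hadoop settings as string key/value pairs."""
    return {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        # Assumption: the cluster is reached through one public address, so
        # the connector should only talk to the node(s) listed in es.nodes.
        "es.nodes.wan.only": "true" if wan_only else "false",
    }


def write_to_es(df, resource, options):
    """Write a Spark DataFrame to the given index/type resource."""
    (df.write
       .format("org.elasticsearch.spark.sql")  # fully qualified data source name
       .options(**options)
       .save(resource))
```

In the notebook this would be called as write_to_es(df, "db/test", build_es_options("52.XXX.XX.XX", "XXX", "user", "password")). If the error stays the same with these options, the problem is more likely in the network path or the credentials than in the way the settings are passed.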
I am using Flink 1.6.1 and Hadoop 2.7.5. First I start a Flink YARN session:
bin/yarn-session.sh -n 2 -jm 1024 -tm 1024 -d
Then I submit a job:
./bin/flink run ./examples/batch/WordCount.jar -input hdfs://CS-201:9000/LICENSE -output hdfs://CS-201:9000/wordcount-result.txt
I got an error:
[root@CS-201 flink-1.6.1]# ./bin/flink run ./examples/batch/WordCount.jar -input hdfs://CS-201:9000/LICENSE -output hdfs://CS-201:9000/wordcount-result.txt
2019-05-19 15:31:11,357 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
2019-05-19 15:31:11,357 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
2019-05-19 15:31:11,737 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 2
2019-05-19 15:31:11,737 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 2
YARN properties set default parallelism to 2
2019-05-19 15:31:11,777 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at CS-201/192.168.1.201:8032
2019-05-19 15:31:11,887 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2019-05-19 15:31:11,887 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2019-05-19 15:31:11,891 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2019-05-19 15:31:11,979 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'cs-202' and port '52389' from supplied application id 'application_1558248666499_0003'
Starting execution of program
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. (JobID: 471f0c2d047aba74ea621c5bfe782cbf)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:260)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:474)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:426)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
... 5 more
Why does this happen, and how can I fix it?
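The WARN line in the output says that neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set in the shell that runs the client. The log does not establish that this is what makes the submission fail, but it is cheap to rule out first. A minimal sketch of that check follows; the function name missing_hadoop_conf is hypothetical.

```python
import os


def missing_hadoop_conf(env=None):
    """Return True if neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return not any(env.get(name) for name in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"))
```

If this returns True in the shell used for ./bin/flink run, exporting one of the two variables so that it points at the directory holding the cluster's Hadoop configuration files is the step the warning itself asks for.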
I am trying to upgrade our job from Flink 1.4.2 to 1.7.1, but I keep running into timeouts after submitting the job. The Flink job runs on our Hadoop cluster (version 2.7) with YARN.
I've seen the following behavior:
Using the same flink-conf.yaml as we used in 1.4.2: versions 1.5.6, 1.6.3, and 1.7.1 all time out, while 1.4.2 works.
Using 1.5.6 with "mode: legacy" (to switch off FLIP-6) works.
Using 1.7.1 with "mode: legacy" gives a timeout (I assume this option was removed but the documentation is outdated? https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#legacy)
When the timeout happens I get the following stack trace:
INFO class java.time.Instant does not contain a getter for field seconds
INFO class com.bol.fin_hdp.cm1.domain.Cm1Transportable does not contain a getter for field globalId
INFO Submitting job 5af931bcef395a78b5af2b97e92dcffe (detached: false).
INFO ------------------------------------------------------------
INFO The program finished with the following exception:
INFO org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
INFO at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
INFO at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
INFO at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
INFO at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:798)
INFO at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:289)
INFO at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
INFO at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)
INFO at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)
INFO at java.security.AccessController.doPrivileged(Native Method)
INFO at javax.security.auth.Subject.doAs(Subject.java:422)
INFO at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
INFO at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
INFO at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)
INFO Caused by: java.lang.RuntimeException: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:43)
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJobWithConfig(IntervalJobStarter.java:32)
INFO at com.bol.fin_hdp.Main.main(Main.java:8)
INFO at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
INFO at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
INFO at java.lang.reflect.Method.invoke(Method.java:498)
INFO at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
INFO ... 12 more
INFO Caused by: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
INFO at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
INFO at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
INFO at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
INFO at com.bol.fin_hdp.cm1.job.Job.execute(Job.java:54)
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:41)
INFO ... 19 more
INFO Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
INFO at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)
INFO at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
INFO at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
INFO at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
INFO at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
INFO at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:216)
INFO at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
INFO at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
INFO at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
INFO at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
INFO at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:301)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
INFO at java.lang.Thread.run(Thread.java:748)
INFO Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
INFO at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
INFO ... 17 more
INFO Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-01...
INFO at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
INFO at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
INFO at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
INFO at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
INFO ... 15 more
INFO Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-017.example.com/some.ip.address:46500
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)
INFO ... 7 more
What changed in FLIP-6 that might cause this behavior, and how can I fix it?
For our jobs on YARN with Flink 1.6, we had to increase the web.timeout setting via -yD web.timeout=100000.
In our case, there was a firewall between the machine submitting the job and our Hadoop cluster.
In newer Flink versions (1.7 and up), Flink uses REST to submit jobs. On YARN setups the port for this REST service was chosen at random and could not be configured.
Flink 1.8.0 introduced a config option to set this to a port or port range using:
rest.bind-port: 55520-55530
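To confirm that a firewall is what blocks the submission, it can help to probe the configured range from the machine that submits the job. The sketch below is only an illustration: the helper names parse_port_range and reachable_ports are made up here, and the range is the one from the setting above. It assumes plain TCP reachability is a good enough proxy for the REST endpoint being usable.

```python
import socket


def parse_port_range(spec):
    """Turn "55520-55530" (or a single port such as "8081") into a list of ints."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    return list(range(start, end + 1))


def reachable_ports(host, ports, timeout=2.0):
    """Return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # refused, filtered by a firewall, or timed out
    return open_ports
```

A port only shows up as reachable while something is listening on it, so the probe is meaningful when run against the JobManager host while the YARN session is up. An empty result in that situation points at the network path rather than at Flink.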
Stack: Ambari 2.4.2.0, HDP 2.5.3.0, CentOS 6.8, FreeIPA 3.0.0
When I submit a job on YARN as the hdp user, the _000001 container is created and launched successfully, but I get an error when the _000002 container is launched after being created:
2018-11-27 22:13:35,919 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(170)) - Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 1
main : run as user is hdp
main : requested yarn user is hdp
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[/usr/hdp/current/hadoop-yarn-nodemanager/bin/container-executor, hdp, hdp, 1, application_1543327888220_0001, container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/usercache/hdp/appcache/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/launch_container.sh, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.tokens, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid, /hadoop/yarn/local, /hadoop/yarn/log, cgroups=none]
2018-11-27 22:13:35,921 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
There is nothing more in the log about the privileged operation. Does anybody have an idea?
Thanks in advance!
Problem resolved: the cause was the submitted job itself, not YARN or privileges.
My suggestion is to look for details in the container log rather than in the ResourceManager/NodeManager logs.
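For anyone who only has the container ID from the NodeManager log, the application ID needed to fetch the container logs can be derived from it. The sketch below assumes the usual container ID layout (optional epoch, cluster timestamp, application sequence, attempt, container number) and that log aggregation is enabled so that yarn logs can return anything. The helper names are hypothetical.

```python
import re

# container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerNo>
_CONTAINER_ID = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id(container_id):
    """Derive the YARN application ID from a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    cluster_ts, app_seq = match.groups()
    return f"application_{cluster_ts}_{app_seq}"


def yarn_logs_command(container_id):
    """Build the argument list for fetching the application's aggregated logs."""
    return ["yarn", "logs", "-applicationId", application_id(container_id)]
```

For the failing container above, yarn_logs_command("container_e14_1543327888220_0001_01_000002") yields a command for application_1543327888220_0001, whose output includes the stdout and stderr of each container.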
I have installed Apache Tez 0.8.1, Hadoop 2.7.0, and Hive 2.0.1. I can run MapReduce jobs successfully, but when I configured Hive and tried to run a simple count query, it returned the error below. From the error, it seems to be looking for a class from a jar; I have placed the jar on the classpath, but the error is still there.
Please help me resolve this. Thanks in advance!
hive> select count(*) from sample1;
Query ID = root_20160728215555_a58e91a6-8913-4a57-8715-bc1739a2cb02
Total jobs = 1
Launching Job 1 out of 1
----------------------------------------------------------------------------------------------
VERTICES      MODE        STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1         container   FAILED      -1          0        0       -1       0       0
Reducer 2     container   KILLED       1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%  ELAPSED TIME: 0.17 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1469720608711_0011_1_00, diagnostics=[Vertex vertex_1469720608711_0011_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:138)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:115)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4676)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4300(VertexImpl.java:204)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3445)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3394)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3375)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1975)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2090)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2076)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 20 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/split/SplitLocationProvider
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:96)
... 25 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.split.SplitLocationProvider
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1469720608711_0011_1_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1469720608711_0011_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
UPDATE:
After I hit the above issue, I copied hadoop-core-1.2.1.jar into the Hive lib folder. Now I get a different error when starting Hive. From the trace, it looks like an illegal argument is being passed.
Exception in thread "main" java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1550)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:516)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548)
... 14 more
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 1.2.1
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:165)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:132)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:93)
at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:376)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67)
The issue is with the Apache Tez version. Apache Tez 0.8.1 is not compatible with Hadoop 2.7.0 and Hive 2.0.1.
I downloaded the 0.8.4 source and built it, which resolved the issue.
Thanks!
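When a NoClassDefFoundError like the one above names a class, it can save time to check which jars in a lib folder actually ship that class before copying jars around. A rough sketch follows; the function name jars_containing is hypothetical, and it only looks at top-level *.jar files in one directory.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dir):
    """Return the names of jars in lib_dir that contain the given class."""
    # org.example.Foo is stored in a jar as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits
```

Running it with "org.apache.hadoop.mapred.split.SplitLocationProvider" against the Tez and Hive lib directories shows whether any installed jar provides the class. An empty result is consistent with a version mismatch rather than a classpath typo.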
I have a 4-node CentOS Hadoop cluster with Cloudera Manager 5.5.1 installed.
The HBase Master fails to start:
FATAL org.apache.hadoop.hbase.master.HMaster Unhandled exception. Starting shutdown.
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hbase, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Any thoughts?
Thanks
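The last part of the message already says why the write is refused: the inode "/" belongs to user hdfs and group supergroup with mode drwxr-xr-x, so anyone who is neither the owner nor in that group only gets r-x. The sketch below spells that reading out. It is a simplified POSIX-style check that ignores ACLs, sticky bits, and the HDFS superuser. The function name explain_denial is made up, and it assumes the hbase user is not a member of supergroup.

```python
import re

_INODE = re.compile(
    r'inode="(?P<path>[^"]*)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[d-][rwx-]{9})'
)


def explain_denial(message, user, user_groups=()):
    """Return (permission class, permission bits) that apply to user."""
    match = _INODE.search(message)
    if match is None:
        raise ValueError("no inode information found in message")
    bits = match["mode"][1:]  # drop the leading file-type character
    if user == match["owner"]:
        return "owner", bits[0:3]
    if match["group"] in user_groups:
        return "group", bits[3:6]
    return "other", bits[6:9]
```

For the message above, explain_denial(message, "hbase") gives ("other", "r-x"): there is no w in the applicable bits, so HBase cannot create its root directory under "/" as that user.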