Error running Tez in Hive: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Hadoop 3.3.5
Hive 3.1.3
Tez 0.10.2
I followed the instructions at https://tez.apache.org/install.html to build Tez 0.10.2 for Hadoop 3.3.5.
The database is stored in an S3 bucket, and I am able to run 'select count(*) from m1.t1' with hive.execution.engine=mr.
When I set hive.execution.engine=tez and run the same query, I get this error immediately:
2023-02-15T21:21:09,208 INFO [a6e2cd1a-b2c9-42d8-9568-8e0b64677f77 main] client.TezClient: App did not succeed. Diagnostics: Application application_1676506240754_0019 failed 2 times due to AM Container for appattempt_1676506240754_0019_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2023-02-15 21:21:08.730]Exception from container-launch.
Container id: container_1676506240754_0019_02_000001
Exit code: 1
[2023-02-15 21:21:08.732]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
[2023-02-15 21:21:08.733]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
If I set tez.use.cluster.hadoop-libs to true in tez-site.xml, the job starts running on YARN but then fails with an AWS credentials error, even though I have set the fs.s3a credentials in Hadoop's core-site.xml, in Hive's hive-site.xml, and as environment variables in .bashrc.
The keys below are masked and shown as samples only:
echo $AWS_ACCESS_KEY_ID
I9U996400005XXXXXXXX
echo $AWS_SECRET_KEY
mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX
hive> set hive.execution.engine=tez;
hive> select count(*) from m1.t1;
Query ID = hdp-user_20230215210146_62ed9fab-5d4a-42a9-bf54-5fb6f84a9048
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1676506240754_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container INITIALIZING -1 0 0 -1 0 0
Reducer 2 container INITED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 2.03 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1676506240754_0015_3_00, diagnostics=[Vertex vertex_1676506240754_0015_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1676506240754_0015_3_00 [Map 1], java.nio.file.AccessDeniedException: s3a://hadoop-cluster/warehouse/tablespace/managed/hive/m1.db/t1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
I tried adding all the fs.s3a properties from core-site.xml to tez-site.xml, and also setting fs.s3a.access.key and fs.s3a.secret.key inside the Hive session, but I still get the same error:
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
Question: according to the Tez install instructions:
Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml, or if it is set, the value should be false
But when it is set to false, Tez cannot run.
When it is set to true, I get the AWS credentials error even though I have set the credentials in every possible config location and in environment variables.
==========================================================
Update:
I am not sure this is the right answer to the problem, but I finally got it working by adding this property to hive-site.xml:
<property>
<name>hive.conf.hidden.list</name>
<value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
</property>
By default, the fs.s3a credential properties are treated as hidden configs even if you don't set this property. I set the property explicitly and removed the fs.s3a credential entries from its value.
Now I can run select count(*) with Tez.
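As a rough illustration of the edit described above, this shell sketch removes chosen keys from a comma-separated hidden-list value. The function name strip_hidden_keys and the sample input list are hypothetical; check the actual default of hive.conf.hidden.list for your Hive version before building the new value from it.

```shell
#!/usr/bin/env bash
# Sketch only: strip_hidden_keys is a hypothetical helper, not part of Hive.
# Usage: strip_hidden_keys "<comma,separated,list>" key1 [key2 ...]
strip_hidden_keys() {
  local list="$1"; shift
  local result="" entry key skip
  local IFS=','                      # split the list on commas
  for entry in $list; do
    skip=0
    for key in "$@"; do
      [ "$entry" = "$key" ] && skip=1
    done
    if [ "$skip" -eq 0 ]; then
      result="${result:+$result,}$entry"   # re-join the kept entries with commas
    fi
  done
  printf '%s\n' "$result"
}

# Illustrative input (an assumption, not necessarily your Hive default).
sample="javax.jdo.option.ConnectionPassword,fs.s3a.access.key,fs.s3a.secret.key,fs.s3a.proxy.password"
strip_hidden_keys "$sample" fs.s3a.access.key fs.s3a.secret.key
```

The printed value can then be pasted into the <value> element of hive.conf.hidden.list. Keep in mind that un-hiding credential keys may make them visible to anyone who can read the session configuration.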

Related

Yarn Launch Container Failed with Privilege Issue

Stack: Ambari 2.4.2.0, HDP 2.5.3.0, CentOS 6.8, FreeIPA 3.0.0
When I use the hdp user to submit a job to YARN, the _000001 container is created and launched successfully, but I get an error when the _000002 container is launched after being created:
2018-11-27 22:13:35,919 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(170)) - Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 1
main : run as user is hdp
main : requested yarn user is hdp
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[/usr/hdp/current/hadoop-yarn-nodemanager/bin/container-executor, hdp, hdp, 1, application_1543327888220_0001, container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/usercache/hdp/appcache/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/launch_container.sh, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.tokens, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid, /hadoop/yarn/local, /hadoop/yarn/log, cgroups=none]
2018-11-27 22:13:35,921 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
There is nothing more in the log about privileges. Does anybody have an idea?
Thanks in advance!
Problem resolved: the problem was in the submitted job itself, not in YARN or privileges.
My suggestion is to look for details in the container log rather than in the ResourceManager/NodeManager logs.
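One way to get at the container logs is the yarn logs command, which takes an application ID. The sketch below derives that ID from a container ID like the ones in the log above. The helper name app_id_from_container is hypothetical, and it assumes the container ID layout seen in this thread (an optional epoch field such as e14, then cluster timestamp, application sequence, attempt, and container sequence).

```shell
#!/usr/bin/env bash
# Sketch only: app_id_from_container is a hypothetical helper.
app_id_from_container() {
  local cid="$1"
  # Drop the "container_" prefix.
  local rest="${cid#container_}"
  # Drop an optional epoch field such as "e14_".
  case "$rest" in
    e[0-9]*_*) rest="${rest#e*_}" ;;
  esac
  # rest is now <clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
  local ts="${rest%%_*}"
  local tail="${rest#*_}"
  local app_seq="${tail%%_*}"
  printf 'application_%s_%s\n' "$ts" "$app_seq"
}

# Print (not run) the command that would fetch the aggregated logs.
echo "yarn logs -applicationId $(app_id_from_container container_e14_1543327888220_0001_01_000002)"
```

Note that yarn logs generally relies on log aggregation being enabled; otherwise the container logs stay in the NodeManager's local log directory.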

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS.
I started HiveServer2 from the Ambari UI and checked the contents of /var/log/hive/hiveserver2.log.
Below is the error log.
Any help would be appreciated.
Contents of hiveserver2.log
2018-03-08 04:41:53,345 WARN [main-EventThread]: server.HiveServer2 (HiveServer2.java:process(343)) - This instance of HiveServer2 has been removed from the list of server instances available for dynamic service discovery. The last client session has ended - will shutdown now.
2018-03-08 04:41:53,347 INFO [main]: zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x16203aad5af0040 closed
2018-03-08 04:41:53,347 INFO [main]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:stop(405)) - Shutting down HiveServer2
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-03-08 04:41:53,348 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(508)) - Error starting HiveServer2 on attempt 1, will retry in 60 seconds
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1520480101488_0046 failed 2 times due to AM Container for appattempt_1520480101488_0046_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-91-7.ap-northeast-2.compute.internal:8088/cluster/app/application_1520480101488_0046 Then click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2: tar: Removing leading `/' from member names
tar: Skipping to next header
gzip: /hadoopfs/fs1/yarn/nodemanager/filecache/60_tmp/tmp_tez.tar.gz: invalid compressed data--format violated
tar: Exiting with failure status due to previous errors
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:76)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:488)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:87)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:720)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I had exactly the same issue with HDP on AWS. In my case the issue was with HDP version 2.6.4.5-2, so I'm going to show how I fixed it using this version, which is the latest at this time.
As the error log shows, the problem is that tez.tar.gz is corrupted, so YARN is unable to decompress it in the YARN container.
This tez.tar.gz file is copied from hdfs:///hdp/apps/<hdp_version>/tez/tez.tar.gz.
To reproduce the error and confirm that this file is corrupted, you can run the following commands:
sudo su
su hdfs
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -xvzf tez.tar.gz
You will get the following error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The fix is pretty simple: replace the HDFS file with the copy on your local file system by running the following commands:
hdfs dfs -rm /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
Now restart the HiveServer2 service and you are done.
NOTE: If something similar happens with other services you can do the same thing. Please check the following link that has more details: https://community.hortonworks.com/articles/30096/foxing-broken-targz-and-jar-files-in-hdp-24.html
Hope this helps!
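Before uploading the local copy, it may be worth confirming that it is itself intact. A minimal sketch, assuming gzip and tar are available on the node; the function name is_valid_targz is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: is_valid_targz is a hypothetical helper.
# Returns 0 if the file is a readable gzip-compressed tar archive, non-zero otherwise.
is_valid_targz() {
  local archive="$1"
  [ -f "$archive" ] || return 1
  # gzip -t checks the compressed stream without extracting anything.
  gzip -t "$archive" 2>/dev/null || return 1
  # tar -tzf lists the members, which also reads through the tar structure.
  tar -tzf "$archive" >/dev/null 2>&1
}

# Example use with the local path from the answer above (commented out here):
# if is_valid_targz /usr/hdp/current/tez-client/lib/tez.tar.gz; then
#   echo "local tez.tar.gz looks intact"
# else
#   echo "local tez.tar.gz is missing or corrupted" >&2
# fi
```

The same check can be run on the file fetched with hdfs dfs -get to confirm that the HDFS copy is the corrupted one.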

Hive2 with Tez gives Execution Error

I am using Hive2 with Tez. When I run a query, it fails with the execution error shown below.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
ERROR [432a4475-d246-4596-ad4c-54de6fea86c8 main] exec.Task: Failed to execute tez graph.
java.lang.IllegalArgumentException: Can not create a Path from an empty string
You have to put the Tez tarball into your HDFS (for example under /user/hadoop/tez) and also set tez.lib.uris to that path in tez-site.xml (tez/conf/tez-site.xml).
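For illustration, the small shell sketch below prints a tez-site.xml property block for a given HDFS path. The helper name tez_lib_uris_property and the tarball name tez.tar.gz are assumptions; use whatever file name you actually uploaded.

```shell
#!/usr/bin/env bash
# Sketch only: tez_lib_uris_property is a hypothetical helper.
# Prints a tez.lib.uris property block pointing at the given HDFS path.
tez_lib_uris_property() {
  local hdfs_path="$1"
  # '${fs.defaultFS}' is single-quoted so the shell leaves it for Hadoop to resolve.
  printf '<property>\n  <name>tez.lib.uris</name>\n  <value>%s%s</value>\n</property>\n' \
    '${fs.defaultFS}' "$hdfs_path"
}

# Assumed location and file name, following the directory named in the answer.
tez_lib_uris_property /user/hadoop/tez/tez.tar.gz
```

The printed block goes inside the <configuration> element of tez-site.xml, and the tarball needs to exist at that HDFS path before the query is run.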

Apache hadoop Installation on Windows 10

While setting up a single-node cluster without Cygwin on Windows 10, I followed this document: Link for Hadoop installation in windows 10
I get the error below when starting HDFS with D:\hadoop-2.6.2.tar\hadoop-2.6.2\hadoop-2.6.2\sbin>start-dfs.cmd
Error message stack trace:
17/01/12 12:25:42 FATAL datanode.DataNode: Exception in secureMain java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:620)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462)
17/01/12 12:25:42 INFO util.ExitUtil: Exiting with status 1
I also get this error message when the NameNode starts:
17/01/12 12:25:43 FATAL namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
17/01/12 12:25:43 INFO util.ExitUtil: Exiting with status 1
[Problem analysis] The permissions on the /data directory are insufficient, so the NameNode cannot be started.
[Solution]
(1) As root, grant the hadoop user permissions on the /data directory;
(2) Empty the /data directory;
(3) Reformat the NameNode and restart the Hadoop cluster.

Need your help to troubleshoot an Aerospike restore issue

I am new to Aerospike and need help troubleshooting a restore issue. I have Aerospike running on my Mac and it seems to work fine, except that it does not allow me to restore from an .asb file. I took the backup from an Aerospike instance running on an Ubuntu machine using the asbackup utility. But when I try to restore the .asb file with the asrestore command on my Mac instance, it fails with the following error:
asrestore -d ~
restoring: host 127.0.0.1 port 3000 bin_list (null) from directory /home/vagrant
2015-08-25 15:13:43 INFO Add node BB9A9EAAB270008 127.0.0.1:3000
Aug 25 2015 15:13:43 GMT: starting restore: filename: /home/vagrant/BB9A3F5AA1ED512_00000.asb FILE 0x7f63680008c0
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
put failed in restore: unusual error 20 trying again
restore: too many consecutive put failure
Aug 25 2015 15:13:44 GMT: expired 0 : skipped 0 : attempted 0 : [updated 0 not-updated (existed 0 gen-old 0)]
I tried using the -t option to restrict the thread count, but it made no difference.
Has anyone faced a similar issue?
Looking forward to your help.
Error 20 indicates a bad namespace parameter. Check your server error log for more details. It seems that the namespace recorded in the backup file is not defined in the configuration of the cluster you are trying to load into with asrestore.
There are two options:
Create a namespace with the same name as the namespace in the backup file.
Write a script to change the namespace name in the backup files to a name that is valid in the cluster you are trying to load into.
The backup file format is documented at http://www.aerospike.com/docs/tools/backup/file_format.html