Hive2 with Tez gives Execution Error - hive

I am using Hive2 with Tez. When I run a query, it gives the execution error shown below.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
ERROR [432a4475-d246-4596-ad4c-54de6fea86c8 main] exec.Task: Failed to execute tez graph.
java.lang.IllegalArgumentException: Can not create a Path from an empty string

You have to upload the Tez tarball to HDFS (e.g. /user/hadoop/tez) and also point tez.lib.uris in tez-site.xml (tez/conf/tez-site.xml) at that path.
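A minimal sketch of that setup, assuming the tarball produced by the Tez build is named tez-0.x.x.tar.gz and sits in the current directory (adjust names and paths to your installation):

hdfs dfs -mkdir -p /user/hadoop/tez
hdfs dfs -put tez-0.x.x.tar.gz /user/hadoop/tez/

Then point tez.lib.uris at it in tez/conf/tez-site.xml:

<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/user/hadoop/tez/tez-0.x.x.tar.gz</value>
</property>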


Error running tez in hive. Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Hadoop 3.3.5
Hive 3.1.3
Tez 0.10.2
I followed the instructions at this link to build Tez 0.10.2 for Hadoop 3.3.5: https://tez.apache.org/install.html
The database is stored in an S3 bucket, and I am able to run 'select count(*) from m1.t1' with hive.execution.engine=mr.
When I set hive.execution.engine=tez and run the same query, I get this error immediately:
2023-02-15T21:21:09,208 INFO [a6e2cd1a-b2c9-42d8-9568-8e0b64677f77 main] client.TezClient: App did not succeed. Diagnostics: Application application_1676506240754_0019 failed 2 times due to AM Container for appattempt_1676506240754_0019_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2023-02-15 21:21:08.730]Exception from container-launch.
Container id: container_1676506240754_0019_02_000001
Exit code: 1
[2023-02-15 21:21:08.732]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
[2023-02-15 21:21:08.733]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
If I set tez.use.cluster.hadoop-libs to true in tez-site.xml, the job gets as far as running on YARN but fails with an error loading AWS credentials, even though I have set the fs.s3a credentials in Hadoop's core-site.xml, Hive's hive-site.xml, and as environment variables in .bashrc.
The keys are masked here to show a sample only:
echo $AWS_ACCESS_KEY_ID
I9U996400005XXXXXXXX
echo $AWS_SECRET_KEY
mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX
hive> set hive.execution.engine=tez;
hive> select count(*) from m1.t1;
Query ID = hdp-user_20230215210146_62ed9fab-5d4a-42a9-bf54-5fb6f84a9048
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1676506240754_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container INITIALIZING -1 0 0 -1 0 0
Reducer 2 container INITED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 2.03 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1676506240754_0015_3_00, diagnostics=[Vertex vertex_1676506240754_0015_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1676506240754_0015_3_00 [Map 1], java.nio.file.AccessDeniedException: s3a://hadoop-cluster/warehouse/tablespace/managed/hive/m1.db/t1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
I tried adding all of the fs.s3a properties from core-site.xml to tez-site.xml, and also setting fs.s3a.access.key and fs.s3a.secret.key inside the Hive session, but I still get the same error.
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
Question: according to the Tez install instructions,
Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml, or if it is set, the value should be false
But when it is set to false, Tez cannot run at all.
When it is set to true, I get the AWS credential error even though I have set the credentials in every possible location and environment variable.
==========================================================
Update:
Not sure if this is the right answer to this problem, but I finally got it working by adding this property to hive-site.xml:
<property>
<name>hive.conf.hidden.list</name>
<value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
</property>
By default, all fs.s3a credential properties are treated as hidden configuration even if you don't set this property explicitly. I explicitly added the property and removed everything fs.s3a credential-related from its value.
Now I can run select count(*) with Tez.
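For reference, a minimal sketch of the credential settings that finally took effect once the fs.s3a entries were removed from hive.conf.hidden.list (values are the masked samples from above; presumably they were previously being stripped from the configuration Hive hands to Tez):

<!-- core-site.xml -->
<property>
  <name>fs.s3a.access.key</name>
  <value>I9U996400005XXXXXXXX</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX</value>
</property>

or equivalently inside the Hive session:

hive> set fs.s3a.access.key=I9U996400005XXXXXXXX;
hive> set fs.s3a.secret.key=mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX;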

the file is not owned by hive and load data is also not ran as hive

I use HDP 3.1 and deployed the Hadoop cluster and Hive with Ambari. I want to use only one user (hdfs) to run all of the programs (Hadoop, Hive, Sqoop, YARN, ...), so I changed all of the users to hdfs in the ACCOUNTS step when deploying the cluster in Ambari. After deployment, I ran Sqoop to import data from MySQL into Hive and got the following error.
19/02/20 18:44:13 INFO hive.HiveImport: ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Load Data failed for hdfs://datacenter1:8020/user/hdfs/person/part-m-00000 as the file is not owned by hive and load data is also not ran as hive
19/02/20 18:44:13 INFO hive.HiveImport: INFO : Completed executing command(queryId=hdfs_20190220184412_d61d8591-04fc-41a7-b412-d64935ddd046); Time taken: 0.235 seconds
19/02/20 18:44:13 INFO hive.HiveImport: Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Load Data failed for hdfs://datacenter1:8020/user/hdfs/person/part-m-00000 as the file is not owned by hive and load data is also not ran as hive (state=08S01,code=1)
19/02/20 18:44:13 INFO hive.HiveImport: Closing: 0: jdbc:hive2://datacenter2:2181,datacenter1:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/20 18:44:13 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive exited with status 2
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:299)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:234)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:558)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:656)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
This issue happened in the reduce step. I don't know why it needs the hive user. Does anyone know how to resolve it?
Change the Hive config in hive-site.xml:
Change the value of hive.load.data.owner from hive to the user that owns and loads the data files (nifi in my case; hdfs in the setup above).
Restart all Hive services and check again.
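A minimal hive-site.xml sketch of that change, assuming hdfs is the user that owns and loads the files (swap in whichever user applies in your cluster):

<property>
  <name>hive.load.data.owner</name>
  <value>hdfs</value>
</property>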
Duonghb

Yarn Launch Container Failed with Privilege Issue

Stack: Ambari 2.4.2.0, HDP 2.5.3.0, CentOS 6.8, FreeIPA 3.0.0
When I try to submit a job on YARN as the hdp user, the _000001 container is created and launched successfully, but I get an error when the _000002 container is launched after being created:
2018-11-27 22:13:35,919 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(170)) - Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 1
main : run as user is hdp
main : requested yarn user is hdp
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[/usr/hdp/current/hadoop-yarn-nodemanager/bin/container-executor, hdp, hdp, 1, application_1543327888220_0001, container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/usercache/hdp/appcache/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/launch_container.sh, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.tokens, /hadoop/yarn/local/nmPrivate/application_1543327888220_0001/container_e14_1543327888220_0001_01_000002/container_e14_1543327888220_0001_01_000002.pid, /hadoop/yarn/local, /hadoop/yarn/log, cgroups=none]
2018-11-27 22:13:35,921 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
There are no further logs about the privilege failure. Does anybody have any ideas?
Thanks in advance!
Problem resolved: the issue was the submitted job itself, not YARN or its privilege handling.
My suggestion is to look for the details in the container logs rather than in the ResourceManager/NodeManager logs.
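For example, a rough sketch of pulling the container logs for the failed application shown above (IDs taken from the output; adjust to your own run):

yarn logs -applicationId application_1543327888220_0001

or inspect the stderr/syslog files for container_e14_1543327888220_0001_01_000002 under the NodeManager log directory (/hadoop/yarn/log in the command array above).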

Failed to schematool -initSchema -dbType derby

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Failed to create database 'metastore_db', see the next exception for details.
SQL Error code: 40000
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
FYI, please check the permissions on the Hive installation directory.
The Hive installation directory should be owned by the same user that runs Hadoop.
That's how it worked for me.
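A minimal sketch of that fix, assuming Hive is installed under /usr/local/hive and Hadoop runs as hduser (both are assumptions; substitute your own path and user):

sudo chown -R hduser:hduser /usr/local/hive
schematool -initSchema -dbType derby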

Pig Hcatalog error

I am trying to run a Pig script in local mode on a single-node cluster, as shown below.
hduser@ubuntu:~$ pig -x local -f "/home/hduser/ddsoft/pigscript/FirstUDF.pig"
But I am getting the error below.
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file
'/home/hduser/ddsoft/hive-0.13.1-bin/hcatalog/share/hcatalog/hcatalog-core-0.13.1.jar'
does not exist.
How do I register the jar file mentioned in the error message? I tried updating .bashrc, but it didn't fix the error.
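One thing to try (a sketch; the path in the error may simply point to a location where the jar was never installed): find where hcatalog-core actually lives, then either REGISTER that jar at the top of the Pig script or add it to PIG_CLASSPATH before invoking pig.

find /home/hduser/ddsoft/hive-0.13.1-bin -name 'hcatalog-core*.jar'

-- in FirstUDF.pig, pointing at whatever path the find command reports
REGISTER '/actual/path/to/hcatalog-core-0.13.1.jar';

or

export PIG_CLASSPATH=/actual/path/to/hcatalog-core-0.13.1.jar:$PIG_CLASSPATH
pig -x local -f /home/hduser/ddsoft/pigscript/FirstUDF.pig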