Hive with Tez, No input paths specified in job - hive

I was using hadoop-0.20.x.x and hive-0.11.0. My question is about Hive queries: with that configuration everything worked fine.
Now we have upgraded to hadoop-2.6.x (Hadoop 2) and hive-0.14.x, and we are also using Apache Tez.
The problem is that Hadoop works as before, but Hive SQL queries don't.
The query below works fine in the older versions but throws errors in the upgraded versions:
QUERY : SELECT abc.property_name, xyz.date, xyz.time, xyz.value_as_number, xyz.value_units FROM dbname.xyz JOIN dbname.abc ON (xyz.id = abc.src_id) WHERE xyz.person_id=138312;
EXCEPTION:
INFO : Session is already open
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1435524970199_0035)
INFO : Map 1: -/- Map 2: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1435524970199_0035_1_00, diagnostics=[Vertex vertex_1435524970199_0035_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: concept initializer failed, vertex=vertex_1435524970199_0035_1_00 [Map 1], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : Vertex failed, vertexName=Map 2, vertexId=vertex_1435524970199_0035_1_01, diagnostics=[Vertex vertex_1435524970199_0035_1_01 [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: observation initializer failed, vertex=vertex_1435524970199_0035_1_01 [Map 2], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : DAG failed due to vertex failure. failedVertices:2 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)
The exception says that no input paths were specified. I understand this and know how to solve it in a Hadoop MapReduce program, but how do we do it with a Hive query? Anyway, I don't think this is the same thing.
To narrow it down, I ran the query in both the hive shell and the beeline shell: hive returned the expected output, but beeline returned the same exception as above.
Curiously, a query on either table individually works fine, but when I try the JOIN it throws the above exception.
I understand that Apache Tez is affecting my query. Can someone suggest a solution or point me to a Tez reference, so I can read it and rewrite the query accordingly? Thanks.

It worked after disabling Apache Tez, for example by setting hive.execution.engine=mr so the query runs on MapReduce instead.
Looks like Apache Tez isn't stable yet.

Related

Hive query throws "code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask" exception when query has GROUP BY clause

I have Hive + LLAP on HDP 3.1.4
Hive and Tez Config is:
yarn.nodemanager.resource.memory-mb = 40960
yarn.scheduler.minimum-allocation-mb = 1024
yarn.scheduler.maximum-allocation-mb = 40960
hive.tez.container.size = 4096
num_llap_nodes=4
hive.llap.daemon.num.executors=8
hive.llap.daemon.yarn.container.mb = 35840
llap_headroom_space=2048
llap_heap_size=32768
hive.llap.io.memory.size=1024
tez.am.resource.memory.mb=4096
hive.tez.java.opts=-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Xmx3276m
tez.runtime.io.sort.mb= 1638
tez.runtime.unordered.output.buffer.size-mb=409
The following query runs properly:
select count(*) from balance;
but when I use a GROUP BY expression, as in the following query:
select count(*),jobdate from balance group by jobdate;
I've tried many configurations, but this long exception is thrown:
ERROR: Error while processing statement: **FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.**
Vertex failed, vertexName=Map 1,
vertexId=vertex_1617520101397_0014_1_00, diagnostics=[Task
failed, taskId=task_1617520101397_0014_1_00_000013,
diagnostics=[TaskAttempt 0 failed, **info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration** at
**BLABLA**
at java.lang.Thread.run(Thread.java:748) ]], Task failed,
taskId=task_1617520101397_0014_1_00_000006,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration at
at java.lang.Thread.run(Thread.java:748) ]], Task failed,
taskId=task_1617520101397_0014_1_00_000005,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration at
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:111)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) ]], **Vertex did not
succeed due to OWN_TASK_FAILURE, failedTasks:9 killedTasks:31761,
Vertex vertex_1617520101397_0014_1_00 [Map 1] killed/failed due
to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2,
vertexId=vertex_1617520101397_0014_1_01, diagnostics=[Vertex
received Kill while in RUNNING state., Vertex did not succeed due
to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:18, Vertex
vertex_1617520101397_0014_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]DAG did not succeed due to
VERTEX_FAILURE. failedVertices:1 killedVertices:1 Error Code: 2**
There are two places to set hive.tez.container.size on the Ambari Hive Config page. One appears in the SETTINGS tab; the other, which relates to LLAP, is under Advanced hive-interactive-site in the ADVANCED tab. I had been changing the hive.tez.container.size value in the SETTINGS tab instead of the one in the Advanced hive-interactive-site section. Finally, I set the following configs and the error was resolved:
set hive.tez.container.size=10240;
set hive.tez.java.opts=-Xmx9216m;
set tez.runtime.io.sort.mb=3072;
set tez.runtime.unordered.output.buffer.size-mb=1024;
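The four values above keep a fixed proportion to the container size: the heap is 90% of it, the sort buffer 30%, and the unordered output buffer 10%. The Python sketch below only reproduces that arithmetic for other container sizes. The function names are hypothetical, and the fractions are assumptions read off the numbers in this answer, not an official Tez or Hive sizing rule.

```python
def tez_settings_for_container(container_mb):
    """Derive session settings from a container size in MB.

    The 90% / 30% / 10% proportions are assumptions taken from the
    values quoted in the answer above, not an official rule.
    """
    return {
        "hive.tez.container.size": container_mb,
        "hive.tez.java.opts": "-Xmx%dm" % (container_mb * 9 // 10),
        "tez.runtime.io.sort.mb": container_mb * 3 // 10,
        "tez.runtime.unordered.output.buffer.size-mb": container_mb // 10,
    }


def as_set_statements(settings):
    """Render the settings as Hive `set key=value;` lines."""
    return ["set %s=%s;" % (key, value) for key, value in settings.items()]


if __name__ == "__main__":
    # Prints the four `set` lines for a 10240 MB container.
    for line in as_set_statements(tez_settings_for_container(10240)):
        print(line)
```

Keeping the heap below the container size leaves room for non-heap JVM memory, which is why the -Xmx value is derived from the container size rather than set independently.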

Cannot populate a table in Hive from HBase

On HDP 3.1.x I created a table linked to HBase with the option STORED BY org.apache.hadoop.hive.hbase.HBaseStorageHandler.
When executing a select, it works fine.
When I try to populate a table from it, it crashes with the following error:
create table test as select * from hbase_xxx;
INFO : Completed executing command(queryId=hive_20210205161427_a49ca7bc-0637-4c19-9a62-6657376373a1); Time taken: 74.951 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1611574680060_3923_1_00, diagnostics=[Vertex vertex_1611574680060_3923_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: raw_eff_ann_ent initializer failed, vertex=vertex_1611574680060_3923_1_00 [Map 1],
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
Looking at the YARN logs, it appears that it tries to connect to ZooKeeper from a datanode at localhost:2181 and fails:
2021-02-05 11:22:41,921 [WARN] [ReadOnlyZKClient-localhost:2181#0x48730f2c] |zookeeper.ReadOnlyZKClient|: 0x48730f2c to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 1
The same log for a select shows the zookeeper_quorum connection string for ZooKeeper, and the connection succeeds.
Any ideas?
I got an answer from support: the solution is to force hbase-site.xml into the hive-env template.
Add the line below:
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/etc/hbase/conf/hbase-site.xml
to your Advanced hive-env -> hive-env template, before the statement export METASTORE_PORT={{hive_metastore_port}}.

Why am I getting negative allocated mappers in Tez job? Vertex failure?

I'm trying to use the PhoenixStorageHandler as documented here, and populate it with the following query in beeline shell:
insert into table pheonix_table select * from hive_table;
I get the following breakdown of the mappers in the Tez session:
...
INFO : Map 1: 0(+50)/50
INFO : Map 1: 0(+50)/50
INFO : Map 1: 0(+50,-2)/50
INFO : Map 1: 0(+50,-3)/50
...
before the session crashes with a very long error message (422 lines) about vertex failure:
Error: Error while processing statement: FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex
failed, vertexName=Map 1, vertexId=vertex_1499857429667_0084_2_00,
diagnostics=[Task failed, taskId=task_1499857429667_0084_2_00_000007,
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Map
operator initialization failed [.........] Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:49, Vertex vertex_1499857429667_0084_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
What is this error referring to? Why are there 'negative mappers'?
The negative number is the count of failed or killed task attempts. The format is:
finished(+running,-failed or killed)/total
You can see details about why a mapper failed in the job tracker logs.
See also this answer: https://stackoverflow.com/a/39144600/2700344
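To make the format concrete, here is a minimal Python sketch that splits one of the progress lines from the question into its counters. The function name and the regular expression are illustrative assumptions, not part of Hive or Tez, and the sketch only handles lines that report a single vertex.

```python
import re

# finished(+running,-failed or killed)/total, e.g. "Map 1: 0(+50,-3)/50"
PROGRESS = re.compile(
    r"(?P<vertex>.+?):\s*(?P<finished>\d+)"
    r"(?:\((?:\+(?P<running>\d+))?,?(?:-(?P<failed>\d+))?\))?"
    r"/(?P<total>\d+)"
)


def parse_progress(line):
    """Return the counters of a single-vertex progress line, or None."""
    # Drop the "INFO :" prefix that beeline prints, if present.
    text = line.split("INFO :", 1)[-1].strip()
    match = PROGRESS.fullmatch(text)
    if match is None:
        return None  # e.g. "Map 1: -/-" before the vertex is initialised

    def number(name):
        value = match.group(name)
        return int(value) if value is not None else 0

    return {
        "vertex": match.group("vertex"),
        "finished": number("finished"),
        "running": number("running"),
        "failed_or_killed": number("failed"),
        "total": number("total"),
    }


if __name__ == "__main__":
    print(parse_progress("INFO : Map 1: 0(+50,-3)/50"))
```

For the last line shown in the question, this reports 0 finished, 50 running, and 3 failed or killed attempts out of 50 tasks, so the "negative mappers" are a failure counter rather than a negative allocation.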

Configuration values for hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode in HIVE

I am trying to add data to an external table using Apache Hive. I am getting the following error in the Hive logs:
2015-06-15 17:27:44,614 ERROR [LocalJobRunner Map Task Executor #0]: mr.ExecMapper (ExecMapper.java:map(171)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"transactiondate":"05-01-2015 08:26:21","transactiontype":"CASHOUT","transactionid":144590889,"sourcenumber":null,"destnumber":null,"amount":19000,"assumedfield1":880,"customerid":33394093,"transactionstatus":"COMPLETED","assumedfield2":325,"assumedfield3":175870}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 256
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:933)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:709)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 10 more
I googled for this error and came across this link which says that we must change the values of hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode variables to higher values. What are the optimum configurations for these variables on a single node hadoop installation? None of these configuration values are working for me. Please help.
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=250;
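Please do not raise the Hive dynamic partition limits to a much higher value.
Every partition adds directories and files that the NameNode has to track, so too many of them may cause the NameNode to crash. If possible, change the partition column and apply new logic on top of it.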
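The effect of changing the partition column can be checked on a sample of the data before touching any limits. The Python sketch below assumes, which the question does not state, that the table is dynamically partitioned on a timestamp-like column such as transactiondate. Only the first sample value comes from the log above; the other rows and all function names are hypothetical.

```python
def count_partitions(rows, key_func):
    """Count the distinct partition values an insert would create."""
    return len({key_func(row) for row in rows})


def full_timestamp(row):
    # Partitioning on the raw value: one partition per distinct second.
    return row["transactiondate"]


def date_only(row):
    # Coarser key: keep only the date part before the space.
    return row["transactiondate"].split(" ", 1)[0]


# First value is from the error log; the rest are made-up samples.
SAMPLE_ROWS = [
    {"transactiondate": "05-01-2015 08:26:21"},
    {"transactiondate": "05-01-2015 08:26:22"},
    {"transactiondate": "05-01-2015 09:10:00"},
    {"transactiondate": "06-01-2015 11:45:03"},
]

if __name__ == "__main__":
    print(count_partitions(SAMPLE_ROWS, full_timestamp))
    print(count_partitions(SAMPLE_ROWS, date_only))
```

On this sample the raw timestamp yields one partition per row, while the date-only key yields one per day, which is the kind of change that keeps the partition count under the limit reported in the error.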

Pig: STORE with MongoInsertStorage doesn't work

I'm executing this simple code in a pig script:
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar
set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;
col = LOAD 'mongodb://localhost:27017/mydb.mycollection' using com.mongodb.hadoop.pig.MongoLoader ('id:chararray, companyId:chararray, ts:chararray', 'id');
STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2' USING com.mongodb.hadoop.pig.MongoInsertStorage ('', '');
It returns the following error:
Location Config: Configuration: For URI: file:/tmp/temp449583595/tmp-109467318
2014-04-04 14:30:40,913 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /home/myuser/pig/pig_1396614639609.log
The end of the file pig_1396614639609.log:
... at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused
by: java.lang.IllegalArgumentException: Invalid URI Format. URIs must
begin with a mongodb:// protocol string. at
com.mongodb.hadoop.pig.MongoInsertStorage.setStoreLocation(MongoInsertStorage.java:159)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:576)
... 17 more
I don't know where the error is, since the MongoDB protocol string "mongodb://" is written correctly.
I have a similar issue when running LOAD and STORE using mongo-hadoop on the same Pig script.
It throws
java.net.UnknownHostException: localhost:27017 is not a valid Inet address
at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:587)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I didn't investigate further, but it is either a bug or some parameter related to locking. I don't know.
If I run the same code but do the loading and storing in different scripts, it runs without a problem.