On HDP 3.1.x, I created a table linked to HBase with the option STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.
Executing a SELECT against it works fine.
But when I try to populate another table from it, it fails with the error below:
create table test as select * from hbase_xxx;
INFO : Completed executing command(queryId=hive_20210205161427_a49ca7bc-0637-4c19-9a62-6657376373a1); Time taken: 74.951 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1611574680060_3923_1_00, diagnostics=[Vertex vertex_1611574680060_3923_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: raw_eff_ann_ent initializer failed, vertex=vertex_1611574680060_3923_1_00 [Map 1],
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
Having a look at the YARN logs, it appears that a task tries to connect to ZooKeeper from a datanode using localhost:2181 ... and fails:
2021-02-05 11:22:41,921 [WARN] [ReadOnlyZKClient-localhost:2181#0x48730f2c] |zookeeper.ReadOnlyZKClient|: 0x48730f2c to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 1
The same log for a SELECT shows the zookeeper_quorum connection string to ZooKeeper, and the query succeeds.
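As a quick sanity check (assuming hbase.zookeeper.quorum is the key in play here), printing the property from the same beeline session shows what the Hive side actually resolves:
set hbase.zookeeper.quorum;
If it comes back undefined or as localhost, hbase-site.xml is evidently not on the classpath.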
Any ideas?
I got an answer from support: you have to force hbase-site.xml into the hive-env template. This is the solution.
Add the line below:
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/etc/hbase/conf/hbase-site.xml
to your Advanced hive-env -> hive-env template, before the statement export METASTORE_PORT={{hive_metastore_port}}.
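For reference, the relevant part of the hive-env template should then look roughly like this (a sketch; the METASTORE_PORT line already exists in the stock template):
# appending hbase-site.xml puts the real ZooKeeper quorum on the Tez task classpath
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/etc/hbase/conf/hbase-site.xml
export METASTORE_PORT={{hive_metastore_port}}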
Related
I have Hive + LLAP on HDP 3.1.4
Hive and Tez Config is:
yarn.nodemanager.resource.memory-mb=40960
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=40960
hive.tez.container.size=4096
num_llap_nodes=4
hive.llap.daemon.num.executors=8
hive.llap.daemon.yarn.container.mb=35840
llap_headroom_space=2048
llap_heap_size=32768
hive.llap.io.memory.size=1024
tez.am.resource.memory.mb=4096
hive.tez.java.opts=-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Xmx3276m
tez.runtime.io.sort.mb=1638
tez.runtime.unordered.output.buffer.size-mb=409
The following query runs properly:
select count(*) from balance;
but when I use a GROUP BY expression in the following query:
select count(*),jobdate from balance group by jobdate;
I've tried many configurations, but this long exception is thrown:
ERROR: Error while processing statement: FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1,
vertexId=vertex_1617520101397_0014_1_00, diagnostics=[Task
failed, taskId=task_1617520101397_0014_1_00_000013,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration at
BLABLA
at java.lang.Thread.run(Thread.java:748) ]], Task failed,
taskId=task_1617520101397_0014_1_00_000006,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration at
at java.lang.Thread.run(Thread.java:748) ]], Task failed,
taskId=task_1617520101397_0014_1_00_000005,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while
running task ( failure ) : java.lang.NoClassDefFoundError: Could
not initialize class
org.apache.tez.runtime.library.api.TezRuntimeConfiguration at
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:111)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) ]], Vertex did not
succeed due to OWN_TASK_FAILURE, failedTasks:9 killedTasks:31761,
Vertex vertex_1617520101397_0014_1_00 [Map 1] killed/failed due
to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2,
vertexId=vertex_1617520101397_0014_1_01, diagnostics=[Vertex
received Kill while in RUNNING state., Vertex did not succeed due
to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:18, Vertex
vertex_1617520101397_0014_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]DAG did not succeed due to
VERTEX_FAILURE. failedVertices:1 killedVertices:1 Error Code: 2
There are two places to set hive.tez.container.size on the Ambari Hive Config page: one appears in the SETTINGS tab, and the other, related to LLAP, goes under Advanced hive-interactive-site in the ADVANCED tab. I had been setting hive.tez.container.size in the SETTINGS tab instead of the Advanced hive-interactive-site section. Finally, I set the following configs and the error was solved:
set hive.tez.container.size=10240;
set hive.tez.java.opts=-Xmx9216m;
set tez.runtime.io.sort.mb=3072;
set tez.runtime.unordered.output.buffer.size-mb=1024;
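For what it's worth, these values line up with the usual sizing rules of thumb (my reading, not something the error itself confirms): heap around 80-90% of the container size, sort buffer around 25-40%, unordered output buffer around 10%:
-- hive.tez.java.opts -Xmx9216m                        : 9216 = 0.9 * 10240 (heap inside the container)
-- tez.runtime.io.sort.mb=3072                         : 3072 = 0.3 * 10240 (sort buffer)
-- tez.runtime.unordered.output.buffer.size-mb=1024    : 1024 = 0.1 * 10240 (unordered output buffer)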
I am trying to work with DBeaver, processing data via Spark Hive. The connection is stable, as the following command works:
select * from database.table limit 100
However, as soon as I deviate from this simple fetch query, I get an exception. E.g. running the query
select count(*) from database.table limit 100
results in the exception:
SQL Error [2] [08S01]: org.apache.hive.service.cli.HiveSQLException: Error
while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1526294345914_23590_12_00,
diagnostics=[Vertex vertex_1526294345914_23590_12_00 [Map 1]
killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: postings
initializer failed, vertex=vertex_1526294345914_23590_12_00 [Map 1],
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad
Request; Request ID: 95BFFF20D13AECDA), S3 Extended Request ID:
fSbzZDf/Xi0b+CL99c5DKi8GYrJ7TQXj5/WWGCiCpGa6JU5SGeoxA4lunoxPCNBJ2MPA3Hxh14M=
Can someone help me here?
400/Bad Request is the generic S3/AWS "didn't like your payload/request/auth" response. There are some details in the ASF S3A docs, but those cover the ASF connector, not the Amazon one (which is what yours is, judging from the stack trace). A bad endpoint for v4-authenticated buckets is usually problem #1; after that... who knows?
Try some basic hadoop fs -ls s3://bucket/path operations first.
You can try running the cloudstore diagnostics against it; that's my first call for debugging a client. It's not explicitly EMR-S3-connector aware, though, so it won't look at the credentials in any detail.
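For example (a sketch with a placeholder bucket name; the fs.s3a.endpoint property applies only to the ASF S3A connector, not the EMR one):
# basic listing through whichever connector the cluster is configured with
hadoop fs -ls s3://your-bucket/path/
# for S3A, a v4-only region usually needs the endpoint set explicitly
hadoop fs -D fs.s3a.endpoint=s3.eu-central-1.amazonaws.com -ls s3a://your-bucket/path/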
I'm trying to use the PhoenixStorageHandler as documented here, populating a table with the following query in the beeline shell:
insert into table pheonix_table select * from hive_table;
I get the following breakdown of the mappers in the Tez session:
...
INFO : Map 1: 0(+50)/50
INFO : Map 1: 0(+50)/50
INFO : Map 1: 0(+50,-2)/50
INFO : Map 1: 0(+50,-3)/50
...
before the session crashes with a very long error message (422 lines) about vertex failure:
Error: Error while processing statement: FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex
failed, vertexName=Map 1, vertexId=vertex_1499857429667_0084_2_00,
diagnostics=[Task failed, taskId=task_1499857429667_0084_2_00_000007,
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Map
operator initialization failed [.........] Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:49, Vertex vertex_1499857429667_0084_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
What is this error referring to? Why are there 'negative mappers'?
The negative number indicates the number of failed or killed task attempts. The format is:
finished(+running, -failed or killed)/total
For example, 0(+50,-3)/50 means 0 tasks finished, 50 attempts running, and 3 attempts failed or killed, out of 50 total tasks. You can see details about why a mapper failed in the job tracker logs.
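For example, you can pull the aggregated container logs for the application and search for one of the failed attempts (assuming YARN log aggregation is enabled; the IDs come from the error message above):
yarn logs -applicationId application_1499857429667_0084 | grep -B 5 -A 30 task_1499857429667_0084_2_00_000007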
See also this answer: https://stackoverflow.com/a/39144600/2700344
I had used hadoop-0.20.x.x and hive-0.11.0. I'll talk about Hive queries: with that configuration, everything was good and working fine.
Now we have upgraded to hadoop-2.6.x (Hadoop 2) and hive-0.14.x, also using Apache Tez.
The problem is, Hadoop works as-is, but Hive SQL queries don't.
The query below works fine on the older versions but throws errors on the upgraded ones:
QUERY : SELECT abc.property_name, xyz.date, xyz.time, xyz.value_as_number, xyz.value_units FROM dbname.xyz JOIN dbname.abc ON (xyz.id = abc.src_id) WHERE xyz.person_id=138312;
EXCEPTION:
INFO : Session is already open
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1435524970199_0035)
INFO : Map 1: -/- Map 2: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1435524970199_0035_1_00, diagnostics=[Vertex vertex_1435524970199_0035_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: concept initializer failed, vertex=vertex_1435524970199_0035_1_00 [Map 1], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : Vertex failed, vertexName=Map 2, vertexId=vertex_1435524970199_0035_1_01, diagnostics=[Vertex vertex_1435524970199_0035_1_01 [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: observation initializer failed, vertex=vertex_1435524970199_0035_1_01 [Map 2], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : DAG failed due to vertex failure. failedVertices:2 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)
The exception says no input paths were specified. Well, I understand that and know how to solve it in a Hadoop MapReduce program, but how do we do it with a Hive query? Anyway, I don't think it's the same issue.
To narrow it down, I used both the hive shell and the beeline shell: hive returned the expected output, but beeline returned the same exception as above.
The beauty of the problem is that a query on an individual table works fine; but when I try a JOIN, it throws the above exception.
I have understood that Apache Tez has an impact on my query. Can someone suggest a solution, or point me to a Tez reference so I can read up and rewrite the query accordingly? Thanks
It worked after disabling Apache Tez.
Looks like Apache Tez isn't stable yet.
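For anyone wondering how: the engine can be switched back to MapReduce per session (or permanently via hive.execution.engine in hive-site.xml):
set hive.execution.engine=mr;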
I got the following error while executing a PDI job.
I do have the MySQL driver in place (libext/JDBC). Can someone say what the reason for the failure might be?
Despite the error while connecting to the DB, my DB is up and I can access it from the command prompt.
Error occured while trying to connect to the database
Error connecting to database: (using class org.gjt.mm.mysql.Driver)
Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
ERROR 03-08 11:05:10,595 - stepname- Error initializing step [Update]
ERROR 03-08 11:05:10,595 - stepname - Step [Update.0] failed to initialize!
INFO 03-08 11:05:10,595 - stepname - Finished reading query, closing connection.
ERROR 03-08 11:05:10,596 - stepname - Unable to prepare for execution of the transformation
ERROR 03-08 11:05:10,596 - stepname - org.pentaho.di.core.exception.KettleException:
We failed to initialize at least one step. Execution can not begin!
Thanks
Is this a long-running query by any chance? Or, in the PDI world, it can be because your step kicks off at the start of the transform, waits for something to do, and if nothing comes along by the net write timeout, you'll see this error.
If so, your problem is caused by a timeout that MySQL uses, which frequently needs increasing from the default of 10 mins.
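If net_write_timeout is indeed the culprit, you can check and raise it on the MySQL side; a sketch (the 3600-second value is just an example):
SHOW VARIABLES LIKE 'net_write_timeout';
SET GLOBAL net_write_timeout = 3600;  -- or SET SESSION for the current connection only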
See here:
http://wiki.pentaho.com/display/EAI/MySQL