In Hue, executing the Hive SQL statement show tables; works fine. But executing select * from tablea limit 1; fails with the following exception:
java.net.SocketTimeoutException:callTimeout=60000, callDuration=68043:
row 'log,,00000000000000' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740, hostname=node4,16020,1476410081203,
seqNum=0:5:1",
'org.apache.hadoop.hbase.client.RpcRetryingCaller:callWithRetries:RpcRetryingCaller.java:159',
'org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture:run:ResultBoundedCompletionService.java:64',
'*org.apache.hadoop.hbase.exceptions.ConnectionClosingException:Call
to node4/192.168.127.1:16020 failed on local exception:
org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Connection to node4/192.168.127.1:16020 is closing. Call id=9,
waitTime=1:16:11',
'org.apache.hadoop.hbase.ipc.RpcClientImpl:wrapException:RpcClientImpl.java:1239',
'org.apache.hadoop.hbase.ipc.RpcClientImpl:call:RpcClientImpl.java:1210',
'org.apache.hadoop.hbase.ipc.AbstractRpcClient:callBlockingMethod:AbstractRpcClient.java:213',
'org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation:callBlockingMethod:AbstractRpcClient.java:287',
'org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub:scan:ClientProtos.java:32651',
'org.apache.hadoop.hbase.client.ScannerCallable:openScanner:ScannerCallable.java:372',
'org.apache.hadoop.hbase.client.ScannerCallable:call:ScannerCallable.java:199',
'org.apache.hadoop.hbase.client.ScannerCallable:call:ScannerCallable.java:62',
'org.apache.hadoop.hbase.client.RpcRetryingCaller:callWithoutRetries:RpcRetryingCaller.java:200',
'org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC:call:ScannerCallableWithReplicas.java:369',
'org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC:call:ScannerCallableWithReplicas.java:343',
'org.apache.hadoop.hbase.client.RpcRetryingCaller:callWithRetries:RpcRetryingCaller.java:126',
'*org.apache.hadoop.hbase.exceptions.ConnectionClosingException:Connection
to node4/192.168.127.1:16020 is closing. Call id=9, waitTime=1:3:2',
'org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection:cleanupCalls:RpcClientImpl.java:1037',
'org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection:close:RpcClientImpl.java:844',
'org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection:run:RpcClientImpl.java:572'],
statusCode=3), results=None, hasMoreRows=None)
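Before looking at configuration, the first line of the trace is worth decoding. As a rough sketch (the regex is an assumption tailored to this exact message format), the two numbers show the client overran its timeout while scanning hbase:meta for the 'log' table's region:

```python
import re

# The first line of the stack trace, as reported by the HBase client.
msg = "java.net.SocketTimeoutException: callTimeout=60000, callDuration=68043"

# Pull out the two millisecond values; the pattern is an assumption
# matching this particular message format.
timeout_ms = int(re.search(r"callTimeout=(\d+)", msg).group(1))
duration_ms = int(re.search(r"callDuration=(\d+)", msg).group(1))

# The call took ~68 s against a 60 s budget, so the client gave up.
print(timeout_ms, duration_ms, duration_ms > timeout_ms)
```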
In the configuration file hive-site.xml, set the following property to false:
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
Setting it to true means Hadoop jobs execute as the user who logged in to HiveServer2; false means they execute as the user who started the HiveServer2 process.
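The rule can be sketched as a toy function (the names are illustrative, not HiveServer2 internals):

```python
def effective_user(do_as_enabled: bool, session_user: str, service_user: str) -> str:
    """Return the user a Hadoop job runs as, per hive.server2.enable.doAs."""
    # doAs=true: impersonate the user who logged in to HiveServer2.
    # doAs=false: run everything as the user who started the HiveServer2 process.
    return session_user if do_as_enabled else service_user

print(effective_user(True, "alice", "hive"))   # alice
print(effective_user(False, "alice", "hive"))  # hive
```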
Related
I am running into a strange issue with PyHive running a Hive query in async mode. Internally, PyHive uses a Thrift client to execute the query and to fetch logs (along with execution status). I am unable to fetch the logs of the Hive query (map/reduce tasks, etc.); cursor.fetch_logs() returns an empty data structure.
Here is the code snippet
from pyhive import hive  # or import presto or import trino
from TCLIService.ttypes import TOperationState

def run():
    cursor = hive.connect(host="10.x.y.z", port=10003, username='xyz', password='xyz', auth='LDAP').cursor()
    cursor.execute("select count(*) from schema1.table1 where date = '2021-03-13'", async_=True)
    status = cursor.poll(True).operationState
    print(status)
    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        logs = cursor.fetch_logs()
        for message in logs:
            print("running")
            print(message)
        # If needed, an asynchronous query can be cancelled at any time with:
        # cursor.cancel()
        print("running")
        status = cursor.poll().operationState
    print(cursor.fetchall())
The cursor is able to get operationState correctly, but it's unable to fetch the logs. Is there anything on the HiveServer2 side that needs to be configured?
Thanks in advance
Closing the loop here in case someone else has the same or a similar issue with Hive.
In my case the problem was the HiveServer2 configuration: Hive Server won't stream the logs if operation logging is not enabled. Following is the list I configured:
hive.server2.logging.operation.enabled = true
hive.server2.logging.operation.level = EXECUTION (basic logging; other values increase the logging level)
hive.async.log.enabled = false
hive.server2.logging.operation.log.location
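How these settings interact can be sketched as a toy predicate (an assumption, not HiveServer2 code; the level names are the documented values of hive.server2.logging.operation.level):

```python
# Documented values of hive.server2.logging.operation.level, least to most verbose.
LEVELS = ("NONE", "EXECUTION", "PERFORMANCE", "VERBOSE")

def logs_will_stream(operation_logging_enabled: bool, level: str) -> bool:
    """Toy predicate: fetch_logs() only returns output when operation
    logging is enabled and the level is above NONE."""
    return operation_logging_enabled and level in LEVELS and level != "NONE"

print(logs_will_stream(True, "EXECUTION"))   # True
print(logs_will_stream(False, "EXECUTION"))  # False
print(logs_will_stream(True, "NONE"))        # False
```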
I have a requirement where tables from SAS, in the form of an "abc.sas7bdat" file, will be provided at a particular location along with a libref (see the code below: libname xxx '/workspace/abc/xyz'). I need to create a Hive table from this dataset. I am using the code below; it creates the table in Hive, but it's empty. Upon further research I noticed that the following parameter may be missing in hdfs-site.xml:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
I did not find this property in hdfs-site.xml, which explains why the table is empty.
My question: I need to ingest files ranging in size from 1 GB to more than 200 GB. How can I speed up the process? I only have access to the Unix machine where the files are dropped, and I am not sure what has been installed.
Current Code:
options set=SAS_HADOOP_RESTFUL=1;
options set=SAS_HADOOP_JAR_PATH=<jar path>;
options set=SAS_HADOOP_CONFIG_PATH=<config path>;
options nofmterr;
%let svr = %NRSTR('server.abc.com');
%let stng = %NRSTR('stored as parquet');
libname aaa hadoop server=&svr hdfs_tempdir='/tmp/sastmp' user='username'
  password=pxxx schema='schema name' port=10000
  DBCREATE_TABLE_OPTS=&stng
  subprotocol=hive2;
libname xxx '/workspace/abc/xyz';
data aaa.test;
set xxx.abc;
run;
We use Cloudera (CDH 5.7.5) and Hue 3.9.0. For the admin user, some Hive tables (about 60%) are accessible through Impala; the other Hive tables are not. For non-admin users, no database is accessible through Impala, though some databases are accessible via Hive.
Is it because the Impala catalog is not in sync with the Hive metastore?
When I try to run invalidate metadata (for all databases) I get a read operation timeout error.
I tried running invalidate metadata for some tables, but it did not solve the problem; no impact. What do I need to check?
FYI, I get this error every time I run a query via Impala, but not via Hive:
AuthorizationException: User 'test.user' does not have privileges to execute 'SELECT' on: default.test01
FYI2: invalidate metadata now runs fine. For the admin user, all databases and tables are accessible via Hive and Impala. But for non-admin users, authorized databases are only accessible through Hive, not Impala.
This is part of the Hue log:
[13/Jul/2018 10:32:05 +0700] thrift_util DEBUG Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.CloseOperation(args=(TCloseOperationReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=3, operationId=THandleIdentifier(secret="o\xe8}\x9a\xf6'F\x8d\x9aC\xd4!\xb2#:\x91", guid="o\xe8}\x9a\xf6'F\x8d\x9aC\xd4!\xb2#:\x91"))),), kwargs={})
[13/Jul/2018 10:32:05 +0700] thrift_util DEBUG Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.CloseOperation returned in 1ms: TCloseOperationResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0))
[13/Jul/2018 10:32:05 +0700] access INFO 10.192.64.252 myuser.test - "POST /notebook/api/autocomplete/ HTTP/1.1"
[13/Jul/2018 10:32:05 +0700] dbms DEBUG Query Server: {'SESSION_TIMEOUT_S': 43200, 'QUERY_TIMEOUT_S': 600, 'server_name': 'impala', 'server_host': 'serverhost.com', 'querycache_rows': 50000, 'server_port': 21050, 'auth_password_used': False, 'impersonation_enabled': True, 'auth_username': 'hue', 'principal': 'impala/serverhost.com'}
[13/Jul/2018 10:32:05 +0700] dbms DEBUG Query Server: {'SESSION_TIMEOUT_S': 43200, 'QUERY_TIMEOUT_S': 600, 'server_name': 'impala', 'server_host': 'serverhost.com', 'querycache_rows': 50000, 'server_port': 21050, 'auth_password_used': False, 'impersonation_enabled': True, 'auth_username': 'hue', 'principal': 'impala/serverhost.com'}
Configuration (Hortonworks):
hive: BUILD hive-1.2.1.2.3.0.0
Hadoop 2.7.1.2.3.0.0-2557
I'm trying to execute
lock table event_metadata EXCLUSIVE;
Hive response:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Current transaction manager does not support explicit lock requests. Transaction manager: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
In the code there is an obvious place where explicit locks are disabled:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-exec/1.2.0/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java#DbTxnManager
@Override
public boolean supportsExplicitLock() {
    return false;
}
Questions:
How can I make explicit locks work? In what version of Hive did they appear?
Here is an example for Cloudera showing explicit locks working: http://www.ericlin.me/how-table-locking-works-in-hive
You may set the concurrency parameter on the fly:
set hive.support.concurrency=true;
After this you may try executing your command
Hive includes a locking feature that uses Apache Zookeeper for locking. Zookeeper implements highly reliable distributed coordination. Other than some additional setup and configuration steps, Zookeeper is invisible to Hive users.
In the $HIVE_HOME/hive-site.xml file, set the following properties:
<property>
<name>hive.zookeeper.quorum</name>
<value>zk1.site.pvt,zk2.site.pvt,zk3.site.pvt</value>
<description>The list of zookeeper servers to talk to.
This is only needed for read/write locks.
</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
<description>Whether Hive supports concurrency or not.
A Zookeeper instance must be up and running for the default Hive lock manager to support read-write locks.</description>
</property>
After restarting Hive, run the command:
hive> lock table event_metadata EXCLUSIVE;
Reference: Programming Hive, O'Reilly
EDIT:
DummyTxnManager.java, which provides the default Hive behavior, has:
@Override
public boolean supportsExplicitLock() {
    return true;
}
DummyTxnManager replicates pre-Hive-0.13 behavior and does not support transactions,
whereas
DbTxnManager.java, which stores the transactions in the metastore database, has:
@Override
public boolean supportsExplicitLock() {
    return false;
}
Try the following:
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
unlock table tablename;
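The contrast in the EDIT above can be reduced to a toy model in Python (not Hive source; it merely mirrors the two supportsExplicitLock() overrides and the error message from the question):

```python
class DummyTxnManager:
    # Replicates pre-Hive-0.13 behavior: no transactions, explicit locks allowed.
    def supports_explicit_lock(self) -> bool:
        return True

class DbTxnManager:
    # Stores transactions in the metastore; explicit LOCK TABLE is rejected.
    def supports_explicit_lock(self) -> bool:
        return False

def lock_table(manager, table: str) -> str:
    # Mirrors the check behind "Current transaction manager does not
    # support explicit lock requests."
    if not manager.supports_explicit_lock():
        raise RuntimeError("Current transaction manager does not support "
                           "explicit lock requests: " + type(manager).__name__)
    return f"locked {table} EXCLUSIVE"

print(lock_table(DummyTxnManager(), "event_metadata"))  # locked event_metadata EXCLUSIVE
```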
I am working on a Spring application which selects rows from an Oracle 8i DB in locked mode. I am using Tomcat 7.0.57. I have configured my data source in context.xml as below.
<Resource name="jdbc/MyDataSource" auth="Container"
type="javax.sql.DataSource" driverClassName="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@hostname:SID"
username="user1" password="pwd1" maxActive="20" maxIdle="10"
maxWait="-1"/>
In my Spring application context I have the lookup statement:
<jee:jndi-lookup id="testds" jndi-name="jdbc/MyDataSource"/>
Using Spring JdbcTemplate, I execute the select query below:
select * from employees where deptno=3 for update skip locked
I am getting this error
StatementCallback; uncategorized SQLException for SQL [select * from employees where deptno=3 for update skip locked]; SQL state [null]; error code [17410]; No more data to read from socket; nested exception is java.sql.SQLException: No more data to read from socket
However, as a temporary fix, I am using Apache Commons DBCP's BasicDataSource to test my functionality:
<bean id="clarifyds" class="org.apache.commons.dbcp.BasicDataSource"
p:driverClassName="${jdbc.driverClassName}"
p:url="${jdbc.url}"
p:username="${jdbc.username}" p:password="${jdbc.password}" />
But when deploying to the UAT and Prod environments I need to look up the data source configured in context.xml.
This problem only occurs when the select query uses the 'for update' clause; it works fine with a simple select query.
Any ideas? Please suggest.
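For reference, the FOR UPDATE SKIP LOCKED semantics involved here can be sketched in plain Python (a toy model, not Oracle behavior; all names are illustrative):

```python
# Toy model of SELECT ... FOR UPDATE SKIP LOCKED: return only the rows no
# other session holds, locking them for this session as they are selected.
def select_for_update_skip_locked(rows, locked_ids, session):
    picked = []
    for row in rows:
        if row["id"] in locked_ids:
            continue  # SKIP LOCKED: silently pass over rows locked elsewhere
        locked_ids[row["id"]] = session  # FOR UPDATE: take the row lock
        picked.append(row)
    return picked

rows = [{"id": 1, "deptno": 3}, {"id": 2, "deptno": 3}, {"id": 3, "deptno": 3}]
locks = {2: "other-session"}  # row 2 is already locked by another session
picked = select_for_update_skip_locked(rows, locks, "my-session")
print([r["id"] for r in picked])  # [1, 3]
```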