Hive policy in Ranger is not working

I am using a Kerberized cluster. The Ranger version is 0.7 and the Hive version is 1.2.
After configuring a Hive policy in Ranger, the policy is not enforced, although the HBase and HDFS policies work fine. I am using Beeline to connect to HiveServer2.

After careful examination, I found that the problem was a configuration error. I made two changes:
Enable Authorization changed to true
hive.security.authorization.manager changed to:
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
Previously this parameter was set to:
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
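If you want to check these two settings without clicking through a UI, something like the following rough sketch may help. It assumes the settings end up in a Hadoop-style hive-site.xml, and that "Enable Authorization" corresponds to the hive.security.authorization.enabled property (that mapping is my assumption, not something stated above). The SAMPLE document and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

RANGER_FACTORY = (
    "org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory"
)

def read_properties(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

def ranger_hive_problems(props):
    """List the settings that would keep the Ranger Hive plugin from being used."""
    problems = []
    # Assumption: this property is what "Enable Authorization" controls.
    if props.get("hive.security.authorization.enabled", "").lower() != "true":
        problems.append("hive.security.authorization.enabled is not true")
    if props.get("hive.security.authorization.manager") != RANGER_FACTORY:
        problems.append("hive.security.authorization.manager is not the Ranger factory")
    return problems

# Hypothetical hive-site.xml content with both settings fixed.
SAMPLE = """<configuration>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
  </property>
</configuration>"""

print(ranger_hive_problems(read_properties(SAMPLE)))
```

To run it against a real file, read the file contents and pass them to read_properties instead of SAMPLE.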

Related

Migration from HDP non-secure cluster to CDP secure cluster

We are migrating HDFS data from a non-secure HDP cluster to a secure CDP cluster. The Cloudera documentation mentions "distcp" as the tool to handle the migration, but it only covers migrating from a secure HDP cluster to a secure or non-secure CDP cluster, which is not my case.
I have a few questions:
Should I secure the existing cluster first and then use distcp?
Or is it okay to use distcp without security checks?
From your experience, how can I handle such a situation?
Thanks in advance
From my experience, you will have to run distcp from the secure CDP cluster, with a valid Kerberos ticket, and with the following parameter:
ipc.client.fallback-to-simple-auth-allowed=true
Full example:
hadoop distcp \
-D ipc.client.fallback-to-simple-auth-allowed=true \
hdfs://<hdp_namenode>:8020/<dir> \
hdfs://<cdp_namenode>:8020/<dir>
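If you have many directories to copy, it can help to generate the command instead of typing it each time. Here is a minimal sketch that only builds the argument list for the command above. The host names and path are placeholders I made up, and it assumes you have already obtained a Kerberos ticket on the CDP side before actually running anything.

```python
import shlex

def build_distcp_command(src_namenode, dst_namenode, path, port=8020):
    """Build the distcp argument list for a non-secure to secure copy."""
    return [
        "hadoop", "distcp",
        # Lets the secure client fall back to simple auth for the non-secure source.
        "-D", "ipc.client.fallback-to-simple-auth-allowed=true",
        f"hdfs://{src_namenode}:{port}{path}",
        f"hdfs://{dst_namenode}:{port}{path}",
    ]

# Hypothetical hosts and directory, for illustration only.
cmd = build_distcp_command("hdp-nn.example.com", "cdp-nn.example.com", "/data/warehouse")
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run(cmd, check=True) on a CDP node once the ticket is in place.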

presto + Hive Security Configuration

We have a Presto cluster alongside a Hadoop cluster, with all Presto workers installed on the data-node machines.
The following is an example of the Hive connector configuration file that is configured on the Presto workers, under the catalog folder:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-node:9083
We want to know what the risks are when access from the Presto workers to the Hive metastore machine is not secured.
As we understand it, the Presto workers connect to the Hive metastore using the Thrift protocol on port 9083,
but it is not clear how a Presto worker authenticates against the Hive metastore.
We would appreciate more details on how Presto workers access the Hive metastore both when Hive is not secured and when it is secured.
reference - https://docs.starburstdata.com/302-e/connector/hive-security.html
The Hive metastore can be configured:
not to use authentication (it trusts the user identity provided by the caller)
to use Kerberos authentication.
Both of these modes are supported in Presto.
The basic mode (no auth) requires no additional configuration properties.
For Kerberos authentication you need to set:
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=...
hive.metastore.client.principal=...
hive.metastore.client.keytab=...
See full example & more at https://docs.starburstdata.com/latest/connector/hive-security.html#example-configuration-with-kerberos-authentication
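To make the difference between the two modes concrete, here is a small sketch that renders the catalog file for either case. It only uses the property names quoted in this thread. The principals and keytab path in the usage are placeholders I invented, so substitute your own values.

```python
def hive_catalog_properties(metastore_uri, kerberos=None):
    """Render a Hive catalog properties file, with or without Kerberos."""
    lines = [
        "connector.name=hive-hadoop2",
        f"hive.metastore.uri={metastore_uri}",
    ]
    if kerberos is not None:
        # Only added in Kerberos mode; the basic mode needs nothing extra.
        lines += [
            "hive.metastore.authentication.type=KERBEROS",
            f"hive.metastore.service.principal={kerberos['service_principal']}",
            f"hive.metastore.client.principal={kerberos['client_principal']}",
            f"hive.metastore.client.keytab={kerberos['client_keytab']}",
        ]
    return "\n".join(lines) + "\n"

# Basic mode: the metastore trusts whatever identity the caller sends.
print(hive_catalog_properties("thrift://metastore-node:9083"))

# Kerberos mode, with hypothetical principals and keytab path.
print(hive_catalog_properties(
    "thrift://metastore-node:9083",
    kerberos={
        "service_principal": "hive/_HOST@EXAMPLE.COM",
        "client_principal": "presto@EXAMPLE.COM",
        "client_keytab": "/etc/presto/hive.keytab",
    },
))
```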
If you need further help, you can get it on #troubleshooting channel on Trino (formerly Presto SQL) community slack.

How to use Hive Metastore standalone?

I installed and ran the Metastore Server standalone, without installing Hive. However, I cannot find any documentation about the thrift network API for communicating with the server. I need to be able to connect to the Metastore server directly or through HCatalog. Please advise.
There is an HCatalog Java client in hive-webhcat-java-client, which can be used in both client mode (which connects to the HCatalog Thrift server) and embedded mode (which does everything internally and connects to MySQL directly).
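import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.hcatalog.api.HCatClient;

HiveConf hiveConf = new HiveConf();
// Load hive-site.xml from the local filesystem; addResource(String) would search the classpath instead
hiveConf.addResource(new Path("/Users/tzp/Documents/env/apache-hive-3.1.2-bin/conf/hive-site.xml"));
// If this parameter is set, the client connects to an external Hive Metastore
hiveConf.set("metastore.thrift.uris", "thrift://localhost:9083");
HCatClient client = HCatClient.create(new Configuration(hiveConf));
List<String> dbNames = client.listDatabaseNamesByPattern("*");
System.out.println(dbNames);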
I don't think Hive provides a similar client in Python, but there is a third-party library, hmsclient, that does the same thing.
from hmsclient import hmsclient
client = hmsclient.HMSClient(host='localhost', port=9083)
with client as c:
    c.check_for_named_partition('db', 'table', 'date=20180101')
HCatalog is functionally identical to Hive Metastore.
The JavaDoc for "Hive Metastore client" and its API (branch 1.x) is available at
https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html
https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/metastore/api/package-summary.html
Now, good luck finding a tutorial or just code snippets...

what is difference between org.apache.hive.jdbc.HiveDriver and org.apache.hadoop.hive.jdbc.HiveDriver?

What is the difference between org.apache.hive.jdbc.HiveDriver and org.apache.hadoop.hive.jdbc.HiveDriver?
Which one should I use to write a JDBC client that connects to Hive?
Hive 0.11 includes a new JDBC driver that works with HiveServer2, enabling users to write JDBC applications against Hive. The application needs to use the JDBC driver class and specify the network address and port in the connection URL in order to connect to Hive.
HiveServer2 (HIVE-2935) brings concurrency, authentication, and a foundation for authorization to Hive.
HiveServer2 is an improved version of HiveServer that supports Kerberos authentication and multi-client concurrency. It uses the driver "org.apache.hive.jdbc.HiveDriver".
HiveServer1, or the Thrift server, cannot handle concurrent requests from more than one client. This is a limitation imposed by the Thrift interface that HiveServer exports, and it can't be resolved by modifying the HiveServer code. The driver for HiveServer1 is "org.apache.hadoop.hive.jdbc.HiveDriver".
Please find the links which will help you to understand more.
org.apache.hadoop.hive.jdbc.HiveDriver and org.apache.hive.jdbc.HiveDriver
Which one to use depends on your requirements: which Hive version you have and how you have configured Hive.
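As a quick reference, the choice can be written down as a lookup table. This is only a sketch: the driver class names come from the answers above, while the URL prefixes (jdbc:hive:// for HiveServer1 and jdbc:hive2:// for HiveServer2) are the schemes those drivers accept and are not mentioned in the thread. The function name and host are made up for illustration.

```python
# server generation -> (JDBC driver class, JDBC URL scheme)
DRIVERS = {
    1: ("org.apache.hadoop.hive.jdbc.HiveDriver", "jdbc:hive://"),
    2: ("org.apache.hive.jdbc.HiveDriver", "jdbc:hive2://"),
}

def jdbc_settings(server_generation, host, port=10000, database="default"):
    """Return the driver class and connection URL for HiveServer1 or HiveServer2."""
    driver_class, scheme = DRIVERS[server_generation]
    return {"driver": driver_class, "url": f"{scheme}{host}:{port}/{database}"}

# Hypothetical host name.
print(jdbc_settings(2, "hs2.example.com"))
print(jdbc_settings(1, "hs1.example.com"))
```

In a JDBC client, the "driver" value is what you would load as the driver class and the "url" value is what you would pass when opening the connection.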

Hive Server 1 vs Hive Server 2

We have Hive 0.10 and were wondering whether we should be using HiveServer1 or HiveServer2. Another question: to connect to a Hive server running on port 10000 using third-party tools, do we need anything else?
Thanks,
I had the Hive 1 v 2 question and found the basics at:
http://www.slideshare.net/cwsteinbach/hiveserver2-for-apache-hive
HiveServer2 Thrift API spec
JDBC/ODBC HiveServer2 drivers
Concurrent Thrift clients with memory leak fixes and session/config info
Kerberos authentication
Authorization to improve GRANT/ROLE and code injection vectors
I'm sure there's more given intervening development.
HiveServer2 exposes a Thrift API that can also run over HTTP transport (this is sometimes loosely described as a REST API). Tools like Beeline, which is a JDBC client, can connect to Hive from any client outside your cluster. In a secured environment, Beeline can connect through the Knox gateway. HiveServer2 accepts multiple Beeline connections at the same time. So go with HiveServer2 for better security and for multiple concurrent connections.
HiveServer2 is an improved version of HiveServer that supports a Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency. The CLI for HiveServer2 is Beeline.
Src: Cloudera
Kerberos (authentication) and Sentry (authorization).
Sentry security works through HiveServer2 and through HiveServer1, which is used by the Hive CLI.
The CLI for HiveServer1 is HiveCLI.
The CLI for HiveServer2 is Beeline.
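For connecting with Beeline, most of the work is getting the JDBC URL right. Below is a rough sketch that assembles a HiveServer2 URL with an optional Kerberos principal and optional HTTP transport settings, then wraps it in a Beeline command. The host, realm, and helper names are placeholders of mine, and which options you actually need depends on how your HiveServer2 is configured.

```python
def hiveserver2_url(host, port=10000, database="default",
                    principal=None, http_path=None):
    """Assemble a jdbc:hive2:// URL for HiveServer2."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # Kerberos: the service principal of the HiveServer2 instance.
        url += f";principal={principal}"
    if http_path:
        # Only for servers running in HTTP transport mode.
        url += f";transportMode=http;httpPath={http_path}"
    return url

def beeline_command(url):
    """Argument list for starting Beeline against the given URL."""
    return ["beeline", "-u", url]

# Hypothetical host and realm.
plain = hiveserver2_url("hs2.example.com")
secure = hiveserver2_url("hs2.example.com", principal="hive/_HOST@EXAMPLE.COM")
print(beeline_command(plain))
print(beeline_command(secure))
```

With Kerberos, you would still need a valid ticket (for example from kinit) before starting Beeline.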