HDFS Plugin for Apache Ranger

I am planning to use Apache Ranger for authorization of my HDFS file system. I have a question about the capability of the Apache Ranger plugin. Does the HDFS plugin for Apache Ranger offer more security features than just managing HDFS ACLs? From the limited understanding I have gathered from presentations and blogs, I am unable to work out what the HDFS plugin for Apache Ranger actually does.

...and now, with the latest version of Apache Ranger, it is possible to define "deny" rules.
Previously it was only possible to define rules that specify additional "allow" privileges on top of the underlying HDFS permissions and ACLs. Hence, if you had the HDFS permissions for a directory set to "777", everybody could access it, independent of any Ranger HDFS policy on top of that ;)

The Apache Ranger plugin for HDFS provides user access auditing with the following fields:
IP, resource type, timestamp, access granted/denied.

Note that the Ranger plugin does not actually use HDFS ACLs. Ranger policies are added on top of the standard HDFS permissions and HDFS ACLs.
Be aware that any access rights granted at these lower levels cannot be taken away by Ranger.
Apart from that, Ranger gives you the same possibilities as ACLs, plus some more, such as granting access by client IP range.
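Because the underlying HDFS permissions always remain in force, the usual approach is to keep those permission bits restrictive and grant access through Ranger policies instead. A minimal sketch (the path, owner and group below are just examples, not anything from the question):

# Lock down the POSIX permission bits so that the Ranger policy, not a
# wide-open "777"-style permission, decides who gets access.
sudo -u hdfs hdfs dfs -chown -R hdfs:hadoop /data/secure
sudo -u hdfs hdfs dfs -chmod -R 750 /data/secure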

Related

Tag based policies in Apache Ranger not working

I am new to Apache Ranger and the big data field in general. I am working on an on-prem big data pipeline. I have configured resource-based policies in Apache Ranger (ver 2.2.0) using the Ranger Hive plugin (Hive ver 2.3.8) and they seem to be working fine. But I am having problems with tag-based policies and would like someone to tell me where I am going wrong. I have configured a tag-based policy in Ranger by doing the following:
1. Create a tag in Apache Atlas (e.g. TAG_C1) on a Hive column (column C1). For this I first installed Apache Atlas and the Atlas hook for Hive, then created the tag in Atlas. This seems to be working fine.
2. Install the Atlas plugin in Apache Ranger.
3. Install Ranger TagSync (but did not install Kafka).
4. The Atlas tag (TAG_C1) is visible in Apache Ranger when I create the tag-based masking policy in Ranger.
5. But the masking is not applied in Hive, which I access via Beeline.
Is Kafka required for tag-based policies in Apache Ranger? What am I doing wrong in these steps?
Kafka is important for TagSync, and for Atlas too. Kafka is what notifies Ranger TagSync about the tag assignments/changes made in Apache Atlas.
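Since Atlas publishes those notifications to Kafka, a quick sanity check is whether the Atlas notification topic exists and is receiving events. A minimal sketch, assuming a broker reachable at broker1:9092 (a placeholder) and a Kafka version that accepts --bootstrap-server:

# Confirm the Atlas notification topic exists; Ranger TagSync consumes
# tag events from the ATLAS_ENTITIES topic.
kafka-topics.sh --bootstrap-server broker1:9092 --list | grep ATLAS_ENTITIES

# Peek at a few notifications to verify Atlas is actually publishing
# entity/tag change events.
kafka-console-consumer.sh --bootstrap-server broker1:9092 \
  --topic ATLAS_ENTITIES --from-beginning --max-messages 5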

HiveAccessControlException Permission Denied: user does not have [ALL] privilege

I am very new to Hive and the Hadoop ecosystem.
I am trying to create a new table in Hive but I am encountering the error in the title (HiveAccessControlException Permission Denied: user does not have [ALL] privilege).
According to some suggestions, I have to set the Ranger policies, but upon checking, the policies already grant "All" permissions.
The same permissions were also given in the other policies.
Did I miss something? Thank you.
You might need an HDFS user directory, which can be created by the administrator using sudo -u hdfs hdfs dfs -mkdir /user/<user_id>
To check whether one already exists:
hdfs dfs -ls /user | grep <user_id>
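A slightly fuller sketch (assuming <user_id> is the user running the Hive queries; the ownership choice is just an example), since the directory usually also needs to be owned by that user so Hive can write its scratch/staging files there:

# Run as the HDFS superuser: create the home directory and hand it to the user
sudo -u hdfs hdfs dfs -mkdir -p /user/<user_id>
sudo -u hdfs hdfs dfs -chown <user_id> /user/<user_id>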
I had a similar issue. You may want to check Ranger > Audit > Plugin Status to see whether the policy is being applied. If it is not, you may have a jersey-client classpath conflict. More details here:
https://jonmorisissqlblog.blogspot.com/2021/06/ranger-hive-policy-activation-time.html

Apache Ranger multiple policy repo for Hive Plugin

I recently started working with Apache Ranger on HDP 2.2.6 and was trying to set up two active policy repos (Repo1 and Repo2) for the Ranger Hive plugin. But I found that only the policies from Repo1 were being enforced; the policies from Repo2 were not (even when all the policies in Repo1 were disabled).
Do I need to change some config property in Ranger to activate 2 or more repos at the same time?
Thanks!
The Hive plugin can work with only a single repo at a time. The ranger.plugin.hive.service.name property in ranger-hive-security.xml (located under the hive/hiveserver2 conf directory) corresponds to the repo/service name that the plugin uses to fetch policies from Ranger Admin.
What is the use case you are trying to address? Perhaps there is something in the existing plugin/policy design that can help you achieve it with a single service/repo.
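For reference, a quick way to confirm which service/repo name the plugin is currently bound to (the config path assumes a typical HDP-style layout and may differ on your cluster):

# Show the Ranger service name the Hive plugin fetches policies for
grep -A1 'ranger.plugin.hive.service.name' /etc/hive/conf/ranger-hive-security.xml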

How to force HDFS to use LDAP user's UID

I have a Cloudera cluster with HDFS and Hue services and I'm trying to unify authentication using LDAP.
I have my LDAP server running thanks to 389-ds (not sure if it is the best way) and I can log into Hue with users from the LDAP server. When I log in for the first time, Hue creates the home directory in HDFS.
But it is not using the UID I set when I added the user to the LDAP server.
It wouldn't be a problem if I only accessed HDFS via Hue, but I also have a machine with HDFS mounted via NFS.
I'm also having problems adding LDAP authentication on the machine with the NFS mount. I can do su username (username being a user in the LDAP server) and the system adds a home directory, but I cannot authenticate via SSH using LDAP users. I need this to avoid adding local users too.
My main question is: how can I force HDFS or Hue to use the same UID I set when I create LDAP users?
More details:
I have configured LDAP in Cloudera for both Hue and Hadoop (not sure if the latter is using it properly).
I know I could, perhaps, change the UID after the fact to the one set by Hue at the first login, but that is more a workaround than a clean solution.
In this example, the potato user has UID 10104, but if I do ls -la /users/potato on the NFS mount, it says that the folder belongs to a user with UID 3312528423.
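One thing worth checking: the HDFS NFS gateway maps file owners to UIDs via the OS account lookup on the gateway host, so if that host does not resolve the user through LDAP/SSSD you will see an unexpected UID on the mount. A small diagnostic sketch, using the username from the example and assuming Hue created the home directory under /user:

# On the NFS gateway host (and the NFS client), check what UID the OS
# resolves for the LDAP user; it should be the UID defined in LDAP (10104).
id potato

# Confirm passwd lookups go through LDAP/SSSD on that host
grep '^passwd' /etc/nsswitch.conf

# Compare with the owner HDFS itself reports for the home directory
hdfs dfs -ls /user | grep potato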

Hive add partition with ACL

I am facing some problems with Hive partition creation where the permissions the user has in HDFS are ACL-based.
1. I created a normal user in Linux.
2. I gave him permissions recursively on the directory referred to by an external table (e.g. hdfs dfs -setfacl -R -m default:user:newUserName:rwx /apps/dbname/tblname).
3. I checked that the permissions were applied recursively and that I can read and write to the directory even though I do not have any POSIX permissions on it, i.e. I only have the ACLs.
4. I logged in as newUserName, started Hive and ran an "alter table add partition" command, where the location for the partition is /apps/dbname/tablename/somefolder.
Hive responds with the error: Authorization failed: java.security.AccessControlException: action WRITE not permitted on path hdfs://sandbox.hortonworks.com:8020/apps/dbname/tblname for user newUserName. Use SHOW GRANT to get more details.
What am I missing here? dfs.namenode.acls.enabled is true. I thought that was all that was required for ACLs to work. I am using Hortonworks HDP 2.1.
Thanks
In the current version of Hive (0.13.x), support for HDFS ACLs (introduced in Apache Hadoop 2.4) has not yet been added.
Please see the JIRA below, which tracks this issue:
https://issues.apache.org/jira/browse/HIVE-7714
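Independently of the Hive-side fix, it can be worth confirming that the ACL entries (including the default ACL that new sub-directories inherit) really are present on the table path; a small check using the path from the question:

# Show the ACL entries, including default ACLs, on the table directory
hdfs dfs -getfacl /apps/dbname/tblname

# And recursively, to confirm they were applied down the tree
hdfs dfs -getfacl -R /apps/dbname/tblname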