I am facing some problems with Hive partition creation where the permissions the user has in HDFS are ACL based.
1. I created a normal user in Linux.
2. I gave him permissions recursively on a directory that is referenced by an external table.
(e.g. hdfs dfs -setfacl -R -m default:user:newUserName:rwx /apps/dbname/tblname)
I checked that the permissions were applied recursively, and I can read and write to the
directory even though I do not have any POSIX permissions on it, i.e. I only have the ACLs.
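For reference, the applied ACLs can be double-checked with getfacl (just a sanity check against the same path as above):
hdfs dfs -getfacl -R /apps/dbname/tblname
# the output should list entries such as default:user:newUserName:rwx on each sub-directory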
I logged in as newUserName, started Hive, and ran an "alter table add partition" command, where the location for the partition is /apps/dbname/tablename/somefolder
Hive responds with the error: Authorization failed: java.security.AccessControlException: action WRITE not permitted on path hdfs://sandbox.hortonworks.com:8020/apps/dbname/tblname for user newUserName. Use SHOW GRANT to get more details.
What am I missing here? dfs.namenode.acls.enabled is true, and I thought that was all that was required for ACLs to work. I am using Hortonworks HDP 2.1.
Thanks
In the current version of Hive (0.13.x), support for HDFS ACLs (introduced in Apache Hadoop 2.4) has not yet been added.
Please see the JIRA below, which addresses this issue:
https://issues.apache.org/jira/browse/HIVE-7714
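Until that is fixed, one possible workaround is to fall back on plain POSIX permissions, which Hive 0.13 does understand. This is a sketch only, assuming you can act as the hdfs superuser and that ownership of hive:hadoop with newUserName in the hadoop group is acceptable (those names are assumptions, not part of the original setup):
sudo -u hdfs hdfs dfs -chown -R hive:hadoop /apps/dbname/tblname
sudo -u hdfs hdfs dfs -chmod -R 775 /apps/dbname/tblname
# newUserName must then belong to the hadoop group as seen by the NameNode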
Related
I am very new to Hive and the Hadoop ecosystem.
I am trying to create a new table in Hive, but I am encountering this error:
According to some suggestions, I have to set the Ranger policies, but upon checking, the policies already had permissions set to "All".
The same permissions were also given to the other policies.
Did I miss something? Thank you.
You might need an HDFS user directory, which can be created by the administrator using sudo -u hdfs hdfs dfs -mkdir /user/<user_id>
In case you want to check whether one already exists:
hdfs dfs -ls /user | grep <user_id>
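If it does not exist, a minimal sketch of creating it and handing it over to the user (run by the administrator; <user_id> is a placeholder as above):
sudo -u hdfs hdfs dfs -mkdir -p /user/<user_id>
sudo -u hdfs hdfs dfs -chown <user_id> /user/<user_id>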
I had a similar issue. You may want to check Ranger > Audit > Plugin Status to see if the policy is being applied. If it's not, it may be that you have a jersey-client classpath conflict. More details here:
https://jonmorisissqlblog.blogspot.com/2021/06/ranger-hive-policy-activation-time.html
When I connect to a database with beeline, I don't need to enter a user and password; I just press Enter and get access to the database.
But when I want to write to the database, I get a permission denied error:
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access=READ, inode="/apps/hive/warehouse
I would like to know which default user I should use to connect to my database in order to add policies to it.
I thought it was hive, but it seems it's not.
Beeline takes a user account when you connect:
beeline -u 'url' -n username
Use hdfs dfs -ls to find table permissions (assuming you're not using Sentry or Ranger to manage permissions).
In any case, you don't use beeline to add permissions. By default, it's simply HDFS user/group permissions using chmod / chown, assuming you have ACLs enabled.
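For example (a sketch; the JDBC URL and user name are placeholders you would replace with your own):
beeline -u 'jdbc:hive2://localhost:10000/default' -n your_user
# then inspect ownership and permissions of the warehouse directories
hdfs dfs -ls /apps/hive/warehouse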
Try one of the two options below.
Log in with the hadoop user.
Give full permissions to the /apps/hive/warehouse folder.
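A sketch of the second option, assuming you can act as the HDFS superuser (wide-open 777 permissions are really only appropriate on a sandbox):
sudo -u hdfs hdfs dfs -chmod -R 777 /apps/hive/warehouse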
I am planning to use Apache Ranger for authorization of my HDFS file system. I have a question about the capability of the Apache Ranger plugin: does the HDFS plugin for Apache Ranger offer more security features than just managing HDFS ACLs? From the limited understanding I gathered by looking at the presentations/blogs, I am unable to comprehend the functions of the HDFS plugin for Apache Ranger.
...and now, with the latest version of Apache Ranger, it is possible to define "deny" rules.
Previously it was only possible to define rules that specify additional "allow" privileges on top of the underlying HDFS permissions. Hence, if you had the HDFS permissions for a directory set to "777", everybody could access it, independent of any Ranger HDFS policy on top of that ;)
The Apache Ranger plugin for HDFS provides user access auditing with the following fields:
IP, Resource type, Timestamp, Access granted/denied.
Note that the Ranger plugin does not actually use HDFS ACLs. Ranger policies are added on top of standard HDFS permissions and HDFS ACLs.
You need to be aware that any access rights that are granted on these lower levels cannot be taken away by Ranger anymore.
Apart from that, Ranger gives you the same possibilities as ACLs, plus some more, like granting access by client IP range.
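To illustrate that point, the usual pattern is to keep the underlying HDFS permissions restrictive so that the Ranger policies are the effective gatekeeper. A sketch, where /data/finance is a hypothetical directory and hdfs:hdfs a conventional owner:
hdfs dfs -chown -R hdfs:hdfs /data/finance
hdfs dfs -chmod -R 700 /data/finance   # only the superuser and Ranger-granted users get access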
I have a Cloudera cluster with HDFS and Hue services, and I'm trying to unify authentication using LDAP.
I have my LDAP server running thanks to 389-ds (not sure if it is the best way), and I can log into Hue with users from the LDAP server. When I log in for the first time, Hue creates the home directory in HDFS.
But it is not using the UID I set when I added the user to the LDAP server.
It wouldn't be a problem if I only accessed HDFS via Hue, but I also have a machine with HDFS mounted via NFS.
I'm also having problems adding LDAP authentication on the machine with the NFS mount. I can do su username (username being a user on the LDAP server) and the system creates a home directory, but I cannot authenticate via SSH using LDAP users. I need this to avoid adding local users too.
My main question is: how do I force HDFS or Hue to use the same UID I set when I create LDAP users?
More details:
I have configured LDAP in Cloudera for both Hue and Hadoop (not sure if the latter is using it properly).
I know I could, maybe, change the UID a posteriori to the one set by Hue at the first login, but that is more a workaround than a clean solution.
For example, the potato user has UID 10104, but if I do ls -la /users/potato in the NFS mount, it says that the folder belongs to a user with UID 3312528423.
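One way to narrow this down (standard lookups only; potato is the example user above) is to compare what the client OS resolves with what the NFS mount reports:
id potato                 # UID as resolved on the client, should come from LDAP
getent passwd potato      # shows which name service supplied the entry
ls -lan /users/potato     # numeric owner as seen through the NFS mount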
I have a Hadoop cluster set up and working under a common default username "user1". I want to put files into Hadoop from a remote machine which is not part of the Hadoop cluster. I configured the Hadoop client files on the remote machine in such a way that when
hadoop dfs -put file1 ...
is called from the remote machine, it puts file1 on the Hadoop cluster.
The only problem is that I am logged in as "user2" on the remote machine, and that doesn't give me the result I expect. In fact, the above command can only be executed on the remote machine as:
hadoop dfs -put file1 /user/user2/testFolder
However, what I really want is to be able to store the file as:
hadoop dfs -put file1 /user/user1/testFolder
If I try to run the last command, Hadoop throws an error because of access permissions. Is there any way I can specify the username within the hadoop dfs command?
I am looking for something like:
hadoop dfs -username user1 file1 /user/user1/testFolder
If you use the HADOOP_USER_NAME env variable, you can tell HDFS which user name to operate as. Note that this only works if your cluster isn't using security features (e.g. Kerberos). For example:
HADOOP_USER_NAME=hdfs hadoop dfs -put ...
This may not matter to anybody, but I am using a small hack for this.
I'm exporting HADOOP_USER_NAME in .bash_profile, so that every time I log in, the user is set.
Just add the following line to .bash_profile:
export HADOOP_USER_NAME=<your hdfs user>
By default, authentication and authorization are turned off in Hadoop. According to Hadoop: The Definitive Guide (a nice book, by the way; I'd recommend buying it):
The user identity that Hadoop uses for permissions in HDFS is determined by running
the whoami command on the client system. Similarly, the group names are derived from
the output of running groups.
So you can create a new whoami command that returns the required username and put it in the PATH appropriately, so that your whoami is found before the actual whoami that comes with Linux. Similarly, you can play with the groups command.
This is a hack and won't work once authentication and authorization have been turned on.
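A minimal sketch of such a shim, assuming the desired username is user1 and that ~/bin sits ahead of /usr/bin on your PATH (both assumptions for illustration):
mkdir -p ~/bin
cat > ~/bin/whoami <<'EOF'
#!/bin/sh
# shim that always reports user1 to the Hadoop client
echo user1
EOF
chmod +x ~/bin/whoami
export PATH="$HOME/bin:$PATH"   # make the shim shadow the real whoami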
Shell/Command way:
Set the HADOOP_USER_NAME variable and execute the hdfs commands:
export HADOOP_USER_NAME=manjunath
hdfs dfs -put <source> <destination>
Pythonic way:
import os
os.environ["HADOOP_USER_NAME"] = "manjunath"
There's another post with something similar to this that could provide a workaround for you, using streaming via ssh:
cat file.txt | ssh user1@clusternode "hadoop fs -put - /path/in/hdfs/file.txt"
See putting a remote file into hadoop without copying it to local disk for more information