Setting permissions for Cloudera Hadoop - permissions

I installed Cloudera Hadoop 4 on a cluster of about 20 nodes. Using Cloudera Manager it went really smoothly, but when I want to create an input directory using hadoop fs -mkdir input I get the following error:
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Looks like a classic wrong-permissions case, but I have no clue where to start to fix this.
I found this document, which I think would solve my problem if I knew what to do with it. For starters, I don't know whether I am using MapReduce v1 or v2 (I don't see any YARN service in my Cloudera Manager, so my guess would be v1?). Second, since the whole installation was automatic, I don't know what is installed and where.
Could anyone point me towards some easy steps to solve my problem? I'm really looking for the easiest solution here, I don't care at all about security since it is only a test. If I could give all users all possible permissions that would be fine.

I solved my problem: in Cloudera Manager, go to the HDFS configuration under Advanced and put the following snippet in the HDFS Service Configuration Safety Valve:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Changing dfs.permissions is always an option, but you can also try changing the user. On my system, write permission is assigned only to the 'hdfs' user. You can switch to that user with the following command:
su hdfs
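Alternatively, you can run individual commands as the hdfs superuser without switching shells and give your own user a writable home directory; a minimal sketch for the root user from the question (paths are just examples):
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root
hadoop fs -mkdir input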

In Cloudera Manager, go to hdfs1 -> Configuration -> View and Edit and uncheck "Check HDFS Permissions". This worked, thanks Shehaz.

1. Do not modify dfs.permissions. Keep its value as true.
2. Add groups for a particular user if required (optional):
groupadd development
groupadd production
echo "Group production and development are created."
3. Create a user with an existing group and assign an HDFS directory to it:
useradd -g development clouddev3
sudo -u hdfs hadoop fs -mkdir -p /user/clouddev3
sudo -u hdfs hadoop fs -chown -R clouddev3:development /user/clouddev3
echo "User clouddev3 created and owns /user/clouddev3 directory in hdfs"
4. Now log in as the clouddev3 user and try:
hdfs dfs -ls /user/clouddev3
or
hdfs dfs -ls

Related

Automatically mounting S3 using s3fs on ubuntu 16

I am having an issue getting my S3 bucket to automatically mount properly after a restart. I am running an AWS ECS c5d using Ubuntu 16.04. I am able to use s3fs to connect to my S3 drive manually using:
$s3fs -o uid=1000,umask=077,gid=1000 s3drive ~/localdata
Afterwards, when I go into the folder I can see and change my S3 files. But when I try to set it up to connect automatically I can't get it to work. I have tried adding the following to /etc/fstab:
s3drive /home/ubuntu/localdata fuse.s3fs _netdev,passwd_file=/home/ubuntu/.passwd-s3fs,uid=1000,umask=077,gid=1000 0 0
It processes, but when I go to the location and run ls -lah I see an odd entry for the permissions (and I am denied permission to cd into it):
d????????? ? ? ? ? ? localdata
I get the same result when I start fresh and try adding to /etc/fstab:
s3fs#s3drive /home/ubuntu/localdata fuse _netdev,passwd_file=/home/ubuntu/.passwd-s3fs,uid=1000,umask=077,gid=1000 0 0
Lastly, I tried adding to /etc/rc.local, just above the exit 0 row, either:
s3fs -o uid=1000,umask=077,gid=1000 s3drive ~/localdata
or
s3fs -o _netdev,uid=1000,umask=077,gid=1000 s3drive ~/localdata
When I reboot nothing seems to happen (i.e. no connection). But if I run it manually using:
$ sudo /etc/rc.local start
I get the same weird entry for my drive
d????????? ? ? ? ? ? localdata
Any ideas how to do this right? Or what the ? ? ? permissions mean? I really hope this isn't a duplicate, but I searched the existing answers and tried things for the whole afternoon.
Looks like a permission problem.
Verify that the AWS keys in ~/.passwd-s3fs are correct, that the file's mode is 600, and that the IAM user has the correct permissions on that bucket.
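For example, to check and tighten the permissions on the password file (path as used in the question):
ls -l ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs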
You probably need a higher version of s3fs:
https://github.com/s3fs-fuse/s3fs-fuse/issues/1018
Either upgrade your Ubuntu to 20.04,
or host a Docker container with Ubuntu 20.04 (or some other distro), map your local folder to a folder inside the container using volumes, and set up s3fs inside that container using fstab.
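A rough sketch of the container approach, assuming these image names and paths (FUSE inside a container needs extra capabilities, and propagating the mount back to the host needs additional mount-propagation options not shown here):
# start an Ubuntu 20.04 container that is allowed to use FUSE (image and paths are examples)
docker run -it --rm --cap-add SYS_ADMIN --device /dev/fuse \
  -v /home/ubuntu/.passwd-s3fs:/root/.passwd-s3fs:ro ubuntu:20.04 bash
# inside the container: install s3fs and mount the bucket
apt-get update && apt-get install -y s3fs
mkdir -p /mnt/s3drive
s3fs s3drive /mnt/s3drive -o passwd_file=/root/.passwd-s3fs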

Save database on external hard drive

I am creating some databases using PostgreSQL, but I want to save them on an external hard drive due to a lack of storage space on my computer.
How can I do this?
You can store the database on another disk by specifying it as the data_directory setting. You need to specify this at startup and it will apply to all databases.
You can put it in postgresql.conf:
data_directory = '/volume/path/'
Or, specify it on the command line when you start PostgreSQL:
postgres -c data_directory='/volume/path/'
Reference: 18.2. File Locations
STEP 1: If postgresql is running, stop it:
sudo systemctl stop postgresql
STEP 2: Get the path to access your hard drive.
(on Linux) Find and mount your hard drive:
# Retrieve your device's name with:
sudo fdisk -l
# Then mount your device
sudo mount /dev/DEVICE_NAME YOUR_HD_DIR_PATH
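If the drive should come back automatically after a reboot, you can also add it to /etc/fstab; a hedged example assuming an ext4 filesystem (adjust the device name, mount path, and filesystem type to your setup):
/dev/DEVICE_NAME  YOUR_HD_DIR_PATH  ext4  defaults  0  2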
STEP 3: Copy the existing database directory to the new location (in your hard drive) with rsync.
sudo rsync -av /var/lib/postgresql YOUR_HD_DIR_PATH
Then rename the previous Postgres main directory with a .bak extension to prevent conflicts:
sudo mv /var/lib/postgresql/11/main /var/lib/postgresql/11/main.bak
Note: my Postgres version was 11. Replace 11 in the path with your version.
STEP 4: Edit postgres configuration file:
sudo nano /etc/postgresql/11/main/postgresql.conf
Change the data_directory line to:
data_directory = 'YOUR_HD_DIR_PATH/postgresql/11/main'
STEP 5: Restart Postgres & Check everything is working
sudo systemctl start postgresql
pg_lsclusters
The output should show the status as 'online':
Ver Cluster Port Status Owner Data directory Log file
11 main 5432 online postgres YOUR_HD_DIR_PATH/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
Finally you can access PostgreSQL with:
sudo -u postgres psql
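Once connected, you can also confirm which data directory Postgres is actually using:
sudo -u postgres psql -c "SHOW data_directory;"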
You can try following the walkthrough here. This worked well for me and is similar to @Antiez's answer.
Currently, I am trying to do the same, and the only issue I'm having at the moment is that there seems to be a problem with PostgreSQL's incremental backup and point-in-time recovery processes. I think it has something to do with folder permissions. If I try uploading a ~30MB csv to the Postgres db, it will crash and the server will not start again because files cannot be written to the pg_wal directory. The only file in that directory is 000000010000000000000001 and it does not move on to 000000010000000000000002 etc. while writing to a new table.
My stackoverflow post looking for a solution to this issue can be found here.

Apache oozie sharedlib is showing a blank list

I am relatively new to Apache Oozie and did an installation on Ubuntu 14.04, Hadoop 2.6.0, JDK 1.8. I was able to install Oozie and the web console is visible at port 11000 of my server.
Now, when I copied the examples bundled with Oozie and tried to run them, I ran into an error which says no sharelib exists.
I installed the sharelib as below:
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
(my namenode is running on localhost 54310 and JT on localhost 54311)
hadoop fs -ls /user/hduser/share/lib shows the shared library created as per the oozie-site.xml file. However, when I check the shared library using the command
oozie admin -oozie http://localhost:11000/oozie -shareliblist
the list is blank, and jobs are failing for the same reason.
Any clues on how I should approach this problem?
Thanks.
The sharelib create command looks fine.
If you haven't done so already, copy the core-site.xml from your Hadoop installation folder into $OOZIE_HOME/conf/hadoop-conf/.
There might already be a "placeholder" core-site.xml in the hadoop-conf folder; delete or rename that one. Oozie doesn't get its Hadoop configuration directly from your Hadoop install (like Hive does, for example) but from the core-site.xml you place in that hadoop-conf folder.
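For example, assuming a typical Hadoop 2.x layout (paths are assumptions; adjust to your install):
cp $HADOOP_HOME/etc/hadoop/core-site.xml $OOZIE_HOME/conf/hadoop-conf/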
Okay, I got a solution for this.
When I was trying to create the sharelib directory, it was created on HDFS, but while running the job a local path was being referenced. So I extracted the Oozie sharelib tar.gz file into my local /user/hduser/share/lib directory and it's working now.
I did not figure out the reason, though, so it's still an open question.
I have encountered the same issue, and it turned out that Oozie was not able to communicate with HDFS, as it was not able to find the location of core-site.xml or any other Hadoop configuration, which has to be declared inside oozie-site.xml.
The corresponding property in oozie-site.xml is oozie.service.HadoopAccessorService.hadoop.configurations.
This property was defined wrongly in my case. I changed it to point to where my Hadoop configuration XMLs are present, and then Oozie started communicating with HDFS and was able to locate the sharelib on HDFS.
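For reference, the property looks roughly like this in oozie-site.xml (the /etc/hadoop/conf path is an assumption; point it at your own Hadoop configuration directory):
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/etc/hadoop/conf</value>
</property>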

load local data files into hive table failed when using hive

When I tried to load local data files into a Hive table, it reported an error while moving the files. I found the link below, which gives suggestions to fix this issue. I followed those steps, but it still doesn't work.
http://answers.mapr.com/questions/3565/getting-started-with-hive-load-the-data-from-sample-table-txt-into-the-table-fails
After mkdir /user/hive/tmp and setting hive.exec.scratchdir=/user/hive/tmp, it still reports RuntimeException Cannot make directory: file/user/hive/tmp/hive_2013*. How can I fix this issue? Anyone familiar with Hive, please help. Thanks!
Hive version is 0.10.0.
Hadoop version is 1.1.2.
I suspect a permission issue here, because you are using the MapR distribution.
Make sure that the user trying to create the directory has permission to create it on CLDB.
An easy way to debug is to do:
$hadoop fs -chmod -R 777 /user/hive
and then try to load the data, to confirm if it's permission issue.
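Also, the error path (file/user/hive/tmp/...) suggests Hive is resolving the scratch directory against the local file system rather than the cluster file system. If so, setting the scratch directory with a fully qualified URI in hive-site.xml may help; a hedged sketch, where the URI is a placeholder (on a vanilla HDFS setup it would be an hdfs:// address, on MapR typically maprfs:///):
<property>
  <name>hive.exec.scratchdir</name>
  <value>maprfs:///user/hive/tmp</value>
</property>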

Hadoop DFS permission issue when running job

I'm getting the following permission error, and I am not sure why Hadoop is trying to write to this particular folder:
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000
Number of Maps = 2
Samples per Map = 100000
Wrote input for Map #0
Wrote input for Map #1
Starting Job
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=myuser, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Any idea why it is trying to write to the root of my hdfs?
Update: After temporarily setting the HDFS root (/) to 777 permissions, I saw that a "/tmp" folder is being written. I suppose one option is to just create a "/tmp" folder with open permissions for all to write to, but it would be nice from a security standpoint if this were instead written to the user folder (i.e. /user/myuser/tmp).
I was able to get this working with the following setting:
<configuration>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <!-- ... -->
</configuration>
A restart of the jobtracker service is required as well (special thanks to Jeff on the Hadoop mailing list for helping me track down the problem!).
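For a plain Apache Hadoop 1.x install, restarting MapReduce looks roughly like this (managed distributions such as Cloudera Manager restart the service from their UI instead):
$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/start-mapred.sh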
1) Create the {mapred.system.dir}/mapred directory in HDFS using the following command:
sudo -u hdfs hadoop fs -mkdir /hadoop/mapred/
2) Give ownership to the mapred user:
sudo -u hdfs hadoop fs -chown mapred:hadoop /hadoop/mapred/
You can also make a new user named "hdfs". It is quite a simple solution, but probably not as clean.
Of course, this applies when you are using Hue with Cloudera Hadoop Manager (CDH3).
You need to set the permission for the Hadoop root directory (/) instead of setting the permission for the system's root directory. I was confused too, but then realized that the directory mentioned belongs to Hadoop's file system and not the local system's.
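A minimal sketch of what that looks like, including the open /tmp directory mentioned in the question above (run as the HDFS superuser; the 777/1777 modes are for a test cluster only and are assumptions, not requirements):
# loosen the HDFS root (not the local /) for testing only
sudo -u hdfs hadoop fs -chmod 777 /
# or, better, give jobs a writable /tmp in HDFS
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp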