I'm getting the following permission error, and am not sure why Hadoop is trying to write to this particular folder:
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000
Number of Maps = 2
Samples per Map = 100000
Wrote input for Map #0
Wrote input for Map #1
Starting Job
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=myuser, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Any idea why it is trying to write to the root of my hdfs?
Update: After temporarily setting the HDFS root (/) to 777 permissions, I saw that a "/tmp" folder was being written. I suppose one option is to just create a "/tmp" folder with open permissions for all to write to, but it would be nice from a security standpoint if this were instead written to the user folder (i.e. /user/myuser/tmp).
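For reference, the /tmp option from the update would look roughly like this (a sketch, run as the hdfs superuser; very old releases may ignore the sticky bit, in which case plain 777 is the fallback):
# Create a world-writable /tmp in HDFS, sticky bit set so users can't delete each other's files
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod 1777 /tmp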
I was able to get this working with the following setting:
<configuration>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
<!-- ... -->
</configuration>
A restart of the jobtracker service was required as well (special thanks to Jeff on the Hadoop mailing list for helping me track down the problem!).
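For reference, on a CDH-style MRv1 install the restart looks roughly like this (a sketch; the exact service name depends on your distribution and MapReduce version):
# Restart the JobTracker so the new staging setting is picked up
sudo service hadoop-0.20-mapreduce-jobtracker restart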
1) Create the {mapred.system.dir}/mapred directory in HDFS using the following command:
sudo -u hdfs hadoop fs -mkdir /hadoop/mapred/
2) Give ownership to the mapred user:
sudo -u hdfs hadoop fs -chown mapred:hadoop /hadoop/mapred/
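To confirm the ownership change took effect, a quick check (a sketch; the path matches the commands above):
sudo -u hdfs hadoop fs -ls /hadoop
# Should list /hadoop/mapred owned by mapred:hadoop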
You can also make a new user named "hdfs". It's quite a simple solution, but probably not as clean.
Of course, this applies when you are using Hue with Cloudera Hadoop Manager (CDH3).
You need to set the permissions on the Hadoop root directory (/) instead of on the system's root directory. I was confused too, but then realized that the directory mentioned was in Hadoop's file system, not the system's.
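In other words, these two commands touch completely different things, and only the first one is relevant to the error above (a sketch):
sudo -u hdfs hadoop fs -chmod 755 /   # HDFS root, the "/" the error message refers to
sudo chmod 755 /                      # local filesystem root; not involved here, leave it alone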
gsutil -m rm gs://{our_bucket}/{dir}/{subdir}/*
...
Removing gs://our_bucket/dir/subdir/staging-000000000102.json...
Removing gs://our_bucket/dir/subdir/staging-000000000101.json...
CommandException: 103 files/objects could not be removed.
The command is able to find the directory with the 103 .json files, and "tries" to remove them, as shown by the Removing gs://... lines in the output. Why might we be receiving CommandException: 103 files/objects could not be removed.?
This works on my local machine.
This works in our Docker container when run locally.
This does not work in our Docker container on the GCP compute engine, where we need it to be working.
Perhaps this is a permissions issue with the compute engine not having permission to remove files in our GCS?
Edit: We have a service account JSON in the /config folder of our Airflow project, and that service account is shared to an IAM user with Storage Admin permission. Perhaps having the JSON in the /config folder is not sufficient for assigning permissions to the entire GCP compute engine? I am particularly confused because this server is able to query from our BQ database, and WRITE to GCS, but cannot delete from GCS...
The solution in this link - https://gist.github.com/ryderdamen/926518ddddd46dd4c8c2e4ef5167243d was exactly what we needed:
Stop the instance
Edit the settings
Remove gsutil cache
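In our case those steps boiled down to roughly the following (a sketch; the instance name, zone, and the storage-full scope are assumptions, adjust them for your setup):
# Stop the instance so its access scopes can be changed
gcloud compute instances stop our-airflow-vm --zone us-central1-a
# Give the instance a scope that allows deleting objects in GCS
gcloud compute instances set-service-account our-airflow-vm \
    --zone us-central1-a --scopes storage-full
gcloud compute instances start our-airflow-vm --zone us-central1-a
# Then, on the VM itself: clear gsutil's cached credentials/state and retry the rm
rm -rf ~/.gsutil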
I am creating some databases using PostgreSQL, but I want to save them on an external hard drive due to a lack of storage space on my computer.
How can I do this?
You can store the database on another disk by specifying it as the data_directory setting. You need to specify this at startup and it will apply to all databases.
You can put it in postgresql.conf:
data_directory = '/volume/path/'
Or, specify it on the command line when you start PostgreSQL:
postgres -c data_directory='/volume/path/'
Reference: 18.2. File Locations
STEP 1: If PostgreSQL is running, stop it:
sudo systemctl stop postgresql
STEP 2: Get the path to access your hard drive.
(if Linux) Find and mount your hard drive by:
# Retrieve your device's name with:
sudo fdisk -l
# Then mount your device
sudo mount /dev/DEVICE_NAME YOUR_HD_DIR_PATH
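If you want the drive to be mounted automatically after a reboot, the usual way is an /etc/fstab entry (a sketch; the ext4 filesystem type is an assumption):
# Line to add to /etc/fstab
/dev/DEVICE_NAME  YOUR_HD_DIR_PATH  ext4  defaults  0  2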
STEP 3: Copy the existing database directory to the new location (on your hard drive) with rsync:
sudo rsync -av /var/lib/postgresql YOUR_HD_DIR_PATH
Then rename the previous Postgres main directory with a .bak extension to prevent conflicts:
sudo mv /var/lib/postgresql/11/main /var/lib/postgresql/11/main.bak
Note: my Postgres version was 11. Replace it in the paths with your version.
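rsync -a preserves ownership, but it is worth double-checking before pointing Postgres at the copy, since the server refuses to start on a data directory that is not owned by the postgres user with 0700 permissions (a sketch):
sudo chown -R postgres:postgres YOUR_HD_DIR_PATH/postgresql
sudo chmod 700 YOUR_HD_DIR_PATH/postgresql/11/main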
STEP 4: Edit postgres configuration file:
sudo nano /etc/postgresql/11/main/postgresql.conf
Change the data_directory line to:
data_directory = 'YOUR_HD_DIR_PATH/postgresql/11/main'
STEP 5: Restart Postgres and check that everything is working:
sudo systemctl start postgresql
pg_lsclusters
The output should show the status as 'online':
Ver Cluster Port Status Owner Data directory Log file
11 main 5432 online postgres YOUR_HD_DIR_PATH/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
Finally, you can access PostgreSQL with:
sudo -u postgres psql
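From there you can also confirm that the new location is actually in use (should print YOUR_HD_DIR_PATH/postgresql/11/main):
sudo -u postgres psql -c "SHOW data_directory;"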
You can try following the walkthrough here. This worked well for me and is similar to @Antiez's answer.
Currently, I am trying to do the same thing, and the only conflict I'm having at the moment is that there seems to be an issue with PostgreSQL's incremental backup and point-in-time recovery processes. I think it has something to do with folder permissions. If I try uploading a ~30MB CSV to the Postgres DB, it crashes and the server will not start again, because files cannot be written to the pg_wal directory. The only file in that directory is 000000010000000000000001, and it does not move on to 000000010000000000000002, etc. while writing to a new table.
My stackoverflow post looking for a solution to this issue can be found here.
I'm relatively new to Apache Oozie and did an installation on Ubuntu 14.04 with Hadoop 2.6.0 and JDK 1.8. I was able to install Oozie, and the web console is visible at port 11000 of my server.
Now, when I copied the examples bundled with Oozie and tried to run them, I ran into an error saying that no sharelib exists.
I installed the sharelib as below:
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
(my namenode is running on localhost 54310 and JT on localhost 54311)
hadoop fs -ls /user/hduser/share/lib shows that the shared library was created as per the oozie-site.xml file. However, when I check the shared library using the command
oozie admin -oozie http://localhost:11000/oozie -shareliblist
the list is blank, and jobs are also failing for the same reason.
Any clues on how I should approach this problem?
Thanks.
The sharelib create command looks fine.
If you haven't done so already, copy the core-site.xml from your Hadoop installation folder into $OOZIE_HOME/conf/hadoop-conf/.
There might already be a "placeholder" core-site.xml in the hadoop-conf folder; delete or rename that one. Oozie doesn't get its Hadoop configuration directly from your Hadoop install (like Hive does, for example), but from the core-site.xml you place in that hadoop-conf folder.
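A sketch of that copy step (the paths are assumptions; adjust them to where your Hadoop and Oozie actually live):
# Set aside the placeholder, then copy in the real config from the Hadoop install
mv $OOZIE_HOME/conf/hadoop-conf/core-site.xml $OOZIE_HOME/conf/hadoop-conf/core-site.xml.orig
cp $HADOOP_HOME/etc/hadoop/core-site.xml $OOZIE_HOME/conf/hadoop-conf/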
Okay, I got a solution for this.
When I was creating the sharelib directory, it was being created on HDFS, but while running the job a local path was being referred to. So I extracted the oozie-sharelib tar.gz file into my local /user/hduser/share/lib directory, and it's working now.
But I did not find the reason, so it's still an open question.
I encountered the same issue, and it turned out that Oozie was not able to communicate with HDFS, because it could not find the location of core-site.xml (or any other Hadoop configuration), which has to be declared inside oozie-site.xml.
The corresponding property in oozie-site.xml is oozie.service.HadoopAccessorService.hadoop.configurations.
This property was defined wrongly in my case. I changed it to point to where my Hadoop configuration XMLs are present, and then Oozie started communicating with HDFS and was able to locate the sharelib on HDFS.
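For reference, the property ends up looking something like this in oozie-site.xml (the /etc/hadoop/conf path is an assumption; point it at your own Hadoop configuration directory):
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/etc/hadoop/conf</value>
</property>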
When I tried to load local data files into a Hive table, it reported an error while moving the files. I found the link below, which gives suggestions to fix this issue. I followed those steps, but it still doesn't work.
http://answers.mapr.com/questions/3565/getting-started-with-hive-load-the-data-from-sample-table-txt-into-the-table-fails
After running mkdir /user/hive/tmp and setting hive.exec.scratchdir=/user/hive/tmp, it still reports RuntimeException Cannot make directory: file/user/hive/tmp/hive_2013*. How can I fix this issue? Anyone familiar with Hive, please help me. Thanks!
hive version is 0.10.0
hadoop version is 1.1.2
I suspect a permission issue here, because you are using the MapR distribution.
Make sure that the user trying to create the directory has permissions to create the directory on CLDB.
An easy way to debug here is to do:
$hadoop fs -chmod -R 777 /user/hive
and then try to load the data, to confirm whether it's a permission issue.
I installed Cloudera Hadoop 4 on a cluster of about 20 nodes. Using Cloudera Manager, the installation went really smoothly, but when I try to create an input directory using hadoop fs -mkdir input I get the following error: mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x. This looks like a classic wrong-permissions case, but I have no clue where to start fixing it.
I found this document, which I think would solve my problem if I knew what to do with it. For starters, I don't know whether I am using MapReduce v1 or v2 (I don't see any YARN service in my Cloudera Manager, so my guess would be v1(?)). Second, since the whole installation was automatic, I don't know what is installed and where.
Could anyone point me towards some easy steps to solve my problem? I'm really looking for the easiest solution here, I don't care at all about security since it is only a test. If I could give all users all possible permissions that would be fine.
I solved my problem: in Cloudera Manager, go to the HDFS configuration under Advanced and put the following code in the HDFS Service Configuration Safety Valve:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
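Note that the HDFS service has to be restarted from Cloudera Manager before the safety-valve change takes effect; after that, the original command should go through:
hadoop fs -mkdir input
hadoop fs -ls   # should now show the new input directory under the user's home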
Changing dfs.permissions is always a solution, but you can also try changing the user. On my system, write permission is assigned only to the 'hdfs' user. The user can be changed with the following command:
su hdfs
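An alternative along the same lines, if you would rather keep working as root, is to have the hdfs superuser create root's home directory in HDFS and hand over ownership (a sketch):
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root
# After this, "hadoop fs -mkdir input" works as root without touching dfs.permissions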
hdfs1 -> Configuration -> View and Edit -> Uncheck "Check HDFS Permissions". This worked, thanks Shehaz.
1. Do not modify dfs.permissions. Keep its value as true.
2. Add groups for a particular user if required (optional):
groupadd development
groupadd production
echo "Group production and development are created."
3. Create the user with an existing group and assign it an HDFS directory to use:
useradd -g development clouddev3
sudo -u hdfs hadoop fs -mkdir -p /user/clouddev3
sudo -u hdfs hadoop fs -chown -R clouddev3:development /user/clouddev3
echo "User clouddev3 created and owns /user/clouddev3 directory in hdfs"
Now log in as the clouddev3 user and try:
hdfs dfs -ls /user/clouddev3
or hdfs dfs -ls