What permissions are required to run the Hive CLI - hive

I'm seeing an issue with running the Hive CLI. When I run the CLI on an edge node I receive the following error regarding HDFS permissions:
c784gnj:~ # sudo hive
/usr/lib/hive/conf/hive-env.sh: line 5: /usr/lib/hive/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/lib/htrace-core4-4.0.1-incubating.jar,/usr/lib/hbase/lib/htrace-core-3.2.0-incubating.jar,/usr/lib/hbase/lib/htrace-core.jar,/usr/lib/hbase/hbase-hadoop2-compat.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-hadoop-compat.jar,/usr/lib/hbase/hbase-protocol.jar: No such file or directory
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/11 10:35:49 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.5.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=app1_K, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
What is Hive trying to write in the /user directory in HDFS?
I can already see that /user/hive is created:
drwxrwxr-t - hive hive 0 2015-03-16 22:17 /user/hive
As you can see, I am behind Kerberos auth on Hadoop.
Thanks in advance!

The log says you need to grant the user app1_K write permission on the HDFS /user directory.
Command:
hadoop fs -setfacl -m -R user:app1_K:rwx /user
Execute this command as a privileged user (e.g. hdfs) from the Hadoop bin directory.
If you get a similar permission error on any other HDFS directory, you have to grant permission on that directory as well.
Refer the below link for more information.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists
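Assuming ACLs are enabled on the NameNode (dfs.namenode.acls.enabled=true), you can verify the new entry with:
hadoop fs -getfacl /user
The output should now include a user:app1_K:rwx line alongside the base owner/group/other permissions.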

Instead of disabling HDFS access privileges altogether, as suggested by @Kumar, you might simply create an HDFS home dir for every new user on the system, so that Hive/Spark/Pig/Sqoop jobs have a valid location to create temp files...
On a Kerberized cluster:
kinit hdfs@MY.REALM
hdfs dfs -mkdir /user/app1_k
hdfs dfs -chown app1_k:app1_k /user/app1_k
Otherwise:
export HADOOP_USER_NAME=hdfs
hdfs dfs -mkdir /user/app1_k
hdfs dfs -chown app1_k:app1_k /user/app1_k
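Either way, you can confirm the new home directory and its ownership with:
hdfs dfs -ls /user
The listing should show /user/app1_k owned by app1_k.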

Related

How to fix an exception while running a spark-sql program locally on Windows 10 with Hive support enabled?

I am working with Spark SQL 2.3.1 and I am trying to enable Hive support while creating a session, as below:
.enableHiveSupport()
.config("spark.sql.warehouse.dir", "c://tmp//hive")
I ran the command below:
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod 777 C:\tmp\hive
While running my program, I get:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
How can I fix this issue and run it on my local Windows machine?
Try this command:
hadoop fs -chmod -R 777 /tmp/hive/
This is a Spark exception, not a Windows one. You need to set the correct permissions on the HDFS folder, not only on your local directory.
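Assuming the scratch dir really is on HDFS, you can verify the change with:
hadoop fs -ls /tmp
# the hive entry should now show drwxrwxrwx
If Spark is running purely locally with no HDFS, then /tmp/hive resolves to the local file system and the winutils.exe chmod from the question is the relevant tool instead.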

Hive table loading: Unable to move source file

I am beginning to learn Big Data with Hadoop and Hive.
I can't load local data into a Hive table.
The Hive command is:
load data local inpath '/usr/local/nhanvien/testHive.txt' into table nhanvien;
I get this error:
Loading data to table hivetest.nhanvien Failed with exception Unable
to move source file:/usr/local/nhanvien/testHive.txt to destination
hdfs://localhost:9000/user/hive/warehouse/hivetest.db/nhanvien/testHive_copy_3.txt
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
I tried:
hadoop fs -chmod g+w /user/hive/warehouse
sudo chmod -R 777 /home/abc/employeedetails
but I still get this error.
Can someone give me a solution?
You can try with:
export HADOOP_USER_NAME=hdfs
hive -e "load data local inpath '/usr/local/nhanvien/testHive.txt' into table nhanvien;"
It's a permission issue. Try granting permissions on the local file and on the directory where your file exists:
sudo chmod -R 777 /usr/local/nhanvien/testHive.txt
Then log in as $HDFS_USER and run the following commands:
hdfs dfs -chown -R $HIVE_USER:$HDFS_USER /user/hive
hdfs dfs -chmod -R 775 /user/hive
hdfs dfs -chmod -R 775 /user/hive/warehouse
You can also add the following to hdfs-site.xml:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
This setting disables permission checking on HDFS entirely, so any regular user can perform operations on HDFS; it is a blunt workaround, so prefer fixing ownership as above.
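Bear in mind that an hdfs-site.xml change only takes effect after HDFS is restarted; a minimal sketch, assuming the standard Hadoop sbin scripts:
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
Hope this helps.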

Hadoop Metastore Will Not Initialize

Preamble: I'm new to Hadoop / Hive. I have installed standalone Hadoop and now am trying to get Hive to work. I keep getting an error about initializing the metastore and cannot seem to figure out how to resolve it. (Hadoop 2.7.2 and Hive 2.0)
HADOOP_HOME and HIVE_HOME are set:
ubuntu15-laptop: ~ $>echo $HADOOP_HOME
/usr/hadoop/hadoop-2.7.2
ubuntu15-laptop: ~ $>echo $HIVE_HOME
/usr/hive
HDFS is working:
ubuntu15-laptop: ~ $>hadoop fs -ls /
Found 2 items
drwxrwxr-x - testuser supergroup 0 2016-04-13 21:37 /tmp
drwxrwxr-x - testuser supergroup 0 2016-04-13 21:38 /user
ubuntu15-laptop: ~ $>hadoop fs -ls /user
Found 1 items
drwxrwxr-x - testuser supergroup 0 2016-04-13 21:38 /user/hive
ubuntu15-laptop: ~ $>hadoop fs -ls /user/hive
Found 1 items
drwxrwxr-x - testuser supergroup 0 2016-04-13 21:38 /user/hive/warehouse
ubuntu15-laptop: ~ $>groups
testuser adm cdrom sudo dip plugdev lpadmin sambashare
Hive is not working; it says I need to initialize my metastore:
ubuntu15-laptop: ~ $>hive
Logging initialized using configuration in
jar:file:/usr/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Exception in thread "main" java.lang.RuntimeException: Hive metastore database
is not initialized. Please use schematool (e.g. ./schematool -initSchema
-dbType ...) to create the schema. If needed, don't forget to include the
option to auto-create the underlying database in your JDBC connection string
(e.g. ?createDatabaseIfNotExist=true for mysql)
So I try to initialize it using Postgres, but schematool tries to use Derby:
ubuntu15-laptop: ~ $>schematool -initSchema -dbType postgres
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.postgres.sql
Error: Syntax error: Encountered "statement_timeout" at line 1, column 5.
(state=42X01,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization
FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
So I change hive-site.xml to use the Postgres driver etc., but because I don't have the driver installed, it fails:
ubuntu15-laptop: ~ $>cp /usr/hive/conf/hive-site.xml.templ /usr/hive/conf/hive-site.xml
ubuntu15-laptop: ~ $>schematool -initSchema -dbType postgres
Metastore connection URL: jdbc:postgresql://localhost:5432/hivedb
Metastore Connection Driver : org.postgresql.Driver
Metastore connection User: 123456
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
*** schemaTool failed ***
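Note: the "Failed to load driver" error just means the PostgreSQL JDBC jar is not on Hive's classpath; copying the driver jar into Hive's lib directory would fix that leg (the jar name below is a placeholder for whichever version you download from https://jdbc.postgresql.org/):
# copy the PostgreSQL JDBC driver into Hive's lib directory so schematool can load it
cp postgresql-9.4.1208.jar /usr/hive/lib/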
So then I try to use Derby. First, I move hive-site.xml out of the way again so the default is Derby:
ubuntu15-laptop: ~ $>mv /usr/hive/conf/hive-site.xml /usr/hive/conf/hive-site.xml.templ
Then I try initializing again with Derby, but it appears to already be initialized, per the error "Error: FUNCTION 'NUCLEUS_ASCII' already exists":
ubuntu15-laptop: ~ $>schematool -initSchema -dbType derby
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization
FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
I've been at this for two days. Any help would be very much appreciated.
So..
Here's what happened.
After installing hive, the first thing I did was run hive, which attempted to create/initialize the metastore_db, but apparently didn't get it right. On that initial run, I got this error:
Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)
Running hive, even though it failed, created a metastore_db directory in the directory from which I ran hive:
ubuntu15-laptop: ~ $>ls -l |grep meta
drwxrwxr-x 5 testuser testuser 4096 Apr 14 12:44 metastore_db
So when I then tried running
ubuntu15-laptop: ~ $>schematool -initSchema -dbType derby
The metastore already existed, but not in complete form.
So, the answer is:
Before you run hive for the first time, run
schematool -initSchema -dbType derby
If you already ran hive and then tried to initSchema and it's failing:
mv metastore_db metastore_db.tmp
Re-run:
schematool -initSchema -dbType derby
Then run hive again.
Also of note: if you change directories, the metastore_db created above won't be found! I'm sure there's a good reason for this that I don't know yet, because I'm literally trying to use Hive for the first time today. Ahhh, here's more information on this: metastore_db created wherever I run Hive
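In practice this means the embedded Derby metastore lives in whatever directory you launch Hive from, so one simple workaround is to always initialize and run Hive from the same working directory (~/hive-work below is just a placeholder):
# initialize the schema and launch Hive from one fixed directory
mkdir -p ~/hive-work && cd ~/hive-work
schematool -initSchema -dbType derby
hive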

Cannot use apache flink in amazon emr

I cannot start a YARN session of Apache Flink on Amazon EMR. The error message I get is:
$ tar xvfj flink-0.9.0-bin-hadoop26.tgz
$ cd flink-0.9.0
$ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
...
Diagnostics: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
...
I am using Flink version 0.9 and Amazon EMR release 4.0.0. Any ideas or hints?
The full log can be found here: https://gist.github.com/headmyshoulder/48279f06c1850c62c28c
From the log:
The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values. The Flink YARN client needs to store its files in a distributed file system
Flink failed to read the Hadoop configuration files. They are either picked up from the environment variables, e.g. HADOOP_HOME, or you can set the configuration dir in the flink-conf.yaml before you execute your YARN command.
Flink needs to read the Hadoop configuration to know how to upload the Flink jar to the cluster file system such that the newly created YARN cluster can access it. If Flink fails to resolve the Hadoop configuration, it uses the local file system for uploading the jar. That means that the jar will be put on the machine you launch your cluster from. Thus, it won't be accessible from the Flink YARN cluster.
Please see the Flink configuration page for more information.
Edit: On Amazon EMR, export HADOOP_CONF_DIR=/etc/hadoop/conf lets Flink discover the Hadoop configuration directory.
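Putting it together, the yarn-session command from the question should then find the cluster configuration, e.g.:
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096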
If I were you, I would try this:
./bin/yarn-session.sh -n 1 -jm 768 -tm 768

How to use Apache hive with fully distributed cluster

I am using Hadoop 1.2.1 with 3 datanodes and one namenode. My HBase version is 0.94.14. I have configured Apache Hive 1.0 on the namenode machine.
I have to import HBase table data into Hive. When I run a query, it gives the following error in the log file:
ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - Cannot resolve the host name for /192.168.3.9 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '9.3.168.192.in-addr.arpa'
What is the problem with my setup? I followed this tutorial for the Hadoop installation.
In the Hadoop namenode log file, the following warning appears when I run a query in Hive:
WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Cannot roll edit log, edits.new files already exists in all healthy directories:
Does Hive need any information about how many datanodes Hadoop has?
Also, my HMaster is running on another machine, and I have configured Hive on the namenode machine.
Your Hadoop, ZooKeeper, HBase and Hive should all be running.
1) Copy these files to the Hadoop library directory (adjust the jar versions to match your installation):
sudo cp /usr/lib/hive/lib/hive-common-0.7.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar /usr/lib/hadoop/lib/
2) Stop HBase and Hadoop using the following commands:
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) Restart Hadoop and HBase using the following commands:
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh
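Once the integration jars are in place, you can expose an existing HBase table to Hive with the HBase storage handler. A minimal sketch, where the table name emp and the column mapping cf:name are placeholders for your own schema:
hive -e "CREATE EXTERNAL TABLE hbase_emp(rowkey STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name')
TBLPROPERTIES ('hbase.table.name' = 'emp');"
Queries against hbase_emp in Hive will then read directly from the HBase table.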