How to use Apache Hive with a fully distributed cluster

I am using Hadoop 1.2.1 with 3 datanodes and one namenode. My HBase version is 0.94.14, and I have configured Apache Hive 1.0 on the namenode machine.
I have to import HBase table data into Hive. When I run a query, it gives the following error in the log file:
ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - Cannot resolve the host name for /192.168.3.9 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '9.3.168.192.in-addr.arpa'
What is the problem in my setup? I followed this tutorial for the Hadoop installation.
The following warning appears in the Hadoop namenode log file when I run a query in Hive:
WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Cannot roll edit log, edits.new files already exists in all healthy directories:
Does Hive need any information about how many datanodes Hadoop has?
Also, my HMaster is running on another machine, while Hive is configured on the namenode machine.

Your Hadoop, ZooKeeper, HBase and Hive should all be up and running.
1) COPY THESE FILES TO THE HADOOP LIBRARY.
sudo cp /usr/lib/hive/lib/hive-common-0.7.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar /usr/lib/hadoop/lib/
2) STOP HBASE AND HADOOP USING THE FOLLOWING COMMANDS
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) RESTART HBASE AND HADOOP USING THE FOLLOWING COMMANDS
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh
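Once everything is back up, the HBase table can usually be exposed to Hive through the HBase storage handler. The sketch below is only an illustration: the ZooKeeper hosts (zk1,zk2,zk3), the HBase table my_hbase_table, the column family cf1 and the Hive table name are placeholders, and the jar versions you copy in step 1 should match your own Hive/HBase install.
# Sketch: map an existing HBase table into Hive via the HBase storage handler.
# Replace the ZooKeeper quorum, table names and column mapping with your own.
hive --hiveconf hbase.zookeeper.quorum=zk1,zk2,zk3 -e "
CREATE EXTERNAL TABLE hbase_in_hive (rowkey STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:val')
TBLPROPERTIES ('hbase.table.name' = 'my_hbase_table');"
# A simple query then reads the HBase data through Hive:
hive --hiveconf hbase.zookeeper.quorum=zk1,zk2,zk3 -e "SELECT COUNT(*) FROM hbase_in_hive;"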

Related

Where is s3-dist-cp in EMR 6.2.0?

I created an EMR Spark cluster with the following configuration:
Then I SSHed into the master node, typed the command s3-dist-cp, and got the following error:
s3-dist-cp: command not found
I searched the whole disk but found nothing:
sudo find / -name "*s3-dist-cp*"
Where is the s3-dist-cp command? Thanks!
It turns out I must also select "Hadoop" in the cluster's application list when creating the cluster.
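If you create the cluster from the AWS CLI instead of the console, the same choice is made through the --applications flag; s3-dist-cp is installed as part of the Hadoop application. A sketch (release label, instance type and count are placeholders):
# Sketch: include Hadoop in --applications so s3-dist-cp is installed on the
# nodes; adjust release label, instance type/count and roles to your setup.
aws emr create-cluster \
  --release-label emr-6.2.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles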

After modifying the config files in /etc/presto/conf, how to restart presto-server

On AWS EMR, after modifying the config file in /etc/presto/conf, how can we restart presto-server? Just on the master node, or on all nodes?
On EMR you can restart Presto with
sudo stop presto
sudo start presto
You should do this on every node where you modified the config file. You should also update the config file on every node, as appropriate.
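If there are many nodes, a small loop saves time. A sketch, assuming a hypothetical hosts.txt with one worker hostname per line and passwordless SSH from the master:
# Sketch: restart Presto on every node listed in hosts.txt (hypothetical file,
# one hostname per line); assumes passwordless SSH from the master node.
while read -r host; do
  ssh "$host" 'sudo stop presto && sudo start presto'
done < hosts.txt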

What permissions are required to run the Hive CLI

I'm seeing an issue with running the Hive CLI. When I run the CLI on an edge node I receive the following error regarding HDFS permissions:
c784gnj:~ # sudo hive
/usr/lib/hive/conf/hive-env.sh: line 5: /usr/lib/hive/lib/hive-hbase-handler-1.1.0-cdh5.5.2.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/lib/htrace-core4-4.0.1-incubating.jar,/usr/lib/hbase/lib/htrace-core-3.2.0-incubating.jar,/usr/lib/hbase/lib/htrace-core.jar,/usr/lib/hbase/hbase-hadoop2-compat.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-hadoop-compat.jar,/usr/lib/hbase/hbase-protocol.jar: No such file or directory
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/11 10:35:49 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.5.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=app1_K, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
What is Hive trying to write to in the /user directory in HDFS?
I can already see that /user/hive is created:
drwxrwxr-t - hive hive 0 2015-03-16 22:17 /user/hive
As you can see, I am behind Kerberos auth on Hadoop.
Thanks in advance!
The log says you need to grant the user app1_K write permission on the HDFS /user directory.
Command:
hadoop fs -setfacl -m -R user:app1_K:rwx /user
Execute this command as a privileged user from the Hadoop bin directory.
If you get a similar permission error on any other HDFS directory, you have to grant permission on that directory as well.
Refer to the link below for more information.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists
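To confirm the new entry took effect, you can list the ACLs on the directory:
# Check that a user:app1_K:rwx entry now shows up on /user.
hdfs dfs -getfacl /user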
Instead of disabling HDFS access privileges altogether, as suggested by @Kumar, you might simply create an HDFS home dir for every new user on the system, so that Hive/Spark/Pig/Sqoop jobs have a valid location to create temp files...
On a Kerberized cluster:
kinit hdfs@MY.REALM
hdfs dfs -mkdir /user/app1_k
hdfs dfs -chown app1_k:app1_k /user/app1_k
Otherwise:
export HADOOP_USER_NAME=hdfs
hdfs dfs -mkdir /user/app1_k
hdfs dfs -chown app1_k:app1_k /user/app1_k
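A quick write test then confirms the new home directory is usable (the test file name is arbitrary, and HADOOP_USER_NAME only works on a non-Kerberized cluster):
# Sketch: impersonate the user on a non-Kerberized cluster, write a zero-byte
# test file into the new home directory, then clean it up.
export HADOOP_USER_NAME=app1_k
hdfs dfs -touchz /user/app1_k/_write_test
hdfs dfs -rm /user/app1_k/_write_test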

Cannot use Apache Flink in Amazon EMR

I cannot start a YARN session of Apache Flink on Amazon EMR. The error message I get is:
$ tar xvfj flink-0.9.0-bin-hadoop26.tgz
$ cd flink-0.9.0
$ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
...
Diagnostics: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
...
I am using Flink version 0.9 and Amazon's Hadoop version 4.0.0. Any ideas or hints?
The full log can be found here: https://gist.github.com/headmyshoulder/48279f06c1850c62c28c
From the log:
The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values. The Flink YARN client needs to store its files in a distributed file system.
Flink failed to read the Hadoop configuration files. They are either picked up from the environment variables, e.g. HADOOP_HOME, or you can set the configuration dir in the flink-conf.yaml before you execute your YARN command.
Flink needs to read the Hadoop configuration to know how to upload the Flink jar to the cluster file system such that the newly created YARN cluster can access it. If Flink fails to resolve the Hadoop configuration, it uses the local file system for uploading the jar. That means that the jar will be put on the machine you launch your cluster from. Thus, it won't be accessible from the Flink YARN cluster.
Please see the Flink configuration page for more information.
Edit: On Amazon EMR, export HADOOP_CONF_DIR=/etc/hadoop/conf lets Flink discover the Hadoop configuration directory.
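Putting that together with the original command, the launch looks roughly like this (a sketch; the configuration path is the EMR default mentioned above):
# Sketch: point Flink at EMR's Hadoop configuration before starting the YARN
# session, so the Flink jar is uploaded to HDFS rather than the local file system.
export HADOOP_CONF_DIR=/etc/hadoop/conf
cd flink-0.9.0
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096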
If I were you, I would try this:
./bin/yarn-session.sh -n 1 -jm 768 -tm 768

Hadoop + Hive - HCatalog won't start up

I just installed a single-node Hadoop 2.2.0 cluster running on Ubuntu.
I tried a couple of basic example calculations and they work fine.
I then tried to set up Hive 0.12.0, which includes HCatalog.
I actually followed this tutorial.
When I try to start HCatalog, I always get the following error:
bash $HIVE_HOME/hcatalog/sbin/hcat_server.sh start
dirname: missing operand
Try `dirname --help' for more information.
Started metastore server init, testing if initialized correctly...
/usr/local/hive/hcatalog/sbin/hcat_server.sh: line 91: /usr/local/hive-0.12.0/hcatalog/sbin/../var/log/hcat.out: No such file or directory
Metastore startup failed, see /usr/local/hive-0.12.0/hcatalog/sbin/../var/log/hcat.err
But there's no hcat.err file at all, so I'm stuck right now.
Any help would be much appreciated!
Thanks in advance,
Guillaume
I worked out that hcat was not executable in the Hive installation I downloaded.
So just run sudo chmod a+x on hcat and it works.
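Spelled out, the fix looks roughly like this (a sketch; the paths are taken from the error output above, so adjust them to your install):
# Sketch: the error complains about .../var/log/hcat.out, so create the log
# directory and make the hcat scripts executable; paths assume the
# /usr/local/hive-0.12.0 layout shown in the error message.
sudo mkdir -p /usr/local/hive-0.12.0/hcatalog/var/log
sudo chmod a+x /usr/local/hive-0.12.0/hcatalog/bin/hcat
sudo chmod a+x /usr/local/hive-0.12.0/hcatalog/sbin/hcat_server.sh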