Airflow could not find beeline - hive

I have an Airflow DAG that uses beeline to execute queries. Everything was working well until the system had to reboot due to a blackout. After that, Airflow could no longer find beeline. The error says:
[Errno 2] No such file or directory: 'beeline': 'beeline'.
But beeline is installed on the same server where Airflow runs. What could be the reason? Can anybody help me with this?
I can execute the same beeline -u command outside Airflow on the command line, and it connects fine.
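In case it helps with diagnosis: after a reboot, Airflow is usually brought back up by a service manager (systemd, supervisord, cron @reboot, ...) with a much smaller PATH than an interactive login shell, so a bare beeline call raises exactly this [Errno 2] even though the binary is installed. A hedged sketch of how to check and work around this; the hive-client path below is only an example and must be replaced with whatever which beeline reports on your server:
# Locate beeline from the interactive shell where it already works
which beeline                                  # e.g. /usr/hdp/current/hive-client/bin/beeline (example path)
# Compare with the PATH the running Airflow scheduler actually sees
tr '\0' '\n' < /proc/$(pgrep -o -f "airflow scheduler")/environ | grep '^PATH='
# Possible fixes: export the beeline directory in the environment Airflow is started with
# (systemd unit, airflow user's profile, ...) and restart the scheduler/webserver,
# or call beeline by its absolute path in the DAG's bash command.
export PATH="$PATH:/usr/hdp/current/hive-client/bin"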

Related

SSH into Hadoop cluster using Paramiko, and then executing dependent commands

I am implementing a Python script that uses paramiko to connect to a Hadoop cluster. My problem is that I can only SSH in as the root user, and from there I have to switch to the hdfs user to execute my command.
Now I need to automate switching to the hdfs user, cd-ing into /tmp/, and then executing the command from there. I have tried invoke_shell(), but it hangs, and chaining commands with && inside exec_command doesn't work either.
I am getting a permission denied exception:
java.io.FileNotFoundException: file.txt (Permission denied)
There are two workflows that I have thought of:
1st one:
1. sudo -u hdfs -s
2. cd /tmp/
3. <execute the command> <outputDir>
2nd one:
sudo -u hdfs <execution command> /tmp/<outputDir>
The first one doesn't give the above error, but the second one throws it. I was trying the second approach just to avoid the dependent-command issue.
Any help or suggestions will be appreciated.
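One approach that avoids both invoke_shell() and chaining with && is to wrap the user switch, the cd and the job into a single command string and pass that one string to exec_command(). The Permission denied in the second workflow is most likely because the command runs from a directory the hdfs user cannot write to, which the embedded cd /tmp avoids. A sketch, keeping the placeholders from the question:
# Single command string: switch user, change directory and run the job in one shot.
# Pass this exact string to paramiko's exec_command().
sudo -u hdfs sh -c 'cd /tmp && <execution command> <outputDir>'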

How to set up beeline to use a variable in the connection string

Currently in our dev environment we have hardcoded the beeline connection string to something like:
beeline -u 'jdbc:hive2://zk0-hi-clu.3qy32mhqlj1ubaea5iyw5joamf.ax.internal.cloudapp.net:2181,zk1-hi-clu.3qy32mhqlj1ubaea5iyw5joamf.ax.internal.cloudapp.net:2181,zk6-hi-clu.3qy32mhqlj1ubaea5iyw5joamf.ax.internal.cloudapp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --hivevar date=20190101 -f test.hql
I am trying to see if there is a way to make the connection dynamic, e.g. by looking up a config file like odbc.ini, so that when we promote the code to another environment it automatically connects to the correct target. Is this possible?
Not exactly your case, but: I needed some defaults in my shell and used bash's alias functionality.
export BEELINE_CONNECTION_STRING='jdbc:hive2://myzookeeper1:2181,myzookeeper2:2181,myzookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;'
alias beeline='beeline -u ${BEELINE_CONNECTION_STRING}'
After this, typing beeline produces:
beeline
Connecting to jdbc:hive2://myzookeeper1:2181,myzookeeper2:2181,myzookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.0-292 by Apache Hive
0: jdbc:hive2://myhive>
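To make the target environment-driven rather than baked into each user's shell, the same idea can be moved into a small per-environment config file that the jobs source before calling beeline. A sketch, assuming a hypothetical file /etc/beeline/beeline.env that each environment ships with its own ZooKeeper quorum:
# /etc/beeline/beeline.env (hypothetical path; one copy per environment, only the hosts differ)
# export BEELINE_CONNECTION_STRING='jdbc:hive2://dev-zk1:2181,dev-zk2:2181,dev-zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
# wrapper used by the jobs; it picks up whatever the host's config defines
source /etc/beeline/beeline.env
beeline -u "${BEELINE_CONNECTION_STRING}" --hivevar date=20190101 -f test.hql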

How to fix an exception when running a spark-sql program locally on Windows 10 with Hive support enabled?

I am working with Spark SQL 2.3.1 and I am trying to enable Hive support while creating a session, as below:
.enableHiveSupport()
.config("spark.sql.warehouse.dir", "c://tmp//hive")
I ran the command below:
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod 777 C:\tmp\hive
While running my program I get:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
How do I fix this issue so the program runs on my local Windows machine?
Try to use this command:
hadoop fs -chmod -R 777 /tmp/hive/
This is a Spark exception, not a Windows one. You need to set the correct permissions for the HDFS folder, not only for your local directory.
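If the program runs purely against the local file system rather than a real HDFS, the same permission change may need to be applied with winutils instead; note the -R flag, which the command in the question did not use (paths as in the question, assuming the winutils build matches the Hadoop version):
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod -R 777 C:\tmp\hive
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe ls C:\tmp\hive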

How to run HiveServer2 in the background so it is not terminated when the terminal is closed

My HiveServer2 command runs properly in a terminal on an AWS Ubuntu instance:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10002 --hiveconf hive.root.logger=LOG,console
but when I close the terminal, my Hive server stops.
I want a command to solve this problem. Thanks.
At last, after a lot of searching, I found this command:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10002 > /dev/null 2>&1 &
After running this you will get a process id; save it somewhere, as it will be needed when you want to kill the same process later. I am doing it like this :)
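A commonly used variant that also survives closing the terminal and records the process id mentioned above (the log and pid file names are just examples):
nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10002 > hiveserver2.log 2>&1 &
echo $! > hiveserver2.pid          # later: kill $(cat hiveserver2.pid)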

Using beeline to compile DDL objects from a .hql file

We have a couple of .hql files for compiling DDLs.
In Hive we used the following command from bash:
hive -v -f abc.hql
But in beeline this doesn't work from bash. Any idea what the equivalent command for beeline would be?
Make sure your HiveServer2 is up and running on some port.
In beeline
beeline -u "jdbc:hive2://localhost:port/database_name/" -f abc.hql
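If you also rely on the hive CLI's -v behaviour, beeline has a --verbose flag that surfaces more detail about what is being executed; it is not an exact equivalent, but it is the closest analogue:
beeline -u "jdbc:hive2://localhost:port/database_name/" --verbose=true -f abc.hql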
Refer to this doc for more commands:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Refer to this doc if you have not yet configured HiveServer2:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2