setting umask for hive client - hive

How can I set the umask for a Hive HQL script, either via statements within the script or via a client-side configuration set before running it? I want to make the change on the client side without touching the server-side configuration.
I've found that this works from a shell prompt, but I'd like to do it from inside a Hive script.
$ hdfs dfs -Dfs.permissions.umask-mode=000 -mkdir /user/jeff/foo
$ hdfs dfs -Dfs.permissions.umask-mode=000 -put bar /user/jeff/foo
These attempts don't work:
hive> dfs -mkdir -Dfs.permissions.umask-mode=000 /user/jeff/foo;
-mkdir: Illegal option -Dfs.permissions.umask-mode=000
hive> dfs -Dfs.permissions.umask-mode=000 -mkdir /user/jeff/foo;
-Dfs.permissions.umask-mode=000: Unknown command
Setting hive.files.umask.value in .hiverc doesn't have the desired effect (the g+w and o+w bits aren't set, which is what I was trying to achieve with this umask):
hive> set hive.files.umask.value;
hive.files.umask.value=000
hive> dfs -mkdir /user/jeff/foo;
hive> dfs -ls -d /user/jeff/foo;
drwxr-xr-x - jeff hadoop 0 2016-02-23 15:19 /user/jeff/foo
It looks like I'll need to sprinkle a bunch of "dfs -chmod 777 ..." statements throughout my HQL script.
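For reference, the fallback I'm describing would add an explicit chmod after each directory or file creation, roughly like this:
dfs -mkdir /user/jeff/foo;
dfs -chmod 777 /user/jeff/foo;
dfs -put bar /user/jeff/foo;
dfs -chmod 777 /user/jeff/foo/bar;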
Ideas??

Related

impala shell command to export a parquet file as a csv

I have some Parquet files stored in HDFS that I want to convert to CSV files first and then export to a remote file over ssh.
I don't know whether this is possible or simple by writing a Spark job (I know that we can convert Parquet to CSV just by using spark.read.parquet and then calling spark.write on the same DataFrame as a CSV file), but I really wanted to do it using an impala-shell request.
So I thought about something like this:
hdfs dfs -cat my-file.parquet | ssh myserver.com 'cat > /path/to/my-file.csv'
Can you please help me with this request? Thank you!
Example without Kerberos:
impala-shell -i servername:portname -B -q 'select * from table' -o filename '--output_delimiter=\001'
I could explain it all, but it's late; here is a link that shows how to do this, including the header if you want it: http://beginnershadoop.com/2019/10/02/impala-export-to-csv/
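For example, a variant of the command above that also writes the header row might look like this (same placeholders as before):
impala-shell -i servername:portname -B --print_header -q 'select * from table' -o filename '--output_delimiter=\001'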
You can do that in several ways. One approach is shown in the example below.
With impala-shell you can run a query and pipe the output to ssh, which writes it to a file on a remote machine.
$ impala-shell --quiet --delimited --print_header --output_delimiter=',' -q 'USE fun; SELECT * FROM games' | ssh remoteuser@ip.address.of.remote.machine "cat > /home/..../query.csv"
This command switches from the default database to the fun database and runs a query on it.
You can change --output_delimiter (for example to '\t'), include or omit --print_header, and adjust other options.

Hive table loading: Unable to move source file

I am beginning to learn Big Data with Hadoop Hive.
I can't load local data into a Hive table.
The Hive command is:
load data local inpath '/usr/local/nhanvien/testHive.txt' into table nhanvien;
I get this error:
Loading data to table hivetest.nhanvien Failed with exception Unable
to move source file:/usr/local/nhanvien/testHive.txt to destination
hdfs://localhost:9000/user/hive/warehouse/hivetest.db/nhanvien/testHive_copy_3.txt
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
I tried:
hadoop fs -chmod g+w /user/hive/warehouse
sudo chmod -R 777 /home/abc/employeedetails
but I still get this error.
Can someone give me a solution?
You can try with:
export HADOOP_USER_NAME=hdfs
hive -e "load data local inpath '/usr/local/nhanvien/testHive.txt' into table nhanvien;"
It's a permissions issue. Try granting permissions on the local file and the directory where your file exists.
sudo chmod -R 777 /usr/local/nhanvien/testHive.txt
Then log in as $HDFS_USER and run the following commands:
hdfs dfs -chown -R $HIVE_USER:$HDFS_USER /user/hive
hdfs dfs -chmod -R 775 /user/hive
hdfs dfs -chmod -R 775 /user/hive/warehouse
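After that, you can confirm the permission bits with a quick listing, for example:
hdfs dfs -ls -d /user/hive/warehouse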
You can also configure hdfs-site.xml as follows:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
This configuration disables permission checking on HDFS, so a regular user can perform operations on HDFS.
Hope this helps.

Getting results of a pig script on a remote cluster

Is there a way to get the results of a pig script run on a remote cluster directly without STORE-ing them and retrieving them separately?
You can use Pig parameters to run your script. For example:
example.pig
A = LOAD '$PATH_TO_FOLDER_WITH_DATA' AS (f1:int, f2:int, f3:int);
-- Do something with your data to produce an output alias, e.g. B
STORE B INTO '$OUTPUT_PATH';
Then you can run the script like:
pig -p PATH_TO_FOLDER_WITH_DATA=/path/to/local/file -p OUTPUT_PATH=/path/to/the/output example.pig
To automate this in Bash:
storelocal.sh
#!/bin/bash
# $1 = path to the input data, $2 = local path for the merged output
PATH_TO_HDFS_OUT=/tmp/pig_output   # intermediate HDFS output directory; adjust as needed
pig -p PATH_TO_FOLDER_WITH_DATA="$1" -p OUTPUT_PATH="$PATH_TO_HDFS_OUT" example.pig
hdfs dfs -getmerge "$PATH_TO_HDFS_OUT" "$2"
Then you can run it: ./storelocal.sh /path/to/local/file /path/to/the/local/output

Need help setting up Apache Hadoop on Apache Mesos

I'm trying to set up Hadoop on Mesos using the document below:
https://docs.mesosphere.com/tutorials/run-hadoop-on-mesos/
I'm facing a problem at step 9:
sudo -u mapred ./hadoop-2.0.0-mr1-cdh4.2.1/bin/hadoop dfs -rm -f /hadoop-2.0.0-mr1-cdh4.2.1.tgz
sudo -u mapred /usr/bin/hadoop dfs -copyFromLocal ./hadoop-2.0.0-mr1-cdh4.2.1.tgz /
I am still new to this. I have to configure a Mesos cluster using this tutorial:
https://www.digitalocean.com/community/tutorials/how-to-configure-a-production-ready-mesosphere-cluster-on-ubuntu-14-04
Now I'm getting errors while performing dfs commands:
root@station1:~# sudo -u mapred ./hadoop-2.0.0-mr1-cdh4.2.1/bin/hadoop dfs -rm -f /hadoop-2.0.0-mr1-cdh4.2.1.tgz
-rm: Expected authority at index 7: hdfs://
Usage: hadoop fs [generic options] -rm [-f] [-r|-R] [-skipTrash] <src> ...
This tutorial assumes you have HDFS already installed on your cluster. You can do this by manually installing HDFS on each node, or you can try out the new HDFS framework: https://github.com/mesosphere/hdfs
Does hadoop fs -ls hdfs:// work on its own? If not, you'll need to install and configure HDFS appropriately.
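The "Expected authority at index 7: hdfs://" message usually means the client has no default filesystem configured, so the hdfs:// URI has no host. A minimal core-site.xml entry would look like the sketch below; the NameNode host and port are placeholders for your cluster:
<property>
<name>fs.defaultFS</name>
<value>hdfs://your-namenode-host:8020</value>
</property>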

Using beeline to compile ddl objects from .hql file

We have a couple of HQL files for compiling DDLs.
In Hive we used the following command from bash:
hive -v -f abc.hql
But in Beeline this doesn't work from bash. Any idea what the equivalent command for Beeline would be?
Make sure your HiveServer2 is up and running on some port.
In Beeline:
beeline -u "jdbc:hive2://localhost:port/database_name/" -f abc.hql
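If you also want the equivalent of hive's -v output, or need to pass credentials, Beeline accepts flags along these lines (10000 is the default HiveServer2 port; username and password are placeholders):
beeline -u "jdbc:hive2://localhost:10000/database_name" -n username -p password --verbose=true -f abc.hql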
Refer to this doc for more commands:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Refer to this doc if you have not yet configured HiveServer2:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2