Dump data from a spark-sql CLI query to a CSV file

How do I write a query result to a CSV file when using the spark-sql CLI? Below is the spark-sql command I am using:
spark-sql --conf "spark.driver.extraJavaOptions=${log4j_setting}" --conf "spark.executor.extraJavaOptions=${log4j_setting}" --files log4j.properties -e "SELECT * from table;"
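One common approach (a sketch, not an answer confirmed in this thread): spark-sql -e prints the result rows to stdout, tab-delimited by default, so you can translate the tabs to commas and redirect into a file. Note that this does not quote fields that themselves contain commas, and the output file name result.csv is a placeholder.

# Run the query non-interactively, convert tab delimiters to commas,
# and redirect the rows into a local CSV file.
spark-sql --conf "spark.driver.extraJavaOptions=${log4j_setting}" \
  --conf "spark.executor.extraJavaOptions=${log4j_setting}" \
  --files log4j.properties \
  -e "SELECT * from table;" | tr '\t' ',' > result.csv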

Related

impala shell command to export a parquet file as a csv

I have some Parquet files stored in HDFS that I want to convert to CSV files first and then export to a remote machine over ssh.
I don't know whether this is possible or simple by writing a Spark job (I know that we can convert Parquet to CSV just by using spark.read.parquet and then writing the same DataFrame out as CSV). But I really wanted to do it with an impala-shell request.
So I thought about something like this:
hdfs dfs -cat my-file.parquet | ssh myserver.com 'cat > /path/to/my-file.csv'
Can you help me with this request?
Thank you!
Example without Kerberos:
impala-shell -i servername:portname -B -q 'select * from table' -o filename '--output_delimiter=\001'
I could explain it all, but it is late; here is a link that shows how to do this, including the header if you want it: http://beginnershadoop.com/2019/10/02/impala-export-to-csv/
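For a local comma-separated file with a header row, a minimal sketch along the same lines (the server, port, table, and output file names are placeholders):

# -B prints plain delimited rows, --print_header adds the column names,
# and -o writes the result to a local file instead of stdout.
impala-shell -i servername:portname -B --print_header --output_delimiter=',' \
  -q 'SELECT * FROM table' -o table.csv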
You can do that in multiple ways.
One approach could be as in the example below.
With impala-shell you can run a query and pipe the output to ssh to write it on a remote machine.
$ impala-shell --quiet --delimited --print_header --output_delimiter=',' -q 'USE fun; SELECT * FROM games' | ssh remoteuser@ip.address.of.remote.machine "cat > /home/..../query.csv"
This command switches from the default database to the fun database and runs a query against it.
You can change --output_delimiter (e.g. to '\t'), include or omit --print_header, and adjust other options.

How to run hive commands from shell?

I have to repair tables in Hive from my shell script after successful completion of my Spark application.
msck repair table <DATABASE_NAME>.<TABLE_NAME>;
Please suggest a suitable approach for this that also works for large tables with partitions.
I found a workaround for this using:
hive -S -e "msck repair table <DATABASE_NAME>.<TABLE_NAME>;"
-S: Silences the output generated by Hive.
-e: Runs the given Hive command/query string.
-f: Runs the queries in the given HQL script file (sketched below).
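A minimal sketch of the -f variant, assuming the statement is first written to a script file (the file name repair_tables.hql is a placeholder):

# Write the repair statement to a script file, then run it silently.
echo "msck repair table <DATABASE_NAME>.<TABLE_NAME>;" > repair_tables.hql
hive -S -f repair_tables.hql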

I am unable to execute a hive script

I want to know the command to execute a Hive script.
Complete the code to execute the Hive script ./custexport.hql:
hive>
If you are using the hive CLI:
hive -f 'your hql file'
If you are using beeline, you can also use the -f option with the complete beeline command (see the sketch below).
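A minimal sketch of the beeline equivalent (the JDBC URL, host, port, and database are placeholders; 10000 is the default HiveServer2 port):

# Connect to HiveServer2 over JDBC and run the script file.
beeline -u "jdbc:hive2://localhost:10000/default" -f ./custexport.hql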

Executing a SQL file in interactive impala-shell session

In an interactive impala-shell session, is there a way to load and execute a text file containing one or more SQL statements? In Hive's beeline, for example, you can use !run <filename> to run the SQL commands in that file.
This is not currently possible. You can file a JIRA.
I believe it is possible - see impala-shell -h (version v2.1.1-cdh5):
-f QUERY_FILE, --query_file=QUERY_FILE
Execute the queries in the query file, delimited by ;
[default: none]
Combine this with the shell command in interactive mode:
shell impala-shell -f file;

Using beeline to compile ddl objects from .hql file

We have a couple of HQL files for compiling DDLs.
In Hive we used the following command from bash:
hive -v -f abc.hql
But in beeline this doesn't work from bash. Any idea what the equivalent beeline command would be?
Make sure your HiveServer2 is up and running on its port.
In beeline
beeline -u "jdbc:hive2://localhost:port/database_name/" -f abc.hql
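If you also want the -v (verbose) behavior of the hive CLI, beeline has a --verbose option; a sketch (the URL, port, and credentials are placeholders, and whether -n/-p are needed depends on your authentication setup):

# Run the DDL script verbosely, passing a username and password.
beeline -u "jdbc:hive2://localhost:10000/database_name" -n username -p password --verbose=true -f abc.hql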
Refer to this doc for more commands:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Refer to this doc if you have not yet configured HiveServer2:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2