Impala shell command to export a Parquet file as a CSV

I have some Parquet files stored in HDFS that I want to convert to CSV files first and then export to a remote machine over ssh.
I don't know whether it would be easier with a Spark job (I know I can convert Parquet to CSV just by using spark.read.parquet and then writing the same DataFrame back out with spark.write as CSV), but I really want to do it with an impala-shell request.
So I thought about something like this:
hdfs dfs -cat my-file.parquet | ssh myserver.com 'cat > /path/to/my-file.csv'
Can you please help me with this request?
Thank you!

Example without Kerberos:
impala-shell -i servername:portname -B -q 'select * from table' -o filename '--output_delimiter=\001'
I could explain each option, but it is late; here is a link that covers this, including how to include the header if you want: http://beginnershadoop.com/2019/10/02/impala-export-to-csv/
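If your cluster is kerberized, the same idea should work with the -k (--kerberos) flag added. A sketch (host, table and file names are placeholders), this time comma-delimited and with the header included:
impala-shell -k -i servername:portname -B -q 'select * from table' --output_delimiter=',' --print_header -o filename.csv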

You can do that in multiple ways.
One approach is shown in the example below.
With impala-shell you can run a query and pipe the output to ssh to write it to a file on a remote machine.
$ impala-shell --quiet --delimited --print_header --output_delimiter=',' -q 'USE fun; SELECT * FROM games' | ssh remoteuser@ip.address.of.remote.machine "cat > /home/..../query.csv"
This command switches from the default database to the fun database and runs a query against it.
You can change the delimiter (e.g. --output_delimiter='\t'), include or omit --print_header, and adjust other options as needed.

Related

How to execute multiple SQL files from a local directory in PostgreSQL/PgAdmin4 in one transaction on Windows?

I have a folder on my PC which contains multiple SQL files. Each file contains a Postgres function. I want to execute every SQL file in the folder at once against a PostgreSQL server, using PgAdmin or some other way. How can I accomplish this?
I apologize if I'm oversimplifying your question, but if the main issue is how to execute all the SQL files without having to call them one by one, you just need to put them in a loop, e.g. in Bash, calling psql:
#!/bin/bash
# Run every .sql file in the current directory against the database
for f in *.sql
do
    psql -h dbhost -d db -U dbuser -f "$f"
done
Or cat them and pipe the result to psql stdin:
$ cat /path/to/files/*.sql | psql -h dbhost -d db -U dbuser
And if you need them to run in a single transaction, consider merging the SQL files first, e.g. using cat - this assumes all statements in your SQL files are properly terminated:
$ cat /path/to/files/*.sql > merged.sql
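To actually get the single transaction, one option (a sketch reusing the connection options above) is to run the merged file with psql's -1/--single-transaction flag, which wraps the whole script in BEGIN/COMMIT:
$ psql -1 -h dbhost -d db -U dbuser -f merged.sql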

Import a dump file containing a database dump into DBeaver

I have a database dump in a thisdb_2022.dump binary file that I'm trying to import into DBeaver, but I haven't found a way to import the database so I can see it.
I found the post below on the DBeaver forum, but when I try to follow the instructions and create a new connection I don't see any option I can select that will open this dump file.
https://dbeaver.io/forum/viewtopic.php?f=2&t=895
Edit: The database version is PostgreSQL 12. I'm not trying to load the dump into an existing database; rather, I want to create a new one from this dump.
The dump command looks like this: pg_dump -h blah.amazonaws.com -Fc -v --dbname="blah2" -f "/tmp/dump/20220203.dump"
The target will be the same version, PostgreSQL 12.
The easiest way is to not use DBeaver at all.
Do:
UPDATED with correct command.
-- In psql
CREATE DATABASE new_db;
-- Exit psql
-- At the command line
pg_restore -d new_db -h <the_host> -p <the_port> -U postgres /tmp/dump/20220203.dump
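Alternatively (a sketch, not verified against this particular dump), pg_restore can create the database itself with --create; in that case you connect to an existing maintenance database such as postgres, and the new database name is taken from the dump:
pg_restore --create -d postgres -h <the_host> -p <the_port> -U postgres /tmp/dump/20220203.dump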
To work in DBeaver directly, see Backup/Restore.

Executing a SQL file in interactive impala-shell session

In an interactive impala-shell session, is there a way to load and execute a text file containing one or more SQL statements? In Hive's beeline, for example, you can use !run <filename> to run the SQL commands in that file.
This is not currently possible. You can file a JIRA.
I believe it is possible - see impala-shell -h (version v2.1.1-cdh5):
-f QUERY_FILE, --query_file=QUERY_FILE
Execute the queries in the query file, delimited by ;
[default: none]
Combine this with the shell command in interactive mode:
shell impala-shell -f file;
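As a quick sketch (the file name, host, database and table are placeholders), the query file just contains semicolon-delimited statements and is passed with -f:
$ cat queries.sql
USE fun;
SELECT COUNT(*) FROM games;
$ impala-shell -i servername:portname -f queries.sql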

Getting results of a pig script on a remote cluster

Is there a way to get the results of a pig script run on a remote cluster directly without STORE-ing them and retrieving them separately?
You can use Pig parameters to run your scripts. For example:
example.pig
A = LOAD '$PATH_TO_FOLDER_WITH_DATA' AS (f1:int, f2:int, f3:int);
-- Do something with your data to get an output relation, e.g. B
STORE B INTO '$OUTPUT_PATH';
Then you can run the script like:
pig -p PATH_TO_FOLDER_WITH_DATA=/path/to/local/file -p OUTPUT_PATH=/path/to/the/output example.pig
So, to automate it in Bash:
storelocal.sh
#!/bin/bash
# $1 = input data path, $2 = HDFS output path, $3 = local output path
pig -p PATH_TO_FOLDER_WITH_DATA="$1" -p OUTPUT_PATH="$2" example.pig
hdfs dfs -getmerge "$2" "$3"
And you can run it: ./storelocal.sh /path/to/local/file /path/to/the/hdfs/output /path/to/the/local/output
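If you only have ssh access to the cluster, a rough, untested variant (host and paths are placeholders) is to run the script remotely and stream the stored part files straight back into a local file in one go:
ssh user@cluster.machine "pig -p PATH_TO_FOLDER_WITH_DATA=/path/to/data -p OUTPUT_PATH=/tmp/pig_out example.pig && hdfs dfs -cat /tmp/pig_out/part-*" > /path/to/the/local/output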

In isql, is there a way for me to run several SQL statements from a file?

I have a file that contains several SQL queries.
Can I somehow run them via isql? (I'm making the calls from a Bash script, so I have no access to Perl DBI or JDBC.)
I tried piping them into the isql command via echo /my/file | isql -my-other-parameters but that didn't work.
Yes.
If you're running isql in interactive mode, you can load the entire contents of a file using the :r my-filename command from the > prompt.
From a Bash script it's also possible, but you need to carefully make sure that:
The SQL file you are piping in has a go statement at the end. That is a very common cause of issues like the one you mentioned.
That statement has a newline after it.
From a script, you can do it in two ways: pass the file on STDIN via a pipe/redirect, or pass the file name via isql's -i parameter.
In my case I needed isql -n for a piped query to work.
isql -U $DB_USR -P $DB_PWD -S $DB_PATH -D $DB_NAME -w 500 < $FILE
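Putting it together, a sketch (the SQL file name and its contents are just illustrative; the connection variables are the same as above) of both variants, with the trailing go followed by a newline:
$ cat my-queries.sql
SELECT COUNT(*) FROM my_table
go
$ isql -U $DB_USR -P $DB_PWD -S $DB_PATH -D $DB_NAME -n < my-queries.sql
$ isql -U $DB_USR -P $DB_PWD -S $DB_PATH -D $DB_NAME -n -i my-queries.sql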