Error in Hive Command Prompt while executing DESC FORMATTED Command - hive

I have a requirement to get the CREATE TABLE statements for all the external tables.
I used a script provided by other developers on StackOverflow and modified it to fetch the CREATE TABLE statements:
#Command to Execute sh ScriptName.sh <Hive_database_name>
#Create external table list for a schema
SCHEMA=$1
#define filenames
alltableslist=tables_$SCHEMA
exttablelist=ext_tables_$SCHEMA
#Get all tables Line 12
hive -S -e " set hive.cli.print.header=false; use $SCHEMA; show tables;" 1> $alltableslist
sed -i '/WARN:/d' $alltableslist
#For each table check its type:
for table in $(cat $alltableslist)
do
#Describe table- line no 20
describe=$(hive -S -e "use $SCHEMA; DESCRIBE FORMATTED $table")
#Get type
table_type=$(echo "${describe}" | egrep -o 'Table Type:[^,]+' | cut -f2)
# echo $table "-" $table_type
#Check table type, get create table statement
if [ "$table_type" = "EXTERNAL_TABLE" ]; then
echo Processing table $table ...
#get CREATE TABLE SCRIPT and save results
hive -e "use $1;show create table $SCHEMA.$table" >> $exttablelist
echo -e ";\n" >> $exttablelist
fi
done; #tables loop
sed -i '/WARN:/d' $exttablelist
The script works fine up to line 12, and I get the list of all the tables in the Hive database.
But the script fails at line 20 with the error below.
Logging initialized using configuration in file:/etc/hive/2.6.5.3016-3/0/hive-log4j.properties
OK
Time taken: 1.871 seconds
NoViableAltException(13#[1453:5: ( ( KW_DATABASE | KW_SCHEMA )=> ( KW_DATABASE | KW_SCHEMA ) ( KW_EXTENDED )? (dbName= identifier ) -> ^( TOK_DESCDATABASE $dbName ( KW_EXTENDED )? ) | ( KW_FUNCTION )=> KW_FUNCTION ( KW_EXTENDED )? (name= descFuncNames ) -> ^( TOK_DESCFUNCTION $name ( KW_EXTENDED )? ) | ( KW_FORMATTED | KW_EXTENDED | KW_PRETTY )=> ( (descOptions= KW_FORMATTED |descOptions= KW_EXTENDED |descOptions= KW_PRETTY ) parttype= partTypeExpr ) -> ^( TOK_DESCTABLE $parttype $descOptions) |parttype= partTypeExpr -> ^( TOK_DESCTABLE $parttype) )])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:18250)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4182)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1160)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:315)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:243)
at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
FAILED: ParseException line 1:15 cannot recognize input near 'FORMATTED' '_1xy_abcde00005' '<EOF>' in describe statement
I have a total of 75000 tables in the database, and around 1000 of them have names like _1xy_abcde00005.
Can you please help?
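A likely cause, stated here as an assumption rather than a confirmed diagnosis: Hive's parser does not accept an unquoted identifier that begins with an underscore, so a name such as _1xy_abcde00005 has to be wrapped in backticks. The sketch below only builds the statement text; build_describe is a hypothetical helper, not part of the original script, and the printf format is single-quoted so the shell does not treat the backticks as command substitution.

```shell
# Hypothetical helper: build a DESCRIBE FORMATTED statement with the
# table name wrapped in backticks so Hive treats it as a quoted identifier.
build_describe() {
  # $1 = schema name, $2 = table name
  printf 'use %s; DESCRIBE FORMATTED `%s`;\n' "$1" "$2"
}

# Print the statement for one of the failing table names.
build_describe mydb _1xy_abcde00005
```

Inside the loop this could be used as describe=$(hive -S -e "$(build_describe "$SCHEMA" "$table")"). The same quoting would presumably be needed for the later show create table call.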

Related

Add headers to a SQL (Sybase) output

I have created a script that executes SQL (Sybase):
#!/bin/bash
command=$(
isql -U databasename_dba -P password -b <<EOF!
select label1, label2 from TABLE
go
EOF!
)
echo "$command" >> output_file.csv
All good so far, the file is produced:
But as you can see, the output is written as a single column.
Is it possible to add headers and split the output into 2 columns? My desired output would be:
Try removing the -b option, which suppresses the column headers.
#!/bin/bash
command=$(
isql -U databasename_dba -P password <<EOF!
select label1, label2 from TABLE
go
EOF!
)
echo "$command" >> output_file.csv

Select in multiples tables PSQL without join

(Please correct me if I make any mistakes.)
I have 2 tables:
ACTIVITIES : ID / NAME / CONTENT / DATE /ETC..
ARTICLES : ID / NAME /CONTENT /DATE /ETC..
I have created a script that deletes an image when it is not referenced in the database. The problem is:
I don't know how to check the content of both activities and articles in the same query. The query below checks only articles, so it deletes the images used by my activities.
#!/bin/bash
db="intranet_carc_development"
user="benjamin"
echo "DELETING UNUSED FILES AND IMAGES..."
for f in public/uploads/files/*
do
if [[ -f "$f" ]]
then
f="$(basename "$f")"
psql $db $user -t -v "ON_ERROR_STOP=1" \
-c "select content from public.articles where content like '%$f%'" | grep . \
&& echo "exist" \
|| rm public/uploads/files/$f
fi
done
printf "DONE\n\n"
I tried something like:
select content from public.articles, public.activities where content like '%$f%'
but I get this error:
ERROR: column reference "content" is ambiguous
You can try something like:
WITH artcontent AS (
SELECT content
FROM public.articles
),
actcontent AS (
SELECT content
FROM public.activities
),
merge AS (
SELECT * FROM artcontent
UNION ALL
SELECT * FROM actcontent
)
SELECT *
FROM merge
The UNION ALL statement will put together your two results artcontent (which comes from articles) and actcontent (from activities).
Hope it will help you !
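The UNION ALL idea can be folded back into the shell loop from the question. This is a minimal sketch, assuming both tables have a content column as described; build_lookup_query is a hypothetical helper, and it does not escape quotes in file names, so it is only suitable for trusted names.

```shell
# Hypothetical helper: build one query that searches both tables for a
# file name, so a file is kept if either table references it.
build_lookup_query() {
  # $1 = base name of the uploaded file
  cat <<SQL
SELECT content FROM public.articles WHERE content LIKE '%$1%'
UNION ALL
SELECT content FROM public.activities WHERE content LIKE '%$1%'
SQL
}

# Print the query for an example file name.
build_lookup_query logo.png
```

In the original loop, the -c argument would then become "$(build_lookup_query "$f")", leaving the grep and rm logic unchanged.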

How to pass parameter into SQL file from UNIX script?

I'm looking to pass a parameter into a SQL file from my UNIX script, but I'm having problems with it.
Please see UNIX script below:
#!/bin/ksh
############
# Functions
_usage() {
SCRIPT_NAME=XXX
-eq 1 -o "$1" = "" -o "$1" = help -o "$1" = Help -o "$1" = HELP ]; then
echo "Usage: $SCRIPT_NAME [ cCode ]"
echo " - For example : $SCRIPT_NAME GH\n"
exit 1
fi
}
_initialise() {
cCode=$1
echo $cCode
}
# Set Variables
_usage $#
_initialise $1
# Main Processing
sql $DBNAME < test.sql $cCode > $PVNUM_LOGFILE
RETCODE=$?
# Check for errors within log file
if [[ $RETCODE != 0 ]] || grep 'E_' $PVNUM_LOGFILE
then
echo "Error - 50 - running test.sql. Please see $PVNUM_LOGFILE"
exit 50
fi
Please see SQL script (test.sql):
SELECT DISTINCT v1.*
FROM data_latest v1
JOIN temp_table t
ON v1.number = t.id
WHERE v1.code = '&1'
The error I am receiving when running my UNIX script is:
INGRES TERMINAL MONITOR Copyright 2008 Ingres Corporation
E_US0022 Either the flag format or one of the flags is incorrect,
or the parameters are not in proper order.
Anyone have any idea what I'm doing wrong?
Thanks!
NOTE: While I don't work with the sql command, I do routinely pass UNIX parameters into SQL template/script files when using the isql command line tool, so fwiw ...
The first thing you'll want to do is replace the &1 string with the value in the cCode variable; one typical method is to use sed to do a global search and replace of &1 with ${cCode} , eg:
$ cCode=XYZ
$ sed "s/\&1/${cCode}/g" test.sql
SELECT DISTINCT v1.*
FROM data_latest v1
JOIN temp_table t
ON v1.number = t.id
WHERE v1.code = 'XYZ' <=== &1 replaced with XYZ
NOTE: You'll need to wrap the sed code in double quotes so that the value of the cCode variable can be referenced.
Now, to get this passed into sql there are a couple options ... capture the sed output to a new file and submit that file to sql or ... [and I'm guessing this is doable with sql], pipe the sed output into sql, eg:
sed "s/\&1/${cCode}/g" test.sql | sql $DBNAME > $PVNUM_LOGFILE
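As a self-contained illustration of the substitution step only, the sketch below writes a shortened stand-in for test.sql to a temporary file and renders it with sed. The temporary file and the GH value are assumptions made for the example, and the sql call itself is left out because it needs a real database.

```shell
# Stand-in for test.sql (hypothetical, shortened) written to a temp file.
template=$(mktemp)
cat > "$template" <<'EOF'
SELECT DISTINCT v1.*
FROM data_latest v1
WHERE v1.code = '&1'
EOF

# Replace every &1 placeholder with the value of cCode.
cCode=GH
rendered=$(sed "s/&1/${cCode}/g" "$template")
printf '%s\n' "$rendered"

rm -f "$template"
```

The rendered text could then be piped into sql as in the answer above. A value containing / or & would need escaping before being used in the sed replacement.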
You may need '\p\g' around your SQL in the text file?
I personally tend to embed the SQL in the script itself, as in:
#!/bin/ksh
var=01.01.2018
db=database_name
OUTLOG=/path/log.txt
sql $db <<_END_ > $OUTLOG
set autocommit on;
\p\g
set lockmode session where readlock = nolock;
\p\g
SELECT *
FROM table
WHERE date > '${var}' ;
\p\g
_END_
exit 0

SQL query over multiple tables in one database

I have the following problem:
I have a bunch of Tables in a Vertica Database say:
+------------+
| Tablenames |
+------------+
| a_1 |
| a_2 |
| a_34 |
| b_1 |
| b_4 |
+------------+
The tables are not exactly the same but have mostly similar entries. Now I want to run one query over all tables whose names start with a_ (a_1, a_2, a_34).
Is there a way to search all the table names for the string a_, output some sort of list, and then use either a for loop or a join operation with the generated list?
Once I get the new table (let's call it temp_table) that holds all the table names starting with a_, I would like to run one query over all of them, something like this (Matlab syntax):
for ii=1:length(temp_table)
Data{ii}=SELECT * FROM temp_table(ii) WHERE paste_condition_here
end
So Data should be a new table that appends the new rows with each iteration.
@Nirjihar - there is no information_schema (you need v_catalog); you are confusing it with MySQL.
select TABLE_NAME from v_catalog.TABLES where TABLE_NAME like 'a_%';
This will return all tables matching the pattern 'a_%'.
Just as a complement:
In Vertica you won't have loops. For this you have to use a UDP (user-defined procedure), which can be written in the language of your choice (shell, Java, R, C++).
I will go ahead and post one model here for you:
1 - Shell proc - to be created in the procedures folder
#!/bin/bash
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -t -o /tmp/query.sql -c"
SELECT
' select * from '
||TABLE_SCHEMA
||'.'
||TABLE_NAME
||';'
FROM
v_catalog.TABLES where TABLE_NAME like '%$1%'
"
/opt/vertica/bin/vsql -U $username -w $password -F $'|' -At -o /tmp/query_output.csv -f /tmp/query.sql
2 - change sh file privs
chmod 4750 query_table.sh
3 - make sure you have the .profile file populated accordingly
. /home/dbadmin/.profile
#!/bin/bash
username=dbadmin
password=secrectpasswd
export username
export password
Note: this avoids repeating the password in every script and keeps the plain-text password in a single place.
4 - Register the UDP with Vertica Catalog
. /home/dbadmin/.profile
admintools -t install_procedure -f /vertica/catalog//procedures/query_table.sh -d -p $password
5 -Create the UDP inside the database
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -c "CREATE PROCEDURE dba.query_table(table_name varchar) AS 'query_table.sh' LANGUAGE 'external' USER 'dbadmin';"
6 - execute the proc
select dba.query_table('your possible table name here');
7 - check results
a - you will get a file with the query
b - one file with the exported data(csv '|' delimited).
I have a similar post here:
http://www.aodba.com/create-vertica-schema-fly/
To get all the tables that start with a_:
select TABLE_NAME from INFORMATION_SCHEMA.TABLES where TABLE_NAME like 'a_%'
Then you can alias this and join the list or each table as you like.
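Once the list of table names is available, the loop from the question can be approximated outside the database by generating one UNION ALL statement. This is a minimal sketch, assuming the selected tables have compatible columns (the question says they are only mostly similar, so an explicit column list may be needed instead of *); build_union_query and the example condition are hypothetical.

```shell
# Hypothetical helper: emit one statement that selects from every table
# passed as an argument, applying the same WHERE condition to each.
build_union_query() {
  condition=$1
  shift
  first=1
  for table in "$@"; do
    # Separate consecutive SELECTs with UNION ALL.
    if [ "$first" -eq 1 ]; then
      first=0
    else
      printf '\nUNION ALL\n'
    fi
    printf 'SELECT * FROM %s WHERE %s' "$table" "$condition"
  done
  printf ';\n'
}

# Print the statement for the a_ tables from the question.
build_union_query "id > 10" a_1 a_2 a_34
```

The generated text could be saved to a file and run with vsql -f, in the same way the shell procedure above runs /tmp/query.sql.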

psql shortcut for frequently used queries? (like Unix "alias")

Is it possible to somehow create aliases (like Unix alias command) in psql?
I mean, not SQL FUNCTION, but local aliases to ease manual queries?
I don't know of any such feature. There is only a workaround based on psql variables, but it has a lot of limits: using parameters with these queries is difficult.
postgres=# \set whoami 'SELECT CURRENT_USER;'
postgres=# :whoami
current_user
--------------
pavel
(1 row)
Pavel's answer is almost correct, except that you can use parameters in another way.
After
\set s 'select * from '
\set l ' limit 10;'
the following command
:s agent :l
is equivalent to
select * from agent limit 10;
According to http://www.postgresql.org/docs/9.0/static/app-psql.html
If an unquoted argument begins with a colon (:), it is taken as a psql
variable and the value of the variable is used as the argument
instead. If the variable name is surrounded by single quotes (e.g.
:'var'), it will be escaped as an SQL literal and the result will be
used as the argument. If the variable name is surrounded by double
quotes, it will be escaped as an SQL identifier and the result will be
used as the argument.
You can also use backquotes to run a shell command:
Arguments that are enclosed in backquotes (`) are taken as a command
line that is passed to the shell. The output of the command (with any
trailing newline removed) is taken as the argument value. The above
escape sequences also apply in backquotes.
How about using UDFs? You can create a UDF that returns a table (a set of rows) and then query it like this: select * from udf();
It is not as clean, but it is better than nothing and it is portable. And UDFs can take parameters too.
Why not use a view? Maybe views will help in your case.
This might help if you need to run frequent queries from the command line (not from the psql CLI).
Add this to .bash_profile or .bashrc:
POSTGRES_BIN=~/Postgres/bin
B_RED='\033[1;31m'
RESET='\033[0m'
psqlcommand="$POSTGRES_BIN/psql -U vignesh usersdb -q -c"
function psqlselectrows()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
$psqlcommand "SELECT * from $1"
}
The above command selects rows from the table passed as the argument.
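Because the function interpolates $1 directly into the SQL text, a stray quote or semicolon in the argument changes the statement that gets run. A minimal sketch of a guard, assuming table names consist only of letters, digits, underscores and an optional schema prefix; is_plain_identifier is a hypothetical helper, not part of the commands above.

```shell
# Hypothetical guard: succeed only for plain or schema-qualified names
# made of letters, digits, underscores and dots.
is_plain_identifier() {
  case "$1" in
    ''|*[!A-Za-z0-9_.]*) return 1 ;;  # empty, or contains another character
    *) return 0 ;;
  esac
}

# Show how a few example arguments are classified.
for name in users public.users 'users; DROP TABLE users'; do
  if is_plain_identifier "$name"; then
    echo "ok: $name"
  else
    echo "rejected: $name"
  fi
done
```

Each function could call this check before building its query and print the existing error message when it fails.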
Note:
Change the database name, as required.
The schema by default is public. To have another default schema, add the following line in ~/.psqlrc file.
SET SEARCH_PATH TO <schema_name>;
If the database is password protected, refer this and make use of the secure method.
I have made some commands for my own use, in case they help.
psqlselectrows - To select rows from a table
psqlgettablecount - To get row count of a table
psqltruncatetable - To truncate a table, on prompt
psqlgettablesize - To get the size of a table
psqlgetvacuumdetails - To get vacuum details of a table
psqlsettings - To get default and modified settings configured for Postgres.
(All the above commands need table name as first argument)
#Colors
B_RED='\033[1;31m'
B_GREEN='\033[1;32m'
B_YELLOW='\033[1;33m'
RESET='\033[0m'
#Postgres Command With Params
psqlcommand="$POSTGRES_BIN/psql -U vignesh usersdb -q -c"
function psqlgettablesize()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
$psqlcommand "select pg_size_pretty(pg_total_relation_size('$1')) as total_table_size, pg_size_pretty(pg_relation_size('$1')) as table_size, pg_size_pretty(pg_indexes_size('$1')) as index_size;";
}
function psqlgettablecount()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
$psqlcommand "select count(*) from $1;"
}
function psqlgetvacuumdetails()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
$psqlcommand "SELECT relname, n_live_tup, n_dead_tup, last_analyze::timestamp, analyze_count, last_autoanalyze::timestamp, autoanalyze_count, last_vacuum::timestamp, vacuum_count, last_autovacuum::timestamp, autovacuum_count FROM pg_stat_user_tables where relname='$1' and schemaname = current_schema();"
}
function psqltruncatetable()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
{
read -p "$(echo -e ${B_YELLOW}"Are you sure to truncate table '$1' (y/n)? "${RESET})" choice
case "$choice" in
y|Y ) $psqlcommand "TRUNCATE $1;";;
n|N ) echo -e "${B_GREEN}Table '$1' not truncated${RESET}";;
* ) echo -e "${B_RED}Invalid option${RESET}";;
esac
}
}
function psqlsettings()
{
query="select * from pg_settings"
if [ "$1" != "" ]; then
query="$query where category like '%$1%'"
fi
query="$query ;"
$psqlcommand "$query"
if [ -z "$1" ]; then
echo -e "${B_YELLOW}Passing Category as first argument will filter the related settings.${RESET}"
fi
}
function psqlselectrows()
{
[ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
$psqlcommand "SELECT * from $1"
}