Issue with subqueries in Hive - sql

I am trying to run a subquery in Hive from a bash script, but the compiler says that it cannot recognize the subquery within the query. Any ideas?
#!/bin/bash
echo "Hello world"
#####################################################################
#This line will connect to the database and execute the query in Hive
####################################################################
var1=$(beeline --showHeader=false --outputformat=tsv2 -u "jdbc:hive2:XXXXXXXXX" <<EOF
select $2 from $3.$1 where length($2)=(select max(length($2)) from $3.$1) limit 1;
EOF
)
#####################################################################
#This will output the result of the query
####################################################################
echo "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
echo "We are currently analyzing Table:$1 and Column:$2"
echo "The value with the maximum length for $1 is $var1"
echo "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

I am afraid that your query will not work in Hive because of the way you wrote the subquery (a subquery compared with = in the WHERE clause), so you will have to rewrite it.
Try the following query to get the value of $2 with the maximum length:
select $2 from (select max(length($2)) as length_2, $2 from $3.$1 group by $2 order by length_2 desc) a limit 1;
Also, you can execute the query with beeline's -e option instead of a here-document, as @mazaneicha mentioned.
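A minimal sketch of the same idea as reusable bash functions, assuming the question's argument order ($1 = table, $2 = column, $3 = database). build_query and run_query are hypothetical names, and HIVE_URL is a placeholder for the JDBC URL that is elided in the question. The query text is the rewrite from this answer, passed with beeline -e rather than through a here-document.

```shell
#!/bin/bash
# Hypothetical helper: prints the rewritten query (no subquery in WHERE).
# Arguments follow the question's script: $1 = table, $2 = column, $3 = database.
build_query() {
    printf 'select %s from (select max(length(%s)) as length_2, %s from %s.%s group by %s order by length_2 desc) a limit 1;' \
        "$2" "$2" "$2" "$3" "$1" "$2"
}

# Hypothetical wrapper: runs the query with beeline -e.
# HIVE_URL is an assumed variable holding the (elided) JDBC URL.
run_query() {
    beeline --showHeader=false --outputformat=tsv2 -u "$HIVE_URL" -e "$(build_query "$@")"
}

# Usage inside the question's script:
#   var1=$(run_query "$1" "$2" "$3")
```

Keeping the query construction in its own function makes it easy to print and check the generated HiveQL before sending it to the cluster.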

Related

How to Compare Two SQL Files in Shell Script

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql; if it is not, the query must be moved from A.sql to newfile.sql.
Below is the Example
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Expected output
Select * from student;
Select * from subject;
Output I am getting
Select * from dept;
Select * from student;
Select * from subject;
Below is my script
while read -rd ';' i_sql
do
flag=0
while read -rd ';' e_sql
do
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
This reads each SQL query from A.sql up to the semicolon and stores it in i_sql:
while read -rd ';' i_sql
This reads each SQL query from B.sql up to the semicolon and stores it in e_sql:
while read -rd ';' e_sql
Below I compare i_sql and e_sql. If they are equal, I go to the else branch and break, so that the query is not compared with the remaining statements. If they are not equal, I set flag=0; after the inner while loop, I move the query that is not present in B.sql to newfile.sql.
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
Below I write each SQL query that is present in A.sql but not in B.sql to newfile.sql:
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
Can anyone please help with the above issue and let me know what is wrong?
Note: none of my SQL queries fits on a single line; each one spans 4-5 lines. I used single-line queries only as an example.
Because each query spans 4-5 lines, I read it in the while loop up to the semicolon, store it in a variable, and use that variable for the comparison.
Thanks in advance!
I assume that in your input files each query occupies exactly one line. You did not say this explicitly, but your example suggests it. In this case, you can interpret B.sql as a list of literal patterns and ask grep which lines of A.sql match none of them:
grep -F -f B.sql -v A.sql
-F treats the patterns as fixed strings, -f tells grep which file to read the patterns from, and -v reports the lines where none of the patterns matches.
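A sketch of the grep approach, still assuming one statement per line. Adding -x (the pattern must match the whole line) is a deliberate change from the command above: without it, a blank line in B.sql becomes an empty pattern that matches every line, so -v would print nothing. missing_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 that do not appear in file $2.
# -F: patterns are fixed strings, -x: a pattern must match the whole line,
# -v: print non-matching lines, -f: read the patterns from a file.
missing_lines() {
    grep -Fxv -f "$2" "$1"
}

# Usage: missing_lines A.sql B.sql > newfile.sql
```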
Your logic seems correct, but you need to take care of more details, such as differences in letter case or in whitespace between words (say, the same query has a space before the semicolon in one file and none in the other).
A whitespace difference may be why 'Select * from dept;' appears in the result.
As suggested in the comments, it is better to use a command-line diff tool than to write the logic yourself. You can explore diff / vimdiff / git diff ...
This can be achieved through awk:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql
Process B.sql first (FNR==NR) and build an array indexed by its lines. While processing A.sql (FNR!=NR), append each line that has no entry in the map array to newfile.sql; print the other lines to the screen.
You can then write the output on screen back to the A.sql file:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql > A.sql.tmp && mv -f A.sql.tmp A.sql
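The awk above compares line by line, so it cannot match statements that span several lines. A sketch of a variant, assuming every statement ends with ';': setting RS to ";" makes each statement one record, and trimming the whitespace around it makes the comparison independent of the line breaks between statements. missing_statements_awk is a hypothetical name; it prints the missing statements instead of rewriting the files, and it assumes B.sql is not empty (an empty first file breaks the FNR == NR test).

```shell
#!/bin/sh
# Hypothetical helper: print the statements of file $1 that are missing from file $2,
# treating everything up to each ';' as one statement (multi-line statements are fine).
missing_statements_awk() {
    awk 'BEGIN { RS = ";" }                   # one record per statement
         { gsub(/^[ \t\n]+|[ \t\n]+$/, "") }  # trim surrounding whitespace
         $0 == "" { next }                    # skip the empty tail after the last ;
         FNR == NR { seen[$0] = 1; next }     # the first file read is $2 (B.sql)
         !($0 in seen) { print $0 ";" }' "$2" "$1"
}

# Usage: missing_statements_awk A.sql B.sql > newfile.sql
```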
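The problem is this line:
if [ !$flag ]
which always yields true, because it expands to [ !0 ] or [ !1 ]: a test with a single argument is true whenever that argument is a non-empty string, and !0 and !1 are both non-empty.
What you need is: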
if [ $flag = 0 ]
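For reference, a sketch of the question's loop with the flag test corrected, wrapped in a hypothetical bash function (read -d is a bash feature). It tests for equality directly, prints each missing statement with its ';' restored, and relies on read's default IFS to trim the newlines around each statement; differences in letter case or in whitespace inside a statement still count as different statements.

```shell
#!/bin/bash
# Hypothetical helper: print the ';'-terminated statements of file $1 that are
# missing from file $2. read -d ';' reads up to the next semicolon, so
# multi-line statements stay whole; the default IFS trims surrounding whitespace.
missing_statements() {
    while read -rd ';' i_sql; do
        flag=0
        while read -rd ';' e_sql; do
            if [ "$i_sql" = "$e_sql" ]; then
                flag=1              # found in the second file, stop searching
                break
            fi
        done < "$2"
        if [ "$flag" -eq 0 ]; then  # the corrected test: true only when not found
            printf '%s;\n' "$i_sql" # restore the ';' that read consumed
        fi
    done < "$1"
}

# Usage: missing_statements A.sql B.sql > newfile.sql
```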

How to Compare Two Arrays in Shell Script

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql and print the queries that are in A.sql but not in B.sql. I read each SQL query up to the semicolon as one element, store the queries of each file in an array, and compare the two arrays.
Below is the Example
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Select * from student;
Expected output
Select * from subject;
Output I am getting
Select * from emp;
Below is my script
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
for p in "${first_array[@]}"; do
flag=false
for q in "${second_array[@]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
First, I read A.sql up to each semicolon, treating each chunk as one query, and store the queries in an array:
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
Then I read B.sql the same way, one query per semicolon, and store the queries in a second array:
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
Now I compare first_array and second_array and, after the inner for loop, print the queries that are present in first_array but not in second_array:
for p in "${first_array[@]}"; do
flag=false
for q in "${second_array[@]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
Can anyone please help with the above issue and let me know what is wrong?
Note: none of my SQL queries fits on a single line; each one spans 4-5 lines. I used single-line queries only as an example.
Because each query spans 4-5 lines, I read it in a while loop up to the semicolon, store the queries in an array, and then compare the two arrays to print the unmatched ones.
Thanks in advance!
You can do it with a one-liner that uses comm to print only the statements that appear in the first file, with a bit of pre- and post-processing to account for multi-line SQL statements:
$ comm -z -23 <(perl -0777 -pe 's/;\n/;\x{0}/g' a.sql | sort -z) \
<(perl -0777 -pe 's/;\n/;\x{0}/g' b.sql | sort -z) \
| tr "\0" "\n"
Select * from subject;
(This does assume a GNU userland; other versions of comm and sort might not take the -z option).
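If you prefer to keep the pure-bash array approach, here is a sketch with the likely bugs fixed, assuming ${first_array[#]} in the question stands for ${first_array[@]}. First, read -rd ';'second_sql lacks a space, so -d takes ';second_sql' as its argument (only the first character ';' is used) and second_sql is never set. Second, the unquoted $q in [[ $p == $q ]] is a glob pattern, so the * in "Select * from ..." matches anything. Third, the "not in" message is printed unconditionally instead of only when no match was found. array_diff is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the ';'-terminated statements of file $1 that are
# missing from file $2, using two arrays as in the question.
array_diff() {
    first_array=()
    second_array=()
    while read -rd ';' first_sql; do
        first_array+=("$first_sql")
    done < "$1"
    while read -rd ';' second_sql; do     # note the space before second_sql
        second_array+=("$second_sql")
    done < "$2"
    for p in "${first_array[@]}"; do
        found=false
        for q in "${second_array[@]}"; do
            if [[ $p == "$q" ]]; then      # quoted: literal comparison, no globbing
                found=true
                break
            fi
        done
        if [ "$found" = false ]; then     # report only statements with no match
            printf '%s;\n' "$p"
        fi
    done
}

# Usage: array_diff A.sql B.sql
```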

How to pass parameter into SQL file from UNIX script?

I'm looking to pass a parameter into a SQL file from my UNIX script, but I'm having problems with it.
Please see UNIX script below:
#!/bin/ksh
############
# Functions
_usage() {
SCRIPT_NAME=XXX
-eq 1 -o "$1" = "" -o "$1" = help -o "$1" = Help -o "$1" = HELP ]; then
echo "Usage: $SCRIPT_NAME [ cCode ]"
echo " - For example : $SCRIPT_NAME GH\n"
exit 1
fi
}
_initialise() {
cCode=$1
echo $cCode
}
# Set Variables
_usage $#
_initialise $1
# Main Processing
sql $DBNAME < test.sql $cCode > $PVNUM_LOGFILE
RETCODE=$?
# Check for errors within log file
if [[ $RETCODE != 0 ]] || grep 'E_' $PVNUM_LOGFILE
then
echo "Error - 50 - running test.sql. Please see $PVNUM_LOGFILE"
exit 50
fi
Please see SQL script (test.sql):
SELECT DISTINCT v1.*
FROM data_latest v1
JOIN temp_table t
ON v1.number = t.id
WHERE v1.code = '&1'
The error I am receiving when running my UNIX script is:
INGRES TERMINAL MONITOR Copyright 2008 Ingres Corporation
E_US0022 Either the flag format or one of the flags is incorrect,
or the parameters are not in proper order.
Anyone have any idea what I'm doing wrong?
Thanks!
NOTE: While I don't work with the sql command, I do routinely pass UNIX parameters into SQL template/script files when using the isql command line tool, so fwiw ...
The first thing you'll want to do is replace the &1 string with the value of the cCode variable; one typical method is to use sed to do a global search and replace of &1 with ${cCode}, e.g.:
$ cCode=XYZ
$ sed "s/\&1/${cCode}/g" test.sql
SELECT DISTINCT v1.*
FROM data_latest v1
JOIN temp_table t
ON v1.number = t.id
WHERE v1.code = 'XYZ' <=== &1 replaced with XYZ
NOTE: You'll need to wrap the sed code in double quotes so that the value of the cCode variable can be referenced.
Now, to get this passed into sql there are a couple of options: capture the sed output in a new file and submit that file to sql, or (and I'm guessing this is doable with sql) pipe the sed output into sql, e.g.:
sed "s/\&1/${cCode}/g" test.sql | sql $DBNAME > $PVNUM_LOGFILE
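One caveat with the sed approach: if the value contains / or &, sed interprets them inside the replacement text. A sketch of a hypothetical render_sql helper that escapes those two characters first; it assumes the value contains no backslashes or newlines.

```shell
#!/bin/sh
# Hypothetical helper: replace every &1 in a SQL template with a value.
# / and & in the value are escaped first, because sed treats them specially
# in the replacement. Assumes the value has no backslashes or newlines.
render_sql() {
    # $1 = template file, $2 = value to substitute for &1
    escaped=$(printf '%s\n' "$2" | sed 's|[/&]|\\&|g')
    sed "s/&1/$escaped/g" "$1"
}

# Usage, assuming sql reads its statements from stdin as in the pipe above:
#   render_sql test.sql "$cCode" | sql "$DBNAME" > "$PVNUM_LOGFILE"
```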
You may need \p\g after your SQL statements in the text file.
I personally tend to embed the SQL in the script itself with a here-document, as in:
#!/bin/ksh
var=01.01.2018
db=database_name
OUTLOG=/path/log.txt
sql $db <<_END_ > $OUTLOG
set autocommit on;
\p\g
set lockmode session where readlock = nolock;
\p\g
SELECT *
FROM table
WHERE date > '${var}' ;
\p\g
_END_
exit 0

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like this?
Select count(1) from table1 where column1='value1';
I am trying to write it using a script like the one below:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I get this error:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where I am making a mistake.
Create a file named example:
vi example
Enter the query in the file and save it:
create table sample as
Select count(1) from table1 where column1='value1';
Now run the document using the following command:
hive -f example 1>example.output 2>example.error &
The shell will print the job number and process ID, e.g. [1] 12345.
Now disown the process:
disown
The process will keep running in the background. To follow its progress (the Hive CLI writes its log messages to stderr), use
tail -f example.error
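An alternative sketch that uses nohup instead of & followed by disown (disown is specific to bash and ksh), with stdout and stderr in separate files. run_in_background is a hypothetical helper, and the usage line assumes the hive CLI is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background, immune to hangups,
# with stdout and stderr captured in separate files.
run_in_background() {
    # $1 = file for stdout (query results), $2 = file for stderr (log/progress)
    # remaining arguments = the command to run
    out=$1
    err=$2
    shift 2
    nohup "$@" >"$out" 2>"$err" &
    echo "started PID $!"
}

# Usage (assumes hive is on PATH):
#   run_in_background example.output example.error hive -f example
#   tail -f example.error
```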
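True, @Koushik! Glad that you found the issue.
The shell could not build the Hive query correctly because of the nested single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not: the quote before Value1 closes the shell's single-quoted string, so Hive receives Column1 = Value1 without quotes and treats Value1 as a column reference, which is exactly the "Invalid table alias or column reference" error.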
The best solution would be to use double quotes for the Value1 as
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or, as a quick and dirty solution, close the single-quoted string, add the single quote inside double quotes, and reopen it:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
This makes sure that the Hive query is properly formed before it is executed. I'd not suggest this approach unless you really need a single quote ;)
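To see what each quoting style actually passes to Hive, here is a small sketch that stores each command-line argument in a variable; printf stands in for hive -e, so it runs without a Hive installation. It also shows the simplest option, wrapping the whole query in double quotes, which is safe here only because the query contains no $, backtick, or backslash.

```shell
#!/bin/sh
# What the shell hands to hive -e for each quoting style (printf stands in for hive).
broken='SELECT COUNT(1) FROM Table1 WHERE Column1 = ''Value1'';'
hive_double='SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
spliced='SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'"'Value1'"'"';'
outer_double="SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';"

printf '%s\n' "$broken"        # Value1 arrives unquoted: Hive reads it as a column
printf '%s\n' "$hive_double"   # HiveQL accepts "..." as a string literal
printf '%s\n' "$spliced"       # close ', add a literal ' inside "...", reopen '
printf '%s\n' "$outer_double"  # double-quote the whole query instead
```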
I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like this:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'

how to assign a query result to a shell variable

I have an sql query that returns a date.
I call this query from a shell script and would like to assign this value to the variable called datestart (and use it later). Here is my code. Without the datestart assignment the query works fine.
#!/bin/sh
firstname="-Upgsql"
dbname="statcoll"
portname="-p5438"
datestart=(psql $firstname $portname $dbname<< EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN(SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties')GROUP BY transactionname) as earliest;
EOF
)
echo $datestart
but the result is this :
Syntax error: word unexpected (expecting ")").
I have no idea where should I insert that closing bracket. Any hint is appreciated.
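Instead of parentheses in the variable assignment, you need command substitution: $(...), which is POSIX and works in both bash and sh, or the older backtick form `...`. (In bash, var=(...) assigns an array of words instead of running the command; in sh it is a syntax error.)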
Try this:
#!/bin/sh
firstname="pgsql"
dbname="statcoll"
portname="5438"
# --user and --port take plain values (not -U/-p); -t -A print just the value
datestart=`psql -t -A --pset="footer=off" --user="$firstname" --port="$portname" -d "$dbname"<<EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN (SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties') GROUP BY transactionname) as earliest;
EOF
`
echo "$datestart"