How to Compare Two Arrays in Shell Script - sql

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql, and print every query that is in A.sql but not in B.sql. To do that, I read each file up to a semicolon, store each query as one array element, then compare the two arrays and print the differences.
Below is the Example
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Select * from student;
Expected output
Select * from subject;
Actual output
Select * from emp;
Below is my script
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
for p in "${first_array[@]}"; do
flag=false
for q in "${second_array[@]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
First I read A.sql up to each semicolon and store every query as one array element.
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
Next I read B.sql up to each semicolon and store every query as one array element.
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
Then I compare first_array with second_array and, outside the inner for loop, print the entries that are in first_array but not in second_array.
for p in "${first_array[@]}"; do
flag=false
for q in "${second_array[@]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
Can anyone please help with the above issue and let me know what is wrong.
Note: a single query in my real files does not fit on one line; it usually spans four or five lines. The one-line queries above are only an example.
That is why I read up to the semicolon in a while loop, store each query in an array, and then compare the two arrays to print the unmatched entries.
Thanks in advance!!!

You can do it with a one-liner that uses comm to print only the statements that appear in the first file but not in the second, with a bit of pre- and post-processing to handle multi-line SQL statements:
$ comm -z -23 <(perl -0777 -pe 's/;\n/;\x{0}/g' a.sql | sort -z) \
<(perl -0777 -pe 's/;\n/;\x{0}/g' b.sql | sort -z) \
| tr "\0" "\n"
Select * from subject;
(This does assume a GNU userland; other versions of comm and sort might not take the -z option).
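Where GNU comm -z is not available, the array idea from the question can also be made to work in plain bash. The following is a minimal sketch, assuming bash 4 or later (for associative arrays), that every statement ends with a semicolon, and that two statements count as equal when they match exactly after surrounding whitespace is trimmed. The function names trim and only_in_first are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash 4+ and that every statement ends with ';'.

# Remove leading and trailing whitespace (including newlines) from $1.
trim() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}
    s=${s%"${s##*[![:space:]]}"}
    printf '%s' "$s"
}

# Print each statement of file $1 that does not occur in file $2.
only_in_first() {
    local stmt
    local -A seen=()

    # Index the statements of the second file by their trimmed text.
    while IFS= read -r -d ';' stmt; do
        stmt=$(trim "$stmt")
        [[ -n $stmt ]] || continue
        seen["$stmt"]=1
    done < "$2"

    # Report statements of the first file that were never indexed.
    while IFS= read -r -d ';' stmt; do
        stmt=$(trim "$stmt")
        [[ -n $stmt ]] || continue
        [[ -n ${seen["$stmt"]:-} ]] || printf '%s;\n' "$stmt"
    done < "$1"
}

# Usage: only_in_first A.sql B.sql
```

The lookup table replaces the nested loop, and the trimming matters because read -d ';' leaves the newline that follows each semicolon at the start of the next statement. Differences in whitespace inside a statement are still treated as differences.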

Related

Struggling with an awk script: need a suggestion or the logic to get this done

I have a sql file to filter the data
-- Edit this file by adding your SQL below each question.
-------------------------------------------------------------------------------
-------------------------------------------------------------
-- The following queries are based on the 1994 census data.
-------------------------------------------------------------
.read 1994-census-summary-1.sql
-- 4. what is the average age of people from China?
select avg(age)
from census
where native_country ='China';
-- 5. what is the average age of people from Taiwan?
select avg(age)
from census
where native_country ='Taiwan';
-- 6. which native countries have "land" in their name?
select distinct(native_country)
from census
where native_country like '%land%';
--------------------------------------------------------------------------------------
-- The following queries are based on the courses-ddl.sql and courses-small.sql data
--------------------------------------------------------------------------------------
drop table census;
.read courses-ddl.sql
.read courses-small-1.sql
-- 11. what are the names of all students who have taken some course? Don't show duplicates.
select distinct(name)
from student
where tot_cred > 0;
-- 12. what are the names of departments that offer 4-credit courses? Don't list duplicates.
select distinct(dept_name)
from course
where credits=4;
-- 13. What are the names and IDs of all students who have received an A in a computer science class?
select distinct(name), id
from student natural join takes natural join course
where dept_name="Comp. Sci." and grade="A";
If I run
./script.awk -v ID=6 file.sql
the problem ID is passed to the awk script as the variable ID on the command line, like this:
-v ID=6
How can I get a result like this?
Result:
select distinct(native_country) from census where native_country like '%land%';
With the samples shown, try the following GNU awk code, which uses the three-argument form of match (a gawk extension). Here id is an awk variable holding the question number to look for in Input_file. The exit stops after the first match to save time; if you expect more than one match, remove it.
awk -v RS= -v id="6" '
  match($0,/(\n|^)-- ([0-9]+)\.[^\n]*\n(select[^;]*;)/,arr) && arr[2]==id{
    gsub(/\n/,"",arr[3])
    print arr[3]
    exit
  }
' Input_file
One option with awk is to match the start of a line against -- 6. where 6 is the ID.
Then skip to the next line and set a variable (seen) to record that the part you want has started.
While seen is set, collect every line.
Reset seen to 0 on an "empty" line.
Concatenate the collected lines into a single line and, at the end, remove the trailing space.
gawk -v ID=6 '
  match($0, "^-- "ID"\\.") {
    seen=1
    next
  }
  /^[[:space:]]*$/ {
    seen=0
  }
  seen {
    a = a $0 " "
  }
  END {
    sub(/ $/, "", a)
    print a
  }
' file.sql
Or as a single line
gawk -v ID=6 'match($0,"^-- "ID"\\."){seen=1;next};/^[[:space:]]*$/{seen=0};seen{a=a$0" "};END{sub(/ $/,"",a);print a}' file.sql
Output
select distinct(native_country) from census where native_country like '%land%';
Another option with GNU awk: set the record separator to an "empty" line and use a regex with a capture group that matches all lines following the initial -- ID line that do not start with whitespace.
gawk -v ID=6 '
  match($0, "\\n-- "ID"\\.[^\\n]*\\n(([^[:space:]][^\\n]*(\\n|$))*)", m) {
    gsub(/\n/, " ", m[1])
    print m[1]
  }
' RS='^[[:space:]]*$' file
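The three-argument match used in these answers is a gawk extension. Where only a POSIX awk is available, the same idea can be sketched as a small state machine. This is a rough sketch, assuming each question header looks like "-- N. ..." at the start of a line and each query ends with a semicolon at the end of a line; query_for_id is a hypothetical wrapper name.

```shell
# Print the query that follows the "-- ID." header in file $2 as one line.
# $1 is the question number, $2 is the SQL file.
query_for_id() {
    awk -v ID="$1" '
        /^-- [0-9]+\./ {                  # question header, e.g. "-- 6. ..."
            if (found) exit               # next question reached: stop
            num = $2
            sub(/\..*$/, "", num)         # "6." -> "6"
            found = (num + 0 == ID + 0)
            next
        }
        found && /^--/ { exit }           # any other comment line ends the query
        found && NF {
            q = (q == "" ? $0 : q " " $0) # join the query lines with single spaces
            if ($0 ~ /;[ \t]*$/) exit     # stop at the terminating semicolon
        }
        END { if (q != "") print q }
    ' "$2"
}

# Usage: query_for_id 6 file.sql
```

Because the sketch tracks state line by line, it does not depend on blank lines between questions.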

How to pass unix variable in where condition of query?

I have a file whose filename I am storing in a shell variable and I wish to pass that variable in the WHERE condition of my SQL select query. How can I achieve this ?
my code
cd /path/to/folder
var =$(ls tail)
id_var=$(echo "$var" | cut -f 1 -d '.')
...
...
sqlplus -s user/pwd@db < mysql.sql > output.txt
cat mysql.sql
select * from Records where "GlobalId"='$id_var'
From this answer:
cd /path/to/folder
var=$(ls tail)
id_var=$(echo "$var" | cut -f 1 -d '.')
sqlplus -s user/pwd@db @mysql.sql "${id_var}" > output.txt
Then in mysql.sql use &1 to substitute the first start argument:
select * from Records where "GlobalId"='&1'
Note: &1 is a substitution variable (and not a bind variable) so you will need to make sure that the value passed in does not perform any SQL injection attacks.
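Alternatively, export the variable:
export id_var
Then use the envsubst command:
envsubst < mysql.sql
This writes the contents of mysql.sql to standard output with $id_var replaced by its value, so pipe that output to sqlplus rather than redirecting the unmodified file.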
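Whichever substitution is used, the value ends up pasted into the SQL text, so it helps to validate it first. The following is a minimal sketch, assuming the ID is the part of the file name before the first dot and may contain only letters, digits, underscores, and hyphens (that allowlist is an assumption; adjust it to your data). render_query is a hypothetical helper name.

```shell
# Print the query for the file name given in $1, or fail if the derived
# ID contains characters outside the assumed allowlist.
render_query() {
    local file_name=$1
    local id_var=${file_name%%.*}      # same result as: cut -f 1 -d '.'

    # Refuse empty values and anything outside [A-Za-z0-9_-].
    case $id_var in
        ''|*[!A-Za-z0-9_-]*)
            printf 'refusing suspicious id: %s\n' "$id_var" >&2
            return 1
            ;;
    esac

    # Unquoted heredoc delimiter, so the shell expands $id_var here.
    cat <<EOF
select * from Records where "GlobalId"='$id_var'
EOF
}

# Hypothetical usage: render_query "$var" | sqlplus -s user/pwd@db > output.txt
```

A heredoc is used here so the sketch does not depend on envsubst being installed; the validation step applies equally to the &1 and envsubst approaches.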

How to Compare Two SQL Files in Shell Script

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql; any query that is missing from B.sql should be written from A.sql to newfile.sql.
Below is the Example
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Expected output
Select * from student;
Select * from subject;
Actual output
Select * from dept;
Select * from student;
Select * from subject;
Below is my script
while read -rd ';' i_sql
do
flag=0
while read -rd ';' e_sql
do
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
Read each SQL query up to the semicolon from A.sql into i_sql:
while read -rd ';' i_sql
Read each SQL query up to the semicolon from B.sql into e_sql:
while read -rd ';' e_sql
Below I compare i_sql with e_sql. If they are equal, the else branch sets flag=1 and breaks so that no further statements are compared. If they are not equal, I set flag=0. After the inner while loop, I write the query that is not present in B.sql to newfile.sql.
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
Below I write each query that is present in A.sql but not in B.sql to newfile.sql.
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
Can anyone please help with the above issue and let me know what is wrong.
Note: a single query in my real files does not fit on one line; it usually spans four or five lines. The one-line queries above are only an example.
That is why I read up to the semicolon in a while loop, store each query in a variable, and use that variable for the comparison.
Thanks in advance!!!
I assume that in your input files each query occupies exactly one line. You did not say so explicitly, but your example suggests it. In that case, you can treat B.sql as a list of literal patterns and ask grep which lines of A.sql match none of them:
grep -F -f B.sql -v A.sql
-F treats the patterns as literal strings, -f tells grep which file to read the patterns from, and -v reports only the lines that match none of the patterns.
Your logic seems correct, but you need to handle more details, such as differences in letter case and in whitespace between words (for example, the same query may have a space before the semicolon in one file and none in the other).
'Select * from dept;' may appear in the result because of such a whitespace difference.
As suggested in the comments, it is better to use a command-line diff tool than to write the logic yourself. You can explore diff, vimdiff, git diff, and similar tools.
This can be achieved through awk:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql > A.sql.tmp
B.sql is processed first (FNR==NR), building an array indexed by its lines. While A.sql is processed (FNR!=NR), a line with no entry in the map array is appended to newfile.sql; any other line is printed to standard output, which the redirection sends to A.sql.tmp.
You can then replace A.sql with that output:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql > A.sql.tmp && mv -f A.sql.tmp A.sql
The problem is on this line:
if [ !$flag ]
which is always true because !1 and !0 are both non-empty strings.
What you need is:
if [ $flag = 0 ]
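To see why the original test misfires: [ WORD ] with a single argument succeeds whenever WORD is non-empty, and in a script !$flag expands to the one-word string !0 or !1. A small sketch of the difference follows; the helper names is_missing_buggy and is_missing are hypothetical.

```shell
# Buggy form: "!$flag" expands to the string "!0" or "!1",
# and [ STRING ] succeeds for any non-empty string.
is_missing_buggy() {
    local flag=$1
    [ !$flag ]
}

# Fixed form: compare the value of the flag explicitly.
is_missing() {
    local flag=$1
    [ "$flag" -eq 0 ]
}

for flag in 0 1; do
    if is_missing_buggy "$flag"; then
        echo "buggy: flag=$flag -> query written to newfile.sql"
    fi
    if is_missing "$flag"; then
        echo "fixed: flag=$flag -> query written to newfile.sql"
    fi
done
```

Running it prints the buggy line for both values of flag, but the fixed line only for flag=0, which is why every query from A.sql after the first mismatch ended up in newfile.sql.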

Issue with subqueries in Hive

I am trying to run a subquery in Hive from bash, but the compiler says it cannot recognize the subquery within the query. Any ideas?
#!/bin/bash
echo "Hello world"
#####################################################################
#This line will connect to the database and execute the query in Hive
####################################################################
var1=$(beeline --showHeader=false --outputformat=tsv2 -u "jdbc:hive2:XXXXXXXXX" <<EOF
select $2 from $3.$1 where length($2)=(select max(length($2)) from $3.$1) limit 1;
EOF
)
#####################################################################
#This will output the result of the query
####################################################################
echo "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
echo "We are currently analyzing Table:$1 and Column:$2"
echo "The value wth a maximum length for $1 is $var1"
echo "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
I am afraid your query will not work in Hive because of the way the subquery is written; you will have to rewrite it.
Try the following query to get the $2 value with the maximum length:
select $2 from (select max(length($2)) as length_2, $2 from $3.$1 group by $2 order by length_2 desc) a limit 1;
Also, you can execute the query using the -e option, as @mazaneicha has mentioned.
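As a rough sketch of the -e variant, the query text can be built in a shell function first and then handed to beeline. build_max_length_query is a hypothetical helper; its argument order (table, column, database) mirrors $1, $2, and $3 in the question, and the JDBC URL stays elided as in the original.

```shell
# Print the rewritten query for table $1, column $2, database $3.
build_max_length_query() {
    local table=$1 column=$2 database=$3
    printf 'select %s from (select max(length(%s)) as length_2, %s from %s.%s group by %s order by length_2 desc) a limit 1;' \
        "$column" "$column" "$column" "$database" "$table" "$column"
}

# Hypothetical usage inside the original script (not executed here):
# var1=$(beeline --showHeader=false --outputformat=tsv2 \
#     -u "jdbc:hive2:XXXXXXXXX" \
#     -e "$(build_max_length_query "$1" "$2" "$3")")
```

Building the string separately makes it easy to echo the query and inspect it before it is sent to Hive.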

psql shortcut for frequently used queries? (like Unix "alias")

Is it possible to somehow create aliases (like Unix alias command) in psql?
I mean, not SQL FUNCTION, but local aliases to ease manual queries?
I don't know of any such feature. There is only a workaround based on psql variables, and it has many limitations; using parameters with such queries is difficult.
postgres=# \set whoami 'SELECT CURRENT_USER;'
postgres=# :whoami
current_user
--------------
pavel
(1 row)
Pavel's answer is almost correct, except that you can use parameters in another way.
After
\set s 'select * from '
\set l ' limit 10;'
the following command
:s agent :l
is equivalent to
select * from agent limit 10;
According to http://www.postgresql.org/docs/9.0/static/app-psql.html
If an unquoted argument begins with a colon (:), it is taken as a psql
variable and the value of the variable is used as the argument
instead. If the variable name is surrounded by single quotes (e.g.
:'var'), it will be escaped as an SQL literal and the result will be
used as the argument. If the variable name is surrounded by double
quotes, it will be escaped as an SQL identifier and the result will be
used as the argument.
You can also use backquotes to run a shell command:
Arguments that are enclosed in backquotes (`) are taken as a command
line that is passed to the shell. The output of the command (with any
trailing newline removed) is taken as the argument value. The above
escape sequences also apply in backquotes.
How about using UDFs? You can create a UDF that returns a table (a set of rows) and then query it like this: select * from udf();
It is not as clean, but it is better than nothing and it is portable. UDFs can take parameters too.
Why not use a view? Views may help in your case.
This might help if you need to run frequent queries from the command line (not from the psql CLI).
Add this to .bash_profile or .bashrc:
POSTGRES_BIN=~/Postgres/bin
B_RED='\033[1;31m'
RESET='\033[0m'
psqlcommand="$POSTGRES_BIN/psql -U vignesh usersdb -q -c"
function psqlselectrows()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
        $psqlcommand "SELECT * from $1"
}
The above command selects rows from the table passed as the argument.
Note:
Change the database name as required.
The default schema is public. To use another default schema, add the following line to the ~/.psqlrc file.
SET SEARCH_PATH TO <schema_name>;
If the database is password protected, refer to this and make use of the secure method.
I have made some commands for my own use, in case they help.
psqlselectrows - To select rows from a table
psqlgettablecount - To get row count of a table
psqltruncatetable - To truncate a table, on prompt
psqlgettablesize - To get the size of a table
psqlgetvacuumdetails - To get vacuum details of a table
psqlsettings - To get default and modified settings configured for Postgres.
(All of the above commands except psqlsettings take the table name as the first argument; psqlsettings takes an optional settings category.)
#Colors
B_RED='\033[1;31m'
B_GREEN='\033[1;32m'
B_YELLOW='\033[1;33m'
RESET='\033[0m'
#Postgres Command With Params
psqlcommand="$POSTGRES_BIN/psql -U vignesh usersdb -q -c"
function psqlgettablesize()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
        $psqlcommand "select pg_size_pretty(pg_total_relation_size('$1')) as total_table_size, pg_size_pretty(pg_relation_size('$1')) as table_size, pg_size_pretty(pg_indexes_size('$1')) as index_size;";
}
function psqlgettablecount()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
        $psqlcommand "select count(*) from $1;"
}
function psqlgetvacuumdetails()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
        $psqlcommand "SELECT relname, n_live_tup, n_dead_tup, last_analyze::timestamp, analyze_count, last_autoanalyze::timestamp, autoanalyze_count, last_vacuum::timestamp, vacuum_count, last_autovacuum::timestamp, autovacuum_count FROM pg_stat_user_tables where relname='$1' and schemaname = current_schema();"
}
function psqltruncatetable()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
    {
        read -p "$(echo -e ${B_YELLOW}"Are you sure to truncate table '$1' (y/n)? "${RESET})" choice
        case "$choice" in
            y|Y ) $psqlcommand "TRUNCATE $1;";;
            n|N ) echo -e "${B_GREEN}Table '$1' not truncated${RESET}";;
            * ) echo -e "${B_RED}Invalid option${RESET}";;
        esac
    }
}
function psqlsettings()
{
    query="select * from pg_settings"
    if [ "$1" != "" ]; then
        query="$query where category like '%$1%'"
    fi
    query="$query ;"
    $psqlcommand "$query"
    if [ -z "$1" ]; then
        echo -e "${B_YELLOW}Passing Category as first argument will filter the related settings.${RESET}"
    fi
}
function psqlselectrows()
{
    [ -z "$1" ] && echo -e "${B_RED}Argument 1 missing: Need table name${RESET}" ||
        $psqlcommand "SELECT * from $1"
}