Awk/sed: extract information when a pattern matches within a paragraph

I want to search for the pattern "FROM" in each paragraph that begins with CREATE VIEW and ends with ";", and save the result in a CSV file. For example, if I have the following file:
CREATE VIEW view1
AS something
FROM table1 ,table2 as A, table3 (something FROM table4)
FROM table5, table6
USING file1
;
CREATE VIEW view2
FROM table1 ,table2 ,table6 ,table4
something
something
FROM table5 ,table7 (something FROM table4 ,table5(this is something FROM table8)
USING file2
;
I would like to have the following result:
view1;table1
view1;table2
view1;table3
view1;table4
view1;table5
view1;table6
view2;table1
view2;table2
view2;table6
view2;table4
view2;table5
view2;table7
view2;table4
view2;table5
view2;table8

I won't pretend to know the syntax of whatever follows FROM in your input file, so here's how to identify the view and split the FROM lines at commas; you can take it from there:
$ cat tst.awk
BEGIN { FS="[[:space:]]*,[[:space:]]*"; OFS=";" }
sub(/^CREATE VIEW[[:space:]]+/,"") { view = $0 }
sub(/^FROM[[:space:]]+/,"") {
for (i=1;i<=NF;i++) {
print view, $i
}
}
$ awk -f tst.awk file
view1;table1
view1;table2 as A
view1;table3 (something FROM table4)
view1;table5
view1;table6
view2;table1
view2;table2
view2;table6
view2;table4
view2;table5
view2;table7 (something FROM table4
view2;table5(this is something FROM table8)

Related

Struggling with an awk script: need a suggestion or the logic to get this done

I have a SQL file that I need to filter:
-- Edit this file by adding your SQL below each question.
-------------------------------------------------------------------------------
-------------------------------------------------------------
-- The following queries are based on the 1994 census data.
-------------------------------------------------------------
.read 1994-census-summary-1.sql
-- 4. what is the average age of people from China?
select avg(age)
from census
where native_country ='China';
-- 5. what is the average age of people from Taiwan?
select avg(age)
from census
where native_country ='Taiwan';
-- 6. which native countries have "land" in their name?
select distinct(native_country)
from census
where native_country like '%land%';
--------------------------------------------------------------------------------------
-- The following queries are based on the courses-ddl.sql and courses-small.sql data
--------------------------------------------------------------------------------------
drop table census;
.read courses-ddl.sql
.read courses-small-1.sql
-- 11. what are the names of all students who have taken some course? Don't show duplicates.
select distinct(name)
from student
where tot_cred > 0;
-- 12. what are the names of departments that offer 4-credit courses? Don't list duplicates.
select distinct(dept_name)
from course
where credits=4;
-- 13. What are the names and IDs of all students who have received an A in a computer science class?
select distinct(name), id
from student natural join takes natural join course
where dept_name="Comp. Sci." and grade="A";
If I run
./script.awk -v ID=6 file.sql
Note that the problem id is passed to the awk script as variable ID on the command line, like this:
-v ID=6
How can I get a result like this?
Result:
select distinct(native_country) from census where native_country like '%land%';
With your shown samples, please try the following GNU awk code, which uses the three-argument form of its match function. Here id is an awk variable holding the question number you want to look for in your Input_file. I have used exit to print only the very first match and leave the program early; if you expect more than one match, simply remove it from the code.
awk -v RS= -v id="6" '
match($0,/(\n|^)-- ([0-9]+)\.[^\n]*\n(select[^;]*;)/,arr) && arr[2]==id{
gsub(/\n/,"",arr[3])
print arr[3]
exit
}
' Input_file
One option with awk is to match the start of a line against -- 6. where 6 is the ID.
When that line is found, set a variable (seen) to mark that the wanted part has started, and move on to the next line.
While seen is set, collect every line.
Set seen to 0 when encountering an "empty" line.
Concatenate the collected lines into a single line, and at the end remove the trailing space.
gawk -v ID=6 '
match($0, "^-- "ID"\\.") {
seen=1
next
}
/^[[:space:]]*$/ {
seen=0
}
seen {
a = a $0 " "
}
END {
sub(/ $/, "", a)
print a
}
' file.sql
Or as a single line
gawk -v ID=6 'match($0,"^-- "ID"\\."){seen=1;next};/^[[:space:]]*$/{seen=0};seen{a=a$0" "};END{sub(/ $/,"",a);print a}' file.sql
Output
select distinct(native_country) from census where native_country like '%land%';
Another option with GNU awk: set the record separator to an "empty" line and use a regex with a capture group to match all lines after the initial -- ID match that do not start with whitespace:
gawk -v ID=6 '
match($0, "\\n-- "ID"\\.[^\\n]*\\n(([^[:space:]][^\\n]*(\\n|$))*)", m) {
gsub(/\n/, " ", m[1])
print m[1]
}
' RS='^[[:space:]]*$' file
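If GNU awk is not available, or the queries are not separated by blank lines, a portable variant can stop at the line that ends with the terminating semicolon instead. This is only a sketch: it assumes the comment line has the form "-- ID. ..." and that the query ends at the first line finishing with ";". The function name query_by_id is hypothetical.

```shell
# Hypothetical helper: print the query that follows the "-- ID." comment.
# Usage: query_by_id 6 file.sql
query_by_id() {
    id=$1
    shift
    awk -v ID="$id" '
    $0 ~ ("^-- " ID "\\.") { seen = 1; next }     # comment line for this ID
    seen {
        out = (out == "" ? "" : out " ") $0       # join the query lines with spaces
        if ($0 ~ /;[ \t]*$/) { print out; exit }  # stop at the terminating ";"
    }' "$@"
}
```

Because the pattern requires a dot right after the ID, asking for ID 1 does not match the comments for questions 11, 12 or 13.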

How to Compare Two SQL Files in Shell Script

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql; any query that is missing from B.sql should be written to newfile.sql.
Below is an example:
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Expected output
Select * from student;
Select * from subject;
Output I am getting
Select * from dept;
Select * from student;
Select * from subject;
Below is my script
while read -rd ';' i_sql
do
flag=0
while read -rd ';' e_sql
do
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
Read each SQL query from A.sql, up to the semicolon, and store it in i_sql:
while read -rd ';' i_sql
Read each SQL query from B.sql, up to the semicolon, and store it in e_sql:
while read -rd ';' e_sql
Below I compare i_sql and e_sql. If they are equal, I go to the else branch, set flag=1 and break so that the query is not compared with the remaining statements. If they are not equal, I set flag=0. Later, outside the inner while loop, I write the query that is not present in B.sql to newfile.sql.
if [ "$i_sql" != "$e_sql" ];
then
flag=0
else
flag=1
break
fi
done < B.sql
Below I write to newfile.sql each SQL query that is present in A.sql but not in B.sql.
if [ !$flag ]
then
echo "$i_sql">>newfile.sql
fi
done < A.sql
Can anyone please help with the above issue and let me know what is wrong.
Note: in my real files a single SQL query does not fit on one line; each one spans 4-5 lines or more. I used single-line queries only for the example.
Because of that, I read each query up to the semicolon in a while loop, store it in a variable, and use that variable for the comparison.
Thanks in advance!!!
I assume that in your input files each query occupies exactly one line. You did not say this explicitly, but your example suggests it. In that case, you can treat B.sql as a list of literal patterns and ask grep which lines of A.sql match none of them:
grep -F -f B.sql -v A.sql
-F treats the patterns as literal strings, -f names the file to read the patterns from, and -v reports the lines that match none of the patterns.
Your logic seems correct, but you need to take care of more details, such as differences in letter case and in whitespace between words (for example, the same query may have a space before the semicolon in one file and none in the other).
The reason 'Select * from dept;' appears in the result may be such a whitespace difference.
As suggested in the comments, it is better to use a command-line diff tool than to write the logic yourself. You can explore diff, vimdiff, git diff and so on.
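To make the whitespace point concrete, here is a minimal awk sketch that treats everything up to a semicolon as one statement (so multi-line queries work) and collapses runs of whitespace before comparing. It assumes no semicolons inside string literals or comments, and it does not fold letter case, since that would also change quoted values. The function name missing_queries is hypothetical.

```shell
# Hypothetical helper: print statements found in the first file but not in the second.
# Usage: missing_queries A.sql B.sql > newfile.sql
missing_queries() {
    awk '
    BEGIN { RS = ";" }                      # one record per statement
    {
        q = $0
        gsub(/[ \t\r\n]+/, " ", q)          # collapse runs of whitespace
        sub(/^ /, "", q); sub(/ $/, "", q)  # trim both ends
    }
    q == ""      { next }                   # skip the empty tail after the last ";"
    NR == FNR    { seen[q]; next }          # first file read: B.sql
    !(q in seen) { print q ";" }            # statement from A.sql that B.sql lacks
    ' "$2" "$1"
}
```

The statements are printed on one line each, with the whitespace already normalized.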
This can be achieved through awk:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql > A.sql.tmp
B.sql is processed first (NR==FNR), building an array indexed by its lines. While processing A.sql (FNR!=NR), any line with no entry in the map array is appended to newfile.sql; every other line is printed to standard output, which is redirected to A.sql.tmp here.
You can then move that output back over A.sql in the same command:
awk 'FNR==NR { map[$0]=1 } FNR!=NR && map[$0]!=1 { print $0>> "newfile.sql";close("newfile.sql") } FNR!=NR && map[$0]==1 { print }' B.sql A.sql > A.sql.tmp && mv -f A.sql.tmp A.sql
The problem is on this line:
if [ !$flag ]
which always yields true because !1 and !0 are non-empty strings.
What you need is:
if [ "$flag" = 0 ]
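A small demonstration of the difference, for both values of the flag. The function name check_flag and the variables original and corrected are made up for this sketch.

```shell
# Hypothetical demo: run the original test and the corrected one side by side.
check_flag() {
    flag=$1
    # One-argument test: true for any non-empty string, and "!0" / "!1" are both non-empty.
    if [ !$flag ]; then original=true; else original=false; fi
    # String comparison against 0: true only when the flag really is 0.
    if [ "$flag" = 0 ]; then corrected=true; else corrected=false; fi
    echo "flag=$flag original=$original corrected=$corrected"
}

check_flag 0   # prints: flag=0 original=true corrected=true
check_flag 1   # prints: flag=1 original=true corrected=false
```

With the original test, every query from A.sql is written to newfile.sql whether or not it was found in B.sql.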

How to Compare Two Arrays in Shell Script

I have two SQL files, A.sql and B.sql. I need to check whether each query in A.sql is also present in B.sql, and print the queries that are in A.sql but not in B.sql. So I store each SQL query, up to its semicolon, as one array element, then compare the two arrays and print the difference.
Below is an example:
A.sql
Select * from emp;
Select * from dept;
Select * from student;
Select * from subject;
B.sql
Select * from emp;
Select * from dept;
Select * from student;
Expected output
Select * from subject;
Output I am getting
Select * from emp;
Below is my script
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
for p in "${first_array[#]}"; do
flag=false
for q in "${second_array[#]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
First I read the first SQL file, A.sql, treating everything up to a semicolon as one query, and store each query in an array.
i=0
while read -rd ';' first_sql
do
first_array[$i]=$first_sql
i=$((i+1))
done < A.sql
Then I read the second SQL file, B.sql, the same way and store each query in a second array.
j=0
while read -rd ';'second_sql
do
second_array[$j]=$second_sql
j=$((j+1))
done < B.sql
Now I compare first_array with second_array and, outside the inner for loop, print the entries that are present in first_array but not in second_array.
for p in "${first_array[#]}"; do
flag=false
for q in "${second_array[#]}"; do
if [[ $p == $q ]]; then
echo "$p is in first_array"
flag=true
break
fi
done
echo $flag
echo "$p is not in first_array"
done
Can anyone please help with the above issue and let me know what is wrong.
Note: in my real files a single SQL query does not fit on one line; each one spans 4-5 lines or more. I used single-line queries only for the example.
Because of that, I read each query up to the semicolon in a while loop, store it in an array, and then compare the two arrays to print the unmatched contents.
Thanks in advance!!!
You can do it with a one-liner that uses comm to print just the statements that appear only in the first file, with a bit of pre- and post-processing to account for multi-line SQL statements:
$ comm -z -23 <(perl -0777 -pe 's/;\n/;\x{0}/g' a.sql | sort -z) \
<(perl -0777 -pe 's/;\n/;\x{0}/g' b.sql | sort -z) \
| tr "\0" "\n"
Select * from subject;
(This does assume a GNU userland; other versions of comm and sort might not take the -z option).
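If you would rather keep the array approach from the question, here is a sketch of a corrected version in bash. As far as I can tell the script in the question has three problems: "${first_array[#]}" does not expand to all elements ([@] does), the second read is missing the space between ';' and second_sql so that variable is never set, and the final echo runs whether or not a match was found. The function name only_in_first is made up, and the sketch assumes every statement ends with a semicolon.

```shell
# bash only: arrays and read -d are not POSIX sh.
# Hypothetical helper: print statements present in file $1 but not in file $2.
only_in_first() {
    local first_array=() second_array=() p q found
    # read strips leading and trailing whitespace, so the newline after each ";" is dropped
    while read -r -d ';' p; do first_array+=("$p"); done < "$1"
    while read -r -d ';' q; do second_array+=("$q"); done < "$2"
    for p in "${first_array[@]}"; do
        found=false
        for q in "${second_array[@]}"; do
            # quote the right-hand side so "*" is compared literally, not as a glob
            if [[ $p == "$q" ]]; then found=true; break; fi
        done
        if [[ $found == false ]]; then printf '%s;\n' "$p"; fi
    done
}
```

Called as only_in_first A.sql B.sql on the example files, it prints Select * from subject; and nothing else.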

generate SQL queries out of CSV table

Good morning,
I have a CSV file whose first line contains the column names of my table; the remaining lines are data. Something like this:
FIELD1,FIELD2,FIELD3
data1,data2,data3
data1,data2,data3
I have been trying to write a reusable script that returns the following output:
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
INSERT INTO tablename (FIELD1,FIELD2,FIELD3) VALUES
(data1,data2,data3)
That's what I have so far but it does not return the correct output.
firstline=$(printf '%s\n' 1p d wq | ed -s file.csv )
cat file.csv | while read line
do
field1=$(echo "$line" | cut -d "," -f1)
field2=$(echo "$line" | cut -d "," -f2)
field3=$(echo "$line" | cut -d "," -f3)
echo "INSERT INTO tablename ($firstline) VALUES ($fields1 $field2 $field3) ">prova.csv
done
The output I get is:
) VALUES ( 15blename (data1,1,1
I am not sure I can use the variable $firstline inside the while loop, but I don't understand why it doesn't print the INSERT INTO and the correct parentheses.
Thanks in advance.
EDIT:
I have a new problem: SQL Assistant does not allow me to insert values that are not enclosed in single quotes ('), so my question is how to edit the script to make the output look like this:
INSERT INTO tablename (columns) VALUES ('data1','data2','data3')
Thanks
Using awk:
awk 'NR==1{x=$0;next} {printf "INSERT INTO tablename (%s) VALUES (%s)\n",x,$0}' file
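For the quoting asked about in the EDIT, a possible extension is to build the value list field by field. This is a sketch under a few assumptions: the CSV is simple (no quoted fields containing commas), every value may be treated as a string, and the file may have Windows line endings, which would be consistent with the scrambled output shown in the question. The function name csv_to_inserts is hypothetical.

```shell
# Hypothetical helper: emit one INSERT per data row, quoting every value.
# Usage: csv_to_inserts file.csv
csv_to_inserts() {
    awk -F',' -v q="'" '
    { sub(/\r$/, "") }                     # drop a trailing carriage return, if any
    NR == 1 { cols = $0; next }            # header row supplies the column list
    {
        vals = ""
        for (i = 1; i <= NF; i++) {
            v = $i
            gsub(q, q q, v)                # double embedded single quotes
            vals = vals (i > 1 ? "," : "") q v q
        }
        printf "INSERT INTO tablename (%s) VALUES (%s)\n", cols, vals
    }' "$@"
}
```

The single quote is passed in as the awk variable q so that it does not have to be escaped inside the single-quoted awk program.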

Separate text and pass it to a SQL

I'm using the latest Debian version.
I have this file:
2301,XT_ARTICLES
2101,XT_HOUSE_PHOTOS
301,XT_PDF
101611,XT_FIJOS
I want to split each line so I can put the ID and the name into a SQL statement. The statement must be repeated once for every line in the file, but I don't know how to do it.
Can anybody help me, please?
Does this fit your needs?
awk -F',' '{print "INSERT INTO foobar VALUES("$1", \047"$2"\047);"}' file.txt
INSERT INTO foobar VALUES(2301, 'XT_ARTICLES');
INSERT INTO foobar VALUES(2101, 'XT_HOUSE_PHOTOS');
INSERT INTO foobar VALUES(301, 'XT_PDF');
INSERT INTO foobar VALUES(101611, 'XT_FIJOS');
If that's OK, just pipe it into MySQL:
awk -F',' '
BEGIN{
print "USE qux;"
}
{
print "INSERT INTO foobar VALUES("$1", \047"$2"\047);"
}' file.txt | mysql