bash cut and paste SQL insert statement

I would like to remove a column-name/value pair from one INSERT statement and move it into another INSERT statement. I have about a hundred separate files in this sort of format (although the format may vary slightly from file to file; for instance, some users may have put the entire INSERT statement on one line).
INPUT
INSERT INTO table1 (
col1,
col2
)
VALUES (
foo,
bar
);
INSERT INTO table2 (
col3,
col4_move_this_one,
col5
)
VALUES (
john,
doe_move_this_value,
doe
);
OUTPUT
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
In general, with the above format, I was thinking I could use sed and cat in a script to find the line number of each line to be moved and then move it, something like this:
for file in *; do
    line_number=$(cat -n "${file}" | sed some_statement | awk to_get_line_number)
    # etc...
done
...but maybe you can recommend a cleverer way that would also work if the INSERT statement is on one line.

With GNU awk for true multi-dimensional arrays, 3rd arg to match(), multi-char RS and \s/\S syntactic sugar:
$ cat tst.awk
BEGIN { RS="\\s*);\\s*" }               # one record per INSERT statement
match($0,/(\S+\s+){2}([^(]+)[(]([^)]+)[)][^(]+[(]([^)]+)/,a) {
    # a[2]=table name, a[3]=column list, a[4]=value list
    for (i in a) {
        gsub(/^\s*|\s*$/,"",a[i])       # trim leading/trailing whitespace
        gsub(/\s*\n\s*/,"",a[i])        # remove whitespace around embedded newlines
    }
    tables[NR] = a[2]
    names[NR][1];  split(a[3],names[NR],/,/)
    values[NR][1]; split(a[4],values[NR],/,/)
}
END {
    # shift col2 down and move col4_move_this_one from table2 into table1
    names[1][3] = names[1][2]
    names[1][2] = names[2][2]
    names[2][2] = names[2][3]
    delete names[2][3]

    # same move for the VALUES lists
    values[1][3] = values[1][2]
    values[1][2] = values[2][2]
    values[2][2] = values[2][3]
    delete values[2][3]

    for (tableNr=1; tableNr<=NR; tableNr++) {
        printf "INSERT INTO %s (\n", tables[tableNr]
        cnt = length(names[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " names[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ")"
        print "VALUES ("
        cnt = length(values[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " values[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ");\n"
    }
}
$ awk -f tst.awk file
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
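Because records are split on ); and the fields are pulled out with a single layout-agnostic match(), the same script should also cope with the one-line variant mentioned in the question (give or take some spacing in the output, since spaces after commas within a line aren't trimmed). To run it over the ~100 files, a minimal sketch along the lines of the loop in the question (assuming in-place editing is acceptable and tmp is a scratch file name) could be:
for file in *; do
    awk -f tst.awk "$file" > tmp && mv tmp "$file"
done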

A GAWK version that relies on gensub()'s backreference feature and leans heavily on regex:
$ cat > test.awk
BEGIN {
    RS=" *) *; *"       # set RS to ");", allowing surrounding spaces
    ORS=");\n"
}
{
    sub(/^[ \n]*/,"")   # strip leading whitespace before the second INSERT
}
$0 ~ /^INSERT/ && NR==1 {
    a=$0                # store the first INSERT
}
$0 ~ /^INSERT/ && NR==2 { # store the second and use gensub to
    b=$0                  # find the second variables in INSERT and VALUES
    split(gensub(/(INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1}[ \n]*([^,]*)[^\)]*\)*[ \n]*/,"\\3 ","g"),c," ")
}
END { # print the first INSERT with the second variables in place
      # and the second INSERT with those variables removed
    print gensub(/((INSERT|VALUES)[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\4"c[++i]",\\5","g",a)
    print gensub(/((INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1})[ \n]*[^,]*,/,"\\1 ","g",b)
}
This solution assumes that the variables to copy are the second variables after the keywords INSERT and VALUES in the second INSERT, and that they are added in those same places in the first INSERT. The solution is space- and newline-friendly but doesn't support tabs; that should be easy to fix.
$ awk -f test.awk file
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
col4_move_this_one,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
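On the tab limitation: a hedged fix, assuming POSIX character classes are available in your gawk, would be to widen every [ \n] bracket expression to [[:space:]] (in RS, the sub(), and the gensub() patterns alike), for example:
BEGIN {
    RS="[[:space:]]*[)][[:space:]]*;[[:space:]]*"   # also tolerate tabs around ");"
    ORS=");\n"
}
{
    sub(/^[[:space:]]*/,"")   # strip any leading whitespace, tabs included
}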

Related

awk (or sed, or something else) - reference match on a previous line and add a line

I want to add a procedure call to the first line of every single stored procedure in a MySQL database (more than 50 stored procedures) that passes the executing routine's name to the new call.
There's no easy way to do this from within MySQL that I can work out, so I thought I'd give it a go by exporting the routines using mysqldump, editing the file with awk/sed/something, and then re-creating the SPs. It's proved trickier than I expected.
This involves finding the function name from the CREATE ... PROCEDURE line, and then adding the CALL NewFunc(currentFuncName) line after the next BEGIN statement. Importantly, the next BEGIN statement isn't always the same number of lines after the CREATE.
I'm part-way there, I just can't work out how to tie up the two bits. I can get the function name with this:
awk '/CREATE.*PROCEDURE/ {gsub(/^.*PROCEDURE `|`\(.*$/,"");print "call newFuncHere(" $1 ");"}'
How do I get awk to add the result of this after the next BEGIN statement?
Example source:
CREATE DEFINER=`blah@%` PROCEDURE `theProcedure`(`param1` BIGINT(21) UNSIGNED,
theExisting BIGINT(21) UNSIGNED,
INOUT `myStatus` VARCHAR(255) )
SQL SECURITY INVOKER
BEGIN
DECLARE blah VARCHAR(32) DEFAULT NULL;
Example output:
CREATE DEFINER=`blah@%` PROCEDURE `theProcedure`(`param1` BIGINT(21) UNSIGNED,
theExisting BIGINT(21) UNSIGNED,
INOUT `myStatus` VARCHAR(255) )
SQL SECURITY INVOKER
BEGIN
CALL newFuncHere(theProcedure);
DECLARE blah VARCHAR(32) DEFAULT NULL;
Indentation is not relevant - just shown for example
Edited to add: I forgot about the requirement that DECLARE statements must come first, so this is what I've ended up with...
# Initialise variables
BEGIN { seen=0; beginCount=-1; createCount=-1; done=0; }
# Ignore blank lines but still print them out
/^$|^\t$/ { print $0 ; next ;}
# If matches CREATE.*PROCEDURE|FUNCTION - increase createCount and set name of function/beginCount variables
/^CREATE/{createCount++} ($5~/PROCEDURE/ || $5~/FUNCTION/) { name=$6 ; beginCount=-1 }
# If BEGIN is found - increase beginCount - this is to count the depth of BEGIN statements
/BEGIN/ {beginCount++}
# Increment seen variable if DECLARE is in this line
/DECLARE/{seen++}
# If beginCount is 0 we're at BEGIN level 0 (it starts at -1), done is unset
# (we've not yet inserted anything), the CREATE depth is 0 (also starts at -1),
# we've seen at least one DECLARE line, and this line does NOT contain DECLARE:
# print the new line we want, followed by a newline and the current line,
# then reset some variables.
!done && !beginCount && !createCount && seen && !/DECLARE/{ print "CALL newFunction('" name "');" ORS $0 ; seen=0;count++; done=1 ; next;}
#{print createCount seen beginCount done}
# If END is seen and we're inside a nested BEGIN statement, decrement beginCount
# print out the line and skip to the next line
/END.*;/ && beginCount {beginCount--; print $0 ; next ;}
# If END is seen and we're at the top level, reset some counters
/END.*;/ && !beginCount {createCount=-1 ; beginCount=-1 ; seen=0; done=0; }
# print out the line
{ print $0 ;}
It's not perfect: it needs work to handle END IF statements, it doesn't handle situations where the DECLARE statement covers multiple lines, there are a few other cases it doesn't cope with, and there's some redundant stuff in there from previous experiments - but it's good enough for my needs right now. One possible starting point for the END IF problem is sketched below.
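For what it's worth, a hedged, untested sketch of that starting point: a pass-through rule placed above the two END rules, so that block terminators other than a bare END never touch beginCount:
# Let "END IF;", "END WHILE;", "END LOOP;" etc. through unchanged -
# only a bare END should close a BEGIN block and decrement beginCount
/END[ \t]+(IF|WHILE|LOOP|REPEAT|CASE)/ { print $0 ; next }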
Assumptions:
current procedure name is field #4 in a backtick-delimited line
the new line is to be indented the same as the BEGIN plus 3 more characters (eg, total of 4 characters for the example)
Sample input:
$ cat ddl.sql
CREATE DEFINER=`blah@%` PROCEDURE `theProcedure`(`param1` BIGINT(21) UNSIGNED,
theExisting BIGINT(21) UNSIGNED,
INOUT `myStatus` VARCHAR(255) )
SQL SECURITY INVOKER
BEGIN
DECLARE blah VARCHAR(32) DEFAULT NULL;
BEGIN
....
....
..
BEGIN
BEGIN
END
One awk idea:
awk -F'`' '
procname && /BEGIN/ { print
                      n=index($0,"BEGIN")+3                                 # find indentation of "BEGIN" and +3
                      printf "%*scall newFuncHere(%s);\n", n, "", procname  # indent by "n" spaces
                      procname=""
                      next
                    }
/CREATE.*PROCEDURE/ { procname=$4 }
1
' ddl.sql
This generates:
CREATE DEFINER=`blah@%` PROCEDURE `theProcedure`(`param1` BIGINT(21) UNSIGNED,
theExisting BIGINT(21) UNSIGNED,
INOUT `myStatus` VARCHAR(255) )
SQL SECURITY INVOKER
BEGIN
call newFuncHere(theProcedure);
DECLARE blah VARCHAR(32) DEFAULT NULL;
BEGIN
....
....
..
BEGIN
BEGIN
END
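A note on the printf: the %*s conversion takes its field width from the next argument (n here) and pads an empty string to that width, which is what produces the indentation under BEGIN plus 3. A standalone illustration:
$ awk 'BEGIN { n=7; printf "%*scall newFuncHere(theProcedure);\n", n, "" }'
       call newFuncHere(theProcedure);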
A slight variation on @markp-fuso's answer: it relies on PROCEDURE appearing as the 3rd field in the line containing the function name as the 4th field, and then on BEGIN appearing at the start of the line after which you want to insert the new function call:
awk -F'`' '
/^CREATE/ && $3~/PROCEDURE/ { name=$4 }
/^BEGIN/ { print $0 ORS " CALL newFuncHere(" name ");"; next }
1
' file
There are 3 rules above:
The first conditioned upon /^CREATE/ && $3~/PROCEDURE/ simply saves the function name as name;
The second conditioned upon /^BEGIN/ simply outputs the current record and the new function call and skips to the next record (note: this presumes there will always be a function name before each BEGIN line -- if not, set and unset name and add name as a condition, as in the sketch after this list);
The final rule 1 is simply shorthand for the default print command that outputs the current record unchanged.
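That parenthetical note, sketched out (hypothetical, assuming each BEGIN of interest is preceded by its CREATE line):
awk -F'`' '
/^CREATE/ && $3~/PROCEDURE/ { name=$4 }
name && /^BEGIN/ { print $0 ORS "    CALL newFuncHere(" name ");"; name=""; next }
1
' file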
Example Use/Output
With your sample data in the file file, you could simply copy/middle-mouse paste the above into an xterm with file in the present working directory and receive:
$ awk -F'`' '
> /^CREATE/ && $3~/PROCEDURE/ { name=$4 }
> /^BEGIN/ { print $0 ORS " CALL newFuncHere(" name ");"; next }
> 1
> ' file
CREATE DEFINER=`blah@%` PROCEDURE `theProcedure`(`param1` BIGINT(21) UNSIGNED,
theExisting BIGINT(21) UNSIGNED,
INOUT `myStatus` VARCHAR(255) )
SQL SECURITY INVOKER
BEGIN
CALL newFuncHere(theProcedure);
DECLARE blah VARCHAR(32) DEFAULT NULL;
This just simplifies the syntax a bit.

match a column and return some other column like sql

How do I match the corpus file against the second column in the stem file and return the first column?
corpus.txt
this
is
broken
testing
as
told
Only the first 2 columns are important in this file:
stem.csv
"test";"tested";"test";"Suffix";"A";"7673";"321: 0 xxx"
"test";"testing";"test";"Suffix";"A";"7673";"322: 0 xxx"
"test";"tests";"test";"Suffix";"b";"5942";"001: 0 xxx"
"break";"broke";"break";"Suffix";"b";"5942";"002: 0 xxx"
"break";"broken";"break";"Suffix";"b";"5942";"003: 0 xxx"
"break";"breaks";"break";"Suffix";"c";"5778";"001: 0 xxx"
"tell";"told";"tell";"Suffix";"c";"5778";"002: 0 xx"
If the word is missing in the stem file, it should be replaced with XXX
expected.txt
XXX
XXX
break
test
XXX
tell
It can be done using SQL queries like this...
CREATE TABLE `stem` (
`column1` varchar(100) DEFAULT NULL,
`column2` varchar(100) DEFAULT NULL
) ;
INSERT INTO `stem` VALUES ('break','broken'),('break','breaks'),('test','tests');
CREATE TABLE `corpus` (
`column1` varchar(100) DEFAULT NULL
);
INSERT INTO `corpus` VALUES ('tests'),('xyz');
mysql> select ifnull(b.column1, 'XXX') as result from corpus as a left join stem as b on a.column1 = b.column2;
+--------+
| result |
+--------+
| test   |
| XXX    |
+--------+
But I am looking for a way to process the text files directly so that I do not need to import them into MySQL.
Using awk:
$ awk -F';' '             # field delimiter
NR==FNR {                 # true only while reading the first file (stem)
    gsub(/"/,"")          # off with the double quotes
    a[$2]=$1              # hash stem, keyed by its second column
    next
}
{
    if($1 in a)           # if corpus entry found in stem
        print a[$1]       # output the stem
    else
        print "XXX"
}' stem corpus
Output:
XXX
XXX
break
test
XXX
tell
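If a one-liner is preferred, the if/else can also be folded into a ternary; the behavior should be identical:
$ awk -F';' 'NR==FNR { gsub(/"/,""); a[$2]=$1; next } { print (($1 in a) ? a[$1] : "XXX") }' stem corpus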

Match multiline SQL statement in pgdump

I have a PostgreSQL database dump produced by pg_dump version 9.5.2, which contains DDL and also INSERT INTO statements for each table in the given database. The dump looks like this:
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
just for fun');
INSERT INTO important_table VALUES (987654321, 'some important data', 'another crap split into
- lines');
...
-- thousands of inserts into both tables
The dump file is really large and it is produced by another company, so I am not able to influence the export process. I need to create 2 files from this dump:
All DDL statements (all statements that don't start with INSERT INTO)
All INSERT INTO important_table statements (I want to restore only some tables from the dump)
If all statements were on a single line, without newline characters in the data, it would be very easy to create the 2 SQL scripts with grep, for example:
grep -v '^INSERT INTO .*;$' my_dump.sql > ddl.sql
grep -o '^INSERT INTO important_table .*;$' my_dump.sql > important_table.sql
# Create empty structures
psql < ddl.sql
# Import only one table for now
psql < important_table.sql
At first I was thinking about using grep, but I did not find out how to process multiple lines at once; then I tried sed, but it returns only the single-line inserts. I also used https://regex101.com/ to find the right regular expressions, but I don't know how to combine them with grep or sed:
^(?!(INSERT INTO)).*$ -- for ddl
^INSERT INTO important_table(\s|[[:alnum:]])*;$ -- for inserts
I found the similar question pcregrep multiline SQL match, but it has no answer. I don't mind whether the solution works with grep, sed or whatever you suggest, but it should work on Ubuntu 18.04.4 LTS.
Here is a bash-based solution that uses perl one-liners to prepare your SQL dump data for the subsequent grep statements.
In my approach, the goal is to get one SQL statement per line through a script that I called prepare.sh. It got a little more complicated because I wanted to accommodate semicolons and quotes within your insert data strings (these, along with the line breaks, are represented by their hex codes in the intermediate output):
EDIT: In response to @32cupo's comment, below is a modified set of scripts that avoids xargs with large data sets (although I don't have huge dump files to test it with):
#!/bin/bash
perl -pne 's/;(?=\s*$)/__ENDOFSTATEMENT__/g' \
| perl -pne 's/\\/\\\\x5c/g' \
| perl -pne 's/\n/\\\\x0a/g' \
| perl -pne 's/"/\\\\x22/g' \
| perl -pne 's/'\''/\\\\x27/g' \
| perl -pne 's/__ENDOFSTATEMENT__/;\n/g'
Then, a separate script (called ddl.sh) includes your grep statement for the DDL (and, with the help of the loop, only feeds smaller chunks (lines) into xargs):
#!/bin/bash
while read -r line; do
<<<"$line" xargs -I{} echo -e "{}"
done < <(grep -viE '^(\\\\x0a)*insert into')
Another separate script (called important_table.sh) includes your grep statement for the inserts into important_table:
#!/bin/bash
while read -r line; do
<<<"$line" xargs -I{} echo -e "{}"
done < <(grep -iE '^(\\\\x0a)*insert into important_table')
Here is the set of scripts in action (please also note that I spiced up your insert data with some semicolons and quotes):
~/$ cat dump.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
;just for fun');
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./ddl.sh >ddl.sql
~/$ cat ddl.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./important_table.sh > important_table.sql
~/$ cat important_table.sql
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');

Select row when value is in range

Given the two columns below, how can I select the row 100-118 if my filter is 111? 100-118 is a range, thus 111 falls within 100-118.
Dest_ZIP   Ground
004-005    003
009        005
068-089    002
100-118    001
Below is a simple example of how to do this in SQL, using a subquery to get the start and end of the range. This can be expanded on to better handle parsing the string value.
DECLARE @Temp TABLE
(
    Dest_Zip varchar(7),
    Ground varchar(3)
)

INSERT INTO @Temp VALUES ('004-005','003')
INSERT INTO @Temp VALUES ('068-089','002')
INSERT INTO @Temp VALUES ('100-118','001')

SELECT A.Dest_Zip, A.Ground FROM
(
    SELECT
        CONVERT(int, SUBSTRING(Dest_Zip,1,3)) StartNum,
        CONVERT(int, SUBSTRING(Dest_Zip,5,3)) EndNum,
        *
    FROM @Temp
) AS A
WHERE 111 >= A.StartNum AND 111 <= A.EndNum
Fix the data. Here is a simple way using computed columns (and assuming the "zips" are always 3 characters):
alter table t
    add column minzip as (left(dest_zip, 3)),
    add column maxzip as (right(dest_zip, 3));
Then, you can run the query as:
select t.*
from t
where '111' between t.minzip and t.maxzip;
You can even create an index on computed columns, which can help performance (although not much in this case).
If you wish to make the checks in PHP, maybe this sample code can help you:
<?php
$my_string = "111";
$foo = "100-118"; // our range
$bar = explode('-', $foo); // Get an array .. let's call it $bar
// Print the output, see how the array looks
//print_r($bar);
//echo $bar[0].'<br />';
//echo $bar[1].'<br />';
if (($bar[0] <= $my_string) AND ($bar[1] >= $my_string)) { echo 'true'; } else { echo 'false'; }
?>

string substitution from text file to another string

I have a text file with three columns of text (strings) per line. I want to create an SQL insert command by substituting each of the three strings into a skeleton SQL command. I have put place markers in the skeleton script and used sed s/placemarker1/first string/, but with no success. Is there an easier way to accomplish this task? I used pipes to repeat the process for the second string, etc. I actually used awk to get the fields, but could not convert the names to the actual values.
for i in [ *x100* ]; do
if [ -f "$i" ]; then {
grep -e "You received a payment" -e "Transaction ID:" -e "Receipt No: " $i >> ../temp
cat ../temp | awk 'NR == 1 {printf("%s\t",$9)} NR == 2 {printf("%s\t",$9)} NR == 3 {printf("%s\n",$3)}' | awk '{print $2,$1,$3}' | sed 's/(/ /' | sed 's/)./ /' >> ../temp1
cat temp1 | awk 'email="$1"; transaction="$2"; ccreceipt="$3";'
cat /home/linux014/opt/skeleton.sql | sed 's/EMAIL/"$email"/' | sed 's/TRANSACTION/"$transaction"/' | sed 's/CCRECEIPT/"$ccreceipt"/' > /home/linux014/opt/new-member.sql
rm -f ../temp
} fi
done
I cannot figure out how to get the values instead of the names of the variables inserted into my string.
Sample input (one line only):
catdog@gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444
Sample actual output:
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('"$email"','"$transaction"','"$ccreceipt"');
Preferred output:
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('catdog@gmail.com','2w4e5r6t7y8u9i8u7','1111-2222-3333-4444');
awk '{print "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES"; print "(\x27"$1"\x27,\x27"$2"\x27,\x27"$3"\x27);"}' input.txt
Converts your sample input to the preferred output. It should work for multi-line input.
EDIT
The variables you are using in this line:
cat temp1 | awk 'email="$1"; transaction="$2"; ccreceipt="$3";'
are only visible to awk and in this command. They are not shell variables.
Also, in your sed commands, remove the single quotes around the variables; then you can get the values:
sed "s/EMAIL/$email/"
You can try this in bash:
while read email transaction ccreceipt; do echo "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('$email','$transaction','$ccreceipt');"; done<inputfile
inputfile:
catdog@gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444
dog@gmail.com 2dsdsda53563u9i8u7 3333-4444-5555-6666
Test:
sat:~$ while read email transaction ccreceipt; do echo "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('$email','$transaction','$ccreceipt')"; done<inputfile
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('catdog@gmail.com','2w4e5r6t7y8u9i8u7','1111-2222-3333-4444')
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('dog@gmail.com','2dsdsda53563u9i8u7','3333-4444-5555-6666')
You can write a small procedure for this:
CREATE PROCEDURE [dbo].[appInsert]
    @string VARCHAR(500)
AS
BEGIN
    DECLARE @I INT
    DECLARE @SubString VARCHAR(500)

    SET @String = 'catdog@gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444'
    SET @I = 1
    SET @String = REPLACE(@String, ' ', '`~`')

    WHILE @I > 0
    BEGIN
        SET @SubString = SUBSTRING(REVERSE(@String), 1, (CHARINDEX('`~`', REVERSE(@String)) - 1))
        SET @String = SUBSTRING(@String, 1, LEN(@String) - CHARINDEX('`~`', REVERSE(@String)) - 2)
        PRINT REVERSE(@SubString) + ' === ' + @String
        SET @I = CHARINDEX('`~`', @String)
    END
END