string substitution from text file to another string - sql

I have a text file with three columns of text (strings) per line. I want to create an SQL INSERT command by substituting each of the three strings into a skeleton SQL command. I put place markers in the skeleton script and used sed s/placemarker1/first string/, but with no success. Is there an easier way to accomplish this task? I used pipes to repeat the process for the second string and so on. I actually used awk to get the fields, but could not convert them to their actual values.
for i in [ *x100* ]; do
    if [ -f "$i" ]; then {
        grep -e "You received a payment" -e "Transaction ID:" -e "Receipt No: " $i >> ../temp
        cat ../temp | awk 'NR == 1 {printf("%s\t",$9)} NR == 2 {printf("%s\t",$9)} NR == 3 {printf("%s\n",$3)}' | awk '{print $2,$1,$3}' | sed 's/(/ /' | sed 's/)./ /' >> ../temp1
        cat temp1 | awk 'email="$1"; transaction="$2"; ccreceipt="$3";'
        cat /home/linux014/opt/skeleton.sql | sed 's/EMAIL/"$email"/' | sed 's/TRANSACTION/"$transaction"/' | sed 's/CCRECEIPT/"$ccreceipt"/' > /home/linux014/opt/new-member.sql
        rm -f ../temp
    } fi
done
I cannot figure out how to get the values of the variables, rather than their names, inserted into my string.
Sample input (one line only):
catdog#gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444
Sample actual output:
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('"$email"','"$transaction"','"$ccreceipt"');
Preferred output:
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('catdog#gmail.com','2w4e5r6t7y8u9i8u7','1111-2222-3333-4444');

awk '{print "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES"; print "(\x27"$1"\x27,\x27"$2"\x27,\x27"$3"\x27);"}' input.txt
This converts your sample input to the preferred output. It should also work for multi-line input.
EDIT
The variables you are using in this line:
cat temp1 | awk 'email="$1"; transaction="$2"; ccreceipt="$3";'
are awk variables, visible only inside that one awk command. They are not shell variables.
Also, in your sed commands, use double quotes instead of single quotes so that the shell expands the variables:
sed "s/EMAIL/$email/"

You can try this in bash:
while read email transaction ccreceipt; do echo "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('$email','$transaction','$ccreceipt');"; done<inputfile
inputfile:
catdog#gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444
dog#gmail.com 2dsdsda53563u9i8u7 3333-4444-5555-6666
Test:
sat:~$ while read email transaction ccreceipt; do echo "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('$email','$transaction','$ccreceipt')"; done<inputfile
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('catdog#gmail.com','2w4e5r6t7y8u9i8u7','1111-2222-3333-4444')
INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('dog#gmail.com','2dsdsda53563u9i8u7','3333-4444-5555-6666')
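One caveat, as a defensive sketch rather than part of the original question: if a field could ever contain a single quote, the generated statement would break. You can double any embedded quotes before interpolating:
sq="'"
while read -r email transaction ccreceipt; do
    # double embedded single quotes so each SQL string literal stays valid
    email=${email//$sq/$sq$sq}
    transaction=${transaction//$sq/$sq$sq}
    ccreceipt=${ccreceipt//$sq/$sq$sq}
    echo "INSERT INTO users (email,paypal_tran,CCReceipt) VALUES ('$email','$transaction','$ccreceipt');"
done < inputfile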

You can write a small procedure for this:
CREATE PROCEDURE [dbo].[appInsert]
    @String VARCHAR(500)
AS
BEGIN
    DECLARE @I INT
    DECLARE @SubString VARCHAR(500)
    SET @String = 'catdog#gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444'
    SET @I = 1
    SET @String = REPLACE(@String, ' ', '`~`')
    WHILE @I > 0
    BEGIN
        SET @SubString = SUBSTRING(REVERSE(@String), 1, CHARINDEX('`~`', REVERSE(@String)) - 1)
        SET @String = SUBSTRING(@String, 1, LEN(@String) - CHARINDEX('`~`', REVERSE(@String)) - 2)
        PRINT REVERSE(@SubString) + ' === ' + @String
        SET @I = CHARINDEX('`~`', @String)
    END
END
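For instance, a hypothetical invocation could look like this (note that the hardcoded SET @String inside the body currently overrides whatever you pass in):
EXEC [dbo].[appInsert] 'catdog#gmail.com 2w4e5r6t7y8u9i8u7 1111-2222-3333-4444';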


Why is my script not printing output on one line?

I am using the following echo in a script, and after I execute it, the output format is as shown below:
echo -e "UPDATE table1 SET table1_f1='$Fname' ,table1_f2='$Lname' where table1_f3='$id';\ncommit;" >> $OutputFile
output:
UPDATE table1 SET table1_f1='Fname' ,table1_f2='Lname' where table1_f3='id
';
The '; is appearing on a new line. Why is that happening?
The variable $id in your shell script actually contains that newline (\n or \r\n) at the end; so there isn't really anything wrong in the part of the script you've shown here.
This effect is pretty common if the variable is created from the output of external commands or by reading external files, as you are doing here.
For simple values, one way to strip the newline off the end of the value, prior to using it in your echo, is:
id=$( echo "${id}" | tr -d '\r\n' );
or for scripts that already rely on a particular bash IFS value:
OLDIFS="${IFS}";
IFS=$'\n\t ';
id=$( echo "${id}" | tr -d '\r\n' );
IFS="${OLDIFS}";

match a column and return some other column like sql

How do I match the corpus file with second column in stem and return the first column?
corpus.txt
this
is
broken
testing
as
told
Only the first 2 columns are important in this file:
stem.csv
"test";"tested";"test";"Suffix";"A";"7673";"321: 0 xxx"
"test";"testing";"test";"Suffix";"A";"7673";"322: 0 xxx"
"test";"tests";"test";"Suffix";"b";"5942";"001: 0 xxx"
"break";"broke";"break";"Suffix";"b";"5942";"002: 0 xxx"
"break";"broken";"break";"Suffix";"b";"5942";"003: 0 xxx"
"break";"breaks";"break";"Suffix";"c";"5778";"001: 0 xxx"
"tell";"told";"tell";"Suffix";"c";"5778";"002: 0 xx"
If the word is missing in the stem file, it should be replaced with XXX
expected.txt
XXX
XXX
break
test
XXX
tell
It can be done using SQL queries like this...
CREATE TABLE `stem` (
    `column1` varchar(100) DEFAULT NULL,
    `column2` varchar(100) DEFAULT NULL
);
INSERT INTO `stem` VALUES ('break','broken'),('break','breaks'),('test','tests');
CREATE TABLE `corpus` (
    `column1` varchar(100) DEFAULT NULL
);
INSERT INTO `corpus` VALUES ('tests'),('xyz');
mysql> select ifnull(b.column1, 'XXX') as result from corpus as a left join stem as b on a.column1 = b.column2;
+--------+
| result |
+--------+
| test   |
| XXX    |
+--------+
But I am looking for a way to process the text files directly, so that I do not need to import them into MySQL.
Using awk:
$ awk -F';' '          # delimiter
NR==FNR {              # process the stem file
    gsub(/"/,"")       # off with the double quotes
    a[$2]=$1           # hash
    next
}
{
    if($1 in a)        # if corpus entry found in stem
        print a[$1]    # output
    else
        print "XXX"
}' stem corpus
Output:
XXX
XXX
break
test
XXX
tell
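If you prefer it compressed, the same logic fits on one line (equivalent to the script above):
awk -F';' 'NR==FNR{gsub(/"/,""); a[$2]=$1; next} {print (($1 in a) ? a[$1] : "XXX")}' stem corpus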

Match multiline SQL statement in pgdump

I have a PostgreSQL database dump produced by pg_dump version 9.5.2, which contains DDL and also INSERT INTO statements for each table in the given database. The dump looks like this:
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
just for fun');
INSERT INTO important_table VALUES (987654321, 'some important data', 'another crap split into
- lines');
...
-- thousands of inserts into both tables
The dump file is really large and is produced by another company, so I am not able to influence the export process. I need to create 2 files from this dump:
All DDL statements (all statements that don't start with INSERT INTO)
All INSERT INTO important_table statements (I want to restore only some tables from the dump)
If all statements were on a single line, with no newline characters in the data, it would be very easy to create the 2 SQL scripts with grep, for example:
grep -v '^INSERT INTO .*;$' my_dump.sql > ddl.sql
grep -o '^INSERT INTO important_table .*;$' my_dump.sql > important_table.sql
# Create empty structures
psql < ddl.sql
# Import only one table for now
psql < important_table.sql
At first I was thinking about using grep, but I did not find out how to process multiple lines at once. Then I tried sed, but it returns only single-line inserts. I also used https://regex101.com/ to work out the right regular expressions, but I don't know how to combine them with grep or sed:
^(?!(INSERT INTO)).*$ -- for ddl
^INSERT INTO important_table(\s|[[:alnum:]])*;$ -- for inserts
I found the similar question pcregrep multiline SQL match, but it has no answer. I don't mind whether the solution works with grep, sed or whatever you suggest, but it should work on Ubuntu 18.04.4 LTS.
Here is a bash-based solution that uses perl one-liners to prepare your SQL dump data for the subsequent grep statements.
In my approach, the goal is to get one SQL statement onto one line through a script that I called prepare.sh. It got a little more complicated because I wanted to accommodate semicolons and quotes within your insert data strings (these, along with the line breaks, are represented by their hex codes in the intermediate output):
EDIT: In response to @32cupo's comment, below is a modified set of scripts that avoids xargs with large data sets (although I don't have huge dump files to test it with):
#!/bin/bash
perl -pne 's/;(?=\s*$)/__ENDOFSTATEMENT__/g' \
| perl -pne 's/\\/\\\\x5c/g' \
| perl -pne 's/\n/\\\\x0a/g' \
| perl -pne 's/"/\\\\x22/g' \
| perl -pne 's/'\''/\\\\x27/g' \
| perl -pne 's/__ENDOFSTATEMENT__/;\n/g'
Then, a separate script (called ddl.sh) includes your grep statement for the DDL (and, with the help of the loop, only feeds smaller chunks (lines) into xargs):
#!/bin/bash
while read -r line; do
    <<<"$line" xargs -I{} echo -e "{}"
done < <(grep -viE '^(\\\\x0a)*insert into')
Another separate script (called important_table.sh) includes your grep statement for the inserts into important_table:
#!/bin/bash
while read -r line; do
    <<<"$line" xargs -I{} echo -e "{}"
done < <(grep -iE '^(\\\\x0a)*insert into important_table')
Here is the set of scripts in action (please also note that I spiced up your insert data with some semicolons and quotes):
~/$ cat dump.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
;just for fun');
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./ddl.sh >ddl.sql
~/$ cat ddl.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./important_table.sh > important_table.sql
~/$ cat important_table.sql
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');
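For comparison, if GNU awk is available, a much shorter sketch uses a multi-character RS so that each record is one whole statement. It assumes no data string ends a line with a semicolon (the same limitation the perl pipeline has), and any trailing comments without a semicolon will land in ddl.sql with one appended:
gawk 'BEGIN { IGNORECASE=1; RS=";[[:space:]]*\n"; ORS=";\n" }
      !/^INSERT INTO/                { print > "ddl.sql" }
      /^INSERT INTO important_table/ { print > "important_table.sql" }' my_dump.sql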

variable passed to sql from shell is not working

My code is:
#!/bin/sh
cat tmp_ts.log | awk ' {print $8}'
lookup=$8
sqlplus -s "sys/Orcl1234 as sysdba" << EOF
SELECT tablespace_name FROM dba_tablespaces WHERE tablespace_name='$lookup';
exit;
EOF
and my output is:
IAM_OIM
no rows selected
I have passed this lookup variable to the select statement, but it's not working.
My end result should be the output of the select statement, but the variable is not being substituted into it.
#!/bin/sh
lookup="$(awk '/tablespace/{print $8;exit}' tmp_ts.log)"
echo "Querying database with lookup = $lookup"
sqlplus -s "sys/Orcl1234 as sysdba" <<EOF
SELECT tablespace_name FROM dba_tablespaces WHERE tablespace_name='$lookup';
exit;
EOF
You have to use awk's output to set lookup. The shell knows nothing about the $8 which was set in awk. Also, I have ensured that awk exits after the first matching line, so that there is no risk of returning multiple values, or simply empty lines as it did in your version.
You can fill lookup with a command like awk, sed or cut.
lookup=$(cut -d" " -f8 tmp_ts.log)
You should add some checks, like @Dario did (exiting after the first match and only converting lines containing tablespace; but what to do when no lines match?).
If you don't add the checks, you can skip setting $lookup altogether:
sqlplus -s "sys/Orcl1234 as sysdba" << EOF
SELECT tablespace_name FROM dba_tablespaces
WHERE tablespace_name='$(sed 's/.*tablespace- //' tmp_ts.log)';
exit;
EOF
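To pick up the "what to do when no lines match" question from above, a small guard before calling sqlplus is enough; a sketch:
lookup="$(awk '/tablespace/{print $8; exit}' tmp_ts.log)"
if [ -z "$lookup" ]; then
    echo "no tablespace found in tmp_ts.log" >&2
    exit 1
fi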

bash cut and paste SQL insert statement

I would like to remove a column name/value pair from one INSERT statement and move it into another INSERT statement. I have about a hundred separate files in this sort of format (although the format may vary slightly from file to file; for instance, some users may have put the entire INSERT statement on one line).
INPUT
INSERT INTO table1 (
col1,
col2
)
VALUES (
foo,
bar
);
INSERT INTO table2 (
col3,
col4_move_this_one,
col5
)
VALUES (
john,
doe_move_this_value,
doe
);
OUTPUT
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
In general, with the above format, I was thinking I could use sed and cat in a script to find the line numbers of each line to be moved and then move it, something like this:
for file in *; do
    line_number=$(cat -n ${file} | sed some_statement | awk to_get_line_number)
    # etc...
done
...but maybe you guys can recommend a cleverer way that would also work if the INSERT statement is on one line.
With GNU awk for true multi-dimensional arrays, 3rd arg to match(), multi-char RS and \s/\S syntactic sugar:
$ cat tst.awk
BEGIN { RS="\\s*);\\s*" }
match($0,/(\S+\s+){2}([^(]+)[(]([^)]+)[)][^(]+[(]([^)]+)/,a) {
    for (i in a) {
        gsub(/^\s*|\s*$/,"",a[i])
        gsub(/\s*\n\s*/,"",a[i])
    }
    tables[NR] = a[2]
    names[NR][1]; split(a[3],names[NR],/,/)
    values[NR][1]; split(a[4],values[NR],/,/)
}
END {
    names[1][3] = names[1][2]
    names[1][2] = names[2][2]
    names[2][2] = names[2][3]
    delete names[2][3]
    values[1][3] = values[1][2]
    values[1][2] = values[2][2]
    values[2][2] = values[2][3]
    delete values[2][3]
    for (tableNr=1; tableNr<=NR; tableNr++) {
        printf "INSERT INTO %s (\n", tables[tableNr]
        cnt = length(names[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " names[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ")"
        print "VALUES ("
        cnt = length(values[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " values[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ");\n"
    }
}
$ awk -f tst.awk file
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
A GAWK version that relies on gensub's backreference feature and heavily on regex.
$ cat > test.awk
BEGIN {
    RS=" *) *; *"      # set RS to ");", allowing surrounding spaces as well
    ORS=");\n"
}
{
    sub(/^[ \n]*/,"")  # remove emptiness before the second INSERT
}
$0 ~ /^INSERT/ && NR==1 {
    a=$0               # store the first INSERT
}
$0 ~ /^INSERT/ && NR==2 {  # store the second and use gensub to
    b=$0               # find the second variables in INSERT and VALUES
    split(gensub(/(INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1}[ \n]*([^,]*)[^\)]*\)*[ \n]*/,"\\3 ","g"),c," ")
}
END {                  # print the first INSERT with the second variables in place
                       # and the second INSERT with those variables removed
    print gensub(/((INSERT|VALUES)[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\4"c[++i]",\\5","g",a)
    print gensub(/((INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1})[ \n]*[^,]*,/,"\\1 ","g",b)
}
This solution assumes that the variable to copy is the second variable after the keywords INSERT and VALUES in the second INSERT, and that it is added in those same places in the first INSERT. The solution is space- and \n-friendly but doesn't support \t (easily fixed, I assume).
$ awk -f test.awk file
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
col4_move_this_one,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);