Creating an SQL query using awk, bash, and grep

I have been trying to parse a PayPal email and insert the resulting info into a database of mine. I have most of the code working, but I cannot get a variable into my awk code to build the SQL INSERT query.
if [ -f email-data.txt ]; then {
grep -e "Transaction ID:" -e "Receipt No: " email-data.txt \
>> ../temp
cat ../temp \
| awk 'NR == 1 {printf("%s\t",$NF)} NR == 2 {printf("%s\n",$NF)}' \
>> ../temp1
awk '{print $1}' $email-data.txt \
| grep # \
| grep -v \( \
| grep -v href \
>> ../address
email_addr=$(cat ../address)
echo $email_addr
cat ../temp1 \
| awk '{print "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES"; print "(\x27"($email_addr)"\x27,'1',\x27"$2"\x27,\x27"$3"\x27);"}' \
> /home/linux014/opt/post-new-member.sql
The output looks like the following:
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES('9MU013922L4775929 9MU013922L4775929',1,'9MU013922L4775929','');
It should look like:
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES('dogcat#gmail.com',1,'9MU013922L4775929','1234-2345-3456-4567');
(Names changed to protect the innocent)
The trial data I am using is set out below:
Apr 18, 2014 10:46:17 GMT-04:00 | Transaction ID: 9MU013922L4775929
You received a payment of $50.00 USD from Dog Cat (dogcat#gmail.com)
Buyer:
Dog Cat
dogcat#gmail.com
Purchase Details
Receipt No: 1234-2345-3456-4567
I cannot figure out why email_addr is not being inserted properly.

You are referencing a shell variable inside awk. The right way to do that is to create an awk variable with the -v option.
For example, say $email is your shell variable, then
... | awk -v awkvar="$email" '{do something with awkvar}' ...
See your awk documentation on -v for more details.
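For example, the last stage of your pipeline could pass the address in that way (a minimal sketch; note that ../temp1, as built above, carries the transaction ID and receipt number as fields 1 and 2):
email_addr=$(cat ../address)
awk -v email="$email_addr" '{
    print "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES";
    print "(\x27" email "\x27,1,\x27" $1 "\x27,\x27" $2 "\x27);"
}' ../temp1 > /home/linux014/opt/post-new-member.sql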
However, having said that, here is how I would parse the text file:
awk '
/Transaction ID:/ { tran = $NF }
/Receipt No:/     { receipt = $NF }
$1 ~ /#/          { email = $1 }
END {
    print "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES";
    print "(" q email q "," 1 "," q tran q "," q receipt q ");"
}' q="'" data.txt
Output:
INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES
('dogcat#gmail.com',1,'9MU013922L4775929','1234-2345-3456-4567');
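If you would rather emit the whole statement on one line, the two prints collapse into a single printf (same q, email, tran and receipt as above):
printf "INSERT INTO users (email,paid,paypal_tran,CCReceipt) VALUES(%s,1,%s,%s);\n", q email q, q tran q, q receipt q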

Related

Read and print the first 1000 rows from a CSV using awk, then the next 1000, and so on

I have a CSV with around 25k rows. I have to pick 1000 rows from column #1 and column #2 at a time, then the next 1000 rows, and so on.
I am using the commands below, and they work fine for picking up all the values from column #1 and column #2, i.e. 25k fields from both columns. Instead, I want to pick values 1-1000, put them in the WHERE ... IN clause of my export query and append the result to the dbData.csv file, then do the same for 1001-2000, 2001-3000, and so on.
My code is below:
awk -F ',' 'NR > 2 {print $1}' $INPUT > column1.txt
i=$(cat column1.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
awk -F ',' 'NR > 2 {print $2}' $INPUT > column2.txt
j=$(cat column2.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
echo "Please wait - connecting to database..."
db2 connect to $sourceDBStr user user123 using pas123
db2 "export to dbData.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in ($i) and cd.language_id in ($j)"
Let's assume the first two fields of your input CSV are "simple" (no spaces, no commas...) and do not need any kind of quoting. You could generate the tricky part of your query string with an awk script:
# foo.awk
NR >= first && NR <= last {
    c1[n+0] = $1
    c2[n++] = $2
}
END {
    for (i = 0; i < n-1; i++) printf("%s,", c1[i])
    printf("%s) %s (%s", c1[n-1], midstr, c2[0])
    for (i = 1; i < n; i++) printf(",%s", c2[i])
}
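For instance, with first=2 and last=4 on a toy input whose first two columns hold p1/l1, p2/l2, p3/l3, the script emits a fragment like
p1,p2,p3) and cd.language_id in (l1,l2,l3
which the caller below completes with the leading "... in (" from prefix and a closing ")".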
You can then use it in a bash loop that processes 1000 records per iteration, storing the result of each query in a temporary file (e.g., tmp.csv in the following bash script) that you concatenate to your dbData.csv file. The following example script uses the same parameters as you do (INPUT, sourceDBStr) and the same constants (dbData.csv, 1000, user123, pas123); adapt it if you need more flexibility. Error management (input file not found, DB connection error, DB query error...) is left as a bash exercise, but should be done.
prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"
rm -f dbData.csv
len=$(wc -l < "$INPUT")
for (( first = 2; first <= len; first += 1000 )); do
    (( last = len < first + 999 ? len : first + 999 ))
    query=$(awk -F ',' -f foo.awk -v midstr="$midstr" -v first="$first" \
        -v last="$last" "$INPUT")
    echo "Please wait - connecting to database..."
    db2 connect to $sourceDBStr user user123 using pas123
    db2 "$prefix ($query)"
    cat tmp.csv >> dbData.csv
done
rm -f tmp.csv
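To sanity-check the chunking before touching the database, you can temporarily replace the two db2 lines with an echo of the generated statement:
echo "$prefix ($query)"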
But there are other ways, using split, bash arrays, and simpler awk or sed scripts. Example:
declare -a arr=()
prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"
awk -F, 'NR>1 {print $1, $2}' "$INPUT" | split -l 1000 - foobar
rm -f dbData.csv
for f in foobar*; do
    arr=($(awk '{print $1 ","}' "$f"))
    i="${arr[*]}"
    arr=($(awk '{print $2 ","}' "$f"))
    j="${arr[*]}"
    echo "Please wait - connecting to database..."
    db2 connect to $sourceDBStr user user123 using pas123
    db2 "$prefix (${i%,}) $midstr (${j%,})"
    cat tmp.csv >> dbData.csv
    rm -f "$f"
done
rm -f tmp.csv

How to awk only selected columns and output them in Linux

I am trying to get only the first and third columns of the following output in the Linux terminal. How can I do this?
My actual output:
akamai-1576314300-xhf78 0/1 Completed 0 5d4h
akamai-1576400700-6m84q 0/1 Completed 0 4d4h
Output I need after using awk:
akamai-1576314300-xhf78 Completed
akamai-1576400700-6m84q Completed
I am using kubectl get pods | awk '{print $1 print $3}'
but it is not working...
This is what you are looking for:
kubectl get pods | awk '{ if ($3 == "Completed") { print $1 " " $3 }}'
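If you do not need the status filter, the minimal fix to your attempt is a plain projection:
kubectl get pods | awk '{print $1, $3}'
(The original fails because awk expects a comma between print's arguments, not a second print inside the same statement.)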
Hope it helps!
Edit (to create an array of values):
IFS=$'\n' read -d '' -a myResults <<< "$( kubectl get pods | awk NF | awk '{ if ($3 == "Completed") { print $1 " " $3 }}' )"
And then :
$ echo "${myResults[1]}"
akamai-1576400700-6m84q Completed
$ echo "${myResults[0]}"
akamai-1576314300-xhf78 Completed
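As an aside, read -d '' consumes the whole here-string (it reads until a NUL that never comes) and IFS=$'\n' splits it into one array element per line. On bash 4+, mapfile builds the same array more simply:
mapfile -t myResults < <( kubectl get pods | awk NF | awk '{ if ($3 == "Completed") { print $1 " " $3 }}' )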

Merge command lines

I have these command lines:
grep -e "[0-9] ERROR" /home/aa/lab/utb/cic/nova-all.log | awk '{ print $6 }' | awk -F'-' '{print $3""$2""$1}' | cut -c 1-4,7-8 > part1date.txt
grep -e "[0-9] ERROR" /home/aa/lab/utb/cic/nova-all.log | awk '{ print $3" "$4" "$5" "$9 }' > part1rest.txt
grep -e "[0-9] ERROR" /home/aa/lab/utb/cic/nova-all.log | awk '{ s = ""; for (i = 15; i <= NF; i++) s = s $i " "; print s}' > part1end.txt
paste -d ' ' part1date.txt part1rest.txt part1end.txt > temp.txt
rm part1*
cat temp.txt
The first three commands each save their output in a text file. Then I merge the columns of those files into one file to show the output. Can someone help me do the same thing in one command line, without saving anything to text files?
These commands are used to change the standard output from:
sep 10 11:13:55 node-20 nova-scheduler 2014-10-12 10:36:55.675 3817 ERROR nova.scheduler....
to this format:
ddmmyy hh:mm:ss node-xx PROCESS LOGLEVEL MESSAGE
that is, to reorder the columns and change the format of the date.
awk '/[0-9] ERROR/{gsub("-","",$6);$2=$6;$6=$9;for(i=0;++i<=NF;)$i=i<6?$(i+1):$(i+9);NF-=9;print}' file
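For readability, here is the same one-liner expanded, with comments (the logic is unchanged):
awk '/[0-9] ERROR/ {
    gsub("-", "", $6)                  # 2014-10-12 -> 20141012
    $2 = $6                            # stash the reformatted date in field 2
    $6 = $9                            # stash the log level in field 6
    for (i = 0; ++i <= NF;)            # shift left: fields 1-5 take the next
        $i = i < 6 ? $(i+1) : $(i+9)   # field's value, fields 6+ skip ahead to
                                       # the message text starting at field 15
    NF -= 9                            # drop the now-duplicated trailing fields
    print
}' file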

How to include a dig lookup in awk?

I have an awk command to extract information from mount points (see the accepted answer in How to extract NFS information from mount on Linux and Solaris?):
awk -F'[: ]' '{if(/^\//)print $3,$4,$1;else print $1,$2,$4}'
I would like to include a dig lookup in this awk command to look up the IP of a hostname. Unfortunately, the mount command sometimes includes an IP and sometimes a hostname. I tried the following, but it has an unwanted newline, an unwanted return code, and does not work if there is an IP address:
For hostnames
echo "example.com:/remote/export on /local/mountpoint otherstuff" | awk -F'[: ]' '{if(/^\//)print system("dig +short " $3),$4,$1;else print system("dig +short " $1),$2,$4}'
Returns
93.184.216.119
0 /remote/export /local/mountpoint
For IPs
echo "93.184.216.119:/remote/export on /local/mountpoint otherstuff" | awk -F'[: ]' '{if(/^\//)print system("dig +short " $3),$4,$1;else print system("dig +short " $1),$2,$4}'
Returns
0 /remote/export /local/mountpoint
I would like to retrieve the following in both cases
93.184.216.119 /remote/export /local/mountpoint
Update:
It seems that some versions of dig return the IP when an IP is provided as the query, while others return nothing.
Solution:
Based on the accepted answer I used the following adapted awk command:
awk -F'[: ]' '{if(/^\//) { system("dig +short "$3" | grep . || echo "$3" | tr -d \"\n\""); print "",$4,$1 } else { system("dig +short "$1" | grep . || echo "$1" | tr -d \"\n\"");print "",$2,$4 };}'
The additional grep . || echo "$3" makes sure the input IP/hostname is passed through when dig returns nothing.
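One caveat: || binds more loosely than |, so the tr -d "\n" in that command only applies to the echo fallback and the dig output keeps its trailing newline. If that bites you, group the alternation so both branches go through tr, e.g. for the first branch:
system("( dig +short "$3" | grep . || echo "$3" ) | tr -d \"\n\"")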
The system command in awk executes a command and returns its exit status. Consider this:
$ awk 'END { print "today is " system("date") " and sunny" }' < /dev/null
Tue Jan 7 20:19:28 CET 2014
today is 0 and sunny
The date command outputs the date and a newline. The same thing happens when it runs from awk. In this example system() finishes before print itself, so first we see the line with the date, and on the next line our text with the return value 0 of system.
To get what we want, we need to split this into multiple commands, and we don't need the return value of system:
$ awk 'END { printf "today is "; system("date | tr -d \"\n\""); print " and sunny" }' < /dev/null
today is Tue Jan 7 20:24:01 CET 2014 and sunny
To prevent the newline after date, we piped its output to tr -d "\n".
Long story short, change from this:
print system(...), $2, $4
to this:
system(... | tr -d \"\n\"); print "", $2, $4

Escape awk $ in shell function of bashrc

I have a command that gets the next ID of a table from a pool of SQL files. Now I am trying to put this command in my ~/.bashrc as a shell function, but I could not figure out how to escape the $ so it reaches awk instead of being expanded by bash. Here's the code in .bashrc:
function nextval () {
grep 'INSERT INTO \""$1"\"' *.sql | \
awk '{print $6}' | \
cut -c 2- | \
awk -F "," '{print $1}' | \
sort -n | \
tail -n 1 | \
awk '{print $0+1}'
}
alias nextval=nextval
Usage: # nextval tablename
Escaping with \$ I get the error: awk: backslash not last character on line.
The $ is not inside double quotes, so why is bash replacing it?
Perhaps the part you really need to change is this
'INSERT INTO \""$1"\"'
to
"INSERT INTO \"$1\""
@konsolebox answered your question, but you could also write the function without so many tools and pipes, e.g.:
function nextval () {
    awk -v tbl="$1" '
        $0 ~ "INSERT INTO \"" tbl "\"" {
            split( substr($6,2), a, /,/ )
            val = ( ((val == "") || (a[1] > val)) ? a[1] : val )
        }
        END { print val+1 }
    ' *.sql
}
It's hard to tell if the above is 100% correct without any sample input or expected output to test it against, but it should be close.