Awk to escape quotation marks - awk

So I have a file like
select * from tb where start_date = to_date('20131010','yyyymmdd');
p23 VARCHAR2(300):='something something
still part of something above with 'this' between single quotes and close
something to end';
(code goes on)
That is automatically generated code which I should be able to execute via sqlplus. But that obviously won't work, since the third line should have had its quotes escaped like (..) with ''this'' between (...).
I can't access the script that generated the code, so I was trying to get an awk script to do the job. Notice the script has to be smart enough not to escape every quote in the code (the to_date('20131010','yyyymmdd') is correct).
I am no expert in awk, so I went as far as:
BEGIN {
RS=";"
FS="\n"
}
/\tp[0-9]+/{
ini = match($0, "\tp[0-9]+")
fim = match($0, ":='")
s = substr($0,ini,fim+1)
txt = substr($0, fim+3, length($0))
block = substr(txt, 0, length(txt)-1)
print gensub("'", "''", block)
}
!/\tp[0-9]+/{
print $0";"
}
but it went way too messy with the print gensub("'", "''", block) and it is not working.
Can someone give me a quick way out?

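You have forgotten one parameter to gensub: its third argument says which matches to replace ("g" for all of them), and the target string comes fourth. Try: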
BEGIN {
RS=";"
FS="\n"
}
/^[[:space:]]+p[0-9]+/{
ini = match($0, "\tp[0-9]+")
fim = match($0, ":='")
s = substr($0,ini,fim+1)
txt = substr($0, fim+3, length($0))
block = substr(txt, 0, length(txt)-1)
printf "%s'%s';", s, gensub("'", "''", "g",block)
next
}
{
printf "%s;", $0
}
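For awks that lack gensub, the same idea can be sketched with plain gsub. This is only a minimal sketch under stated assumptions: the function name escape_inner_quotes is made up for illustration, each assignment is assumed to look like pNN ... :='...', and the literal text itself is assumed to contain no ; character.

```shell
# Hypothetical helper: double the single quotes inside pNN ... :='...' literals.
escape_inner_quotes() {
  awk -v q="'" '
    BEGIN { RS = ";" }                    # one statement per record
    /^[ \t\n]*p[0-9]+/ && (pos = index($0, ":=" q)) {
      head = substr($0, 1, pos + 2)       # everything up to and including the opening quote
      body = substr($0, pos + 3)          # literal text plus its closing quote
      sub(q "[ \t\n]*$", "", body)        # drop the closing quote
      gsub(q, q q, body)                  # double every remaining quote
      printf "%s%s%s;", head, body, q
      next
    }
    /[^ \t\n]/ { printf "%s;", $0 }       # other statements pass through unchanged
    END { print "" }
  ' "$@"
}
```

Usage would be something like escape_inner_quotes generated.sql > fixed.sql, where both file names are placeholders. Statements that do not start with pNN, such as the to_date call, are left alone.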

Related

Using awk to analyze log file to identify blocks and to extract information

I am trying to figure out a way to use awk to analyze my log files from an old application. The log file contains processing information from the application but the structure is a bit messy. But it has a structure like this:
some random text
...
BLOCK-BEGIN bla bla INFO1:VAL1
variable lines of text
INFO2:VAL2
variable lines of text
POSSIBLE-BLOCK-END-PHRASE1
...
some random text
INFO3:not-desired-val5
...
BLOCK-BEGIN bla bla INFO1:VAL3
variable lines of text
INFO2:VAL4
variable lines of text
POSSIBLE-BLOCK-END-PHRASE2
...
What I want to do is to first identify the blocks. In the example above, there are two blocks with the same block beginning but different endings. Within each block, I then want to extract a few pieces of information, i.e. INFO1 and INFO2 in the example. The desired output in this case would be:
VAL1,VAL2
VAL3,VAL4
I know some basic awk, so any solutions or hints are highly welcome. Thanks
Update: my first attempt
awk '/BLOCK-BEGIN/{printf substr($4,7)",";for (i = 0 ; i < NF; i++) getline; if($0 ~ '/^INFO2/') print substr($0,7)}'
The output is:
VAL1,VAL2
VAL3,VAL4
But is there a better way to do it? Any suggestions?
$ awk -v OFS=',' '
(split($NF,a,/:/) == 2) && sub(/^INFO/,"",a[1]) {
info[a[1]] = a[2]
if ( a[1] == 2 ) {
print info[1], info[2]
}
}
' file
VAL1,VAL2
VAL3,VAL4
Regarding the code you posted in your question:
printf substr($4,7)"," - never do printf <input data> as it'll fail when your input contains printf formatting characters; always do printf "%s", <input data> instead, so that code should be written printf "%s,",substr($4,7).
getline - there are only a few specific situations where getline is the right approach, and when it is you have to write it securely. This isn't the right situation and it's not written securely. See awk.freeshell.org/AllAboutGetline.
for (i = 0 ; i < NF; i++) - all field numbers, array indices, and string character positions in awk start at 1, not 0, so write your code to match so you don't trip over thinking arrays or anything else start at zero: for (i = 1 ; i <= NF; i++).
'foo... $0 ~ '/^INFO2/' ...bar' those inner 's are terminating the awk script body and so exposing what's between them to the shell for interpretation. Never do that. In this case idk why you thought you needed them as your code should just be 'foo... $0 ~ /^INFO2/ ...bar'.
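Putting those points together, a getline-free version of the one-liner from the question might look like the sketch below. The function name extract_pairs is hypothetical, and it assumes INFO1 is always the last field of the BLOCK-BEGIN line and that INFO2 starts its own line inside the block.

```shell
# Hypothetical helper: print INFO1,INFO2 for each block without using getline.
extract_pairs() {
  awk '
    /BLOCK-BEGIN/ {                        # a new block starts: remember INFO1
      v1 = $NF
      sub(/^INFO1:/, "", v1)
      inblock = 1
      next
    }
    inblock && /^INFO2:/ {                 # first INFO2 line inside the block
      printf "%s,%s\n", v1, substr($0, 7)  # data goes in arguments, never in the format
      inblock = 0
    }
  ' "$@"
}
```

An INFO line outside a block, like INFO3 in the sample, is ignored because the inblock flag is only set by a BLOCK-BEGIN line.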
With your shown samples only, please try following awk code.
awk -F'INFO[0-9]+:' '
/BLOCK-BEGIN/{
if(val2 && val1){
print val1","val2
}
val1=val2=""
val1=$NF
next
}
/^INFO[0-9]+:/{
val2=(val2?val2 ",":"") $NF
}
END{
if(val2 && val1){
print val1","val2
}
}
' Input_file

Else syntax error when nesting array formula

I am receiving a syntax error on "else" for this script:
{for (i=8;i<=NF;i+=3)
{if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{split ($(i+2), array, "/");
for (x in array)
{j++;
a[j] =j;
printf (array[x] ",");}
printf ("%s\n", "");}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
Can't figure out why. If I delete the array block (starting with split()), all is well. But I need to scan the contents of $(i+2), so cutting it does me no good.
Also, if anyone has guidance on a good list of how to interpret error messages, that would be great.
Thanks for your advice.
EDIT: here is the above script laid out with sensible formatting:
{
for (i=8;i<=NF;i+=3) {
if ($0~"=>") # if-else statement designed to flag file / directory transfers
print "=> flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2);
{
split ($(i+2), array, "/");
for (x in array) {
j++;
a[j] =j;
printf (array[x] ",");
}
printf ("%s\n", "");
}
else
print "no => flag,"$1"," $2","$3","$4 ","$5","$6","$7"," $(i)","$(i+1)","$(i+2)
}
}
First things first: since you didn't post any sample input or expected output, I couldn't test this at all. Could you please try the following? I assume you are running this as a standalone .awk script. These are mostly syntax/cosmetic changes, NOT changes to the logic, since no background was given on the problem.
BEGIN{
OFS=","
}
{
for (i=8;i<=NF;i+=3){
if ($0~/=>/){
print "=> flag,"$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
split ($(i+2), array, "/");
for(x in array){
j++;
a[j] =j;
printf (array[x] ",")
}
printf ("%s\n", "")
}
else{
print "no => flag",$1,$2,$3,$4,$5,$6,$7,$(i),$(i+1),$(i+2)
}
}
}
Problems fixed in OP's attempt:
In your code the if was followed by a single print statement and only then by a { ... } block, so the if ended at that print and the later else had no if to attach to, hence the syntax error. An if or for body with multiple statements has to be wrapped in one pair of braces; I put the opening { at the end of the if/for line rather than on the next line, which makes the extent of each body easier to see.
Since you are matching against a regexp, I changed $0~"=>" to $0~/=>/.
Added a BEGIN section that sets OFS (the output field separator) to , so you do NOT need to print "," between values; separating the print arguments with a plain , does the trick.
Fixed the indentation, so it is clear where each loop/condition closes.
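The if/else bracing issue can be seen in isolation in a small sketch. The function name flag_lines and its input are invented for illustration. Without the braces around the if body, the if would end at the first print and the else would be a syntax error.

```shell
# Hypothetical helper: flag lines containing "=>" and list the path pieces of the last field.
flag_lines() {
  awk '
    {
      if ($0 ~ /=>/) {             # the braces make every statement part of the if body
        print "=> flag," $1
        n = split($NF, parts, "/")
        for (k = 1; k <= n; k++)   # index loop keeps the pieces in order
          printf "%s,", parts[k]
        print ""
      } else {
        print "no => flag," $1
      }
    }
  ' "$@"
}
```

The loop walks the split result by index rather than with for (x in array), because the traversal order of for-in is not guaranteed in awk.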

How can I store the length of a line into a var within awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or less than 10 characters, I want to store the number
of characters in a var.
Somehow the first print statement works but storing that result in a var doesn't.
Please help.
I tried removing the dollar sign, "thelength=(length($0))",
and removing the parentheses, "thelength=length($0)", but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
Two issues, both from mixing ksh and awk syntax ...
$(...) is not command substitution in awk; $(length($0)) means "the field whose number is length($0)", so no $( ) is needed to obtain the length; use thelength=length($0)
awk variables do not take a leading $ when referenced ($thelength would again be a field lookup); use print ... ,thelength
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
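Why the leading $ matters can be shown with a tiny sketch (show_length is a made-up name): in awk, $n means "field number n", so it looks up a field instead of printing the variable's value.

```shell
# Hypothetical helper: contrast a variable with the field it would index.
show_length() {
  awk '
    NR == 1 {
      n = length($0)
      printf "variable: [%s]\n", n    # the value stored in n
      printf "field: [%s]\n", $n      # field number n, empty when the line has fewer fields
      exit
    }
  ' "$@"
}
```

For a first line such as ab cd, the variable holds 5 while field 5 does not exist, which is why the original script printed nothing after the label.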

Removing Quote From Field For Filename Using AWK

I've been playing around with this for an hour trying to work out how to embed the removal of quotes from a specific field using AWK.
Basically, the file encapsulates text in quotes, but I want to use the second field to name the file and split them based on the first field.
ID,Name,Value1,Value2,Value3
1,"AAA","DEF",1,2
1,"AAA","GGG",7,9
2,"BBB","DEF",1,2
2,"BBB","DEF",9,0
3,"CCC","AAA",1,1
What I want to get out are three files, all with the header row named:
AAA [1].csv
BBB [2].csv
CCC [3].csv
I have got it all working, except for the fact that I can't for the life of me work out how to remove the quotes around the filename!!
So, this command does everything, except that the file is named with quotes around $2; I need to do some kind of transformation on $2 before it goes into evname. In the actual file, I want to keep the encapsulating quotes.
awk -F, 'NR==1{h=$0;next}!($1 in files){evname=$2" ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
I've tried to push a gsub into this, but I'm struggling to work out exactly how this should look.
This is I think as close as I have got, but it is just calling everything "2" for $2, I'm not sure if this means I need to do an escape of $2 somehow in the gsub, but trying that doesn't seem to be working, so I'm at a loss as to what I'm doing wrong.
awk -F, 'NR==1{h=$0;next}!($1 in files){evname=gsub(""\","", $2)" - Event ID ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
Any help greatly appreciated.
Thanks in advance!!
Gannon
If I understand what you are attempting correctly, then
awk -F, 'NR==1{h=$0;next}!($1 in files){stub=$2;gsub(/"/,"",stub);evname=stub" ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
should work. That is
NR == 1 {
h = $0;
next
}
!($1 in files) {
stub = $2 # <-- this is the new bit: make a working copy
# of $2 (so that $2 is unchanged and the line
# is not rebuilt with changes for printing),
gsub(/"/, "", stub) # remove the quotes from it, and
evname = stub " [" $1 "].csv" # use it to assemble the filename.
files[$1] = 1;
print h > evname
}
{
print > evname
}
You can, of course, use
evname = stub " - Event ID [" $1 "].csv"
or any other format after the substitution (this one seems to be what you tried to get in your second code snippet).
The gsub function returns the number of substitutions made, not the result of the substitutions; that is why evname=gsub(""\","", $2)" - Event ID ["$1"].csv" does not work.
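A minimal sketch of that behaviour, with made-up names (demo_gsub_return, name): the return value is a count, and the cleaned text ends up in the variable passed as the third argument.

```shell
# Hypothetical demo: gsub edits its target in place and returns a count.
demo_gsub_return() {
  awk 'BEGIN {
    name = "\"AAA\""
    n = gsub(/"/, "", name)     # n is the number of replacements, name is edited in place
    print n, name
  }'
}
```

Running demo_gsub_return prints 2 AAA: two quotes were removed, and name now holds the unquoted text.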
Things are always clearer with a little white space:
awk -F, '
NR==1 { hdr=$0; next }
!seen[$1]++ {
evname = $2
gsub(/"/,"",evname)
outfile = evname " [" $1 "].csv"
print hdr > outfile
}
{ print > outfile }
' DataExtract.csv
Aside: It's pretty unusual for someone to WANT to create files with spaces in their names given the complexity that introduces in any later scripts you write to process them. You sure you want to do that?
P.S. here's the gawk version as suggested by @JID below
awk -F, '
NR==1 { hdr=$0; next }
!seen[$1]++ {
outfile = gensub(/"/,"","g",$2) " [" $1 "].csv"
print hdr > outfile
}
{ print > outfile }
' DataExtract.csv
Apply the gsub before you make the assignment:
awk -F, 'NR==1{h=$0;next}
!($1 in files){
name=$2; gsub("\"","",name); # Add this line (work on a copy so the printed record keeps its quotes)
evname=name" ["$1"].csv";files[$1]=1;print...

Endless recursion in gawk-script

Please pardon me in advance for posting such a big part of my problem, but I just can't put my finger on the part that fails...
I got input-files like this (abas-FO if you care to know):
.fo U|xiininputfile = whatever
.type text U|xigibsgarnich
.assign U|xigibsgarnich
..
..Comment
.copy U|xigibswohl = Spaß
.ein "ow1/UWEDEFTEST.FOP"
.in "ow1/UWEINPUT2"
.continue BOTTOM
.read "SOemthing" U|xttmp
!BOTTOM
..
..
Now I want to recursively follow each .in[put]/.ein[gabe] statement, parse the mentioned file and, if I don't know it yet, add it to an array. My code looks like this:
#!/bin/awk -f
function getFopMap(inputregex, infile, mandantdir, infiles){
while(getline f < infile){
#printf "*"
#don't match if there is a '
if(f ~ inputregex "[^']"){
#remove .input-part
sub(inputregex, "", f)
#trim right
sub(/[[:blank:]]+$/, "", f)
#remove leading and trailing "
gsub(/(^\"|\"$)/,"" ,f)
if(!(f in infiles)){
infiles[f] = "found"
}
}
}
close(infile)
for (i in infiles){
if(infiles[i] == "found"){
infiles[i] = "parsed"
cmd = "test -f \"" i "\""
if(system(cmd) == 0){
close(cmd)
getFopMap(inputregex, f, mandantdir, infiles)
}
}
}
}
BEGIN{
#Matches something like [.input myfile] or [.ein "ow1/myfile"]
inputregex = "^\\.(in|ein)[^[:blank:]]*[[:blank:]]+"
#Get absolute path of infile
cmd = "python -c 'import os;print os.path.abspath(\"" ARGV[1] "\")'"
cmd | getline rootfile
close(cmd)
infiles[rootfile] = "parsed"
getFopMap(inputregex, rootfile, mandantdir, infiles)
#output result
for(infile in infiles) print infile
exit
}
I call the script (in the same directory the paths are relative to) like this:
./script ow1/UWEDEFTEST.FOP
I get no output. It just hangs up. If I remove the comment before the printf "*" command, I'm seeing stars, without end.
I appreciate any help and hints on how to do it better.
My awk:
gawk Version 3.1.7
idk if it's your only problem, but you're calling getline incorrectly and consequently will go into an infinite loop in some scenarios. Make sure you fully understand all of the caveats at http://awk.info/?tip/getline and you might want to use the recursion example there as the starting point for your code.
The most important item initially for your code is that when getline fails (for example because the file cannot be opened) it returns a negative value, so while(getline f < infile) creates an infinite loop: the failing getline keeps returning non-zero, so it keeps being called and keeps failing. You need to use while ( (getline f < infile) > 0) instead.
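The safe loop form can be sketched as below. The function name count_lines is hypothetical; the point is only that comparing the result with > 0 ends the loop both at end of file and on an error such as a missing file.

```shell
# Hypothetical helper: count the lines of a file with a getline loop that cannot spin forever.
count_lines() {
  awk -v file="$1" 'BEGIN {
    n = 0
    while ((getline line < file) > 0)   # stops on 0 (EOF) and on -1 (error)
      n++
    close(file)
    print n
  }'
}
```

Called on a path that does not exist, this prints 0 and returns, whereas the unguarded while (getline line < file) would loop forever.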