invalid char ' ' ' in expression - awk
Hi, I have the following awk program.
The problem is that I don't know why it complains on line 3 with
"awk ' invalid char ' ' ' in expression" when I run awk -f make.awk info.txt.
Is anyone out there brighter than me in this area? =)
#!/bin/bash
function labels2 () {
    awk '
    /[0-9]/{
        print substr($3,length($3)-11), $3
    }' $# | /bin/sort -u | awk '{print "BUILD: " NR, $2}'
}
function labels () {
    awk '
    /[0-9]/{
        BL[$3] = substr($3,length($3)-11)
    }
    END {
        asort(BL)
        for (i in BL) {
            print i, BL[i]
        }
    }' $#
}
labels $#
exit 0
for a in $#
do
    labels $# | gawk '
    /BUILD:/ {
        BUILD[$2] = $3
        BUILDCNT ++
        next
    }
    /[0-9]/ {
        DATEd[$3] = $1
        TIMEd[$3] = $2
        MODULESd[$3] = $4
        CASESd[$3] = $5
        FAILEDd[$3] = $6
        COVERd[$3] = $7
        LOCd[$3] = $8
    }
    END {
        SUBSYSTEM=substr(FILENAME, 1, length(FILENAME)-7)
        LABEL= "\"" toupper(SUBSYSTEM) "\""
        print "{"
        print "subsystem: " LABEL ","
        print " date: {"
        print " label: " LABEL ","
        print " data: ["
        for (i = 0 ; i <= BUILDCNT; i ++ ) {
            B=BUILD[i]
            if (DATEd[B]) { print " [" i ", \"" DATEd[B] "\"]," }
        }
        print " ]"
        print " },"
    }
    ' - $a
done
That's not an awk program, but a bash script. To run it, do
chmod +x yourscript
and then
./yourscript parameters
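To see where the message comes from: awk -f parses the whole file as awk source, so the shell line awk ' on line 3 reaches awk's parser as a stray quote character. Here is a minimal sketch of that, assuming a throwaway file made with mktemp; the tiny wrapper script below is illustrative and is not the original make.awk.

```shell
# Hypothetical demo file: a small bash script that wraps an awk program.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
function labels () {
awk '
/[0-9]/ { print $1 }
' "$@"
}
labels "$@"
EOF

# Run as a shell script: bash interprets the wrapper and starts awk itself.
as_bash=$(printf 'a1\nbb\nc3\n' | bash "$script")

# Run with awk -f: awk tries to parse the shell syntax and gives up.
if awk -f "$script" /dev/null 2>/dev/null; then
    as_awk=ok
else
    as_awk=failed
fi

echo "bash:   $as_bash"
echo "awk -f: $as_awk"
rm -f "$script"
```

If you really want to use awk -f, the file has to contain only the text between the single quotes, with the shell parts left out.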
Related
Print only when user keywords match section keywords
I have a bash script calling awk. I want to test whether ukeys contains elements in array kaggr, if so setting display=1. My problem is how to avoid the array test when keywords or ukeys are empty. What can I do?
awk -v ukeys="$keys" -v beg_ere="$beg_ere" -v pn_ere="$pn_ere" -v end_ere="$end_ere" \
'$0 ~ beg_ere {
    title=gensub(beg_ere, "\\2", 1, $0);
    subtitle=gensub(beg_ere, "\\3", 1, $0);
    keywords=gensub(beg_ere, "\\4", 1, $0);
    nk = split(keywords, kaggr, ",");
    nu = split(ukeys, uaggr, ",");
    for (i in uaggr) {
        match=0;
        for (j in kaggr) {
            if (uaggr[i] == kaggr[j]) { match=1; break; }
        }
        if (match == 1) { display=1; break; }
    }
    next
}
$0 ~ end_ere { display=0 ; print "" }
display { sub(pn_ere, "") ; print }
' "$filename"
I am passing keys="resource" to match values in keywords. But when keys is empty, I do not want to match anything in keywords.
Just to clarify: if "ukeys" or "keywords" is empty, you want to skip the for-loop? Would this approach solve your issue? (Inside awk the variable is called ukeys, so that is the name to test. I also renamed the flag from match to found, because match is the name of a built-in awk function.)
awk -v ukeys="$keys" -v beg_ere="$beg_ere" -v pn_ere="$pn_ere" -v end_ere="$end_ere" \
'$0 ~ beg_ere {
    title=gensub(beg_ere, "\\2", 1, $0);
    subtitle=gensub(beg_ere, "\\3", 1, $0);
    keywords=gensub(beg_ere, "\\4", 1, $0);
    nk = split(keywords, kaggr, ",");
    nu = split(ukeys, uaggr, ",");
    if (length(ukeys) > 0 && length(keywords) > 0) {
        # if either string is empty, skip this part
        for (i in uaggr) {
            found=0;
            for (j in kaggr) {
                if (uaggr[i] == kaggr[j]) { found=1; break }
            }
            if (found == 1) { display=1; break }
        }
        next
    }
}
$0 ~ end_ere { display=0 ; print "" }
display { sub(pn_ere, "") ; print }
' "$filename"
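The same guard can also be written against the counts that split() returns, since split() returns 0 for an empty string. Below is a minimal sketch in a BEGIN block so it can be tried without an input file; has_common_key is a hypothetical helper name, and the sketch uses only POSIX awk features (no gensub).

```shell
# has_common_key (hypothetical helper): prints 1 when any comma-separated key
# in $1 also appears in $2, and 0 otherwise, including when either is empty.
has_common_key() {
    awk -v ukeys="$1" -v keywords="$2" 'BEGIN {
        nu = split(ukeys, uaggr, ",")      # 0 when ukeys is empty
        nk = split(keywords, kaggr, ",")   # 0 when keywords is empty
        found = 0
        if (nu > 0 && nk > 0) {
            # index the section keywords once, then look each user key up
            for (j = 1; j <= nk; j++) seen[kaggr[j]] = 1
            for (i = 1; i <= nu; i++)
                if (uaggr[i] in seen) { found = 1; break }
        }
        print found
    }'
}

has_common_key "resource" "network,resource"
has_common_key "" "network,resource"
has_common_key "resource" ""
```

The lookup table replaces the nested loop, which matters little for a handful of keywords but keeps the test readable.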
awk to count 1's 2's 3's Multiple Column
I would like to know how to count, for each $1 & $2 combination, the number of occurrences of 1's, 2's, 3's and 4's in $3, $4, $5, $6 and $7.
Sample Input
Name,Date,XXX,YYY,ZZZ,AAA,BBB
ABC,19-10-2020,2,NA,4,3,NA
ABC,19-10-2020,NA,3,NA,NA,4
ABC,18-10-2020,1,NA,4,4,NA
ABC,18-10-2020,NA,3,NA,NA,4
CDE,19-10-2020,1,NA,4,3,NA
CDE,19-10-2020,NA,2,NA,NA,4
CDE,18-10-2020,3,3,4,3,3
CDE,18-10-2020,NA,3,NA,NA,4
FGH,18-10-2020,4,4,4,4,4
Desired Output
Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
I have tried the command below, without success. Please help with this.
awk -F"," '{OFS=","; print $1,$2}' | awk -F"," 'BEGIN {count=0} {key=$0; a[key]++} END {for (i in a) print i,a[i]}'
You never need to call awk more than once. You simply sum the occurrences and output them whenever the Name/Date pair changes, e.g.
awk -F, '
BEGIN {
    OFS=","
    print "Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4"
    ones=twos=threes=fours=0
}
last && last != $1 OFS $2 {
    print last,ones,twos,threes,fours
    ones=twos=threes=fours=0
}
FNR > 1 {
    for (i=3; i<=NF; i++) {
        $i=="1" && ones++
        $i=="2" && twos++
        $i=="3" && threes++
        $i=="4" && fours++
    }
    last=$1 OFS $2
}
END {
    print last,ones,twos,threes,fours
}
' file.csv
This relies on rows with the same Name and Date being adjacent, as they are in the sample input.
Example Use/Output (running the command above on the sample input)
Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
This awk should also work:
awk 'BEGIN {
    FS=OFS=","
}
NR > 1 {
    k=$1 OFS $2
    arr[k]
    for (i=3; i<=NF; ++i)
        ++freq[k OFS $i]
}
END {
    print "Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4"
    for (i in arr)
        print i, freq[i OFS 1]+0, freq[i OFS 2]+0,freq[i OFS 3]+0,freq[i OFS 4]+0
}' file.csv
Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
Could you please try following, written and tested with shown samples in GNU awk.
awk '
BEGIN{
  FS=OFS=","
  print "Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4"
}
FNR>1{
  till=""
  delete arr
  for(i=3;i<=NF;i++){
    ind[$1 OFS $2]
    if($i!="NA"){
      arr[$i]++;
      max_till=(max_till>$i?max_till:$i)
    }
  }
  till=(NF-3)
  for(j=1;j<=till;j++){
    value[$1 OFS $2 OFS j]+=arr[j]
  }
}
END{
  for(k in ind){
    printf("%s,",k)
    for(i=1;i<=max_till;i++){
      printf("%d%s",(value[k OFS i]?value[k OFS i]:0),i==max_till?ORS:OFS)
    }
  }
}' Input_file
Output will be as follows.
Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
$ cat tst.awk
BEGIN {
    FS = OFS = ","
    maxVal = 4
}
NR > 1 {
    key = $1 OFS $2
    keys[key]
    for (i=3; i<=NF; i++) {
        cnt[key,$i]++
    }
}
END {
    printf "Name%sDate%s", OFS, OFS
    for (i=1; i<=maxVal; i++) {
        printf "CountOF %d%s", i, (i<maxVal ? OFS : ORS)
    }
    for (key in keys) {
        printf "%s%s", key, OFS
        for (i=1; i<=maxVal; i++) {
            printf "%d%s", cnt[key,i], (i<maxVal ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file
Name,Date,CountOF 1,CountOF 2,CountOF 3,CountOF 4
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
The for (key in keys) loop in the END block can shuffle the order of the output lines. If that's an issue, there are various tweaks to solve it. It'd also be trivial to calculate maxVal rather than hard-coding it to 4.
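One possible version of those two tweaks, as a sketch rather than the answerer's own code: remember each key the first time it is seen so the output follows input order, and track the largest numeric value instead of hard-coding maxVal. The function name count_values and the order/nkeys/seen arrays are names introduced here for illustration.

```shell
# count_values (hypothetical name): reads the CSV from stdin or from file arguments.
count_values() {
    awk '
    BEGIN { FS = OFS = "," }
    NR > 1 {
        key = $1 OFS $2
        # remember first-seen order of each Name,Date pair
        if (!(key in seen)) { seen[key] = 1; order[++nkeys] = key }
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/) {            # ignore NA and other non-numeric cells
                cnt[key, $i]++
                if ($i + 0 > maxVal) maxVal = $i + 0
            }
        }
    }
    END {
        printf "Name%sDate", OFS
        for (v = 1; v <= maxVal; v++) printf "%sCountOF %d", OFS, v
        print ""
        for (k = 1; k <= nkeys; k++) {
            printf "%s", order[k]
            for (v = 1; v <= maxVal; v++) printf "%s%d", OFS, cnt[order[k], v]
            print ""
        }
    }' "$@"
}
```

Usage would be count_values file, or piping the CSV into count_values.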
Another awk using array and split function
$ awk -F, ' BEGIN {OFS="," } NR>1 { k=$1 OFS $2;$1=$2=""; a[k]=a[k] OFS $0 } END { for(i in a) { printf("%s",i); for(j=1;j<=4;j++) { n=split(a[i],t,j); printf(",%s",n-1) } print "" } } ' count_1234.txt
ABC,19-10-2020,0,1,2,2
ABC,18-10-2020,1,0,1,3
CDE,19-10-2020,1,1,1,2
CDE,18-10-2020,0,0,5,2
FGH,18-10-2020,0,0,0,5
$
Breaking up in multiple lines for readability.
awk -F, '
BEGIN { OFS="," }
NR>1 {
    k=$1 OFS $2; $1=$2=""
    a[k]=a[k] OFS $0
}
END {
    for(i in a) {
        printf("%s",i)
        for(j=1;j<=4;j++) {
            n=split(a[i],t,j)
            printf(",%s",n-1)
        }
        print ""
    }
}
'
Awk If-Elseif-If Block
NR>NRMIN{
    if($3 == "Leu") {
        if($4 == "CD1" || $4 == "HD11" || $4 == "HD12" || $4 == "HD13") {
            next;
        }
    }
    elseif($3 == "Val") {
        if($4 == "CD1" || $4 == "HD11" || $4 == "HD12" || $4 == "HD13") {
            next;
        }
    }
    else {
        print;
    }
}
I intend to selectively print lines of a space-delimited file. Please let me know why the above code is giving an error when run as
gawk -f FILE_Modifier.awk NRMIN = 90 FILE > NEWFILE
Error Message
gawk: FILE_Modifier.awk:7: elseif($3 == "Val") {
gawk: FILE_Modifier.awk:7: ^ syntax error
gawk: FILE_Modifier.awk:12: else {
gawk: FILE_Modifier.awk:12: ^ syntax error
There is no elseif in awk; you write else if, as two words. Anyway, you can rewrite the script as just:
awk -v nrmin=90 '(NR > nrmin) && !(($3 ~ /^(Leu|Val)$/) && ($4 ~ /^(CD1|HD11|HD12|HD13)$/))' file
Avoid all-upper-case variable names so they cannot clash with built-in names. Set variables up front using -v unless you have a specific reason not to.
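For comparison, here is a sketch of the if / else if / else form with the same filtering behaviour as the one-liner above (skip the listed atoms of Leu and Val, print every other line past nrmin). The wrapper name filter_atoms is made up for this example, and it reads from stdin.

```shell
# filter_atoms (hypothetical name): $1 is the number of leading lines to drop.
filter_atoms() {
    awk -v nrmin="$1" '
    NR > nrmin {
        if ($3 == "Leu" && $4 ~ /^(CD1|HD11|HD12|HD13)$/) {
            next
        } else if ($3 == "Val" && $4 ~ /^(CD1|HD11|HD12|HD13)$/) {
            next
        } else {
            print
        }
    }'
}

# Small made-up sample: only the third and fifth lines should survive.
printf '%s\n' '1 A Leu CD1' '2 A Leu CD1' '3 A Leu CA' '4 A Val HD11' '5 A Gly CD1' |
    filter_atoms 1
```

Note that this differs from the script in the question, where a Leu or Val line that is not in the atom list falls through without being printed.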
script is not hitting any validation steps in awk
This script is supposed to read in a csv in the following format:
Name,Date,ID,Number
John Smith,09/05/2015,s,999-999-99
Mike Smith,09/06/2015,s,989-979-99
Fred Smith,09/03/2015,s,781-999-99
The first line is a header and is supposed to be skipped. When the script runs, every .csv file seems to be moved to the goodFile directory, which I think is a false positive. I tampered with the data for the validation steps, e.g. for the 3rd one I entered QE instead of S or E (it has to be S or E), and it doesn't even hit this code. I am not sure why.
for(linenum = 1; linenum <nr; linenum++) {
    if (length(dataArr[linenum,3]) == 0){
        printf "Failed 3rd a validation"
        exit 1
The full script:
#!/bin/sh
for file in test/*.csv ; do
    awk -F',' '
    # skip the header and blank lines
    NR = 1 || NF == 0 {next}
    #save the data in to a 2d array called dataArr
    {
        for (i=1; i <= NF; i++)
            dataArr[++nr,i] = $i
    }
    END {
        STATUS = "GOOD"
        #verify column 1
        for( linenum=1; linenum <= nr; linenum++) {
            if (length(dataArr[linenum,1]) == 0){
                printf "Failed 1st validation"
                exit 1
            }
        }
        printf "file: %s, verify column 1, STATUS: %s\n", FILENAME, STATUS
        #verify column 2
        for(linenum = 1; linenum <nr; linenum++) {
            if (length(dataArr[linenum,2]) == 0){
                printf "Failed 2nd a validation"
                exit 1
            }
            if ((dataArr[linenum,2]) !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/){
                printf "Failed 2nd b validation"
                exit 1
            }
        }
        #verify column 3
        for(linenum = 1; linenum <nr; linenum++) {
            if (length(dataArr[linenum,3]) == 0){
                printf "Failed 3rd a validation"
                exit 1
            }
            # has to be either S or E
            if ((dataArr[linenum,3]) !~ /^[SE]$/){
                printf "Failed 3rd b validation"
                exit 1
            }
        }
        #verify column 4
        for(linenum = 1; linenum <nr; linenum++) {
            #length has to be between 9 AND 11
            if ((length(dataArr[linenum,4])) < 9 || (length(dataArr[linenum,4]) > 11)){
                printf "Failed 4th validation"
                exit 1
            }
        }
    }' "$file"
    if [[ $? -eq 0 ]]; then
        # "good" status
        mv ${file} test1/goodFile
    else
        # "bad" status
        mv ${file} test1/badFile
    fi
done
You don't need to save the file in an array. Note also that NR = 1 in your script is an assignment, not a comparison: it is always true, so every line hits next, the array stays empty, none of the END loops run, and awk exits with status 0. All you need is:
awk -F',' '
# skip the header and blank lines
NR == 1 || NF == 0 {next}
$1 == "" { fails1++ }
$2 == "" { fails2a++ }
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { fails2b++ }
$3 == "" { fails3a++ }
$3 !~ /^[SE]$/ { fails3b++ }
length($4) < 9 || length($4) > 11 { fails4++ }
END {
    if (fails1) { print "Failed 1st validation"; exit 1 }
    if (fails2a) { print "Failed 2nd a validation"; exit 1 }
    if (fails2b) { print "Failed 2nd b validation"; exit 1 }
    if (fails3a) { print "Failed 3rd a validation"; exit 1 }
    if (fails3b) { print "Failed 3rd b validation"; exit 1 }
    if (fails4) { print "Failed 4th validation"; exit 1 }
}' "$file"
To print the failure messages to stderr instead of stdout, btw, would portably be:
if (fails4) { print "Failed 4th validation" | "cat>&2"; exit 1 }
Here's the version if you don't care which error is reported first when the file contains multiple errors:
awk -F',' '
# skip the header and blank lines
NR == 1 || NF == 0 {next}
$1 == "" { print "Failed 1st validation"; exit 1 }
$2 == "" { print "Failed 2nd a validation"; exit 1 }
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { print "Failed 2nd b validation"; exit 1 }
$3 == "" { print "Failed 3rd a validation"; exit 1 }
$3 !~ /^[SE]$/ { print "Failed 3rd b validation"; exit 1 }
length($4) < 9 || length($4) > 11 { print "Failed 4th validation"; exit 1 }
' "$file"
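A sketch of how the exit status can drive the good/bad decision in the surrounding shell loop, assuming a cut-down validator (only the column 1, 3 and 4 checks) and temporary sample files; validate_csv, workdir and the sample rows are made up for the demonstration. Testing the command directly with if also avoids [[ ... ]], which is a bash feature and not guaranteed under #!/bin/sh.

```shell
# validate_csv (hypothetical wrapper): exit status 0 means the file passed.
validate_csv() {
    awk -F',' '
    NR == 1 || NF == 0 { next }                 # skip the header and blank lines
    $1 == ""           { bad = "1st";   exit }  # exit jumps to the END block
    $3 !~ /^[SE]$/     { bad = "3rd b"; exit }
    length($4) < 9 || length($4) > 11 { bad = "4th"; exit }
    END {
        if (bad != "") {
            print "Failed " bad " validation" | "cat 1>&2"
            exit 1
        }
    }' "$1"
}

# Made-up sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'Name,Date,ID,Number\nJohn Smith,09/05/2015,S,999-999-99\n' > "$workdir/good.csv"
printf 'Name,Date,ID,Number\nMike Smith,09/06/2015,QE,989-979-99\n' > "$workdir/bad.csv"

results=""
for file in "$workdir"/*.csv; do
    if validate_csv "$file" 2>/dev/null; then
        results="$results $(basename "$file"):good"   # here you would mv to goodFile
    else
        results="$results $(basename "$file"):bad"    # here you would mv to badFile
    fi
done
echo "$results"
rm -rf "$workdir"
```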
tail -f | awk and end tail once data is found
I am trying to build a script which runs tail -f | awk on a log file that is updated every second. The awk part fetches only the required part of the log file based on my search parameter. The output XML is also captured in an output file. The script is working fine, as expected.
Issue: even after the search has been performed, the script stays hung because of tail -f. Any idea how to update the script below so that, once the output XML is captured, it breaks out of the tail part?
XMLF=/appl/logs/abc.log
aa_pam=${1-xml}
[[ ${2-xml} = "xml" ]] && tof=xml_$(date +%Y%m%d%H%M%S).xml || tof=$2
tail -f $XMLF | \
awk '
BEGIN { Print_SW=0; Cnt_line=1; i=0}
/\<\?xml version\=/ {
    if (Print_SW==1) p_out(Cnt_Line,i)
    Print_SW=0; Cnt_line=1;
}
{ Trap_arry[Cnt_line++]=$0; }
/'${1-xml}'/ { Print_SW=1; }
/\<\/XYZ_999/ {
    if (Print_SW==1) p_out(Cnt_Line, i);
    Print_SW=0; Cnt_line=1;
}
END { if (Print_SW==1) p_out(Cnt_Line, i); }
function p_out(Cnt_Line, i) {
    for (i=1;i<Cnt_line;i++) {print Trap_arry[i] | "tee '$tof'" }
}
' | tee $tof
Update
I tried using exit as suggested below. It exits the script successfully, but the XML captured in the output is duplicated: the same XML appears twice in the output file.
XMLF=/appl/logs/abc.log
aa_pam=${1-xml}
[[ ${2-xml} = "xml" ]] && tof=xml_$(date +%Y%m%d%H%M%S).xml || tof=$2
tail -f $XMLF | \
awk '
BEGIN { Print_SW=0; Cnt_line=1; i=0}
/\<\?xml version\=/ {
    if (Print_SW==1) p_out(Cnt_Line,i)
    Print_SW=0; Cnt_line=1;
}
{ Trap_arry[Cnt_line++]=$0; }
/'${1-xml}'/ { Print_SW=1; }
/\<\/XYZ_999/ {
    if (Print_SW==1) p_out(Cnt_Line, i);
    Print_SW=0; Cnt_line=1;
}
END { if (Print_SW==1) p_out(Cnt_Line, i); }
function p_out(Cnt_Line, i) {
    for (i=1;i<Cnt_line;i++) {print Trap_arry[i] | "tee '$tof'" }
    { exit }
}
' | tee $tof
Call exit (which will jump to your END block prior to termination) after you are finished capturing your output. When awk terminates, the next write() to stdout by tail -f will result in an EPIPE error. tail knows to terminate when that happens.
UPDATE: You seem to be having some problem trying to decide where to put the exit. It should not be in p_out because you call p_out from both the closing XML tag match expression and from the END block. Try this instead:
XMLF=/appl/logs/abc.log
aa_pam=${1-xml}
[[ ${2-xml} = "xml" ]] && tof=xml_$(date +%Y%m%d%H%M%S).xml || tof=$2
tail -f $XMLF | \
awk '
BEGIN {
    Print_SW=0
    Cnt_line=1
    i=0
}
/\<\?xml version\=/ {
    if (Print_SW==1)
        p_out(Cnt_Line,i)
    Print_SW=0
    Cnt_line=1
}
{
    Trap_arry[Cnt_line++]=$0
}
/'${1-xml}'/ {
    Print_SW=1;
}
/\<\/XYZ_999/ {
    if (Print_SW==1)
        p_out(Cnt_Line, i)
    Print_SW=0
    Cnt_line=1
    exit
}
END {
    if (Print_SW==1)
        p_out(Cnt_Line, i);
}
function p_out(Cnt_Line, i) {
    for (i=1;i<Cnt_line;i++) {
        print Trap_arry[i] | "tee '$tof'"
    }
}
' | tee $tof
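The mechanism can be tried without a real log file. In the sketch below, produce_log is a hypothetical stand-in for tail -f that keeps writing small XML documents until a write fails; awk buffers the current document, prints it once the wanted text has been seen, and then exits, which closes the pipe and stops the producer. The document contents, the search text and the iteration cap are all made up for the demonstration.

```shell
# produce_log (hypothetical): stand-in for `tail -f`, stops when its reader is gone.
produce_log() {
    i=0
    while [ "$i" -lt 100000 ]; do        # safety cap so the demo cannot run forever
        i=$((i + 1))
        printf '%s\n' '<?xml version="1.0"?>' "<id>$i</id>" '</XYZ_999>' || break
    done
}

captured=$(produce_log | awk '
    /<\?xml version=/ { n = 0 }          # a new document starts: reset the buffer
    { buf[++n] = $0 }                    # keep every line of the current document
    /<id>2<\/id>/ { wanted = 1 }         # the search text was seen in this document
    /<\/XYZ_999/ && wanted {
        for (j = 1; j <= n; j++) print buf[j]
        exit                             # stop reading; the producer fails on its next write
    }
')
echo "$captured"
```

Because exit sits in the rule for the closing tag, the document is printed exactly once and the END block has nothing left to print.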
You could, in the awk script, add a line such as:
/some-end-of-xml-marker/ { close("/dev/stdin") ; }
I didn't try it, but you get the idea: close stdin when you have reached the end of the file, so that the loop in awk stops and you get to the END part (not tested, I hope this proves to be correct...)
Based on the question "How to break a tail -f command in bash", you could try
#! /bin/bash
XMLF=/appl/logs/abc.log
aa_pam=${1-xml}
[[ ${2-xml} = "xml" ]] && tof=xml_$(date +%Y%m%d%H%M%S).xml || tof=$2
mkfifo log.pipe
tail -f "$XMLF" > log.pipe &
tail_pid=$!
awk -vpar1="$aa_pam" -vtof="$tof" -f t.awk < log.pipe
kill $tail_pid
rm log.pipe
where t.awk is:
/<\?xml version\=/ {
    if (Print_SW==1) {
        p_out(Cnt_Line)
    }
    Print_SW=0
    Cnt_line=0
}
{
    Trap_arry[++Cnt_line]=$0
}
$0 ~ par1 {
    Print_SW=1;
}
/<\/XYZ_999/ {
    if (Print_SW==1)
        p_out(Cnt_Line)
    Print_SW=0
    Cnt_line=0
}
function p_out(Cnt_Line, i) {
    # Cnt_line is the index of the last stored line, so the loop must include it
    for (i=1; i<=Cnt_line; i++) {
        print Trap_arry[i] | ("tee " tof)
    }
    exit 1
}