Script is not hitting any validation steps in awk

This script is supposed to read in a CSV file in the following format:
Name,Date,ID,Number
John Smith,09/05/2015,s,999-999-99
Mike Smith,09/06/2015,s,989-979-99
Fred Smith,09/03/2015,s,781-999-99
The first line is a header and is supposed to be skipped. When the script runs, every .csv file is moved to the goodFile directory, which I think is a false positive. I tampered with the validation steps, for example changing SE to QE in the 3rd one (the value has to be S or E), but that code is never even reached, and I am not sure why:
for(linenum = 1; linenum <nr; linenum++) {
if (length(dataArr[linenum,3]) == 0){
printf "Failed 3rd a validation"
exit 1
}
}
The full script:
#!/bin/sh
for file in test/*.csv ; do
awk -F',' '
# skip the header and blank lines
NR = 1 || NF == 0 {next}
#save the data in to a 2d array called dataArr
{ for (i=1; i <= NF; i++) dataArr[++nr,i] = $i }
END {
STATUS = "GOOD"
#verify coulmn 1
for( linenum=1; linenum <= nr; linenum++) {
if (length(dataArr[linenum,1]) == 0){
printf "Failed 1st validation"
exit 1
}
}
printf "file: %s, verify column 1, STATUS: %s\n", FILENAME, STATUS
#verify coulmn 2
for(linenum = 1; linenum <nr; linenum++) {
if (length(dataArr[linenum,2]) == 0){
printf "Failed 2nd a validation"
exit 1
}
if ((dataArr[linenum,2]) !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/){
printf "Failed 2nd b validation"
exit 1
}
}
#verify coulmn 3
for(linenum = 1; linenum <nr; linenum++) {
if (length(dataArr[linenum,3]) == 0){
printf "Failed 3rd a validation"
exit 1
}
# has to be either S or E
if ((dataArr[linenum,3]) !~ /^[SE]$/){
printf "Failed 3rd b validation"
exit 1
}
}
#verify coulmn 4
for(linenum = 1; linenum <nr; linenum++) {
#lenght has to between 9 AND 11
if ((length(dataArr[linenum,4])) < 9 || (length(dataArr[linenum,4]) > 11)){
printf "Failed 4th validation"
exit 1
}
}
}' "$file"
if [[ $? -eq 0 ]]; then
# "good" status
mv ${file} test1/goodFile
else
# "bad" status
mv ${file} test1/badFile
fi
done

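The validations are never reached because NR = 1 in the skip rule is an assignment, not a comparison: it is true for every record, so every line is skipped and the array stays empty. It should be NR == 1. Beyond that, you don't need to save the file in an array; all you need is: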
awk -F',' '
# skip the header and blank lines
NR == 1 || NF == 0 {next}
$1 == "" { fails1++ }
$2 == "" { fails2a++ }
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { fails2b++ }
$3 == "" { fails3a++ }
$3 !~ /^[SE]$/ { fails3b++ }
length($4) < 9 || length($4) > 11 { fails4++ }
END {
if (fails1) { print "Failed 1st validation"; exit 1 }
if (fails2a) { print "Failed 2nd a validation"; exit 1 }
if (fails2b) { print "Failed 2nd b validation"; exit 1 }
if (fails3a) { print "Failed 3rd a validation"; exit 1 }
if (fails3b) { print "Failed 3rd b validation"; exit 1 }
if (fails4) { print "Failed 4th validation"; exit 1 }
}' "$file"
To print the failure messages to stderr instead of stdout, the portable form is:
if (fails4) { print "Failed 4th validation" | "cat>&2"; exit 1 }
Here's the version if you don't care which error is reported first when the file contains multiple errors:
awk -F',' '
# skip the header and blank lines
NR == 1 || NF == 0 {next}
$1 == "" { print "Failed 1st validation"; exit 1 }
$2 == "" { print "Failed 2nd a validation"; exit 1 }
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { print "Failed 2nd b validation"; exit 1 }
$3 == "" { print "Failed 3rd a validation"; exit 1 }
$3 !~ /^[SE]$/ { print "Failed 3rd b validation"; exit 1 }
length($4) < 9 || length($4) > 11 { print "Failed 4th validation"; exit 1 }
' "$file"
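The difference between NR = 1 and NR == 1 can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway temporary file with made-up rows in the question's format; it only counts how many records get past the skip rule.

```shell
#!/bin/sh
# Hypothetical sample data: a header plus two data rows.
tmp=$(mktemp)
printf '%s\n' 'Name,Date,ID,Number' \
              'John Smith,09/05/2015,S,999-999-99' \
              'Mike Smith,09/06/2015,Q,989-979-99' > "$tmp"

# "NR = 1" assigns 1 to NR, so the pattern is true for every record
# and every record is skipped.
assigned=$(awk -F',' 'NR = 1 || NF == 0 { next } { n++ } END { print n + 0 }' "$tmp")

# "NR == 1" compares, so only the header is skipped.
compared=$(awk -F',' 'NR == 1 || NF == 0 { next } { n++ } END { print n + 0 }' "$tmp")

echo "rows seen with NR = 1:  $assigned"
echo "rows seen with NR == 1: $compared"
rm -f "$tmp"
```

With the assignment no row reaches the counting rule, which is why the END block in the question found nothing to validate and always exited with status 0.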

Print only when user keywords match section keywords

I have a bash script that calls awk. I want to test whether any element of ukeys appears in the array kaggr and, if so, set display=1.
My problem is how to skip the array test when keywords or ukeys is empty. What can I do?
awk -v ukeys="$keys" -v beg_ere="$beg_ere" -v pn_ere="$pn_ere" -v end_ere="$end_ere" \
'$0 ~ beg_ere {
title=gensub(beg_ere, "\\2", 1, $0);
subtitle=gensub(beg_ere, "\\3", 1, $0);
keywords=gensub(beg_ere, "\\4", 1, $0);
nk = split(keywords, kaggr, ",");
nu = split(ukeys, uaggr, ",");
for (i in uaggr) {
found=0; # "match" is a built-in function name, so it cannot be used as a variable
for (j in kaggr) {
if (uaggr[i] == kaggr[j]) { found=1; break; }
}
if (found == 1) { display=1; break; }
}
next
}
$0 ~ end_ere { display=0 ; print "" }
display { sub(pn_ere, "") ; print }
' "$filename"
I am passing keys="resource" to match values in keywords. When keys is empty, I do not want anything in keywords to match.
Just to clarify: if "ukeys" or "keywords" is empty, you want to skip the for loop? Would this approach solve your issue?
awk -v ukeys="$keys" -v beg_ere="$beg_ere" -v pn_ere="$pn_ere" -v end_ere="$end_ere" \
'$0 ~ beg_ere {
title=gensub(beg_ere, "\\2", 1, $0);
subtitle=gensub(beg_ere, "\\3", 1, $0);
keywords=gensub(beg_ere, "\\4", 1, $0);
nk = split(keywords, kaggr, ",");
nu = split(ukeys, uaggr, ",");
if (length(ukeys) > 0 && length(keywords) > 0) { # skip the comparison if either string is empty
for (i in uaggr) {
found=0; # "match" is a built-in function name and cannot be a variable
for (j in kaggr) {
if (uaggr[i] == kaggr[j]) {
found=1; break
}
}
if (found == 1) {
display=1; break
}
}
}
next
}
$0 ~ end_ere { display=0 ; print "" }
display { sub(pn_ere, "") ; print }
' "$filename"
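The guard can also be written against the counts that split() returns, since splitting an empty string yields zero elements. The sketch below is an illustration only: check_keys is a hypothetical helper, not part of the original script, and it takes the two comma-separated lists as plain arguments instead of extracting keywords with gensub.

```shell
#!/bin/sh
# check_keys is a hypothetical helper: it prints 1 if any user key appears in
# the section keywords, and 0 otherwise (including when either list is empty).
check_keys() {
    awk -v ukeys="$1" -v keywords="$2" 'BEGIN {
        nu = split(ukeys, uaggr, ",")      # 0 when ukeys is empty
        nk = split(keywords, kaggr, ",")   # 0 when keywords is empty
        found = 0
        if (nu > 0 && nk > 0) {            # skip the comparison when either list is empty
            for (i = 1; i <= nu && !found; i++)
                for (j = 1; j <= nk; j++)
                    if (uaggr[i] == kaggr[j]) { found = 1; break }
        }
        print found
    }'
}

check_keys "resource" "topic,resource"
check_keys "" "topic,resource"
check_keys "resource" ""
```

Only the first call prints 1; the two calls with an empty list print 0 without entering the loops.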

AWK Output Produces Function Not Defined

I inherited this code, so I'm hoping someone can help me with the error message.
The AWK file is shown below. It reads a CSV file and is supposed to produce a formatted list.
The error returned is: FNR=1 fatal: function `header' not defined.
This must have worked at some point. I'm not sure how long it has been broken; I only learned about it a couple of days ago.
Can anyone help?
CSV File
"POSN","STATUS","TITLE","BEGIN_DATE","END_DATE","ROLL","PIDM","A_NUMBER","FIRST_NAME","LAST_NAME","EGRP"
"C99999","A","Title","01-JUL-95","","C",888888,"A00888888","John","Doe1","22"
"C99999","A","Ttile","01-JUL-95","","C",9999999,"A09999999","John","Doe2","23"
"C11111","A","Title","01-JUL-95","","C",0000001,"A00000001","John","Doe3","01"
$PROG_LC.awk
# fieldname len
# 1 posn 6
# 2 status 1
# 3 title 30
# 4 begin_date 10
# 5 end_date 10
# 6 roll 1
# 7 pidm 8
# 8 a_number 9
# 9 first_name 15
# 10 last_name 30
# 11 egrp 4
BEGIN { pagelen = 20; pagewidth = 126
lenheader = 4; lendetail = 1; lenfooter = 2 }
header() {
print trititle("XXXXXXX", "Report",
sprintf("Page %d", pagenum))
print ""
print " Posn S Title Begin Date End Date " \
"R A-Number First Name Last Name Egrp"
print "------ - ------------------------------ ---------- ---------- " \
"- -------- --------- --------------- ------------------------------ ----" }
detail(X) {
printf "%-6.6s %-1.1s %-30.30s %-10.10s %-10.10s %-1.1s %-8.8s %-9.9s " \
"%-15.15s %-30.30s %-4.4s\n", X[1], X[2], X[3], X[4], X[5],
X[6], X[7], X[8], X[9], X[10], X[11] }
footer() { print ""; print trititle(user "#" sid, one_up, today) }
Shell Script
#!/bin/sh
. $BANNER_HOME/local/exe/local_init.sh
H=/home/jobsub/${ORACLE_SID}_LOGS
PROG_LC=`echo $PROG | tr "[A-Z]" "[a-z]"`
PROG_UC=`echo $PROG | tr "[a-z]" "[A-Z]"`
CSV=$H/$(basename $PROG_LC .shl)_${ONE_UP}.csv
LOG=$H/$(basename $PROG_LC .shl)_${ONE_UP}.log
WHOAMI=$(whoami)
echo "BANUID = $BANUID" >> $LOG
echo "ONE_UP = $ONE_UP" >> $LOG
echo "PROG = $PROG" >> $LOG
echo "PRNT = $PRNT" >> $LOG
echo "ORACLE_SID = $ORACLE_SID" >> $LOG
echo "H = $H" >> $LOG
echo "PROG_LC = $PROG_LC" >> $LOG
echo "PROG_UC = $PROG_UC" >> $LOG
echo "CSV = $CSV" >> $LOG
echo "LOG = $LOG" >> $LOG
echo "WHOAMI = $WHOAMI" >> $LOG
echo "LOCAL_EXE = $LOCAL_EXE" >> $LOG
sqlplus -s $BAN9UID/#${TARGETDB} <<EOF
variable status number
begin :status := storeprocs.write_csv_file('$PROG_LC', $ONE_UP);
end;
/
exit :status
EOF
STATUS="$?"
echo "RETURN CODE = $STATUS" >> $LOG
if [ $STATUS -eq 0 ]
then echo "$PROG_UC completed successfully" >> $LOG
else echo "$PROG_UC completed with failure" >> $LOG
fi
if [ -f $LOCAL_EXE/$PROG_LC.awk ]
then LIS=$H/$(basename $PROG_LC .shl)_${ONE_UP}.lis
LC_NUMERIC=en_US.utf8 gawk -f $LOCAL_EXE/csvtolis.awk \
-f $LOCAL_EXE/$PROG_LC.awk $CSV > $LIS
gurinso -n $ONE_UP -l $LIS -j $PROG -w $BANUID $BAN9UID/#${TARGETDB}
fi
exit $STATUS
csvtolis.awk
BEGIN { linenum = 0
pagenum = 0
user = toupper(ENVIRON["BAN9UID"])
sid = ENVIRON["ORACLE_SID"]
oneup = ENVIRON["ONE_UP"]
"date +%m/%d/%Y" | getline today }
function csvsplit(str, arr, i,j,n,s,fs,qt) {
# split comma-separated fields into arr; return number of fields in arr
# fields surrounded by double-quotes may contain commas;
# doubled double-quotes represent a single embedded quote
delete arr; s = "START"; n = 0; fs = ","; qt = "\""
for (i = 1; i <= length(str); i++) {
if (s == "START") {
if (substr(str,i,1) == fs) { arr[++n] = "" }
else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
else { j = i; s = "INFIELD" } }
else if (s == "INFIELD") {
if (substr(str,i,1) == fs) {
arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
else if (s == "INQUOTES") {
if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
else if (s == "MAYBEDOUBLE") {
if (substr(str,i,1) == fs) {
arr[++n] = substr(str,j,i-j-1)
gsub(qt qt, qt, arr[n]); j = 0; s = "START" } } }
if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
else if (s == "MAYBEDOUBLE") {
arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
else if (s == "START") { arr[++n] = "" }
return n }
function trititle(left, center, right, gap1, gap2) { # assume sufficient space
gap1 = int((pagewidth - length(center)) / 2) - length(left)
gap2 = pagewidth - length(left) - length(center) - length(right) - gap1
return left sprintf("%*s", gap1, "") center sprintf("%*s", gap2, "") right }
NR > 1 { nfields = csvsplit($0, csv); # print one record, with header/footer as needed
if (pagelen - (linenum % pagelen) - lenfooter < lendetail) {
while ((linenum + lenfooter) % pagelen != 0) { print ""; linenum++ }
footer(); linenum += lenfooter }
if (linenum % pagelen == 0) { pagenum++; header(); linenum += lenheader }
detail(csv); linenum += lendetail
if ((linenum + lenfooter) % pagelen == 0) { footer(); linenum += lenfooter } }
END { if (linenum % pagelen != 0) { # if not at top of page
while ((linenum + lenfooter) % pagelen != 0) { # while not at bottom
print ""; linenum++ } # skip to bottom
footer() } } # and print footer
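Judging only from the listing above, one likely cause is that header, detail and footer in $PROG_LC.awk are written without the function keyword. Without it, awk reads header() { ... } as a pattern that calls header(), which matches the reported error. The sketch below reproduces that in isolation with a shortened, made-up header line; it is an assumption about the cause, not a confirmed fix for the inherited script.

```shell
#!/bin/sh
# Without "function", awk reads "header() { ... }" as a pattern that calls header().
if awk 'header() { print "Posn S Title" }
        BEGIN { header() }' </dev/null 2>/dev/null
then broken=ok
else broken=failed
fi

# With "function", the same body becomes a definition that BEGIN can call.
fixed=$(awk 'function header() { print "Posn S Title" }
             BEGIN { header() }' </dev/null)

echo "without keyword: $broken"
echo "with keyword:    $fixed"
```

If that is the cause, the three definitions would start with function header(), function detail(X) and function footer(). Separately, footer() passes one_up while csvtolis.awk sets oneup, so that field of the footer would come out empty unless the names are made to agree.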

AWK - compare column $1 , append rows together that match (including duplicates)

I'm trying to parse two CSV files that contain thousands of rows. The data is to be matched and appended based solely on the data in the first column. I currently have it parsing the files and writing to 3 files:
1 - key matched
2 - file1 only
3 - file2 only
The issue is that once it makes one match, it moves on to the next line rather than finding the other entries. For the data in question I would rather output multiple lines containing some duplicates than miss any data. (The name column, for instance, varies depending on who entered the data.)
INPUT FILES
file1.csv
topic,group,name,allow
fishing,boaties,dave,yes
fishing,divers,steve,no
flying,red,luke,yes
walking,red,tom,yes
file2.csv
Resource,name,email,funny
fishing,frank,frank#home.com,no
swiming,lee,lee#wallbanger.com,no
driving,lee,lee#wallbanger.com,no
CURRENT OUTPUT
key matched
topic,group,name,allow,Resource,name,email,funny
fishing,divers,steve,no,fishing,frank,frank#home.com,no
file1_only
topic,group,user,allow
fishing,divers,steve,no
flying,red,luke,yes
walking,red,tom,yes
file2_only
Resource,user,email,funny
swiming,lee,lee#wallbanger.com,no
driving,lee,lee#wallbanger.com,no
Expected Output
key matched
topic,group,name,allow,Resource,name,email,funny
fishing,divers,steve,no,fishing,frank,frank#home.com,no
fishing,boaties,dave,yes,fishing,frank,frank#home.com,no
file1_only
topic,group,user,allow
flying,red,luke,yes
walking,red,tom,yes
file2_only
Resource,user,email,funny
swiming,lee,lee#wallbanger.com,no
driving,lee,lee#wallbanger.com,no
So for every key in file1 column 1, it needs to output every row whose key matches in file2 column 1.
This is my current awk filter. I'm guessing I need to add a loop, if possible?
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 }
NR==FNR {
file1[key] = $0
next
}
key in file1 {
print file1[key], $0 > "./out_combined.csv"
delete file1[key]
next
}
{
print > "./out_file2_only.csv"
}
END {
for (key in file1) {
print file1[key] > "./out_file1_only.csv"
}
}
Your script keeps only one line per key in file1[key], so a later file1 row with the same key overwrites the earlier one, and the delete after the first match rules out any further matches. Store every file1 row under its key with a per-key counter instead:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
if ( NR==FNR ) {
file1hdr = $0
}
else {
print file1hdr > "./out_file1_only.csv"
print > "./out_file2_only.csv"
print file1hdr, $0 > "./out_combined.csv"
}
next
}
{ key = $1 }
NR==FNR {
file1[key,++cnt[key]] = $0
next
}
{
file2[key]
if ( key in cnt ) {
for ( i=1; i<=cnt[key]; i++ ) {
print file1[key,i], $0 > "./out_combined.csv"
}
}
else {
print > "./out_file2_only.csv"
}
}
END {
for ( key in cnt ) {
if ( !(key in file2) ) {
for ( i=1; i<=cnt[key]; i++ ) {
print file1[key,i] > "./out_file1_only.csv"
}
}
}
}
$ awk -f tst.awk file1.csv file2.csv
$ head out_*
==> out_combined.csv <==
topic,group,name,allow,Resource,name,email,funny
fishing,boaties,dave,yes,fishing,frank,frank#home.com,no
fishing,divers,steve,no,fishing,frank,frank#home.com,no
==> out_file1_only.csv <==
topic,group,name,allow
flying,red,luke,yes
walking,red,tom,yes
==> out_file2_only.csv <==
Resource,name,email,funny
swiming,lee,lee#wallbanger.com,no
driving,lee,lee#wallbanger.com,no

awk - restricting area in file, printing

I have a large input file, part of which looks like this:
SUM OF ABSOLUTE VALUES OF CHECKS IS 0.844670D-13
Input-Output in F Format
No. Curve Input Param. Correction Output Param. Standard Deviation
9 0 43.8999000000 -0.2148692026 43.6850307974 0.1066086900
10 0 0.0883000000 -0.0081173828 0.0801826172 0.0006755954
11 0 2.5816650000 0.1530838229 2.7347488229 0.0114687081
15 0 0.2175000000 0.0018561462 0.2193561462 0.0017699976
16 0 80.4198910000 3.4449399961 83.8648309961 0.1158732928
20 0 1.9424000000 0.3078499311 2.2502499311 0.0047924544
23 0 3.5047300000 0.4315780848 3.9363080848 0.0052905759
24 0 5.5942300000 1.8976306735 7.4918606735 0.0092102115
26 0 54804.4046000000 -0.0029799077 54804.4016200923 0.0006133608
Input-Output in D Format
No. Curve Input Param. Correction Output Param. Standard Deviation
9 0 0.4389990000D+02 -0.2148692026D+00 0.4368503080D+02 0.1066086900D+00
10 0 0.8830000000D-01 -0.8117382819D-02 0.8018261718D-01 0.6755954153D-03
11 0 0.2581665000D+01 0.1530838229D+00 0.2734748823D+01 0.1146870812D-01
15 0 0.2175000000D+00 0.1856146162D-02 0.2193561462D+00 0.1769997586D-02
16 0 0.8041989100D+02 0.3444939996D+01 0.8386483100D+02 0.1158732928D+00
20 0 0.1942400000D+01 0.3078499311D+00 0.2250249931D+01 0.4792454358D-02
23 0 0.3504730000D+01 0.4315780848D+00 0.3936308085D+01 0.5290575930D-02
24 0 0.5594230000D+01 0.1897630674D+01 0.7491860674D+01 0.9210211480D-02
26 0 0.5480440460D+05 -0.2979907673D-02 0.5480440162D+05 0.6133608199D-03
I would like to print the numbers in columns $5 and $6 of the first table. For rows 11, 15 and 20 I would like to apply an arithmetic operation to the numbers and print the results instead of the values in the table. I have this code:
BEGIN { CONVFMT="%0.17f" }
/D Format/ { exit }
$1 == 9 { prt(1,1) }
$1 == 10 { prt(1,1) }
$1 == 11 { prt(180,3.141592653589) }
$1 == 15 { prt(100,1) }
$1 == 16 { prt(1,1) }
$1 == 20 { prt(10,1) }
$1 == 23 { prt(1,1) }
$1 == 24 { prt(1,1) }
$1 != 26 { prt(1,1) }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
I would like to get this output:
43.685030
0.106608
0.080182
0.000675
156.68965
0.657068
21.935614
0.176999
83.864830
0.115873
22.502499
0.047924
3.936308
0.005290
7.491860
0.009210
but I am getting these numbers twice, and I have not managed to restrict processing to the right area of the file.
So my questions are:
1) How can I print the numbers from the table only once? I mean these 16 numbers:
$1 == 9 { prt(1,1) }
$1 == 10 { prt(1,1) }
$1 == 11 { prt(180,3.141592653589) }
$1 == 15 { prt(100,1) }
$1 == 16 { prt(1,1) }
$1 == 20 { prt(10,1) }
$1 == 23 { prt(1,1) }
$1 == 24 { prt(1,1) }
2) How can I restrict the program to the table between the strings /F Format/ and /D Format/?
Thank you very much.
Edited code
BEGIN { CONVFMT="%0.17f" }
/D Format/ { exit }
$1 == 9 { prt(1,1); next }
$1 == 10 { prt(1,1); next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 16 { prt(1,1); next }
$1 == 20 { prt(10,1); next }
$1 == 23 { prt(1,1); next }
$1 == 24 { prt(1,1); next }
$1 != 26 && $1 + 0 > 0 { prt(1,1); next }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
The problem of duplicate outputs is due to a single line matching both its own number and the $1 != 26 condition in the end. A simple solution is to add ; next after each prt(…) call.
The problem with zero outputs is likewise due to the $1 != 26 matching too much. You could, for example, add additional conditions to this line (such as $1 != 26 && $1 + 0 > 0).
These changes should produce the desired output. Other than that, the script has a lot of redundancy that could be optimised (e.g., all the { prt(1,1); next } lines could be merged into one with a more complex condition), but that may not be worthwhile for a one-off script.
edit: For example, this could be a complete set of pattern lines for this example:
/D Format/ { exit }
!($1 ~ /^[1-9]/) { next }
$1 == 26 { next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 20 { prt(10,1); next }
{ prt(1,1) }
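For the second question, restricting work to the table between the two headings, a flag set at the F Format line is one common approach. The following is a sketch under assumptions: the excerpt is a made-up two-row cut of the input shown in the question, and the variable name inside is hypothetical. It prints the raw $5 and $6 values without the scaling done by prt().

```shell
#!/bin/sh
# Hypothetical excerpt with the same layout as the input in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
SUM OF ABSOLUTE VALUES OF CHECKS IS 0.844670D-13
Input-Output in F Format
No. Curve Input Param. Correction Output Param. Standard Deviation
9 0 43.8999000000 -0.2148692026 43.6850307974 0.1066086900
10 0 0.0883000000 -0.0081173828 0.0801826172 0.0006755954
Input-Output in D Format
No. Curve Input Param. Correction Output Param. Standard Deviation
9 0 0.4389990000D+02 -0.2148692026D+00 0.4368503080D+02 0.1066086900D+00
EOF

rows=$(awk '
    /F Format/ { inside = 1; next }                  # start after the F-format heading
    /D Format/ { exit }                              # stop at the D-format heading
    inside && $1 ~ /^[0-9]+$/ { print $1, $5, $6 }   # numbered data rows only
' "$tmp")

echo "$rows"
rm -f "$tmp"
```

The flag keeps anything before the F Format heading from being processed, and the numeric test on $1 drops the column-title line, so no extra condition is needed to suppress empty or zero output.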

Awk If-Elseif-If Block

NR>NRMIN{
if($3 == "Leu") {
if($4 == "CD1" || $4 == "HD11" || $4 == "HD12" || $4 == "HD13") {
next;
}
}
elseif($3 == "Val") {
if($4 == "CD1" || $4 == "HD11" || $4 == "HD12" || $4 == "HD13") {
next;
}
}
else {
print;
}
}
I intend to selectively print lines of a space-delimited file.
Please let me know why the above code gives an error when run as gawk -f FILE_Modifier.awk NRMIN = 90 FILE > NEWFILE
Error Message
gawk: FILE_Modifier.awk:7: elseif($3 == "Val") {
gawk: FILE_Modifier.awk:7: ^ syntax error
gawk: FILE_Modifier.awk:12: else {
gawk: FILE_Modifier.awk:12: ^ syntax error
awk has no elseif keyword; it is written else if, as two words. Anyway, you can rewrite the script as just:
awk -v nrmin=90 '(NR > nrmin) && !(($3 ~ /^(Leu|Val)$/) && ($4 ~ /^(CD1|HD11|HD12|HD13)$/))' file
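Don't use all-uppercase variable names, so that you avoid clashes with built-in names. Do set variables up front using -v unless you have a specific reason not to. Also note that NRMIN = 90 written with spaces is read as three separate file operands; a command-line assignment must be a single word such as NRMIN=90.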
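If you prefer to keep the block form, the same selection can be written with else if. This is a small sketch on made-up four-column input; the lowercase nrmin variable and the sample rows are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical four-column input: two index columns, residue name, atom name.
out=$(printf '%s\n' '1 1 Leu CD1' '2 2 Leu CA' '3 3 Val HD11' '4 4 Ala CB' |
awk -v nrmin=0 '
NR > nrmin {
    if ($3 == "Leu" && $4 ~ /^(CD1|HD11|HD12|HD13)$/) {
        next
    } else if ($3 == "Val" && $4 ~ /^(CD1|HD11|HD12|HD13)$/) {   # "else if", two words
        next
    } else {
        print
    }
}')
echo "$out"
```

This prints the Leu CA and Ala CB rows and skips the other two, the same selection the one-liner makes.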