awk math: check result before printing - awk

My input is:
cat input
1
4
-2
I want to subtract some value (e.g., 2) from column 1. If result is > 0 print it or else print 0. In other words, I don't want any negative numbers.
My try is:
awk '{NUMBER=$1-2} if (NUMBER > 0) print NUMBER; else print 0'
But I am probably making some syntax mistake.
Wanted output is:
0
2
0

This can be an option:
$ awk '{$1=$1-2>0?$1-2:0; print $1}' file
0
2
0
$1=$1-2>0?$1-2:0 uses the ternary operator and is read like this:
Set $1 to:
if ($1-2 > 0)
==> $1-2
else ==> 0
and then it is printed with print $1.

With this:
awk '{NUMBER=$1-2} if (NUMBER > 0) print NUMBER; else print 0'
you're putting the if... statement in the condition part of the awk body instead of the action part because it's not enclosed in curly brackets. This is the correct syntax for what you wrote:
awk '{NUMBER=$1-2; if (NUMBER > 0) print NUMBER; else print 0}'
but in reality I'd write it as:
awk '{NUMBER=$1-2; print (NUMBER > 0 ? NUMBER : 0)}'
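As a rough sketch, the subtracted value could also be passed in with -v instead of being hard-coded. The function name clamp_sub and the variable name amt are hypothetical, chosen only for illustration:

```shell
# clamp_sub is a hypothetical helper: subtract "$1" from column 1 of stdin
# and print the result, or 0 when the result is not positive.
clamp_sub() {
    awk -v amt="$1" '{ n = $1 - amt; print (n > 0 ? n : 0) }'
}

# Sample input from the question, subtracting 2 from each line.
printf '1\n4\n-2\n' | clamp_sub 2
```

The parentheses around the ternary keep the > inside it from being read as an output redirection by print.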

Related

awk to get result of multiple lines in one sentence with if statement

I am new to awk and I was wondering if I could get one single result for an if operation on awk.
Example:
cat example.txt:
0
0
0
0
0
awk '{ if ($1==0) print "all zeros"; else print "there is 1"}'
result:
all zeros
all zeros
all zeros
all zeros
all zeros
I would like to get only a single all zeros as the answer, or a TRUE. Is this a case where I should use an awk function to return something? Thanks
You could write your code this way. Written and tested with the shown samples.
awk '$0==0{count++} END{if(count==FNR){print "TRUE"}}' Input_file
OR
awk '$0==0{count++} END{if(count==FNR){print "All lines are zeroes"}}' Input_file
OR to print a message when some non-zero line(s) found:
awk '$0==0{count++} END{if(count==FNR){print "TRUE"} else{print "There is non-zero line(s) found."}}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
$0==0{ ##Checking condition if current line is zero then do following.
count++ ##Increasing count with 1 here.
}
END{ ##Starting END block of this program from here.
if(count==FNR){ ##Checking condition if count is equal to number of total lines of file.
print "TRUE" ##If above condition is TRUE then print TRUE here.
}
}
' Input_file ##Mentioning Input_file name here.
Here is an alternative using gnu-awk:
awk -v RS='^(0\r?\n)+$' '{print (NF ? "there is 1" : "all zeros")}' file
all zeros
I would do it the following way using GNU AWK. Let the content of file.txt be
0
0
0
0
0
then
awk '{nonzero = nonzero || $1!="0"}END{print nonzero?"has not zero":"all zeros"}' file.txt
output
all zeros
Explanation: I use nonzero to record whether a non-zero value has already been seen (value 1) or not (value 0). An unset awk variable evaluates to 0 in an arithmetic context, so I do not need to set nonzero=0 in a BEGIN section. I use ||, the logical OR, which works as follows:
if you did not see a non-zero earlier and do not see one now, there is no non-zero element so far (0 || 0 == 0)
if you did not see a non-zero earlier and do see one now, there is a non-zero element so far (0 || 1 == 1)
if you did see a non-zero earlier and do not see one now, there is a non-zero element so far (1 || 0 == 1)
if you did see a non-zero earlier and do see one now, there is a non-zero element so far (1 || 1 == 1)
After processing all lines, the END section prints either has not zero or all zeros, depending on the value of nonzero, using the ternary operator.
(tested in gawk 4.2.1)
Another:
$ awk '$0{exit v=1}END{printf "All %szeroes\n",(v?"not ":"")}' file
Output with sample data:
All zeroes
Alternative output:
All not zeroes
Explained:
$ awk '
$0 { # if record evaluates to non-zero
exit v=1 # jump to END with "parameter" 1
} # why continue once non-zero seen
END {
printf "All %szeroes\n",(v?"not ":"") # if "parameter" v was set, output "not"
}' file
The condition to examine $0 could of course be something more specific (like $0=="0") but it's sufficient for this purpose. exit v=1 sets var v to value 1 but it also exits the program once a non-zero value of $0 has been found and jumps to END where the value of v is examined. The program finally exits with exit code 1. If that is not acceptable, you need to exit from END explicitly with exit 0.
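The last point can be sketched as follows. The wrapper name report_zeroes is hypothetical; the awk program is the one above with an explicit exit 0 added to END so that the exit status set by exit v=1 is overridden:

```shell
# report_zeroes is a hypothetical wrapper around the program above.
# "exit v = 1" stops reading input and jumps to END; the explicit
# "exit 0" in END then replaces the exit status of 1.
report_zeroes() {
    awk '$0 { exit v = 1 }
         END { printf "All %szeroes\n", (v ? "not " : ""); exit 0 }'
}

printf '0\n0\n3\n' | report_zeroes
echo "exit status: $?"
```

Without the exit 0 in END, the same input would leave awk with exit status 1.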

Loop through file to count field number

I have a bash script to add users from a .txt file.
It is really simple:
name firstname uid gid
space separated values
I want to check with awk if each row contains 4 fields. If yes I want to return 1, if not return 0.
file=/my/file.txt
awk=$(awk -F' ' '{(NF != 4) ? res = 0 : res = 1; print res}' $file)
echo $awk
Right now, awk returns 1 for each row, but I want it to return 1 or 0 at the end, not for each line in the file.
On UNIX you'll return 0 in case of success and !=0 in case of an error. For me it makes more sense to return 0 when all records have 4 fields and 1 when not all records have 4 fields.
To achieve that, use exit:
awk 'NF!=4{exit 1}' file
FYI: awk will exit with 0 by default.
If you want to use it in a shell conditional:
#!/bin/bash
if ! awk 'NF!=4{exit 1}' file ; then
echo "file is invalid"
fi
PS: -F' ' in your example is superfluous because ' ' is the default field delimiter.
You can use:
awk 'res = NF!=4{exit} END{exit !res}' file
This will exit with 1 if all rows have 4 columns; otherwise it will exit with 0.
A small change to your script would do:
result=$(awk -F' ' 'BEGIN{flag=1}NF!=4{flag=0;exit}END{print flag}' "$file")
[ ${result:-0} -eq 0 ] && echo "Problematic entries found in file"
The approach:
Set the flag to 1, assuming that every record contains 4 fields.
Check whether each record actually contains 4 fields; if not, set the flag to zero and exit.
The exit skips the rest of the input and goes to the END rule.
Print the flag and store it in result.
Check the result and act accordingly.
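A variation on the same idea, as a sketch only: report which record is at fault as well as signalling failure through the exit status. The helper name first_bad_line and the sample user rows are made up for illustration:

```shell
# first_bad_line is a hypothetical helper: print the number of the first
# record that does not have exactly 4 fields and exit with status 1;
# print nothing and exit with status 0 if every record is fine.
first_bad_line() {
    awk 'NF != 4 { print NR; exit 1 }' "$@"
}

# Made-up sample data: the second row is missing its gid.
if ! bad=$(printf 'doe john 1001 100\nroe jane 1002\n' | first_bad_line); then
    echo "line $bad has the wrong number of fields"
fi
```

With no file argument the function reads standard input; with one, it reads that file, e.g. first_bad_line "$file".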

awk to select CSV file columns greater than a certain number

I have a CSV file named as "awk_column_select_test.csv"
a,b,c
0.2,0.4,0.5
0.3,0.6,0.7
0.4,0.8,0.9
I was trying to write an awk program to select the rows where column 1, column 2, or column 3 is greater than 0.5.
My awk program, named "awk_select_column_test.awk" looks like this:
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff="0.5"}
{$1 > cutoff || $2 > cutoff || $3 > cutoff}
END {print}
Then, I tried to run on command line using:
awk -f awk_select_column_test.awk awk_column_select_test.csv
I got the following output with only 1 row:
0.4,0.8,0.9
However, I was hoping to get 2 rows like this:
0.3,0.6,0.7
0.4,0.8,0.9
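You have extra curly brackets {} around the condition, which turn it into an action whose result is discarded, and the END block prints only the last line. The logic itself is all right. Note that cutoff is also set to the number 0.5 rather than the string "0.5", so that the comparisons are numeric rather than string comparisons: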
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff=0.5}
NR>1 && ($1 > cutoff || $2 > cutoff || $3 > cutoff){print}
This can also be written without the action, since print is the default:
#!/usr/bin/awk -f
BEGIN {FS=","; cutoff=0.5}
NR>1 && ($1 > cutoff || $2 > cutoff || $3 > cutoff)
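If the number of columns is not fixed at three, the same test could be written as a loop over all fields. This is only a sketch; the helper name any_above is hypothetical, and the cutoff is passed as an argument instead of being set in BEGIN:

```shell
# any_above is a hypothetical helper: print every data row (header skipped)
# of comma-separated stdin in which at least one field exceeds "$1".
any_above() {
    awk -F, -v cutoff="$1" 'NR > 1 {
        for (i = 1; i <= NF; i++)
            if ($i + 0 > cutoff + 0) { print; next }   # +0 forces a numeric comparison
    }'
}

printf 'a,b,c\n0.2,0.4,0.5\n0.3,0.6,0.7\n0.4,0.8,0.9\n' | any_above 0.5
```

The next statement stops checking a row as soon as one field qualifies, so each row is printed at most once.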

multiple statements after if statement

I have an awk line to calculate an average that works fine but when I put it into an if statement, I get a syntax error referring to the part with "END". I want to calculate the average only if certain conditions are fulfilled.
Line for calculating average that works:
awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }' input.txt
Line for calculating average after if statement which doesn't work:
awk '{if ( $1 > 5 ) { {sum += $2; n++} END { if (n > 0) print sum / n; }}}' input.txt
I would like to know where the error is, changing the type and number of brackets did not help.
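END is a pattern of its own and cannot be nested inside an action block. Put the condition in the pattern of the main rule instead: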
awk '$1>5 {sum+=$2; n++}
END {if(n) print sum/n}' file
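For illustration, here is the same program wrapped in a function with the threshold passed in. The name cond_avg, the variable thr, and the sample numbers are all made up:

```shell
# cond_avg is a hypothetical helper: average column 2 over the rows whose
# column 1 is greater than "$1"; prints nothing if no row qualifies.
cond_avg() {
    awk -v thr="$1" '$1 > thr { sum += $2; n++ }
                     END { if (n) print sum / n }'
}

# Only the rows starting with 6 and 9 pass the test, so this averages 20 and 40.
printf '3 10\n6 20\n9 40\n' | cond_avg 5
```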

awk script for finding smallest value from column

I am a beginner in AWK, so please help me learn it. I have a text file named snd, and its values are
1 0 141
1 2 223
1 3 250
1 4 280
I want to print the entire row where the third column value is the minimum.
This should do it:
awk 'NR == 1 {line = $0; min = $3}
NR > 1 && $3 < min {line = $0; min = $3}
END{print line}' file.txt
EDIT:
What this does is:
Remember the 1st line and its 3rd field.
For the other lines, if the 3rd field is smaller than the min found so far, remember the line and its 3rd field.
At the end of the script, print the line.
Note that the test NR > 1 can be skipped, as for the 1st line, $3 < min will be false. The NR == 1 ... test can only be skipped if you know that the 3rd column is always negative, because min's value at the beginning of the script is zero and positive values would never compare smaller than it.
EDIT2:
This is shorter:
awk 'NR == 1 || $3 < min {line = $0; min = $3}END{print line}' file.txt
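A sketch of the same one-liner with the column number passed in as a parameter. The helper name min_row is hypothetical, and the + 0 is an added precaution to force numeric comparison:

```shell
# min_row is a hypothetical helper: print the first line of stdin whose
# column "$1" holds the smallest numeric value.
min_row() {
    awk -v col="$1" 'NR == 1 || $col + 0 < min { line = $0; min = $col + 0 }
                     END { if (NR) print line }'
}

printf '1 0 141\n1 2 223\n1 3 250\n1 4 280\n' | min_row 3
```

Because the comparison is a strict <, a later line with the same minimum does not replace the stored one.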
You don't need awk to do what you want. Use sort
sort -nk 3 file.txt | head -n 1
Results:
1 0 141
I think sort is an excellent answer, unless for some reason what you're looking for is the awk logic to do this in a larger script, or you want to avoid the extra pipes, or the purpose of this question is to learn more about awk.
$ awk 'NR==1{x=$3;line=$0} $3<x{x=$3;line=$0} END{print line}' snd
Broken out into pieces, this is:
NR==1 {x=$3;line=$0} -- On the first line, set an initial value for comparison and store the line.
$3<x{x=$3;line=$0} -- On each line, compare the third field against our stored value, and if the condition is true, store the new minimum and the line. (We could make this run only on NR>1, but it doesn't matter.)
END{print line} -- At the end of our input, print whatever line we've stored.
You should read man awk to learn about any parts of this that don't make sense.
a short answer for this would be:
sort -k3,3n temp|head -1
since you have asked for awk:
awk '{if(min>$3||NR==1){min=$3;a[$3]=$0}}END{print a[min]}' your_file
But I always prefer the shorter one.
To calculate the smallest value in any column, let's say the last column:
awk '(FNR==1){a=$NF} {a=$NF < a?$NF:a} END {print a}'
This prints only the smallest value of the column.
If the complete line is needed, it is better to use sort (because of -r, the smallest value ends up on the last line of the output):
sort -r -n -t [delimiter] -k[column] [file name]
Or, with awk:
awk -F ";" '(NR==1){a=$NF;b=$0} {a=$NF<a?$NF:a;b=$NF>a?b:$0} END {print b}' filename
This prints the line with the smallest value; if several lines share that value, the last one encountered is printed.
To get the minimum and maximum of column 2 for each value of column 1 in a comma-separated file:
awk 'BEGIN {OFS=FS=","}{if ( a[$1]>$2 || a[$1]=="") {a[$1]=$2;} if (b[$1]<$2) {b[$1]=$2;} } END {for (i in a) {print i,a[i],b[i]}}' input_file
We use || a[$1]=="" because the first time a given value of field 1 is encountered, a[$1] is still empty.
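A possible variant of the per-key program, sketched under the assumption that the values may also be negative: it uses the in operator to detect the first occurrence of a key for both the minimum and the maximum. The names min_max_by_key, lo and hi and the sample data are made up for illustration:

```shell
# min_max_by_key is a hypothetical helper: for comma-separated stdin, print
# "key,min,max" of column 2 for each distinct value of column 1.
# The output is piped through sort because "for (k in lo)" has no fixed order.
min_max_by_key() {
    awk 'BEGIN { FS = OFS = "," }
         !($1 in lo) || $2 + 0 < lo[$1] { lo[$1] = $2 + 0 }
         !($1 in hi) || $2 + 0 > hi[$1] { hi[$1] = $2 + 0 }
         END { for (k in lo) print k, lo[k], hi[k] }' | sort
}

printf 'x,3\ny,-1\nx,7\ny,-4\n' | min_max_by_key
```

Testing membership with in avoids relying on an empty or zero initial value, which matters for the maximum when every value under a key is negative.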