simple awk string comparison unexpected result - awk

In general string comparison, "A" > "a" is false.
However, I am getting an unexpected result from this awk execution:
$ echo "A a"| awk '{if ($1 > $2) print "gt"; else print "leq"}'
gt
What am I missing?
Environment info:
$ uname -r -s -v -M
AIX 1 6 IBM,9110-510
$ locale
LANG=en_AU.8859-15
LC_COLLATE="en_AU.8859-15"
LC_CTYPE="en_AU.8859-15"
LC_MONETARY="en_AU.8859-15"
LC_NUMERIC="en_AU.8859-15"
LC_TIME="en_AU.8859-15"
LC_MESSAGES="en_AU.8859-15"
LC_ALL=
Diagnostics:
$ echo "A a"| awk '{print NF}'
2
Update: It produces the correct result after setting LC_ALL=POSIX (thanks JS웃). I need to investigate this further.

I am unable to reproduce this but you can force a string comparison by concatenating the operand with the null string:
echo "A a"| awk '{if ($1"" > $2"") print "gt"; else print "leq"}'
Note: Concatenating with any one operand should suffice.
Update:
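As suspected, the OP's locale settings were causing the issue: awk compares strings using the collating sequence of the current locale, so the result can differ from plain ASCII order. After setting LC_ALL=POSIX the issue was resolved.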
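If changing the locale for the whole session is not desirable, it can be set for the awk invocation only. A minimal sketch, assuming a POSIX shell; the function name compare_c_locale is made up for illustration:

```shell
# Hypothetical helper: compare two words using C-locale (byte value) ordering.
compare_c_locale() {
    printf '%s %s\n' "$1" "$2" |
        LC_ALL=C awk '{ if ($1 > $2) print "gt"; else print "leq" }'
}

compare_c_locale A a   # prints "leq": in the C locale "A" (65) sorts before "a" (97)
```

Note that if both words look like numbers, awk compares them numerically instead, whatever the locale.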

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
$ 123test
$ awk 'BEGIN{print '"$v"'}'
$ 123
Why the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
$ 123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk may be done in several ways. Some are better than others. This should cover most of them. If you have a comment, please leave it below. (v1.5)
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk implementations, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning: As Ed Morton writes, escape sequences will be interpreted, so \t becomes a real tab and not the literal characters \t, if that is what you are searching for. This can be solved by using ENVIRON[] or by accessing the value via ARGV[].
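The difference described in the warning can be seen side by side. This is only a sketch; the helper names via_v and via_environ and the environment variable name var are made up for illustration:

```shell
# Hypothetical helpers: pass the same value to awk in two different ways.
via_v()       { awk -v var="$1" 'BEGIN { print var }'; }
via_environ() { var="$1" awk 'BEGIN { print ENVIRON["var"] }'; }

via_v       'a\tb'   # -v processes the escape: prints a, a real tab, then b
via_environ 'a\tb'   # ENVIRON keeps the text as is: prints a\tb
```

The var="$1" prefix places the value in the environment of that single awk process only.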
PS If you have vertical bar or other regexp meta characters as separator like |?( etc, they must be double escaped. Example 3 vertical bars ||| becomes -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
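A small sketch of the bracket form of the separator, which avoids counting backslashes. The function name second_field is hypothetical:

```shell
# Hypothetical helper: print the second field of a "|||"-separated line.
second_field() {
    printf '%s\n' "$1" | awk -F'[|][|][|]' '{ print $2 }'
}

second_field 'a|||b|||c'   # prints "b"
```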
Example of getting data from a program/function into awk (here date is used; the -d option as written is specific to GNU date):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
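To see the timing for yourself, the sketch below prints the variable from both the BEGIN block and the main block. The function name assign_timing is made up for illustration:

```shell
# Hypothetical helper: show where a trailing var=value assignment is visible.
assign_timing() {
    printf 'input data\n' |
        awk 'BEGIN { print "BEGIN: [" var "]" }
                   { print "main:  [" var "]" }' var="$1"
}

assign_timing hello
# BEGIN: []
# main:  [hello]
```

The assignment is processed when awk reaches it in the argument list, which is after BEGIN has already run.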
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
After exporting a variable before running AWK (a plain assignment without export is not passed to awk's environment), you can print it out like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, or even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable; you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (see comment below by Ed M.)
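Since the operator still becomes part of the awk program text, one possible precaution is to check it against a fixed list first. This is a sketch of that idea, not part of the original answer; the name safe_calc and the set of allowed operators are assumptions:

```shell
# Hypothetical variant of calc that only splices in a known-good operator.
safe_calc() {
    case "$2" in
        '+'|'-'|'*'|'/') ;;                                  # allowed operators
        *) echo "unsupported operator: $2" >&2; return 1 ;;  # reject anything else
    esac
    awk -v x="$1" -v z="$3" 'BEGIN { print x '"$2"' z }'
}

safe_calc 2.7 '+' 3.4                                  # prints 6.1
safe_calc 2.7 '; system("id")' 3.4 || echo "rejected"  # never reaches awk
```

The operands still go through -v, so only the operator is ever pasted into the code.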
Extra info:
Use of double quote
It's always good to double quote variable "$variable"
If not, multiple lines will be added as a long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just adapted @Jotne's answer for use in a for loop:
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
Pro Tip
It could come in handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get...
awk_switch_columns() {
    # awk reads stdin directly; single quotes keep $a and $b away from the shell.
    awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

SED extract first occurrence after 2 patterns match

I'm trying to use c-shell (I'm afraid no other option is available) and SED to solve this problem. Given this example file with a report of the tests that failed:
============
test_085
============
- Signature code: F2B0C
- Failure reason: timeout
- Error: test has timed out
============
test_102
============
- Signature code: B4B4A
- Failure reason: syntax
- Error: Syntax error on file example.c at line 245
============
test_435
============
- Signature code: 000FC0
- Failure reason: timeout
- Error: test has timed out
I have a script that loops through all the tests that I'm running, and I check each one against this report to see if it has failed and do some statistics later on:
if (`grep -c $test_name $test_report` > 0) then
printf ",TEST FAILED" >>! $report
else
printf ",TEST PASSED" >>! $report
endif
What I would like to do is extract the reason if $test_name is found in $test_report. For example, for test_085 I want to extract only 'timeout', for test_102 only 'syntax', and for test_435 'timeout'. For test_045 there is nothing to extract because it is not found in this report (meaning it has passed). In essence, I want to extract the first occurrence after these two pattern matches: test_085, Failure reason:
To extract "Failure reason" for the specified test name - short awk approach:
awk -v t_name="test_102" '$1==t_name{ f=1 }f && /Failure reason/{ print $4; exit }' reportfile
$1==t_name{ f=1 } - on encountering line matching the pattern(i.e. test name t_name) - set the flag f into active state
f && /Failure reason/ - while iterating through the lines under considered test name section (while f is "active") - capture the line with Failure reason and print the reason which is in the 4th field
exit - exit script execution immediately to avoid redundant processing
The output:
syntax
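The same command can be wrapped so that a test missing from the report simply yields empty output. The sketch below is written for a POSIX shell rather than csh, and the function name failure_reason and the file name report.txt are assumptions made for illustration:

```shell
# Hypothetical helper: print the failure reason for a test, or nothing if the
# test is not listed in the report (which the question treats as a pass).
failure_reason() {
    awk -v t_name="$1" '
        $1 == t_name { f = 1 }                     # entered the wanted section
        f && /Failure reason/ { print $4; exit }   # first reason after it
    ' "$2"
}

# Usage, assuming the sample report is saved as ./report.txt:
#   reason=$(failure_reason test_102 ./report.txt)
#   if [ -n "$reason" ]; then echo "FAILED ($reason)"; else echo "PASSED"; fi
```

From csh, the awk command itself can be run inside backticks in the same way the question already uses grep.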
You can try handling RS and FS variables of awk to make the parsing easier:
$ awk -v RS='' -F='==*' '{gsub(/\n/," ")
sub(/.*Failure reason:/,"",$3)
sub(/- Error:.*/,"",$3)
printf "%s : %s\n",$2,$3}' file
output:
test_085 : timeout
test_102 : syntax
test_435 : timeout
If you don't care about the newlines, you can remove the gsub() function.
Whenever you have input that has attributes with name to value mappings as yours does, the best approach is to first create an array to capture those mappings (n2v[] below) and then access the values by their names. For example:
$ cat tst.awk
BEGIN { RS=""; FS="\n" }
$2 == id {
    for (i=4; i<=NF; i++) {
        name = value = $i
        gsub(/^- |:.*$/,"",name)
        gsub(/^[^:]+: /,"",value)
        n2v[name] = value
    }
    print n2v[attr]
}
$ awk -v id='test_085' -v attr='Failure reason' -f tst.awk file
timeout
$ awk -v id='test_085' -v attr='Error' -f tst.awk file
test has timed out
$ awk -v id='test_102' -v attr='Signature code' -f tst.awk file
B4B4A
$ awk -v id='test_102' -v attr='Error' -f tst.awk file
Syntax error on file example.c at line 245
$ awk -v id='test_102' -v attr='Failure reason' -f tst.awk file
syntax

Regex "^[[:digit:]]$" not working as expected in AWK/GAWK

My GAWK version on RHEL is:
gawk-3.1.5-15.el5
I wanted to print a line if its first field consists entirely of digits (no special characters, not even a space).
Example:
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/) print $0}'
Output:
Nothing
Expected Output:
123456789012345,3
What is going wrong here ? Does my AWK version not understand the GNU character classes ? Kindly help
Your pattern matches a first field of exactly one digit. To match multiple digits with the [[:digit:]] character class, add a +, which means match one or more digits in $1.
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/) print $0}'
123456789012345,3
which satisfies your requirement.
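The effect of the quantifier can be checked on a few sample lines. This is only a sketch: the helper name match_count is made up, and [0-9] is used in place of [[:digit:]] because some older awk builds lack POSIX character classes:

```shell
# Hypothetical helper: count how many of four sample lines match a regex.
# Samples: "7", "123", an empty line, and "a1".
match_count() {
    printf '%s\n' 7 123 '' a1 |
        awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

match_count '^[0-9]$'    # 1: only "7"
match_count '^[0-9]+$'   # 2: "7" and "123"
match_count '^[0-9]*$'   # 3: also the empty line
```

Without a quantifier the anchors allow exactly one digit, which is why the 15-digit field in the question never matched.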
A more idiomatic way (as suggested in the comments) would be to drop the explicit print and use the match directly as the pattern, since printing the line is the default action:
echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
123456789012345,3
Some more examples which demonstrate the same,
echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
(and)
echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
do NOT produce any output, as per the requirement.
Another POSIX compliant way to do strict length checking of digits can be achieved with something like below, where {3} denotes the match length.
echo "123,3" | awk --posix -F, '$1 ~ /^[0-9]{3}$/'
123,3
(and)
echo "12,3" | awk --posix -F, '$1 ~ /^[0-9]{3}$/'
does not produce any output.
If you are using a relatively recent version of the bash shell, it supports a native regex operator, =~, which accepts POSIX character classes as above, something like
#!/bin/bash
while IFS=',' read -r row1 row2
do
[[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
done < file
For an input file say file
$ cat file
122,12
a1,22
aa,12
The script produces,
$ bash script.sh
122,12
Although this works, bash regex matching can be slower. A relatively straightforward alternative using string manipulation would be something like
while IFS=',' read -r row1 row2
do
[[ -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
done < file
The "${row1//[0-9]/}" strips all the digits from the row and the condition becomes true only if there are no other characters left in the variable.
Here you are printing every line that matches a pattern. This is exactly the purpose of grep. Since @Inian brilliantly told you what was wrong with your code, let me propose an alternative grep-based answer that does exactly the same as the awk command (albeit much faster):
grep -E '^[[:digit:]]+,'
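Could you please try the following and let me know if this helps. Note that * also matches an empty first field; use + instead if at least one digit is required.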
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]*)$/) print $0}'
EDIT: The above code could be reduced a bit, as follows.
echo "123456789012345,3" | awk -F, '($1 ~ /^[[:digit:]]*$/)'

Assigning a value in a variable

I need assign into a variable a value using awk:
$ echo "hello world" | awk -v x=substr($0,3,4) '{print $x}'
-bash: syntax error near unexpected token `('
What's wrong here ?
I think what you want to do is
$ echo "hello world" | awk '{print substr($0,3,4)}'
llo
substr is an awk function and only available in the awk script (inside the quotes)
Or, assigning it to a variable
$ awk -v x='hello world' 'BEGIN{print substr(x,3,4)}'
llo
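If the goal is to hold the substring in an awk variable, the assignment can be done inside the script itself. A minimal sketch; the function name middle_chars is hypothetical, and the brackets are only there to make the trailing space visible:

```shell
# Hypothetical helper: assign substr() to an awk variable inside the script.
middle_chars() {
    printf '%s\n' "$1" | awk '{ x = substr($0, 3, 4); print "[" x "]" }'
}

middle_chars 'hello world'   # prints "[llo ]" (4 characters starting at position 3)
```

Note that inside awk the variable is written as x, not $x; $x would mean "the field whose number is stored in x".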

Converting a parameter to a floating point number in AWK

I have this code that works fine:
time1=23245321;
ratio=0.9761;
time1=int(time1*ratio);
but it stops working when I turn the 'ratio' variable into a parameter passed to the script with the -v option: time1 ends up equal to 0 (zero).
awk -f script.awk -v ratio=0.9761
It seems ratio is no longer treated as a float. How can I solve this problem?
This works fine on my machine:
awk -v ratio=0.9761 '{
time1=23245321;
time1=int(time1*ratio);
print ratio,time1}' <(echo "Hello world!")
It returns:
0.9761 22689757
It was just a 'locale' issue: I should have called the script using
awk -f script.awk -v ratio=0,9761
because the comma is the decimal separator on my laptop
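An alternative, for awk implementations that honor the locale when parsing numbers, is to pin the locale for the awk call so that the dot always works as the decimal separator. A sketch under that assumption; the function name scale_time is made up for illustration:

```shell
# Hypothetical helper: run the calculation with the C locale so that
# "0.9761" is parsed with a dot as the decimal separator.
scale_time() {
    LC_ALL=C awk -v ratio="$1" 'BEGIN { time1 = 23245321; print int(time1 * ratio) }'
}

scale_time 0.9761   # prints 22689757
```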