It is my understanding that in awk a conditional evaluation could be started using any of the following:
if ($0 ~ /no/) {cmd}
($0 ~ /no/) {cmd}
$0 ~ /no/ {cmd}
/no/ {cmd}
In the generic command line
BEGIN { } (body) END { }
I find it most logical to enclose (body) in braces, as in {(body)} (referred to below as "bracketed").
Under GNU awk on Ubuntu 12.04, only the 1st of the 4 options executes if bracketed, at least on my machine (the others produce syntax errors). If I run the line un-bracketed, only the 1st fails and the other 3 work fine. Could someone carefully explain why that is so?
awk statements follow the rule:
PATTERN{action}
So BEGIN and END are just special PATTERNs. Basically, if PATTERN is true, do the action in {..}
PATTERN can be a regex, an expression, or a range; it can also be empty.
The empty pattern looks like:
awk '{print "foo"}' input
You can read http://www.gnu.org/software/gawk/manual/gawk.html section 7.1 for details.
Back to your question: if you execute those 4 lines in the action part, that is, between {...} (in effect with an empty pattern), only the first, with if, is a valid conditional statement. However, if you use the matching check as the pattern (outside the {..}):
if ($0 ~ /no/) {cmd} # this would ***NOT*** work, because it is a statement, not a valid awk expression. (I don't know how you made it work.)
($0 ~ /no/) {cmd} # this will work, it is a boolean expression
$0 ~ /no/ {cmd} # same as above
/no/ {cmd} # this is regexp pattern, so it works too.
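A minimal sketch of the distinction above, assuming a three-line sample input made up for illustration. The three expression forms are used as patterns outside the braces, while the if form is placed inside an action with an empty pattern; all four should select the same lines.

```shell
# Hypothetical sample input: two of the three lines contain "no".
input='no way
yes
nothing'

# Used as patterns (outside the braces), the three expression forms are interchangeable.
p1=$(printf '%s\n' "$input" | awk '($0 ~ /no/) {print "hit: " $0}')
p2=$(printf '%s\n' "$input" | awk '$0 ~ /no/ {print "hit: " $0}')
p3=$(printf '%s\n' "$input" | awk '/no/ {print "hit: " $0}')

# The if statement only works inside an action, i.e. between the braces.
s1=$(printf '%s\n' "$input" | awk '{ if ($0 ~ /no/) {print "hit: " $0} }')

printf '%s\n' "$p1"
```

Each of the four variables ends up holding the same two "hit:" lines.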
Syntactically, an awk script is a set of rules, like the following [Ref]:
pattern { action }
pattern { action }
....
Where pattern is an expression, and action is a series of commands that are executed if pattern is evaluated to true (a non-zero number, or a non-empty string).
If the pattern is omitted, then action will be executed for every record.
If the { action } part is omitted, the default action is executed, which is { print $0 } or { print } (equivalent).
As you mentioned in the question, pattern can be BEGIN, END, ... etc, or any other expression.
Your first example will be executed correctly in the { action } block, since it is a command. The other three options are not commands, but are combinations of pattern { action } blocks.
Related
To print multiple(2) lines following the pattern using awk:
I have found somewhere the following solution
$ awk '/Linux/{x=NR+2}(NR<=x){print}' file
Linux
Solaris
Aix
I am trying to understand the syntax
Generally awk syntax is
awk 'pattern{action}' file
Here we find
pattern = /Linux/
action = {x=NR+2}
then what is (NR<=x){print}
Solution:
My understanding of the C-like syntax for this is:
While read (file,line)
{
if (line ~ '/pattern/') then
{
x= NR+2
}
if (NR <= x)
{
print
}
}
For NR=1, if (line ~ '/pattern/') is true then x is set to NR+2, e.g. 1+2 = 3. x keeps this value until a later line matches the pattern again. So when the next line is read and !(line ~ '/pattern/'), x is still 3 and (NR (2) <= 3) is true, so it prints the next line.
Thanks to #Edmorton for the understanding
FWIW I wouldn't write the code you're asking about, instead I'd write:
awk '/Linux/{c=3} c&&c--' file
See example "g" at https://stackoverflow.com/a/17914105/1745001.
Having said that, your original code in C-like syntax would be:
NR=0
x=0
While read (file,line)
{
NR++
if (line ~ "Linux") {
x = NR+2
}
if (NR <= x) {
print
}
}
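As a quick check of the two one-liners discussed here, the sketch below runs both against a small made-up file listing (the last two input lines are assumptions added so that something is left unprinted). Both should print the matching line plus the two lines after it.

```shell
# Hypothetical input; only the first three lines should be printed.
input='Linux
Solaris
Aix
HPUX
Windows'

# Original form: remember the last line number to print, then compare NR against it.
a=$(printf '%s\n' "$input" | awk '/Linux/{x=NR+2} (NR<=x){print}')

# Counter form: set a countdown on a match, print while it is non-zero.
b=$(printf '%s\n' "$input" | awk '/Linux/{c=3} c&&c--')

printf '%s\n' "$a"
```

In the counter form, c is empty (false) until the first match, so the && short-circuits and c is never decremented below zero.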
Btw, I know it's frequently mis-used but don't use the word "pattern" in your software as it's highly ambiguous - use string or regexp or condition (or in shell but not awk, sed, grep, etc. and only where appropriate "globbing pattern"), whichever it is you really mean.
For example you wrote that awk syntax is:
awk 'pattern{action}' file
No. Or maybe, depending on what you think "pattern" means! Despite what many books, tutorials, etc. say, to remove any ambiguity you should simply think of awk syntax as:
awk 'condition{action}' file
where condition can be any of:
a key word like BEGIN or END
an arithmetic expression like var < 7 or NF or 1
a regexp comparison like $0 ~ "foo" or $0 ~ /foo/ or /foo/ or $0 ~ var or match($0,/foo/)
a string comparison like $0 == "foo" or index($0,"foo")
nothing at all in which case it's assumed to be true when there's an associated action block.
and probably other things I'm forgetting to list.
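The kinds of condition listed above can be put side by side in one small program. This is only a sketch with a made-up three-line input; the labels in the output are illustrative names, not anything awk defines.

```shell
# Hypothetical input used to trigger each kind of condition at least once.
input='foo
bar foo
baz'

out=$(printf '%s\n' "$input" | awk '
BEGIN        { print "start" }             # key word
NF == 2      { print "two fields: " $0 }   # arithmetic expression
/^ba/        { print "regexp: " $0 }       # regexp comparison
$0 == "foo"  { print "string: " $0 }       # string comparison
             { n++ }                       # no condition: runs for every record
END          { print "records: " n }       # key word
')

printf '%s\n' "$out"
```

Every rule is tested against every record in the order written, which is why the second input line produces two lines of output.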
Your script has two blocks:
$ awk '/Linux/ {x=NR+2}
NR<=x {print}' file
The first block sets the variable x; the second uses it to decide which lines to print. Note that you can drop {print}, since it's the default action.
I'm having trouble understanding this awk code:
$0 ~ ENVIRON["search"] {
match($0, /id=[0-9]+/);
if (RSTART) {
print substr($0, RSTART+3, RLENGTH-3)
}
}
How do the ~ and match() operators interact with each other?
How does the match() have any effect, if its output isn't printed or echo'd? What does it actually return or do? How can I use it in my own code?
This is related to Why are $0, ~, &c. used in a way that violates usual bash syntax docs inside an argument to awk?, but that question was centered around understanding the distinction between bash and awk syntaxes, whereas this one is centered around understanding the awk portions of the script.
Taking your questions one at a time:
How do the ~ and match() operators interact with each other?
They don't. At least not directly in your code. ~ is the regexp comparison operator. In the context of $0 ~ ENVIRON["search"] it is being used to test if the regexp contained in the environment variable search exists as part of the current record ($0). If it does then the code in the subsequent {...} block is executed, if it doesn't then it isn't.
How does the match() have any effect, if its output isn't printed or echoed?
It identifies the starting point (and stores it in the awk variable RSTART) and the length (RLENGTH) of the first substring within the first parameter ($0) that matches the regexp provided as the second parameter (id=[0-9]+). With GNU awk it can also populate a 3rd array argument with segments of the matching string identified by round brackets (aka "capture groups").
What does it actually return or do?
It returns the value of RSTART which is zero if no match was found, 1 or greater otherwise. For what it does see the previous answer.
How can I use it in my own code?
The example you posted shows one way, but that code would more typically be written as:
($0 ~ ENVIRON["search"]) && match($0,/id=[0-9]+/) {
print substr($0, RSTART+3, RLENGTH-3)
}
and using a string rather than regexp comparison for the first part would probably be even more appropriate:
index($0,ENVIRON["search"]) && match($0,/id=[0-9]+/) {
print substr($0, RSTART+3, RLENGTH-3)
}
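To see what match() leaves behind, the sketch below prints its return value together with RSTART and RLENGTH for one matching and one non-matching line. The two input strings are made up for illustration.

```shell
# A line that contains id=<digits>: match() returns the 1-based start of the match.
hit=$(printf '%s\n' 'mouse id=12 pointer' | awk '{
    ret = match($0, /id=[0-9]+/)
    # Skip the 3 characters of "id=" to keep only the digits.
    print ret, RSTART, RLENGTH, substr($0, RSTART+3, RLENGTH-3)
}')

# A line without a match: match() returns 0, RSTART is 0 and RLENGTH is -1.
miss=$(printf '%s\n' 'no id here' | awk '{
    ret = match($0, /id=[0-9]+/)
    print ret, RSTART, RLENGTH
}')

printf '%s\n%s\n' "$hit" "$miss"
```

Because the return value is 0 on failure, match() can be used directly as a condition, as in the rewritten rules above.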
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn how to use awk.
Use the regex id=[0-9]+ to find a match in each line.
If the start position of the match (RSTART) is not 0, then:
print the match without the id=
This is shorter but does the same:
xinput --list | grep -Po 'id=[0-9]+' | cut -c4-
This comment on awk change once per file made me think 1 and {print} are equal in awk. But they are not.
awk '/^\S/ {core=0} /^_core/ {core=1} !core 1' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
_core:
default_config_hash: DWLCIpl8ku4NbiI9t3GgDeuW13KSOy2l1zho7ReP_Bg
id: who_s_online
label: 'Who''s online block'
Compare to (and this is what I wanted btw):
awk '/^\S/ {core=0} /^_core/ {core=1} !core {print}' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
id: who_s_online
label: 'Who''s online block'
module: user
description: 'Shows the user names of the most recently active users, and the total number of active users.'
The structure of an awk program is a series of conditions and actions:
condition { action }
The default value of condition is 1 (true), so actions without a condition always happen:
{ action } # is equivalent to
1 { action }
The default action is print, so quite often you will see a 1 in an awk script instead of { print }.
However, in your script, your condition is !core 1. This will negate the value of core, coerce it to a string and concatenate with the string "1". A non-empty string is always true, so every record will be printed.
If you want to only print records where core is false, then you can use !core as a condition by itself.
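The concatenation described above can be made visible by computing the value in a BEGIN block. This is a sketch in which the grouping is written out with explicit parentheses, so it shows the value under that reading rather than proving how a given awk parses the unparenthesized form.

```shell
out=$(awk 'BEGIN {
    core = 0
    v = (!core) 1      # 1 joined with 1
    print v
    core = 1
    v = (!core) 1      # 0 joined with 1
    print v
}')

printf '%s\n' "$out"
```

Either way the result is a two-character string, and a non-empty string counts as true when used as a condition.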
Yes, they are the same.
These expressions are completely equivalent:
awk '1' file
awk '{print}' file
awk '42' file
awk '2342352' file
awk '{print $0}' file
Why? Because a True condition triggers awk's default action: to print the current record; that is, {print $0}.
Now, what is happening here?
You are using them in a different way. In one case, you are saying:
awk '{things...} !core 1' file
# ^^^^^^^
In this case, !core is not doing anything. It reads as just awk '{things...} 1' file. Or better see Tom Fenech's explanation on it.
You can test it by executing both seq 10 | awk '/5/ {core++} 1' and seq 10 | awk '/5/ {core++} !core 1'. Both return all the numbers from 1 to 10. Instead, you would like to use seq 10 | awk '/5/ {core++} !core' to print from 1 to 4.
Notice also the difference with:
awk '{things...} !core; 1' file
# ^
By having this semi colon, this will trigger the action in !core (that is, to print the current record -line- if core evaluates to False) and then 1 will make it print all the records, no matter any condition. So seq 10 | awk '/5/ {core++} !core; 1' will print from 1 to 10, printing 1 to 4 twice.
On the other case you say:
awk '{things...} !core {print}' file
# ^^^^^^^^^^^^^
This reads as: if core evaluates to False, then do print.
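The variants discussed above can be compared on the same seq 10 input. The sketch below covers the bare condition, the semicolon form, and the explicit {print} form; the variable names are only for illustration.

```shell
# Bare condition: default action prints while core is still unset.
only=$(seq 10 | awk '/5/ {core++} !core')

# Two separate rules: !core prints 1 to 4, then 1 prints every record.
both=$(seq 10 | awk '/5/ {core++} !core; 1')

# Explicit action: same result as the bare condition.
explicit=$(seq 10 | awk '/5/ {core++} !core {print}')

printf '%s\n' "$both"
```

The bare condition and the explicit {print} form give identical output, while the semicolon form prints 1 to 4 twice.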
What is the reason behind them being equivalent?
This is because awk works on the syntax condition { action }, in which { action } can be omitted when it is the default action: {print $0}. This way, whenever you want the print to be always triggered, you can say either {print $0} or just use the condition part with a True condition, like 1, 42, a variable that you already set to a positive value, etc.
This is also useful to make awk code more idiomatic: if you use a variable as a flag, you can say var and it will trigger the print whenever the flag is set to a True value, just as you do with core in your code. This way, these three awk expressions are equivalent:
awk '/5/ {core++} core'
awk '/5/ {core++} { if (core > 0) {print $0}}'
awk '{ if ($0~/5/) {core++}} { if (core > 0) {print $0}}'
See how using idiomatic code makes it look better? ;)
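A small check of that equivalence, again using seq 10 as input: all three programs should print from 5 to 10, since the flag is first set on the line containing 5.

```shell
a=$(seq 10 | awk '/5/ {core++} core')
b=$(seq 10 | awk '/5/ {core++} { if (core > 0) {print $0}}')
c=$(seq 10 | awk '{ if ($0~/5/) {core++}} { if (core > 0) {print $0}}')

printf '%s\n' "$a"
```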
Check GNU Awk User's Guide → 7.3 Actions for a more technical explanation (and probably better wording than mine!):
An awk program or script consists of a series of rules and function
definitions interspersed. (Functions are described later. See
User-defined.) A rule contains a pattern and an action, either of
which (but not both) may be omitted. The purpose of the action is to
tell awk what to do once a match for the pattern is found. Thus, in
outline, an awk program generally looks like this:
[pattern] { action }
pattern [{ action }]
…
function name(args) { … }
…
An action consists of one or more awk statements, enclosed in braces
(‘{…}’). Each statement specifies one thing to do. The statements are
separated by newlines or semicolons. The braces around an action must
be used even if the action contains only one statement, or if it
contains no statements at all. However, if you omit the action
entirely, omit the braces as well. An omitted action is equivalent to
‘{ print $0 }’:
/foo/ { } match foo, do nothing — empty action
/foo/ match foo, print the record — omitted action
In this answer,
awk '$2=="no"{$3="N/A"}1' file
was accepted. Note the 1 at the end of the AWK script. In the comments, the author of the answer said
[1 is] a cryptic way to display the current line.
I'm puzzled. How does that work?
In awk, since 1 always evaluates to true, it performs the default operation {print $0}, and hence prints the current line stored in $0.
So, awk '$2=="no"{$3="N/A"}1' file is equivalent to, and shorthand for,
awk '$2=="no"{$3="N/A"} {print $0}' file
Again, $0 is the default argument to print, so you could also write
awk '$2=="no"{$3="N/A"} {print}' file
In fact, you could also use any non-zero number or any condition which always evaluates to true in place of 1.
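The three spellings can be compared directly. The two-line input below is made up for illustration; only its first line has "no" in the second field.

```shell
# Hypothetical input with three whitespace-separated fields per line.
input='a no x
b yes y'

short=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"}1')
long=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"} {print $0}')
bare=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"} {print}')

printf '%s\n' "$short"
```

All three print every line, with the third field replaced only where the second field is "no".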
The documentation says
In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.
So, it treats 1 as pattern with no action. The default action is to print the line.
Even if you have a couple of rules, like in
awk '
in_net {
if (/^\s+bindIp:/) {
print " bindIp: 0.0.0.0"
next
} else if (/^\s*(#.*)?$/) {
in_net = 0
}
}
/^net:/ {
in_net = 1
}
1
' /etc/mongod.conf
You still need the 1, since the default action is triggered only for a rule that has no action.
AWK works on the model of a condition followed by an action: if a condition is TRUE, the action given for it is executed.
Writing 1 supplies a condition that is always TRUE, and since no action is given for it, awk's default action, print, runs.
That is why 1 is used as a shortcut.
I thought I’d add an answer that explains how this shorthand works in terms of the POSIX specification for awk:
Basic description:
An awk program is composed of pairs of the form:
pattern { action }
Missing action:
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
Description of pattern
A pattern is any valid expression
Description of Expression patterns:
An expression pattern shall be evaluated as if it were an expression in a
Boolean context. If the result is true, the pattern shall be considered to
match, and the associated action (if any) shall be executed.
Boolean context:
When an expression is used in a Boolean context, if it has a numeric value,
a value of zero shall be treated as false and any other value shall be
treated as true. Otherwise, a string value of the null string shall be
treated as false and any other value shall be treated as true.
In the example of awk '$2=="no"{$3="N/A"}1', the pattern of the first pair is $2=="no" with a corresponding action of $3="N/A". This leaves 1 by itself as the next “pair” (pattern without a corresponding action).
Instead of 1, this lone expression pattern could be any numeric value or non-empty string, e.g.,
awk 9999
awk '"string"'
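The Boolean-context rule quoted above can be tried with a lone expression as the whole program. This is a minimal sketch using a single made-up input line; a true pattern prints it and a false one prints nothing.

```shell
t_num=$(echo x | awk '1')            # non-zero number: true, line is printed
f_num=$(echo x | awk '0')            # zero: false, nothing is printed
t_str=$(echo x | awk '"string"')     # non-empty string: true
f_str=$(echo x | awk '""')           # null string: false

printf 'num true: [%s] num false: [%s]\n' "$t_num" "$f_num"
printf 'str true: [%s] str false: [%s]\n' "$t_str" "$f_str"
```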
The awk 1 short-hand is fine when typing one-liners in an interactive shell. On the other hand, when writing scripts, I prefer my code to be more maintainable and readable for others by using the more explicit awk '{ print }'.
Hello everyone, I am very confused about the braces {} in awk. For example, I have written this code:
{
FNR == 3 { print $1 " age is " $2 }
}
but it gave me an error on the outer braces and no error on the braces around the print statement. Why is that? Also, in the following code
{
s = $1
d = $2
no = $1 + $2
{print no}
}
when I remove the outer braces, my arguments are displayed as many times as the number of LOC. Why is that? I am very confused; kindly help me.
thanks
An awk script consists of commands. Each command has a pattern and an action:
pattern1 { action1 }
pattern2 { action2 }
For each line in the input, awk tests each pattern and performs the corresponding action when the pattern is true.
The pattern can be omitted, in which case it is taken as always true and the action is performed for each line. Similarly, the action can be omitted, in which case it is taken as a print; this lets you easily use awk to select lines without changing the lines.
With this structure in mind, we can interpret the given examples. The first one is a single action that is applied to every line. But the action isn't well formed, because a pattern { action } pair is not a statement and cannot appear inside an action. If you remove the outer braces, it becomes a distinct pattern and action, both of which are correctly constructed.
The second example also is applied to every line. It takes the first two (whitespace separated) fields from the lines, adds them as numbers, and prints the result. Removing the outer braces gives you three patterns without corresponding actions, and an action without a pattern. Thus, the patterns, which are the values of the assignments and usually true, have an implicit print that is usually invoked. Similarly, the action is always invoked, printing the value of no.
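Both situations can be reproduced with a few lines of input. The sketch below uses made-up data (the names and numbers are assumptions) and shows the first example written correctly in two ways, then the second example with and without its outer braces.

```shell
# Hypothetical name/age data for the first example.
people='alice 30
bob 25
carol 41'

# Pattern and action at the top level, with no outer braces.
r1=$(printf '%s\n' "$people" | awk 'FNR == 3 { print $1 " age is " $2 }')

# The same test written as an if statement inside a single action.
r2=$(printf '%s\n' "$people" | awk '{ if (FNR == 3) print $1 " age is " $2 }')

# Second example with outer braces: one action, run once per line.
r3=$(printf '3 4\n' | awk '{
    s = $1
    d = $2
    no = $1 + $2
    {print no}
}')

# Without outer braces: three assignment patterns (each true here, so each
# prints the line via the default action) followed by one pattern-less action.
r4=$(printf '3 4\n' | awk 's = $1
d = $2
no = $1 + $2
{print no}')

printf '%s\n' "$r4"
```

With the braces removed, the single input line appears three times before the sum, which matches the repeated output described in the question.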