awk difference between 1 and {print}? - awk

This comment on "awk change once per file" made me think that 1 and {print} are equivalent in awk, but they are not.
awk '/^\S/ {core=0} /^_core/ {core=1} !core 1' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
_core:
default_config_hash: DWLCIpl8ku4NbiI9t3GgDeuW13KSOy2l1zho7ReP_Bg
id: who_s_online
label: 'Who''s online block'
Compare to (and this is what I wanted btw):
awk '/^\S/ {core=0} /^_core/ {core=1} !core {print}' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
id: who_s_online
label: 'Who''s online block'
module: user
description: 'Shows the user names of the most recently active users, and the total number of active users.'

The structure of an awk program is a series of conditions and actions:
condition { action }
The default value of condition is 1 (true), so actions without a condition always happen:
{ action } # is equivalent to
1 { action }
The default action is print, so quite often you will see a 1 in an awk script instead of { print }.
However, in your script, your condition is !core 1. This will negate the value of core, coerce it to a string and concatenate with the string "1". A non-empty string is always true, so every record will be printed.
If you want to only print records where core is false, then you can use !core as a condition by itself.
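A minimal sketch of the concatenation described above, using a made-up three-line input instead of the YAML file from the question. The parentheses in the BEGIN block are added only to make the grouping explicit; the variable names are illustrative:

```shell
# (!core) is numeric 0 or 1; concatenating it with 1 yields a two-character string.
concat=$(awk 'BEGIN { core = 1; s = (!core) 1; print s }')
echo "$concat"

# A non-empty string is true, so the flag never suppresses any record.
all=$(printf 'a\nb\nc\n' | awk '/b/ { core = 1 } !core 1')
echo "$all"

# With !core alone as the condition, printing stops once the flag is set.
filtered=$(printf 'a\nb\nc\n' | awk '/b/ { core = 1 } !core')
echo "$filtered"
```

The first command shows the string that the condition evaluates to when core is 1; the other two contrast the accidental concatenation with the intended condition.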

Yes, they are the same.
These expressions are completely equivalent:
awk '1' file
awk '{print}' file
awk '42' file
awk '2342352' file
awk '{print $0}' file
Why? Because a True condition triggers awk's default action: to print the current record; that is, {print $0}.
Now, what is happening here?
You are using them in a different way. In one case, you are saying:
awk '{things...} !core 1' file
# ^^^^^^^
In this case, !core has no filtering effect: it is concatenated with 1, which gives a non-empty string that is always true, so the program behaves like awk '{things...} 1' file. See Tom Fenech's explanation for the details.
You can test it by executing both seq 10 | awk '/5/ {core++} 1' and seq 10 | awk '/5/ {core++} !core 1'. Both return all the numbers from 1 to 10. Instead, you would like to use seq 10 | awk '/5/ {core++} !core' to print from 1 to 4.
Notice also the difference with:
awk '{things...} !core; 1' file
# ^
With this semicolon there are two rules: !core triggers the default action (that is, it prints the current record, or line, if core evaluates to False), and then 1 prints every record regardless of any condition. So seq 10 | awk '/5/ {core++} !core; 1' will print from 1 to 10, printing 1 to 4 twice.
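To see the three variants side by side, here is a small sketch that uses the numbers 1 to 6 (generated with printf as a stand-in for seq 10). The helper name nums and the variable names are made up for the illustration:

```shell
# Input: the numbers 1 to 6, one per line.
nums() { printf '%s\n' 1 2 3 4 5 6; }

# One rule: print while core is still unset.
only_flag=$(nums | awk '/5/ {core++} !core')

# One rule whose condition is the concatenation (!core) 1: always true.
concat=$(nums | awk '/5/ {core++} !core 1')

# Two rules: "!core" prints 1 to 4, then "1" prints every record again.
two_rules=$(nums | awk '/5/ {core++} !core; 1')

printf '%s\n' "$only_flag" "--" "$concat" "--" "$two_rules"
```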
On the other case you say:
awk '{things...} !core {print}' file
# ^^^^^^^^^^^^^
This reads as: if core evaluates to False, then do print.
What is the reason behind them being equivalent?
This is because awk works on the syntax condition { action }, in which { action } can be omitted when it consists of the default action: {print $0}. This way, whenever you want the print to always be triggered, you can either say {print $0} or just use the condition part with a True condition, like 1, 42, a variable that you already set to a positive value, etc.
This is also useful to make awk code more idiomatic: if you use a variable as a flag, you can say var and it will trigger the print whenever the flag is set to a True value, as you do with core in your code. This way, these three awk expressions are equivalent:
awk '/5/ {core++} core'
awk '/5/ {core++} { if (core > 0) {print $0}}'
awk '{ if ($0~/5/) {core++}} { if (core > 0) {print $0}}'
See how using idiomatic code makes it look better? ;)
Check GNU Awk User's Guide → 7.3 Actions for a more technical explanation (and probably better wording than mine!):
An awk program or script consists of a series of rules and function
definitions interspersed. (Functions are described later. See
User-defined.) A rule contains a pattern and an action, either of
which (but not both) may be omitted. The purpose of the action is to
tell awk what to do once a match for the pattern is found. Thus, in
outline, an awk program generally looks like this:
[pattern] { action }
pattern [{ action }]
…
function name(args) { … }
…
An action consists of one or more awk statements, enclosed in braces
(‘{…}’). Each statement specifies one thing to do. The statements are
separated by newlines or semicolons. The braces around an action must
be used even if the action contains only one statement, or if it
contains no statements at all. However, if you omit the action
entirely, omit the braces as well. An omitted action is equivalent to
‘{ print $0 }’:
/foo/ { } match foo, do nothing — empty action
/foo/ match foo, print the record — omitted action
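The difference between an empty action and an omitted one can be checked with a two-line input. This is only a sketch; the input and variable names are made up:

```shell
# Empty action: the rule matches "foo" but does nothing, so there is no output.
empty=$(printf 'foo\nbar\n' | awk '/foo/ { }')

# Omitted action: the rule falls back to { print $0 } and prints the matching record.
omitted=$(printf 'foo\nbar\n' | awk '/foo/')

printf 'empty action:   [%s]\n' "$empty"
printf 'omitted action: [%s]\n' "$omitted"
```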

Related

Why does NR==FNR; {} behave differently when used as NR==FNR{ }?

Hoping someone can help explain the following awk output.
awk --version: GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
OS: Linux sub system on Windows; Linux Windows11x64 5.10.102.1-microsoft-standard-WSL2
user experience: n00b
Important: In the two code snippets below, the only difference is the semi colon ( ; ) after NR==FNR in sample # 2.
sample # 1
awk 'NR==FNR { print $0 }' lines_to_show.txt all_lines.txt
output # 1
2
3
4
5
7
sample # 2
awk 'NR==FNR; { print $0 }' lines_to_show.txt all_lines.txt
output # 2
2 # why is value in file 'lines_to_show.txt appearing twice?
2
3
3
4
4
5
5
7
7
line -01
line -02
line -03
line -04
line -05
line -06
line -07
line -08
line -09
line -10
Generate the text input files
lines_to_show.txt: echo -e "2\n3\n4\n5\n7" > lines_to_show.txt
all_lines.txt: echo -e "line\t-01\nline\t-02\nline\t-03\nline\t-04\nline\t-05\nline\t-06\nline\t-07\nline\t-08\nline\t-09\nline\t-10" > all_lines.txt
Request/Questions:
If you can please explain why you know the answers to the questions below (experience, tutorial, video, etc..)
How does one read an awk program? I was under the impression that a semicolon ( ; ) is only a statement terminator, just like in C. It should not have an impact on the execution of the program.
In output # 2, why are the values in the file lines_to_show.txt appearing twice? It seems like awk is printing values from the 1st file "lines_to_show.txt" but printing them 10 times, which is the number of records in the file "all_lines.txt". Is this true? Why?
Why, in output # 1, is only output from "lines_to_show.txt" displayed? I thought awk would process each record in each file, so I expected to see 15 lines (10 + 5).
What have I tried so far?
going through https://www.linkedin.com/learning/awk-essential-training/using-awk-command-line-flags?autoSkip=true&autoplay=true&resume=false&u=61697657
modifying the code to see the difference and use that to 'understand' what is going on.
trying to work through the flow using pen and paper
going through https://www.baeldung.com/linux/awk-multiple-input-files
awk 'NR==FNR { print $0 }' lines_to_show.txt all_lines.txt
Here you have one pattern-action pair: if the total record number (NR) equals the record number within the current file (FNR), print the whole line. That is only true while the first file is being read.
awk 'NR==FNR; { print $0 }' lines_to_show.txt all_lines.txt
Here you have two pattern-action pairs. Because ; follows the condition, awk assumes you want the default action, which is {print $0}. In other words, this is equivalent to
awk 'NR==FNR{print $0}{ print $0}' lines_to_show.txt all_lines.txt
first print $0 is used solely when processing 1st file, 2nd print $0 is used indiscriminately (no condition given), so for lines_to_show.txt both prints are used, for all_lines.txt solely 2nd print.
man awk is the best reference:
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the
enclosing brace characters) can be omitted.
A missing pattern shall match any record of input,
and a missing action shall be equivalent to:
{ print }
; terminates a pattern-action block. So you have two pattern/action blocks, both whose action is to print the line.
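The explanations above can be reproduced with two tiny files. This is a sketch only: first.txt and second.txt are hypothetical stand-ins for lines_to_show.txt and all_lines.txt, created in a temporary directory:

```shell
# Work in a throwaway directory so no existing files are touched.
dir=$(mktemp -d)
printf '2\n3\n' > "$dir/first.txt"                # stand-in for lines_to_show.txt
printf 'line-01\nline-02\n' > "$dir/second.txt"   # stand-in for all_lines.txt

# One rule: print only while NR==FNR, i.e. while reading the first file.
one_rule=$(awk 'NR==FNR { print $0 }' "$dir/first.txt" "$dir/second.txt")

# Two rules: "NR==FNR" (default print) plus an unconditional { print $0 }.
two_rules=$(awk 'NR==FNR; { print $0 }' "$dir/first.txt" "$dir/second.txt")

# The explicit spelling of the two-rule program gives the same output.
explicit=$(awk 'NR==FNR { print $0 } { print $0 }' "$dir/first.txt" "$dir/second.txt")

printf '%s\n' "$one_rule" "--" "$two_rules"
rm -r "$dir"
```

Records of the first file are printed twice by the two-rule program (once per rule), while records of the second file are printed once.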

AWK script, linefeed under Windows causing different function

I have a simple AWK script which I try to execute under Windows. Gnu AWK 3.1.6.
The awk script is run with awk -f script.awk f1 f2 under Windows 10.
After spending almost half a day debugging, I came to find that the following two scenarios produce different results:
FNR==NR{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
versus
FNR==NR
{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
The difference of course being the linefeed at line 1.
It puzzles me because I don't recall seeing anywhere awk should be critical about linefeeds. Other linefeeds in the script are unimportant.
In example one, the desired result is achieved. Example 2 prints f1, which is not desired.
So I made it work, but I would like to know why.
From the docs (https://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html)
awk is a line-oriented language. Each rule’s action has to begin on
the same line as the pattern. To have the pattern and action on
separate lines, you must use backslash continuation; there is no other
option.
Note that the action only has to begin on the same line as the pattern. After that as we're all aware it can be spread over multiple lines, though not willy-nilly. From the same page in the docs:
However, gawk ignores newlines after any of the following symbols and
keywords:
, { ? : || && do else
In Example 2, since there is no action beginning on the same line as the FNR == NR pattern, the default action of printing the line is performed when that statement is true (which it is for all and only f1). Similarly in that example, the action block is not paired with any preceding pattern on its same line, so it is executed for every record (though there's no visible result for that).
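A stripped-down sketch of the same effect, using a made-up two-line input and a simpler rule than the script in the question:

```shell
# Pattern and action on the same line: one rule, runs only for record 1.
same_line=$(printf 'a\nb\n' | awk 'NR==1 { print "first: " $0 }')

# Newline after the pattern: two rules.
#   NR==1            -> default action, prints record 1 as is
#   { print ... }    -> no pattern, runs for every record
split_line=$(printf 'a\nb\n' | awk 'NR==1
{ print "first: " $0 }')

printf '%s\n' "$same_line" "--" "$split_line"
```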

Understanding syntax for print multiple lines after pattern match

To print multiple(2) lines following the pattern using awk:
I have found somewhere the following solution
$ awk '/Linux/{x=NR+2}(NR<=x){print}' file
Linux
Solaris
Aix
I am trying to understand the syntax
Generally awk syntax is
awk 'pattern{action}' file
Here we find
pattern = /Linux/
action = {x=NR+2}
then what is (NR<=x){print}
Solution:
My understanding of this in C-like syntax is:
while read (file, line)
{
    if (line ~ /pattern/)
    {
        x = NR+2
    }
    if (NR <= x)
    {
        print
    }
}
When a line matches /pattern/, x is set to NR+2 (for example, at NR=1, x = 1+2 = 3). This value is not reset until the pattern matches again. So when the next line is read and does not match, x is still 3; NR (2) <= 3 is true, so that line is printed.
Thanks to #Edmorton for the understanding
FWIW I wouldn't write the code you're asking about, instead I'd write:
awk '/Linux/{c=3} c&&c--' file
See example "g" at https://stackoverflow.com/a/17914105/1745001.
Having said that, your original code in C-like syntax would be:
NR=0
x=0
While read (file,line)
{
NR++
if (line ~ "Linux") {
x = NR+2
}
if (NR <= x) {
print
}
}
Btw, I know it's frequently mis-used but don't use the word "pattern" in your software as it's highly ambiguous - use string or regexp or condition (or in shell but not awk, sed, grep, etc. and only where appropriate "globbing pattern"), whichever it is you really mean.
For example you wrote that awk syntax is:
awk 'pattern{action}' file
No. Or maybe, depending on what you think "pattern" means! Despite what many books, tutorials, etc. say so as to remove any ambiguity you should simply think of awk syntax as:
awk 'condition{action}' file
where condition can be any of:
a key word like BEGIN or END
an arithmetic expression like var < 7 or NF or 1
a regexp comparison like $0 ~ "foo" or $0 ~ /foo/ or /foo/ or $0 ~ var or match($0,/foo/)
a string comparison like $0 == "foo" or index($0,"foo")
nothing at all in which case it's assumed to be true when there's an associated action block.
and probably other things I'm forgetting to list.
your script has two blocks
$ awk '/Linux/ {x=NR+2}
NR<=x {print}' file
first block sets the variable x, second uses to print the lines. Note that you can drop {print}, since it's the default action.
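As a quick check, both the original approach (with {print} dropped) and the countdown variant suggested above can be run against a made-up list. The helper name os_list and the input are assumptions for the illustration:

```shell
# Made-up input; only the "Linux" line matches.
os_list() { printf '%s\n' Unix Linux Solaris Aix Windows; }

# Original approach: remember the last record number that should be printed.
by_nr=$(os_list | awk '/Linux/ {x=NR+2} NR<=x')

# Countdown approach: c is the number of records still to print.
by_count=$(os_list | awk '/Linux/ {c=3} c&&c--')

printf '%s\n' "$by_nr" "--" "$by_count"
```

Both commands print the matching line and the two lines that follow it.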

Why does "1" in awk print the current line?

In this answer,
awk '$2=="no"{$3="N/A"}1' file
was accepted. Note the 1 at the end of the AWK script. In the comments, the author of the answer said
[1 is] a cryptic way to display the current line.
I'm puzzled. How does that work?
In awk, 1 always evaluates to true, so it triggers the default action {print $0}, which prints the current line stored in $0.
So, awk '$2=="no"{$3="N/A"}1' file is equivalent to and shorthand of
awk '$2=="no"{$3="N/A"} {print $0}' file
Again $0 is default argument to print, so you could also write
awk '$2=="no"{$3="N/A"} {print}' file
In fact, you could also use any non-zero number or any condition that always evaluates to true in place of 1.
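The three spellings can be compared on a made-up two-row input (the helper name rows and the data are assumptions, not taken from the linked answer):

```shell
# Made-up three-column input.
rows() { printf 'a no x\nb yes y\n'; }

# Shorthand, explicit {print $0}, and explicit {print}.
short=$(rows | awk '$2=="no"{$3="N/A"}1')
with_print0=$(rows | awk '$2=="no"{$3="N/A"} {print $0}')
with_print=$(rows | awk '$2=="no"{$3="N/A"} {print}')

printf '%s\n' "$short"
```

All three variables end up with the same text: the first row with its third field replaced, and the second row unchanged.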
The documentation says
In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.
So, it treats 1 as pattern with no action. The default action is to print the line.
Even if you have a couple of rules, like in
awk '
in_net {
if (/^\s+bindIp:/) {
print " bindIp: 0.0.0.0"
next
} else if (/^\s*(#.*)?$/) {
in_net = 0
}
}
/^net:/ {
in_net = 1
}
1
' /etc/mongod.conf
You still need the 1, since the default action is triggered only for a rule that has no action.
AWK works on the principle of a condition followed by an action: if a condition is TRUE, the action we specify for it is executed.
Writing 1 makes the condition TRUE, and since we do not specify any action, awk's default action, print, is performed.
That is why 1 is used as a shortcut.
I thought I’d add an answer that explains how this shorthand works in terms of the POSIX specification for awk:
Basic description:
An awk program is composed of pairs of the form:
pattern { action }
Missing action:
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
Description of pattern
A pattern is any valid expression
Description of Expression patterns:
An expression pattern shall be evaluated as if it were an expression in a
Boolean context. If the result is true, the pattern shall be considered to
match, and the associated action (if any) shall be executed.
Boolean context:
When an expression is used in a Boolean context, if it has a numeric value,
a value of zero shall be treated as false and any other value shall be
treated as true. Otherwise, a string value of the null string shall be
treated as false and any other value shall be treated as true.
In the example of awk '$2=="no"{$3="N/A"}1', the pattern of the first pair is $2=="no" with a corresponding action of $3="N/A". This leaves 1 by itself as the next “pair” (pattern without a corresponding action).
Instead of 1, this lone expression pattern could be any numeric value or non-empty string, e.g.,
awk 9999
awk '"string"'
The awk 1 short-hand is fine when typing one-liners in an interactive shell. On the other hand, when writing scripts, I prefer my code to be more maintainable and readable for others by using the more explicit awk '{ print }'.
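The Boolean-context rule quoted above can be tried out with a few lone patterns. This is a sketch; the helper count is made up and simply reports how many of two input records a given one-pattern program prints:

```shell
# Count how many of two input records each lone pattern lets through.
count() { printf 'x\ny\n' | awk "$1" | wc -l | tr -d ' '; }

count 1            # non-zero number: true
count 9999         # non-zero number: true
count '"string"'   # non-empty string: true
count 0            # zero: false
count '""'         # null string: false
```

The true patterns let both records through, and the false ones print nothing.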

Awk syntax for enclosing {}

It is my understanding that in awk a conditional evaluation could be started using either of the following:
if ($0 ~ /no/) {cmd}
($0 ~ /no/) {cmd}
$0 ~ /no/ {cmd}
/no/ {cmd}
In the generic command line
BEGIN { } (body) END { }
I find it most logical to enclose (body) in brackets, as in {(body)} (referred to as "bracketed").
Under GNU awk on Ubuntu 12.04, only the 1st of the 4 options executes if bracketed, at least on my machine (the others produce syntax errors). If I run the line un-bracketed, only the 1st fails and the other 3 work fine. Could someone carefully explain why that is so?
awk statements follow the rule:
PATTERN{action}
So BEGIN or END are just special PATTERNs. basically if PATTERN is true, do the action in {..}
PATTERN can be a regex, an expression or a range; it can also be empty.
the empty pattern looks like:
awk '{print "foo"}' input
You can read http://www.gnu.org/software/gawk/manual/gawk.html section 7.1 for details.
Back to your question: if you put those 4 lines in the action part, that is, between {...} (in effect with an empty pattern), only the first, with if, is a valid conditional statement. However, if you use the matching check as a pattern (outside the {...}):
if ($0 ~ /no/) {cmd} # this would ***NOT*** work, because it is a statement, not a valid awk expression. (I don't know how you made it work.)
($0 ~ /no/) {cmd} # this will work, it is a boolean expression
$0 ~ /no/ {cmd} # same as above
/no/ {cmd} # this is regexp pattern, so it works too.
Syntactically, an awk script is a set of rules, like the following [Ref]:
pattern { action }
pattern { action }
....
Where pattern is an expression, and action is a series of commands that are executed if pattern is evaluated to true (a non-zero number, or a non-empty string).
If the pattern is omitted, then action will be executed for every record.
If the { action } part is omitted, the default action is executed, which is { print $0 } or { print } (equivalent).
As you mentioned in the question, pattern can be BEGIN, END, ... etc, or any other expression.
Your first example will be executed correctly in the { action } block, since it is a command. The other three options are not commands, but are combinations of pattern { action } blocks.
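A short sketch of the two valid placements, with a made-up input and print standing in for cmd:

```shell
# As a rule: the test is the pattern, outside the braces.
as_pattern=$(printf 'no\nyes\n' | awk '$0 ~ /no/ { print "matched: " $0 }')

# As a statement: the same test needs "if" once it moves inside the braces.
as_statement=$(printf 'no\nyes\n' | awk '{ if ($0 ~ /no/) { print "matched: " $0 } }')

printf '%s\n' "$as_pattern" "$as_statement"
```

Both forms report the same single matching record; what differs is only whether the test sits in the pattern position or inside the action.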