Why does "1" in awk print the current line?

In this answer,
awk '$2=="no"{$3="N/A"}1' file
was accepted. Note the 1 at the end of the AWK script. In the comments, the author of the answer said
[1 is] a cryptic way to display the current line.
I'm puzzled. How does that work?

In awk, since 1 always evaluates to true, it triggers the default action {print $0} and hence prints the current line stored in $0.
So, awk '$2=="no"{$3="N/A"}1' file is equivalent to and shorthand of
awk '$2=="no"{$3="N/A"} {print $0}' file
Again, $0 is the default argument to print, so you could also write
awk '$2=="no"{$3="N/A"} {print}' file
In fact, you could also use any non-zero number, or any condition that always evaluates to true, in place of 1.
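The equivalence is easy to check by feeding the same input through each spelling and comparing the results. This is a minimal POSIX sh sketch; the three-line sample input is made up for illustration and is not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample input: lines whose 2nd field is "no" get $3 replaced.
input='a no x
b yes y
c no z'

# Four spellings of the same program: a lone true pattern (1), an explicit
# {print $0}, a bare {print}, and another non-zero number (42).
v1=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"}1')
v2=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"} {print $0}')
v3=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"} {print}')
v4=$(printf '%s\n' "$input" | awk '$2=="no"{$3="N/A"} 42')

printf '%s\n' "$v1"
# a no N/A
# b yes y
# c no N/A

# All four variants produce identical output.
[ "$v1" = "$v2" ] && [ "$v1" = "$v3" ] && [ "$v1" = "$v4" ] && echo "all equivalent"
```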

The documentation says
In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.
So awk treats 1 as a pattern with no action, and the default action is to print the line.
Even if you have a couple of other rules, as in
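awk '
  # Inside the net: section, rewrite the bindIp line, or leave the
  # section at the first blank or comment-only line.
  in_net {
    if (/^[[:space:]]+bindIp:/) {
      print " bindIp: 0.0.0.0"
      next
    } else if (/^[[:space:]]*(#.*)?$/) {
      in_net = 0
    }
  }
  # Start of the net: section.
  /^net:/ {
    in_net = 1
  }
  # Print every line that did not hit "next" above.
  1
' /etc/mongod.conf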
You still need the 1, since the default action is triggered only by a rule that has no action.
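The same point can be seen with a much smaller program. In this sketch (hypothetical three-line input), the only rule with an action ends in next, so nothing except a trailing 1 is left to print the other lines.

```shell
#!/bin/sh
# Hypothetical input: replace the "b" line, keep everything else.
input='a
b
c'

# With the trailing 1: the first rule prints "B" and skips the rest of the
# program via next; the lone 1 prints every other line unchanged.
with_one=$(printf '%s\n' "$input" | awk '/^b$/ { print "B"; next } 1')

# Without it: only the explicit print runs, so the other lines vanish.
without_one=$(printf '%s\n' "$input" | awk '/^b$/ { print "B"; next }')

printf 'with 1:\n%s\n' "$with_one"        # a, B, c
printf 'without 1:\n%s\n' "$without_one"  # B
```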

AWK works on a condition-then-action model: whenever a condition is true, the action given for it is executed.
In the case of 1, we make the condition always true without specifying any action, so awk performs its default action, print.
That is why 1 is written as a shortcut for printing every line.

I thought I’d add an answer that explains how this shorthand works in terms of the POSIX specification for awk:
Basic description:
An awk program is composed of pairs of the form:
pattern { action }
Missing action:
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
Description of pattern
A pattern is any valid expression
Description of Expression patterns:
An expression pattern shall be evaluated as if it were an expression in a
Boolean context. If the result is true, the pattern shall be considered to
match, and the associated action (if any) shall be executed.
Boolean context:
When an expression is used in a Boolean context, if it has a numeric value,
a value of zero shall be treated as false and any other value shall be
treated as true. Otherwise, a string value of the null string shall be
treated as false and any other value shall be treated as true.
In the example of awk '$2=="no"{$3="N/A"}1', the pattern of the first pair is $2=="no" with a corresponding action of $3="N/A". This leaves 1 by itself as the next “pair” (pattern without a corresponding action).
Instead of 1, this lone expression pattern could be any non-zero numeric value or non-empty string, e.g.,
awk 9999
awk '"string"'
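The Boolean-context rules quoted above can be tried directly. In this sketch the helper name try is hypothetical; it runs one lone expression pattern against a single input line and reports whether the default print fired.

```shell
#!/bin/sh
# Report whether a lone expression pattern counts as true, i.e. whether
# the default { print } action printed the single input line.
try() {
  if [ -n "$(echo x | awk "$1")" ]; then
    echo "$1 -> true"
  else
    echo "$1 -> false"
  fi
}

try '1'         # 1 -> true         (non-zero number)
try '9999'      # 9999 -> true      (non-zero number)
try '0'         # 0 -> false        (zero)
try '""'        # "" -> false       (null string)
try '"string"'  # "string" -> true  (non-empty string)
try '"0"'       # "0" -> true       (a string constant that is not null)
```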
The awk 1 shorthand is fine when typing one-liners in an interactive shell. When writing scripts, on the other hand, I prefer to make my code more maintainable and readable for others by using the more explicit awk '{ print }'.

Related

AWK script, linefeed under Windows causing different function

I have a simple AWK script which I try to execute under Windows, using GNU Awk 3.1.6.
The awk script is run with awk -f script.awk f1 f2 under Windows 10.
After spending almost half a day debugging, I came to find that the following two scenarios produce different results:
FNR==NR{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
versus
FNR==NR
{
a[$0]++;cnt[1]+=1;next
}
!a[$0]
The difference, of course, is the linefeed after the pattern on line 1.
It puzzles me because I don't recall seeing anywhere that awk is sensitive to linefeeds. Other linefeeds in the script are unimportant.
In example 1, the desired result is achieved. Example 2 prints f1, which is not desired.
So I made it work, but I would like to know why.
From the docs (https://www.gnu.org/software/gawk/manual/html_node/Statements_002fLines.html)
awk is a line-oriented language. Each rule’s action has to begin on
the same line as the pattern. To have the pattern and action on
separate lines, you must use backslash continuation; there is no other
option.
Note that the action only has to begin on the same line as the pattern. After that, as we're all aware, it can be spread over multiple lines, though not willy-nilly. From the same page in the docs:
However, gawk ignores newlines after any of the following symbols and
keywords:
, { ? : || && do else
In Example 2, since no action begins on the same line as the FNR==NR pattern, the default action of printing the line is performed whenever that pattern is true (which it is for every record of f1, and only for those). Similarly, the action block in that example is not paired with any pattern on its own line, so it is executed for every record. It produces no visible output itself, but because it ends with next, the final !a[$0] rule is never reached, which is why nothing from f2 is printed.
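The behaviour is not specific to Windows and can be reproduced anywhere. This is a minimal POSIX sh sketch, simplified to drop the cnt counter and using made-up contents for f1 and f2.

```shell
#!/bin/sh
# Two small hypothetical input files.
dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$dir/f1"
printf 'banana\ncherry\n' > "$dir/f2"

# Example 1: the action begins on the same line as FNR==NR, so they form
# one rule. Result: the lines of f2 that do not appear in f1.
ex1=$(awk 'FNR==NR{
  a[$0]++; next
}
!a[$0]' "$dir/f1" "$dir/f2")

# Example 2: the newline after FNR==NR splits it into two rules:
#   FNR==NR            pattern only, default print (true for f1 records)
#   { a[$0]++; next }  action only, runs for every record; its next means
#                      the !a[$0] rule is never reached
ex2=$(awk 'FNR==NR
{
  a[$0]++; next
}
!a[$0]' "$dir/f1" "$dir/f2")

printf 'example 1:\n%s\n' "$ex1"   # cherry
printf 'example 2:\n%s\n' "$ex2"   # apple, banana (the contents of f1)
rm -r "$dir"
```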

How do awk match and ~ operators work together?

I'm having trouble understanding this awk code:
$0 ~ ENVIRON["search"] {
match($0, /id=[0-9]+/);
if (RSTART) {
print substr($0, RSTART+3, RLENGTH-3)
}
}
How do the ~ and match() operators interact with each other?
How does the match() have any effect, if its output isn't printed or echo'd? What does it actually return or do? How can I use it in my own code?
This is related to Why are $0, ~, &c. used in a way that violates usual bash syntax docs inside an argument to awk?, but that question was centered around understanding the distinction between bash and awk syntaxes, whereas this one is centered around understanding the awk portions of the script.
Taking your questions one at a time:
How do the ~ and match() operators interact with each other?
They don't, at least not directly in your code. ~ is the regexp comparison operator. In the context of $0 ~ ENVIRON["search"] it is being used to test whether the regexp contained in the environment variable search matches part of the current record ($0). If it does, the code in the subsequent {...} block is executed; if it doesn't, it isn't.
How does the match() have any effect, if its output isn't printed or echoed?
It identifies the starting point (and stores it in the awk variable RSTART) and the length (RLENGTH) of the first substring within the first parameter ($0) that matches the regexp provided as the second parameter (id=[0-9]+). With GNU awk it can also populate an array passed as a 3rd argument with the segments of the matching string identified by round brackets (aka "capture groups").
What does it actually return or do?
It returns the value of RSTART, which is zero if no match was found and 1 or greater otherwise. For what it does, see the previous answer.
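A small sketch makes the side effects visible: it prints match()'s return value alongside RSTART and RLENGTH for one line that contains an id and one that does not. The helper name show_match and both input lines are hypothetical.

```shell
#!/bin/sh
# Print match()'s return value, RSTART and RLENGTH for one input line.
show_match() {
  printf '%s\n' "$1" | awk '{
    r = match($0, /id=[0-9]+/)
    print r, RSTART, RLENGTH
  }'
}

show_match 'Mouse id=12 [slave]'  # 7 7 5   (match starts at column 7; "id=12" is 5 chars)
show_match 'no identifier here'   # 0 0 -1  (no match: RSTART is 0, RLENGTH is -1)
```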
How can I use it in my own code?
The code in your example is one way to use it, but that code would more typically be written as:
($0 ~ ENVIRON["search"]) && match($0,/id=[0-9]+/) {
print substr($0, RSTART+3, RLENGTH-3)
}
and using a string rather than regexp comparison for the first part would probably be even more appropriate:
index($0,ENVIRON["search"]) && match($0,/id=[0-9]+/) {
print substr($0, RSTART+3, RLENGTH-3)
}
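Here is the index()-based version run end to end. The device lines are hypothetical, loosely modelled on xinput --list output, and get_ids is an illustrative helper name; search is passed through the environment exactly as in the question.

```shell
#!/bin/sh
# Hypothetical input loosely modelled on xinput --list output.
devices='Virtual core pointer id=2 [master pointer (3)]
Logitech USB Mouse id=10 [slave pointer (2)]
AT keyboard id=11 [slave keyboard (3)]'

# Print the numeric id of every line containing the literal text in $1:
# index() does a plain substring test, match() locates "id=NN", and
# substr() drops the 3-character "id=" prefix.
get_ids() {
  printf '%s\n' "$devices" | search="$1" awk '
    index($0, ENVIRON["search"]) && match($0, /id=[0-9]+/) {
      print substr($0, RSTART+3, RLENGTH-3)
    }'
}

get_ids 'Mouse'    # 10
get_ids 'pointer'  # 2, then 10
```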
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn how to use awk.
The script uses the regex id=[0-9]+ to find a match in each line; if the start position of the match (RSTART) is not 0, it prints the match without the id= prefix.
This is shorter and extracts the same IDs, although it does not apply the ENVIRON["search"] filter:
xinput --list | grep -Po 'id=[0-9]+' | cut -c4-

awk difference between 1 and {print}?

This comment on awk change once per file made me think 1 and {print} are equivalent in awk. But they are not.
awk '/^\S/ {core=0} /^_core/ {core=1} !core 1' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
_core:
default_config_hash: DWLCIpl8ku4NbiI9t3GgDeuW13KSOy2l1zho7ReP_Bg
id: who_s_online
label: 'Who''s online block'
Compare to (and this is what I wanted btw):
awk '/^\S/ {core=0} /^_core/ {core=1} !core {print}' views.view.who_s_online.yml|head
uuid: 50715f68-3b13-4a15-8455-853110fd1d8b
langcode: en
status: true
dependencies:
module:
- user
id: who_s_online
label: 'Who''s online block'
module: user
description: 'Shows the user names of the most recently active users, and the total number of active users.'
The structure of an awk program is a series of conditions and actions:
condition { action }
The default value of condition is 1 (true), so actions without a condition always happen:
{ action } # is equivalent to
1 { action }
The default action is print, so quite often you will see a 1 in an awk script instead of { print }.
However, in your script, your condition is !core 1. This will negate the value of core, coerce it to a string and concatenate with the string "1". A non-empty string is always true, so every record will be printed.
If you want to only print records where core is false, then you can use !core as a condition by itself.
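The value that the pattern !core 1 actually evaluates to can be printed in a BEGIN block. This is an illustrative sketch; pattern_value is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value of the expression `!core 1`: the result of !core
# (0 or 1) is converted to a string and concatenated with "1".
pattern_value() {
  awk -v core="$1" 'BEGIN { v = !core 1; print "[" v "]" }'
}

for core in 0 1; do
  echo "core=$core: $(pattern_value "$core")"
done
# core=0: [11]
# core=1: [01]
# Either way the value is a non-empty string, hence always true.
```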
Yes, they are the same.
These expressions are completely equivalent:
awk '1' file
awk '{print}' file
awk '42' file
awk '2342352' file
awk '{print $0}' file
Why? Because a True condition triggers awk's default action: to print the current record; that is, {print $0}.
Now, what is happening here?
You are using them in a different way. In one case, you are saying:
awk '{things...} !core 1' file
#                ^^^^^^^
In this case, !core is not doing anything. It reads as just awk '{things...} 1' file. Or, better, see Tom Fenech's explanation.
You can test it by executing both seq 10 | awk '/5/ {core++} 1' and seq 10 | awk '/5/ {core++} !core 1'. Both return all the numbers from 1 to 10. Instead, you would like to use seq 10 | awk '/5/ {core++} !core' to print from 1 to 4.
Notice also the difference with:
awk '{things...} !core; 1' file
#                     ^
With this semicolon, !core becomes a rule of its own, which prints the current record (line) if core evaluates to False, and then 1 prints every record regardless of any condition. So seq 10 | awk '/5/ {core++} !core; 1' will print from 1 to 10, printing 1 to 4 twice.
On the other case you say:
awk '{things...} !core {print}' file
#                ^^^^^^^^^^^^^
This reads as: if core evaluates to False, then do print.
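The four programs discussed in this answer can be compared side by side. In this sketch, run is a hypothetical helper that feeds 1..10 through a program and joins the output with spaces so each result fits on one line.

```shell
#!/bin/sh
# Run an awk program over the numbers 1..10 and join its output with spaces.
run() { seq 10 | awk "$1" | paste -sd ' ' -; }

run '/5/ {core++} !core 1'        # 1 2 3 4 5 6 7 8 9 10   (!core 1 is always true)
run '/5/ {core++} !core'          # 1 2 3 4
run '/5/ {core++} !core; 1'       # 1 1 2 2 3 3 4 4 5 6 7 8 9 10
run '/5/ {core++} !core {print}'  # 1 2 3 4
```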
What is the reason behind them being equivalent?
This is because awk works on the syntax condition { action }, where { action } can be omitted when it is the default action, {print $0}. This way, whenever you want an action to be always triggered, you can write either {print $0} or just a condition part that is True, like 1, 42, a variable that you have already set to a non-zero value, etc.
This is also useful to make awk code more idiomatic: if you use a variable as a flag, you can write just var and it will trigger the print whenever the flag holds a True value, just as you do with core in your code. This way, these three awk programs are equivalent:
awk '/5/ {core++} core'
awk '/5/ {core++} { if (core > 0) {print $0}}'
awk '{ if ($0~/5/) {core++}} { if (core > 0) {print $0}}'
See how using idiomatic code makes it look better? ;)
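The three flag-based programs can be checked against each other on the same input. This is an illustrative sketch using seq 10, as elsewhere in this answer.

```shell
#!/bin/sh
# The three flag-based programs, run over the same input.
a=$(seq 10 | awk '/5/ {core++} core')
b=$(seq 10 | awk '/5/ {core++} { if (core > 0) {print $0}}')
c=$(seq 10 | awk '{ if ($0~/5/) {core++}} { if (core > 0) {print $0}}')

printf '%s\n' "$a" | paste -sd ' ' -   # 5 6 7 8 9 10
[ "$a" = "$b" ] && [ "$a" = "$c" ] && echo "all three agree"
```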
Check GNU Awk User's Guide → 7.3 Actions for a more technical explanation (and probably better wording than mine!):
An awk program or script consists of a series of rules and function
definitions interspersed. (Functions are described later. See
User-defined.) A rule contains a pattern and an action, either of
which (but not both) may be omitted. The purpose of the action is to
tell awk what to do once a match for the pattern is found. Thus, in
outline, an awk program generally looks like this:
[pattern] { action }
pattern [{ action }]
…
function name(args) { … }
…
An action consists of one or more awk statements, enclosed in braces
(‘{…}’). Each statement specifies one thing to do. The statements are
separated by newlines or semicolons. The braces around an action must
be used even if the action contains only one statement, or if it
contains no statements at all. However, if you omit the action
entirely, omit the braces as well. An omitted action is equivalent to
‘{ print $0 }’:
/foo/ { } match foo, do nothing — empty action
/foo/ match foo, print the record — omitted action

Add a number by subtracting an existing number by awk

I would like to convert
Title Page/4,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
Contents/16,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
to
Title Page 1/4,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
Contents 13/16,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
The rule is to subtract 3 from the number following / and insert the result, preceded by a space, in front of /.
I tried to do that with awk.
awk -F',/' '{gsub(/\//, ($2-10) + "\/"}' myfile
but it doesn't work. Why? Thanks.
A slight modification to your attempt produces the desired output:
$ awk -F'[,/]' '{sub(/\//, " " ($2-3) "/") }1' file
Title Page 1/4,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
Contents 13/16,Black,notBold,notItalic,open,TopLeftZoom,0,0,0.0
-F is used to specify the input field separator. I have changed it to a regex group which matches commas and slashes, which means that the second field $2 contains the number that you are trying to replace. As you are only interested in making a single substitution in each record, I have used sub rather than gsub. Note that in awk, strings are automatically concatenated (you shouldn't use +).
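To see why + did not work in the original attempt, compare concatenation with addition in a BEGIN block (no input needed). This is a small illustrative sketch, not part of the original answer.

```shell
#!/bin/sh
# Juxtaposition concatenates strings; + always performs numeric addition.
out=$(awk 'BEGIN {
  n = 4 - 3
  print " " n "/"   # concatenation: prints " 1/"
  print n + "/"     # "/" converts to the number 0, so this prints 1
}')
printf '%s\n' "$out"
```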
Awk programs are structured like condition { action }. If no condition is specified, the action block is always run. If no action is specified, the default action is { print }, which prints the record. In the above script, 1 is used to print the record, as it is the simplest expression that evaluates to true.

where to put brackets in awk

Hello everyone. I am very confused about the brackets {} in awk. For example, I have written this code:
{
FNR == 3 { print $1 " age is " $2 }
}
but it gave me an error on the outer brackets and not on the brackets around the print statement. Why is that? Also, in the following code
{
s = $1
d = $2
no = $1 + $2
{print no}
}
when I remove the outer brackets, my arguments are displayed repeatedly, once for each line of code. Why is that? I am very confused; kindly help me.
Thanks.
An awk script consists of commands. Each command has a pattern and an action:
pattern1 { action1 }
pattern2 { action2 }
For each line in the input, awk tests each pattern and performs the corresponding action when the pattern is true.
The pattern can be omitted, in which case it is taken as always true and the action is performed for each line. Similarly, the action can be omitted, in which case it is taken as a print; this lets you easily use awk to select lines without changing the lines.
With this structure in mind, we can interpret the given examples. The first one is a single action that is applied to every line. But the action isn't well formed: a pattern with its own action cannot appear inside another action. If you remove the outer brackets, it becomes a distinct pattern and action, both of which are correctly constructed.
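Both forms of the first example can be tried on a few lines of input. The "name age" data is hypothetical; the braced version is expected to be rejected with a syntax error, matching what the question reports.

```shell
#!/bin/sh
# Hypothetical "name age" input.
people='alice 30
bob 25
carol 41'

# Inside the outer braces, "FNR == 3" is an expression statement, and a
# statement cannot be followed directly by "{", so awk rejects the program.
if printf '%s\n' "$people" | awk '{
  FNR == 3 { print $1 " age is " $2 }
}' 2>/dev/null; then
  echo "unexpectedly accepted"
else
  echo "syntax error, as expected"
fi

# Without the outer braces it is an ordinary pattern { action } rule.
printf '%s\n' "$people" | awk 'FNR == 3 { print $1 " age is " $2 }'
# carol age is 41
```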
The second example also is applied to every line. It takes the first two (whitespace-separated) fields from the line, adds them as numbers, and prints the result. Removing the outer brackets gives you three patterns without corresponding actions, and an action without a pattern. Thus the patterns (which are the values of the assignments, and usually true) have an implicit print that is usually invoked. Similarly, the action is always invoked, printing the value of no.
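Running both versions of the second example makes the implicit prints visible. The function names sum_braced and sum_unbraced and the inputs are hypothetical; the "0 3" input shows the "usually true" caveat, since an assignment whose value is the field 0 counts as false.

```shell
#!/bin/sh
# Version 1: one action, run for every line; prints the sum.
sum_braced() {
  awk '{
    s = $1
    d = $2
    no = $1 + $2
    {print no}
  }'
}

# Version 2: outer braces removed. Each assignment is now a pattern whose
# value decides whether the line is printed, and {print no} is an action
# with no pattern.
sum_unbraced() {
  awk '
    s = $1
    d = $2
    no = $1 + $2
    {print no}
  '
}

echo '2 3' | sum_braced    # 5
echo '2 3' | sum_unbraced  # "2 3" three times, then 5
echo '0 3' | sum_unbraced  # s = $1 is 0 (false): "0 3" twice, then 3
```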