I am trying to understand the difference between these two commands (I was expecting the same result from both):
Case-I
echo 'one,two,three,four,five' |awk -v FS=, '{NF=3}1'
one two three
Case-II
echo 'one,two,three,four,five' |awk -v FS=, -v NF=3 '{$1=$1}1'
one two three four five
Here is my current understanding:
$1=$1 is used to force awk to reconstruct the record from its fields. I am assigning FS via -v FS=",", which takes effect, unlike -v NF=3.
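(For reference, the rebuild triggered by $1=$1 is easy to see if you set OFS to something visible; the - separator below is purely for illustration:)

```shell
# $1=$1 forces awk to rebuild $0 from its fields, joined with OFS
echo 'one,two,three' | awk -v FS=, -v OFS=- '{$1=$1}1'
# one-two-three
```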
Question: why does NF=3 not take effect, whereas FS=, does?
https://www.gnu.org/software/gawk/manual/gawk.html#Options:
-v var=val
--assign var=val
Set the variable var to the value val before execution of the program begins.
https://www.gnu.org/software/gawk/manual/gawk.html#Fields:
NF is a predefined variable whose value is the number of fields in the current record. awk automatically updates the value of NF each time it reads a record.
In your first program, you execute {NF=3} after each line is read, overwriting NF.
In your second program, you initially set NF=3 via -v, but that value is overwritten by awk when the first line of input is read.
FS is different because awk never sets this variable. It will keep whatever value you give it.
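A quick way to watch the overwrite happen (the messages printed here are just illustrative) is to print NF in a BEGIN block, before any record is read, and again in the main rule:

```shell
# -v NF=3 is visible in BEGIN, but reading the first record resets NF to 5
echo 'one,two,three,four,five' |
  awk -v FS=, -v NF=3 'BEGIN{print "in BEGIN: NF=" NF} {print "after a record is read: NF=" NF}'
# in BEGIN: NF=3
# after a record is read: NF=5
```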
NF is a predefined variable whose value is the number of fields in the current record. awk automatically updates the value of NF each time it reads a record.
Remember: whenever awk reads a record/line/row, it parses the fields using the field separator FS (default: whitespace), recalculates the fields, and updates the variable NF accordingly.
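A minimal illustration of that recalculation, using the default whitespace FS:

```shell
# NF is recomputed for every record as it is read
printf 'a b\nc d e\n' | awk '{print NF}'
# 2
# 3
```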
Therefore, the one below does not work.
Why doesn't it work?
You defined NF with -v, which happens before execution of the program begins.
awk then read the record/line/row, parsed the fields, and recalculated them, so the variable NF was overwritten.
Case 1:
echo 'one,two,three,four,five' |awk -v FS=, -v NF=3 '{$1=$1}1'
one two three four five
Why does this work?
awk read the record/line/row, parsed the fields, and calculated them, so NF became 5.
Then you overwrote the variable NF yourself, inside the program.
Case 2:
echo 'one,two,three,four,five' |awk -v FS=, '{ NF=3 }1'
one two three
Here the variable NF was overwritten inside the program, after the record had already been read and split:
$ echo 'one,two,three,four,five' |awk -v FS=, '{print "Before:"NF; NF=3; print "After:"NF}1'
Before:5
After:3
one two three
Related
I have a file on Linux with multiple lines; the line content differs but the format is always the same:
-item bread.maker -model "modelname model type modelnum-43453-23241.7" -date1 23.10.01 -date2 30.10.04 -date3 04.02.05
I want to output only the 2nd, 4th, and last columns of each line. I've tried with awk -F, and print $NF, but I cannot seem to get it to treat the double-quoted part as one column.
With any awk:
$ awk 'match($0,/"[^"]*"/){print $2, substr($0,RSTART,RLENGTH), $NF}' file
bread.maker "modelname model type modelnum-43453-23241.7" 04.02.05
or:
$ awk -v OFS='"' '{split($0,f,/"/); print $2, f[2], $NF}' file
bread.maker"modelname model type modelnum-43453-23241.7"04.02.05
or with GNU awk for FPAT:
$ awk -v FPAT='[^" ]+|"[^"]*"' '{print $2, $4, $10}' file
bread.maker "modelname model type modelnum-43453-23241.7" 04.02.05
Set OFS as appropriate if you want something other than a blank char to separate the output fields. I used " as the OFS for the 2nd script since it must not be present in your input if you're already using it to quote strings.
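For example, here is the first script again with OFS set to a comma (the input line is the sample from the question):

```shell
# same match()/substr() approach, but with a comma as the output field separator
line='-item bread.maker -model "modelname model type modelnum-43453-23241.7" -date1 23.10.01 -date2 30.10.04 -date3 04.02.05'
printf '%s\n' "$line" |
  awk -v OFS=',' 'match($0,/"[^"]*"/){print $2, substr($0,RSTART,RLENGTH), $NF}'
# bread.maker,"modelname model type modelnum-43453-23241.7",04.02.05
```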
With bash v5.1, we can assign an even-length list of words to an associative array: it will be treated as a key-value list.
declare -A fields
while IFS= read -r line; do
  eval fields=("$line")  # yeah, eval is needed here to respect
                         # the quotes in the line
  printf '%s,%s,%s\n' "${fields[-item]}" "${fields[-model]}" "${fields[-date3]}"
done < file
bread.maker,modelname model type modelnum-43453-23241.7,04.02.05
I am working with a bash script that has this command in it.
awk -F ‘‘ ‘/abc/{print $3}’|xargs
What is the meaning of this command? Assume input is provided to awk.
The quick answer is it'll do different things depending on the version of awk you're running and how many fields of output the awk script produces.
I assume you meant to write:
awk -F '' '/abc/{print $3}'|xargs
not the syntactically invalid (due to "smart quotes"):
awk -F ‘’’/abc/{print $3}’|xargs
-F '' is undefined behavior per POSIX, so what it does depends on the version of awk you're running. In some awks it'll split the current line into one character per field. In others it'll be ignored and the line will be split into fields at every sequence of white space. In other awks still it could do anything else.
/abc/ looks for a string matching the regexp abc on the current line and if found invokes the subsequent action, in this case {print $3}.
However it's split into fields, print $3 will print the 3rd such field.
As used here, xargs will just join the chunks of multi-line input it receives onto one line, so you could get one line of all-fields output if there aren't many fields being output, or several lines of multi-field output if there are.
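You can see that joining behavior with a trivial input (no awk involved, just to isolate what xargs does):

```shell
# xargs with no command defaults to echo, joining its input lines into one line
printf 'a\nb\nc\n' | xargs
# a b c
```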
I suspect the intent of that code was to do what this code actually will do in any awk alone:
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
e.g.:
$ printf 'foo\nxabc\nyzabc\nbar\n' |
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
b a
I was reading the definition of the PROCINFO built-in variable on GNU Awk User's Guide → 7.5.2 Built-in Variables That Convey Information:
PROCINFO #
The elements of this array provide access to information about the running awk program. The following elements (listed alphabetically) are guaranteed to be available:
PROCINFO["FS"]
This is "FS" if field splitting with FS is in effect, "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect, "FPAT" if field matching with FPAT is in effect, or "API" if field splitting is controlled by an API input parser.
And yes, it works very well. See this example where I provide the string "hello;you" and set, in order, FS to ";", FIELDWIDTHS to "2 2 2", and FPAT to three characters:
$ gawk 'BEGIN{FS=";"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hello
$ gawk 'BEGIN{FIELDWIDTHS="2 2 2"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FIELDWIDTHS
he
$ gawk 'BEGIN{FPAT="..."}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hel
This is fine and works very well.
Then, a bit earlier, in 4.8 Checking How gawk Is Splitting Records, they mention:
In order to tell which kind of field splitting is in effect, use PROCINFO["FS"] (see section Built-in Variables That Convey Information). The value is "FS" if regular field splitting is being used, "FIELDWIDTHS" if fixed-width field splitting is being used, or "FPAT" if content-based field splitting is being used.
And also in Changing FS Does Not Affect the Fields they describe how the changes affect the next record:
According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of FS after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.
This case explains it very well:
$ gawk 'BEGIN{FS=";"} {FS="|"; print $1}' <<< "hello;you
bye|everyone"
hello # "hello;you" is split using FS=";"; the assignment FS="|" doesn't affect it yet
bye # "bye|everyone" is split using FS="|"
Taking all of this into consideration, I would assume that PROCINFO["FS"] always reflects the field splitting in effect for the record it is printed in.
However, see this case:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel
PROCINFO["FS"] shows the value set in the current record (FS), not the one that awk actually took into account when splitting the data (that is, FPAT). The same occurs if we swap the assignments:
$ gawk 'BEGIN{FS=";"}{FPAT="..."; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hello
Why is PROCINFO["FS"] showing a different FS than the one that is being used in the record it is printed in?
Field splitting (using FS, FIELDWIDTHS, or FPAT) occurs when a record is read, or otherwise when $0 as a whole is given a new value (e.g. $0="foo" or sub(/foo/,"bar")). print PROCINFO["FS"] tells you the value that PROCINFO["FS"] currently has, which is not necessarily the same value it had when field splitting last occurred.
With:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel
You're setting FS=";" after $1 has already been populated based on FPAT="...", then printing PROCINFO["FS"]'s new value (which will be used the next time a record is split into fields), and then printing the value of $1, which was populated before you set FS=";".
If you set $0 to itself the field splitting will occur again, this time using the new FS value rather than the original FPAT value:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1; $0=$0; print $1}' <<< "hello;you"
FS
hel
hello
I have a sample file which contains the following.
logging.20160309.113.txt.log: 0 Rows successfully loaded.
logging.20160309.1180.txt.log: 0 Rows successfully loaded.
logging.20160309.1199.txt.log: 0 Rows successfully loaded.
I am currently familiar with two ways of specifying a field separator in awk. However, I am getting different results from them.
For the longest time I use
"FS=" syntax when my FS is more than one character.
"-f" flag when my FS is just one character.
I would like to understand why the FS= syntax is giving me an unexpected result as seen below. Somehow the 1st record is being left behind.
$ head -3 reload_list | awk -F"\.log\:" '{ print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ head -3 reload_list | awk '{ FS="\.log\:" } { print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
The reason you are getting different results is that when you set FS inside the awk program, it is not in a BEGIN block. So by the time you've set it, the first record has already been parsed into fields (using the default separator).
Setting with -F
$ awk -F"\\.log:" '{ print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS after parsing first record
$ awk '{ FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS before parsing any records
$ awk 'BEGIN { FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
I noticed this relevant bit in an awk manual. If perhaps you've seen different behavior previously or with a different implementation, this could explain why:
According to the POSIX standard, awk is supposed to behave as if
each record is split into fields at the time that it is read. In
particular, this means that you can change the value of FS after a
record is read, but before any of the fields are referenced. The value
of the fields (i.e. how they were split) should reflect the old value
of FS, not the new one.
However, many implementations of awk do not do this. Instead,
they defer splitting the fields until a field reference actually
happens, using the current value of FS! This behavior can be
difficult to diagnose.
-f is for running a script from a file. -F and assigning FS work the same:
$ awk -F'.log' '{print $1}' logs
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ awk 'BEGIN{FS=".log"} {print $1}' logs
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
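Here is a self-contained sketch of that approach, using made-up sample files (the file names and contents are only for illustration):

```shell
# build tiny stand-ins for SomeFile.txt and File.txt, then run the single awk invocation
tmp=$(mktemp -d)
printf 'k1 apple\nk2 pear\n'                > "$tmp/SomeFile.txt"  # patterns live in column 2
printf 'apple pie\ncherry pie\npear tart\n' > "$tmp/File.txt"
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i){print;next}}' "$tmp/SomeFile.txt" "$tmp/File.txt"
# apple pie
# pear tart
```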
If the "patterns" are actually fixed strings, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support it; others probably do too, but I haven't tested.
Simpler than the above, and supported by all shells, is to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!