awk to take FS into effect - awk

Why does the following happen? How can I understand the logic?
$ echo "123456" | awk 'BEGIN {FS="4"; OFS="-"}; {print}'
123456
But if I "modify" some of the fields, everything is OK:
$ echo "123456" | awk 'BEGIN {FS="4"; OFS="-"}; {$1=$1;print}'
123-56

The Output Field Separator only takes effect once record has been touched in some way. From the GNU AWK manual:
It is important to remember that $0 is the full record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other characters) that separates the fields.
It is a common error to try to change the field separators in a record simply by setting FS and OFS, and then expecting a plain print or print $0 to print the modified record.
But this does not work, because nothing was done to change the record itself. Instead, you must force the record to be rebuilt, typically with a statement such as $1 = $1

Related

What does this Awk expression mean

I am working with bash script that has this command in it.
awk -F ‘‘ ‘/abc/{print $3}’|xargs
What is the meaning of this command?? Assume input is provided to awk.
The quick answer is it'll do different things depending on the version of awk you're running and how many fields of output the awk script produces.
I assume you meant to write:
awk -F '' '/abc/{print $3}'|xargs
not the syntactically invalid (due to "smart quotes"):
awk -F ‘’’/abc/{print $3}’|xargs
-F '' is undefined behavior per POSIX so what it will do depends on the version of awk you're running. In some awks it'll split the current line into 1 character per field. in others it'll be ignored and the line will be split into fields at every sequence of white space. In other awks still it could do anything else.
/abc/ looks for a string matching the regexp abc on the current line and if found invokes the subsequent action, in this case {print $3}.
However it's split into fields, print $3 will print the 3rd such field.
xargs as used will just print chunks of the multi-line input it's getting all on 1 line so you could get 1 line of all-fields output if you don't have many fields being output or several lines of multi-field output if you do.
I suspect the intent of that code was to do what this code actually will do in any awk alone:
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
e.g.:
$ printf 'foo\nxabc\nyzabc\nbar\n' |
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
b a

AWK doesn't update record with new separator

I'm a bit confused with awk (I'm totally new to awk)
find static/*
static/conf
static/conf/server.xml
my goal is to romove 'static/' from the result
First step:
find static/* | awk -F/ '{print $(0)}'
static/conf
static/conf/server.xml
Same result. I expected it. Now deleting the first part:
find static/* | awk -F/ '{$1="";print $(0)}'
conf
conf server.xml
thats nearly good, but I don't now why the delimiter is killed
But I can deal with it just adding the delimiter to the output:
find static/* | awk -F/ '{$1="";OFS=FS;print $(0)}'
conf
/conf/server.xml
OK now I'm completley lost.
Why is a '/' on the second line and not on the first? In both cases I deleted the first column.
Any explanations, ideas.
BTW my preferred output would be
conf
conf/server.xml
Addendum: thank you for your kind answers. they will help me to fix the problem.
However I want to understand why the first '/' is deleted in my last try. To make it a bit clearer:
find static/* | awk -F/ '{$1="";OFS="#";print $(0)}'
conf
^ a space and no / ?
#conf#server.xml
but I don't now why the delimiter is killed.
Whenever you redefine a field in awk using a statement like:
$n = new_value
awk will rebuild the current record $0 and automatically replace all field separators defined by FS, by the output field separator OFS (see below). The default value of OFS is a single space. This implies the following:
awk -F/ '{$1="";print $(0)}'
The field separator FS is set to a single <slash>-character. The first field is reset to "" which enables the re-evaluation of $0 by which all regular expression matches corresponding to FS are replaced by the string OFS which is currently a single space.
awk -F/ '{$1="";OFS=FS;print $(0)'
The same action applies as earlier. However, after the re-computation of $0, the output field separator OFS is set to FS. This implies that from record 2 onward, you will not replace FS with a space, but with the value of FS.
Possible solution with same ideology
awk 'BEGIN{FS=OFS="/"}{$1=""}{print substr($0,2)}'
The substring function substr is needed to remove the first /
DESCRIPTION
The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non- <blank> non- <newline> characters. This default <blank> and <newline> field delimiter can be changed by using the FS built-in variable or the -F sepstring option. The awk utility shall denote the first field in a record $1, the second $2, and so on. The symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
Variables and Special Variables
References to nonexistent fields (that is, fields after $NF), shall evaluate to the uninitialized value. Such references shall not create new fields. However, assigning to a nonexistent field (for example, $(NF+2)=5) shall increase the value of NF; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS. Each field variable shall have a string value or an uninitialized value when created. Field variables shall have the uninitialized value when created from $0 using FS and the variable does not contain any characters.
source: POSIX standard: awk utility
Be aware that the default field separator FS=" " has some special rules
If you have GNU find you don't need awk at all.
$ find static/ -mindepth 1 -printf '%P\n'
conf
conf/server.xml
1st solution: Considering that in your output word static will come only once if this is the case try. I am simply making field separator as string static/ for lines and printing the last field of lines then which will be after word static/.
find static/* | awk -F'static/' '{print $NF}'
2nd solution: Adding a more generic solution here. Which will match values from very first occurrence of / to till last of the line and while printing it will not printing starting /.
find static/* | awk 'match($0,/\/.*/){print substr($0,RSTART+1,RLENGTH)}'
When you reset the first field value the field is still there. Just remove the initial / chars after that with sub(/^\/+/, "") (where ^\/+ pattern matches one or more / chars at the start of the string):
awk 'BEGIN{OFS=FS="/"} {$1="";sub(/^\/+/, "")}1'
See an online demo:
s="static/conf
static/conf/server.xml"
awk 'BEGIN{OFS=FS="/"} {$1="";sub(/^\/+/, "")}1' <<< "$s"
Output:
conf
conf/server.xml
Note that with BEGIN{OFS=FS="/"} you set the input/output field separator just once at the start, and 1 at the end triggers the default line print operation.

awk: Adding a new column based on concatenated value of two columns

I am trying to add a new column to a text file based on the concatenated values of two columns. Value is being inserted in the middle instead of the end of the string.
I am using awk. Here are two sample lines
$ head -1 file.txt
8502CC169154|02|GA|TN|89840|9|2008-11-15 00:00:00.000|2009-11-15 00:00:00.000|1|TEAM1|1639009|1000000|0|2008-11-15 00:00:00.000|2009-11-15 00:00:00.000|85|00|37421||241|20|331|1052A|5000|0|.1500|Chattanooga|47065|.000|025|35|25000|0|0|0|0|0|718||E|-17.00|-17.00|-17.00|-17.00|-17.00|-2.55|-2.55|-2.55|-2.55|D|C9N7I4115531902|-2.19|-2.19|-2.19|-2.19|-14.81|051|2008-12-31 00:00:00.000|151|2008-12-17 00:00:00.000|||AC|CC|Y||2008-12-31 00:00:00.000|.000000|A|.000000|.000000|.000000|Y|8502CC169154-8|8502CC169154|8|||122130|122130M|7764298|RA
I tried the following.
$ head -1 file.txt | awk -F'|' '{$(NF+1)=$1"-"$6;}1' OFS='|'
I am expecting a new column at the end of the string. But you can see that the concatenated field is being inserted in the middle of the string instead of the end of the string.
8502CC169154|02|GA|TN|89840|9|2008-11-15 00:00:00.000|2009-11-15 00:00:00.000|1|TEAM1|1639009|1000000|0|2008-11-15 00:00:00.000|2009-11-15 00:00:00.000|85|00|37421||241|20|331|1052A|5000|0|.1500|Chattanooga|47065|.000|025|35|25000|0|0|0|0|0|718||E|-17.00|-17.00|-17.00|-17.00|-17.00|-2.55|-2.55|-2.55|-2.55|D|C9N7I4115531902|-2.19|-2.19|-2.19|-2.19|-14.81|051|2008-12-31 00:00:00.000|151|2008|8502CC169154-9.000|||AC|CC|Y||2008-12-31 00:00:00.000|.000000|A|.000000|.000000|.000000|Y|8502CC169154-8|8502CC169154|8|||122130|122130M|7764298|RA
Your original code works for me using GNU awk but I suspect that not all awks support setting $(NF+1). To avoid that, try:
head -1 file.txt | awk -F'|' '{$0=$0 FS $1"-"$6;}1' OFS='|'
Awk is a surprising powerful language and it has all the capabilities that head has, making the pipeline unnecessary. So, for greater efficiency, try the simple command:
awk -F'|' '{print $0 FS $1"-"$6; exit}' file.txt
How it works:
-F'|'
This sets the field separator to a vertical bar.
print $0 FS $1"-"$6
This prints the output line that you want which consists of the original line, $0, followed by a field separator, FS, followed by combination of the first field, a dash, and the sixth field.
exit
After the first line is printed, this tells awk to exit. This eliminates the need for head -1.

Output field separators in awk after substitution in fields

Is it always the case, after modifying a specific field in awk, that information on the output field separator is lost? What happens if there are multiple field separators and I want them to be recovered?
For example, suppose I have a simple file example that contains:
a:e:i:o:u
If I just run an awk script, which takes account of the input field separator, that prints each line in my file, such as running
awk -F: '{print $0}' example
I will see the original line. If however I modify one of the fields directly, e.g. with
awk -F: '{$2=$2"!"; print $0}' example
I do not get back a modified version of the original line, rather I see the fields separated by the default whitespace separator, i.e:
a e! i o u
I can get back a modified version of the original by specifying OFS, e.g.:
awk -F: 'BEGIN {OFS=":"} {$2=$2"!"; print $0}' example
In the case, however, where there are multiple potential field separators but in the case of multiple separators is there a simple way of restoring the original separators?
For example, if example had both : and ; as separators, I could use -F":|;" to process the file but OFS would no be sufficient to restore the original separators in their relative positions.
More explicitly, if we switched to example2 containing
a:e;i:o;u
we could use
awk -F":|;" 'BEGIN {OFS=":"} {$2=$2"!"; print $0}' example2
(or -F"[:;]") to get
a:e!:i:o:u
but we've lost the distinction between : and ; which would have been maintained if we could recover
a:e!;i:o;u
You need to use GNU awk for the 4th arg to split() which saves the separators, like RT does for RS:
$ awk -F'[:;]' '{split($0,f,FS,s); $2=$2"!"; r=s[0]; for (i=1;i<=NF;i++) r=r $i s[i]; $0=r} 1' file
a:e!;i:o;u
There is no automatically populated array of FS matching strings because of how expensive it'd be in time and memory to store the string that matches FS every time you split a record into fields. Instead the GNU awk folks provided a 4th arg to split() so you can do it yourself if/when you want it. That is the result of a long conversation a few years ago in the comp.lang.awk newsgroup between experienced awk users and gawk providers before all agreeing that this was the best approach.
See split() at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions.

Enclosing a single quote in Awk

I currently have this line of code, that needs to be increased by one every-time in run this script. I would like to use awk in increasing the third string (570).
'set t 570'
I currently have this to change the code, however I am missing the closing quotation mark. I would also desire that this only acts on this specific (above) line, however am unsure about where to place the syntax that awk uses to do that.
awk '/set t /{$3+=1} 1' file.gs >file.tmp && mv file.tmp file.gs
Thank you very much for your input.
Use sub() to perform a replacement on the string itself:
$ awk '/set t/ {sub($3+0,$3+1,$3)} 1' file
'set t 571'
This looks for the value in $3 and replaces it with itself +1. To avoid replacing all of $3 and making sure the quote persists in the string, we say $3+0 so that it evaluates to just the number, not the quote:
$ echo "'set t 570'" | awk '{print $3}'
570'
$ echo "'set t 570'" | awk '{print $3+0}'
570
Note this would fail if the value in $3 happens more times in the same line, since it will replace all of them.