awk -OFS vs OFS gives different output [closed] - awk

I have a sample file:
$ cat test.properties
startTime: 0515
stopTime: 2015
dataFiles: foo
fixVersion: 4.2
retry: 5
kafkaRelay.type: kafkaSink
kafkaRelay.producerId: blah
kafkaRelay.partitioningTag: 49
kafkaRelay.topic: topicname-pre-transform-{0,date,yyyyMMdd}
I have a particular awk command I want to run. It gives different output when I use -F on the command line vs. setting FS in a BEGIN block:
$ awk 'BEGIN{ FS = ": *"; OFS =": " } \
$1 ~ /(startTime|fixVersion)/ {print $2, $1}; \
$1 ~ /kafkaRelay.topic/ {$1="kafkaWriter.topic";print; $1="kafkaReader.topic"; print}; \
$1 ~ /stopTime/ { $2+=100; $2%=2400; printf("%s: %04d\n", $1, $2) }' test.properties
0515: startTime
stopTime: 2115
4.2: fixVersion
kafkaWriter.topic: topicname-pre-transform-{0,date,yyyyMMdd}
kafkaReader.topic: topicname-pre-transform-{0,date,yyyyMMdd}
$ awk -F': *' -OFS': ' \
'$1 ~ /(startTime|fixVersion)/ {print $2, $1}; \
$1 ~ /kafkaRelay.topic/ {$1="kafkaWriter.topic";print; $1="kafkaReader.topic"; print}; \
$1 ~ /stopTime/ { $2+=100; $2%=2400; printf("%s: %04d\n", $1, $2) }' test.properties
startTime: 0515
stopTime: 2015: 0100
fixVersion: 4.2
kafkaWriter.topic
kafkaReader.topic
The first version outputs exactly what I expect. The second version has a bunch of differences, and I don't understand where they come from. I also tried putting one to four backslashes in front of the * in the second version, in case it was an escaping issue, but that had no effect.
Why does this happen? I read awk's documentation on regexp field splitting, and the section on the command-line field separator doesn't say anything special about -F vs. FS =. The only Stack Overflow question I could find was about failing to use BEGIN, which isn't my problem.
For reference:
$ awk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)

The problem is the use of -OFS.
POSIX guidelines for command-line parsing say that after a single dash, option letters are parsed one character at a time. So -OFS': ' is read as -O (gawk's optimization flag) followed by -F, which then takes the rest of that word as its argument. That overrides your -F ': *' with a separator of 'S: ', and OFS never gets set at all.
If you want to set OFS, doing it in a BEGIN block is the Right Thing.
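For reference, here is a sketch of the second command with OFS set explicitly via -v (a standard awk option) instead of the mis-parsed -OFS; this should produce the same output as the BEGIN version:
$ awk -F': *' -v OFS=': ' \
'$1 ~ /(startTime|fixVersion)/ {print $2, $1}; \
$1 ~ /kafkaRelay.topic/ {$1="kafkaWriter.topic";print; $1="kafkaReader.topic"; print}; \
$1 ~ /stopTime/ { $2+=100; $2%=2400; printf("%s: %04d\n", $1, $2) }' test.properties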

Related

can't print anything after last field using awk [duplicate]

This question already has answers here:
Why does my tool output overwrite itself and how do I fix it?
I'm having trouble doing something very simple with awk. I'd like to print the last field, followed by another field.
Input file looks like this:
03 Oct 22, Southern ,Mad,WIN,Gro,,33.10
03 Oct 22, Mpd ,Mad,WIN,Auto,-208.56,
23 Sep 22, Thank ,n/a,WIN,,-97.93,
This way round works fine:
$ awk -F',' '{print "first " $6 " and then " $7}' input.csv
first and then 33.10
first -208.56 and then
first -97.93 and then
But when I swap the fields over I get the strangest result:
$ awk -F',' '{print "first " $7 " and then " $6}' input.csv
and then 0
and then -208.56
and then -97.93
I must be missing something really simple. What on earth is going on?
$ awk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)
The only suggestion I have is to update awk. It works perfectly fine on my MacBook with this version:
awk --version
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
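Judging from the linked duplicate ("Why does my tool output overwrite itself and how do I fix it?"), the more likely cause is Windows-style line endings: $7, the last field, ends in a carriage return that sends the cursor back to the start of the line, so whatever is printed after it overwrites the text before it. A minimal sketch that strips the \r before printing, assuming that is indeed the problem:
$ awk -F',' '{sub(/\r$/, ""); print "first " $7 " and then " $6}' input.csv
Since sub() here modifies $0, awk re-splits the record, so $7 no longer carries the stray carriage return.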

How to extract a number in quotes after an equals sign with awk

I have something like this in my parameters:
config_version = "1.2.3"
I am trying to get 1.2.3 without the quotes using an awk command; is it possible?
How I currently get the quoted number:
awk '/config_version =/ {print $3}' params.txt
output: "1.2.3"
desired: 1.2.3
Find the line with the right label, trim the quotes from the value, and print it:
$ awk '$1=="config_version"{gsub(/"/,"",$NF); print $NF}' file
And also with awk:
$ echo 'config_version = "1.2.3"' | awk -F'=' '{gsub(/"/,"",$2);print $2}'
1.2.3
I'd use gsub to remove leading and trailing "s:
$ awk '{gsub(/^"|"$/,"",$3);print $3}'
The obligatory (or, perhaps "one of", rather than "the". There are lots of ways to do this!) sed solution:
sed -n '/^config_version *= */{y/"/ /; s///p;}'
Note that this leaves a trailing space in the result.
Use grep:
echo 'config_version = "1.2.3"' | grep -Po 'config_version\s+=\s+"\K[^"]+'
1.2.3
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
SEE ALSO:
grep manual
perlre - Perl regular expressions
You might set FS so that " is treated as part of the field separator. Let file.txt content be:
config_version = "1.2.3"
then
awk 'BEGIN{FS="[ \"]+"}/config_version =/{print $3}' file.txt
output
1.2.3
Explanation: I instructed AWK to treat any non-empty string consisting of spaces, " characters, or a combination thereof as the field separator. If you want to know more about FS and the other built-in variables, I suggest reading 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR.
(tested in gawk 4.2.1)
Using GNU awk, you might also use match() with a pattern containing a capture group and print the group 1 value using m[1]:
awk 'match($0, /config_version = "([^"]+)"/, m) {print m[1]}' file
If the value should be digits optionally followed by a dot and more digits:
awk 'match($0, /config_version = "([0-9]+(\.[0-9]+)*)"/, m) {print m[1]}' file
Output
1.2.3
There are 3 ways to do it. The clean way (\042 is the octal escape for the double quote "):
{mawk 1/2 | gawk} 'BEGIN { FS = "\042" } $1 ~ /config_version = ?$/ {print $2}'
I specify $1 ~ on the off chance that the phrase shows up AFTER the version number in misformatted data. Another, more extreme, version asks FS to do all the work:
{mawk 1/2 | gawk} 'BEGIN { FS = "(^[ \t]*config_version = ?\042|\042.*$)"
} NF==3 {print $2}'
Here I let FS gobble up the rest of the record, from left to right, so NF==3 enforces that exactly this scenario, and only this scenario, shows up. And finally, a purist approach:
{mawk 1/2 | gawk} 'BEGIN { FS = "(^[ \t]*config_version = ?\042|\042.*$)" ;
OFS = "" } NF == 3 { $1 = $1; print }'

awk with $var as search pattern [closed]

I am facing a problem where I want awk to search for a pattern that comes from a bash variable...
Now it looks like:
for i in ${CLASS_LIST[*]}; do
cat log.txt | awk -F ":" '$1 ~ /$i/ {print substr($0, index($0,$2))}'
done
But awk has no output here.
If I do it like:
cat log.txt | awk -F ":" '$1 ~ /LABEL/ {print substr($0, index($0,$2))}
it works well... what is wrong in the first example?
Thanks a lot for your help in advance.
regards,
Joerg
Thanks for the tip, Cyrus. I've already tried the "-v" option with the same result.
But I found my mistake...
when giving a variable as the search pattern, don't wrap it in "/"...
This is working for me:
for i in ${CLASS_LIST[*]}; do
cat log.txt | awk -v var="$i" -F ":" '$1 ~ var {print substr($0, index($0,$2))}'
done
Regards,
Joerg
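A slightly more robust variant of the same idea, as a sketch: quote the array expansion so entries containing spaces survive word splitting, and let awk read the file directly instead of piping it through cat:
for i in "${CLASS_LIST[@]}"; do
  awk -v var="$i" -F ":" '$1 ~ var {print substr($0, index($0, $2))}' log.txt
done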

Naming of variables in gawk when using gensub

I am doing string substitution in gawk. The code below is a simplified version (the real replacement argument to gensub involves lots of "\\1\\3\\2", which is why I can't use sub/gsub). My question is one of robustness: since I'm modifying the 1st field ($1) with gensub, can I store the output of gensub in the variable $1, or does this have potential to cause problems (in other contexts; it works fine in my code)?
# test data
$ printf "Header_1\tHeader_2\nHiC_scaffold_1_1234\t1234\nHiC_scaffold_2_7890\t7890\n" > input.txt
# code I'm using (works as expected)
$ gawk 'BEGIN {FS = "\t"} FNR == 1 {next} \
> {one = gensub(/HiC_scaffold_([0-9]+)_([0-9]+) ?/, "HIC_SCAFFOLD_\\2_\\1", "g", $1)} \
> {print $2 "\t" one}' \
> input.txt > output.txt1
# code I'm asking about (works as expected with these test data)
$ gawk 'BEGIN {FS = "\t"} FNR == 1 {next} \
> {$1 = gensub(/HiC_scaffold_([0-9]+)_[0-9]+ ?/, "HIC_SCAFFOLD_\\2_\\1", "g", $1)} \
> {print $2 "\t" $1}' \
> input.txt > output.txt2
$ head *txt*
==> input.txt <==
Header_1 Header_2
HiC_scaffold_1_1234 1234
HiC_scaffold_2_7890 7890
==> output.txt1 <==
1234 HIC_SCAFFOLD_1
7890 HIC_SCAFFOLD_2
==> output.txt2 <==
1234 HIC_SCAFFOLD_1
7890 HIC_SCAFFOLD_2
If I got you correctly, you are asking for a bit of a review of the second piece of code.
Can you assign a field? Yes, so $1 = gensub(something) is ok (ref).
Potential issues? Yes: if $n doesn't exist, assigning to it creates it, and thus modifies $0 as well. You are assigning to $1, and as far as I know, if a record ($0) exists it must have at least one field ($1), though that field might be empty.
Another caveat would be assigning to $0, but that feels a little out of scope here. Do not attempt $1 = $1 after your gensub(), since that rebuilds $0 with the fields joined by OFS.
Finally, let's have a look at gensub(). If you provide no target to it, it falls back to using $0. You are not doing that, since you pass $1 explicitly.
In the end, I cannot see a trivial situation where this can go wrong. Your code seems fine to me.
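To make the field-assignment side effect above concrete, a minimal illustration (the field number 5 is arbitrary, and this assumes gawk with the default FS and OFS): assigning to a field that doesn't exist yet grows NF and rebuilds $0 with OFS between every field.
$ echo 'a b' | gawk '{ $5 = "x"; print NF; print }'
5
a b   x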

Different results in awk when using different FS syntax

I have a sample file which contains the following.
logging.20160309.113.txt.log: 0 Rows successfully loaded.
logging.20160309.1180.txt.log: 0 Rows successfully loaded.
logging.20160309.1199.txt.log: 0 Rows successfully loaded.
I am currently familiar with two ways of specifying a field separator in awk. However, I am getting different results.
For the longest time I have used:
the "FS=" syntax when my FS is more than one character, and
the "-f" flag when my FS is just one character.
I would like to understand why the FS= syntax is giving me an unexpected result as seen below. Somehow the 1st record is being left behind.
$ head -3 reload_list | awk -F"\.log\:" '{ print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ head -3 reload_list | awk '{ FS="\.log\:" } { print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
The reason you are getting different results is that when you set FS inside the awk program, it is not in a BEGIN block. So by the time you've set it, the first record has already been split into fields (using the default separator).
Setting with -F
$ awk -F"\\.log:" '{ print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS after parsing first record
$ awk '{ FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS before parsing any records
$ awk 'BEGIN { FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
I noticed this relevant bit in the awk manual; if you've previously seen different behavior, or seen it with a different implementation, this could explain why:
According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time that it is read. In particular, this means that you can change the value of FS after a record is read, but before any of the fields are referenced. The value of the fields (i.e. how they were split) should reflect the old value of FS, not the new one.
However, many implementations of awk do not do this. Instead, they defer splitting the fields until a field reference actually happens, using the current value of FS! This behavior can be difficult to diagnose.
-f is for running a script from a file. -F and FS work the same:
$ awk -F'.log' '{print $1}' logs
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ awk 'BEGIN{FS=".log"} {print $1}' logs
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
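Since the question mentioned -f: that option takes a program file rather than a field separator. A quick sketch (prog.awk is just a hypothetical file name) equivalent to the BEGIN version above:
$ cat prog.awk
BEGIN { FS = ".log" }
{ print $1 }
$ awk -f prog.awk logs
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt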