understanding SUB used in an AWK command - awk

I need to understand how this command is working:
awk 'BEGIN{while(a++<30)s=s " "};{sub(/^.{6}/,"&" s)};l' myfile
I understand how the first part (the expression in the BEGIN{} section) creates a 30 character long string of spaces. But don't understand the second part (sub).
The sub adds the recently generated string "s" to the 6th column of 'myfile'. But the way I see the command, the search pattern /^.{6}/ should look for all lines that start with one character (.) and then {6} and replace those with space-added string!
Can you please help me to understand this better?

It has nothing to do with the 6th column, and it's not looking for a literal {6}.
The curly braces mean "this many of the preceding pattern" (if you invoke GNU awk with --posix or --re-interval).
So this pattern:
/^.{6}/
Is equivalent to this:
/^....../
What it's doing is adding the string s after the first 6 characters, which may be any characters.
The following awk command would do something similar:
awk 'BEGIN{while(a++<30)s=s " "} {print substr($0, 1, 6) s substr($0, 7)}' myfile

See #BillKarwin's answer for what it's doing, and see the 2nd awk script below for the more concise way to do it:
$ cat file
abcdefghi
$ awk 'BEGIN{while(a++<30)s=s " "} {sub(/^.{6}/,"&" s)} 1' file
abcdef ghi
$ awk '{printf "%-36s%s\n",substr($0,1,6),substr($0,7)}' file
abcdef ghi

Related

awk command works, but not in openwrt's awk

Works here: 'awk.js.org/`
but not in openwrt's awk, which returns the error message:
awk: bad regex '^(server=|address=)[': Missing ']'
Hello everyone!
I'm trying to use an awk command I wrote which is:
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
Which counts invalid lines in a dns blocklist (oisd in this case):
Input would be eg:
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
Output for this should be "2" since there are two lines that don't match the criteria (correctly formed address, comments, or blank lines).
I've tried formatting the command every which way with [], but can't find anything that works. Does anyone have an idea what format/syntax/option needs adjusting?
Thanks!
To portably include - in a bracket expression it has to be the first or last character, otherwise it means a range, and \s is shorthand for [[:space:]] in only some awks. This will work in any POSIX awk:
$ awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:].-]+([/]|[/]#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
Per #tripleee's comment below if your awk is broken such that a / inside a bracket expression isn't treated as literal then you may need this instead:
$ awk '!/^(server=|address=)\/[[:alnum:]][[:alnum:].-]+(\/|\/#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
but get a new awk, e.g. GNU awk, as who knows what other surprises the one you're using may have in store for you!
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
- has special meaning inside [ and ], it is used to denote range e.g. [A-Z] means uppercase ASCII letter, use \ escape sequence to make it literal dash, let file.txt content be
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
then
awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:]\-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}' file.txt
gives output
2
You might also consider replacing \s using [[:space:]] in order to main consistency.
(tested in GNU Awk 5.0.1)

how to use "," as field delimiter [duplicate]

This question already has answers here:
Escaping separator within double quotes, in awk
(3 answers)
Closed 1 year ago.
i have a file like this:
"1","ab,c","def"
so only use comma a field delimiter will get wrong result, so i want to use "," as field delimiter, i tried like this:
awk -F "," '{print $0}' file
or like this:
awk -F "","" '{print $0}' file
or like this:
awk -F '","' '{print $0}' file
but the result is incorrect, don't know how to include "" as part of the field delimiter itself,
If you can handle GNU awk, you could use FPAT:
$ echo '"1","ab,c","def"' | # echo outputs with double quotes
gawk ' # use GNU awk
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")" # because FPAT
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove leading and trailing double quotes
print $2 # output for example the second field
}'
Output:
ab,c
FPAT cannot handle RS inside the quotes.
What you are attempting seems misdirected anyway. How about this instead?
awk '/^".*"$/{ sub(/^\"/, ""); sub(/\"$/, ""); gsub(/\",\", ",") }1'
The proper solution to handling CSV files with quoting in them is to use a language which has an actual CSV parser. My thoughts go to Python, which includes a csv module in its standard library.
In GNU AWK
{print $0}
does print whole line, if no change were made original line is printed, no matter what field separator you set you will get original lines if only action is print $0. Use $1=$1 to trigger string rebuild.
If you must do it via FS AT ANY PRICE, then you might do it as follows: let file.txt content be
"1","ab,c","def"
then
BEGIN{FS="\x22,?\x22?"}{$1=$1;print $0}
output
1 ab,c def
Note leading space (ab,c is $3). Explanation: I inform GNU AWK that field separator is literal " (\x22, " is 22(hex) in ASCII) followed by zero or one (?) , followed by zero or one (?) literal " (\x22). $1=$1 trigger line rebuilt as mentioned earlier. Disclaimer: this solution assume that you never have escaped " inside your string,
(tested in gawk 4.2.1)

Awk: gsub("\\\\", "\\\\") yields suprising results

Consider the following input:
$ cat a
d:\
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ cat a_double.awk
{ sub("\\\\", "\\\\"); print $0 }
Now running cat a | awk -f a.awk gives
d:\
and running cat a | awk -f a_double.awk gives
d:\\
and I expect exactly the other way around. How should I interpret this?
$ awk -V
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
Yes, its expected behavior of awk. When you run sub("\\", "\\\\") in your first script, in sub's inside "(double quotes) since we are NOT using / for matching pattern we need to escape first \(actual literal character) then for escaping we are using \ so we need to escape that also, hence it will become \\\\
\\ \\
| |
| |
first 2 chars are denoting escaping next 2 chars are denoting actual literal character \
Which is NOT happening your 1st case hence NO match so no substitution in it, in your 2nd awk script you are doing this(escaping part in regex matching section of sub) hence its matching \ perfectly.
Let's see this by example and try putting ... for checking purposes.
When Nothing happens: Since no match on
awk '{sub("\\", "....\\\\"); print $0}' Input_file
d:\
When pattern matching happens:
awk '{sub("\\\\", "...\\\\"); print $0}' Input_file
d:...\\
From man awk:
gsub(r, s [, t])
For each substring matching the regular expression r in the string t,
substitute the string s, and return the number of substitutions.
How could we could do perform actual escaping part(where we need to use only \ before character only once)? Do mention your regexp in /../ in first section of sub like and we need NOT to double escape \ here.
awk '{sub(/\\/,"&\\")} 1' Input_file
The first arg to *sub() is a regexp, not a string, so you should use regexp (/.../) rather than string ("...") delimiters. The former is a literal regexp which is used as-is while the latter defines a dynamic (or computed) regexp which forces awk to interpret the string twice, the first time to convert the string to a regexp and the second to use it as a regexp, hence double the backslashes needed for escaping. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps.
In the following we just need to escape the backslash once because we're using a literal, rather than dynamic, regexp:
$ cat a
d:\
$ awk '{sub(/\\/,"\\\\")}1' a
d:\\
Your first script would rightfully produce a syntax error in a more recent version of gawk (5.1.0) since "\\" in a dynamic regexp is equivalent to /\/ in a literal one and in that expression the backslash is escaping the final forward slash which means there is no final delimiter:
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ awk -f a.awk a
awk: a.awk:1: (FILENAME=a FNR=1) fatal: invalid regexp: Trailing backslash: /\/

Enclosing a single quote in Awk

I currently have this line of code, that needs to be increased by one every-time in run this script. I would like to use awk in increasing the third string (570).
'set t 570'
I currently have this to change the code, however I am missing the closing quotation mark. I would also desire that this only acts on this specific (above) line, however am unsure about where to place the syntax that awk uses to do that.
awk '/set t /{$3+=1} 1' file.gs >file.tmp && mv file.tmp file.gs
Thank you very much for your input.
Use sub() to perform a replacement on the string itself:
$ awk '/set t/ {sub($3+0,$3+1,$3)} 1' file
'set t 571'
This looks for the value in $3 and replaces it with itself +1. To avoid replacing all of $3 and making sure the quote persists in the string, we say $3+0 so that it evaluates to just the number, not the quote:
$ echo "'set t 570'" | awk '{print $3}'
570'
$ echo "'set t 570'" | awk '{print $3+0}'
570
Note this would fail if the value in $3 happens more times in the same line, since it will replace all of them.

Unable to match regex in string using awk

I am trying to fetch the lines in which the second part of the line contains a pattern from the first part of the line.
$ cat file.txt
String1 is a big string|big
$ awk -F'|' ' { if ($2 ~ /$1/) { print $0 } } ' file.txt
But it is not working.
I am not able to find out what is the mistake here.
Can someone please help?
Two things: No slashes, and your numbers are backwards.
awk -F\| '$1~$2' file.txt
I guess what you meant is part of the string in the first part should be a part of the 2nd part.if this is what you want! then,
awk -F'|' '{n=split($1,a,' ');for(i=1,i<=n;i++){if($2~/a[i]/)print $0}}' your_file
There are surprisingly many things wrong with your command line:
1) You aren't using the awk condition/action syntax but instead needlessly embedding a condition within an action,
2) You aren't using the default awk action but instead needlessly hand-coding a print $0.
3) You have your RE operands reversed.
4) You are using RE comparison but it looks like you really want to match strings.
You can fix the first 3 of the above by modifying your command to:
awk -F'|' '$1~$2' file.txt
but I think what you really want is "4" which would mean you need to do this instead:
awk -F'|' 'index($1,$2)' file.txt