What is the function of a comma between arguments in awk?

For this HackerRank bash challenge (round to 3 decimal places), the following solution works well:
$ echo '5+50*3/20 + (19*2)/7' | bc -l | awk '{ printf ("%.3f \n",$1) }'
17.929
whereas the same without a comma between printf's format string and the $1 produces the following error on a bash prompt:
$ echo '5+50*3/20 + (19*2)/7' | bc -l | awk '{ printf ("%.3f \n" $1) }'
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: not enough arguments to satisfy format string
`%.3f
17.92857142857142857142'
^ ran out for this one
The error message suggests that the $1 without comma is not supplied as an argument to printf, but its elision has hitherto not caused me issues (awk '{ print $0 " with appendix." }' happily prints the appended text). Understandably, searching the manual for values separated by commas is not helpful. What is the function of the comma in separating arguments in awk (aside from inserting a space between strings)? Additionally: what are the round brackets doing in the example? For what it's worth, HackerRank gives the following error:
bc -l | awk '{ printf ("%.3f \n" $1) }'
Your Output (stdout)
0.000
17.92857142857142857142

First of all, you don't even need awk to limit a number to 3 decimal places; bc itself can do that with scale. Note that bc truncates rather than rounds, which is why this prints 17.928 rather than 17.929:
bc -l <<< 'scale=3; 5+50*3/20 + (19*2)/7'
17.928
Now, about printf. Its syntax is:
printf format, item1, item2, …
But when you use it like this:
printf ("%.3f \n" $1)
you don't supply enough arguments to satisfy the %.3f in the format string (because "%.3f \n" and $1 are concatenated into a single string), hence this error:
not enough arguments to satisfy format string
Putting parentheses around the expression doesn't make the error go away. The (...) is optional in printf, so you can write either of these 2 statements:
printf "%.3f \n", $1
printf ("%.3f \n", $1)

awk does not have an explicit string concatenation operator. Two strings are concatenated by simply placing them side by side:
print "foo" "bar" # => prints "foobar"
When you omit the comma, you have essentially this:
fmt = "%.3f \n" $1 # the string => "%.3f \n17.92857142857142857142"
printf (fmt)
and there's a %.3f directive but no value given for it.

The error message suggests that the $1 without comma is not supplied as an argument to printf, but its elision has hitherto not caused me issues (awk '{ print $0 " with appendix." }' happily prints the appended text)
Yes. Both effects arise from the fact that awk concatenates adjacent strings without any explicit operator. And not only literals. See section 6.2.2 of the manual for details and examples. In the case of your print statement, that produces an effect that serves your purpose, but in the case of your printf call, it means that you are passing only one, concatenated argument to printf, which it interprets as a format string.
When you put a comma between the strings, whether in a print statement or in a printf call, you get a list of two items instead of a single, concatenated string.
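
As a minimal sketch of the difference (the shell variable names concat, listed and rounded are only for illustration), the same strings can be passed with and without a comma:

```shell
# Adjacent expressions are concatenated into ONE string.
concat=$(awk 'BEGIN { print "foo" "bar" }')

# A comma makes TWO items; print joins them with OFS (a space by default).
listed=$(awk 'BEGIN { print "foo", "bar" }')

# For printf, the comma separates the format string from its argument.
rounded=$(echo 17.92857142857 | awk '{ printf "%.3f\n", $1 }')

printf '%s\n' "$concat" "$listed" "$rounded"
```

This should print foobar, foo bar and 17.929 on separate lines.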

Related

Awk: gsub("\\\\", "\\\\") yields surprising results

Consider the following input:
$ cat a
d:\
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ cat a_double.awk
{ sub("\\\\", "\\\\"); print $0 }
Now running cat a | awk -f a.awk gives
d:\
and running cat a | awk -f a_double.awk gives
d:\\
and I expect exactly the other way around. How should I interpret this?
$ awk -V
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)

Yes, this is expected awk behavior. In your first script, sub("\\", "\\\\"), the pattern is given in double quotes rather than between /.../, so awk first processes it as a string and only then uses the result as a regexp. To match a literal \ you therefore have to escape it twice, once for the string and once for the regexp, which gives \\\\:
\\\\
first 2 chars (\\): the escaping
next 2 chars (\\): the actual literal character \
That does not happen in your first script, so nothing matches and nothing is substituted. Your second script does escape the backslash in the regexp argument of sub, so it matches \ as intended.
To see this, put ... in the replacement as a marker.
When nothing happens, because there is no match:
awk '{sub("\\", "....\\\\"); print $0}' Input_file
d:\
When pattern matching happens:
awk '{sub("\\\\", "...\\\\"); print $0}' Input_file
d:...\\
From man awk:
gsub(r, s [, t])
For each substring matching the regular expression r in the string t,
substitute the string s, and return the number of substitutions.
How can we escape the backslash only once? Write the regexp between /.../ as the first argument of sub; there is no need to double-escape \ there:
awk '{sub(/\\/,"&\\")} 1' Input_file

The first arg to *sub() is a regexp, not a string, so you should use regexp (/.../) rather than string ("...") delimiters. The former is a literal regexp which is used as-is while the latter defines a dynamic (or computed) regexp which forces awk to interpret the string twice, the first time to convert the string to a regexp and the second to use it as a regexp, hence double the backslashes needed for escaping. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps.
In the following we just need to escape the backslash once because we're using a literal, rather than dynamic, regexp:
$ cat a
d:\
$ awk '{sub(/\\/,"\\\\")}1' a
d:\\
Your first script would rightfully produce a syntax error in a more recent version of gawk (5.1.0) since "\\" in a dynamic regexp is equivalent to /\/ in a literal one and in that expression the backslash is escaping the final forward slash which means there is no final delimiter:
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ awk -f a.awk a
awk: a.awk:1: (FILENAME=a FNR=1) fatal: invalid regexp: Trailing backslash: /\/
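
A small sketch of the two forms side by side. The sample path d:\dir and the replacement "/" are assumptions chosen for illustration; a plain "/" is used as the replacement because the handling of backslashes in the replacement string is a separate matter:

```shell
path='d:\dir'

# Literal regexp: one level of escaping, /\\/ matches a single backslash.
literal=$(printf '%s\n' "$path" | awk '{ sub(/\\/, "/"); print }')

# Dynamic regexp: the string "\\\\" is reduced to \\ by string processing,
# and \\ as a regexp then matches a single backslash.
dynamic=$(printf '%s\n' "$path" | awk '{ sub("\\\\", "/"); print }')

printf '%s\n' "$literal" "$dynamic"
```

Both commands should print d:/dir.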

Understanding awk delimiter - escaping in a regex-based field separator

I have the following shell command:
awk -F'\[|\]' '{print $2}'
What is this command doing? Is it splitting into fields using [sometext] as the delimiter?
E.g.:
$ echo "this [line] passed to awk" | awk -F'\[|\]' '{print $2}'
line
Editor's note: Only Mawk, as used on Ubuntu by default, produces the output above.

The apparent intent is to treat literal [ and ] as field-separator characters, i.e., to split each input record into fields by each occurrence of [ and/or ], which, with the sample line, yields this  as field 1 ($1), line as field 2 ($2), and  passed to awk as the last field ($3).
This is achieved by a regex (regular expression) that uses alternation (|), either side of which defines a field separator (delimiter): \[ and \] in a regex are needed to represent literal [ and ], because, by default, [ and ] are so-called metacharacters (characters with special syntactical meaning).
Note that awk always interprets the value of the FS variable (-F option) as a regex.
However, the correct form is '\\[|\\]':
$ echo "this [line] passed to awk" | awk -F'\\[|\\]' '{print $2}'
line
That said, a more concise version that uses a character set ([...]) rather than alternation (|) is:
$ echo "this [line] passed to awk" | awk -F'[][]' '{print $2}'
line
Note the careful placement of ] before [ inside the enclosing [...] to make this work, and how the enclosing [...] now have special meaning: they enclose a set of characters, any of which matches.
As for why 2 \ instances are needed in '\\[|\\]':
Taken as a regex in isolation, \[|\] would work:
\[ matches literal [
\] matches literal ]
| is an alternation that matches one or the other.
However, Awk's string processing comes first:
It should, due to \ handling in a string, reduce \[|\] to [|] before interpretation as a regex.
Unfortunately, however, Mawk, the default Awk on Ubuntu, for instance, resorts to guesswork in this particular scenario.[1]
[|], interpreted as a regex, would then only match a single, literal |
Thus, the robust and portable way is to use \\ in a string literal when you mean to pass a single \ as part of a regex.
This quote from the relevant section of the GNU Awk manual sums it up well:
To get a backslash into a regular expression inside a string, you have to type two backslashes.
[1] Implementation differences:
Unfortunately, at least 1 major Awk implementation resorts to guesswork in the presence of a single \ before a regex metacharacter inside a string literal.
BSD/macOS Awk and GNU Awk act predictably and GNU Awk also issues a helpful warning when a singly \-prefixed regex metacharacter is found:
# GNU Awk: Predictable string-first processing + a helpful warning.
echo 'a[b]|c' | gawk -F'\[|\]' '{print $2}'
gawk: warning: escape sequence '\[' treated as plain '['
gawk: warning: escape sequence '\]' treated as plain ']'
c
# BSD/macOS Awk: Predictable string-first processing, no warning.
echo 'a[b]|c' | awk -F'\[|\]' '{print $2}'
c
# Mawk: *Guesses* that a *regex* was intended.
# The unambiguous form -F'\\[|\\]' works too, fortunately.
echo 'a[b]|c' | mawk -F'\[|\]' '{print $2}'
b
Optional reading: regex literals inside Awk scripts
Awk supports regex literals enclosed in /.../, the use of which bypasses the double-escaping problem.
However:
These literals (which are invariably constant) are only available inside an Awk script,
and, it seems, you can only use them as patterns or function arguments - you cannot store them in a variable.
Therefore, even though /\[|\]/ is in principle equivalent to "\\[|\\]", you can not use the following, because the regex literal cannot be assigned to (special) variable FS:
# !! DOES NOT WORK in any of the 3 major Awk implementations.
# Note that nothing is output, and no error/warning is displayed.
$ echo 'a[b]|c' | awk 'BEGIN { FS=/\[|\]/ } { print $2 }'
# Using a double-escaped *string* to house the regex again works as expected:
$ echo 'a[b]|c' | awk 'BEGIN { FS="\\[|\\]" } { print $2 }'
b
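
The same string-then-regex rule applies wherever a string is used as a regex, not only to FS. As a sketch, here is split() with both spellings of the separator (the variable names with_alt and with_set are made up for this example):

```shell
# Alternation, with the backslashes doubled inside the string literal.
with_alt=$(awk 'BEGIN { n = split("a[b]c", parts, "\\[|\\]"); print n, parts[2] }')

# Bracket expression: ] placed first, so no escaping is needed at all.
with_set=$(awk 'BEGIN { n = split("a[b]c", parts, "[][]"); print n, parts[2] }')

printf '%s\n' "$with_alt" "$with_set"
```

Both should print 3 b: three pieces, the second of which is b.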

How to use variable including special symbol in awk?

In my case, if a certain pattern is found as the second field of a line in a file, I need to print the first two fields. It should also handle special symbols such as a backslash.
My solution is to first use sed to replace \ with \\ and pass the result to awk as a variable; awk then parses \\ as \ and matches it against field 2.
escaped_str=$( echo "$pattern" | sed 's/\\/\\\\/g')
input | awk -v awk_escaped_str="$escaped_str" '$2==awk_escaped_str { $0=$1 " " $2 " "}; { print } '
But this seems too complicated, and it cannot handle every case.
Is there a simpler way that also covers all other special symbols?

The way to pass a shell variable to awk without backslashes being interpreted is to pass it in the arg list instead of populating an awk variable outside of the script:
$ shellvar='a\tb'
$ awk -v awkvar="$shellvar" 'BEGIN{ printf "<%s>\n",awkvar }'
<a b>
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]=""; printf "<%s>\n",awkvar }' "$shellvar"
<a\tb>
and then you can search a file for it as a string using index() or ==:
$ cat file
a b
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } index($0,awkvar)' "$shellvar" file
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } $0 == awkvar' "$shellvar" file
a\tb
You need to set ARGV[1]="" after populating the awk variable to avoid the shell variable value also being treated as a file name. Unlike any other way of passing in a variable, ALL characters used in a variable this way are treated literally with no "special" meaning.
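
A sketch that makes the difference measurable and then applies it to the field-2 comparison from the question. The sample input lines are invented for illustration:

```shell
pattern='a\tb'

# -v runs escape processing: \t becomes a tab, leaving 3 characters.
len_v=$(awk -v p="$pattern" 'BEGIN { print length(p) }')

# ARGV keeps the value verbatim: a, backslash, t, b = 4 characters.
len_argv=$(awk 'BEGIN { p = ARGV[1]; ARGV[1] = ""; print length(p) }' "$pattern")

# Literal comparison against field 2; only the first sample line matches.
matched=$(printf '%s\n' 'x a\tb y' 'x other y' |
  awk 'BEGIN { p = ARGV[1]; ARGV[1] = "" } $2 == p { print $1, $2 }' "$pattern")

printf '%s\n' "$len_v" "$len_argv" "$matched"
```

This should print 3, then 4, then x a\tb.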

There are three variations you can try without needing to escape your pattern:
This one tests literal strings. No regex instance is interpreted:
$2 == expr
This one tests whether a literal string is a substring:
index($2, expr)
This one tests regex pattern:
$2 ~ pattern

In awk, how can I use a file containing multiple format strings with printf?

I have a case where I want to use input from a file as the format for printf() in awk. My formatting works when I set it in a string within the code, but it doesn't work when I load it from input.
Here's a tiny example of the problem:
$ # putting the format in a variable works just fine:
$ echo "" | awk -vs="hello:\t%s\n\tfoo" '{printf(s "bar\n", "world");}'
hello: world
foobar
$ # But getting the format from an input file does not.
$ echo "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
hello:\tworld\n\tfoobar
$
So ... format substitutions work ("%s"), but not special characters like tab and newline. Any idea why this is happening? And is there a way to "do something" to input data to make it usable as a format string?
UPDATE #1:
As a further example, consider the following using a bash here-string:
[me@here ~]$ awk -vs="hello: %s\nworld: %s\n" '{printf(s, "foo", "bar");}' <<<""
hello: foo
world: bar
[me@here ~]$ awk '{s=$0; printf(s, "foo", "bar");}' <<<"hello: %s\nworld: %s\n"
hello: foo\nworld: bar\n[me@here ~]$
As far as I can see, the same thing happens with multiple different awk interpreters, and I haven't been able to locate any documentation that explains why.
UPDATE #2:
The code I'm trying to replace currently looks something like this, with nested loops in shell. At present, awk is only being used for its printf, and could be replaced with a shell-based printf:
#!/bin/sh
while read -r fmtid fmt; do
    while read cid name addy; do
        awk -vfmt="$fmt" -vcid="$cid" -vname="$name" -vaddy="$addy" \
            'BEGIN{printf(fmt,cid,name,addy)}' > /path/$fmtid/$cid
    done < /path/to/sampledata
done < /path/to/fmtstrings
Example input would be:
## fmtstrings:
1 ID:%04d Name:%s\nAddress: %s\n\n
2 CustomerID:\t%-4d\t\tName: %s\n\t\t\t\tAddress: %s\n
3 Customer: %d / %s (%s)\n
## sampledata:
5 Companyname 123 Somewhere Street
12 Othercompany 234 Elsewhere
My hope was that I'd be able to construct something like this to do the entire thing with a single call to awk, instead of having nested loops in shell:
awk '
    NR==FNR { fmts[$1]=$2; next; }
    {
        for(fmtid in fmts) {
            outputfile=sprintf("/path/%d/%d", fmtid, custid);
            printf(fmts[fmtid], $1, $2) > outputfile;
        }
    }
' /path/to/fmtstrings /path/to/sampledata
Obviously, this doesn't work, both because of the actual topic of this question and because I haven't yet figured out how to elegantly make awk join $2..$n into a single variable. (But that's the topic of a possible future question.)
FWIW, I'm using FreeBSD 9.2 with its built-in awk, but I'm open to using gawk if a solution can be found with that.

Why so lengthy and complicated an example? This demonstrates the problem:
$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a b
$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb
In the first case, the string "a\t%s" is a string literal and so is interpreted twice - once when the script is read by awk and then again when it is executed, so the \t is expanded on the first pass and then at execution awk has a literal tab char in the formatting string.
In the second case awk still has the characters backslash and t in the formatting string - hence the different behavior.
You need something to interpret those escaped chars and one way to do that is to call the shell's printf and read the results (corrected per @EtanReiser's excellent observation that I was using double quotes where I should have had single quotes, implemented here by \047, to avoid shell expansion):
$ echo 'a\t%s' | awk '{"printf \047" $0 "\047 " "b" | getline s; print s}'
a b
If you don't need the result in a variable, you can just call system().
If you just wanted the escape chars expanded so you don't need to provide the %s args in the shell printf call, you'd just need to escape all the %s (watching out for already-escaped %s).
You could call awk instead of the shell printf if you prefer.
Note that this approach, while clumsy, is much safer than calling an eval which might just execute an input line like rm -rf /*.*!
With help from Arnold Robbins (the maintainer of gawk) and Manuel Collado (another noted awk expert), here is a script which will expand single-character escape sequences:
$ cat tst2.awk
function expandEscapes(old,    segs, segNr, escs, idx, new) {
    split(old,segs,/\\./,escs)
    for (segNr=1; segNr in segs; segNr++) {
        if ( idx = index( "abfnrtv", substr(escs[segNr],2,1) ) )
            escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] escs[segNr]
    }
    return new
}
{
    s = expandEscapes($0)
    printf s, "foo", "bar"
}
$ awk -f tst2.awk <<<"hello: %s\nworld: %s\n"
hello: foo
world: bar
Alternatively, this should be functionally equivalent but not gawk-specific:
function expandEscapes(tail,    head, esc, idx) {
    head = ""
    while ( match(tail, /\\./) ) {
        esc = substr( tail, RSTART + 1, 1 )
        head = head substr( tail, 1, RSTART-1 )
        tail = substr( tail, RSTART + 2 )
        idx = index( "abfnrtv", esc )
        if ( idx )
            esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
        head = head esc
    }
    return (head tail)
}
If you care to, you can expand the concept to octal and hex escape sequences by changing the split() RE to
/\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/
and for a hex value after the \\:
c = sprintf("%c", strtonum("0x" rest_of_str))
and for an octal value:
c = sprintf("%c", strtonum("0" rest_of_str))

Since the question explicitly asks for an awk solution, here's one which works on all the awks I know of. It's a proof-of-concept; error handling is abysmal. I've tried to indicate places where that could be improved.
The key, as has been noted by various commentators, is that awk's printf -- like the C standard function it is based on -- does not interpret backslash-escapes in the format string. However, awk does interpret them in command-line assignment arguments.
awk 'BEGIN {if(ARGC!=3)exit(1);
fn=ARGV[2];ARGC=2}
NR==FNR{ARGV[ARGC++]="fmt="substr($0,length($1)+2);
ARGV[ARGC++]="fmtid="$1;
ARGV[ARGC++]=fn;
next}
{match($0,/^ *[^ ]+[ ]+[^ ]+[ ]+/);
printf fmt,$1,$2,substr($0,RLENGTH+1) > ("data/"fmtid"/"$1)
}' fmtfile sampledata
What's going on here is that the 'FNR==NR' clause (which executes only on the first file) adds the values (fmtid, fmt) from each line of the first file as command-line assignments, and then inserts the data file name as a command-line argument. In awk, assignments as command line arguments are simply executed as though they were assignments from a string constant with implicit quotes, including backslash-escape processing (except that if the last character in the argument is a backslash, it doesn't escape the implicit closing double-quote). This behaviour is mandated by Posix, as is the order in which arguments are processed which makes it possible to add arguments as you go.
As written, the script must be provided with exactly two arguments: the formats and the data (in that order). There is some room for improvement, obviously.
The snippet also shows two ways of concatenating trailing fields.
In the format file, I assume that the lines are well behaved (no leading spaces; exactly one space after the format id). With those constraints, substr($0, length($1)+2) is precisely the part of the line after the first field and a single space.
Processing the datafile, it may be necessary to do this with fewer constraints. First, the builtin match function is called with the regular expression /^ *[^ ]+[ ]+[^ ]+[ ]+/ which matches leading spaces (if any) and two space-separated fields, along with the following spaces. (It would be better to allow tabs, as well.) Once the regex matches (and matching shouldn't be assumed, so there's another thing to fix), the variables RSTART and RLENGTH are set, so substr($0, RLENGTH+1) picks up everything starting with the third field. (Again, this is all Posix-standard behaviour.)
Honestly, I'd use the shell printf for this problem, and I don't understand why you feel that solution is somehow sub-optimal. The shell printf interprets backslash escapes in formats, and the shell read -r will do the line splitting the way you want. So there's no reason for awk at all, as far as I can see.
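
To see the escape processing of command-line assignments in isolation, here is a minimal sketch (the variable s and the sample text a\tb are only for illustration). It compares the length of the same text when it arrives as an assignment operand and when it arrives as input data:

```shell
# s is assigned as an operand, so \t is processed like in a string constant:
# a, TAB, b = 3 characters. The trailing "-" names standard input.
len_operand=$(echo | awk '{ print length(s) }' s='a\tb' -)

# The same text read as input data is left untouched:
# a, backslash, t, b = 4 characters.
len_input=$(printf '%s\n' 'a\tb' | awk '{ print length($0) }')

printf '%s\n' "$len_operand" "$len_input"
```

This should print 3 and then 4.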

Ed Morton shows the problem clearly (edit: and it's now complete, so just go accept it): awk's string literal processing handled the escapes, and file I/O code isn't a lexical analyzer.
It's an easy fix: decide what escapes you want to support, and support them. Here's a one-liner form if you're doing special-purpose work that doesn't need to handle escaped backslashes
awk '{ gsub(/\\n/,"\n"); gsub(/\\t/,"\t"); printf($0 "bar\n", "world"); }' <<\EOD
hello:\t%s\n\tfoo
EOD
but for doit-and-forgetit peace of mind just use the full form in the linked answer.

@Ed Morton's answer explains the problem well.
A simple workaround is to:
pass the format-string file contents via an awk variable, using command substitution,
assuming that file is not too large to be read into memory in full.
Using GNU awk or mawk:
awk -v formats="$(tr '\n' '\3' <fmtStrings)" '
# Initialize: Split the formats into array elements.
BEGIN {n=split(formats, aFormats, "\3")}
# For each data line, loop over all formats and print.
{ for(i=1;i<n;++i) {printf aFormats[i] "\n", $1, $2, $3} }
' sampleData
Note:
The advantage of this solution is that it works generically - you don't need to anticipate specific escape sequences and handle them specially.
On FreeBSD awk, this almost works, but - sadly - split() still splits by newlines, despite being given an explicit separator - this smells like a bug. Observed on versions 20070501 (OS X 10.9.4) and 20121220 (FreeBSD 10.0).
The above solves the core problem (for brevity, it omits stripping the ID from the front of the format strings and omits the output-file creation logic).
Explanation:
tr '\n' '\3' <fmtStrings replaces actual newlines in the format-strings file with \3 (0x3) characters, so as to be able to later distinguish them from the \n escape sequences embedded in the lines, which awk turns into actual newlines when assigning to variable formats (as desired).
\3 (0x3) - the ASCII end-of-text char. - was arbitrarily chosen as an auxiliary separator that is assumed not to be present in the input file.
Note that using \0 (NUL) is NOT an option, because awk interprets that as an empty string, causing split() to split the string into individual characters.
Inside the BEGIN block of the awk script, split(formats, aFormats, "\3") then splits the combined format strings back into individual format strings.

I had to create another answer to start clean, I believe I've come to a good solution, again with perl:
echo '%10s\t:\t%10s\r\n' | perl -lne 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg; printf "$_","hi","hello"'
hi : hello
That bad boy s/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg will translate any meta character I can think of, let us take a look with cat -A :
echo '%10s\t:\t%10s\r\n' | perl -lne 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg; printf "$_","hi","hello"' | cat -A
hi^I:^I hello^M$
PS. I didn't create that regex, I googled unquote meta and found here

What you are trying to do is called templating. I would suggest that shell tools are not the best tools for this job. A safe way to go would be to use a templating library such as Template Toolkit for Perl, or Jinja2 for Python.

The problem lies in the non-interpretation of the special characters \t and \n by echo: it makes sure that they are understood as as-is strings, and not as tabulations and newlines. This behavior can be controlled by the -e flag you give to echo, without changing your awk script at all:
echo -e "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
tada!! :)
EDIT:
Ok, so after the point rightfully raised by Chrono, we can devise this other answer corresponding to the original request to have the pattern read from a file:
echo "hello:\t%s\n\tfoo" > myfile
awk 'BEGIN {s="'$(cat myfile)'" ; printf(s "bar\n", "world")}'
Of course in the above we have to be careful with the quoting, as the $(cat myfile) is not seen by awk but interpreted by the shell.

This looks extremely ugly, but it works for this particular problem:
s=$0;
gsub(/'/, "'\\''", s);
gsub(/\\n/, "\\\\\\\\n", s);
"printf '%b' '" s "'" | getline s;
gsub(/\\\\n/, "\n", s);
gsub(/\\n/, "\n", s);
printf(s " bar\n", "world");
Replace all single quotes with shell-escaped single quotes ('\'').
Replace all escaped newline sequences that appear normally as \n with the sequence that appears as \\\\n. It would suffice to use \\\\n as the actual replacement string (meaning \\n would print if you printed it), but the version of gawk I have messes things up in POSIX mode.
Invoke the shell to execute printf '%b' 'escape'\''d format' and use awk's getline statement to retrieve the line.
Unescape \\n to yield a newline. This step wouldn't be necessary if gawk in POSIX mode played nicely.
Unescape \n to yield a newline.
Otherwise you're left to call the gsub function for each possible escape sequence, which is terrible for \001, \002, etc.

Graham,
Ed Morton's solution is the best (and perhaps only) one available.
I'm including this answer for a better explanation of WHY you're seeing what you're seeing.
A string is a string. The confusing part here is WHERE awk does the translation of \t to a tab, \n to a newline, etc. It appears NOT to be the case that the backslash and t get translated when used in a printf format. Instead, the translation happens at assignment, so that awk stores the tab as part of the format rather than translating when it runs the printf.
And this is why Ed's function works. When read from stdin or a file, no assignment is performed that will implement the translation of special characters. Once you run the command s="a\tb"; in awk, you have a three character string containing no backslash or t.
Evidence:
$ echo "a\tb\n" | awk '{ s=$0; for (i=1;i<=length(s);i++) {printf("%d\t%c\n",i,substr(s,i,1));} }'
1 a
2 \
3 t
4 b
5 \
6 n
vs
$ awk 'BEGIN{s="a\tb\n"; for (i=1;i<=length(s);i++) {printf("%d\t%c\n",i,substr(s,i,1));} }'
1 a
2
3 b
4
And there you go.
As I say, Ed's answer provides an excellent function for what you need. But if you can predict what your input will look like, you can probably get away with a simpler solution. Knowing how this stuff gets parsed, if you have a limited set of characters you need to translate, you may be able to survive with something simple like:
s=$0;
gsub(/\\t/,"\t",s);
gsub(/\\n/,"\n",s);
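
Put together as a runnable sketch, with a made-up format line as input. It only handles \t and \n, and it would mishandle an escaped backslash such as \\t:

```shell
fmt='hello:\t%s\n'

# Turn the two-character sequences \t and \n into a real tab and newline,
# then use the result as the printf format.
out=$(printf '%s\n' "$fmt" |
  awk '{ s = $0; gsub(/\\t/, "\t", s); gsub(/\\n/, "\n", s); printf s, "world" }')

printf '%s\n' "$out"
```

This should print hello: and world separated by a real tab.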

That's a cool question, I don't know the answer in awk, but in perl you can use eval :
echo '%10s\t:\t%-10s\n' | perl -ne ' chomp; eval "printf (\"$_\", \"hi\", \"hello\")"'
hi : hello
PS. Be aware of the danger of code injection when you use eval in any language. And it is not just eval: no system call should be made blindly on untrusted input.
Example in Awk:
echo '$(whoami)' | awk '{"printf \"" $0 "\" " "b" | getline s; print s}'
tiago
What if the input was $(rm -rf /)? You can guess what would happen :)
ikegami adds:
Why would you even think of using eval to convert \n to newlines and \t to tabs?
echo '%10s\t:\t%-10s\n' | perl -e'
my %repl = (
n => "\n",
t => "\t",
);
while (<>) {
chomp;
s{\\(?:(\w)|(\W))}{
if (defined($2)) {
$2
}
elsif (exists($repl{$1})) {
$repl{$1}
}
else {
warn("Unrecognized escape \\$1.\n");
$1
}
}eg;
printf($_, "hi", "hello");
}
'
Short version:
echo '%10s\t:\t%-10s\n' | perl -nle'
s/\\(?:(n)|(t)|(.))/$1?"\n":$2?"\t":$3/seg;
printf($_, "hi", "hello");
'

How can I use the symbols [ and ] as field separators for gawk?

I have some text like
CreateMainPageLink("410",$objUserData,$mnt[139]);
from which I want to extract the number 139 after the occurrence of mnt with gawk. I tried the following expression (within a pipe expression to be used on a result of a grep)
gawk '{FS="[\[\]]";print NF}'
to print the number of fields. If my field separators were [ and ] I expect to see the number 3 printed out (three fields; one before the opening rectangular bracket, one after, and the actual number I want to extract). What I get instead is one field, corresponding to the full line, and two warnings:
gawk: warning: escape sequence `\[' treated as plain `['
gawk: warning: escape sequence `\]' treated as plain `]'
I was following the example given here, but obviously there is some problem/error with my expression.
The following two expressions do not work either:
gawk '{FS="[]"}{print NF;}'
gawk: (FILENAME=- FNR=1) fatal: Unmatched [ or [^: /[]/
and
gawk '{FS="\[\]"}{print NF;}'
gawk: warning: escape sequence `\[' treated as plain `['
gawk: warning: escape sequence `\]' treated as plain `]'
gawk: (FILENAME=- FNR=1) fatal: Unmatched [ or [^: /[]/

gawk -F[][] '{ print $0" -> "$1"\t"$2; }'
$ gawk -F[][] '{ print $0" -> "$1"\t"$2; }'
titi[toto]tutu
titi[toto]tutu -> titi toto
1) You must set the FS before entering the main parsing loop. You could do:
awk 'BEGIN { FS="[\\[\\]]"; } { print $0" -> "$1"\t"$2; }'
Which executes the BEGIN clause before parsing the file.
I have to escape the [ character twice: once because it is inside a quoted string, and once more because it appears inside a bracket expression.
I personally prefer to use the -F flag, which is less verbose.
2) FS="[\[\]]" is wrong because you are inside a quoted string, so each backslash only escapes the character within the string. Awk will see [[]], which is not the bracket expression you intended: it reads as the set [[] followed by a literal ].
3) FS="[]" is wrong because it is an empty bracket expression trying to match nothing
4) FS="\[\]" is wrong again because it is error 2) and 3) together :)
The gawk manual says: The regular expressions in awk are a superset of the POSIX specification. This is why you can use either [\\[\\]] or [][], the latter being the POSIX way.
To include a literal ']' in the list, make it the first character
See:
Posix Regex specification:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
Posix awk specification:
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
Gnu Awk manual:
http://www.gnu.org/software/gawk/manual/gawk.html#Bracket-Expressions
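
A short sketch of point 1) on the line from the question. The variable names in_begin and too_late are made up for this example:

```shell
line='CreateMainPageLink("410",$objUserData,$mnt[139]);'

# FS set in BEGIN is already in effect when the first record is split.
in_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[][]" } { print NF ": " $2 }')

# FS set inside the main rule comes too late for the record already read;
# gawk, as in the question, still reports a single field here.
too_late=$(printf '%s\n' "$line" | awk '{ FS = "[][]"; print NF }')

printf '%s\n' "$in_begin" "$too_late"
```

The first command should print 3: 139.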

With FS="[]", awk looks for characters inside the [] and there are none.
To use square brackets as separators, you need to write them like this: [][]
gawk '{FS="[\[\]]";print NF}' is also wrong: you need to set FS outside the main action, before the first record is read.
Example:
echo 'CreateMainPageLink("410",$objUserData,$mnt[139]);' | awk -F[][] '{print $2}'
139
Or
awk '{print $2}' FS=[][]
Or
awk 'BEGIN {FS="[][]"} {print $2}'
All give 139.
Edit: with gawk '{FS="[\[\]]";print NF}' you print the number of fields, NF, and not the value of the last field, $NF. Anyway, $NF would not help, since splitting your data on [ and ] gives ); as the last field. Use awk '{print $(NF-1)}' FS=[][] to get the second-to-last field.

Do you need awk? You can get the value via sed like this:
# echo 'CreateMainPageLink("410",$objUserData,$mnt[139]);' | sed -n 's:.*\[\([0-9]\+\)\].*:\1:p'
139
