How to pass the argument into printf in awk?

How to pass the argument into printf in awk? - awk

format.awk is simple.
#! /usr/bin/awk
{
printf("%52s\\\n" ,$0);
}
The command is going to execute.
awk -f format.awk test
Now i want to make the number 52 in printf("%52s\\n" ,$0); as a varible.
awk -f format.awk -v n=52
If i want to pass argument 52 into format.awk,how to write the statement
printf("%ns\\\n" ,$0);
awk -f format.awk -v n=52 test
awk: run time error: improper conversion(number 1) in printf("%-ns\
")
FILENAME="test" FNR=2 NR=2

Yes you can, see the examples below:
kent$ awk 'BEGIN{a=3;printf "%.*f\n",a,10/3}'
3.333
kent$ awk 'BEGIN{a=7;printf "%.*f\n",a,10/3}'
3.3333333
The syntax is same as printf() of C, with *
Edit (add doc quote)
The C library printf’s dynamic width and prec capability (e.g.,
"%*.*s") is supported. Instead of supplying explicit width and/or prec
values in the format string, they are passed in the argument list.
Quote from gnu awk manual:
https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html

In awk, string interpolation does not work like that. You need to say
printf("%" n "s", $0)

Related

Awk: gsub("\\\\", "\\\\") yields suprising results

Consider the following input:
$ cat a
d:\
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ cat a_double.awk
{ sub("\\\\", "\\\\"); print $0 }
Now running cat a | awk -f a.awk gives
d:\
and running cat a | awk -f a_double.awk gives
d:\\
and I expect exactly the other way around. How should I interpret this?
$ awk -V
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)

Yes, its expected behavior of awk. When you run sub("\\", "\\\\") in your first script, in sub's inside "(double quotes) since we are NOT using / for matching pattern we need to escape first \(actual literal character) then for escaping we are using \ so we need to escape that also, hence it will become \\\\
\\ \\
| |
| |
first 2 chars are denoting escaping next 2 chars are denoting actual literal character \
Which is NOT happening your 1st case hence NO match so no substitution in it, in your 2nd awk script you are doing this(escaping part in regex matching section of sub) hence its matching \ perfectly.
Let's see this by example and try putting ... for checking purposes.
When Nothing happens: Since no match on
awk '{sub("\\", "....\\\\"); print $0}' Input_file
d:\
When pattern matching happens:
awk '{sub("\\\\", "...\\\\"); print $0}' Input_file
d:...\\
From man awk:
gsub(r, s [, t])
For each substring matching the regular expression r in the string t,
substitute the string s, and return the number of substitutions.
How could we could do perform actual escaping part(where we need to use only \ before character only once)? Do mention your regexp in /../ in first section of sub like and we need NOT to double escape \ here.
awk '{sub(/\\/,"&\\")} 1' Input_file

The first arg to *sub() is a regexp, not a string, so you should use regexp (/.../) rather than string ("...") delimiters. The former is a literal regexp which is used as-is while the latter defines a dynamic (or computed) regexp which forces awk to interpret the string twice, the first time to convert the string to a regexp and the second to use it as a regexp, hence double the backslashes needed for escaping. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps.
In the following we just need to escape the backslash once because we're using a literal, rather than dynamic, regexp:
$ cat a
d:\
$ awk '{sub(/\\/,"\\\\")}1' a
d:\\
Your first script would rightfully produce a syntax error in a more recent version of gawk (5.1.0) since "\\" in a dynamic regexp is equivalent to /\/ in a literal one and in that expression the backslash is escaping the final forward slash which means there is no final delimiter:
$ cat a.awk
{ sub("\\", "\\\\"); print $0 }
$ awk -f a.awk a
awk: a.awk:1: (FILENAME=a FNR=1) fatal: invalid regexp: Trailing backslash: /\/

Recognising backslash in awk field separator

Input is
AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999
awk script sets field separator to "\x0D" (I have tried with and without escaping the backslash.
awk script is
BEGIN {FS="\\x0D"}
{print NF}
It should output 3 because there are 2 occurrences of the field separator but it outputs 1 which indicates it is not being recognized.

There are 2 ways to provide a regexp in awk - a static regexp (aka regexp literal) written as /regexp/ and a dynamic regexp (aka computed regexp) written as "regexp" and used in a regexp context. A field separator is just a regexp with some additional behavior so lets just consider regexps in general to explain what's going on in your example.
The split() function takes a field separator (a regexp for our purposes) as it's third argument so it provides a good test bed:
Using a static regexp:
$ awk '{print split($0,a,/\x0D/)}' file
1
The \ above is escaping the x, it's not a literal \. For that you need to escape the \ itself:
$ awk '{print split($0,a,/\\x0D/)}' file
3
What if we used a dynamic regexp instead of the above static regexp?
$ awk '{print split($0,a,"\x0D")}' file
1
$ awk '{print split($0,a,"\\x0D")}' file
1
$ awk '{print split($0,a,"\\\x0D")}' file
' is not a known regexp operator FNR=1) warning: regexp escape sequence `\
1
$ awk '{print split($0,a,"\\\\x0D")}' file
3
The behavior above is because awk first parses the string to convert it into a regexp (using up one layer of escape chars) and then parses it a second time when using it as a regexp (using up a second layer of escape chars).
Unfortunately when you specify a FS there is no option to specify it as a literal regexp, it's always specified using a string and thus is a dynamic regexp and so needs an extra layer of escaping:
$ awk -v FS='\x0D' '{print NF}' file
1
$ awk -v FS='\\x0D' '{print NF}' file
1
$ awk -v FS='\\\x0D' '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS='\\\\x0D' '{print NF}' file
3
Now - what if you were using the wrong type of quotes in the shell part of the script, i.e. " instead of '? Then you introduce even more pain because now you're inviting the shell to also parse the string even before awk gets to see and parse it twice:
$ awk -v FS="\\\\x0D" '{print NF}' file
1
$ awk -v FS="\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\\x0D" '{print NF}' file
3
That's different from the case where the double quotes are using inside awk because that's all wrapped inside single quotes and so protected from the shell already:
$ awk 'BEGIN{FS="\\\\x0D"} {print NF}' file
3
So - in the shell always use the most restrictive quotes (' over " over none) unless you have a very specific reason not to, and when using regexps or field separators always use literal /.../ rather than dynamic "...", again unless you have a very specific reason not to.
The odd, truncated looking error message above are because of the \rs the tool is trying to print due to the escape sequence we're providing, they're really all warning: regexp escape sequence '\^M' is not a known regexp operator

You need two backslashes for a literal backslash since \ is an escape character:
$ echo 'AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999' |
awk 'BEGIN{ FS="\\\\x0D" } { print NF }'
3

Join lines into one line using awk

I have a file with the following records
ABC
BCD
CDE
EFG
I would like to convert this into
'ABC','BCD','CDE','EFG'
I attempted to attack this problem using Awk in the following way:
awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}'
but I obtain not what I expected:
ABC,BCD,CDE,EFG
Are there any suggestions on how we can achieve this?

Could you please try following.
awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file
Output will be as follows.
'ABC','BCD','CDE','EFG'

With GNU awk for multi-char RS:
$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'

There are many ways of achieving this:
with pipes:
sed "s/.*/'&'/" <file> | paste -sd,
awk '{print '"'"'$0'"'"'}' <file> | paste -sd,
remark: we do not make use of tr here as this would lead to an extra , at the end.
reading the full file into memory:
sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file> #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file> #GNU
and the solution of #EdMorton
without reading the full file into memory:
awk '{printf (NR>1?",":"")"\047"$0"\047"}' <file>
and some random other attempts:
awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t} <file>
So what is going on with the OP's attempts?
Writing down the OP's awk line, we have
/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}
What does this do? Let us analyze step by step:
/START/{if (x)print x;x="";next}:: This reads If the current record/line contains the string START, then do
if (x) print x:: if x is not an empty string, print the value of x
x="" set x to be an empty string
next:: skip to the next record/line
In this code block, the OP probably assumed that /START/ means do this at the beginning of all things. In awk, this is however written as BEGIN and since in the beginning, all variables are empty strings or zero, the if statement is not executed by default. This block could be replaced by:
BEGIN{x=""}
But again, this is not needed and thus one can remove it:
{x=(!x)?$0:x","$0;}:: concatenate the string with the correct delimiter. This is good, especially due to the usage of the ternary operator. Sadly the delimiter is set to , and not ',' which in awk is best written as \047,\047. So the line could read:
{x=(!x)?$0:x"\047,\047"$0;}
This line, can be written shorter if you realize that x could be an empty string. For an empty string, x=$0 is equivalent to x=x $0 and all you want to do is add a separator which all or not could be an empty string. So you can write this as
{x= x ((!x)?"":"\047,\047") $0}
or inverting the logic to get rid of some more characters:
{x=x(x?"\047,\047":"")$0}
one could even write
{x=x(x?"\047,\047":x)$0}
but this is not optimal as it needs to read what is the memory of x again. However, this form can be used to finally optimize it to (per #EdMorton's comment)
{x=(x?x"\047,\047":"")$0}
This is better as it removes an extra concatenation operator.
END{print x}:: Here the OP prints the result. This, however, will miss the final single-quotes at the beginning and end of the string, so they could be added
END{print "\047" x "\047"}
So the corrected version of the OP's code would read:
awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}'

awk may be better
awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -
'ABC','BCD','CDE','EFG'

AWK that reads up to the /

I have the following lines of text :
170311 005201 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) <0157357069/OK> ##[ti=7672,
170311 005323 0433 DE(N) itemhandling itemAddBarCodeData: Barcode(1/1) </NOREAD> ##[ti=7672,
I have the following script :
grep "itemAddBarCodeData" %myItemHandling% | gawk -F "[<>]+" -v OFS=, "{for(i=1;i<=NF;++i){if($i~/Barcode/){print substr($1,5,2)substr($1,3,2)substr($1,1,2),substr($1,8,6),$(i+1)}}}" > %myOutputPath%%myFilename%
What I need is a script that reads only the /NOREAD and the /OK so the output is like :
11-03-17,00:52:01,NOREAD
11-03-17,00:53:23,OK
any help would be greatly appreciated
Thanks

Complex gawk approach:
awk -F"[ />]" '{patsplit($1, a, /[0-9]{2}/); patsplit($2, b, /[0-9]{2}/);
printf("%s-%s-%s,%s:%s:%s,%s\n",a[3],a[2],a[1],b[1],b[2],b[3],$10)}' inpufile
The output:
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD
-F"[ />]" - "composite" field separator
patsplit(string, array [, fieldpat [, steps ] ])
Divide string into pieces defined by fieldpat and store the pieces in array and
the separator strings in the seps array.

You can use this following script:
script.awk
/\/[A-Z]+>/ { match($1"-"$2,/(..)(..)(..)-(..)(..)(..)/,ts)
dt=mktime( sprintf("20%s %s %s %s %s %s",
ts[1], ts[2], ts[3],
ts[4], ts[5], ts[6]) )
dtd = strftime( "%d-%m-%y", dt )
dts = strftime( "%H:%M:%S", dt )
match ( $0, /\/[A-Z]+>/) # set RSTART and RLENGTH
print dtd, dts, substr( $0, RSTART+1, RLENGTH-2)
}
Run it like this: awk -v OFS=, -f script.awk yourfile
The important part is the second match function call, which matches
a string of capital letters [A_Z]
preceded by a /
followed by a >.
It should match the OK and NOREAD case and not the Barcode(1/1).
The variables
RSTART and
RLENGTH
are set by the match function, we have to correct them by +1 and -2, because the match RE included / and >.
The first match, mktime, strftime and the sprintf function call are another way the format the date and time. The time functions are GNU AWK extensions.

Regular awk version:
awk '
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
' file
With script in file:
script.awk:
{
d=$1$2
gsub(/../,"& ",d)
split(d,T)
split($8,R,"[/>]")
printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]
}
awk -f script.awk file
crammed on one line..
awk '{d=$1$2; gsub(/../,"& ",d); split(d,T); split($8,R,"[/>]"); printf "%s-%s-%s,%s:%s:%s,%s\n",T[3],T[2],T[1],T[4],T[5],T[6],R[2]}' file

You don't need grep when you're using awk. With GNU awk for gensub():
$ awk '/itemAddBarCodeData/{print gensub(/(..)(..)(..) (..)(..)(..).*\/([^>]+).*/,"\\3-\\2-\\1,\\4:\\5:\\6,\\7",1)}' file
11-03-17,00:52:01,OK
11-03-17,00:53:23,NOREAD

Here's a pragmatic combination of awk and sed that is conceptually relatively simple:
On Linux and BSD/macOS:
awk -F'[ />]' -v OFS=, '/itemAddBarCodeData/ {print $1, $2, $10}' file |
sed -E 's/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/'
On a Windows system, invoked from cmd.exe, different quoting and line continuation rules apply (assumes the presence of ported GNU utilities):
awk -F"[ />]" -v OFS=, "/itemAddBarCodeData/ {print $1, $2, $10}" file ^
| sed -E "s/^(..)(..)(..),(..)(..)(..)/\3-\2-\1,\4:\5:\6/"
Note how:
"..." strings rather than '...' strings must be used to protect the embedded content from interpretation by the shell
Unlike with "..." on Unix, $ has no special meaning to cmd.exe, so it can be used as-is.
^ as the very last character on a line serves as the explicit line-continuation character, and the line must be broken before the | (whereas on Unix a line ending in | is implicitly continued).
This is only used for readability here; of course, you can place your command on a single line.

reading a variable in awk from the command line after entering the command

I try searching a file by using awk. How can I ask awk to read a variable from the command line as a name to get searched in the file:
this is a regular way I use to search the file and I can ask the user to enter a name to search in the file.txt
awk -f myAwk.awk file.txt
How can I manage it like this :
awk -f myAwk.awk file.txt nameToSearch
How can I use ARGC and ARGV to search the nameToSearch in the file.txt?

What you're probably looking for is
awk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
so
awk -v MYVAR=nameToSearch -v OTHERVAR=somethingElse -f myAwk.awk file.txt
Is that it? of course order of switches ( -f, -v ) does not matter. Obvously you then need to include MYVAR ( OTHERVAR ) for a variable identifier inside awk program itself.

To pass a variable to awk, you can use the -v command.
For example:
cat file.txt | awk -v p="stringToSearch" '$0 ~ p'
In this command, tou replace stringToSearch with a pattern (please keep the double quote, they are useful for preserving spaces). The awk command $0 ~ p compares the current line to the given pattern.
Another approach is to build the awk command from the shell:
p="stringToSearch"
awk "/$p/" file.txt
You must use double quotes in the command to force expanding $p.

If it's permitted to change the order of arguments, so that we can do this:
awk -f myAwk.awk nameToSearch file.txt
then you can do:
awk 'NR==1 { nameToSearch = $0; next} { ... rest of myAwk.awk here ...}' nameToSearch file.txt
You can of course add the NR==1 {...} block to the beginning of your myAwk.awk file, then continue using:
awk -f myAwk.awk nameToSearch file.txt
The technique Piotr Wadas describes has the same effect:
awk -v nameToSearch=whatever -f myAwk.awk file.txt
and that's what I'd use myself, rather than passing whatever as an additional argument to the script. Passing whatever as an additional argument is what scripters had to do before the -v facilities were added to awk. If writing -v nameToSearch= is too verbose, then I'd wrap the whole thing up in a shell script, and say:
myShellScript whatever file.txt
But you asked how to do it by passing whatever as an additional argument to the awk script, so that's what I demonstrated.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to pass the argument into printf in awk? - awk

In awk, string interpolation does not work like that. You need to say printf("%" n "s", $0)

Related

Awk: gsub("\\\\", "\\\\") yields suprising results

Recognising backslash in awk field separator

Join lines into one line using awk

AWK that reads up to the /

reading a variable in awk from the command line after entering the command

Categories

Resources