Awk print string with variables

How do I print a string with variables?
I am trying this:
awk -F ',' '{printf /p/${3}_abc/xyz/${5}_abc_def/}' file
I need this output:
/p/APPLE_abc/xyz/MANGO_abc_def/
where ${3} = APPLE
and ${5} = MANGO

printf allows interpolation of variables. With this as the test file:
$ cat file
a,b,APPLE,d,MANGO,f
We can use printf to achieve the output you want as follows:
$ awk -F, '{printf "/p/%s_abc/xyz/%s_abc_def/\n",$3,$5;}' file
/p/APPLE_abc/xyz/MANGO_abc_def/
In printf, the string %s means insert-a-variable-here-as-a-string. We have two occurrences of %s, one for $3 and one for $5.
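The attempt in the question fails for two reasons: the format is not quoted, so awk does not treat it as a string, and ${3} is shell syntax rather than awk's $3. Here is a minimal sketch of the same printf approach, fed from inline sample data instead of a file. The awk variable name prefix, set with -v, is hypothetical and only illustrates mixing a shell-supplied value with fields:

```shell
# Sample input matching the test file above.
line='a,b,APPLE,d,MANGO,f'

# "prefix" is a hypothetical awk variable set from the shell with -v;
# $3 and $5 are awk fields, not shell positional parameters.
out=$(printf '%s\n' "$line" |
  awk -F, -v prefix='/p' '{ printf "%s/%s_abc/xyz/%s_abc_def/\n", prefix, $3, $5 }')

printf '%s\n' "$out"
```

Every %s consumes one argument in order, so the three placeholders take prefix, $3 and $5 respectively.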

Not as readable, but printf isn't necessary here. Awk concatenates adjacent strings and fields, so you can place the fields directly between quoted string literals.
$ cat file.txt
1,2,APPLE,4,MANGO,6,7,8
$ awk -F, '{print "/p/" $3 "_abc/xyz/" $5 "_abc_def/"}' file.txt
/p/APPLE_abc/xyz/MANGO_abc_def/

Related

How to extract word from a string that may/may not start with a single quote

Sample string:
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
cut -d' ' -f 1 string.txt gives me
'kernel-rt|kernel-alt|/kernel-'
But how do we proceed further to get just the 'kernel' from it?
Assuming you want only the 3rd kernel (the one in /kernel-) and not the others:
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
Here is how to extract it with a single awk command (gawk, the standard awk on most Linux systems; the three-argument match() is a gawk extension).
input="kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils"
echo "$input" | awk -F"|" '{split($3,a,"-");match(a[1],"[[:alnum:]]+",b);print b[0]}'
Explanation:
-F"|" sets the field separator to |, so only the 3rd field is needed.
split($3,a,"-") splits the 3rd field on -; the left part is assigned to a[1].
match(a[1],"[[:alnum:]]+",b) extracts the first run of alphanumeric characters from a[1] into b[0].
print b[0] outputs the matched string.
To extract kernel from the 1st or 2nd field instead, change $3 to $1 or $2.
$ cat file
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
$
$ awk '{print $1}' file
'kernel-rt|kernel-alt|/kernel-'
$
$ awk '{gsub(/\047/,"",$1); print $1}' file
kernel-rt|kernel-alt|/kernel-
$
$ awk '{gsub(/\047/,""); split($1,f,/[|]/); print f[1]}' file
kernel-rt
and just to make you think...
$ awk '{gsub(/\047|\|.*/,"")}1' file
kernel-rt
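The three-argument match() used in the first answer is a gawk extension. As a hedged sketch, the same extraction can be done with the two-argument match() plus RSTART and RLENGTH, which POSIX awk provides. The character class is written as [A-Za-z0-9] on the assumption that some older awks lack [[:alnum:]]:

```shell
input="'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'"

# Split on "|" and take the first run of letters/digits in the 3rd field.
word=$(printf '%s\n' "$input" |
  awk -F'|' '{ if (match($3, /[A-Za-z0-9]+/)) print substr($3, RSTART, RLENGTH) }')

printf '%s\n' "$word"
```

Here the 3rd field is /kernel-' 'headers, so the first alphanumeric run is kernel.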

How to use awk to find the line starting with a variable

I know 2 things about awk:
1.
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ var {print $0}' file.txt # will print the line where 3rd field includes the variable $PAT
2.
awk '$3 ~ /^aGeneName/' file.txt # will print the line where 3rd field starts with string "aGeneName"
But what I want is the combination of these two: I want to print the line where the 3rd field starts with the variable $PAT, something like
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ /^var/ {print $0}' file.txt # but this is wrong, since variable can't be put into //
One way is like this:
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ "^" var {print $0}' file.txt
The {print $0} can be omitted here; printing the line is the default action.
Another way, when var is a plain string with no regex metacharacters:
PAT='aGeneName'
awk -v var="$PAT" 'index($3, var)==1' file.txt
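The two approaches differ as soon as the variable contains a regex metacharacter. A small sketch with made-up data (the pattern a.b and both input lines are hypothetical):

```shell
PAT='a.b'   # hypothetical pattern containing the regex metacharacter "."

data='x y a.bGene
x y aXbGene'

# Regex match: "." matches any character, so both lines are printed.
regex_hits=$(printf '%s\n' "$data" | awk -v var="$PAT" '$3 ~ "^" var')

# Literal prefix test: only the line whose 3rd field starts with "a.b".
literal_hits=$(printf '%s\n' "$data" | awk -v var="$PAT" 'index($3, var) == 1')

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_hits" "$literal_hits"
```

So index() is the safer choice when the value should be taken literally, and the "^" var form when it is meant to be a regex.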

Exact string match in awk

I have a file test.txt with the following lines
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk command to get information only about the string 2010TJ
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. I want to know how to get only the line containing the exact string
1997 100 500 2010TJ
The string can be in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
Or include the actual separator (a space, a tab, or whatever separates the columns) in the pattern:
$ awk '/^2010TJ /' file
Or compare the field directly to the string:
$ awk '$1=="2010TJ"' file
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, given your example of setting a variable, the same commands using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"' file
$ awk -v s=2010TJ '$0~"^" s " "' file
$ awk -v s=2010TJ '$1==s' file
Note that the first differs from the second and third. The first matches the standalone string 2010TJ anywhere in $0; the second matches only when the line starts with that string followed by a space, and the third only when the first field is exactly that string.
Try this (tests only column 1):
awk '$1 == "2010TJ" {print $0}' test.txt
or, grep-like (all columns):
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note: \< and \> are word boundaries (a gawk extension).
Another awk with a word boundary:
awk '/\y2010TJ\y/' file
Note: \y matches either the beginning or the end of a word (also gawk-specific).
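Since the word-boundary operators are gawk-only, here is a sketch of the field loop from above combined with a -v variable, which should work in any awk. The variable name s follows the earlier answer, and the data is the question's sample:

```shell
data='1997 100 500 2010TJ
2010TJXML 16 20 59'

# Print a line if any whole field equals s; no regex is involved, so no
# word-boundary extensions are needed.
match_line=$(printf '%s\n' "$data" |
  awk -v s='2010TJ' '{ for (i = 1; i <= NF; i++) if ($i == s) { print; next } }')

printf '%s\n' "$match_line"
```

Only the first line is printed, because 2010TJXML is not equal to 2010TJ as a whole field.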

Get only part of a file name in Awk

I have tried
awk '{print FILENAME}'
And the result was the full path of the file.
I want to get only the file name. For example, from "test/testing.test.txt" I just want to get "testing", without ".test.txt".
Use -F to split on the period and print the part before the first one. Note that this operates on each input line, so the file name itself has to be fed to awk as input:
awk -F'.' '{ print $1 }'
Alternatively,
ls -l | awk '{ print $9 }' | awk -F"." '{ print $1 }'
will run through the whole folder 👍
(there's a fancier way to do it, but that's easy).
Use the sub and/or split functions to extract the part of FILENAME you want.
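A minimal sketch of the sub approach, assuming the goal is the part of the base name before the first period. The temporary directory is only there so the example path exists; name and base are hypothetical variable names:

```shell
# Create a throwaway copy of the example path; the directory is temporary.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/test"
printf 'some content\n' > "$tmpdir/test/testing.test.txt"

# On the first record of each file: drop the directory part, then
# everything from the first "." onward.
base=$(awk 'FNR == 1 { name = FILENAME
                      sub(/.*\//, "", name)
                      sub(/\..*/, "", name)
                      print name }' "$tmpdir/test/testing.test.txt")

printf '%s\n' "$base"
rm -r "$tmpdir"
```

FILENAME is copied into name first so the built-in variable itself is left alone. The work is done on FNR == 1 rather than in BEGIN because FILENAME is not set until a file is being read.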

Using variables in printf format

Suppose I have a file like this:
$ cat a
hello this is a sentence
and this is another one
And I want to print the first two columns with some padding in between them. As this padding may change, I can for example use 7:
$ awk '{printf "%7-s%s\n", $1, $2}' a
hello  this
and    this
Or 17:
$ awk '{printf "%17-s%s\n", $1, $2}' a
hello            this
and              this
Or 25, or... you see the point: the number may vary.
Then a question popped up: is it possible to assign a variable to this N, instead of hardcoding the integer in the %N-s format?
I tried these things without success:
$ awk '{n=7; printf "%{n}-s%s\n", $1, $2}' a
%{n}-shello
%{n}-sand
$ awk '{n=7; printf "%n-s%s\n", $1, $2}' a
%n-shello
%n-sand
Ideally I would like to know if it is possible to do this. If it is not, what would be the best workaround?
If you use * in your format string, it takes the width from the argument list:
awk '{printf "%*-s%s\n", 17, $1, $2}' file
hello            this
and              this
awk '{printf "%*-s%s\n", 7, $1, $2}' file
hello  this
and    this
As read in The GNU Awk User’s Guide #5.5.3 Modifiers for printf Formats:
The C library printf’s dynamic width and prec capability (for example,
"%*.*s") is supported. Instead of supplying explicit width and/or prec
values in the format string, they are passed in the argument list. For
example:
w = 5
p = 3
s = "abcdefg"
printf "%*.*s\n", w, p, s
is exactly equivalent to:
s = "abcdefg"
printf "%5.3s\n", s
Does this count?
The idea is to build the format string dynamically and pass it to printf.
kent$ awk '{n=7;fmt="%"n"-s%s\n"; printf fmt, $1, $2}' f
hello  this
and    this
Use simple string concatenation.
Here "%", n and "-s%s\n" are concatenated into a single string that serves as the format. In the examples below, the resulting format string is %7-s%s\n.
awk -v n=7 '{ printf "%" n "-s%s\n", $1, $2}' file
awk '{ n = 7; printf "%" n "-s%s\n", $1, $2}' file
Output:
hello  this
and    this
You can use eval (maybe not the most beautiful with all the escape characters, but it works):
i=15
eval "awk '{printf \"%$i-s%s\\n\", \$1, \$2}' a"
output:
hello          this
and            this
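The eval and its escaping can be avoided by handing the shell value to awk with -v and building the format there. This is a sketch, with the sample lines supplied inline rather than from file a. It uses the conventional flag order %-Ns (flag before width), on the assumption that not every awk accepts the %N-s spelling used in this thread:

```shell
i=15

# Pass the shell value with -v and build the format inside awk, so no
# eval or escaping is needed. "%-" n "s" left-justifies in a field of n.
out=$(printf 'hello this is a sentence\nand this is another one\n' |
  awk -v n="$i" '{ printf "%-" n "s%s\n", $1, $2 }')

printf '%s\n' "$out"
```

With i=15 the first column is padded to 15 characters, matching the eval version's output.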