awk doesn't assign variable and uses $0 instead - awk

I'm using the following /bin/sh code to parse the output of apt show and print only the names of the packages matching the second pattern, but it doesn't work. Instead it outputs the pattern itself in both prints as if the variable pac was never assigned and instead uses $0 in all cases.
apt show vim peazip 2> /dev/null | \
awk '
/^Package:/ {
pac = substr($0, 10);
print "found name "$pac;
}
/APT-Sources: \/var\/lib\/dpkg\/status/ {
print "bingo "$pac;
}
'
Output: (gawk on Ubuntu)
found name Package: vim
found name Package: peazip:i386
bingo APT-Sources: /var/lib/dpkg/status
What am I doing wrong?

awk is C-like in that you don't use $ to get the value of a variable:
$ awk 'BEGIN { x=42; print x }'
42
I think of $ in awk as an operator that fetches the value of the field number identified by the expression after $. For example, the 2nd field is $2, the last field is $NF where NF is a variable whose value is the number of fields in the current record.
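As a small sketch of that "operator" view (the helper name nth_field is made up for this illustration), the expression after $ can be a plain variable or an arithmetic expression:

```shell
# nth_field is a hypothetical helper, not part of the question's code.
nth_field() {
    # n is an ordinary awk variable holding a number;
    # $n fetches the field with that number, $NF the last field,
    # and $(NF - 1) the one before it.
    awk -v n="$1" '{ print $n, $NF, $(NF - 1) }'
}

echo "a b c d" | nth_field 2   # prints: b d c
```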
Now, why does $pac act like $0?
awk, in a numeric context, treats an arbitrary string like this: take the string, truncate it at the first non-digit character; if the truncation results in an empty string, numerically treat the string as zero.
$ echo "foo bar" | awk '{x="2cats"; print $x}'
bar
The value of pac does not start with digits, so numerically it has the value zero; applying the $ "operator" then gives $0, the whole record:
$ echo "foo bar" | awk '{x="no-cats"; print $x}'
foo bar

Awk does not require $ in front of variable names :) your code should look like this instead:
apt show vim peazip 2> /dev/null | \
awk '
/^Package:/ {
pac = substr($0, 10);
print "found name "pac;
}
/APT-Sources: \/var\/lib\/dpkg\/status/ {
print "bingo "pac;
}
'
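If you want to check the corrected logic without calling apt, here is a sketch that feeds the same two rules some canned text. The sample records and the function names are invented for the test; real apt show output has many more fields:

```shell
# sample_apt_output prints made-up text shaped like `apt show` output.
sample_apt_output() {
    printf '%s\n' \
        'Package: vim' \
        'APT-Sources: http://example.invalid/ubuntu main amd64 Packages' \
        '' \
        'Package: peazip:i386' \
        'APT-Sources: /var/lib/dpkg/status'
}

# installed_only uses the same two patterns as the answer above,
# referring to the variable as pac (no $).
installed_only() {
    awk '
    /^Package:/ { pac = substr($0, 10) }
    /APT-Sources: \/var\/lib\/dpkg\/status/ { print pac }
    '
}

sample_apt_output | installed_only   # prints: peazip:i386
```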

Related

Combining awk commands

I currently have this piece of code:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '$4 ~ re' $fruit_file
Which uses awk to find matches in $4 that match with the pattern provided by the user under the $user_fruit variable in the $fruit_file. However, I need to alter the awk command so that it only displays line matches when the word apple is also on the line.
Any help would be greatly appreciated!
You can extend the awk pattern using boolean operators:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '/apple/ && $4 ~ re' "$fruit_file"
I.e. print records when the record matches /apple/ and the fourth field matches the regex.
If you want to check for literal, fixed strings instead, you can use index rather than a regex match:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' file
Here,
index($0, "apple") - checks if there is apple substring on the whole line (if its index is not 0)
&& - AND condition
index($4, re) - checks if Field 4 contains the value of re as a substring (if its index is not 0).
See an online demo:
s='one:two:three:2-plum+pear
apple:two:three:1-plum+pear'
user_fruit='plum+pear'
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' <<< "$s"
#index($3, "snow") != 0
# => apple:two:three:1-plum+pear
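To make the difference between the two answers concrete, here is a sketch that runs both tests over the same input. The second sample line (plummpear) and the helper names are made up for the comparison:

```shell
# Both helpers read records on stdin and take the user's text as $1.
match_regex() {
    # $4 ~ re treats the text as a regex, so + means "one or more m".
    awk -F ":" -v re="$1" '/apple/ && $4 ~ re'
}

match_literal() {
    # index() looks for the text as a fixed string.
    awk -F ":" -v re="$1" 'index($0, "apple") && index($4, re)'
}

s='apple:two:three:1-plum+pear
apple:two:three:2-plummpear'

printf '%s\n' "$s" | match_regex 'plum+pear'     # prints the 2-plummpear line
printf '%s\n' "$s" | match_literal 'plum+pear'   # prints the 1-plum+pear line
```

Which one you want depends on whether the user is expected to type a pattern or a plain word.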

Issue with using awk to extract words after/before a specific word

I have a file which has several sections with a header like this
$ head -n 5 test.txt
[44610] gmx#127.0.0.1
f1(cu_atomdata, NBParamGpu, Nbnxm::gpu_plist, bool), Block Size 64, Grid Size 3599, Device 0, 99 invocations
Section: Command line profiler metrics
Metric Name Metric Unit Minimum Maximum Average
-------------------------------------------------------------------------------------------- ----------- ------------ ------------ ------------
I would like to use the following awk command to get the number after Grid Size and the number before invocations. However, the following command returns nothing.
$ awk '{for (I=1;I<NF;I++) if ($I == "Grid Size") print $(I+1)}' test.txt
$
$ awk '{for (I=1;I<NF;I++) if ($I == "invocations") print $(I-1)}' test.txt
$
Any idea to fix that?
You may use this awk that loops through each field and extract your numbers based on field values:
awk '{
for (i=3;i<NF;i++)
if ($(i-2) == "Grid" && $(i-1) == "Size")
print "gridSize:", $i+0
else if ($(i+1) == "invocations")
print "invocations:", $i+0
}' file
gridSize: 3599
invocations: 99
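As for why the commands in the question print nothing: with the default field separator, awk splits on whitespace, so Grid and Size are two separate fields and no single field ever equals "Grid Size". In addition, invocations is the last field, and a loop that stops at I<NF never looks at it. A quick sketch to see the field numbers (show_fields is a made-up helper name):

```shell
# show_fields prints the position of the fields we care about.
show_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "Grid" || $i == "invocations")
                print i ": " $i
    }'
}

line='f1(cu_atomdata, NBParamGpu, Nbnxm::gpu_plist, bool), Block Size 64, Grid Size 3599, Device 0, 99 invocations'
printf '%s\n' "$line" | show_fields
```

The field after Size is "3599," with a trailing comma, which is why the loop above prints $i+0 to keep only the number.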
Alternatively, you may try this gnu grep with PCRE regex:
grep -oP 'Grid Size\h+\K\d+|\d+(?=\h+invocations)' file
3599
99
\K - match reset
(?=...) - Lookahead assertion
With recent versions of GNU awk, you can pass an array as the third argument to match() and read the capture groups from it:
awk '
match($0,/Grid Size ([0-9]+)/, arr1){
print arr1[1]
match($0,/([0-9]+) invocations/, arr2)
print arr2[1]
}
' Input_file
With your shown samples, you could also try the following (the version above did not work for me with awk 4.1, so I am adding this one as an alternative).
awk '
match($0,/Grid Size [0-9]+/){
num=split(substr($0,RSTART,RLENGTH),arr1," ")
print arr1[num]
match($0,/[0-9]+ invocations/)
split(substr($0,RSTART,RLENGTH),arr2," ")
print arr2[1]
}
' Input_file
make it even simpler :
{mawk/mawk2/gawk} 'BEGIN {
FS = "(^.+Grid Size[ ]+|" \ # before
"[,][^,]+[,][ ]+|" \ # in-between
"[ ]+invocations.*$)"; # after
} NF == 4 { print "grid size \043 : " $2 ", invocations \043 : " $3 }'
This regex gobbles everything before, in between, and after the two numbers. Because the regex touches both ends of the record, fields $1 and $4 are also created, but as empty ones, hence the NF==4 check.
The octal code \043 is the hash symbol #; it is just my own personal preference not to have the comment delimiter inside my strings in its original form.
a gawk approach with gensub:
$ gawk '/Grid Size/{
s=gensub(/.*Grid\sSize\s([[:digit:]]+).*,\s([[:digit:]]+) invocations/, "gridSize: \\1\ninvocations: \\2","G"); print s
}' myFile
gridSize: 3599
invocations: 99

Replace a letter with another from the last word from the last two lines of a text file

How could I possibly replace a character with another, selecting the last word from the last two lines of a text file in shell, using only a single command? In my case, replacing every occurrence of a with E from the last word only.
Like, from a text file containing this:
tree;apple;another
mango.banana.half
monkey.shelf.karma
to this:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
I tried using sed -n 'tail -2 'mytext.txt' -r 's/[a]+/E/*$//' but it doesn't work (my error: sed expression #1, char 10: unknown option to 's).
Could you please try following, tac + awk solution. Completely based on OP's samples only.
tac Input_file |
awk 'FNR<=2{if(/;/){FS=OFS=";"};if(/\./){FS=OFS="."};gsub(/a/,"E",$NF)} 1' |
tac
Output with shown samples is:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
NOTE: Change gsub to sub in case you want to substitute only very first occurrence of character a in last field.
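A minimal sketch of that sub/gsub difference on a single line (the helper names are made up, and the separator is fixed to a dot here to keep the example short):

```shell
# Replace only the first a in the last field.
replace_first() {
    awk -F '[.]' -v OFS=. '{ sub(/a/, "E", $NF) } 1'
}

# Replace every a in the last field.
replace_all() {
    awk -F '[.]' -v OFS=. '{ gsub(/a/, "E", $NF) } 1'
}

echo 'monkey.shelf.karma' | replace_first   # prints: monkey.shelf.kErma
echo 'monkey.shelf.karma' | replace_all     # prints: monkey.shelf.kErmE
```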
This might work for you (GNU sed):
sed -E 'N;${:a;s/a([^a.]*)$/E\1/mg;ta};P;D' file
Open a two line window throughout the length of the file by using the N to append the next line to the previous and the P and D commands to print then delete the first of these. Thus at the end of the file, signified by the $ address the last two lines will be present in the pattern space.
Using the m multiline flag on the substitution command, as well as the g global flag and a loop between :a and ta, replace any a in the last word (delimited by .) by an E.
Thus the first pass of the substitution command will replace the a in half and the last a in karma. The next pass will match nothing in the penultimate line and replace the a in karmE. The third pass will match nothing and thus the ta command will fail and the last two lines will be printed with the required changes.
If you want to use Sed, here's a solution:
tac input_file | sed -E '1,2{h;s/.*[^a-zA-Z]([a-zA-Z]+)/\1/;s/a/E/;x;s/(.*[^a-zA-Z]).*/\1/;G;s/\n//}' | tac
One tiny detail: in your question you say you want to replace a letter, but then you transform karma into kErmE, so which is it? If you meant to write kErma, then the command above will work; if you meant to write kErmE, then you have to change it just a bit: the s/a/E/ should become s/a/E/g.
With tac+perl
$ tac ip.txt | perl -pe 's/\w+\W*$/$&=~tr|a|E|r/e if $.<=2' | tac
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
\w+\W*$ match last word in the line, \W* allows any possible trailing non-word characters to be matched as well. Change \w and \W accordingly if numbers and underscores shouldn't be considered as word characters - for ex: [a-zA-Z]+[^a-zA-Z]*$
$&=~tr|a|E|r change all a to E only for the matched portion
e flag to enable use of Perl code in replacement section instead of string
To do it in one command, you can slurp the entire input as single string (assuming this'll fit available memory):
perl -0777 -pe 's/\w+\W*$(?=(\n.*)?\n\z)/$&=~tr|a|E|r/gme'
Using GNU awk for the 4th argument to split(), since, according to the comments on another solution, the field delimiter is every sequence of non-alphanumeric characters:
$ gawk '
BEGIN {
pc=2 # previous counter, ie how many are affected
}
{
for(i=pc;i>=1;i--) # buffer to p hash, a FIFO
if(i==pc && (i in p)) # when full, output
print p[i]
else if(i in p) # and keep filling
p[i+1]=p[i] # above could be done using mod also
p[1]=$0
}
END {
for(i=pc;i>=1;i--) {
n=split(p[i],t,/[^a-zA-Z0-9\r]+/,seps) # split on non alnum
gsub(/a/,"E",t[n]) # replace
for(j=1;j<=n;j++) {
p[i]=(j==1?"":p[i] seps[j-1]) t[j] # pack it up
}
print p[i] # output
}
}' file
Output:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
Would this help you ? on GNU awk
$ cat file
tree;apple;another
mango.banana.half
monkey.shelf.karma
$ tac file | awk 'NR<=2{s=gensub(/(.*)([.;])(.*)$/,"\\3",1);gsub(/a/,"E",s); print gensub(/(.*)([.;])(.*)$/,"\\1\\2",1) s;next}1' | tac
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
Better Readable version :
$ tac file | awk 'NR<=2{
s=gensub(/(.*)([.;])(.*)$/,"\\3",1);
gsub(/a/,"E",s);
print gensub(/(.*)([.;])(.*)$/,"\\1\\2",1) s;
next
}1' | tac
With GNU awk you can set FS with the two separators, then gsub for the replacement in $3, the third field, if NR>1
awk -v FS=";|[.]" 'NR>1 {gsub("a", "E",$3)}1' OFS="." file
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
With GNU awk for the 3rd arg to match() and gensub():
$ awk -v n=2 '
NR>n { print p[NR%n] }
{ p[NR%n] = $0 }
END {
for (i=1; i<=n; i++) {
line = p[(NR+i)%n] # oldest buffered line first, so the order is kept
match(line,/(.*[^[:alnum:]])(.*)/,a)
print a[1] gensub(/a/,"E","g",a[2])
}
}
' file
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
or with any awk:
awk -v n=2 '
NR>n { print p[NR%n] }
{ p[NR%n] = $0 }
END {
for (i=1; i<=n; i++) {
line = p[(NR+i)%n] # oldest buffered line first, so the order is kept
match(line,/.*[^[:alnum:]]/)
lastWord = substr(line,1+RLENGTH)
gsub(/a/,"E",lastWord)
print substr(line,1,RLENGTH) lastWord
}
}
' file
If you want to do it for the last 50 lines of a file instead of the last 2 lines just change -v n=2 to -v n=50.
The above assumes there are at least n lines in your input.
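If your input may be shorter than n lines, the buffering part can be written so that it only walks over lines that were actually read. This is a sketch of just the buffer, without the replacement step, and last_n is a made-up name:

```shell
# last_n prints the last n lines of stdin, in order, using the same
# NR % n ring buffer idea as the scripts above.
last_n() {
    awk -v n="$1" '
    { p[NR % n] = $0 }                      # overwrite the oldest slot
    END {
        start = (NR > n) ? NR - n + 1 : 1   # first line still buffered
        for (i = start; i <= NR; i++)
            print p[i % n]
    }'
}

printf '%s\n' one two three four | last_n 2   # prints: three, then four
printf '%s\n' one | last_n 2                  # prints: one
```

Indexing by the original line number (i % n) rather than by the slot number is what keeps the output in input order.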
You can let sed repeat changing an a into E only for the last word with a label.
tac mytext.txt| sed -r ':a; 1,2s/a(\w*)$/E\1/; ta' | tac

How can I use awk to insert something in the middle of the word?

I have an input:
This is a test
And I want to insert some letters in the middle of the word, like:
This is a teSOMETHINGst
I know I can define the needed word by $i, but how can I modify the word that way?
I'm trying to do it like that:
{
i=4 # finding somehow
print (substr($i,1,(length($i)/2)) "SOMETHING" substr($i,(length($i)/2),(length($i)/2)))
}
As I'm new to awk I wonder if it is a right way.
This may be what you're looking for:
$ awk 'match($0,/\<test\>/){mid=int(RLENGTH/2); $0=substr($0,RSTART,mid) "SOMETHING" substr($0,RSTART+mid,RLENGTH-mid)} 1'
e.g. some test cases (no pun intended):
$ echo 'This is a test' |
awk 'match($0,/\<test\>/){mid=int(RLENGTH/2); $0=substr($0,RSTART,mid) "SOMETHING" substr($0,RSTART+mid,RLENGTH-mid)} 1'
teSOMETHINGst
$ echo 'These are tests' |
awk 'match($0,/\<tests\>/){mid=int(RLENGTH/2); $0=substr($0,RSTART,mid) "SOMETHING" substr($0,RSTART+mid,RLENGTH-mid)} 1'
teSOMETHINGsts
$ echo 'These contestants are in a test' |
awk 'match($0,/\<test\>/){mid=int(RLENGTH/2); $0=substr($0,RSTART,mid) "SOMETHING" substr($0,RSTART+mid,RLENGTH-mid)} 1'
teSOMETHINGst
Assuming your requirement is to find the column containing test and do some operations on it, do a simple loop over the columns up to NF and match using the regex match operator ~ or, for fixed strings, do an equality match such as $i == "test"
awk '
{
for(i=1;i<=NF;i++) {
if ($i ~ "test") {
halfLength=(length($i)/2)
$i=(substr($i,1,halfLength) "SOMETHING" substr($i,(halfLength+1),halfLength))
}
}
}1' <<<"This is a test"
This produces the expected output. Note that I've written the substr() call for the second part of the string as substr($i,(halfLength+1),halfLength). The +1 is needed, and your attempt was missing it. I've assigned the substr() result back to the column containing test, i.e. as $i=..
Also, when doing {..}1, the record is rebuilt from its fields after any modification, in our case only to the column containing the string you wanted.
Also note that the whole attempt will fail if the target string contains an odd number of characters or is a substring of another, larger string (you could use the equality operator for that, but the regex approach would fail).
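One way around both caveats, as a sketch only: compare the field with == so that larger words are left alone, and take int() of the half length so odd lengths are split cleanly. The helper name insert_mid and its two parameters are made up for this example:

```shell
# insert_mid WORD TEXT: insert TEXT into the middle of every field
# that is exactly WORD; other fields are left untouched.
insert_mid() {
    awk -v word="$1" -v ins="$2" '{
        for (i = 1; i <= NF; i++)
            if ($i == word) {
                half = int(length($i) / 2)   # round down for odd lengths
                $i = substr($i, 1, half) ins substr($i, half + 1)
            }
        print
    }'
}

echo 'This is a test' | insert_mid test SOMETHING
echo 'These are tests' | insert_mid tests SOMETHING
echo 'These contestants are in a test' | insert_mid test SOMETHING
```

Leaving out the third argument of substr() returns everything from that position to the end of the field, so the second half never gets truncated.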
Yet another one that grew from curiosity to personal vendetta (:
$ echo This is a contestant test |
awk -v s="test" '
BEGIN {
FS=OFS=""
}
{
if(i=match($0, "(^| )" s "( |$)")) { # match over index since regex support
j=(i+length(s)/2+!!(i-1)) # !!(i-1) detect beginning of record
$j="SOMETHING" $j
}
}1'
This is a contestant teSOMETHINGst
Another one using empty separators, mostly to satisfy personal curiosity:
$ echo This is a test |
awk -v s="test" '
BEGIN {
FS=OFS="" # empty separators
}
{
if(i=index($0,s)) { # index finds the beginning of test
j=(i+length(s)/2) # midpoint
$j="SOMETHING" $j # insert string
}
}1' # output
This is a teSOMETHINGst

How to use variable including special symbol in awk?

For my case, if a certain pattern is found as the second field of one line in a file, then I need print the first two fields. And it should be able to handle case with special symbol like backslash.
My solution is first using sed to replace \ with \\, then pass the new variable to awk, then awk will parse \\ as \ then match the field 2.
escaped_str=$( echo "$pattern" | sed 's/\\/\\\\/g')
input | awk -v awk_escaped_str="$escaped_str" '$2==awk_escaped_str { $0=$1 " " $2 " "}; { print } '
However, this seems too complicated and cannot handle every case.
Is there a simpler way that also covers all other special symbols?
The way to pass a shell variable to awk without backslashes being interpreted is to pass it in the arg list instead of populating an awk variable outside of the script:
$ shellvar='a\tb'
$ awk -v awkvar="$shellvar" 'BEGIN{ printf "<%s>\n",awkvar }'
<a b>
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]=""; printf "<%s>\n",awkvar }' "$shellvar"
<a\tb>
and then you can search a file for it as a string using index() or ==:
$ cat file
a b
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } index($0,awkvar)' "$shellvar" file
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } $0 == awkvar' "$shellvar" file
a\tb
You need to set ARGV[1]="" after populating the awk variable to avoid the shell variable value also being treated as a file name. Unlike any other way of passing in a variable, ALL characters used in a variable this way are treated literally with no "special" meaning.
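Applied to the task in the question (print the first two fields when the second field equals the pattern), the ARGV approach could look roughly like this. print_first_two is a made-up name, and the trailing - is there so that awk still reads standard input after ARGV[1] has been blanked:

```shell
# print_first_two PATTERN: read lines on stdin and print fields 1 and 2
# of every line whose second field is exactly PATTERN.
print_first_two() {
    awk 'BEGIN { pat = ARGV[1]; ARGV[1] = "" }   # take the pattern literally
         $2 == pat { print $1, $2 }' "$1" -
}

# The backslash in the pattern is not interpreted anywhere along the way.
printf '%s\n' 'x a\tb rest' 'y ab rest' | print_first_two 'a\tb'
```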
There are three variations you can try without needing to escape your pattern:
This one tests literal strings. No regex instance is interpreted:
$2 == expr
This one tests if a literal string is a subset:
index($2, expr)
This one tests regex pattern:
$2 ~ pattern
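A small sketch that runs all three tests side by side. The sample values and the name compare_tests are made up; the pattern a.c is chosen because the dot is special in a regex but not in a literal comparison:

```shell
# compare_tests EXPR: for each input line, report whether field 2
# equals EXPR, contains EXPR, or matches EXPR as a regex.
compare_tests() {
    awk -v expr="$1" '{
        printf "%s: eq=%d idx=%d re=%d\n",
            $2, ($2 == expr), (index($2, expr) > 0), ($2 ~ expr)
    }'
}

printf '%s\n' 'x a.c' 'x abc' 'x za.cz' | compare_tests 'a.c'
```

Only the regex test accepts abc, because there the dot matches any character.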