gawk sub() with ampersand and toupper() not working - awk

I'm having trouble using toupper() inside a gawk sub(). I'm using the feature that & substitutes for the matched string.
$ gawk '{sub(/abc/, toupper("&")); print $0; }'
xabcx
xabcx
I expected:
xABCx
Variants with toupper() but without & and with & but without toupper() work:
$ gawk '{sub(/abc/, toupper("def")); print $0; }'
xabcx
xDEFx
$ gawk '{sub(/abc/, "-&-"); print $0; }'
xabcx
x-abc-x
It fails similarly with tolower(). Am I misunderstanding something about how & works?
(Tested with gawk 3.1.x and the latest, 4.1.3).

I think I see what's going on: the toupper function is being evaluated first, before sub constructs the replacement string.
So you get
sub(/abc/, toupper("def")) => sub(/abc/, "DEF")
and the not-so-useful
sub(/abc/, toupper("&")) => sub(/abc/, "&")
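A quick way to see this evaluation order for yourself (a minimal sketch; the names r and result and the <z...z> wrapper are only for illustration). toupper() is handed the literal text, upper-cases what it can, and leaves the & alone; sub() expands the & afterwards:

```shell
# toupper() runs before sub() ever sees its second argument.
awk 'BEGIN { print toupper("&") }'    # the ampersand has no upper-case form

# Only the literal letters around & are upper-cased; & itself is expanded later by sub().
result=$(echo xabcx | awk '{ r = toupper("<z&z>"); sub(/abc/, r); print }')
echo "$result"    # x<ZabcZ>x
```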
To get your desired results, you have to extract the match first, upper-case it, and then perform the substitution:
$ echo foobar | gawk '{sub(/o+/, toupper("&")); print}'
foobar
$ echo foobar | gawk '{
if (match($0, /o+/, m)) {
replacement = toupper(m[0])
sub(/o+/, replacement)
}
print
}'
fOObar
Alternatively, you don't need sub at all; you can reconstruct the record directly:
$ echo foobar | gawk '{
if (match($0, /o+/, m)) {
$0 = substr($0, 1, RSTART-1) toupper(m[0]) substr($0, RSTART+RLENGTH)
}
print
}'
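Both versions above change only the first match. If every match should be upper-cased, one option is to walk the record with match() and substr(). This is a sketch that assumes the /o+/ pattern from the example and uses only POSIX awk features (RSTART and RLENGTH instead of gawk's array argument); upper_all, rest, and out are illustrative names:

```shell
# Upper-case every run of "o" in each input record.
upper_all() {
    awk '{
        rest = $0; out = ""
        while (match(rest, /o+/)) {
            # copy the text before the match, then the upper-cased match itself
            out = out substr(rest, 1, RSTART - 1) toupper(substr(rest, RSTART, RLENGTH))
            # continue scanning after the match
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

result=$(echo "foobar boo" | upper_all)
echo "$result"    # fOObar bOO
```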

Related

Multiple AWK Assignment of Variable from other action

I would like to create a single sentence (in order to partially answer my question: AWK Assignment and execute operation with variables, using split and concatenation without space).
% awk 'BEGIN { str1 = "foo"; str2 = "bar"; str3 = str1 str2; print str3 }'
foobar
That is very easy, but the values above are static.
Taking into account:
% echo $(echo "foo")
foo
Now, I would like to "calculate" the value of str1.
% awk 'BEGIN { str1 = $(echo "foo"); str2 = "bar"; str3 = str1 str2; print str3 }'
awk: illegal field $(foo), name "(null)"
source line number 1
Is it possible to assign the value of str1 dynamically (as the product of another action/command) using AWK?
With @anubhava's help, I get:
% awk -v str1="$(echo "foo")" 'BEGIN {str2 = "bar"; print str1 str2 }'
foobar
Now, how can I use the first variable as an argument when assigning the second variable?
% awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {my operation with str2 }'
But currently I get:
% awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {print str2 }'
str1bar
Partially:
% str1="$(echo 'foo')"; str2="$(echo ${str1}'bar')";awk -v result="$str2" 'BEGIN{print result}'
foobar
As I mentioned in my comment, do shell stuff in shell and awk stuff in awk. Don't try to hamfist your shell logic into your awk script.
Consider your attempt:
awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {my operation with str2 }'
You want, in shell, to echo "foo" into a variable. Then, in shell, you want to concatenate "bar" with that prior variable. So... do it in shell before calling your awk script:
str1="$(echo 'foo')"
str2="$(echo ${str1}'bar')"
awk -v foobar="$str2" 'BEGIN { my operation with foobar }'
All that -v flag does is say "Set my internal awk variable to this value" so there is no reason to try to hamfist logic into those flags.
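If the concatenation does not have to happen in the shell, it can also be done inside awk once str1 has been passed in with -v. A minimal sketch, reusing the question's str1 and str2 names (result is just a holder for the output):

```shell
# Pass the command output in with -v, then build str2 inside awk.
result=$(awk -v str1="$(echo "foo")" 'BEGIN { str2 = str1 "bar"; print str2 }')
echo "$result"    # foobar
```

The -v str2="$(echo str1)bar" attempt printed str1bar because the shell expands $(echo str1) before awk starts, and to the shell str1 is just a literal word, not the awk variable.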
You're overcomplicating things or the example you chose is too simplistic. There is no job for awk here, all can be done simply on command line.
$ str1=foo; str2=bar; echo ${str1}${str2}
foobar

Why doesn't awk and gsub remove only the dot?

This awk command:
awk -F ',' 'BEGIN {line=1} {print line "\n0" gsub(/\./, ",", $2) "0 --> 0" gsub(/\./, ",", $3) "0\n" $10 "\n"; line++}' file
is supposed to convert these lines:
Dialogue: 0,1:51:19.56,1:51:21.13,Default,,0000,0000,0000,,Hello!
into these:
1273
01:51:19,560 --> 01:51:21,130
Hello!
But somehow I'm not able to make gsub behave to replace the . by , and instead get 010 as both gsub results. Can anyone spot the issue?
Thanks
The return value from gsub is not the result from the substitution. It returns the number of substitutions it performed.
You want to gsub first, then print the modified string, which is the third argument you pass to gsub.
awk -F ',' 'BEGIN {line=1}
{ gsub(/\./, ",", $2);
gsub(/\./, ",", $3);
print line "\n0" $2 "0 --> 0" $3 "0\n" $10 "\n";
line++}' file
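To see the return value in isolation, here is a small sketch using one timestamp from the question (n and result are illustrative names). gsub() hands back the count, while the modified text lives in the target, here $0:

```shell
# n receives the number of replacements; $0 holds the edited text.
result=$(echo "1:51:19.56" | awk '{ n = gsub(/\./, ","); print n, $0 }')
echo "$result"    # 1 1:51:19,56
```

That count of 1, wedged between the literal "0" strings in the original print statement, is where the 010 came from.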
Another way is to use GNU awk's gensub instead of gsub:
$ awk -F ',' '
{
print NR ORS "0" gensub(/\./, ",","g", $2) "0 --> 0" gensub(/\./, ",","g",$3) "0" ORS $10 ORS
}' file
Output:
1
01:51:19,560 --> 01:51:21,130
Hello!
It's not as readable as the gsub solution by @tripleee, but there is a place for it.
Also, I replaced the line counter with the built-in NR and the \n's with ORS.
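gensub() is a gawk extension. In other awks a similar effect can be had with a small user-defined function: awk passes scalars by value, so gsub() inside the function changes only the local copy and the result can be returned. A sketch under that assumption; the function name swapped is made up for this example:

```shell
# Portable stand-in for gensub(/\./, ",", "g", str): edit a copy and return it.
result=$(echo 'Dialogue: 0,1:51:19.56,1:51:21.13,Default,,0000,0000,0000,,Hello!' |
awk -F ',' '
function swapped(str) { gsub(/\./, ",", str); return str }
{
    print NR ORS "0" swapped($2) "0 --> 0" swapped($3) "0" ORS $10 ORS
}')
echo "$result"
```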

Awk column with pattern array

Is it possible to do this, but with an actual array of strings where it says "array"?
array=(cat
dog
mouse
fish
...)
awk -F "," '{ if ( $5!="array" ) { print $0; } }' file
I would like to use spaces in some of the strings in my array.
I would also like to be able to match partial matches, so "snow" in my array would match "snowman"
It should be case sensitive.
Example csv
s,dog,34
3,cat,4
1,african elephant,gd
A,African Elephant,33
H,snowman,8
8,indian elephant,3k
7,Fish,94
...
Example array
snow
dog
african elephant
Expected output
s,dog,34
H,snowman,8
1,african elephant,gd
Cyrus posted this, which works well, but it doesn't allow spaces in the array strings and won't match partial matches.
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$2){next}} print}' FS=',' - file
The brief approach using a single regexp for all array contents:
$ array=('snow' 'dog' 'african elephant')
$ printf '%s\n' "${array[@]}" | awk -F, 'NR==FNR{r=r s $0; s="|"; next} $2~r' - example.csv
s,dog,34
1,african elephant,gd
H,snowman,8
Or if you prefer string comparisons:
$ cat tst.sh
#!/usr/bin/env bash
array=('snow' 'dog' 'african elephant')
printf '%s\n' "${array[@]}" |
awk -F',' '
NR==FNR {
array[$0]
next
}
{
for (val in array) {
if ( index($2,val) ) { # or $2 ~ val for a regexp match
print
next
}
}
}
' - example.csv
$ ./tst.sh
s,dog,34
1,african elephant,gd
H,snowman,8
This prints every line of the csv file except those whose column 5 exactly equals an element of the array:
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$5){next}} print}' FS=',' - file
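If the goal is the reverse, dropping rows whose second column contains any array element (partial matches and spaces included), the index() loop can be inverted. This is a sketch, not taken from the answers above: the sample rows are the question's, the temporary file and the name skip are illustrative, and the patterns are fed with a plain printf so the snippet also runs in shells without arrays (with bash, printf '%s\n' "${array[@]}" does the same job).

```shell
csv=$(mktemp)
printf '%s\n' 's,dog,34' '3,cat,4' '1,african elephant,gd' \
    'A,African Elephant,33' 'H,snowman,8' '8,indian elephant,3k' '7,Fish,94' > "$csv"

result=$(printf '%s\n' 'snow' 'dog' 'african elephant' |
awk -F',' '
NR==FNR { skip[$0]; next }              # first input (stdin): one pattern per line
{
    for (val in skip)
        if (index($2, val)) next        # case-sensitive substring hit: drop the row
    print                               # no pattern matched: keep the row
}' - "$csv")
rm -f "$csv"
echo "$result"
```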

How to suppress the original line in gawk

Why does gawk write the input line first?
ws@i7$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr); {print arr[1];}'
my output
8989889898 jAAA_ALL_filenames.txt
8989889898
I do not want the input line itself to be printed.
Thanks
Walter
You have a stray semicolon in there.
$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr); {print arr[1];}'
8989889898 jAAA_ALL_filenames.txt
8989889898
$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr) {print arr[1];}'
8989889898
The semicolon after match($0, /([X0-9\\\-]{9,13})/, arr) means that your script is effectively:
match($0, /([X0-9\\\-]{9,13})/, arr) { print $0 } # default action block inserted
1 {print arr[1];} # default condition inserted
match returns a "true" value so the whole line gets printed.
To fix it, remove the semicolon:
match($0, /([X0-9\\\-]{9,13})/, arr) {print arr[1];}
Now the code only has one condition { action } structure, as you intended, so it does what you want.
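The same effect can be reproduced with a much smaller script. A minimal sketch with a made-up input line and illustrative variable names; it works in any POSIX awk:

```shell
# With the semicolon: two rules. /a/ alone prints the record, then the bare action runs.
with_semicolon=$(echo "a b" | awk '/a/; { print $2 }')

# Without it: one rule, so only the action's output appears.
without_semicolon=$(echo "a b" | awk '/a/ { print $2 }')

printf '%s\n---\n%s\n' "$with_semicolon" "$without_semicolon"
```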

How to print out a specific field in AWK?

A very simple question, which I found no answer to: how do I print out a specific field in awk?
awk '/word1/' prints the whole line, when I need just word1. Or I need only a chain of patterns (word1 + word2) to be printed out from a text.
Well, if the pattern is a single word (which you want to print, and which can't contain FS, the input field separator), why not:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print MYPATTERN }' INPUTFILE
If your pattern is a regex:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print gensub(".*(" MYPATTERN ").*","\\1","1",$0) }' INPUTFILE
If your pattern must be checked in every single field:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN {
for (i=1;i<=NF;i++) {
if ($i ~ MYPATTERN) { print "Field " i " in " NR " row matches: " MYPATTERN }
}
}' INPUTFILE
Modify any of the above to your taste.
The fields in awk are represented by $1, $2, etc:
$ echo this is a string | awk '{ print $2 }'
is
$0 is the whole line, $1 is the first field, $2 is the next field ( or blank ),
$NF is the last field, $( NF - 1 ) is the 2nd to last field, etc.
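A short sketch of those field references on the same sample line (result is only a holder for the output):

```shell
# $1 = first field, $NF = last field, $(NF - 1) = second to last, NF = field count
result=$(echo this is a string | awk '{ print $1, $NF, $(NF - 1), NF }')
echo "$result"    # this string a 4
```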
EDIT (in response to comment).
You could try:
awk '/crazy/{ print substr( $0, match( $0, "crazy" ), RLENGTH )}'
I know you can do this with awk, but an alternative would be:
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
or you can use grep -o
Something like this perhaps:
awk '{split("bla1 bla2 bla3",a," "); print a[1], a[2], a[3]}'