How to translate a column value in a file using awk with the tr command in Unix

Details:
Input file : file.txt
P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2
Expected output:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
If I run the translation on a single value from the shell, it gives the proper result:
/tmp>echo "P123456789"|tr "0-9" "5-9"|tr "A-Z" "X-Z"
Z678999999
But if I do the same inside an awk command, it gives syntax errors instead of the result:
/tmp>$ awk 'BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }' /tmp/file.txt >/tmp/file.txt.tmp
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
Can anyone help please?

You can do what you wanted without changing your logic, by building the shell command as a string and piping it into getline:
awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
with your data:
kent$ echo "P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2"|awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
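A variation on the getline approach above, offered as a sketch rather than a definitive fix. awk keeps each distinct command string open until it is closed, so on a large file it may help to call close() after every read. The function name translate_col1 is hypothetical. The sketch assumes trusted input (the field value is pasted into a shell command line) and a tr that pads the shorter second set with its last character, as the original command relies on.

```shell
# Hypothetical helper: run the echo|tr|tr pipeline once per record and close it.
# Assumption: $1 contains no shell metacharacters (it is interpolated into a command).
translate_col1() {
  awk -F, -v OFS=, '{
    cmd = "echo \"" $1 "\" | tr 0-9 5-9 | tr A-Z X-Z"
    if ((cmd | getline out) > 0) $1 = out   # read the single line the pipeline prints
    close(cmd)                              # release the pipe before the next record
    print
  }' "$@"
}

printf 'P123456789,COLUMN2\nP123456790,COLUMN2\nP123456791,COLUMN2\n' | translate_col1
```

Spawning three processes per line is slow, so this is only reasonable for small files.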

$ cat tst.awk
function tr(old,new,str,   oldA,newA,strA,i,j) {
    split(old,oldA,"")
    split(new,newA,"")
    split(str,strA,"")
    str = ""
    for (i=1;i in strA;i++) {
        for (j=1;(j in oldA) && !sub(oldA[j],newA[j],strA[i]);j++)
            ;
        str = str strA[i]
    }
    return str
}
BEGIN { FS=OFS="," }
{ print tr("P012345678","Z567899999",$1), $2 }
$ awk -f tst.awk file
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
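The same character mapping can also be sketched with index() and substr() instead of split() and sub(). This is an alternative under stated assumptions, not the answer's method: it avoids relying on split() with an empty separator (which, as far as I know, POSIX leaves unspecified) and avoids treating the characters as regular expressions. The wrapper name awk_tr is hypothetical.

```shell
# Hypothetical wrapper: translate column 1 character by character in plain awk.
awk_tr() {
  awk -F, -v OFS=, '
    # Map each character of s found in "from" to the character at the
    # same position in "to"; characters not listed are kept unchanged.
    function tr(from, to, s,    i, c, p, out) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        p = index(from, c)
        out = out (p ? substr(to, p, 1) : c)
      }
      return out
    }
    { $1 = tr("P012345678", "Z567899999", $1); print }
  ' "$@"
}

printf 'P123456789,COLUMN2\nP123456790,COLUMN2\nP123456791,COLUMN2\n' | awk_tr
```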

Unfortunately, AWK does not have a built-in translation function. You could write one like Ed Morton has done, but I would reach for (and highly recommend) a more powerful tool. Perl, for example, can process fields using the autosplit (-a) command switch:
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
You can type perldoc perlrun for more details.
Here's my solution:
perl -F, -lane '$F[0] =~ tr/0-9/5-9/; $F[0] =~ tr/A-Z/X-Z/; print join (",", @F)' file.txt
Results:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2

Related

Split multiple column with awk

I need to split a file with multiple columns that looks like this:
TCONS_00000001 q1:Ovary1.13|Ovary1.13.1|100|32.599877 q2:Ovary2.16|Ovary2.16.1|100|88.36
TCONS_00000002 q1:Ovary1.19|Ovary1.19.1|100|12.876644 q2:Ovary2.15|Ovary2.15.1|100|365.44
TCONS_00000003 q1:Ovary1.19|Ovary1.19.2|0|0.000000 q2:Ovary2.19|Ovary2.19.1|100|64.567
Output needed:
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
My attempt:
awk 'BEGIN {OFS=FS="\t"}{split($2,two,"|");split($3,thr,"|");print $1,two[2],two[4],thr[2],thr[4]}' in.file
Problem:
I have many more columns to split like 2 and 3, I would like to find a shorter solutions than splitting every column one by one.
While Sundeep's answer is great, if you plan to repeat the same action on every field of each record, I suggest writing a function and running it on each field.
I would write an awk script as below
#!/usr/bin/env awk
function split_args(record) {
    n=split(record,split_array,"[:|]")
    return (split_array[3]"\t"split_array[n])
}
BEGIN { FS=OFS="\t" }
{
    for (i=2;i<=NF;i++) {
        $i=split_args($i)
    }
    print
}
and invoke it as
awk -f script.awk inputfile
An ugly command-line version of it would be
awk 'function split_args(record) {
    n=split(record,split_array,"[:|]")
    return (split_array[3]"\t"split_array[n])
}
BEGIN { FS=OFS="\t" }
{
    for (i=2;i<=NF;i++) {
        $i=split_args($i)
    }
    print
}
' newfile
$ # borrowing simplicity from @Inian's answer ;)
$ awk 'BEGIN{FS=OFS="\t"}
{for(i=2; i<=NF; i++){split($i,a,/[:|]/); $i=a[3] "\t" a[5]}} 1' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
$ # previous solution which leaves tab character at end
$ awk -F'\t' '{printf "%s\t",$1;
for(i=2; i<=NF; i++){split($i,a,/[:|]/); printf "%s\t%s\t",a[3],a[5]};
print ""}' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and process each line of that command's output in the rest of the script?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using pipe, specifying instead the echo commands as a system command.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my usecase, this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script would read from stdin; with the exec line it reads the output of the command on that line as its input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
    if ( ("mktemp" | getline file) > 0 ) {
        system("(echo abc; echo def) > " file)
        ARGV[ARGC++] = file
    }
    close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
    if (file!="") {
        system("rm -f \"" file "\"")
    }
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
    cmd = "echo abc; echo def"
    # the line below stores each line of the output of cmd in the variable line
    while ( ( cmd | getline line) > 0){
        # we need a counter because NR will not work for us
        counter++;
        # if the line contains the letter d
        if ( line ~ /d/){
            print counter":"line
        }
    }
}' <<< ''
2:def
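The same loop can be sketched in a BEGIN block, which may be more convenient because awk then needs no input at all, so the empty here-string is not required. This is a variation on the answer above, not a replacement for it; the function name filter_cmd_output is hypothetical.

```shell
# Hypothetical helper: read a command's output from inside BEGIN.
filter_cmd_output() {
  awk 'BEGIN {
    cmd = "echo abc; echo def"
    while ((cmd | getline line) > 0) {   # one iteration per line of output
      n++                                # manual line counter; NR is not updated here
      if (line ~ /d/) print n ":" line
    }
    close(cmd)                           # close the pipe once the output is exhausted
  }'
}

filter_cmd_output
```

With only a BEGIN block, awk exits without reading stdin, which is why no input redirection is needed.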

Redirecting output of command to file in awk

I need to redirect output of only a particular command within awk to a file.
I need to access the array outside the if.
What I've tried:
awk '
{
if (condition)
{
array[FNR]=$1;
print array[FNR];
df home/user/loc1 home/user/loc2 > file1.txt
}
fi
print array[]
}' /home/user/testfile.txt
Errors I'm getting:
awk: cmd. line:18: df home/user/loc1 home/user/loc2 > file1.txt
awk: cmd. line:18:
^ syntax error
awk: cmd. line:20: print array[]
awk: cmd. line:20: ^ syntax error
awk: cmd. line:20: fatal: invalid subscript expression
I believe this is what you are looking for:
awk '
{
    if (condition)
    {
        array[FNR]=$1;
        print array[FNR] # append >> "file2.txt" here if you want to print to a file
        # the redirection must be inside the command string so that the shell performs it
        system("df home/user/loc1 home/user/loc2 >> file1.txt") # Not sure why you want to run df shell command inside awk
    }
}
END { # after the entire test file is read you can do your calculations below
    for (i in array) print array[i];
}' /home/user/testfile.txt
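To make the two kinds of redirection concrete, here is a small self-contained sketch. It is illustrative only: the sample data, the file names, the function name demo_redirect, and the use of echo as a stand-in for df are all assumptions. awk's own print can append to a file named by a string, while a shell command's output is redirected inside the string passed to system().

```shell
# Hypothetical demo: awk-side redirection versus shell-side redirection.
demo_redirect() {
  outdir=$(mktemp -d) || return 1
  printf 'a 1\nb 2\nc 3\n' |
  awk -v listing="$outdir/listing.txt" -v picked="$outdir/picked.txt" '
    $2 >= 2 {
      array[NR] = $1
      print $1 >> picked        # awk redirection: the target is a string expression
    }
    END {
      # shell redirection: it lives inside the command string given to system()
      system("echo command output >> \"" listing "\"")
      n = 0
      for (i in array) n++      # the array is still available after the main block
      print n
    }'
  cat "$outdir/picked.txt" "$outdir/listing.txt"
  rm -rf "$outdir"
}

demo_redirect
```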

AWK - execute string as command?

This command prints:
$ echo "123456789" | awk '{ print substr ($1,1,4) }'
1234
Is it possible to execute a string as command? For example, this command:
echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
Result:
$ echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
awk: {a="substr"; print a ($1,1,4) }
awk: ^ syntax error
EDIT:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
bolek@bolek-desktop:~/Pulpit$ echo "123456789" | gawk -f tst.awk
gawk: tst.awk:3: { a="my_substr"; print @a($1,1,4) }
gawk: tst.awk:3: ^ invalid char '@' in expression
bolek@bolek-desktop:~/Pulpit$
I don't think it is possible to do that in awk directly, but you can get a similar effect by using the shell. Recall that the awk program is given as a string, and strings are concatenated in the shell just by writing them next to one another. Thus, you can do this:
a=substr
echo "123456789" | awk '{ print '"$a"'($1, 1, 4) }'
resulting in 1234.
You can call user-defined functions via variables in GNU awk using indirect function calls, see http://www.gnu.org/software/gawk/manual/gawk.html#Indirect-Calls
$ cat tst.awk
function foo() { print "foo() called" }
function bar() { print "bar() called" }
BEGIN {
    the_func = "foo"
    @the_func()
    the_func = "bar"
    @the_func()
}
$ gawk -f tst.awk
foo() called
bar() called
Unfortunately due to internal implementation issues, if you want to call builtin functions that way then you need to write a wrapper for each:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
$ echo "123456789" | gawk -f tst.awk
1234
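Where gawk's indirect calls are not available (the error in the question suggests an awk that does not accept the @ syntax), a plain dispatch function can give a similar effect in any awk. This is only a sketch: the names apply_fn and call are hypothetical, and the set of supported function names is an arbitrary choice for illustration.

```shell
# Hypothetical helper: choose a function by name at run time without indirect calls.
apply_fn() {
  # the first shell argument is the function name; input lines come from stdin
  awk -v fn="$1" '
    function call(name, s) {
      if (name == "substr")  return substr(s, 1, 4)
      if (name == "length")  return length(s)
      if (name == "toupper") return toupper(s)
      return "unknown function: " name
    }
    { print call(fn, $1) }'
}

echo 123456789 | apply_fn substr
```

The trade-off is that every callable name has to be listed explicitly, but nothing here depends on a particular awk implementation.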

awk - how to specify field separator as binary value 0x1

Is it possible to specify the separator field FS in binary for awk?
I have data file with ascii data fields but separated by binary delimiter 0x1.
If it was character '1' it would look like this:
awk -F1 '/FIELD/ { print $1 }'
Or in script:
#!/bin/awk -f
BEGIN { FS = "1" }
/FIELD/ { print $1 }
How can I specify FS (or -F) as 0x1?
#!/bin/awk -f
BEGIN { FS = "\x01" }
/FIELD/ { print $1 }
See http://www.gnu.org/manual/gawk/html_node/Escape-Sequences.html.
awk -F '\x01' '/FIELD/ { print $1 }'
The following works on mawk, gawk, and nawk:
awk -F'\1' '{ … }'
These awks decode the octal escape themselves, without needing help from the shell as in:
awk -F$'\1' '{ … }'
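A quick way to check the separator on sample data is sketched below. The sample records and the function name first_field are made up for illustration; the octal escape "\001" inside an awk string is used here on the assumption that it is the most portable spelling of byte 0x1.

```shell
# Hypothetical helper: split records on the 0x1 byte and print the first field.
first_field() {
  awk 'BEGIN { FS = "\001" } /FIELD/ { print $1 }' "$@"
}

# printf emits the 0x1 byte for \001, producing two test records
printf 'FIELD_A\001second\001third\nother\001x\n' | first_field
```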