awk - how to specify field separator as binary value 0x1 - awk

Is it possible to specify the separator field FS in binary for awk?
I have data file with ascii data fields but separated by binary delimiter 0x1.
If it was character '1' it would look like this:
awk -F1 '/FIELD/ { print $1 }'
Or in script:
#!/bin/awk -f
BEGIN { FS = "1" }
/FIELD/ { print $1 }
How can I specify FS/F to be 0x1.

#!/bin/awk -f
BEGIN { FS = "\x01" }
/FIELD/ { print $1 }
See http://www.gnu.org/manual/gawk/html_node/Escape-Sequences.html.

awk -F '\x01' '/FIELD/ { print $1 }'

works on mawk, gawk, or nawk :
awk -F'\1' '{ … }'
awks can properly decipher the octal code without needing help from the shell like
awk -F$'\1' '{ … }'

Related

Split multiple column with awk

I need to split a file with multiple columns that looks like this:
TCONS_00000001 q1:Ovary1.13|Ovary1.13.1|100|32.599877 q2:Ovary2.16|Ovary2.16.1|100|88.36
TCONS_00000002 q1:Ovary1.19|Ovary1.19.1|100|12.876644 q2:Ovary2.15|Ovary2.15.1|100|365.44
TCONS_00000003 q1:Ovary1.19|Ovary1.19.2|0|0.000000 q2:Ovary2.19|Ovary2.19.1|100|64.567
Output needed:
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
My attempt:
awk 'BEGIN {OFS=FS="\t"}{split($2,two,"|");split($3,thr,"|");print $1,two[2],two[4],thr[2],thr[4]}' in.file
Problem:
I have many more columns to split like 2 and 3, I would like to find a shorter solutions than splitting every column one by one.
While Sundeep's answer is great, if you are planning for a redundant action on a set of records, suggest using a function and run it on each record.
I would write an awk script as below
#!/usr/bin/env awk
function split_args(record) {
n=split(record,split_array,"[:|]")
return (split_array[3]"\t"split_array[n])
}
BEGIN { FS=OFS="\t" }
{
for (i=2;i<=NF;i++) {
$i=split_args($i)
}
print
}
and invoke it as
awk -f script.awk inputfile
An ugly command-line version of it would be
awk 'function split_args(record) {
n=split(record,split_array,"[:|]")
return (split_array[3]"\t"split_array[n])
}
BEGIN { FS=OFS="\t" }
{
for (i=2;i<=NF;i++) {
$i=split_args($i)
}
print
}
' newfile
$ # borrowing simplicity from #Inian's answer ;)
$ awk 'BEGIN{FS=OFS="\t"}
{for(i=2; i<=NF; i++){split($i,a,/[:|]/); $i=a[3] "\t" a[5]}} 1' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
$ # previous solution which leaves tab character at end
$ awk -F'\t' '{printf "%s\t",$1;
for(i=2; i<=NF; i++){split($i,a,/[:|]/); printf "%s\t%s\t",a[3],a[5]};
print ""}' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567

Using a field separator of the two-character string "\n" in awk

My input file has a plain-text representation of the newline character in it separating the fields:
First line\nSecond line\nThird line
I would expect the following to replace that text \n with a newline:
$ awk 'BEGIN { FS = "\\n"; OFS = "\n" } { print $1 }' test.txt
First line\nSecond line\nThird line
But it doesn't (gawk 4.0.1 / OpenBSD nawk 20110810).
I'm allowed to separate on just the \:
$ awk 'BEGIN { FS = "\\"; OFS = "\n" } { print $1, $2 }' test.txt
First line
nSecond line
I can also use a character class in gawk:
$ awk 'BEGIN { FS = "[[:punct:]]n"; OFS = "\n" } { $1 = $1; print $0 }' test.txt
First line
Second line
Third line
But I feel like I should be able to specify the exact separator.
A field separator is a type of regexp and when using a dynamic regexp you need to double escape everything:
$ awk 'BEGIN { FS = "\\\\n"; OFS = "\n" } { print $1 }' file
First line
See the man page for details.
Here sed might be a better tool for this task
sed 's/\\n/\n/g'

How to translate a column value in the file using awk with tr command in unix

Details:
Input file : file.txt
P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2
Expected output:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
If i try using a variable it is giving proper result.
(i.e) /tmp>echo "P123456789"|tr "0-9" "5-9"|tr "A-Z" "X-Z"
Z678999999
But if i do with awk command it is not giving result instead giving error:
/tmp>$ awk 'BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }' /tmp/file.txt >/tmp/file.txt.tmp
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
Can anyone help please?
just do what you wanted, without changing your logic:
awk line:
awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
with your data:
kent$ echo "P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2"|awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
$ cat tst.awk
function tr(old,new,str, oldA,newA,strA,i,j) {
split(old,oldA,"")
split(new,newA,"")
split(str,strA,"")
str = ""
for (i=1;i in strA;i++) {
for (j=1;(j in oldA) && !sub(oldA[j],newA[j],strA[i]);j++)
;
str = str strA[i]
}
return str
}
BEGIN { FS=OFS="," }
{ print tr("P012345678","Z567899999",$1), $2 }
$ awk -f tst.awk file
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
Unfortunately, AWK does not have a built in translation function. You could write one like Ed Morton has done, but I would reach for (and highly recommend) a more powerful tool. Perl, for example, can process fields using the autosplit (-a) command switch:
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the #F array is done as the first thing inside the
implicit while loop produced by the -n or -p.
You can type perldoc perlrun for more details.
Here's my solution:
perl -F, -lane '$F[0] =~ tr/0-9/5-9/; $F[0] =~ tr/A-Z/X-Z/; print join (",", #F)' file.txt
Results:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2

awk command to split nth field

I am learning AWK and was trying some exercises on built-in string functions.
Here's my exercise:
I have a file containing as below
RecordType:83
1,2,3,a|x|y|z,4,5
And my desired output is as below:
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
I wrote an awk command for the above output.
awk -F',' '$1 ~ /RecordType:83/{print $0}
$1 == 1{
split($4,splt,"|")
for(i in splt)
{
if(i==1)
print $1,$2,$3,splt[i],$5,$6
else
print $1,0,0,splt[i],$5,$6
}
}' OFS=, file_name
The above command looks so clumsy. Is there any way minimizing the command?
Thanks in advance
The shortest possible one-liner I could manage:
awk -F, 'NR>1{n=split($4,a,"|");for(;i++<n;){$4=a[i];print;$2=$3=0}}NR==1' OFS=, file
RecordType:83    
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
The much more readable script (recommended):
BEGIN {
FS=OFS="," # Comma delimiter
}
NR==1 { # If the first line in file
print $0 # Print the whole line
next # Skip to next line
}
{
n=split($4,a,"|") # Split field four on |
for(i=1;i<=n;i++) # For each sub-field
print $1,i==1?$2OFS$3:"0"OFS"0",a[i],$5,$6 # Print the output
}
another shorter one-liner
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
with your example:
kent$ awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5

How to print out a specific field in AWK?

A very simple question, which a found no answer to. How do I print out a specific field in awk?
awk '/word1/', will print out the whole sentence, when I need just a word1. Or I need a chain of patterns (word1 + word2) to be printed out only from a text.
Well if the pattern is a single word (which you want to print and can't contaion FS (input field separator)) why not:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print MYPATTERN }' INPUTFILE
If your pattern is a regex:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print gensub(".*(" MYPATTERN ").*","\\1","1",$0) }' INPUTFILE
If your pattern must be checked in every single field:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN {
for (i=1;i<=NF;i++) {
if ($i ~ MYPATTERN) { print "Field " i " in " NR " row matches: " MYPATTERN }
}
}' INPUTFILE
Modify any of the above to your taste.
The fields in awk are represented by $1, $2, etc:
$ echo this is a string | awk '{ print $2 }'
is
$0 is the whole line, $1 is the first field, $2 is the next field ( or blank ),
$NF is the last field, $( NF - 1 ) is the 2nd to last field, etc.
EDIT (in response to comment).
You could try:
awk '/crazy/{ print substr( $0, match( $0, "crazy" ), RLENGTH )}'
i know you can do this with awk :
an alternative would be :
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
or you can use grep -o
Something like this perhaps:
awk '{split("bla1 bla2 bla3",a," "); print a[1], a[2], a[3]}'