awk and sprintf to zero fill

Using awk and sprintf, how can I zero fill both before and after a decimal point?
input
11
12.2
9.6
output
11.0
12.2
09.6
I can get either using these, but not both
sprintf("%.1f", $1)
output
11.0
12.2
9.6
sprintf("%03d", $1)
output
011
012
009

x = sprintf("%06.3f", 1.23)
The width (6) counts every character, including the decimal point, so the value is zero-filled on the left while keeping 3 digits after the point. Output:
$ awk 'BEGIN{x = sprintf("%06.3f", 1.23); print x}'
01.230
$
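Applied to the question's input, the same format zero-fills both sides once the width is large enough to cover the decimal point and the fractional digits:
$ awk '{ printf "%06.3f\n", $0 }' file
11.000
12.200
09.600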

I really can't tell from your question but maybe one of these does whatever it is you want:
$ cat file
11
12.2
9.6
$ awk '{ x=sprintf("%03d",$0*10); print x }' file
110
122
096
$ awk '{ x=sprintf("%04.1f",$0); print x }' file
11.0
12.2
09.6
Obviously you could just use printf with no intermediate variable but you asked for sprintf().
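For instance, the second format above reduces to:
$ awk '{ printf "%04.1f\n", $0 }' file
11.0
12.2
09.6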


combining and processing 2 tab separated files in awk and making a new one

I have 2 tab-separated files with 2 columns: column 1 is a number and column 2 is an ID, like these 2 examples:
example file1:
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
example file2:
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
I want to find the common rows of the 2 files based on column 2 (the ID column) and make a new tab-separated file with 4 columns: column 1 is the ID (the common ID), column 2 is the number from file1, column 3 is the number from file2, and column 4 is the log2 of the ratio of columns 2 and 3 (that is, log2(column2/column3)). For example, for the ID "TPT1": column 1 is TPT1, column 2 is 188, column 3 is 2217, and column 4 is log2(188/2217), which is equal to -3.561494.
Here is the expected output:
TPT1 188 2217 -3.561494
ATP5C1 420 223 0.9133394
I am trying to do that in AWK using the following code:
awk 'NR==FNR { n[$2]=$0;next } ($2 in n) { print n[$2 '\t' $1] '\t' $1 '\t' log(n[$1]/$1)}' file1.txt file2.txt > result.txt
This code does not return what I expect. Do you know how to fix it?
Your quoting is broken (each embedded '\t' ends the single-quoted awk program, so awk sees a bare \t), the array is indexed by ID but log(n[$1]/$1) looks it up by number, and awk's log() is the natural logarithm, so you need to divide by log(2) to get log2:
$ awk -v OFS="\t" 'NR==FNR {n[$2]=$1;next} ($2 in n) {print $2, $1, n[$2], log(n[$2]/$1)/log(2)}' file1 file2
TPT1 2217 188 -3.5598
ATP5C1 223 420 0.913346
I'd use join to actually merge the files instead of awk:
$ join -j2 <(sort -k2 file1.txt) <(sort -k2 file2.txt) |
awk -v OFS="\t" '{ print $1, $2, $3, log($2/$3)/log(2) }'
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
The join program, well, joins two files on a common value. It does require the files to be sorted based on the join column, but your examples aren't, hence the inline sorting of the data files. Its output is then piped to awk to compute the log2 of the numbers of each line and produce tab-delimited results.
Alternative using perl, which gives you more default precision if you care about that (and don't want to mess with awk's OFMT variable):
$ join -j2 <(sort -k2 a.txt) <(sort -k2 b.txt) |
perl -lane 'print join("\t", @F, log($F[1]/$F[2])/log(2))'
ATP5C1 420 223 0.913345617745818
TPT1 188 2217 -3.55980420318967
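If you would rather get the extra digits from awk itself, raise OFMT, the format print uses for non-integral numbers ("%.6g" by default):
$ awk 'BEGIN { OFMT = "%.15g"; print log(188/2217)/log(2) }'
-3.55980420318967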
awk + sort approach: tag each line with its filename, sort on the ID (and filename) columns so that matching IDs land on adjacent lines, then print whenever the previous line carries the same ID:
awk ' { print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk ' {c=$2;if(c==p) { print c,a,$1,log(a/$1)/log(2) }p=c;a=$1 } '
with the given inputs
$ cat ellyx.txt
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
$ cat ellyy.txt
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
$ awk ' { print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk ' {c=$2;if(c==p) { print c,a,$1,log(a/$1)/log(2) }p=c;a=$1 } '
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
$

Use standard output as an input in awk [duplicate]

I have a file that I cannot modify due to permission issues.
So I can only view the file with a tool whose view option prints its content to standard output as 8 tab-delimited columns:
foo-tool view file1.txt
ALICE . CANDY 1990 . 76 76 78
MARK . CARAMEL 1991 . 45 88 88
CLAIRE . SALTY XXX . 77 82 12
I have another file whose 1st, 6th and 7th columns I want to compare with the 1st, 6th and 7th columns of file1.txt, appending the 3rd and 4th columns in case of any match in these columns.
file2.txt
ALICE . CANDY 1990 . 76 76 97
MARK . CARAMEL 1991 . 45 88 87
BLAINE . SALTY XXX . 77 82 10
If I were able to open file1.txt directly rather than only via its standard output, I would do:
awk 'NR==FNR { a[$1,$6,$7] = $0; next }($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }' file1.txt file2.txt
So the output would be :
ALICE . CANDY 1990 . 76 76 78 CANDY 1990
MARK . CARAMEL 1991 . 45 88 88 CARAMEL 1991
But since I cannot use the standard output of file1.txt as a file, I am stuck on how to proceed.
I tried piping its standard output into awk, but it did not work:
foo-tool view file1.txt | awk 'NR==FNR { a[$1,$6,$7] = $0; next }($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }' ARG[$1] file2.txt
How can I pass the standard output as a file input in awk as one of the files?
"-" (a dash) is a special filename that means standard input. This convention is used in many Unix tools, and especially awk
Your command line must then be:
$ foo-tool view file1.txt | awk '{ your_program }' - file2.txt
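With the program from your question, that becomes:
$ foo-tool view file1.txt |
  awk 'NR==FNR { a[$1,$6,$7] = $0; next } ($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }' - file2.txt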
Alternatively, if your system supports it (Linux does), you can use the /dev/stdin file:
$ foo-tool view file1.txt | awk '{ your_program }' /dev/stdin file2.txt
You can also use "process substitution" if your shell supports it (bash, ksh and zsh do):
$ awk '{ your_program }' <(foo-tool view file1.txt) file2.txt
It may be useful if you have to process the output of several distinct commands, like:
$ awk '{ your_program }' <(foo-tool view file1.txt) <(foo-tool view file2.txt)

Splitting a column vertically using AWK

If I have +2, I want this to be + 2 as separate columns. I am doing this for a large column, so I cannot do it manually.
Edit #1
cat maser_neg_test.txt | awk '{print NR, $0}' |
  awk '{print $1, $2, ((15 * $3) + ((1/4) * $4) + ((1/240) * $5)), (($6) + ($7/60) + ($8/3600)), $9}' |
  awk '{printf "%s %-15s %-10s %-10s %-6s\n", $1, $2, $3, $4, $5}' > maser_neg_test2.txt
is my code, which transforms
RXSJ00001+0523 00 00 11.78 +05 23 17.4 11992 2016-02-12 51.3 3 10.9 10631 13365
KUG2358+330 00 00 58.10 +33 20 38.0 12921 2012-11-17 36.5 8 4.0 11461 14395
0001233+4733537 00 01 23.30 +47 33 53.7 5237 2010-11-02 39.5 10 3.6 3848 6639 3.5 6358 9196
NGC-7805 00 01 26.76 +31 26 01.4 4850 2006-01-05 43.8 5 6.0 3464 6248 5.6 5968 8799
into
1 RXSJ00001+0523 0.04908 5.38817 11992
2 KUG2358+330 0.24208 33.3439 12921
3 0001233+4733537 0.34708 47.5649 5237
4 NGC-7805 0.36150 31.4337 4850
but my research advisor noted a problem with my conversion of dec:
1*(hr) = degree_1
(1/60) * (min) = degree_2
(1/3600) * (sec) = degree_3
degree_1 + degree_2 + degree_3 = dec (degrees)
where the data +05 23 17.4 is hr min sec. Simply adding the three terms does not combine them correctly when the sign is negative, so I'm trying to pull out the sign before doing my calculations and then re-apply it.
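Concretely, for -05 23 17.4 the minutes and seconds must be combined with the degrees before the sign is applied; a quick check in awk:
$ awk 'BEGIN { print -(5 + 23/60 + 17.4/3600), -5 + 23/60 + 17.4/3600 }'
-5.38817 -4.61183
The first value is the correct declination; the second is what naive addition produces.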
Edit #2
Here is an example of some of the negative cases; sorry, this is my first post and I wasn't really sure how to format it at first.
NGC-23 00 09 53.42 +25 55 25.5 4565 2005-12-18 44.2 30 2.5 3182 5961 2.3 5681 8506
UM207 00 10 06.63 -00 26 09.4 9648 2010-01-10 25.2 10 2.1 8218 11091 2.1 10802 13723
MARK937 00 10 09.99 -04 42 38.0 8846 2016-02-04 42.5 10 4.4 7512 10192
Mrk937 00 10 10.01 -04 42 37.9 8851 2003-11-01 60.4 24 4.1 7428 10286
NGC-26 00 10 25.86 +25 49 54.6 4589 2005-12-14 41.2 5 5.7 3205 5985 5.1 5705 8531
I think you are overcomplicating things a lot by using multiple layers of awk (and unnecessary cat), and thinking of how to "split columns vertically" rather than just solving the problem, which seems to be that for a negative sign you should subtract, rather than add, the minutes and seconds.
So, use intermediate variables and check for the sign ($5 ~ /^-/):
awk '{ deg = $6/60 + $7/3600; deg = ($5 ~ /^-/) ? $5 - deg : $5 + deg;
printf "%s %-15s %-10s %-10s %-6s\n",
NR, $1, ((15 * $2) + (1/4 * $3) + (1/240 * $4)), deg, $8
}' maser_neg_test.txt
(edit: As pointed out by the OP, the original test $5 < 0 would fail when that field was -0.)
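The -0 trap is easy to demonstrate: a field holding "-00" is a numeric string, so the < comparison is numeric and -0 equals 0, while the regex test still sees the sign:
$ echo "-00 26 09.4" | awk '{ print ($1 < 0), ($1 ~ /^-/) }'
0 1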
Try something like this:
echo '+2' | awk -v FS="" '{print $1" "$2}'
Result:
+ 2
If you have a text file (test.txt) with information such as
+2
-3
+4
+5
and you need output like so:
+ 2
- 3
+ 4
+ 5
Try this:
awk -v FS="" '{print $1" "$2}' test.txt
As two commenters have mentioned, it would be good for you to add some example data and the output that you desire. The answer above is just one of the many ways you can format your data.
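One portability note: splitting on an empty FS (one character per field) is an extension (gawk supports it; POSIX leaves it unspecified). If that matters, substr() does the same job in any awk:
$ awk '{print substr($0,1,1), substr($0,2)}' test.txt
+ 2
- 3
+ 4
+ 5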
EDIT
In your particular example, you could just use sed instead of cat'ing the file like so:
sed 's_+__g' test.txt | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
sed will replace + in your file with nothing and then send the output to awk. If you have - also, you can perhaps remove them by either using sed creatively or double-sed'ing like so:
sed 's_+__g' test.txt | sed 's_-__g' | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
In the scenario above, you may end up removing + and - characters that are actually wanted in the first column (the object names contain them too).
You can split the field with the sign into an array using match() (the three-argument form is a gawk extension). You can keep the first array element as the sign and the second array element as the value:
$ awk '{match($6,/([+-])(.*)/,m);print "m[1]=",m[1]," m[2]=",m[2];print m[1] m[2]+$7/60+$8/3600}' <<<"1 RXSJ00001+0523 00 00 11.78 -05 23 17.4"
#Output
m[1]= - m[2]= 05
-5.38817
Thus you can make all the calculations using m[2] instead of $6.
If you need to print the sign, you just need to print m[1] before m[2].
PS: By omitting the comma in print and using a space you force concatenation (see my example above).
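For the question's actual data layout (sign and degrees in $5, minutes in $6, seconds in $7), a minimal sketch of the same idea, again assuming gawk for the three-argument match():
$ awk '{ match($5, /([+-]?)([0-9]+)/, m)   # m[1] = sign, m[2] = degrees
         d = m[2] + $6/60 + $7/3600        # combine the magnitude first
         print $1, (m[1] == "-" ? -d : d)
       }' <<< "UM207 00 10 06.63 -00 26 09.4 9648"
UM207 -0.435944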

extracting data from a column based on another column

I have some files as shown below. I would like to extract the values of $5 based on $1.
file1
sam 60.2 143 40.4 19.8
mathew 107.9 144 35.6 72.3
baby 48.1 145 17.8 30.3
rehna 47.2 146 21.2 26.0
sam 69.9 147 .0 69.9
file2
baby 58.9 503 47.5 11.4
daisy 20.8 504 20.4 .4
arch 61.1 505 12.3 48.8
sam 106.6 506 101.6 5.0
rehna 73.5 507 35.9 37.6
sam 92.0 508 61.1 30.9
I used the following code to extract $5.
awk '$1 == "rehna" { print $5 }' *
awk '$1 == "sam" { print $5 }' *
I would like to get the output as shown below
rehna sam
26.0 19.8
37.6 69.9
5.0
30.9
How do I achieve this? Your suggestions would be appreciated!
The simplest is probably to paste the results together:
#!/bin/bash
function myawk {
awk -v name="$1" 'BEGIN {print name} $1 == name { print $5 }' file1 file2
}
paste <(myawk rehna) <(myawk sam)
Running this produces the results you requested (with TAB as the separator character). See paste documentation for other options.
Update: peak's answer has since wrapped this approach in a function, in the spirit of DRY. If you want more background information, read on.
Assuming Bash, Ksh, or Zsh as the shell:
printf '%s\t%s\n' 'rehna' 'sam'
paste \
<(awk '$1 == "rehna" { print $5 }' *) \
<(awk '$1 == "sam" { print $5 }' *)
The above produces tab-separated output.
paste is a POSIX utility that outputs corresponding lines from its input files, by default separated with tabs; e.g., paste fileA fileB yields:
<line 1 from fileA>\t<line 1 from fileB>
<line 2 from fileA>\t<line 2 from fileB>
...
If any input file runs out of lines, it supplies empty lines.
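For example, with ragged inputs shaped like the ones here:
$ paste <(printf '26.0\n37.6\n') <(printf '19.8\n69.9\n5.0\n30.9\n')
26.0	19.8
37.6	69.9
	5.0
	30.9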
In the case at hand, the respective outputs from the awk commands are used as input files, using process substitution (<(...)).

Awk pass variable containing columns to be printed

I want to pass awk a variable containing which columns to print from a file.
In this trivial case file.txt contains a single line
11 22 33 44 55
This is what I've tried:
awk -v a='2/4' -v b='$2/$4' '{print a"\n"$a"\n"b"\n"$b}' file.txt
output:
2/4
22
$2/$4
11 22 33 44 55
desired output:
0.5
Is there any way to do this type of "eval" of a variable as a command?
Here is one method for dividing columns specified in variables (your attempt printed 22 and the whole line because awk converts the string "2/4" to the number 2, making $a the same as $2, while "$2/$4" converts to 0, making $b the same as $0):
$ awk -v num=2 -v denom=4 '{print $num/$denom}' file.txt
0.5
If you trust the person who creates the shell variable b, then here is a method that offers flexibility:
$ b='$2/$4'; awk "{print $b}" file.txt
0.5
$ b='$1*$2'; awk "{print $b}" file.txt
242
$ b='$2,$2/$4,$5'; awk "{print $b}" file.txt
22 0.5 55
The flexibility here is due to the fact that b can contain any awk code. This approach requires that you trust the creator of b.
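If you would rather not execute arbitrary code from a variable, a middle ground is to pass only column numbers and parse them inside awk; a minimal sketch, assuming b always holds exactly two column numbers separated by /:
$ awk -v b='2/4' '{ split(b, c, "/"); print $(c[1]) / $(c[2]) }' file.txt
0.5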