Splitting a column vertically using AWK

If I have +2, I want this to be + 2, as separate columns. I am doing this for a large file, so I cannot do it manually.
Edit #1
cat maser_neg_test.txt | awk '{print NR, $0}' |
  awk '{print $1, $2, ((15 * $3) + ((1/4) * $4) + ((1/240) * $5)), (($6) + ($7/60) + ($8/3600)), $9}' |
  awk '{printf "%s %-15s %-10s %-10s %-6s\n", $1, $2, $3, $4, $5}' > maser_neg_test2.txt
is my code, which transforms
RXSJ00001+0523 00 00 11.78 +05 23 17.4 11992 2016-02-12 51.3 3 10.9 10631 13365
KUG2358+330 00 00 58.10 +33 20 38.0 12921 2012-11-17 36.5 8 4.0 11461 14395
0001233+4733537 00 01 23.30 +47 33 53.7 5237 2010-11-02 39.5 10 3.6 3848 6639 3.5 6358 9196
NGC-7805 00 01 26.76 +31 26 01.4 4850 2006-01-05 43.8 5 6.0 3464 6248 5.6 5968 8799
into
1 RXSJ00001+0523 0.04908 5.38817 11992
2 KUG2358+330 0.24208 33.3439 12921
3 0001233+4733537 0.34708 47.5649 5237
4 NGC-7805 0.36150 31.4337 4850
but my research advisor noted a problem with my conversion of dec:
1*(hr) = degree_1
(1/60) * (min) = degree_2
(1/3600) * (sec) = degree_3
degree_1 + degree_2 + degree_3 = dec (degrees)
which treats data such as +05 23 17.4 as hr min sec. Simply adding the three terms when the sign is negative does not combine them correctly, so I'm trying to pull out the sign before doing my calculations and then re-apply it.
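For example, with the negative case -04 42 38.0 (see Edit #2 below), naive addition applies the sign only to the degrees term; a quick sketch of the difference:
awk 'BEGIN {
    print -4 + 42/60 + 38/3600      # naive sum: -3.28944 (wrong)
    print -(4 + 42/60 + 38/3600)    # sign applied to the whole sum: -4.71056 (correct)
}'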
Edit #2
Below is an example of some of the negative cases; sorry, this is my first post and I wasn't really sure how to format it at first.
NGC-23 00 09 53.42 +25 55 25.5 4565 2005-12-18 44.2 30 2.5 3182 5961 2.3 5681 8506
UM207 00 10 06.63 -00 26 09.4 9648 2010-01-10 25.2 10 2.1 8218 11091 2.1 10802 13723
MARK937 00 10 09.99 -04 42 38.0 8846 2016-02-04 42.5 10 4.4 7512 10192
Mrk937 00 10 10.01 -04 42 37.9 8851 2003-11-01 60.4 24 4.1 7428 10286
NGC-26 00 10 25.86 +25 49 54.6 4589 2005-12-14 41.2 5 5.7 3205 5985 5.1 5705 8531

I think you are overcomplicating things a lot by using multiple layers of awk (and an unnecessary cat), and by thinking of how to "split columns vertically" rather than just solving the problem, which is that for a negative sign you should subtract, rather than add, the minutes and seconds.
So, use intermediate variables and check for the sign ($5 ~ /^-/):
awk '{
    deg = $6/60 + $7/3600                       # arcminutes and arcseconds as fractional degrees
    deg = ($5 ~ /^-/) ? $5 - deg : $5 + deg     # subtract the fraction for negative declinations
    printf "%s %-15s %-10s %-10s %-6s\n",
        NR, $1, ((15 * $2) + (1/4 * $3) + (1/240 * $4)), deg, $8
}' maser_neg_test.txt
(edit: As pointed out by the OP, the original test $5 < 0 would fail when that field was -0.)
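That edge case actually occurs in the Edit #2 data (UM207 has dec -00 26 09.4); a quick sketch of the difference between the two tests:
echo "-00 26 09.4" | awk '{ print ($1 < 0), ($1 ~ /^-/) }'   # prints: 0 1
Numerically, -00 is minus zero, which is not less than zero, while the regex still sees the sign.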

Try something like this:
echo '+2' | awk -v FS="" '{print $1" "$2}'
Result:
+ 2
If you have a text file (test.txt) with information such as
+2
-3
+4
+5
and you need output like so:
+ 2
- 3
+ 4
+ 5
Try this:
awk -v FS="" '{print $1" "$2}' test.txt
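Note that an empty FS (one character per field) is an extension supported by gawk and several other implementations rather than POSIX-specified behavior; a more portable sketch uses substr():
awk '{print substr($0, 1, 1), substr($0, 2)}' test.txt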
As two commenters have mentioned, it would be good for you to add some example data and the output that you desire. The answer above is just one of the many ways you can format your data.
EDIT
In your particular example, you could just use sed instead of cat'ing the file like so:
sed 's_+__g' test.txt | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
sed will replace + in your file with nothing and then send the output to awk. If you have - also, you can perhaps remove them by either using sed creatively or double-sed'ing like so:
sed 's_+__g' test.txt | sed 's_-__g' | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
In the scenario above, you may end up removing + and - characters that are probably wanted in the first column (several of the object names contain them).
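For the record, the "creative" single-sed version would be a character class (the same caveat about the first column applies):
sed 's_[+-]__g' test.txt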

You can split the field with the sign into an array using match() (note that the three-argument form of match() is a GNU awk extension). You can keep the first array element as the sign and the second array element as the value:
$ awk '{match($6,/([+-])(.*)/,m);print "m[1]=",m[1]," m[2]=",m[2];print m[1] m[2]+$7/60+$8/3600}' <<<"1 RXSJ00001+0523 00 00 11.78 -05 23 17.4"
#Output
m[1]= - m[2]= 05
-5.38817
Thus you can make all the calculations using m[2] instead of $6.
If you need to print the sign, you just need to print m[1] before m[2].
PS: By omitting the comma in print and using a space you force string concatenation (see my example above).
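Applied to the OP's file, the whole conversion could look something like this (a sketch against the original 8-column layout; gawk is required for the three-argument match()):
awk '{
    match($5, /([+-])(.*)/, m)               # m[1] = sign, m[2] = unsigned degrees
    deg = m[2] + $6/60 + $7/3600
    print NR, $1, 15*$2 + $3/4 + $4/240, m[1] deg, $8
}' maser_neg_test.txt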

Related

combining and processing 2 tab separated files in awk and make a new one

I have 2 tab-separated files with 2 columns: column 1 is a number and column 2 is an ID, like these 2 examples:
example file1:
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
example file2:
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
I want to find the common rows of the 2 files based on column 2 (the ID column) and make a new tab-separated file with 4 columns: column 1 is the ID (common to both files), column 2 is the number from file1, column 3 is the number from file2, and column 4 is the log2 of the ratio of columns 2 and 3 (which means log2(column2/column3)). For example, for the ID "TPT1": column 1 is TPT1, column 2 is 188, column 3 is 2217, and column 4 is log2(188/2217), which is equal to -3.561494.
here is the expected output:
TPT1 188 2217 -3.561494
ATP5C1 420 223 0.9133394
I am trying to do that in AWK using the following code:
awk 'NR==FNR { n[$2]=$0;next } ($2 in n) { print n[$2 '\t' $1] '\t' $1 '\t' log(n[$1]/$1)}' file1.txt file2.txt > result.txt
This code does not return what I expect. Do you know how to fix it?
$ awk -v OFS="\t" 'NR==FNR {n[$2]=$1;next} ($2 in n) {print $2, $1, n[$2], log(n[$2]/$1)/log(2)}' file1 file2
TPT1 2217 188 -3.5598
ATP5C1 223 420 0.913346
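A note on the original attempt: the '\t' sequences close and reopen the shell's single quotes, so awk sees a bare (empty) variable t rather than a tab, n[$2 '\t' $1] therefore looks up the wrong key, and awk's log() is the natural logarithm, so you must divide by log(2) to get log2. If you also want the column order of the expected output (file1's number before file2's), a tweak of the same idea (a sketch):
awk -v OFS="\t" 'NR==FNR {n[$2]=$1; next} ($2 in n) {print $2, n[$2], $1, log(n[$2]/$1)/log(2)}' file1 file2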
I'd use join to actually merge the files instead of awk:
$ join -j2 <(sort -k2 file1.txt) <(sort -k2 file2.txt) |
awk -v OFS="\t" '{ print $1, $2, $3, log($2/$3)/log(2) }'
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
The join program, well, joins two files on a common value. It does require the files to be sorted based on the join column, but your examples aren't, hence the inline sorting of the data files. Its output is then piped to awk to compute the log2 of the numbers of each line and produce tab-delimited results.
Alternative using perl, which gives you more default precision if you care about that (and don't want to mess with awk's OFMT variable):
$ join -j2 <(sort -k2 a.txt) <(sort -k2 b.txt) |
perl -lane 'print join("\t", @F, log($F[1]/$F[2])/log(2))'
ATP5C1 420 223 0.913345617745818
TPT1 188 2217 -3.55980420318967
awk + sort approach
awk '{print $0, FILENAME}' ellyx.txt ellyy.txt | sort -k2 -k3 | awk '{c=$2; if (c==p) {print c, a, $1, log(a/$1)/log(2)} p=c; a=$1}'
with the given inputs
$ cat ellyx.txt
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
$ cat ellyy.txt
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
$ awk '{print $0, FILENAME}' ellyx.txt ellyy.txt | sort -k2 -k3 | awk '{c=$2; if (c==p) {print c, a, $1, log(a/$1)/log(2)} p=c; a=$1}'
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
$

Print Distinct Values from Field AWK

I'm looking for a way to print the distinct values in a field while in the command-prompt environment using AWK.
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Ideally I'd like to return the promotion_ids: 7, 1117, 84.
I've researched the question on Google and have found some examples such as:
`cut -f 3 | uniq *filename.ext*` (returned error)
`awk cut -f 3| uniq *filename.ext*` (returned error)
`awk cut -d, -f3 *filename.ext* |sort| uniq` (returned error)
awk 'NR>1{a[$3]++} END{for(b in a) print b}' file
Output:
7
84
1117
Solution 1st: Simple awk may help. (The following will remove the header of the Input_file.)
awk 'FNR>1 && !a[$3]++{print $3}' Input_file
Solution 2nd: In case you need to keep the header of the Input_file, then the following may help.
awk 'FNR==1{print;next} !a[$3]++{print $3}' Input_file
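Both solutions rely on the classic !a[$3]++ idiom: the array value starts out as 0 (false), so the negated test is true only the first time a given third field is seen, and the post-increment marks it as seen. The same idiom is commonly used to deduplicate whole lines, as a minimal sketch:
awk '!seen[$0]++' file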
with a pipeline:
$ sed 1d file | # remove header
tr -s ' ' '\t' | # normalize space delimiters to tabs
cut -f3 | # isolate the field
sort -nu # sort numerically and report unique entries
7
84
1117
[root@test ~]# cat test
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Output:
[root@test ~]# awk -F" " '!s[$3]++' test
ID Title Promotion_ID Flag
12 Purse 7 Y
709 iPhone 1117 Y
283 Xbox 84 N
[root@test ~]#
And for code golfers, several near-equivalent one-liners that print each distinct Promotion_ID once while skipping the header row:
mawk '!__[$!NF=$--NF]--^(!_<NR)'
or
gawk '!__[$!--NF=$NF]--^(!_<NR)'
or perhaps
gawk '!__[$!--NF=$NF]++^(NF<NR)'
or even
mawk '!__[$!--NF=$NF]++^(NR-!_)' # mawk-only
gawk '!__[$!--NF=$--NF]--^(NR-NF)' # gawk-equiv of similar idea
7
1117
84

extracting data from a column based on another column

I have some files as shown below. I would like to extract the values of $5 based on $1.
file1
sam 60.2 143 40.4 19.8
mathew 107.9 144 35.6 72.3
baby 48.1 145 17.8 30.3
rehna 47.2 146 21.2 26.0
sam 69.9 147 .0 69.9
file2
baby 58.9 503 47.5 11.4
daisy 20.8 504 20.4 .4
arch 61.1 505 12.3 48.8
sam 106.6 506 101.6 5.0
rehna 73.5 507 35.9 37.6
sam 92.0 508 61.1 30.9
I used the following code to extract $5.
awk '$1 == "rehna" { print $5 }' *
awk '$1 == "sam" { print $5 }' *
I would like to get the output as shown below
rehna  sam
26.0   19.8
37.6   69.9
       5.0
       30.9
How do I achieve this? Your suggestions would be appreciated!
The simplest is probably to paste the results together:
#!/bin/bash
function myawk {
awk -v name="$1" 'BEGIN {print name} $1 == name { print $5 }' file1 file2
}
paste <(myawk rehna) <(myawk sam)
Running this produces the results you requested (with TAB as the separator character). See paste documentation for other options.
Update: peak's answer has since wrapped this approach in a function, in the spirit of DRY. If you want more background information, read on.
Assuming Bash, Ksh, or Zsh as the shell:
printf '%s\t%s\n' 'rehna' 'sam'
paste \
<(awk '$1 == "rehna" { print $5 }' *) \
<(awk '$1 == "sam" { print $5 }' *)
The above produces tab-separated output.
paste is a POSIX utility that outputs corresponding lines from its input files, by default separated with tabs; e.g., paste fileA fileB yields:
<line 1 from fileA>\t<line 1 from fileB>
<line 2 from fileA>\t<line 2 from fileB>
...
If any input file runs out of lines, it supplies empty lines.
In the case at hand, the respective outputs from the awk commands are used as input files, using process substitution (<(...)).

awk and sprintf to zero fill

Using awk and sprintf, how can I zero-fill both before and after a decimal point?
input
11
12.2
9.6
output
11.0
12.2
09.6
I can get either using these, but not both
sprintf("%.1f", $1)
output
11.0
12.2
9.6
sprintf("%03d", $1)
output
011
012
009
x = sprintf("%06.3f", 1.23)
Output:
$ awk 'BEGIN{x = sprintf("%06.3f", 1.23); print x}'
01.230
$
I really can't tell from your question but maybe one of these does whatever it is you want:
$ cat file
11
12.2
9.6
$ awk '{ x=sprintf("%03d",$0*10); print x }' file
110
122
096
$ awk '{ x=sprintf("%04.1f",$0); print x }' file
11.0
12.2
09.6
Obviously you could just use printf with no intermediate variable but you asked for sprintf().
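One caveat on the %03d variant worth hedging against: %d truncates toward zero, so a scaled value that lands a hair below an integer because of floating-point representation would drop to the integer below, whereas %.0f rounds. A variant sketch:
awk '{ x = sprintf("%03.0f", $0 * 10); print x }' file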

awk + Need to print everything (all rest fields) except $1 and $2

I have the following file and I need to print everything except $1 and $2 using awk.
File:
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
...
the desirable output:
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
...
Well, given your data, cut should be sufficient:
cut -d' ' -f3- infile
Although it adds extra leading whitespace at the beginning of each line compared to yael's expected output, here is a shorter and simpler awk-based solution than the previously suggested ones:
awk '{$1=$2=""; print}'
or even:
awk '{$1=$2=""}1'
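If the leading whitespace matters, it can be stripped in the same action (a sketch):
awk '{$1=$2=""; sub(/^ +/, ""); print}'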
$ cat t
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
$ awk '{for (i = 3; i <= NF; i++) printf $i " "; print ""}' t
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
danben's answer leaves a trailing space at the end of the resulting string, so the correct way to do it would be:
awk '{for (i=3; i<NF; i++) printf $i " "; print $NF}' filename
If the first two words don't change, probably the simplest thing would be:
awk -F 'INFORMATION DATA ' '{print $2}' t
Here's another awk solution, that's more flexible than the cut one and is shorter than the other awk ones. Assuming your separators are single spaces (modify the regex as necessary if they are not):
awk --posix '{sub(/([^ ]* ){2}/, ""); print}'
If Perl is an option:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print it
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
Variation for csv input files:
perl -F, -lane 'splice @F,0,2; print join " ",@F' file
This uses the -F field separator option with a comma
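An awk counterpart for simple CSV input (a sketch, assuming fields contain no quoted commas), producing space-separated output like the perl version:
awk -F, '{for (i=3; i<NF; i++) printf "%s ", $i; print $NF}' file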