Pad AWK columns - awk

Trying to parse BSD top output to show only PID - COMMAND - MEM for a specific process:
$ top -l 1 | grep -E '%CPU\ |coreaudio'
PID COMMAND %CPU TIME #TH #WQ #PORTS MEM PURG
354 com.apple.audio. 0.0 00:00.00 2 1 12 820K 0B
296 com.apple.audio. 0.0 00:00.03 2 1 38 2024K 0B
282 coreaudiod 0.0 03:25.05 94 2 736 21M 0B
Using awk to show only columns $1 and $2:
$ top -l 1 | grep -E '%CPU\ |coreaudio' | awk {'print $1" -- "$2'}
PID -- COMMAND
354 -- com.apple.audio.
296 -- com.apple.audio.
282 -- coreaudiod
Adding a third column messes up the layout, since the second column is not being padded:
$ top -l 1 | grep -E '%CPU\ |coreaudio' | awk {'print $1" -- "$2" -- "$8'}
PID -- COMMAND -- MEM
354 -- com.apple.audio. -- 820K
296 -- com.apple.audio. -- 2024K
282 -- coreaudiod -- 21M
How would I 'pad' the column to keep the layout intact? Or should I use a different tool like sed?
Note:
Using top -l 1 since I'm on a Mac

You can pad the strings with a constant number of spaces:
<<<$input awk '{printf "%-20s %-10s %-4s\n", $1, $2, $8}'
#                         ^^    ^^    ^    field width
#                        ^     ^     ^     left justify
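Note that each fixed width has to be at least as wide as the longest value in that column, or the layout shifts again (com.apple.audio. is 16 characters, wider than %-10s allows). A sketch with widths adjusted to fit the sample data above:
$ top -l 1 | grep -E '%CPU\ |coreaudio' | awk '{printf "%-4s %-17s %-6s\n", $1, $2, $8}'
PID  COMMAND           MEM
354  com.apple.audio.  820K
296  com.apple.audio.  2024K
282  coreaudiod        21M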
You can use column:
<<<$input awk '{print $1, $2, $8}' | column -t
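column -t measures the widths for you, so with the sample data the result should look roughly like:
PID  COMMAND           MEM
354  com.apple.audio.  820K
296  com.apple.audio.  2024K
282  coreaudiod        21M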

Related

combining and processing 2 tab separated files in awk and make a new one

I have 2 tab-separated files with 2 columns: column 1 is a number and column 2 is an ID, like these 2 examples:
example file1:
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
example file2:
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
I want to find the common rows of the 2 files based on column 2 (the ID column) and make a new tab-separated file with 4 columns: column 1 is the ID (the common ID), column 2 is the number from file1, column 3 is the number from file2, and column 4 is the log2 of the ratio of columns 2 and 3, i.e. log2(column2/column3). For example, for the ID "TPT1": column 1 is TPT1, column 2 is 188, column 3 is 2217, and column 4 is log2(188/2217), which is equal to -3.561494.
Here is the expected output:
TPT1 188 2217 -3.561494
ATP5C1 420 223 0.9133394
I am trying to do that in AWK using the following code:
awk 'NR==FNR { n[$2]=$0;next } ($2 in n) { print n[$2 '\t' $1] '\t' $1 '\t' log(n[$1]/$1)}' file1.txt file2.txt > result.txt
This code does not return what I expect. Do you know how to fix it?
$ awk -v OFS="\t" 'NR==FNR {n[$2]=$1;next} ($2 in n) {print $2, $1, n[$2], log(n[$2]/$1)/log(2)}' file1 file2
TPT1 2217 188 -3.5598
ATP5C1 223 420 0.913346
The first pass (NR==FNR) stores file1's number keyed by ID; the second pass prints each ID of file2 that also appears in file1, both numbers, and the base-2 log of their ratio. (Note that the two number columns come out in the opposite order from the expected output.)
I'd use join to actually merge the files instead of awk:
$ join -j2 <(sort -k2 file1.txt) <(sort -k2 file2.txt) |
awk -v OFS="\t" '{ print $1, $2, $3, log($2/$3)/log(2) }'
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
The join program, well, joins two files on a common value. It does require the files to be sorted based on the join column, but your examples aren't, hence the inline sorting of the data files. Its output is then piped to awk to compute the log2 of the numbers of each line and produce tab-delimited results.
Alternative using perl, which gives you more default precision if you care about that (and don't want to mess with awk's OFMT variable):
$ join -j2 <(sort -k2 a.txt) <(sort -k2 b.txt) |
perl -lane 'print join("\t", @F, log($F[1]/$F[2])/log(2))'
ATP5C1 420 223 0.913345617745818
TPT1 188 2217 -3.55980420318967
awk + sort approach: tag each line with its source filename, sort by the ID column, and print a combined record whenever two consecutive lines share an ID:
awk '{ print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk '{ c=$2; if (c==p) { print c,a,$1,log(a/$1)/log(2) } p=c; a=$1 }'
with the given inputs
$ cat ellyx.txt
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
$ cat ellyy.txt
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
$ awk ' { print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk ' {c=$2;if(c==p) { print c,a,$1,log(a/$1)/log(2) }p=c;a=$1 } '
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
$

Print Distinct Values from Field AWK

I'm looking for a way to print the distinct values in a field while in the command-prompt environment using AWK.
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Ideally I'd like to return the promotion_ids: 7, 1117, 84.
I've researched the question on Google and have found some examples such as:
`cut -f 3 | uniq *filename.ext*` (returned error)
`awk cut -f 3| uniq *filename.ext*` (returned error)
`awk cut -d, -f3 *filename.ext* |sort| uniq` (returned error)
awk 'NR>1{a[$3]++} END{for(b in a) print b}' file
Output:
7
84
1117
Solution 1: simple awk may help (the following will remove the header of the Input_file):
awk 'FNR>1 && !a[$3]++{print $3}' Input_file
Solution 2: in case you need to keep the header of the Input_file, the following may help:
awk 'FNR==1{print;next} !a[$3]++{print $3}' Input_file
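If you would rather reduce the header to its own third field as well, so that every output line is a single column, a small variant of the same idea:
awk '!a[$3]++{print $3}' Input_file
This treats the header's Promotion_ID like any other $3 value, so it prints once at the top, followed by 7, 1117, 84.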
With a pipeline:
$ sed 1d file | # remove header
tr -s ' ' '\t' | # normalize space delimiters to tabs
cut -f3 | # isolate the field
sort -nu # sort numerically and report unique entries
7
84
1117
[root@test ~]# cat test
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Output:
[root@test ~]# awk -F" " '!s[$3]++' test
ID Title Promotion_ID Flag
12 Purse 7 Y
709 iPhone 1117 Y
283 Xbox 84 N
[root@test ~]#
mawk '!__[$!NF=$--NF]--^(!_<NR)'
or
gawk '!__[$!--NF=$NF]--^(!_<NR)'
or perhaps
gawk '!__[$!--NF=$NF]++^(NF<NR)'
or even
mawk '!__[$!--NF=$NF]++^(NR-!_)' # mawk-only
gawk '!__[$!--NF=$--NF]--^(NR-NF)' # gawk-equiv of similar idea
7
1117
84

Splitting a column vertically using AWK

If I have +2, I want this to be + 2 as separate columns. I am doing this for a large column, so I cannot do it manually.
Edit #1
cat maser_neg_test.txt | awk '{print NR, $0}' |
  awk '{print $1, $2, ((15 * $3) + ((1/4) * $4) + ((1/240) * $5)), (($6) + ($7/60) + ($8/3600)), $9}' |
  awk '{printf "%s %-15s %-10s %-10s %-6s\n", $1, $2, $3, $4, $5}' > maser_neg_test2.txt
is my code, which transforms
RXSJ00001+0523 00 00 11.78 +05 23 17.4 11992 2016-02-12 51.3 3 10.9 10631 13365
KUG2358+330 00 00 58.10 +33 20 38.0 12921 2012-11-17 36.5 8 4.0 11461 14395
0001233+4733537 00 01 23.30 +47 33 53.7 5237 2010-11-02 39.5 10 3.6 3848 6639 3.5 6358 9196
NGC-7805 00 01 26.76 +31 26 01.4 4850 2006-01-05 43.8 5 6.0 3464 6248 5.6 5968 8799
into
1 RXSJ00001+0523 0.04908 5.38817 11992
2 KUG2358+330 0.24208 33.3439 12921
3 0001233+4733537 0.34708 47.5649 5237
4 NGC-7805 0.36150 31.4337 4850
but my research advisor noted a problem with my conversion of
dec:
1*(hr) = degree_1
(1/60) * (min) = degree_2
(1/3600) * (sec) = degree_3
degree_1 + degree_2 + degree_3 = dec (degrees)
where the data +05 23 17.4 is hr min sec: simply adding the terms does not combine them correctly when the sign is negative. So I'm trying to pull out the sign before doing my calculations and then re-apply it.
Edit 2
Here is an example of some of the negative cases; sorry, this is my first post and I wasn't really sure how to format it at first.
NGC-23 00 09 53.42 +25 55 25.5 4565 2005-12-18 44.2 30 2.5 3182 5961 2.3 5681 8506
UM207 00 10 06.63 -00 26 09.4 9648 2010-01-10 25.2 10 2.1 8218 11091 2.1 10802 13723
MARK937 00 10 09.99 -04 42 38.0 8846 2016-02-04 42.5 10 4.4 7512 10192
Mrk937 00 10 10.01 -04 42 37.9 8851 2003-11-01 60.4 24 4.1 7428 10286
NGC-26 00 10 25.86 +25 49 54.6 4589 2005-12-14 41.2 5 5.7 3205 5985 5.1 5705 8531
I think you are overcomplicating things a lot by using multiple layers of awk (and unnecessary cat), and thinking of how to "split columns vertically" rather than just solving the problem, which seems to be that for a negative sign you should subtract, rather than add, the minutes and seconds.
So, use intermediate variables and check for the sign ($5 ~ /^-/):
awk '{ deg = $6/60 + $7/3600; deg = ($5 ~ /^-/) ? $5 - deg : $5 + deg;
printf "%s %-15s %-10s %-10s %-6s\n",
NR, $1, ((15 * $2) + (1/4 * $3) + (1/240 * $4)), deg, $8
}' maser_neg_test.txt
(edit: As pointed out by the OP, the original test $5 < 0 would fail when that field was -0.)
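A quick way to see that edge case: a field like -00 compares numerically equal to zero, so $5 < 0 is false, while the regex still sees the sign:
$ echo '-00 23 17.4' | awk '{print ($1 < 0), ($1 ~ /^-/)}'
0 1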
Try something like this:
echo '+2' | awk -v FS="" '{print $1" "$2}'
Result:
+ 2
If you have a text file (test.txt) with information such as
+2
-3
+4
+5
and you need output like so:
+ 2
- 3
+ 4
+ 5
Try this:
awk -v FS="" '{print $1" "$2}' test.txt
As two commenters have mentioned, it would be good for you to add some example data and the output that you desire. The answer above is just one of the many ways you can format your data.
EDIT
In your particular example, you could just use sed instead of cat'ing the file like so:
sed 's_+__g' test.txt | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
sed will replace + in your file with nothing and then send the output to awk. If you have - also, you can perhaps remove them by either using sed creatively or double-sed'ing like so:
sed 's_+__g' test.txt | sed 's_-__g' | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'
In the scenario above, you may end up removing + and - characters that are probably wanted in the first column (the object names, such as RXSJ00001+0523 and NGC-7805, contain the same characters).
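A single sed with a character class does the same job as the double sed, with the same caveat about the first column:
sed 's_[+-]__g' test.txt | awk '{print NR, $0}' | awk '{print $1, $2, 15*$3 + $4/4 + $5/240, $6 + $7/60 + $8/3600, $9}'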
You can split the field with the sign into an array using GNU awk's three-argument match(). You can keep the first array element as the sign and the second array element as the value:
$ awk '{match($6,/([+-])(.*)/,m);print "m[1]=",m[1]," m[2]=",m[2];print m[1] m[2]+$7/60+$8/3600}' <<<"1 RXSJ00001+0523 00 00 11.78 -05 23 17.4"
# Output
m[1]= - m[2]= 05
-5.38817
Thus you can make all the calculations using m[2] instead of $6.
If you need to print the sign, you just need to print m[1] before m[2].
PS: By omitting the comma in print and using a space you force concatenation (see the example above).
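A minimal illustration of the difference:
$ awk 'BEGIN { print "-", 5; print "-" 5 }'
- 5
-5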

Awk pass variable containing columns to be printed

I want to pass a variable to awk containing which columns to print from a file.
In this trivial case file.txt contains a single line
11 22 33 44 55
This is what I've tried:
awk -v a='2/4' -v b='$2/$4' '{print a"\n"$a"\n"b"\n"$b}' file.txt
output:
2/4
22
$2/$4
11 22 33 44 55
desired output:
0.5
Is there any way to do this type of "eval" of variable as a command?
Here is one method for dividing columns specified in variables:
$ awk -v num=2 -v denom=4 '{print $num/$denom}' file.txt
0.5
If you trust the person who creates the shell variable b, then here is a method that offers flexibility:
$ b='$2/$4'; awk "{print $b}" file.txt
0.5
$ b='$1*$2'; awk "{print $b}" file.txt
242
$ b='$2,$2/$4,$5'; awk "{print $b}" file.txt
22 0.5 55
The flexibility here is due to the fact that b can contain any awk code. This approach requires that you trust the creator of b.
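If you only need to select whole columns rather than evaluate arbitrary expressions, a safer sketch is to pass the column numbers in a shell variable and split it inside awk (cols is just an illustrative name):
$ awk -v cols='2 4' '{ n = split(cols, c, " "); for (i = 1; i <= n; i++) printf "%s%s", $c[i], (i < n ? OFS : ORS) }' file.txt
22 44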

How do I print a range of data in awk?

I am reviewing my access_logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agent of anyone that has been hitting my server for the last hour, sorted by number of hits. My server has unusual activity, so I want to stop any unwanted spiders/etc.
But instead of the part awk '{print $12 $13 $14 $15 $16}' I would much prefer something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which is a different length for each one.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
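If you want to keep the original spacing between the remaining fields, an alternative is to delete the first eleven fields with sub() instead of rebuilding the line. This sketch assumes an awk whose regexes support interval expressions like {11} (modern gawk does), space-separated fields, and no leading whitespace:
grep 16/Sep/2012:17 access_log | awk '{ sub(/^([^ ]+ +){11}/, ""); print }'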
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used #twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields ; echo from-to.awk: ; \
cat from-to.awk ; echo ;awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
why not
#!/bin/bash
awk "/$1/"'{for (i=12;i<=NF;i++) printf("%s ", $i) ;printf "\n"}' log | sort | uniq -c | sort -n | tail -40
in a script file.
Then you can call it like
myMonitor.sh '16\/Sep\/2012:17'
(the slashes need escaping, since the argument is interpolated into an awk regex). I don't have a way to test this right now; apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i="";}print}' access_log | sort | uniq -c | sort -n | tail -40