How to print last two columns using awk - awk

All I want is the last two columns printed.

You can make use of variable NF which is set to the total number of fields in the input record:
awk '{print $(NF-1),"\t",$NF}' file
This assumes that you have at least two fields.
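If some records might have fewer than two fields, one way to skip them is a guard on NF; a minimal sketch along the same lines (same placeholder file name):
awk 'NF>=2{print $(NF-1), $NF}' file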

awk '{print $NF-1, $NF}' inputfile
Note: this works only if at least two columns exist. On records with one column you will get a spurious "-1 column1"

#jim mcnamara: try using parentheses around NF, i.e. $(NF-1) and $(NF) instead of $NF-1 and $NF (works on Mac OS X 10.6.8 for FreeBSD awk and gawk).
echo '
1 2
2 3
one
one two three
' | gawk '{if (NF >= 2) print $(NF-1), $(NF);}'
# output:
# 1 2
# 2 3
# two three

using gawk exhibits the problem:
gawk '{ print $NF-1, $NF}' filename
1 2
2 3
-1 one
-1 three
# cat filename
1 2
2 3
one
one two three
I just put gawk on a Solaris 10 M4000. So gawk is the culprit on the $NF-1 vs. $(NF-1) issue. Next question: what does POSIX say?
per:
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
There is no direction one way or the other. Not good. gawk treats it as subtraction; other awks treat it as either a field reference or subtraction. Hmm.
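One quick way to check how a particular awk parses the expression is to feed it a known three-field record and compare the two spellings; a small test sketch:
$ echo 'a b c' | awk '{print $NF-1}'    # "-1" means it parsed ($NF)-1 ("c" is 0 numerically)
$ echo 'a b c' | awk '{print $(NF-1)}'  # always prints the second-to-last field, "b"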

Please try this out to take into account all possible scenarios:
awk '{print $(NF-1)"\t"$NF}' file
or
awk 'BEGIN{OFS="\t"}{print $(NF-1), $NF}' file
or
awk '{print $(NF-1), $NF}' file

try with this
$ cat /tmp/topfs.txt
/dev/sda2 xfs 32G 10G 22G 32% /
awk print last column
$ cat /tmp/topfs.txt | awk '{print $NF}'
/
awk print before last column
$ cat /tmp/topfs.txt | awk '{print $(NF-1)}'
32%
awk - print last two columns
$ cat /tmp/topfs.txt | awk '{print $(NF-1), $NF}'
32% /
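The sample line looks like df output, so the same idea can be applied to the command directly, skipping the header line; a sketch, assuming GNU df with -hT:
$ df -hT / | awk 'NR>1{print $(NF-1), $NF}'
# prints the Use% and mount point, e.g. "32% /"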

Related

Counting the number of unique values based on two columns in bash

I have a tab-separated file looking like this:
A 1234
A 123245
A 4546
A 1234
B 24234
B 4545
C 1234
C 1234
Output:
A 3
B 2
C 1
Basically I need counts of unique values that belong to the first column, all in one command with pipelines. As you may see, there can be some duplicates like "A 1234". I had some ideas with awk or cut, but neither of them seems to work. They just print out all unique pairs, while I need the count of unique values from the second column considering the value in the first one.
awk -F " "'{print $1}' file.tsv | uniq -c
cut -d' ' -f1,2 file.tsv | sort | uniq -ci
I'd really appreciate your help! Thank you in advance.
Could you please try the following complete awk solution.
awk 'BEGIN{FS=OFS="\t"} !found[$0]++{val[$1]++} END{for(i in val){print i,val[i]}}' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="\t" ##Setting input and output field separators to tab here.
}
!found[$0]++{ ##Checking if the whole line (1st and 2nd column together) is NOT yet present in found array; if so, do following.
val[$1]++ ##Creating val with 1st column as index and keep increasing its value here.
}
END{ ##Starting END block of this program from here.
for(i in val){ ##Traversing through array val here.
print i,val[i] ##Printing i and value of val with index i here.
}
}
' Input_file ##Mentioning Input_file name here.
Using GNU awk:
$ gawk -F\\t '{a[$1][$2]}END{for(i in a)print i,length(a[i])}' file
Output:
A 3
B 2
C 1
Explained:
$ gawk -F\\t '{ # using GNU awk and tab as delimiter
a[$1][$2] # hash to 2D array
}
END {
for(i in a) # for all values in first field
print i,length(a[i]) # output value and the size of related array
}' file
$ sort -u file | cut -f1 | uniq -c
3 A
2 B
1 C
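If you want the key first and the count second, as in the desired output, the two columns that uniq -c produces can be swapped; a small sketch:
$ sort -u file | cut -f1 | uniq -c | awk '{print $2, $1}'
A 3
B 2
C 1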
Another way, using the handy GNU datamash utility:
$ datamash -g1 countunique 2 < input.txt
A 3
B 2
C 1
Requires the input file to be sorted on the first column, like your sample. If the real file isn't, add -s to the options.
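For an input file that is not already sorted on the first column, that would look like, for example:
$ datamash -s -g1 countunique 2 < input.txt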
You could try this:
cat file.tsv | sort | uniq | awk '{print $1}' | uniq -c | awk '{print $2 " " $1}'
It works for your example. (But I'm not sure if it works for other cases. Let me know if it doesn't work!)

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
The following single awk may help you here:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do it the way you were trying to:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note
the -v switch of awk passes shell variables into awk
$x is the column whose number is stored in your variable x
Note 2: this is not the fastest solution (a single awk call is fastest), but it corrects your logic. Ideally, take time to understand awk; it is never time wasted.

using awk to match and sum a file of multiple lines

I am trying to combine matching lines in file.txt on $1 and then display the sum of $2 for those matches. Thank you :).
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
{x=$1;a[x]=a[x] + $2} END{for(x in a)print x,a[x]}
Just a typo I guess: instead of adding $0 add $2. That gives me the expected output. And the $1="" is not necessary. To make sure that there isn't anything funny with $2 you may consider 1.0*$2.
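Assembled into a complete command with the question's -F':' and tab OFS, the corrected version would look roughly like this (a sketch, only checked against the sample data):
awk -F':' -v OFS='\t' '{a[$1]+=$2} END{for(x in a) print x, a[x]}' file > output.txt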

awk command to change field separator from tilde to tab

I want to replace the tilde delimiter with a tab in an awk command; I have shown below what I expect.
input
~1~2~3~
Output
1 2 3
This won't work for me:
awk -F"~" '{ OFS ="\t"; print }' inputfile
It's really a job for tr:
tr '~' '\t'
but in awk you just need to force the record to be recompiled by assigning one of the fields to its own value:
awk -F'~' -v OFS='\t' '{$1=$1}1'
awk NF=NF FS='~' OFS='\t'
Result
1 2 3
Code for sed:
$ echo '~1~2~3~' | sed 'y/~/\t/'
1 2 3

awk and log2 divisions

I have a tab delimited file that looks something like this:
foo 0 4
boo 3 2
blah 4 0
flah 1 1
I am trying to calculate the log2 ratio between the two columns for each row. My problem is with division by zero.
What I have tried is this:
cat file.txt | awk -v OFS='\t' '{print $1, log($3/$2)/log(2)}'
When there is a zero as the denominator, awk will crash. What I would like is some sort of conditional statement that prints "inf" as the result when the denominator is equal to 0.
I am really not sure how to go about this.
Any help would be appreciated
Thanks
You can implement that as follows (with a few additional tweaks):
awk 'BEGIN{OFS="\t"} {if ($2==0) {print $1, "inf"} else {print $1, log($3/$2)/log(2)}}' file.txt
Explanation:
if ($2==0) {print $1, "inf"} else {...} - First check to see if the 2nd field ($2) is zero. If so, print $1 and inf and move on to the next line; otherwise proceed as usual.
BEGIN{OFS="\t"} - Set OFS inside the awk script; mostly a preference thing.
... file.txt - awk can read from files when you specify it as an argument; this saves the use of a cat process. (See UUCA)
awk -F'\t' '{print $1,($2 ? log($3/$2)/log(2) : "inf")}' file.txt
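If you prefer to keep the arithmetic in one place, the same logic can be wrapped in a small awk function; a sketch (like the answers above, it only guards against a zero denominator, not a zero in $3):
awk -F'\t' '
function log2ratio(num, den) {   # log2(num/den), or "inf" when the denominator is 0
    return den == 0 ? "inf" : log(num/den)/log(2)
}
{ print $1, log2ratio($3, $2) }
' file.txt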