How to sum first 100 rows of a specific column using Awk?

How to sum first 100 rows of a specific column using Awk? I wrote
awk 'BEGIN{FS="|"} NR<=100 {x+=$5}END {print x}' temp.txt
But this is taking a lot of time to process; is there any other way that gives the result more quickly?

Just exit after the required first 100 records:
awk -v iwant=100 '{x+=$5} NR==iwant{exit} END{print x+0}' test.in
Take it out for a spin:
$ for i in {1..1000}; do echo 1 >> test.in ; done # a thousand records
$ awk -v iwant=100 '{x+=$1} NR==iwant{exit} END{print x+0}' test.in
100

You can always trim the input and use the same script:
head -100 file | awk ... your script here ...
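For example, applied to the original command (temp.txt and the pipe-separated 5th column are from the question):
head -100 temp.txt | awk 'BEGIN{FS="|"} {x+=$5} END{print x+0}'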

Related

Filter logs with awk for last 100 lines

I can filter the last 500 lines using tail and grep:
tail --line 500 my_log | grep "ERROR"
What is the equivalent command using awk?
How can I add the number of lines to the command below?
awk '/ERROR/' my_log
awk doesn't know where the end of the file is until it finishes reading it, but you can read the file twice: the first pass to find the end, the second to treat the lines that are in scope. You could also keep the last X lines in a buffer, but that is a bit heavy on memory consumption and processing. Notice that the file needs to be mentioned twice at the end of the command for the two-pass approach.
awk 'FNR==NR{LL=NR-500;next};FNR>=LL && /ERROR/{ print FNR":"$0}' my_log my_log
With explanation:
awk '# first reading
FNR==NR{
# the wanted window starts 500 lines before the last one
LL=NR-500
# go to the next line (of this first read)
next
}
# second reading (reached because the previous section skips the first)
# if the line number is at or after LL AND ERROR is in the line content, print it
FNR >= LL && /ERROR/ { print FNR ":" $0 }
' my_log my_log
With GNU sed:
sed '$-500,$ {/ERROR/ p}' my_log
Since you provided no sample data to test with, I'll demonstrate with plain numbers using seq 1 10. This one stores the last n records and prints them out at the end:
$ seq 1 10 |
awk -v n=3 '{a[++c]=$0;delete a[c-n]}END{for(i=c-n+1;i<=c;i++)print a[i]}'
8
9
10
If you want to filter the data, add for example /ERROR/ before {a[++c]=$0; ...} (see the sketch after the explanation below).
Explained:
awk -v n=3 '{ # set the wanted number of records
a[++c]=$0 # hash to a
delete a[c-n] # delete the ones outside of the window
}
END { # in the end
for(i=c-n+1;i<=c;i++) # in order
print a[i] # output records
}'
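Applying the /ERROR/ filter mentioned above, a possible sketch for the log question itself (my_log and the ERROR pattern come from the question; n=500 matches the tail example, and this keeps the last n matching lines):
awk -v n=500 '/ERROR/{a[++c]=$0;delete a[c-n]} END{for(i=c-n+1;i<=c;i++) if(i in a) print a[i]}' my_log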
Could you please try the following:
tac Input_file | awk 'FNR<=100 && /ERROR/' | tac
In case you want to print the line numbers in the awk command, then try the following:
awk '/ERROR/{print FNR,$0}' Input_file
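If you also want to restrict the match to the last N lines, a sketch combining the tail from the question with the awk filter (line numbers are then relative to the tail output):
tail -n 500 my_log | awk '/ERROR/{print FNR, $0}'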

Summarizing the contents of a text file into another one using awk

I have a big text file with 2 tab-separated fields. As you can see in the small example, every 2 lines have a number in common. I want to summarize my text file in this way:
1- look for the lines that have the number in common and sum up the second column of those lines.
small example:
ENST00000054666.6 2
ENST00000054666.6_2 15
ENST00000054668.5 4
ENST00000054668.5_2 10
ENST00000054950.3 0
ENST00000054950.3_2 4
expected output:
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
As you can see, the difference is in both columns: in the 1st column there is only one occurrence of each common number, without the "_2" suffix, and in the 2nd column the value is the sum of both lines (the ones that share the common number in the input file).
I tried this code, but it does not return what I want:
awk -F '\t' '{ col2 = $2, $2=col2; print }' OFS='\t' input.txt > output.txt
Do you know how to fix it?
1st solution: the following awk may help you with this.
awk '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}' Input_file
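A quick run against the sample data from the question (the for-in order is unspecified in awk, so the lines may come out in a different order):
$ awk '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}' Input_file
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4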
2nd solution: in case your Input_file is sorted by the 1st field, then the following may help you.
awk '{sub(/_.*/,"",$1)} prev!=$1 && prev{print prev,val;val=""} {val+=$NF;prev=$1} END{if(val){print prev,val}}' Input_file
Append > output.txt at the end of the above commands in case you need the output in an output file too.
If order is not a concern, the below may also help:
awk -v FS="\t|_" '{count[$1]+=$NF}
END{for(i in count){printf "%s\t%s%s",i,count[i],ORS;}}' file
ENST00000054668.5 14
ENST00000054950.3 4
ENST00000054666.6 17
Edit:
If the order of the output does matter, the below approach using a flag helps (it assumes every number occurs on exactly 2 consecutive lines, as in the sample):
$ awk -v FS="\t|_" '{count[$1]+=$NF;++i;
if(i==2){printf "%s\t%s%s",$1,count[$1],ORS;i=0}}' file
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
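If you want the first-seen order without relying on the lines coming in pairs, a sketch in plain awk that remembers the key order in a second array:
awk -v FS="\t|_" '!($1 in count){order[++n]=$1} {count[$1]+=$NF}
END{for(j=1;j<=n;j++) printf "%s\t%s%s", order[j], count[order[j]], ORS}' file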

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
The following single awk may also help you here:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to fix the approach you were trying:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note:
the -v switch of awk passes shell variables into awk
$x is the nth column, where n is the value of your variable x
Note 2: this is not the fastest solution; a single awk call is fastest (see the sketch below), but I just tried to correct your logic. Ideally, take the time to understand awk; it is never time wasted.
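For reference, the single-call version that Note 2 refers to, written with the names from the question (a sketch; input.txt and the data prefix are the question's, and the 2..99 range matches the original loop):
awk '{for(i=2;i<=99;i++) print $1, $i > ("data" i)}' input.txt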

How do I pass a variable into AWK FNR?

#!/bin/bash
export num=50
echo $num
awk -v awk_num=$num 'FNR==2, FNR==$awknum {print $1;}' big_report > short_report
I have a big_report file. The desired output is to print rows 2 to 50 of column 1 of big_report into short_report. However, when I run the above, the result in short_report includes all lines of column 1 instead of the specified rows 2-50.
I would really appreciate it if anyone could help! Thanks!!!
Like this (inside the awk script, the variable is awk_num, not $awknum; the $ prefix would make awk treat it as a field number):
awk -v awk_num=$num 'FNR==2, FNR==awk_num {print $1}' big_report > short_report
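An equivalent spelling with an explicit test instead of a range, plus an early exit so awk stops reading big_report after row 50 (a sketch; $num is the shell variable from the question):
awk -v awk_num=$num 'FNR>=2 && FNR<=awk_num {print $1} FNR>awk_num {exit}' big_report > short_report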

How to print last two columns using awk

All I want is the last two columns printed.
You can make use of variable NF which is set to the total number of fields in the input record:
awk '{print $(NF-1),"\t",$NF}' file
This assumes that you have at least 2 fields.
awk '{print $NF-1, $NF}' inputfile
Note: this works only if at least two columns exist. On records with one column you will get a spurious "-1 column1"
@jim mcnamara: try using parentheses around NF, i.e. $(NF-1) and $(NF) instead of $NF-1 and $NF (works on Mac OS X 10.6.8 for FreeBSD awk and gawk).
echo '
1 2
2 3
one
one two three
' | gawk '{if (NF >= 2) print $(NF-1), $(NF);}'
# output:
# 1 2
# 2 3
# two three
Using gawk exhibits the problem:
gawk '{ print $NF-1, $NF}' filename
1 2
2 3
-1 one
-1 three
# cat filename
1 2
2 3
one
one two three
I just put gawk on a Solaris 10 M4000:
So, gawk is the culprit on the $NF-1 vs. $(NF-1) issue. Next question: what does POSIX say?
per:
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
There is no direction one way or the other. Not good. gawk implies subtraction; other awks imply field number or subtraction. Hmm.
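Given that ambiguity, the safe, portable spelling is to parenthesise explicitly so every awk reads it the same way:
awk '{print $(NF-1), $NF}' filename    # second-to-last field, then last field
awk '{print ($NF)-1, $NF}' filename    # last field minus one, then last field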
Please try this out to take into account all possible scenarios:
awk '{print $(NF-1)"\t"$NF}' file
or
awk 'BEGIN{OFS="\t"}{print $(NF-1), $NF}' file
or
awk '{print $(NF-1), $NF}' file
Try with this:
$ cat /tmp/topfs.txt
/dev/sda2 xfs 32G 10G 22G 32% /
awk print last column
$ cat /tmp/topfs.txt | awk '{print $NF}'
/
awk print before last column
$ cat /tmp/topfs.txt | awk '{print $(NF-1)}'
32%
awk - print last two columns
$ cat /tmp/topfs.txt | awk '{print $(NF-1), $NF}'
32% /