awk: how to print the last value of a counter?

I have a file with some lines starting with >
I want to count the number of such lines per file.
awk '{if(/>/){count += 1}{print count}}' file.text
1
1
2
2
Obviously here I just want the last "2". Basically I want awk to print the final value of count. It seems to me this should be easy to accomplish, but I can't find out how.
I know there are solutions such as grep -c that would do the job but I am curious to have the awk version.
Thank you
EDIT: I have tried this
awk '{if(/>/){count += 1}END{print count}}' Scaffold_1_8558356-8558657.fa_transcripts.combined.filtered.fas
awk: cmd. line:1: {if(/>/){count += 1}END{print count}}
awk: cmd. line:1: ^ syntax error

Your attempt fails because END is a separate pattern-action pair; it cannot sit inside the main action's braces. Move it outside:
awk '{if(/>/){count += 1}} END{print count+0}' file.text
Or you could shorten the above to:
awk '/>/{count++} END{print count+0}' file.text
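Since the question asks for a count per file, here is a small sketch of my own (not part of the answer above, sample file names are made up) that works in any POSIX awk and prints one count for each file on the command line:

```shell
# hypothetical sample files
printf '>s1\nACGT\n>s2\nTT\n' > a.fa
printf '>x\nGG\n' > b.fa

# FNR resets to 1 at each new input file: flush the previous file's
# count there; the END block flushes the last file's count.
awk 'FNR==1 && NR>1 {print f": "c; c=0}
     /^>/           {c++}
                    {f=FILENAME}
     END            {print f": "c}' a.fa b.fa
# a.fa: 2
# b.fa: 1
```

With GNU awk you could use an ENDFILE block instead, but the version above does not depend on gawk extensions.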

Beware: you say
I have a file with some lines starting with > I want to count the
number of such lines per file.
but you are asking AWK to check
if(/>/)...
which is true for > anywhere in the line. For example, if file.txt contains:
abc > def
ghi > jkl
mno > prs
then
awk '{if(/>/){print $0}}' file.txt
output
abc > def
ghi > jkl
mno > prs
You might limit the match to the start of the line using ^, for example '{if(/^>/){print $0}}' to print only lines which start with >.
(tested in gawk 4.2.1)
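For comparison, the grep -c route mentioned in the question benefits from the same anchoring:

```shell
printf '>a\nb > c\n>d\n' > file.text
grep -c '^>' file.text   # counts only lines that start with '>', prints 2
```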

Filter logs with awk for last 100 lines

I can filter the last 500 lines using tail and grep:
tail -n 500 my_log | grep "ERROR"
What is the equivalent command using awk?
How can I add the number of lines to the command below?
awk '/ERROR/' my_log
awk doesn't know where the end of a file is until it has finished reading it, but you can read the file twice: the first pass finds the end, the second treats the lines that are in scope. You could also keep the last X lines in a buffer, but that is a bit heavy on memory consumption and processing. Notice that the file needs to be mentioned twice at the end for this.
awk 'FNR==NR{LL=NR-500;next} FNR>=LL && /ERROR/{print FNR":"$0}' my_log my_log
With explanation:
awk '# first reading
FNR==NR{
# the window starts 500 lines before the last line
LL=NR-500
# go to next line (of this file)
next
}
# on the second read (the section above consumed the first pass)
# if the line number is at or after LL AND ERROR is in the line, print it
FNR >= LL && /ERROR/ { print FNR ":" $0 }
' my_log my_log
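A quick sanity check of the two-pass idea with generated data (no ERROR filter, just the window):

```shell
seq 1 1000 > my_log
# pass 1 computes the window start LL; pass 2 prints lines in the window
awk 'FNR==NR{LL=NR-500; next} FNR>=LL' my_log my_log | head -3
# 500
# 501
# 502
```

Note that FNR>=LL keeps 501 lines because the boundary is inclusive; use FNR>LL if you want exactly 500.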
With GNU sed:
sed '$-500,$ {/ERROR/ p}' my_log
As you had no sample data to test with, I'll demonstrate with plain numbers from seq 1 10. This one stores the last n records and prints them out at the end:
$ seq 1 10 |
awk -v n=3 '{a[++c]=$0;delete a[c-n]}END{for(i=c-n+1;i<=c;i++)print a[i]}'
8
9
10
If you want to filter the data, add for example /ERROR/ before {a[++c]=$0; ....
Explained:
awk -v n=3 '{ # set wanted amount of records
a[++c]=$0 # hash to a
delete a[c-n] # delete the ones outside of the window
}
END { # in the end
for(i=c-n+1;i<=c;i++) # in order
print a[i] # output records
}'
You could also try the following:
tac Input_file | awk 'FNR<=100 && /ERROR/' | tac
If you want to include line numbers in the awk output, try the following:
awk '/ERROR/{print FNR,$0}' Input_file

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
The following single awk command may help here too (it closes each output file after writing, so you never exceed the open-file limit):
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields than you desire to process, change NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to fix the approach you were trying:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note:
the -v switch of awk passes shell variables into awk
$x is the column whose number is stored in the variable x
Note 2: this is not the fastest solution (a single awk call is faster), but it corrects your logic. Ideally, take the time to understand awk; it's never time wasted.

awk to output two files based on match or no match

In the awk below I am trying to print out the lines that have the string FP or RFP in $2 of the tab-delimited input. If a match is found in $2, then result should contain only the lines of file that do not have those keywords in them, while another file, removed, should get the lines that did have those keywords. The awk has a syntax error in it when I try to print to two files; if I only print to one, the awk runs. Thank you :).
input
12 aaa
123 FP bbb
11 ccc
10 RFP ddd
result
12 aaa
11 ccc
removed
123 FP bbb
10 RFP ddd
awk
awk -F'\t' 'BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}' file
awk: cmd. line:1: BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}
awk: cmd. line:1: ^ syntax error
else goes with if. Your script didn't have an if, just an else, hence the syntax error. All you need is:
awk -F'\t' '{print > ($2 ~ /^R?FP$/ ? "removed" : "result")}' file
or if you prefer the array approach you are trying to use:
awk -F'\t' '
BEGIN{ split("FP RFP",t,/ /); for (i in t) d[t[i]] }
{ print > ($2 in d ? "removed" : "result") }
' file
Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn awk syntax and semantics.
Btw, when writing if/else code like you show in your question:
if ( !($2 in d) ) removed; else result
THINK about the fact that you're using negative (!) logic, which makes your code harder to understand at a glance AND opens you up to potential double negatives. Always try to express every condition in a positive way; in this case that'd be:
if ($2 in d) result; else removed
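A quick run of the one-liner from the accepted approach on the sample input (tabs written with printf):

```shell
printf '12\taaa\n123\tFP\tbbb\n11\tccc\n10\tRFP\tddd\n' > file
awk -F'\t' '{print > ($2 ~ /^R?FP$/ ? "removed" : "result")}' file
cat result    # the lines whose $2 is not FP/RFP
cat removed   # the FP/RFP lines
```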

How to sum first 100 rows of a specific column using Awk?

How can I sum the first 100 rows of a specific column using awk? I wrote
awk 'BEGIN{FS="|"} NR<=100 {x+=$5} END{print x}' temp.txt
But this is taking a lot of time to process; is there any other way that gives the result quickly?
Just exit after the required first 100 records:
awk -v iwant=100 '{x+=$5} NR==iwant{exit} END{print x+0}' test.in
Take it out for a spin:
$ for i in {1..1000}; do echo 1 >> test.in ; done # a thousand records
$ awk -v iwant=100 '{x+=$1} NR==iwant{exit} END{print x+0}' test.in
100
You can always trim the input and use the same script:
head -n 100 file | awk ... your script here ...
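Both approaches agree; with seq as stand-in data (a single column, so $1 instead of $5), the sum of 1..100 comes out as 5050 either way:

```shell
seq 1 1000 > test.in
# early exit vs. pre-trimming with head: same result
awk -v iwant=100 '{x+=$1} NR==iwant{exit} END{print x+0}' test.in   # 5050
head -n 100 test.in | awk '{x+=$1} END{print x+0}'                  # 5050
```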

How to do calculations over lines of a file in awk

I've got a file that looks like this:
88.3055
45.1482
37.7202
37.4035
53.777
What I have to do is isolate the value from the first line and divide it by the values of the other lines (it's a speedup calculation). I thought of maybe storing the first line in a variable (using NR) and then iterate over the other lines to obtain the values from the divisions. Desired output is:
1,9559
2,3410
2,3608
1,6420
UPDATE
Sorry Ed, my mistake, the desired decimal point is .
I made some small changes to Ed's answer so that awk prints the division of 88.3055 by itself and outputs it to a file speedup.dat:
awk 'NR==1{n=$0} {print n/$0}' tavg.dat > speedup.dat
Is it possible to combine the contents of speedup.dat and the results from another awk command without using intermediate files and in one single awk command?
First command:
awk 'BEGIN { FS = \"[ \\t]*=[ \\t]*\" } /Total processes/ { if (! CP) CP = $2 } END {print CP}' cg.B.".n.".log ".(n == 1 ? ">" : ">>")." processes.dat
This first command outputs:
1
2
4
8
16
Paste of the two files:
paste processes.dat speedup.dat > prsp.dat
which gives the now desired output:
1 1
2 1.9559
4 2.34107
8 2.36089
16 1.64207
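One way to avoid the intermediate files (a sketch assuming bash, with printf standing in for the first command since the surrounding script isn't shown) is process substitution, feeding both streams straight into paste:

```shell
# tavg.dat as in the question; the printf below is a hypothetical
# stand-in for whatever generates processes.dat
printf '88.3055\n45.1482\n37.7202\n37.4035\n53.777\n' > tavg.dat
paste <(printf '1\n2\n4\n8\n16\n') \
      <(awk 'NR==1{n=$0} {print n/$0}' tavg.dat) > prsp.dat
head -2 prsp.dat
# 1   1
# 2   1.9559
```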
$ awk 'NR==1{n=$0;next} {print n/$0}' file
1.9559
2.34107
2.36089
1.64207
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", n/$0}' file
1.9559
2.3411
2.3609
1.6421
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", int(n*10000/$0)/10000}' file
1.9559
2.3410
2.3608
1.6420
$ awk 'NR==1{n=$0;next} {x=sprintf("%.4f",int(n*10000/$0)/10000); sub(/\./,",",x); print x}' file
1,9559
2,3410
2,3608
1,6420
Normally you'd just use the correct locale to have . or , as your decimal point, but your input uses . while your output uses , so I don't think that's an option.
awk '{if(n=="") n=$1; else print n/$1}' inputFile
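For completeness, running that last variant on the question's sample data:

```shell
printf '88.3055\n45.1482\n37.7202\n37.4035\n53.777\n' > inputFile
# first record sets n, every later record prints n divided by it
awk '{if(n=="") n=$1; else print n/$1}' inputFile
# 1.9559
# 2.34107
# 2.36089
# 1.64207
```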