gawk: sum from one record to another

I have the following table:
login:numero:sobrenome:nome
INICIO
Alcala:1234:Thomas:Alcala
Baron:1235:Alexis:Baron
Bezier:1236:Pascal:Bezier
Boutier:1237:Damien:Boutier
Buard:1238:Jeremy:Buard
Fagour:1239:Dimitri:Fagour
Fagour:1240:Stephane:Fagour
Justice:1241:Jonathan:Justice
FIM
Numero de usuario = 15
I would like to compute the sum of the second field (numero) from the Bezier line through the Buard line.
I tried the following command:
gawk '/Bezier/{init=NR}/Buard/{fin=NR}NR>=$init{Sum1+=$2}NR>$fin{Sum2+=$2}END{Sum=Sum1-Sum2;print Sum}BEGIN{FS=":"}' arq_test_awq
But it doesn't work: Sum2 always starts at the /Buard/ line. Even if I use "fin=NR+1", the result is the same. I could match /Fagour/ instead to work around the problem, but I can't understand why this version doesn't work.

First check that a range pattern selects the records of interest:
$ awk -F: '/Bezier/,/Buard/' file
Bezier:1236:Pascal:Bezier
Boutier:1237:Damien:Boutier
Buard:1238:Jeremy:Buard
Then sum the second field over that range and print the total at the end:
$ awk -F: '/Bezier/,/Buard/{sum+=$2} END{print sum}' file
3711
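As for why the original command misbehaves, two things stand out in the command as posted. In awk, $init means "the field whose number is stored in init", not the value of the variable, so NR>=$init and NR>$fin compare NR against field contents. And even written as NR>fin, fin is only assigned once the Buard line has been read, so for every earlier line it is still unset and the test is true. A minimal sketch of the same sum using an explicit flag instead of a range pattern; the temporary file and the flag name on are assumptions made for illustration:

```shell
# Sample data copied from the question; the temporary file exists only for this sketch.
data=$(mktemp)
cat > "$data" <<'EOF'
login:numero:sobrenome:nome
INICIO
Alcala:1234:Thomas:Alcala
Baron:1235:Alexis:Baron
Bezier:1236:Pascal:Bezier
Boutier:1237:Damien:Boutier
Buard:1238:Jeremy:Buard
Fagour:1239:Dimitri:Fagour
Fagour:1240:Stephane:Fagour
Justice:1241:Jonathan:Justice
FIM
Numero de usuario = 15
EOF

# Turn the flag on at Bezier, add $2 while it is on, turn it off after Buard was added.
sum=$(awk -F: '/Bezier/{on=1} on{sum+=$2} /Buard/{on=0} END{print sum}' "$data")
echo "$sum"
rm -f "$data"
```

The order of the three rules matters: the Buard line is added before the flag is cleared, which is what makes the range inclusive at both ends.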

Related

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use:
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How do I correctly use $i inside the {print}?
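The following single awk command may help too. Each output file is closed and reopened for every record, so it appends with >> (reopening with > would truncate the file each time); remove old output files before rerunning:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){f="file" i; print $1,$i >> f; close(f)}}' Input_file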
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for loop iterates from field 2 to the last field. If the file has more fields than you want to process, replace NF with the last field number you need. If, for some reason, a hundred open files would be a problem on your system, put the print into a block and add a close call, using >> so that reopening the file appends instead of truncating:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
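The difference between > and >> only matters once close() is called between writes. A small sketch, assuming a scratch directory from mktemp and the made-up file prefixes trunc and keep, shows that reopening with > keeps just the last record while >> keeps all of them:

```shell
work=$(mktemp -d)
printf '11 12 13\n21 22 23\n' > "$work/foo"

# ">" plus close() on every record: each reopen truncates the file.
awk -v dir="$work" '{for(i=2;i<=NF;i++){f=(dir "/trunc" i); print $1,$i > f; close(f)}}' "$work/foo"
# ">>" plus close() on every record: each reopen appends.
awk -v dir="$work" '{for(i=2;i<=NF;i++){f=(dir "/keep" i); print $1,$i >> f; close(f)}}' "$work/foo"

trunc_content=$(cat "$work/trunc2")
keep_content=$(cat "$work/keep2")
echo "trunc2: $trunc_content"
echo "keep2:"
echo "$keep_content"
rm -rf "$work"
```

Because >> also appends to files left over from an earlier run, the output files should be removed before the command is repeated.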
If you want to keep the shell loop you were trying to write:
for i in {2..99}; do
awk -v x="$i" '{print $1" "$x}' input.txt > "data${i}"
done
Note: the -v option of awk passes a shell value into an awk variable, so $x is the field whose number is stored in the variable x.
Note 2: this is not the fastest solution (a single awk call is faster); it only corrects your logic. Ideally, take the time to understand awk; it is never time wasted.

AWK sum all values of a column keeping all floats

I'm using the following code to grep the lines that I'm interested in, keep only the last ones, and sum over column nine:
grep -n -49 'FINAL BlaBla' output | tail -9 | awk 'BEGIN {SUM=0}; {SUM=SUM+$9}; END {printf "%.3f\n" SUM}'
However, the sum over column 9 returns 0,000.
The selected lines look as follows:
84- C -3.42056726 +1 -0.82831327 +1 1.52743549 +1 0.5647
85- N -4.78612760 +1 -1.01185554 +1 1.58894854 +1 -0.5837
86- C -5.19047197 +1 -2.20130686 +1 2.06176295 +1 0.3890
87- N -4.42537785 +1 -3.22689397 +1 2.47304603 +1 -0.4775
88- C -3.03532546 +1 -2.98933854 +1 2.38795560 +1 0.3686
89- N -2.51737448 +1 -1.78267672 +1 1.92262528 +1 -0.5526
90- Cl -6.86455806 +1 -2.45050886 +1 2.15229544 +1 0.0934
91- N -2.24043582 +1 -3.93651444 +1 2.76082642 +1 0.0890
92- N -2.94053526 +1 0.36941710 +1 1.06455738 +1 -0.3274
I can't find out where the mistake is.
I also tried summing over $1 and correctly obtained 792,000, but when I sum over $3 I get 31,000.
What's wrong?
I think the problem lies in the missing comma in your printf expression:
$ awk 'BEGIN {SUM=0}; {SUM=SUM+$9}; END {printf "%.3f\n", SUM}' file
#                                                       ^
#                                                    comma!
-0.436
Note, by the way, that there is no need to set the variable to zero, since that is the default. So drop the BEGIN {} block and reduce the command to:
awk '{sum+=$9}; END {printf "%.3f\n", sum}' file
For the other fields:
$ awk '{sum+=$nvar}; END {printf "%.3f\n", sum}' nvar=1 file
792.000
$ awk '{sum+=$nvar}; END {printf "%.3f\n", sum}' nvar=3 file
-35.421
$ awk '{sum+=$nvar}; END {printf "%.3f\n", sum}' nvar=9 file
-0.436
Why wasn't it working?
From The GNU Awk user's guide 5.5.1 Introduction to the printf Statement:
A simple printf statement looks like this:
printf format, item1, item2, …
As for print, the entire list of arguments may optionally be enclosed
in parentheses
The difference between printf and print is the format argument. This
is an expression whose value is taken as a string; it specifies how to
output each of the other arguments. It is called the format string.
The format string is very similar to that in the ISO C library
function printf(). Most of format is text to output verbatim.
Scattered among this text are format specifiers—one per item. Each
format specifier says to output the next item in the argument list at
that place in the format.
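So if you omit the comma between the format string and the items, awk concatenates them into a single format string with no items left to print, and you get this error: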
$ awk 'BEGIN {printf "%.3f" 3}'
awk: cmd. line:1: fatal: not enough arguments to satisfy format string
`%.3f3'
^ ran out for this one
With the comma it works:
$ awk 'BEGIN {printf "%.3f", 3}'
3.000
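The error message shows the format string as %.3f3: without the comma, awk treats "%.3f" 3 as string concatenation. A small sketch that makes this visible by building the same string outside of printf; the variable name s is made up here, and LC_ALL=C is set only so that the decimal point does not depend on the locale:

```shell
# Without a comma, awk concatenates "%.3f" and 3 into one string.
joined=$(awk 'BEGIN { s = "%.3f" 3; print s }')
echo "$joined"

# With the comma, 3 is passed as the item to be formatted.
formatted=$(LC_ALL=C awk 'BEGIN { printf "%.3f\n", 3 }')
echo "$formatted"
```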

How to do calculations over lines of a file in awk

I've got a file that looks like this:
88.3055
45.1482
37.7202
37.4035
53.777
What I have to do is isolate the value from the first line and divide it by the values of the other lines (it's a speedup calculation). I thought of maybe storing the first line in a variable (using NR) and then iterate over the other lines to obtain the values from the divisions. Desired output is:
1,9559
2,3410
2,3608
1,6420
UPDATE
Sorry Ed, my mistake, the desired decimal point is .
I made some small changes to Ed's answer so that awk prints the division of 88.3055 by itself and outputs it to a file speedup.dat:
awk 'NR==1{n=$0} {print n/$0}' tavg.dat > speedup.dat
Is it possible to combine the contents of speedup.dat and the results from another awk command without using intermediate files and in one single awk command?
First command:
awk 'BEGIN { FS = \"[ \\t]*=[ \\t]*\" } /Total processes/ { if (! CP) CP = $2 } END {print CP}' cg.B.".n.".log ".(n == 1 ? ">" : ">>")." processes.dat
This first command outputs:
1
2
4
8
16
Paste of the two files:
paste processes.dat speedup.dat > prsp.dat
which gives the now desired output:
1 1
2 1.9559
4 2.34107
8 2.36089
16 1.64207
$ awk 'NR==1{n=$0;next} {print n/$0}' file
1.9559
2.34107
2.36089
1.64207
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", n/$0}' file
1.9559
2.3411
2.3609
1.6421
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", int(n*10000/$0)/10000}' file
1.9559
2.3410
2.3608
1.6420
$ awk 'NR==1{n=$0;next} {x=sprintf("%.4f",int(n*10000/$0)/10000); sub(/\./,",",x); print x}' file
1,9559
2,3410
2,3608
1,6420
Normally you'd just use the correct locale to have . or , as your decimal point but your input uses . while your output uses , so I don't think that's an option.
An alternative that stores the first value in n and divides it by every later value:
awk '{if(n=="") n=$1; else print n/$1}' inputFile
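For the follow-up about avoiding the intermediate speedup.dat and prsp.dat, one possible approach is to let a single awk read both inputs. The sketch below assumes processes.dat has already been written by the first command and holds one value per line in the same order as tavg.dat; the array name p and the scratch directory are illustrative only:

```shell
work=$(mktemp -d)
printf '1\n2\n4\n8\n16\n' > "$work/processes.dat"
printf '88.3055\n45.1482\n37.7202\n37.4035\n53.777\n' > "$work/tavg.dat"

combined=$(awk '
    NR==FNR { p[FNR]=$1; next }   # first file: remember the process counts
    FNR==1  { n=$0 }              # second file: the first time is the baseline
            { print p[FNR], n/$0 }
' "$work/processes.dat" "$work/tavg.dat")
echo "$combined"
rm -rf "$work"
```

NR==FNR is true only while the first file is being read, which is how the script tells the two inputs apart. This replaces the paste step, though processes.dat itself is still needed as input.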

awk: changing OFS without looping though variables

I'm working on an awk one-liner to replace commas with tabs in a file (and substitute \\N for missing values in preparation for MySQL select into).
The following link http://www.unix.com/unix-for-dummies-questions-and-answers/211941-awk-output-field-separator.html (at the bottom) suggests the following approach to avoid looping through the variables:
echo a b c d | awk '{gsub(OFS,";")}1'
head -n1 flatfile.tab | awk -F $'\t' '{for(j=1;j<=NF;j++){gsub(" +","\\N",$j)}gsub(OFS,",")}1'
Clearly, the trailing 1 (can be a number, char) triggers the printing of the entire record. Could you please explain why this is working?
SO also has "Print all Fields with AWK separated by OFS", but in that post it is unclear why this works.
Thanks.
Awk evaluates 1, or any number other than 0, as a true pattern. A true pattern with no action part is equivalent to { print $0 }, so it prints the line.
For example:
$ echo "hello" | awk '1'
hello
$ echo "hello" | awk '0'
$
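The gsub trick from the question works because gsub with no third argument edits $0 in place, and the default OFS is a single space, so every space in the record is replaced before the trailing 1 prints it. A hedged sketch comparing it with the more common idiom of setting OFS and forcing the record to be rebuilt with $1=$1 (the shell variable names are made up for this example):

```shell
# Edit $0 directly: every space (the default OFS) becomes ";".
via_gsub=$(echo 'a b c d' | awk '{ gsub(OFS, ";") } 1')
# Set OFS and assign a field so that awk rebuilds $0 with the new separator.
via_rebuild=$(echo 'a b c d' | awk -v OFS=';' '{ $1 = $1 } 1')
echo "$via_gsub"
echo "$via_rebuild"
```

The two agree on this input, but they are not equivalent in general: gsub replaces each space individually, so runs of spaces produce several separators, whereas the rebuild joins fields with exactly one OFS.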

awk scripting for simulation

Hi everyone, good evening.
I am a beginner with awk, so please help me.
I want to print the total number of records (rows) in a text file, and for that I use the "print NR" command. When I use this command in the BEGIN block, it prints the number of each record instead of the total, but when I use it in the END block it returns the total number of records.
For example, I have a text file with 5 columns and I tried this:
BEGIN {
print NR
}
It returns:
1
2
3
4
5
I want to print the total number of records (rows) from the BEGIN block itself, so please give me the answer.
NR is populated/incremented as the files are read, BEGIN is executed before any file is opened, so what you're specifically asking for can't be done.
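A quick way to see this, assuming a throwaway five-line file created with mktemp:

```shell
f=$(mktemp)
printf 'a\nb\nc\nd\ne\n' > "$f"
# NR in BEGIN is read before any input; NR in END is read after all of it.
out=$(awk 'BEGIN { print "BEGIN:", NR } END { print "END:", NR }' "$f")
echo "$out"
rm -f "$f"
```

In BEGIN no record has been read yet, so NR is 0; in END it holds the total. A running count 1 to 5 appears only when print NR sits in a main rule that runs once per record.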
A workaround is to read the file yourself in BEGIN with getline:
awk 'BEGIN{ while ( (getline var < ARGV[1]) > 0 ) nr++; print nr }' file
but on UNIX there are simpler ways if your records are newline-separated, e.g.:
awk -v nr="$(wc -l < file)" 'BEGIN{ print nr }' file
Also, staying within awk, you could get the count by passing the file twice and printing at the first record of the second pass:
awk 'NR!=FNR && FNR==1 { print NR - FNR }' file file
You will need to pass a variable in order to access the number of records/lines in the BEGIN block, because the BEGIN block is executed before the file is processed:
awk -v count="$(wc -l < file.txt)" 'BEGIN { print count }' file.txt
The BEGIN block is executed before awk starts processing the file line by line, so you cannot get the total number of records from NR in the BEGIN block.
It has to be done in a roundabout way like the one below. FILENAME is still empty in BEGIN, so the file name is taken from ARGV[1]:
awk 'BEGIN{cmd="wc -l < " ARGV[1]; cmd | getline result; close(cmd); print result}' your_file
Here awk itself does almost nothing; it is the external wc command that does most of the work.