awk scripting for simulation - awk

hai friends good evening to all
i am the beginner to awk so i request you to please help me
i want to print total number of records(rows) in a text file.for that i use "print NR "command.when i use this command on BEGIN block it prints number of records instead of printing total.but when we use it in END block it returns total number of records.
for example i have a text file with 5 column and i tried like this
BEGIN {
print NR
} it returns
1
2
3
4
5
i want to print total number of records(rows) from BEGIN block itself,so please give me the answer.

NR is populated/incremented as the files are read, BEGIN is executed before any file is opened, so what you're specifically asking for can't be done.
A workaround is this:
awk 'BEGIN{ while ( (getline var < ARGV[1] > 0) ) nr++; print nr }' file
but on UNIX there are simpler ways if your records are newline-separated, e.g.:
awk -v nr="$(wc -l < file)" 'BEGIN{ print nr }' file
Also just in awk you could probably get the output you want using:
awk 'NR!=FNR && FNR==1 { print NR - FNR }' file file

You will need to pass a variable in order to access the number of records/lines in the BEGIN block, because the BEGIN block is executed before the file is processed:
awk -v count="$(wc -l < file.txt)" 'BEGIN { print count }' file.txt

BEGIN block will be executed before starting the file processing line by line.
so you cannot get the total records in BEGIN block.
it has to be done in some odd way like below:
awk 'BEGIN{"wc -l "FILENAME|getline result;print result}' your_file
here the above literally means that awk is doing nothing.its actually the shell which is doing most of the thing.

Related

How to use awk script to generate a file

I have a very large compressed file(dataFile.gz) that I want to generate another file using cat and awk. So using cat to view the contents and then piping it to awk to generate the new file.
The contents of compressed as like below
Time,SequenceNumber,MsgType,MsgLength,CityOrign,RTime
7:20:13,1,A,34,Tokyo,0
7:20:13,2,C,35,Nairobi,7:20:14
7:20:14,3,E,30,Berlin,7:20:15
7:20:16,4,A,34,Berlin,7:20:17
7:20:17,5,C,35,Denver,0
7:20:17,6,D,33,Helsinki,7:20:18
7:20:18,7,F,37,Tokyo,0
….
….
….
For the new file, I want to generate, I only want the Time, MsgType and RTime. Meaning columns 0,2 and 5. And for column 5, if the value is 0, replace it with the value at column 0. i.e replace RTime with Time
Time,MsgType,RTime
7:20:13,A,7:20:13
7:20:13,C,7:20:14
7:20:14,E,7:20:15
7:20:16,A,7:20:17
7:20:17,C,7:20:17
7:20:17,D,7:20:18
7:20:18,F,7:20:18
This is my script so far:
#!/usr/bin/awk -f
BEGIN {FS=","
print %0,%2,
if ($5 == "0") {
print $0
} else {
print $5
}
}
My question is, will this script work and how do I call it. Can I call it on the terminal like below?
zcat dataFile.gz | <awk script> > generatedFile.csv
awk index starts with 1 and $0 represents full record. So column numbers would be 1, 3, 6.
You may use this awk:
awk 'BEGIN{FS=OFS=","} !$6{$6=$1} {print $1, $3, $6}' file
Time,MsgType,RTime
7:20:13,A,7:20:13
7:20:13,C,7:20:14
7:20:14,E,7:20:15
7:20:16,A,7:20:17
7:20:17,C,7:20:17
7:20:17,D,7:20:18
7:20:18,F,7:20:18
Could you please try following. A bit shorter version of #anubhava sir's solution. This one is NOT having assignment to 6th field it only checks if that is zero or not and accordingly it prints the values.
awk 'BEGIN{FS=OFS=","} {print $1, $3, $6==0?$1:$6}' Input_file

Filter logs with awk for last 100 lines

I can filter the last 500 lines using tail or grep
tail --line 500 my_log | grep "ERROR"
What is the equivalent command for using awk
How can I add no of lines in below command
awk '/ERROR/' my_log
awk don't know about end of file until it change of reading file but you can read twhice the file, first time to find the end, second to treat line that are in the scope. You could also keep the X last line in a buffer but it's a bit heavy in memory consuption and process. Notice that the file need to be mentionned twice at the end for this.
awk 'FNR==NR{L=NR-500;next};FNR>=L && /ERROR/{ print FNR":"$0}' my_log my_log
With explanaition
awk '# first reading
FNR==NR{
#last line is this minus 500
LL=NR-500
# go to next line (for this file)
next
}
# at second read (due to previous section filtering)
# if line number is after(included) LL AND error is on the line content, print it
FNR >= LL && /ERROR/ { print FNR ":" $0 }
' my_log my_log
on gnu sed
sed '$-500,$ {/ERROR/ p}' my_log
As you had no sample data to test with, I'll show with just numbers using seq 1 10. This one stores last n records and prints them out in the end:
$ seq 1 10 |
awk -v n=3 '{a[++c]=$0;delete a[c-n]}END{for(i=c-n+1;i<=c;i++)print a[i]}'
8
9
10
If you want to filter the data add for example /ERROR/ before {a[++c]=$0; ....
Explained:
awk -v n=3 '{ # set wanted amount of records
a[++c]=$0 # hash to a
delete a[c-n] # delete the ones outside of the window
}
END { # in the end
for(i=c-n+1;i<=c;i++) # in order
print a[i] # output records
}'
Could you please try following.
tac Input_file | awk 'FNR<=100 && /error/' | tac
In case you want to add number of lines in awk command then try following.
awk '/ERROR/{print FNR,$0}' Input_file

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and i-th column in 99 separate files, I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
Following single awk may help you too here:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for iterates from 2 to the last field. If there are more fields that you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem in your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do what you try to accomplish :
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note
the -v switch of awk to pass variables
$x is the nth column defined in your variable x
Note2 : this is not the fastest solution, one awk call is fastest, but I just try to correct your logic. Ideally, take time to understand awk, it's never a wasted time

awk sum addition math on fields & colums $ - strange results

Please run this script. I wonder why I can not assing $1, $2 and $3 in BEGIN in order to calculate and print them:
BEGIN {
OFS=FS=";" ;
# #include getopt.awk
geb = 4;
dis = 3;
$1 = 10;
$2 = 0.19;
$3 = 20;
summe = geb+dis+$1;
colsum = $1+$2+$3
}
{
print $1 FS $2 FS $3 FS "Fee" " "summe FS $1+$3 FS 3+4+$1 FS colsum}
For example I hoped that
print $1+$3
gives me 30 ?!
Can't I assign new values to fields?
The BEGIN block happens before awk begins processing the file, so it doesn't make sense to assign to the individual fields, as they will be overwritten once the first record is read.
If you wish to perform calculations on the records that awk reads, this should be done in a normal block, like the one you are using to print.
BEGIN blcoks are executed before any input is read (unless use getline), so consequently none of the variables that refer to input, like NR,FNR,NF, fields like $0, $1, $2 ...$10 will be defined in any of the BEGIN blocks.
In fact BEGIN block is for actions to execute before you read the first line
(unless use getline) - Not Recommended
akshay#db-3325:/tmp$ seq 1 5 >test
akshay#db-3325:/tmp$ cat test
1
2
3
4
5
# Default action by awk
akshay#db-3325:/tmp$ awk 'BEGIN{print $1}' test
# if you use getline
akshay#db-3325:/tmp$ awk 'BEGIN{getline;print $1}' test
1

How to do calculations over lines of a file in awk

I've got a file that looks like this:
88.3055
45.1482
37.7202
37.4035
53.777
What I have to do is isolate the value from the first line and divide it by the values of the other lines (it's a speedup calculation). I thought of maybe storing the first line in a variable (using NR) and then iterate over the other lines to obtain the values from the divisions. Desired output is:
1,9559
2,3410
2,3608
1,6420
UPDATE
Sorry Ed, my mistake, the desired decimal point is .
I made some small changes to Ed's answer so that awk prints the division of 88.3055 by itself and outputs it to a file speedup.dat:
awk 'NR==1{n=$0} {print n/$0}' tavg.dat > speedup.dat
Is it possible to combine the contents of speedup.dat and the results from another awk command without using intermediate files and in one single awk command?
First command:
awk 'BEGIN { FS = \"[ \\t]*=[ \\t]*\" } /Total processes/ { if (! CP) CP = $2 } END {print CP}' cg.B.".n.".log ".(n == 1 ? ">" : ">>")." processes.dat
This first command outputs:
1
2
4
8
16
Paste of the two files:
paste processes.dat speedup.dat > prsp.dat
which gives the now desired output:
1 1
2 1.9559
4 2.34107
8 2.36089
16 1.64207
$ awk 'NR==1{n=$0;next} {print n/$0}' file
1.9559
2.34107
2.36089
1.64207
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", n/$0}' file
1.9559
2.3411
2.3609
1.6421
$ awk 'NR==1{n=$0;next} {printf "%.4f\n", int(n*10000/$0)/10000}' file
1.9559
2.3410
2.3608
1.6420
$ awk 'NR==1{n=$0;next} {x=sprintf("%.4f",int(n*10000/$0)/10000); sub(/\./,",",x); print x}' file
1,9559
2,3410
2,3608
1,6420
Normally you'd just use the correct locale to have . or , as your decimal point but your input uses . while your output uses , so I don't think that's an option.
awk '{if(n=="") n=$1; else print n/$1}' inputFile