Awk Script to process data from a trace file


I have a table (.tr file) with different rows (events).
**Event** **Time** **PacketLength** PacketId
sent 1 100 1
dropped 2 100 1
sent 3 100 2
sent 4.5 100 3
dropped 5 100 2
sent 6 100 4
sent 7 100 5
sent 8 100 6
sent 10 100 7
And I would like to create a new table like the following, but I don't know how to do it in AWK.
**SentTime** **PacketLength Dropped**
1 100 Yes
3 100 Yes
4.5 100
6 100
7 100
8 100
10 100
I have some simple code to find dropped or sent packets, with their time and id, but I do not know how to add a column to my table showing which packets were dropped.
BEGIN{}
{
Event = $1;
Time = $2;
Packet = $6;
Node = $10;
id = $11;
if (Event=="s" && Node=="1.0.1.2"){
printf ("%f\t %d\n", $2, $6);
}
}
END {}

You have to save all the information in an array and post-process it once the whole file has been read. Obviously, if the file is huge, this could cause memory problems. Note that gensub() is a GNU awk extension.
BEGIN {
template="#sentTime\t#packetLength\t#dropped";
}
{
print $0;
event = $1;
time = $2;
packet_length = $3;
packet_id = $4;
# save all the info in an array
packet_info[packet_id] = packet_info[packet_id] "#" packet_length "#" time "#" event;
}
END {
# traverse the array; each key is a packet id, not a time
for( id in packet_info )
{
print "packet id " id " = " packet_info[id];
# for every element in the array (= packet),
# the data has this format "#100#1#sent#100#2#dropped"
split( packet_info[id], info, "#" );
# info[2] <-- 100
# info[3] <-- 1
# info[4] <-- sent
# info[5] <-- 100
# info[6] <-- 2
# info[7] <-- dropped
line = template;
line = gensub( "#sentTime", info[3], "g", line );
line = gensub( "#packetLength", info[2], "g", line );
if( info[4] == "dropped" )
line = gensub( "#dropped", "yes", "g", line );
if( info[7] == "dropped" )
line = gensub( "#dropped", "yes", "g", line );
line = gensub( "#dropped", "", "g", line );
print line;
} # for
}

I would say...
awk '/sent/{pack[$4]=$2; len[$4]=$3}
/dropped/{drop[$4]}
END {print "Sent time", "PacketLength", "Dropped";
for (p in pack)
print pack[p], len[p], ((p in drop)?"yes":"")
}' file
This stores each packet's sent time in pack[], its length in len[] and the ids of dropped packets in drop[], all indexed by packet id, so that they can be fetched later on in the END block.
Test
$ awk '/sent/{pack[$4]=$2; len[$4]=$3} /dropped/{drop[$4]} END {print "Sent time", "PacketLength", "Dropped"; for (p in pack) print pack[p], len[p], ((p in drop)?"yes":"")}' a
Sent time PacketLength Dropped
1 100 yes
3 100 yes
4.5 100
6 100
7 100
8 100
10 100
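One caveat: awk does not guarantee the order in which `for (p in pack)` visits the keys, so the rows are not certain to come out in sent-time order on every awk. A minimal sketch that records the order in which packet ids first appear, assuming the same four-column layout as the sample (the `order` array, the `n` counter and the scratch file are illustrative names, not part of the answer above):

```shell
# Sample trace from the question, written to a scratch file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
sent 1 100 1
dropped 2 100 1
sent 3 100 2
sent 4.5 100 3
dropped 5 100 2
sent 6 100 4
EOF

out=$(awk '
$1 == "sent"    { if (!($4 in pack)) order[++n] = $4   # remember first-seen order of packet ids
                  pack[$4] = $2; len[$4] = $3 }
$1 == "dropped" { drop[$4] }                           # referencing the key is enough to create it
END {
    print "SentTime", "PacketLength", "Dropped"
    for (i = 1; i <= n; i++) {                         # walk ids in input order
        p = order[i]
        print pack[p], len[p], ((p in drop) ? "yes" : "")
    }
}' "$tmp")
echo "$out"
rm -f "$tmp"
```

Comparing `$1` exactly, rather than matching `/sent/` anywhere on the line, also avoids false hits if another column happens to contain the word.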

Related

Replacing sequences of space-delimited numbers in huge input with awk

How can I replace sequences of space-delimited numbers when those sequences span multiple lines and the input is too big to fit in RAM?
A sample input would be:
edit: I re-worked the sample input and input parameters for introducing border cases (excluding ones that have to do with the length of the matched sequence or replacement priorities)
3 12 3 4
0 6 7 10
8 9 12 3
4 6 7 8
10 6 6 7
9 199 10 11
11
note: the number of fields per line is homogeneous but not known in advance; the last line might contain fewer fields
From that input I would like to:
replace 3 4 with &
replace 6 7 8 with 9 9
replace 6 7 9 with 8 8
replace 7 10 with 11 12
replace 0 with nothing
replace 10 with 13 10
replace 8 9 12 3 5 with #
The expected output would have one number or replacement per line:
3
12
&
6
11 12
8
9
12
&
9 9
13 10
6
8 8
199
13 10
11
11
I'm trying to do the task with awk but I'm having a hard time implementing a dynamic state machine with a pseudo B-Tree:
tr -s '[:space:]' '\n' < input.txt |
awk '
BEGIN {
for (i = 2; i < ARGC; i += 2) {
n = split(ARGV[i], arr)
k = ""
for (j = 1; j <= n; j++) {
k = j SUBSEP k SUBSEP arr[j]
Tree[k]
}
Tree[k] = "$" ARGV[i+1] #=> now can test "if (Tree[k])"
delete ARGV[i]
delete ARGV[i+1]
}
}
{
Key = (int(Key) + 1) SUBSEP Key SUBSEP $1
if ( Key in Tree ) {
if (Tree[Key]) {
print substr(Tree[Key],2)
Buffer = ""
Key = ""
}
else
Buffer = Buffer $1 "\n"
} else {
print Buffer $1
Buffer = ""
Key = ""
}
}
END { if (Buffer != "") printf ("%s", Buffer) }
' - \
'3 4' '&' \
'6 7 8' '9 9' \
'6 7 9' '8 8' \
'7 10' '11 12' \
'0' '' \
'10' '13 10' \
'8 9 12 3 5' '#'
edit: I realised that the code doesn't backtrack after failing to find a complete match in the B-tree, so it's wrong...
How I'm planning to tackle the problem
I'm emulating a B-tree with an array and keys in the following format:
from the middle to the left of the key are the consecutive depths
from the middle to the right of the key are the consecutive values
When a key exists in Tree:
if it doesn't have an associated value then it's a node
if there's a value then it's a leaf
So, for the current input parameters, the content of the Tree array will be:
# from param: "3 4" => "&"
Tree[ 1,"",3 ]
Tree[2,1,"",3,4] = "$&"
# from param: "6 7 8" => "9 9"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,8] = "$9 9"
# from param: "6 7 9" => "8 8"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,9] = "$8 8"
# from param: "7 10" => "11 12"
Tree[ 1,"",7 ]
Tree[2,1,"",7,10] = "$11 12"
# from param: "0" => ""
Tree[1,"",0] = "$"
# from param: "10" => "13 10"
Tree[1,"",10] = "$13 10"
# from param: "8 9 12 3 5" => "#"
Tree[ 1,"",8 ]
Tree[ 2,1,"",8,9 ]
Tree[ 3,2,1,"",8,9,12 ]
Tree[ 4,3,2,1,"",8,9,12,3 ]
Tree[5,4,3,2,1,"",8,9,12,3,5] = "$#"

FWIW I'd approach this by figuring out the max number of records that you might need to search in based on the mappings you want, keeping a rolling buffer of that number of records, and then doing the comparison part on each buffer, e.g.:
$ cat tst.awk
BEGIN {
RS = "[[:space:]]+"
map("3,4" , "&")
map("6,7,8" , "9")
map("9" , "")
map("0" , "\\000")
map("13,10" , "10")
}
{ buf[((NR-1) % maxRecs) + 1] = $0 }
NR >= maxRecs { prt() }
END { prt() }
function prt( nr,sep,str) {
for ( nr=NR-maxRecs+1; nr<=NR; nr++ ) {
str = str sep buf[((nr-1) % maxRecs) + 1]
sep = ORS
}
print ">>>>" ORS str ORS "<<<<"
# Replace the above with something that loops through the
# strings you want replaced, e.g.
#
# for ( mapNr=1; mapNr<=numMaps; mapNr++ ) {
# old = olds[mapNr]
# if ( str ~ old ) { # add something to avoid partial matches
# new = news[mapNr]
# replace old with new in the output
# }
# }
}
function map(old,new, numRecs) {
++numMaps
numRecs = gsub(/,/,ORS,old) + 1
maxRecs = ( numRecs > maxRecs ? numRecs : maxRecs )
olds[numMaps] = old
news[numMaps] = new
}
$ awk -f tst.awk file
>>>>
112
3
4
<<<<
>>>>
3
4
6
<<<<
>>>>
4
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
9
<<<<
>>>>
8
9
12
<<<<
>>>>
9
12
0
<<<<
>>>>
12
0
3
<<<<
>>>>
0
3
4
<<<<
>>>>
3
4
15
<<<<
>>>>
4
15
255
<<<<
>>>>
15
255
13
<<<<
>>>>
255
13
10
<<<<
>>>>
13
10
6
<<<<
>>>>
10
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
199
<<<<
>>>>
8
199
9
<<<<
>>>>
199
9
0
<<<<
>>>>
9
0
13
<<<<
>>>>
9
0
13
<<<<
The above just prints the buffer-sized strings. The part still to be added is replacing the target strings with the new ones in such a way that the next target doesn't match the replaced part. That is a common problem with, I expect, lots of solutions online, so it's left as an exercise.
You'll also need to tweak it to make sure it doesn't revisit lines at the end of the input.
The above uses GNU awk for multi-char RS, if you don't have GNU awk then just pipe the input from tr -s '[:space:]' '\n' as shown in the question.
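As a rough illustration of the exercise described above, here is one way the rolling buffer and the replacement step could fit together. This is a sketch under stated assumptions, not the answer's own code: tokens arrive one per line (as after the `tr` pipeline from the question), the first mapping listed wins when several match at the same position, and an empty replacement prints nothing. The names `add`, `step`, `buf`, `head` and `tail` are made up for the example.

```shell
out=$(printf '%s\n' 3 12 3 4 0 6 7 10 | awk '
function add(old, new,   n, parts) {     # register "old tokens" -> replacement
    n = split(old, parts, " ")
    pat[++nmap] = old; rep[nmap] = new; plen[nmap] = n
    if (n > maxlen) maxlen = n           # longest pattern decides the buffer size
}
function step(   m, i, k, t, ok) {       # consume one match or one token from the head
    for (m = 1; m <= nmap; m++) {
        if (plen[m] > tail - head + 1) continue      # not enough tokens buffered
        split(pat[m], t, " ")
        ok = 1
        for (i = 1; i <= plen[m]; i++)
            if (buf[head + i - 1] != t[i]) { ok = 0; break }
        if (ok) {
            if (rep[m] != "") print rep[m]
            for (k = 0; k < plen[m]; k++) delete buf[head++]
            return
        }
    }
    print buf[head]; delete buf[head++]  # no mapping starts here: emit the token as is
}
BEGIN { head = 1; add("3 4", "&"); add("7 10", "11 12"); add("0", "") }
{ buf[++tail] = $1; if (tail - head + 1 >= maxlen) step() }
END { while (head <= tail) step() }      # drain whatever is still buffered
')
echo "$out"
```

Because consumed tokens are deleted and a replacement is printed rather than pushed back into the buffer, memory stays bounded by the longest pattern and a later target cannot match replaced text. Trying every mapping at every position is slow for thousands of mappings, which is where a tree lookup becomes worthwhile.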

UPDATE:
previous answer (see edit revisions) was woefully slow (several minutes) when run against a ramped-up input (7K mappings in map.txt; 25M tokens in input.txt)
new answer (below) is a complete rewrite and processes the 7K-mappings/25M-tokens in ~45 seconds
The main component of this design centers around a tree-like node structure used to manage the series of tokens (lines of input from map.txt):
tree [ParentNodeNbr] [token] [NodeType] = value
Where:
ParentNodeNbr == 0 for the root
token from map.txt
NodeType has one of two values 'node' or 'leaf'
for NodeType = 'node' the value stored in the array is a numeric node number (implemented as a counter that's incremented each time a new node is added to the tree); this node number becomes the ParentNodeNbr for the next token in the series
for NodeType = 'leaf' this designates the 'end' of a series of tokens (line of input from map.txt) and the value stored in the array is the line number (aka FNR) from map.txt; this line number (FNR) is used as an index into a couple of other arrays and to determine precedence when an input sequence (from input.txt) has multiple matches from map.txt
when processing a series of tokens from a map.txt line of input we start at ParentNodeNbr == 0 looking for a series of matching nodes, adding new nodes as needed
Setup: storing replacements in a comma-delimited file (map.txt), and adding one additional line to input.txt:
$ head map.txt input.txt
==> map.txt <==
2 3 4,X # "2 3 4" has precedence over ...
2 3,Y # "2 3"
3 4,&
6 7 8,9
9,
0,\000
13 10,10
==> input.txt <==
2 3 4 # keep an eye on "2 3" vs "2 3 4" precedence
112 3
4 6 7
8 9 12 0 3
4 15 255 13
10 6
7 8 199 9
0 13
NOTE: here's what tree[][][] looks like when populated from map.txt:
tree [Parent] [Token] [NodeType] = NodeVal
Parent Token NodeType NodeVal MapTo ** MapTo only applies to NodeType = leaf
====== ===== ======== ======= =====
0 0 leaf 6 "\000"
0 2 node 1
0 3 node 3
0 6 node 4
0 9 leaf 5 ""
0 13 node 6
1 3 node 2
1 3 leaf 1 "Y"
2 4 leaf 2 "X"
3 4 leaf 3 "&"
4 7 node 5
5 8 leaf 4 "9"
6 10 leaf 7 "10"
One GNU awk idea (GNU awk is needed for the multidimensional arrays, i.e. arrays of arrays):
awk '
function replace(op) {
while ( ((maxToken - minToken + 1) >= maxlen) || op == "flush" ) {
NodeNbr=root
minOrd=maxOrd
for (j=0 ; j<maxlen; j++) { # loop through tokens in buffer[]
token=buffer[ ((minToken + j - 1) % maxlen) + 1 ]
# if we find a matching "leaf" node then keep track of the ordering (ie, FNR from map.txt; lower order == higher precedence)
if ( token in tree [NodeNbr] && "leaf" in tree[NodeNbr][token] )
minOrd= ( tree[NodeNbr][token]["leaf"] < minOrd ) ? tree[NodeNbr][token]["leaf"] : minOrd
# if we find a matching "node" node then grab the next node to compare against the next token from buffer[]
if ( token in tree[NodeNbr] && "node" in tree[NodeNbr][token] ) {
NodeNbr=tree[NodeNbr][token]["node"]
continue
}
break # if we get here we have a token from buffer[] that does not match any of our replacement mappings so abort checking rest of buffer[]
}
if (minOrd < maxOrd) { # if we found at least one complete match (ie, hit a "leaf" node) then ...
print map[minOrd] # use the associated "ord"er to print the associated replacement string and ...
minToken=minToken + len[minOrd] # update the pointer into the buffer[] array
}
else { # otherwise we did not find a match so ...
print buffer[ ((minToken - 1) % maxlen) + 1 ] # print the first token from buffer[] and ...
minToken++ # update the pointer into the buffer[] array
}
if (minToken > maxToken)
break
}
}
BEGIN { root=maxNodeNbr=maxToken=0
minToken=1
maxOrd=9999999999
}
FNR==NR { split($0,a,",")
map[FNR]=a[2] # save replacement string for this input line from map.txt
n=split(a[1],b) # break our matching pattern into tokens
len[FNR]=n # make note of number of tokens in this line of input
maxlen=(n > maxlen) ? n : maxlen # keep track of longest series of tokens
NodeNbr=root # initiate our tree search
for (i=1 ; i<=n ; i++) { # loop through our list of tokens
token=b[i]
if (i==n) # if the last token for this line then create a "leaf" node and store the line number (aka "order")
tree[NodeNbr][token]["leaf"]=FNR
else
if ( tree[NodeNbr][token]["node"] ) # else if we already have a node at this point in the tree then grab its associated node number for the next level in the tree
NodeNbr=tree[NodeNbr][token]["node"]
else { # else create a new "node" node and populate with the next available node number
tree[NodeNbr][token]["node"]=++maxNodeNbr
NodeNbr=maxNodeNbr # use this as the next level in our tree traversal
}
}
maxrec=FNR # keep track of total number of replacement sets from map.txt (only used if we decide to print the contents of map[] to stdout)
next
}
FNR==1 {
# Uncomment following to display the contents of the map[] array:
# for (i=1;i<=maxrec;i++)
# print "map:" i ":" map[i] ":"
#
# Uncomment following to display the contents of the tree[][][] array:
# fmt="%6s%8s%10s%10s%10s\n"
# printf "tree [Parent] [Token] [NodeType]\n\n"
# printf fmt, "Parent", "Token", "NodeType", "NodeVal", "MapTo"
# printf fmt, "======", "=====", "========", "=======", "====="
#
# for (NodeNbr=root ; NodeNbr<=maxNodeNbr ; NodeNbr++)
# for (token in tree[NodeNbr])
# for (NodeType in tree[NodeNbr][token]) { # ??
# NodeVal=tree[NodeNbr][token][NodeType]
# printf fmt, NodeNbr, token, NodeType, NodeVal, (NodeType=="leaf") ? "\"" map[NodeVal] "\"" : ""
# }
}
{ for (i=1 ; i<=NF ; i++) { # loop through tokens in current line from input.txt
maxToken++
buffer[ ((maxToken - 1) % maxlen) + 1 ] = $i
if ( (maxToken - minToken + 1) >= maxlen ) # if we have a "full" buffer then ...
replace() # look for replacement match
}
}
END { replace("flush") } # flush the rest of buffer[]
' map.txt input.txt
This generates:
X # "2 3 4" has precedence over "2 3"
112
&
9
12
\000
&
15
255
10
9
199
\000
13
If we switch the first 2 lines of map.txt like such:
==> map.txt <==
2 3,Y # "2 3" has precedence over ...
2 3 4,X # "2 3 4"
We now generate:
Y # "2 3" has precedence over "2 3 4" thus ...
4 # leaving "4" by itself
112
&
9
12
\000
&
15
255
10
9
199
\000
13

Successive averaging of repeating data but different number of lines

I have the following format of data:
1 3
1.723608 0.8490000
1.743011 0.8390000
1.835833 0.7830000
2 5
1.751377 0.8350000
1.907603 0.7330000
1.780053 0.8190000
1.601427 0.9020000
1.950540 0.6970000
3 2
1.993951 0.6610000
1.796519 0.8090000
4 4
1.734961 0.8430000
1.840741 0.7800000
1.818444 0.7950000
1.810717 0.7980000
5 1
2.037940 0.6150000
6 7
1.738221 0.8330000
1.767678 0.8260000
1.788517 0.8140000
2.223586 0.4070000
1.667492 0.8760000
2.039232 0.6130000
1.758823 0.8300000
...
Data consists of data blocks. Each data block has the same format as follows:
The very first line is the header line. The header line contains the ID number and the total number of lines of each data block. For example, the first data block's ID is 1, and the total number of lines is 3. For the third data block, ID is 3, and the total number of lines is 2. All data blocks have this header line.
Then, the "real data" follows. As I explained, the number of lines of "real data" is designated in the second integer of the header line.
Accordingly, the total number of lines for each data block will be number_of_lines+1. In this example, data block 1 takes 4 lines in total, and data block 2 takes 6 lines...
This format repeats for all 10000 data blocks in my current data, but I can provide this 10000 as a variable in the bash or awk script as an input value. I know the total number of data blocks.
Now, what I wish to do is get the average of each of the two data columns for every data block, and print it out together with the data block's ID number and its total number of lines. The output text will have:
ID_number number_of_lines average_of_column_1 average_of_column_2
using 5 spaces between columns with 6 decimal places format. The result will have 10000 lines, and each line will have ID, number of lines, avg of column 1 of data, and avg of column 2 of data for each data block. The result of this example will look like
1 3 1.767484 0.823666
2 5 1.798200 0.797200
3 2 1.895235 0.735000
...
I know how to get the average of a simple data column in awk and bash. These are already answered in StackOverflow a lot of times. For example, I really favor using
awk '{ total += $2; count++ } END { print total/count }' data.txt
So, I wish to do this using awk or bash. But I really have no clue how to approach, or even start on, this kind of average over multiple repeating data blocks with a different number of lines in each data block.
I was trying based on awk, following
Awk average of n data in each column
and
https://www.unix.com/shell-programming-and-scripting/135829-partial-average-column-awk.html
But I'm not sure how can I use NR or FNR for the average of data with a varying number of total lines of data, for each data block.

You may try this awk:
awk -v OFS='\t' '$2 ~ /\./ {s1 += $1; s2 += $2; next} {if (id) {print id, num, s1/num, s2/num; s1=s2=0} id=$1; num=$2} END {print id, num, s1/num, s2/num}' file
1 3 1.76748 0.823667
2 5 1.7982 0.7972
3 2 1.89524 0.735
4 4 1.80122 0.804
5 1 2.03794 0.615
6 7 1.85479 0.742714
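OFMT is a standard awk variable (it is not specific to GNU awk), so you can use it to get a fixed number of decimal places like this; whole-number values such as id and num are still printed as integers: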
awk -v OFMT="%.6f" -v OFS='\t' '$2 ~ /\./ {s1 += $1; s2 += $2; next} {if (id) {print id, num, s1/num, s2/num; s1=s2=0} id=$1; num=$2} END {print id, num, s1/num, s2/num}' file
1 3 1.767484 0.823667
2 5 1.798200 0.797200
3 2 1.895235 0.735000
4 4 1.801216 0.804000
5 1 2.037940 0.615000
6 7 1.854793 0.742714
An expanded form:
awk -v OFMT='%.6f' -v OFS='\t' '
$2 ~ /\./ {
s1 += $1
s2 += $2
next
}
{
if (id) {
print id, num, s1/num, s2/num
s1 = s2 = 0
}
id = $1
num = $2
}
END {
print id, num, s1/num, s2/num
}' file
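The question asked for five spaces between columns and six decimal places, whereas the commands above separate columns with tabs. A minimal sketch using printf instead of OFS/OFMT, keeping the same assumption that a data line always has a decimal point in its second column (the `emit` function name and the shortened sample are illustrative):

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 3
1.723608 0.8490000
1.743011 0.8390000
1.835833 0.7830000
3 2
1.993951 0.6610000
1.796519 0.8090000
EOF

out=$(awk '
function emit() {                       # print one finished block, if there is one
    if (num > 0)
        printf "%d     %d     %.6f     %.6f\n", id, num, s1 / num, s2 / num
}
NF == 2 && $2 !~ /\./ {                 # header line: ID and line count are integers
    emit()
    id = $1; num = $2; s1 = s2 = 0
    next
}
{ s1 += $1; s2 += $2 }                  # data line: accumulate both columns
END { emit() }
' "$tmp")
echo "$out"
rm -f "$tmp"
```

The `num > 0` guard means the very first header does not print an empty block and an empty input does not divide by zero.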

And yet another one:
awk -v num_blocks=10000 '
BEGIN {
OFS = "\t"
OFMT = "%.6f"
}
num_lines == 0 {
id = $1
num_lines = $2
sum1 = sum2 = 0
next
}
lines_read < num_lines {
sum1 += $1
sum2 += $2
lines_read++
}
lines_read >= num_lines {
print id, num_lines,
sum1 / num_lines,
sum2 / num_lines
num_lines = lines_read = 0
num_blocks--;
}
num_blocks <= 0 {
exit
}' file

You could try
awk -v qnt=none 'qnt == "none" {id = $1; qnt = $2; s1 = s2 = line = 0;next}{s1 += $1; s2 += $2; ++line} line == qnt{printf "%d %d %.6f %.6f\n", id, qnt, s1/qnt, s2/qnt; qnt="none"}'
The above is expanded as follows:
qnt == "none" {
id = $1;
qnt = $2;
s1 = s2 = line = 0;
next
}
{
s1 += $1;
s2 += $2;
++line
}
line == qnt {
printf "%d %d %.6f %.6f\n", id, qnt, s1/qnt, s2/qnt;
qnt="none"
}
After a data block has been processed (or at the beginning), record the header info.
Otherwise, add to the sums and print the result when we're done with all lines in this block.

AWK to print min max values of unique value in column

I am trying to use awk to do the following:
Input file:
6:28866209 NA NA NA 8.51368e-06 Y
6:28856689 1 0.007828 1 1.50247e-06 X
6:28856740 2 0.007828 1 1.50247e-06 Y
6:28856889 3 7.51E-08 3 1.50247e-06 X
I want to:
Get min and max of column 5 for each independent value in column 6
Print that min and max of column 5 at the end of each line of the file
The file can have different N columns, but all have at least columns 1-8, which are the same in each of my files.
Output:
6:28866209 NA NA NA 8.51368e-06 Y 8.51368e-06 1.50247e-06
6:28856689 1 0.007828 1 1.50247e-06 X 1.50247e-06 1.50247e-06
6:28856740 2 0.007828 1 1.50247e-06 Y 8.51368e-06 1.50247e-06
6:28856889 3 7.51E-08 3 1.50247e-06 X 1.50247e-06 1.50247e-06
I have attempted this using the following awk command, but I am only getting back the first value in column 6...
awk 'BEGIN{OFS="\t";FS="\t"}{if (a[$6] == "") a[$6]=$5; if (a[$6] > $5) {a[$6]=$5}} {if (b[$6] == "") b[$6]=$5; if (b[$6] < $5) {b[$6]=$5}} END {if (i=$6) print $0,i,a[i],b[i]}' FILE

I believe the easiest way to do this is a double pass over the file:
awk '(NR==FNR) && !($6 in min) { min[$6] = $5; max[$6] = $5; next }
(NR==FNR) { m=min[$6]; M=max[$6];
min[$6] = $5<m ? $5 : m;
max[$6] = $5>M ? $5 : M;
next; }
{print $0,min[$6],max[$6] }' <file> <file>
Your original code has the following flaw. The END block is only executed once the end of the file is reached. You attempt to print the full file there, but you did not store any lines while parsing it.
A correction to your original idea is:
awk 'BEGIN{OFS="\t";FS="\t"}
{if (a[$6] == "") a[$6]=$5;
if (a[$6] > $5) {a[$6]=$5}
}
{if (b[$6] == "") b[$6]=$5;
if (b[$6] < $5) {b[$6]=$5}
}
{ c[NR]=$0; d[NR]=$6 }
END { for (i=1;i<=NR;i++) print c[i],a[d[i]],b[d[i]] }' FILE
Here, I store the full FILE in array c, which is indexed by the line number NR. I also store the index $6 in array d. At the end, I loop through all the lines I stored and print what is expected.
The downside of this approach is that you have to store the full file in memory.
The downside of my proposal, is that you have to read the full file twice from disk.
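One thing to watch in either variant: awk only compares two values numerically when both look like numbers, so a stray NA in column 5 would turn the comparison into a string comparison. A hedged sketch of the two-pass idea that skips non-numeric values and forces numeric comparison with `+ 0`; the four-line sample, including an NA in column 5, is made up for illustration:

```shell
tmp=$(mktemp)
printf '%s\n' \
  'a NA NA NA 8.51368e-06 Y' \
  'b 1 0.007828 1 1.50247e-06 X' \
  'c 2 0.007828 1 1.50247e-06 Y' \
  'd 3 7.51E-08 3 NA X' > "$tmp"

out=$(awk '
NR == FNR {                                           # first pass: collect per-group extremes
    if ($5 ~ /^[-+]?[0-9.]+([eE][-+]?[0-9]+)?$/) {    # ignore NA and other non-numbers
        if (!($6 in min) || $5 + 0 < min[$6] + 0) min[$6] = $5
        if (!($6 in max) || $5 + 0 > max[$6] + 0) max[$6] = $5
    }
    next
}
{ print $0, min[$6], max[$6] }                        # second pass: append min and max
' "$tmp" "$tmp")
echo "$out"
rm -f "$tmp"
```

Storing the field itself rather than the converted number keeps the original notation (for example 8.51368e-06) in the output.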

awk 'FNR<NR{$7=m[$6];$8=M[$6];print;next} (!M[$6])||$5>M[$6]{M[$6]=$5}(!m[$6])||$5<m[$6]{m[$6]=$5}' file file
With comments:
awk '
# optional format
BEGIN { OFS=FS="\t"}
# for second pass (second file read)
FNR<NR{
# add columns 7 and 8 with the min and max of column 5 for the column 6 value of this line
$7=m[$6];$8=M[$6]
# print it and read the next line (do not go further in the script)
print;next}
# this point is only reached during the first file read
# if max is unknown or the column 5 value is bigger than max
(!M[$6])||$5>M[$6]{
# set new max
M[$6]=$5}
# do the same for min
(!m[$6])||$5<m[$6]{m[$6]=$5}
# read the same file twice (first to find min/max, second to print it)
' sample.txt sample.txt

How to append column to existing file in awk?

I have a file named bt.B.1.log that looks like this:
.
.
.
Time in seconds = 260.37
.
.
.
Compiled procs = 1
.
.
.
Time in seconds = 260.04
.
.
.
Compiled procs = 1
.
.
.
and so on for 40 records of Time in seconds and Compiled procs (dots represent useless lines).
How do I add a single column with the value of Compiled procs (which is 1) to the result of the following two commands:
This prints the average of Time in seconds values (thanks to dawg for this one)
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
Desired output:
260.20 1
This prints all of the occurrences of Time in seconds, but there is a small problem with it: it prints an extra blank line at the beginning of the list.
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
Desired output:
260.37 1
260.04
.
.
.
In both cases I need the value of Compiled procs to appear only once, preferably in the first line, and without using intermediate files.
What I managed to do so far prints all values of Time in seconds with the Compiled procs column appearing in every line and with a strange indentation:
awk '/seconds/ {printf $5} {printf " "} /procs/ {print $4}' bt.B.1.log > t1.dat
Please help!
UPDATE
Contents of file bt.B.1.log:
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:40:51--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.37
Total processes = 1
Compiled procs = 1
Mop/s total = 2696.83
Mop/s/process = 2696.83
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb@nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 16:45:14--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:58:50--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.04
Total processes = 1
Compiled procs = 1
Mop/s total = 2700.25
Mop/s/process = 2700.25
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb@nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 17:03:12--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
There are 40 entries in the log, but I've provided only 2 for brevity.

To fix the first issue, replace:
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
with:
awk 'BEGIN { FS = "[ \t]*=[ \t]*" } /Time in seconds/ { s += $2; c++ } /Compiled procs/ { if (! CP) CP = $2 } END { print s/c, CP }' bt.B.1.log >t1avg.dat
A potential minor issue is that 260.205 1 might be output but the question does not address this as a weakness of the given script. Rounding with something like printf "%.2f %s\n", s/c, CP gives 260.21 1 though. To truncate the extra digit, use something like printf "%.2f %s\n", int (s/c * 100) / 100, CP.
To fix the second issue, replace:
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
with:
awk 'BEGIN { FS = "[ \t]*=[ \t]*" } /Time in seconds/ { printf "%s", $2 } /Compiled procs/ { if (CP) { printf "\n" } else { CP = $2; printf " %s\n", $2 } }' bt.B.1.log > t1.dat
BTW, the "strange indentation" is a result of failing to output a newline when using printf and failing to filter unwanted input lines from processing.
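For the averaging case, the truncating variant mentioned above can be put together as follows. This is a small sketch on a two-record excerpt of the log rather than the full file. It assumes the real log pads the values with several spaces, and it tests `CP == ""` instead of `! CP` so that a value of 0 would also be kept; both points are assumptions of this example.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
 Time in seconds =                   260.37
 Compiled procs  =                        1
 Time in seconds =                   260.04
 Compiled procs  =                        1
EOF

out=$(awk '
BEGIN { FS = "[ \t]*=[ \t]*" }                 # split on "=" and swallow the padding around it
/Time in seconds/ { s += $2; c++ }
/Compiled procs/  { if (CP == "") CP = $2 }    # keep only the first value seen
END {
    if (c > 0)                                 # avoid dividing by zero on an empty log
        printf "%.2f %s\n", int(s / c * 100) / 100, CP
}' "$tmp")
echo "$out"
rm -f "$tmp"
```

On this excerpt the command prints 260.20 1, which matches the desired output in the question.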

calculate the difference from flat file

I have a text file and the last 2 lines look like this...
Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518
Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618
How do I find the difference between the two values of each parameter?
The expected output is:
61 -1 1063 10 0 0 0 0.1
In other words, I would like to subtract the earlier uptime value from the current one.
Likewise for Threads, Questions and so on.
The purpose of this exercise is to watch this file and alert the user when the difference is too high or too low, for example if the slow queries increase by more than 500 or the "Questions" difference is too low (<100).
(It is the MySQL status but has nothing to do with it, so mysql tag does not apply)

Just a slight variation on ghostdog74's (original) answer:
tail -2 file | awk ' {
gsub(/[a-zA-Z: ]+/," ")
m=split($0,a," ");
for (i=1;i<=m;i++)
if (NR==1) b[i]=a[i]; else print a[i] - b[i]
} '

Here's one way. tail is used to get the last 2 lines, which is especially useful for efficiency if you have a big file.
tail -2 file | awk '
{
gsub(/[a-zA-Z: ]+/," ")
m=split($0,a," ")
if (f) {
for (i=1;i<=m;i++){
print -(b[i]-a[i])
}
# to check for Questions, slow queries etc
if ( -(b[3]-a[3]) < 100 ){
print "Questions parameter too low"
}else if ( -(b[4]-a[4]) > 500 ){
print "Slow queries more than 500"
}else if ( a[1] - b[1] < 0 ){
print "mysql ...... "
}
exit
}
for(i=1;i<=m;i++ ){ b[i]=a[i] ;f=1 }
} '
output
$ ./shell.sh
61
-1
1063
10
0
0
0
0.1

gawk:
BEGIN {
arr[1] = "0"
}
length(arr) > 1 {
print $2-arr[1], $4-arr[2], $6-arr[3], $9-arr[4], $11-arr[5], $14-arr[6], $17-arr[7], $22-arr[8]
}
{
arr[1] = $2
arr[2] = $4
arr[3] = $6
arr[4] = $9
arr[5] = $11
arr[6] = $14
arr[7] = $17
arr[8] = $22
}
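The answers above print one difference per line or rely on fixed field positions. If the single-line output from the question is wanted, a sketch along these lines may help. It assumes every value is a plain non-negative number and every label contains at least one letter or colon; the `cur` and `prev` array names and the scratch file are illustrative.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518
Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618
EOF

out=$(tail -n 2 "$tmp" | awk '
{
    n = 0
    for (i = 1; i <= NF; i++)                 # keep only the purely numeric fields
        if ($i ~ /^[0-9.]+$/) cur[++n] = $i
}
NR == 2 {                                     # second line: print current minus previous
    for (i = 1; i <= n; i++)
        printf "%s%s", cur[i] - prev[i], (i < n ? " " : "\n")
}
{ for (i = 1; i <= n; i++) prev[i] = cur[i] } # remember this line for the next one
')
echo "$out"
rm -f "$tmp"
```

Picking out numeric fields by pattern means the script keeps working if a label gains or loses a word, which would shift hard-coded positions such as $9 or $22.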
