awk code to calculate throughput for some nodes

In ns-2 trace files for wireless nodes, I want to calculate throughput for several nodes at once. How do I write an if condition that selects specific nodes? The problem is the "_" on either side of the node number. Checking nodes one by one is very time-consuming because there are thousands of nodes. Here is an example of the trace file:
r 1.092948833 _27_ MAC --- 171 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [37 0] 1 0 [-]
r 1.092948833 _28_ MAC --- 172 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [38 0] 1 0 [-]
r 1.092948833 _25_ MAC --- 173 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [39 0] 1 0 [-]
r 1.092948833 _21_ MAC --- 174 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [40 0] 1 0 [-]
r 1.092948833 _36_ MAC --- 175 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [41 0] 1 0 [-]
r 1.092948833 _29_ MAC --- 176 tcp 1560 [0 1a 19 0] ------- [4194333:0 4194334:0 32 4194334] [42 0] 1 0 [-]
My awk code :
action = $1;
sta = $3;
time = $2;
dest = $4;
app = $7;
pkt_size = $8;
if ( action == "r" && dest == "MAC" && app == "tcp" && time > 1 && ((sta >= "_6_"
&& sta <= "_30_") || (sta >= "_36_"))) {
if (start_ == 0) start_ = time;
if (time > end_) end_ = time;
sum_ = sum_ + pkt_size;
}
But it doesn't work.

You appear to be doing something like:
if ($1=="r" && $7=="cbr" && $3=="_1_") {
sec[int($2)]+=$8;
};
from the fil2.awk in a paper titled "ns-2 for the impatient".
Though you want to match on different fields, and match on a range of nodes, I'm going to assume you want output similar to:
sec[i] throughput[i]
from the same paper where sec[i] is the second and throughput[i] here would be like your sums for each second's worth of packet sizes.
The sec array from that snippet is storing packet sums for rounded seconds (rounding provided by int()) which means you don't need a start_ or end_.
Since you want to compare multiple nodes, _ can be added to FS to make comparisons easier. Here's a version of your script, modified to print the total packet size per second for the nodes you want to compare:
#!/usr/bin/awk -f
BEGIN { FS = "[[:space:]]|_" }  # use POSIX space or underscore for FS
{
    action   = $1
    time     = $2
    sta      = $4   # shifted here because underscores are delimiters
    dest     = $6
    app      = $10
    pkt_size = $11
    if (action == "r" && dest == "MAC" && app == "tcp" &&
        time > 1 && ((sta >= 6 && sta <= 30) || (sta >= 36))) {
        sec[int(time)] += pkt_size
    }
}
END { for (i in sec) print i, sec[i] }
Notice that the sta tests are now numeric comparisons instead of string comparisons. If I put that into an executable awk file called awko and run it as awko data against your data, I get:
1 9360
as the output.
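The field numbers in the script above depend on exactly how many separator characters sit between columns, so they can shift if the spacing in a trace differs. As a hedged alternative sketch (not the script from the answer), the default FS can be kept and the underscores stripped from the node field instead. The function name throughput_by_second is hypothetical, and the field positions ($3 node, $4 layer, $7 type, $8 size) are assumed from the sample trace lines in the question.

```shell
# Sum received TCP bytes per whole second for nodes 6..30 and 36 upward.
# throughput_by_second is a hypothetical helper name; it reads the
# trace from the files given as arguments, or from stdin if none.
throughput_by_second() {
    awk '
    $1 == "r" && $4 == "MAC" && $7 == "tcp" && $2 > 1 {
        node = $3
        gsub(/_/, "", node)      # "_27_" becomes "27"
        node += 0                # force a numeric value
        if ((node >= 6 && node <= 30) || node >= 36)
            sec[int($2)] += $8   # bin packet sizes by whole second
    }
    END { for (s in sec) print s, sec[s] }
    ' "$@"
}
```

Called as throughput_by_second data on the six sample lines from the question, this should print the same 1 9360 as above. Because the default FS treats any run of blanks as one separator, the field numbers stay the same however many spaces separate the columns.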

Related

Create bins with totals and percentage

I would like to create bins to get a histogram with totals and percentages, e.g. starting from 0.
If possible, I would also like to set the minimum and maximum value of the bins (in my case min=0 and max=20).
Input file
8 5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7
output desired
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
I use this code from Mr Ed Morton. It works perfectly, but the percentage is missing.
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
bucketNr = int(($0+delta) / delta)
cnt[bucketNr]++
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
beg = end
}
}' file
Thanks in advance

Your expected output doesn't seem to correspond to your sample input data, but try this variation of the awk code in your question (intended to be put in an executable file and run as a script, not as a one-liner, due to its size):
#!/usr/bin/awk -f
BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($0+delta) / delta)
    cnt[bucketNr]++
    max[bucketNr] = max[bucketNr] < $2 ? $2 : max[bucketNr]
    sum += $2
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        printf "%d-%d %d %.1f\n", beg, end, max[bucketNr],
               (cnt[bucketNr] / NR) * 100
        beg = end
    }
    print "-------------"
    print "Total " sum
}
It adds tracking the maximum of the second column for each bin the first column falls in, and prints out a percentage instead of a count of how many rows were in each bin. Plus some tweaks to the output format to better match your desired output.
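Another reading of the desired output is that each bin shows the sum of the second column for rows whose first column falls in that bin, plus that sum's share of the grand total. Under that reading the sample numbers do line up: for example, the row 8 5 alone lands in 8 - 10, and 5 out of a total of 73 is 6.8%. The sketch below follows that interpretation. The function name bin_totals and the variables delta and max are illustrative choices, and the last bin is labelled ">= 20" because it includes the value 20 itself.

```shell
# Sum column 2 per bin of column 1 and print each share of the total.
# bin_totals, delta and max are hypothetical names for this sketch.
bin_totals() {
    awk -v delta=2 -v max=20 '
    {
        b = ($1 >= max) ? -1 : int($1 / delta)   # -1 marks the overflow bin
        sum[b] += $2
        total += $2
    }
    END {
        if (total == 0) exit                     # nothing to report
        for (lo = 0; lo < max; lo += delta) {
            s = sum[int(lo / delta)]
            printf "%d - %d %d %.1f%%\n", lo, lo + delta, s, 100 * s / total
        }
        printf ">= %d %d %.1f%%\n", max, sum[-1], 100 * sum[-1] / total
        print "---------------------"
        print "Total: " total
    }
    ' "$@"
}
```

Run as bin_totals file. Empty bins are still printed because the END loop walks the fixed range 0 to max rather than only the bins that received data.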

Output the result of each loop in different columns

price.txt file has two columns: (name and value)
Mary 134
Lucy 56
Jack 88
range.txt file has three columns: (fruit and min_value and max_value)
apple 57 136
banana 62 258
orange 88 99
blueberry 98 121
My aim is to test whether each value in price.txt is between the min_value and max_value in range.txt. If yes, output 1; if not, output "x".
I tried:
awk 'FNR == NR { name=$1; price[name]=$2; next} {
for (name in price) {
if ($2<=price[name] && $3>=price[name]) {print 1} else {print "x"}
}
}' price.txt range.txt
But my results are all in one column, just like follows:
1
1
x
x
x
x
x
x
1
1
1
x
Actually, I want my result to be like: (Each name has one column)
1 x 1
1 x 1
x x 1
x x x
Because I need to use paste to add the output file and range.txt file together. The final result should be like:
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
So, how can I get the result of each loop in different columns? And is there any way to output the final result without paste, based on my current code? Thank you.

This builds on what you provided:
# load prices by index to maintain read order;
# names ends up holding the number of prices read,
# which avoids using non-standard length(array)
FNR == NR {
    price[names++] = $2
    next
}
{
    l = $1 " " $2 " " $3
    for (i = 0; i < names; i++) {
        if ($2 <= price[i] && $3 >= price[i]) {
            l = l " 1"
        } else {
            l = l " x"
        }
    }
    print l
}
and generates output,
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
However, the output does not show which person each column belongs to (anonymous results). Maybe that's intentional?
The change here is to explicitly index the array populated in the first block, so that the read order is maintained.

Awk space-delimited file content

I have a file whose lines I want to split using either space or "_".
Its format is
f 5.287102213 _10_ RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0
s 5.288000000 _0_ AGT --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18]
My awk script is as follows:
#!/usr/bin/awk -f
BEGIN {FS="[[:space:]]|_"} # use posix space or underscore for FS
{
    action = $1;
    time = $2;
    sta = $4 ; # shifted here because underscores are delimiters
    dest = $6;
    app = $10;
    pkt_size = $11;
    #print $1
    #print $2
    print $5
    #print $4
    #print $5
    #print $6
    #print $7
    #print $8
    #print $9
    #print $10
    if( action == "s" && dest == "MAC" && app == "cbr"){
        startTime+=time ;
        count++;
    }
    if( action == "r" && dest == "MAC" && app == "cbr"){
        endTime+=time ;
        receivedSize+=pkt_size ;
    }
}
Based on the above script, I was expecting RTR to be in $4.
But I find that the output of $3 is as follows:
RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0
AGT --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18] 0 0
RTR --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18] 0 0
What am I doing wrong? I am new to awk.

Change your FS value to [[:space:]_]+ to get the tokenization (splitting into fields) you want.
Test it with this statement to see the fields recognized:
awk -F'[[:space:]_]+' '{for(i=1;i<=NF;++i){print i ": " $i}}' \
<<<'f 5.287102213 _10_ RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0'
The problem with your FS value, [[:space:]]|_, is that it recognizes only one character at a time as the separator: either a single whitespace character or a single _.
Note that specifying an explicit FS value other than ' ' (a single space) causes awk to look for a single instance of that separator, and interprets multiple adjacent instances as separating multiple - and thus empty - fields.
Thus, in your case, the spans <space>_ and _<space> each represent not a single separator, but two separators abutting an empty field.
If you want spans (runs) of a given character, or of characters from a set, to be interpreted as a single separator instance, use the + quantifier (one or more repetitions).
However, the proposed FS value, [[:space:]_]+, may be too permissive, as it would recognize a run of any mix of whitespace and _ characters as a separator.
To be more restrictive, you could use the following FS value:
[[:space:]]+_?|_?[[:space:]]+
That said, if the _ chars in your input function more like delimiters enclosing only one field, a better solution may be to use the default value of FS, which recognizes runs of whitespace as delimiters, and to strip the _ delimiters from field $3: gsub("^_|_$", "", $3)
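A minimal sketch of that last suggestion, keeping the default FS and stripping the underscores from $3. The function name show_fields and the output labels are hypothetical, and the field positions are assumed from the two sample lines in the question.

```shell
# Split on runs of whitespace (default FS) and strip the underscores
# that enclose the node number. show_fields is a hypothetical name.
show_fields() {
    awk '{
        gsub(/^_|_$/, "", $3)    # "_10_" becomes "10"
        print "node=" $3, "layer=" $4, "type=" $7, "size=" $8
    }' "$@"
}
```

For the first sample line this prints node=10 layer=RTR type=cbr size=120. Since no empty fields are created, the layer stays in $4 regardless of how many spaces separate the columns.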

AWK Combine based on area of file

I have a file like
Sever Name aad98722RHEL 20120630 075022
CPU
1 sec 10 sec 15 sec 1 min 1 hour
5 8 0 1 19
TX kbits/sec:
interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 32 33 39 40 33
eth1 6 186 321 199 18
eth2 0 0 0 0 0
mgt0 0 0 0 0 0
RX kbits/sec:
interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 19 19 25 26 23
eth1 9 26 40 28 10
eth2 0 0 0 0 0
mgt0 0 0 0 0 0
Total memory usage: 1412916 kB
Resident set size : 1256360 kB
Heap usage : 1368212 kB
Stack usage : 84 kB
Library size : 16316 kB
What I would like to produce is
aad98722RHE 20120630 075022 CPU 5 8 0 1 19
aad98722RHE 20120630 075022 TX kbits/sec: 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHE 20120630 075022 RX kbits/sec: 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHE 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB
Can this be done in Awk/Sed and how?

Perhaps it's not the best solution, but it works.
file: a.awk:
function print_cpu( server_name, cpu )
{
    while ( $0 !~ cpu )
    {
        getline
    }
    getline
    getline
    printf "%s %s ", server_name, cpu
    for ( i = 1; i < NF + 1; i++ )
    {
        printf "%s ", $i
    }
    printf "\n"
}
function print_rx_or_tx( server_name, rx_or_tx )
{
    while ( $0 !~ rx_or_tx )
    {
        getline
    }
    getline
    getline
    getline
    printf "%s %s ", server_name, rx_or_tx
    while ( $0 != "" )
    {
        getline
        for ( i = 2; i < NF; i++ )
        {
            printf "%s ", $i
        }
    }
    printf "\n"
}
function print_stuff( server_name )
{
    while ( $0 == "" )
    {
        getline
    }
    printf "%s ", server_name
    while ( $0 != "" )
    {
        printf "%s ", $0
        if ( getline <= 0 )
        {
            break
        }
    }
    printf "\n"
}
BEGIN { server = "Server Name"; cpu = "CPU"; tx = "TX kbits/sec:"; rx = "RX kbits/sec:" }
server { server_name = $3 " " $4 " " $5 }
! server
{
    print_cpu( server_name, cpu )
    print_rx_or_tx( server_name, tx )
    print_rx_or_tx( server_name, rx )
    print_stuff( server_name )
}
run: awk -f a.awk your_input_file

One way using perl:
Assuming infile contains the data from your question and script.pl contains the following:
use warnings;
use strict;

my ($header, $newlines, $trans, @nums);

## Read input in paragraph mode.
local $/ = qq||;

while ( my $par = <> ) {
    chomp $par;

    ## Save data of the header.
    if ( $. == 1 ) {
        my @header = $par =~ m/\ASer?ver\s+Name\s+(\S+)\s+(\S+)\s+(\S+)\s*\Z/s;
        last unless @header;
        $header = join qq| |, @header;
        next;
    }

    ## Number of '\n' in each paragraph (number of lines minus one).
    $newlines = $par =~ tr/\n/\n/;

    ## Three lines, the CPU info. Extract what I need and print.
    if ( $newlines == 2 ) {
        printf qq|%s %s %s\n|, $header, $par =~ m/\A([^\n]+).*\n([^\n]+)\Z/s;
        next;
    }

    ## Transmission string.
    if ( $newlines == 0 ) {
        $trans = $par;
        next;
    }

    ## Transmission info. Extract numbers and print.
    if ( $newlines == 5 ) {
        my @lines = split /\n/, $par;
        for my $i ( 0 .. $#lines ) {
            my @f = split /\s+/, $lines[ $i ];
            if ( grep { m/\D/ } @f[ 1 .. $#f ] ) {
                next;
            }
            else {
                push @nums, @f[ 1 .. $#f ];
            }
        }
        printf qq|%s %s\n|, $header, join qq| |, @nums;
        @nums = ();
    }

    ## Resume info. Extract and print.
    if ( $newlines == 4 ) {
        $par =~ s/\n/\t/gs;
        printf qq|%s %s\n|, $header, $par;
    }
}
Run it like:
perl script.pl infile
With the following output:
aad98722RHEL 20120630 075022 CPU 5 8 0 1 19
aad98722RHEL 20120630 075022 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB

calculate the difference from flat file

I have a text file and the last 2 lines look like this...
Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518
Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618
How do I find the difference between the two values of each parameter?
The expected output is:
61 -1 1063 10 0 0 0 0.1
In other words, I would like to subtract the earlier uptime value from the current uptime value.
Then do the same for Threads, Questions and so on.
The purpose of this exercise is to watch this file and alert the user when the difference is too high or too low. For example, if the slow queries increase by more than 500 or the "Questions" parameter increases by too little (<100).
(It is the MySQL status but has nothing to do with it, so mysql tag does not apply)

Just a slight variation on ghostdog74's (original) answer:
tail -2 file | awk ' {
    gsub(/[a-zA-Z: ]+/," ")
    m=split($0,a," ");
    for (i=1;i<=m;i++)
        if (NR==1) b[i]=a[i]; else print a[i] - b[i]
} '

Here's one way. tail is used to get the last 2 lines, which is especially efficient if you have a big file.
tail -2 file | awk '
{
    gsub(/[a-zA-Z: ]+/," ")
    m=split($0,a," ")
    if (f) {
        for (i=1;i<=m;i++){
            print -(b[i]-a[i])
        }
        # to check for Questions, slow queries etc
        if ( -(b[3]-a[3]) < 100 ){
            print "Questions parameter too low"
        }else if ( -(b[4]-a[4]) > 500 ){
            print "Slow queries more than 500"
        }else if ( a[1] - b[1] < 0 ){
            print "mysql ...... "
        }
        exit
    }
    for(i=1;i<=m;i++ ){ b[i]=a[i] ;f=1 }
} '
output
$ ./shell.sh
61
-1
1063
10
0
0
0
0.1

gawk:
BEGIN {
    arr[1] = "0"
}
length(arr) > 1 {
    print $2-arr[1], $4-arr[2], $6-arr[3], $9-arr[4], $11-arr[5], $14-arr[6], $17-arr[7], $22-arr[8]
}
{
    arr[1] = $2
    arr[2] = $4
    arr[3] = $6
    arr[4] = $9
    arr[5] = $11
    arr[6] = $14
    arr[7] = $17
    arr[8] = $22
}
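The first two answers print one difference per line, while the question asks for all differences on a single line. A hedged sketch that combines the label-stripping idea from those answers with single-line output and the two threshold checks mentioned in the question is shown here. The function name status_delta is hypothetical, and the alert messages are copied from the answer above.

```shell
# Print the differences between the last two status lines on one line,
# followed by any alerts. status_delta is a hypothetical helper name.
status_delta() {
    tail -n 2 "$1" | awk '
    {
        gsub(/[a-zA-Z:]+/, " ")    # drop the labels, keep the numbers
        if (NR == 1) {             # first of the two lines: remember it
            for (i = 1; i <= NF; i++) prev[i] = $i
            next
        }
        line = ""
        for (i = 1; i <= NF; i++) {
            d[i] = $i - prev[i]
            line = line (i > 1 ? " " : "") d[i]
        }
        print line
        if (d[3] < 100) print "Questions parameter too low"
        if (d[4] > 500) print "Slow queries more than 500"
    }'
}
```

Called as status_delta file on the two lines from the question, this prints 61 -1 1063 10 0 0 0 0.1 and no alerts, since Questions grew by 1063 and slow queries by only 10.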
