Awk Conditional Test Statement - awk

I would really appreciate some help; I spent almost the whole morning on this.
I have data with fields 1 to 16, structured as follows:
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0
From fields 7 to 10 I need a test on the element values (ranging from 0 to 1) and on the field number.
I.e. for every record, find the maximum value among fields 7-10:
if it is in field 7, print $0, $6-4
if it is in field 8, print $0, $6-3
if it is in field 9, print $0, $6-2
if it is in field 10, print $0, $6-1
I'll be so grateful for the help. Thank you in advance
Edit (by belisarius)
Just transcribing a comment from @Tumi2002 (the author):
Sorry, my 6th field (i.e. $6) has values 1-5.
I am trying to reclassify records where field 6 = 5 back into classes 1-4 (in the same field),
so that instead of 5 classes I have 4.
Awk '$6==5
{for i=7; i<11; i++)
if ($i==max) && NF==7) print $0,$6-4;
if ($i==max) && NF==8) print $0,$6-3;
if ($i==max) && NF==9) print $0,$6-2;
if ($i==max) && NF==10) print $0,$6-1
I am struggling with the syntax in awk

{
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $6-11+maxindex;
}
}
Running at ideone
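To run it locally instead, save the block above to a file and point awk at it (reclass.awk and data.txt are placeholder names):
awk -f reclass.awk data.txt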
Output for your example data:
2
1
1
1
1
Edit
Modified to answer your comment:
($6 == 5){
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $0,"-->",$6-11+maxindex;
}
}
Output:
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0 --> 2
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0 --> 1
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0 --> 1
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0 --> 1
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0 --> 1
Running at ideone here
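For reference, the same logic also fits in a one-liner (a sketch, with data.txt standing in for your file; on ties it keeps the leftmost field, just like the script above):
awk '$6 == 5 { m = 7; for (i = 8; i <= 10; i++) if ($i > $m) m = i; print $0, "-->", $6 - 11 + m }' data.txt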

First of all, thanks to belisarius for pointing to ideone.
My (updated) solution is working correctly now:
# max value in an array, copied verbatim from the gawk manual (credit)
function maxelt(vec, i, ret)
{
for (i in vec) {
if (ret == "" || vec[i] > ret)
ret = vec[i]
}
return ret
}
# Load fields 7 through 10 of each record into nums.
{
delete nums
for(i = 7; i <= 10; i++)
{ nums[NR, i] = $i }
### DEBUG print NR, maxelt(nums)
if ( $7 == maxelt(nums) ) { print $0, ($6-4) }
if ( $8 == maxelt(nums) ) { print $0, ($6-3) }
if ( $9 == maxelt(nums) ) { print $0, ($6-2) }
if ( $10 == maxelt(nums) ) { print $0, ($6-1) }
}
HTH

Related

Using awk to count number of row group

I have a data set: (file.txt)
X Y
1 a
2 b
3 c
10 d
11 e
12 f
15 g
20 h
25 i
30 j
35 k
40 l
41 m
42 n
43 o
46 p
I want to add two columns, Up10 and Down10:
Up10: the count of rows with X values between (X-10) and X.
Down10: the count of rows with X values between X and (X+10).
For example:
X Y Up10 Down10
35 k 3 5
For Up10: 35-10 = 25, so X=35, X=30, X=25; total = 3 rows
For Down10: 35+10 = 45, so X=35, X=40, X=41, X=42, X=43; total = 5 rows
Desired Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
3 c 3 4
10 d 4 5
11 e 5 4
12 f 5 3
15 g 4 3
20 h 5 3
25 i 3 3
30 j 3 3
35 k 3 5
40 l 3 5
41 m 3 4
42 n 4 3
43 o 5 2
46 p 5 1
This is Pierre François' solution. Thanks again, @Pierre François:
awk '
BEGIN{OFS="\t"; print "X\tY\tUp10\tDown10"}
(NR == FNR) && (FNR > 1){a[$1] = $1 + 0}
(NR > FNR) && (FNR > 1){
up = 0; upl = $1 - 10
down = 0; downl = $1 + 10
for (i in a) { i += 0 # tricky: convert i to integer
if ((i >= upl) && (i <= $1)) {up++}
if ((i >= $1) && (i <= downl)) {down++}
}
print $1, $2, up, down;
}
' file.txt file.txt > file-2.txt
But when I use this command on 13 GB of data, it takes too long.
I have also tried this approach on the 13 GB data:
awk 'BEGIN{ FS=OFS="\t" }
NR==FNR{a[NR]=$1;next} {x=y=FNR;while(--x in a&&$1-10<a[x]){} while(++y in a&&$1+10>a[y]){} print $0,FNR-x,y-FNR}
' file.txt file.txt > file-2.txt
When file-2.txt reaches 1.1 GB the command freezes. I have waited several hours, but the command never finishes and no final output file appears.
Note: I am working on Google Cloud. Machine type:
e2-highmem-8 (8 vCPUs, 64 GB memory)
A single-pass awk that keeps a sliding window of the last 10 records and uses it to count the ups and downs. For symmetry's sake there should be deletes in the END block, but I guess a few extra array elements in memory aren't going to make a difference:
$ awk '
BEGIN {
FS=OFS="\t"
}
NR==1 {
print $1,$2,"Up10","Down10"
}
NR>1 {
a[NR]=$1
b[NR]=$2
for(i=NR-9;i<=NR;i++) {
if(a[i]>=a[NR]-10&&i>=2)
up[NR]++
if(a[i]<=a[NR-9]+10&&i>=2)
down[NR-9]++
}
}
NR>10 {
print a[NR-9],b[NR-9],up[NR-9],down[NR-9]
delete a[NR-9]
delete b[NR-9]
delete up[NR-9]
delete down[NR-9]
}
END {
for(nr=NR+1;nr<=NR+9;nr++) {
for(i=nr-9;i<=nr;i++)
if(a[i]<=a[nr-9]+10&&i>=2&&i<=NR)
down[nr-9]++
print a[nr-9],b[nr-9],up[nr-9],down[nr-9]
}
}' file
Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
...
35 k 3 5
...
43 o 5 2
46 p 5 1
Another single pass approach with a sliding window
awk '
NR == 1 { next } # skip the header
NR == 2 { min = max = cur = 1; X[cur] = $1; Y[cur] = $2; next }
{ X[++max] = $1; Y[max] = $2
if (X[cur] >= $1 - 10) next
for (; X[cur] + 10 < X[max]; ++cur) {
for (; X[min] < X[cur] - 10; ++min) {
delete X[min]
delete Y[min]
}
print X[cur], Y[cur], cur - min + 1, max - cur
}
}
END {
for (; cur <= max; ++cur) {
for (; X[min] < X[cur] - 10; ++min);
for (i = max; i > cur && X[cur] + 10 < X[i]; --i);
print X[cur], Y[cur], cur - min + 1, i - cur + 1
}
}
' file
The script assumes the X column is ordered numerically.
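Both sliding-window scripts above depend on that ordering. If you are not sure the X column really is sorted, a quick sanity check could look like this (a sketch, with file standing in for your data):
awk 'NR > 2 && $1 + 0 < prev { print "X out of order at line " NR; exit 1 } NR > 1 { prev = $1 + 0 }' file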

Using awk, how to average numbers in column between two strings in a text file

A text file contains multiple tab-delimited columns between label strings, as in the example below.
Code 1 (3)
5 10 7 1 1
6 10 9 1 1
7 10 10 1 1
Code 2 (2)
9 11 3 1 3
10 8 5 2 1
Code 3 (1)
12 10 2 1 1
Code 4 (2)
14 8 1 1 3
15 8 7 5 1
I would like to average the numbers in the third column for each code block. The example below is what the output should look like.
8.67
4
2
4
Attempt 1
awk '$3~/^[[:digit:]]/ {i++; sum+=$3; print $3} $3!~/[[:digit:]]/ {print sum/i; sum=0;i=0}' in.txt
Returned fatal: division by zero attempted.
Attempt 2
awk -v OFS='\t' '/^Code/ { if (NR > 1) {i++; sum+=$3;} {print sum/i;}}' in.txt
Returned another division by zero error.
Attempt 3
awk -v OFS='\t' '/^Code/ { if (NR > 1) { print s/i; s=0; i=0; } else { s += $3; i += 1; }}' in.txt
Returned 1 value: 0.
Attempt 4
awk -v OFS='\t' '/^Code/ {
if (NR > 1)
i++
print sum += $3/i
}
END {
i++
print sum += $3/i
}'
Returned:
0
0
0
0.3
I am not sure where that last number is coming from, but this has been the closest solution so far. I am getting a number for each block, but not the average.
Could you please try the following.
awk '
/^Code/{
if(value!=0 && value){
print sum/value
}
sum=value=""
next
}
{
sum+=$3;  # sum the third column, as requested
value++
}
END{
if(value!=0 && value){
print sum/value
}
}
' Input_file
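Saved to a file (avg3.awk is a placeholder name), the same program runs as:
awk -f avg3.awk in.txt
and for the sample above it should print the four block averages: 8.66667, 4, 2 and 4.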

Create bins with totals and percentage

I would like to create bins for a histogram with totals and percentages, starting from 0.
If possible, I would like to set the minimum and maximum values of the bins (in my case min=0 and max=20).
Input file
8 5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7
Desired output:
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
I use this code from Ed Morton; it works perfectly, but the percentage is missing.
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
bucketNr = int(($0+delta) / delta)
cnt[bucketNr]++
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
beg = end
}
}' file
Thanks in advance
Your expected output doesn't seem to correspond to your sample input data, but try this variation of the awk code in your question (intended to be put in an executable file and run as a script, not as a one-liner, due to its size):
#!/usr/bin/awk -f
BEGIN { delta = (delta == "" ? 2 : delta) }
{
bucketNr = int(($0+delta) / delta)
cnt[bucketNr]++
max[bucketNr] = max[bucketNr] < $2 ? $2 : max[bucketNr]
sum += $2
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%d-%d %d %.1f\n", beg, end, max[bucketNr],
(cnt[bucketNr] / NR) * 100
beg = end
}
print "-------------"
print "Total " sum
}
It adds tracking of the maximum of the second column for each bin the first column falls into, and prints a percentage instead of a count of how many rows were in each bin, plus some tweaks to the output format to better match your desired output.
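Since delta is only defaulted in the BEGIN block, the bin width can also be overridden from the command line when invoking the file through awk directly (histo.awk is a placeholder name for wherever you saved it):
awk -v delta=5 -f histo.awk file
Making the file executable with chmod +x and running it as ./histo.awk file works as well, thanks to the #!/usr/bin/awk -f shebang.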

awk - writing values from file

How do I write an awk script that will print 10 values from the input file, please? Thank you.
I tried this:
BEGIN
$2 == 0 && $3 == 2 { print $7}
$2 == 0 && $3 == 4 { print $7}
$2 == 0 && $3 == 5 { print $7}
$2 == 2 && $3 == 2 { print $7}
$2 == 2 && $3 == 4 { print $7}
$2 == 2 && $3 == 5 { print $7}
$2 == 3 && $3 == 2 { print $7}
$2 == 3 && $3 == 4 { print $7}
$2 == 3 && $3 == 5 { print $7}
$1 == "achil" { print $3}
Should I write everything in one row?
When is it necessary to write BEGIN in code and when not?
Input file is:
achil 1 197524.72437205614 197524.72437205614 0.43066284286002637
o 0 1 0 1 1 5.732821000
o 0 2 0 1 1 54002.804084586
o 0 3 0 1 1 0.088300000
o 0 4 0 1 1 150.924210421
o 0 5 0 1 1 108.520740945
o 0 6 0 1 1 0.380000000
o 0 7 0 1 1 0.004220000
o 0 8 0 1 1 0.000000000
o 0 9 0 1 1 0.000000000
o 0 10 0 1 1 0.000000000
o 0 11 0 1 1 0.000000000
o 2 1 0 1 1 73413.000000000
o 2 2 0 1 1 36176.166543543
o 2 3 0 1 1 0.560000000
o 2 4 0 1 1 229.480202654
o 2 5 0 1 1 7.032947038
o 2 6 0 1 1 0.480000000
o 2 7 0 1 1 0.000000000
o 2 8 0 1 1 0.000000000
o 2 9 0 1 1 0.000000000
o 2 10 0 1 1 0.000000000
o 2 11 0 1 1 0.000000000
o 3 1 0 1 1 365.256360000
o 3 2 0 1 1 51550.294729034
o 3 3 0 1 1 0.016710220
o 3 4 0 1 1 299.430719769
o 3 5 0 1 1 0.001070537
o 3 6 0 1 1 0.000036626
o 3 7 0 1 1 0.000009111
o 3 8 0 1 1 0.000000000
o 3 9 0 1 1 0.000000000
o 3 10 0 1 1 0.000000000
o 3 11 0 1 1 0.000000000
I would like an output consisting of these 10 numbers:
197524.72437205614
54002.804084586
150.924210421
108.520740945
36176.166543543
229.480202654
7.032947038
51550.294729034
299.430719769
0.001070537
The script you have is just about right, but you don't need the BEGIN clause when using it as a script, because any actions you put in the BEGIN clause are executed before any of the input lines are processed. For example, if you had to print a title for your output, you could just print it as
BEGIN { print "my-title-string-in-double quotes" }
Writing it in one line or in multiple lines is purely a matter of style, and what you have looks neat and quite readable. So all you need to do now is define an awk script with this content:
#!/usr/bin/awk -f
$2 == 0 && $3 == 2 { print $7}
$2 == 0 && $3 == 4 { print $7}
$2 == 0 && $3 == 5 { print $7}
$2 == 2 && $3 == 2 { print $7}
$2 == 2 && $3 == 4 { print $7}
$2 == 2 && $3 == 5 { print $7}
$2 == 3 && $3 == 2 { print $7}
$2 == 3 && $3 == 4 { print $7}
$2 == 3 && $3 == 5 { print $7}
$1 == "achil" { print $3}
Add execute permissions to it,
chmod +x script.awk
and run it as
awk -f script.awk input-file
That said, your conditions could be written much more simply using pattern-matching operators:
$2 ~ /^(0|2|3)$/ && $3 ~ /^(2|4|5)$/ { print $7; next } $1 == "achil" { print $3 }
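On the command line (with input-file standing in for your data) that would be:
awk '$2 ~ /^(0|2|3)$/ && $3 ~ /^(2|4|5)$/ { print $7; next } $1 == "achil" { print $3 }' input-file
which should produce the same 10 values listed in the question.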

AWK Combine based on area of file

I have a file like the following (blank lines separate the blocks):
Sever Name aad98722RHEL 20120630 075022

CPU
1 sec 10 sec 15 sec 1 min 1 hour
5 8 0 1 19

TX kbits/sec:

interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 32 33 39 40 33
eth1 6 186 321 199 18
eth2 0 0 0 0 0
mgt0 0 0 0 0 0

RX kbits/sec:

interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 19 19 25 26 23
eth1 9 26 40 28 10
eth2 0 0 0 0 0
mgt0 0 0 0 0 0

Total memory usage: 1412916 kB
Resident set size : 1256360 kB
Heap usage : 1368212 kB
Stack usage : 84 kB
Library size : 16316 kB
What I would like to produce is
aad98722RHE 20120630 075022 CPU 5 8 0 1 19
aad98722RHE 20120630 075022 TX kbits/sec: 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHE 20120630 075022 RX kbits/sec: 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHE 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB
Can this be done in awk/sed, and how?
Perhaps it is not the best solution, but it works.
file: a.awk:
function print_cpu( server_name, cpu )
{
while ( $0 !~ cpu )
{
getline
}
getline
getline
printf "%s %s ", server_name, cpu
for ( i = 1; i < NF + 1; i++ )
{
printf "%s ", $i
}
printf "\n"
}
function print_rx_or_tx( server_name, rx_or_tx )
{
while ( $0 !~ rx_or_tx )
{
getline
}
# move past the column-header and dashed separator onto the first interface row
getline
getline
getline
printf "%s %s ", server_name, rx_or_tx
# print the numeric fields of every interface row until the blank line
while ( $0 != "" )
{
for ( i = 2; i <= NF; i++ )
{
printf "%s ", $i
}
if ( getline <= 0 )
{
break
}
}
printf "\n"
}
function print_stuff( server_name )
{
while ( $0 == "" )
{
getline
}
printf "%s ", server_name
while ( $0 != "" )
{
printf "%s ", $0
if ( getline <= 0 )
{
break
}
}
printf "\n"
}
BEGIN { server = "Server Name"; cpu = "CPU"; tx = "TX kbits/sec:"; rx = "RX kbits/sec:" }
server { server_name = $3 " " $4 " " $5 }
! server
{
print_cpu( server_name, cpu )
print_rx_or_tx( server_name, tx )
print_rx_or_tx( server_name, rx )
print_stuff( server_name )
}
run: awk -f a.awk your_input_file
One way using perl:
Assuming infile has the content from your question, and script.pl has the following content:
use warnings;
use strict;
my ($header, $newlines, $trans, @nums);
## Read input in paragraph mode.
local $/ = qq||;
while ( my $par = <> ) {
chomp $par;
## Save data of the header.
if ( $. == 1 ) {
my @header = $par =~ m/\ASer?ver\s+Name\s+(\S+)\s+(\S+)\s+(\S+)\s*\Z/s;
last unless @header;
$header = join qq| |, @header;
next;
}
## Number of '\n' in each paragraph (number of lines minus one).
$newlines = $par =~ tr/\n/\n/;
## Three lines, the CPU info. Extract what I need and print.
if ( $newlines == 2 ) {
printf qq|%s %s %s\n|, $header, $par =~ m/\A([^\n]+).*\n([^\n]+)\Z/s;
next;
}
## Transmission string.
if ( $newlines == 0 ) {
$trans = $par;
next;
}
## Transmission info. Extract numbers and print.
if ( $newlines == 5 ) {
my @lines = split /\n/, $par;
for my $i ( 0 .. $#lines ) {
my @f = split /\s+/, $lines[ $i ];
if ( grep { m/\D/ } @f[ 1 .. $#f ] ) {
next;
}
else {
push @nums, @f[ 1 .. $#f ];
}
}
printf qq|%s %s\n|, $header, join qq| |, @nums;
@nums = ();
}
## Resume info. Extract and print.
if ( $newlines == 4 ) {
$par =~ s/\n/\t/gs;
printf qq|%s %s\n|, $header, $par;
}
}
Run it like:
perl script.pl infile
With following output:
aad98722RHEL 20120630 075022 CPU 5 8 0 1 19
aad98722RHEL 20120630 075022 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB