AWK Combine based on area of file - awk

I have a file like
Server Name aad98722RHEL 20120630 075022
CPU
1 sec 10 sec 15 sec 1 min 1 hour
5 8 0 1 19
TX kbits/sec:
interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 32 33 39 40 33
eth1 6 186 321 199 18
eth2 0 0 0 0 0
mgt0 0 0 0 0 0
RX kbits/sec:
interface 10 sec 1 min 10 min 1 hour 1 day
--------- ------ ----- ------ ------ -----
eth0 19 19 25 26 23
eth1 9 26 40 28 10
eth2 0 0 0 0 0
mgt0 0 0 0 0 0
Total memory usage: 1412916 kB
Resident set size : 1256360 kB
Heap usage : 1368212 kB
Stack usage : 84 kB
Library size : 16316 kB
What I would like to produce is
aad98722RHEL 20120630 075022 CPU 5 8 0 1 19
aad98722RHEL 20120630 075022 TX kbits/sec: 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 RX kbits/sec: 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB
Can this be done in Awk/Sed and how?

Perhaps it's not the best solution, but it works.
File a.awk:
function print_cpu( server_name, cpu )
{
    while ( $0 !~ cpu )
        getline
    getline
    getline
    printf "%s %s ", server_name, cpu
    for ( i = 1; i <= NF; i++ )
        printf "%s ", $i
    printf "\n"
}
function print_rx_or_tx( server_name, rx_or_tx )
{
    while ( $0 !~ rx_or_tx )
        getline
    getline
    getline
    getline
    printf "%s %s ", server_name, rx_or_tx
    while ( $0 != "" )
    {
        getline
        for ( i = 2; i <= NF; i++ )   # i <= NF: with i < NF the last column was dropped
            printf "%s ", $i
    }
    printf "\n"
}
function print_stuff( server_name )
{
    while ( $0 == "" )
        getline
    printf "%s ", server_name
    while ( $0 != "" )
    {
        printf "%s ", $0
        if ( getline <= 0 )
            break
    }
    printf "\n"
}
BEGIN { server = "Server Name"; cpu = "CPU"; tx = "TX kbits/sec:"; rx = "RX kbits/sec:" }
$0 ~ server { server_name = $3 " " $4 " " $5 }
$0 !~ server {
    print_cpu( server_name, cpu )
    print_rx_or_tx( server_name, tx )
    print_rx_or_tx( server_name, rx )
    print_stuff( server_name )
}
Run it with: awk -f a.awk your_input_file (note the script relies on the blank lines that separate the sections in the real file; the pasted sample has lost them).

One way using perl:
Assuming infile has the content of your question and script.pl the following content:
use warnings;
use strict;

my ($header, $newlines, $trans, @nums);

## Read input in paragraph mode.
local $/ = qq||;

while ( my $par = <> ) {
    chomp $par;

    ## Save data of the header.
    if ( $. == 1 ) {
        my @header = $par =~ m/\ASer?ver\s+Name\s+(\S+)\s+(\S+)\s+(\S+)\s*\Z/s;
        last unless @header;
        $header = join qq| |, @header;
        next;
    }

    ## Number of '\n' in each paragraph (number of lines minus one).
    $newlines = $par =~ tr/\n/\n/;

    ## Three lines, the CPU info. Extract what I need and print.
    if ( $newlines == 2 ) {
        printf qq|%s %s %s\n|, $header, $par =~ m/\A([^\n]+).*\n([^\n]+)\Z/s;
        next;
    }

    ## Transmission string.
    if ( $newlines == 0 ) {
        $trans = $par;
        next;
    }

    ## Transmission info. Extract numbers and print.
    if ( $newlines == 5 ) {
        my @lines = split /\n/, $par;
        for my $i ( 0 .. $#lines ) {
            my @f = split /\s+/, $lines[ $i ];
            if ( grep { m/\D/ } @f[ 1 .. $#f ] ) {
                next;
            }
            else {
                push @nums, @f[ 1 .. $#f ];
            }
        }
        printf qq|%s %s\n|, $header, join qq| |, @nums;
        @nums = ();
    }

    ## Resume info. Extract and print.
    if ( $newlines == 4 ) {
        $par =~ s/\n/\t/gs;
        printf qq|%s %s\n|, $header, $par;
    }
}
Run it like:
perl script.pl infile
With the following output:
aad98722RHEL 20120630 075022 CPU 5 8 0 1 19
aad98722RHEL 20120630 075022 32 33 39 40 33 6 186 321 199 18 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 19 19 25 26 23 9 26 40 28 10 0 0 0 0 0 0 0 0 0 0
aad98722RHEL 20120630 075022 Total memory usage: 1412916 kB Resident set size : 1256360 kB Heap usage : 1368212 kB Stack usage : 84 kB Library size : 16316 kB
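For the record, the same paragraph-at-a-time idea can be written in plain awk. The following is a minimal sketch, assuming (as both solutions above effectively do) that the real file separates its sections with blank lines, which the pasted sample has lost; infile is a placeholder name and the row/column offsets are taken from the sample layout:
awk '
BEGIN { RS = ""; FS = "\n" }            # paragraph mode: one section per record
NR == 1 {                               # header: keep name, date and time
    hdr = $1
    sub(/^Ser?ver[ \t]+Name[ \t]+/, "", hdr)
    next
}
$1 == "CPU" {                           # 3-line CPU section: line 3 holds the numbers
    print hdr, "CPU", $3
    next
}
NF == 1 && /kbits\/sec:$/ {             # bare "TX kbits/sec:" or "RX kbits/sec:" label
    tag = $1
    next
}
tag != "" {                             # the table that follows a label
    out = hdr " " tag
    for (r = 3; r <= NF; r++) {         # rows 3..NF: skip column header and dashes
        n = split($r, f, /[ \t]+/)
        for (c = 2; c <= n; c++)        # keep everything after the interface name
            out = out " " f[c]
    }
    print out
    tag = ""
    next
}
{                                       # memory summary: join its lines
    out = hdr
    for (r = 1; r <= NF; r++)
        out = out " " $r
    print out
}
' infile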

Related

Using awk to count number of row group

I have a data set: (file.txt)
X Y
1 a
2 b
3 c
10 d
11 e
12 f
15 g
20 h
25 i
30 j
35 k
40 l
41 m
42 n
43 o
46 p
I want to add two columns, Up10 and Down10:
Up10: the count of rows with X values from (X-10) to (X).
Down10: the count of rows with X values from (X) to (X+10).
For example:
X Y Up10 Down10
35 k 3 5
For Up10: 35-10 = 25, so X=25, X=30, X=35 qualify: total = 3 rows.
For Down10: 35+10 = 45, so X=35, X=40, X=41, X=42, X=43 qualify: total = 5 rows.
Desired Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
3 c 3 4
10 d 4 5
11 e 5 4
12 f 5 3
15 g 4 3
20 h 5 3
25 i 3 3
30 j 3 3
35 k 3 5
40 l 3 5
41 m 3 4
42 n 4 3
43 o 5 2
46 p 5 1
This is Pierre François' solution. Thanks again, @Pierre François:
awk '
BEGIN{OFS="\t"; print "X\tY\tUp10\tDown10"}
(NR == FNR) && (FNR > 1){a[$1] = $1 + 0}
(NR > FNR) && (FNR > 1){
up = 0; upl = $1 - 10
down = 0; downl = $1 + 10
for (i in a) { i += 0 # tricky: convert i to integer
if ((i >= upl) && (i <= $1)) {up++}
if ((i >= $1) && (i <= downl)) {down++}
}
print $1, $2, up, down;
}
' file.txt file.txt > file-2.txt
But when I use this command on 13 GB of data, it takes too long: the for (i in a) loop rescans every stored row for each input row, so the total work grows quadratically with the row count.
I then tried this approach on the same 13 GB data:
awk 'BEGIN{ FS=OFS="\t" }
NR==FNR{ a[NR]=$1; next }
{ x=y=FNR
  while(--x in a && $1-10<a[x]) {}
  while(++y in a && $1+10>a[y]) {}
  print $0, FNR-x, y-FNR }
' file.txt file.txt > file-2.txt
When file-2.txt reaches 1.1 GB the command appears to hang; I have waited several hours without it finishing or producing the final output file.
Note: I am working on Google Cloud, machine type
e2-highmem-8 (8 vCPUs, 64 GB memory)
A single-pass awk that keeps a sliding window of the 10 most recent records and uses it to count the ups and downs. For symmetry's sake there should be deletes in the END block too, but I guess a few extra array elements in memory aren't going to make a difference:
$ awk '
BEGIN {
FS=OFS="\t"
}
NR==1 {
print $1,$2,"Up10","Down10"
}
NR>1 {
a[NR]=$1
b[NR]=$2
for(i=NR-9;i<=NR;i++) {
if(a[i]>=a[NR]-10&&i>=2)
up[NR]++
if(a[i]<=a[NR-9]+10&&i>=2)
down[NR-9]++
}
}
NR>10 {
print a[NR-9],b[NR-9],up[NR-9],down[NR-9]
delete a[NR-9]
delete b[NR-9]
delete up[NR-9]
delete down[NR-9]
}
END {
for(nr=NR+1;nr<=NR+9;nr++) {
for(i=nr-9;i<=nr;i++)
if(a[i]<=a[nr-9]+10&&i>=2&&i<=NR)
down[nr-9]++
print a[nr-9],b[nr-9],up[nr-9],down[nr-9]
}
}' file
Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
...
35 k 3 5
...
43 o 5 2
46 p 5 1
Another single-pass approach with a sliding window:
awk '
NR == 1 { next } # skip the header
NR == 2 { min = max = cur = 1; X[cur] = $1; Y[cur] = $2; next }
{ X[++max] = $1; Y[max] = $2
if (X[cur] >= $1 - 10) next
for (; X[cur] + 10 < X[max]; ++cur) {
for (; X[min] < X[cur] - 10; ++min) {
delete X[min]
delete Y[min]
}
print X[cur], Y[cur], cur - min + 1, max - cur
}
}
END {
for (; cur <= max; ++cur) {
for (; X[min] < X[cur] - 10; ++min);
for (i = max; i > cur && X[cur] + 10 < X[i]; --i);
print X[cur], Y[cur], cur - min + 1, i - cur + 1
}
}
' file
The script assumes the X column is ordered numerically.
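If it is not, the body can be sorted numerically first while keeping the header line in place; a sketch (file.sorted is a placeholder name):
( head -n 1 file; tail -n +2 file | sort -n -k1,1 ) > file.sorted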

Create bins with totals and percentage

I would like to create bins to get a histogram with totals and percentages, e.g. starting from 0, and, if possible, to set the minimum and maximum bin values (in my case min=0 and max=20).
Input file
8 5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7
output desired
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
I use this code from Mr Ed Morton; it works perfectly, but the percentage is missing.
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
bucketNr = int(($0+delta) / delta)
cnt[bucketNr]++
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
beg = end
}
}' file
Thanks in advance
Your expected output doesn't seem to correspond to your sample input data, but try this variation of the awk code in your question (intended to be put in an executable file and run as a script, not as a one-liner, due to its size):
#!/usr/bin/awk -f
BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($0 + delta) / delta)
    cnt[bucketNr]++
    max[bucketNr] = max[bucketNr] < $2 ? $2 : max[bucketNr]
    sum += $2
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr = 1; bucketNr <= numBuckets; bucketNr++) {
        end = beg + delta
        printf "%d-%d %d %.1f\n", beg, end, max[bucketNr], (cnt[bucketNr] / NR) * 100
        beg = end
    }
    print "-------------"
    print "Total " sum
}
It adds tracking of the maximum of the second column for each bin the first column falls in, and prints a percentage instead of a raw count of how many rows fell in each bin, plus some tweaks to the output format to better match your desired output.
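That said, the numbers in your desired output (13 for the 12-14 bin, the 73 total) look like per-bin sums of column 2 rather than maxima. If that is what is wanted, a variant of the same bucketing would be (a sketch; the leading 0 row, the "> 20" labelling and the min/max capping are left out):
#!/usr/bin/awk -f
BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($1 + delta) / delta)
    sum[bucketNr] += $2                 # per-bin total of column 2
    total += $2                         # grand total, for the percentages
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr = 1; bucketNr <= numBuckets; bucketNr++) {
        end = beg + delta
        printf "%d - %d %d %.1f%%\n", beg, end, sum[bucketNr], (sum[bucketNr] / total) * 100
        beg = end
    }
    print "---------------------"
    print "Total: " total
}
On the sample input this yields, e.g., "8 - 10 5 6.8%" and "Total: 73", matching the desired figures.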

rearrange from specific string into respective column

I'm trying to rearrange specific strings into their respective columns.
E.g.:
126N (will be sorted into the "Normal" column)
Value 1 (the integer will be concatenated with 126)
Resulting in:
N=Normal
126 # 1
Here is the input
(N=Normal, W=Weak)
Value 1
126N,
Value 3
18N,
Value 4
559N, 562N, 564N,
Value 6
553W, 565A, 553N,
Value 5
490W,
Value 9
564N,
And the output should be
W=Weak
490 # 5
553 # 6
A=Absolute
565 # 6
N=Normal
126 # 1
18 # 3
559 # 4
562 # 4
564 # 4
553 # 6
564 # 9
Let me know your thoughts on this.
I've tried this script, but I'm still figuring out how to concatenate the value:
cat input.txt | sed '/^\s*$/d' | awk 'BEGIN{RS=","};match($0,/N/){print $3"c"$2}' | sed ':a;N;$!ba;s/\n/;/g' | sed 's/W//g;s/N//g;s/S//g'
And some of the values are missing.
This should give you what you want using GNU awk.
It will work with any number of letters, not just A, N and W:
awk -F, '
!/Value/ {
    for (i=1; i<NF; i++) {
        hd = substr($i, length($i), 1)
        arr[hd][++cnt[hd]] = ($i+0) " # " f
    }
}
{ split($0, b, " "); f = b[2] }
END {
    for (i in arr) {
        print "\n" i "\n---"
        for (j in arr[i])
            print arr[i][j]
    }
}' file
A
---
565 # 6
N
---
562 # 4
564 # 4
553 # 6
564 # 9
126 # 1
18 # 3
559 # 4
W
---
553 # 6
490 # 5
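The order inside each letter block is arbitrary, because for (i in arr) visits indices in an unspecified order in awk. In GNU awk you can impose one, e.g. ascending numeric index order, which here restores insertion order:
BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }   # gawk only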
Another alternative in awk would be:
awk -F',| ' '
$1 == "Value" {value = $2; next}
{ for (i=1; i<=NF; i++) {
if ($i~"N$")
N[substr($i, 1, length($i) - 1)] = value
if ($i~"W$")
W[substr($i, 1, length($i) - 1)] = value
}
}
END {
print "W=Weak"
for (i in W)
print i, "#", W[i]
print "\nN=Normal"
for (i in N)
print i, "#", N[i]
}
' file
(Note: this relies on knowing the wanted headers are W=Weak and N=Normal. It would take a few additional expressions if the headers were subject to change; the tst.awk approach below handles that with a table of type names.)
Output (of the script above, run against file):
W=Weak
490 # 5
N=Normal
18 # 3
126 # 1
559 # 4
562 # 4
564 # 9
$ cat tst.awk
NR%2 { val = $NF; next }
{
for (i=1; i<=NF; i++) {
num = $i+0
abbr = $i
gsub(/[^[:alpha:]]/,"",abbr)
list[abbr] = list[abbr] num " # " val ORS
}
}
END {
n = split("Weak Absolute Normal",types)
for (i=1; i<=n; i++) {
name = types[i]
abbr = substr(name,1,1)
print abbr "=" name ORS list[abbr]
}
}
$ awk -f tst.awk file
W=Weak
553 # 6
490 # 5
A=Absolute
565 # 6
N=Normal
126 # 1
18 # 3
559 # 4
562 # 4
564 # 4
553 # 6
564 # 9

AWK/SED/getline - How to simplify/improve this example?

I'm trying to take a 3 column input file and separate it based on a condition in column 3. I think it'll be easier to show you than explain:
Input File:
outputfile1.txt
26 NCC 1 # First Start
38 NME 2
44 NSC 1 # Start2
56 NME 2
62 NCC 1 # Start3
...
314 NCC 1 # Start17
326 NME 2
332 NSC 1 # Start18
344 NME 2
349 NME 2 # Final End
(The hashed comments aren't part of the file; I've added them to make things clearer.)
Column 3 is used to determine a new "START" entry.
"START"/"END" values come from Column 1.
"TITLE" should be all the Column 2 values between consecutive "START"s.
Desired Output
outputfile2.txt
START=26 ; END=43 ; TITLE=NCC_NME
START=44 ; END=61 ; TITLE=NSC_NME
START=62 ; END=79 ; TITLE=NCC_...
...
START=314 ; END=331 ; TITLE=NCC_NME
START=332 ; END=349 ; TITLE=NSC_NME
A crude script that 'almost' does this, but makes five single-column temporary files in the process:
awk '{ print $1 }' outputfile1.txt | sed '$d' > tempfile1.txt
awk '{ print $1-1 }' outputfile1.txt | sed '$d' > tempfile2.txt
sed '$d' outputfile1.txt | awk 'NR{print $3-p}{p=$3}' > tempfile3.txt
awk ' { getline value < "tempfile1.txt" }
{ if (NR==1)
print value ;
else if( $1 != 1 )
print value }' tempfile3.txt > tempfile4.txt
awk ' { getline value < "tempfile2.txt" }
{ if (NR==1)
print value ;
else if ( $1 != 1 )
print value }' tempfile3.txt | sed '1d' > tempfile5.txt
awk 'END{print $1}' outputfile1.txt >> tempfile5.txt
awk ' { getline value < "tempfile5.txt" }
{print "START="$0 " ; END="value}' tempfile4.txt > outputfile2.txt
Contents of temp files
| temp1 temp2 temp3
NR=1 | 26 25 1
NR=2 | 38 37 1
NR=3 | 44 43 -1
NR=4 | 56 55 1
NR=5 | 62 61 -1
... | ... ... ...
NR=33 | 314 313 -1
NR=34 | 326 325 1
NR=35 | 332 331 -1
NR=36 | 344 343 1
----------------------------------
| temp4 temp5
NR=1 | 26 43
NR=2 | 44 61
NR=3 | 62 79
... | ... ...
NR=17 | 314 331
NR=18 | 332 349
Current output
outputfile2.txt
START=26 ; END=43
START=44 ; END=61
START=62 ; END=79
...
START=314 ; END=331
START=332 ; END=349
Try:
awk '
function print_range() {
printf "START=%s ; END=%s ; TITLE=%s\n", start, end-1, title
}
{
end=$1
}
# if column 3 is equal to 1, then there is a new start
$3==1 {
if(title) print_range()
start=$1
title=$2
next
}
# if the label in field 2 is not part of the title then add it
title!~"(^|_)" $2 "(_|$)" {
title=title"_"$2
}
END {
end++
print_range()
}
' file
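Tracing it against the sample input, the first block prints as:
START=26 ; END=43 ; TITLE=NCC_NME
matching the desired output.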
You can do everything in one go using:
awk '{
    if (NR == 1) {
        # on the first record, initialize our variables
        PREVIOUS_ONE = $1
        TITLE = $2
        PREVIOUS_THIRD = $3
    } else {
        # as long as the new third column is larger we update our variables
        if (PREVIOUS_THIRD < $3) {
            TITLE = TITLE "_" $2
            PREVIOUS_THIRD = $3
        } else {
            # the third column was smaller or equal:
            # print out the data and reinitialize our variables
            print "START="PREVIOUS_ONE" ; END="$1-1" ; TITLE="TITLE
            PREVIOUS_ONE = $1
            TITLE = $2
            PREVIOUS_THIRD = $3
        }
    }
}' outputfile1.txt
Note that the sample's closing line (349 NME 2) is treated as the start of a new block, so the final range prints as END=348 rather than 349; the first answer above closes the final range in its END block instead.

Awk Conditional Test Statement

I would really appreciate some help. I spent almost the whole morning on it.
I have a data of structure field 1 to 16 as follows
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0
For fields 7 to 10 I need a test on the elements (ranging from 0-1) and the field number,
i.e. for every record, check fields 7-10 for the maximum value;
if found and it is in field 7, print $0, $6-4
if found and it is in field 8, print $0, $6-3
if found and it is in field 9, print $0, $6-2
if found and it is in field 10, print $0, $6-1
I'll be so grateful for the help. Thank you in advance
Edit (by belisarius)
Just transcribing a comment from @Tumi2002 (the author):
Sorry, my 6th field (i.e. $6) has values 1-5.
I am trying to reclassify records where field 6=5 back into classes 1-4 in the same field,
so that instead of 5 classes I have 4.
Awk '$6==5
{for i=7; i<11; i++)
if ($i==max) && NF==7) print $0,$6-4;
if ($i==max) && NF==8) print $0,$6-3;
if ($i==max) && NF==9) print $0,$6-2;
if ($i==max) && NF==10) print $0,$6-1
I am struggling with the syntax in awk
{
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $6-11+maxindex;
}
}
Running at ideone
Output for your example data:
2
1
1
1
1
Edit
Modified answering your comment:
($6 == 5){
max=0; maxindex=0;
for (i=7; i<=10; i++)
{
if ($i>max){
maxindex=i;
max=$i;
# print i;
}
}
if (maxindex > 0){
print $0,"-->",$6-11+maxindex;
}
}
Output:
4572 1307084940 RDCSWE 2006 1 5 0.28125 0.5 0.125 0.09375 0 0 0 0 0 0 --> 2
4573 1307101627 RDCSWE 2006 1 5 0.6875 0.125 0.1875 0 0 0 0 0 0 0 --> 1
4574 1307101642 RDCSWE 2006 1 5 0.5625 0.25 0.03125 0.15625 0 0 0 0 0 0 --> 1
4575 1307101662 RDCSWE 2006 1 5 0.53125 0.25 0.1875 0.03125 0 0 0 0 0 0 --> 1
4576 1307127329 RDCSWE 2006 1 5 0.4375 0.34375 0.09375 0.125 0 0 0 0 0 0 --> 1
Running at ideone here
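Since the stated goal is to reclassify in place (rewrite field 6 itself rather than append the new class), a small variant of the same argmax loop, not from the original answer, assigns the result back to $6. Note that assigning to a field rebuilds $0 with the default OFS (single spaces):
$6 == 5 {
    max = 0; maxindex = 0
    for (i = 7; i <= 10; i++) {
        if ($i > max) {
            maxindex = i
            max = $i
        }
    }
    if (maxindex > 0)
        $6 = $6 - 11 + maxindex    # max in $7..$10 maps to classes 1..4
    print
    next
}
{ print }    # records with $6 != 5 pass through unchanged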
First of all, thanks to belisarius for pointing to ideone.
My (updated) solution is working correctly now:
# max value in an array, copied verbatim from the gawk manual (credit)
function maxelt(vec, i, ret)
{
for (i in vec) {
if (ret == "" || vec[i] > ret)
ret = vec[i]
}
return ret
}
# Load fields 7 through 10 of each record into nums.
{
delete nums
for(i = 7; i <= 10; i++)
{ nums[NR, i] = $i }
### DEBUG print NR, maxelt(nums)
if ( $7 == maxelt(nums) ) { print $0, ($6-4) }
if ( $8 == maxelt(nums) ) { print $0, ($6-3) }
if ( $9 == maxelt(nums) ) { print $0, ($6-2) }
if ( $10 == maxelt(nums) ) { print $0, ($6-1) }
}
HTH
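A behavioral note on ties: because each of $7..$10 is compared against maxelt separately, a record with two fields tied at the maximum prints once per tied field, whereas belisarius' maxindex loop keeps only the first maximum. A sketch that mirrors that first-wins tie-breaking while reusing the same maxelt helper:
{
    delete nums
    for (i = 7; i <= 10; i++)
        nums[i] = $i
    m = maxelt(nums)
    for (i = 7; i <= 10; i++)
        if ($i == m) {                 # first field attaining the maximum wins
            print $0, ($6 - 11 + i)    # i=7 -> $6-4 ... i=10 -> $6-1
            break
        }
}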