Append spaces in the end of line by using printf function in unix - printf

I have the following script running in unix which suppose padding 9 spaces in the end of a line. However, the output result were not as I expected.
# define SPACE " "
void main {
printf("%13s%9s\n", "Output 1: 123", SPACE);
printf("%13s%-9s\n", "Output 2: 123", SPACE);
printf("%13s%9s%1s\n", "Output 3: 123", SPACE, "|");
printf("%13s%-9s%1s\n", "Output 4: 123", SPACE, "|");
}
The result:
Output 1: 123
Output 2: 123
Output 3: 123 |
Output 4: 123 |
The output 1 and 2 trim the spaces in the end, the output 3 and 4 has the 9 spaces if I put something behind.
How to show "Output 1: 123 "?
cheer

Related

Replacing sequences of space-delimited numbers in huge input with awk

How can I replace sequences of space-delimited numbers when those sequences span multiple lines and the input is too big to fit in RAM.
A sample input would be:
edit: I re-worked the sample input and input parameters for introducing border cases (excluding ones that have to do with the length of the matched sequence or replacement priorities)
3 12 3 4
0 6 7 10
8 9 12 3
4 6 7 8
10 6 6 7
9 199 10 11
11
note: the number of fields per line is homogeneous but not known in advance; the last line might contain less fields
From that input I would like to:
replace 3 4 with &
replace 6 7 8 with 9 9
replace 6 7 9 with 8 8
replace 7 10 with 11 12
replace 0 with nothing
replace 10 with 13 10
replace 8 9 12 3 5 with #
The expected output would have one number or replacement per line:
3
12
&
6
11 12
8
9
12
&
9 9
13 10
6
8 8
199
13 10
11
11
I'm trying to do the task with awk but I'm having a hard time implementing a dynamic state machine with a pseudo B-Tree:
tr -s '[:space:]' '\n' < input.txt |
awk '
BEGIN {
for (i = 2; i < ARGC; i += 2) {
n = split(ARGV[i], arr)
k = ""
for (j = 1; j <= n; j++) {
k = j SUBSEP k SUBSEP arr[j]
Tree[k]
}
Tree[k] = "$" ARGV[i+1] #=> now can test "if (Tree[k])"
delete ARGV[i]
delete ARGV[i+1]
}
}
{
Key = (int(Key) + 1) SUBSEP Key SUBSEP $1
if ( Key in Tree ) {
if (Tree[Key]) {
print substr(Tree[Key],2)
Buffer = ""
Key = ""
}
else
Buffer = Buffer $1 "\n"
} else {
print Buffer $1
Buffer = ""
Key = ""
}
}
END { if (Buffer != "") printf ("%s", Buffer) }
' - \
'3 4' '&' \
'6 7 8' '9 9' \
'6 7 9' '8 8' \
'7 10' '11 12' \
'0' '' \
'10' '13 10' \
'8 9 12 3 5' '#'
edit: I realised that the code doesn't backtrack after failing to find a complete match in the B-tree, so it's wrong...
How I'm planning to tackle the problem
I'm emulating a B-tree with an array and keys in the following format:
from the middle to the left of the key are the consecutive depths
from the middle to the right of the key are the consecutive values
When a key exists in Tree:
if it doesn't have an associated value then it's a node
if there's a value then it's a leaf
So, for the current input parameters, the content of the Tree array will be:
# from param: "3 4" => "&"
Tree[ 1,"",3 ]
Tree[2,1,"",3,4] = "$&"
# from param: "6 7 8" => "9 9"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,8] = "$9 9"
# from param: "6 7 9" => "8 8"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,9] = "$8 8"
# from param: "7 10" => "11 12"
Tree[ 1,"",7 ]
Tree[2,1,"",7,10] = "$11 12"
# from param: "0" => ""
Tree[1,"",0] = "$"
# from param: "10" => "13 10"
Tree[1,"",10] = "$13 10"
# from param: "8 9 12 3 5" => "#"
Tree[ 1,"",8 ]
Tree[ 2,1,"",8,9 ]
Tree[ 3,2,1,"",8,9,12 ]
Tree[ 4,3,2,1,"",8,9,12,3 ]
Tree[5,4,3,2,1,"",8,9,12,3,5] = "$#"
FWIW I'd approach this by figuring out the max number of records that you might need to search in based on the mappings you want, keep a rolling buffer of that number of records, and then do the comparison part on each buffer, e.g.:
$ cat tst.awk
BEGIN {
RS = "[[:space:]]+"
map("3,4" , "&")
map("6,7,8" , "9")
map("9" , "")
map("0" , "\\000")
map("13,10" , "10")
}
{ buf[((NR-1) % maxRecs) + 1] = $0 }
NR >= maxRecs { prt() }
END { prt() }
function prt( nr,sep,str) {
for ( nr=NR-maxRecs+1; nr<=NR; nr++ ) {
str = str sep buf[((nr-1) % maxRecs) + 1]
sep = ORS
}
print ">>>>" ORS str ORS "<<<<"
# Replace the above with something that loops through the
# strings you want replaced, e.g.
#
# for ( mapNr=1; mapNr<=numMaps; mapNr++ ) {
# old = olds[mapNr]
# if ( str ~ old ) { # add something to avoid partial matches
# new = news[mapNr]
# replace old with new in the output
# }
# }
}
function map(old,new, numRecs) {
++numMaps
numRecs = gsub(/,/,ORS,old) + 1
maxRecs = ( numRecs > maxRecs ? numRecs : maxRecs )
olds[numMaps] = old
news[numMaps] = new
}
$ awk -f tst.awk file
>>>>
112
3
4
<<<<
>>>>
3
4
6
<<<<
>>>>
4
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
9
<<<<
>>>>
8
9
12
<<<<
>>>>
9
12
0
<<<<
>>>>
12
0
3
<<<<
>>>>
0
3
4
<<<<
>>>>
3
4
15
<<<<
>>>>
4
15
255
<<<<
>>>>
15
255
13
<<<<
>>>>
255
13
10
<<<<
>>>>
13
10
6
<<<<
>>>>
10
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
199
<<<<
>>>>
8
199
9
<<<<
>>>>
199
9
0
<<<<
>>>>
9
0
13
<<<<
>>>>
9
0
13
<<<<
The above is just printing the buff-sized strings, the part to be added is replacing the target strings with the new ones in a way that the next target doesn't match the replaced part which is a common problem with, I expect, lots of solutions online so it's left as an exercise.
You'll also need to tweak it to make sure it doesn't revisit lines at the end of the input.
The above uses GNU awk for multi-char RS, if you don't have GNU awk then just pipe the input from tr -s '[:space:]' '\n' as shown in the question.
UPDATE:
previous answer (see edit revisions) was woefully slow (several minutes) when run against a ramped up input (7K mappings in map.txt; 25M tokens
in input.txt1)
new answer (below) is a complete rewrite and processes the 7K-mappings/25M-tokens in ~45 seconds
The main component of this design centers around a tree-like node structure used to manage the series of tokens (lines of input from map.txt):
tree [ParentNodeNbr] [token] [NodeType] = value
Where:
ParentNodeNbr == 0 for the root
token from map.txt
NodeType has one of two values 'node' or 'leaf'
for NodeType = 'node' the value stored in the array is a numeric node number (implemented as an counter that's incremented each time a new node is added to the tree); this node number becomes the ParentNodeNbr for the next token in the series
for NodeType = 'leaf' this designates the 'end' of a series of tokens (line of input from map.txt) and the value stored in the array is the line number (aka FNR) from map.txt; this line number (FNR) is used as an index into a couple other arrays and to determine precendence when an input sequence (from input.txt) has multiple matches from map.txt
when processing a series of tokens from a map.txt line of input we start at ParentNodeNbr == 0 looking for a series of matching nodes, adding new nodes as needed
Setup: storing replacements in a comma-delimited file (map.txt), and adding one additional line to input.txt:
$ head map.txt input.txt
==> map.txt <==
2 3 4,X # "2 3 4" has precendence over ...
2 3,Y # "2 3"
3 4,&
6 7 8,9
9,
0,\000
13 10,10
==> input.txt <==
2 3 4 # keep eye on "2 3" vs "2 3 4" precendence
112 3
4 6 7
8 9 12 0 3
4 15 255 13
10 6
7 8 199 9
0 13
NOTE: here's what tree[][][] looks like when populated from map.txt:
tree [Parent] [Token] [NodeType] = NodeVal
Parent Token NodeType NodeVal MapTo ** MapTo only applies to NodeType = leaf
====== ===== ======== ======= =====
0 0 leaf 6 "\000"
0 2 node 1
0 3 node 3
0 6 node 4
0 9 leaf 5 ""
0 13 node 6
1 3 node 2
1 3 leaf 1 "Y"
2 4 leaf 2 "X"
3 4 leaf 3 "&"
4 7 node 5
5 8 leaf 4 "9"
6 10 leaf 7 "10"
One GNU awk (for multidimensional arrrays):
awk '
function replace(op) {
while ( ((maxToken - minToken + 1) >= maxlen) || op == "flush" ) {
NodeNbr=root
minOrd=maxOrd
for (j=0 ; j<maxlen; j++) { # loop through tokens in buffer[]
token=buffer[ ((minToken + j - 1) % maxlen) + 1 ]
# if we find a matching "leaf" node then keep track of the ordering (ie, FNR from map.txt; lower order == higher precedence)
if ( token in tree [NodeNbr] && "leaf" in tree[NodeNbr][token] )
minOrd= ( tree[NodeNbr][token]["leaf"] < minOrd ) ? tree[NodeNbr][token]["leaf"] : minOrd
# if we find a matching "node" node then grab the next node to compare against the next token from buffer[]
if ( token in tree[NodeNbr] && "node" in tree[NodeNbr][token] ) {
NodeNbr=tree[NodeNbr][token]["node"]
continue
}
break # if we get here we have a token from buffer[] that does not match any of our replacement mappings so abort checking rest of buffer[]
}
if (minOrd < maxOrd) { # if we found at least one complete match (ie, hit a "leaf" node) then ...
print map[minOrd] # use the associated "ord"er to print the associated replacement string and ...
minToken=minToken + len[minOrd] # update the pointer into the buffer[] array
}
else { # otherwise we did not find a match so ...
print buffer[ ((minToken - 1) % maxlen) + 1 ] # print the first token from buffer[] and ...
minToken++ # update the pointer into the buffer[] array
}
if (minToken > maxToken)
break
}
}
BEGIN { root=maxNodeNbr=maxToken=0
minToken=1
maxOrd=9999999999
}
FNR==NR { split($0,a,",")
map[FNR]=a[2] # save replacement string for this input line from map.txt
n=split(a[1],b) # break our matching pattern into tokens
len[FNR]=n # make note of number of tokens in this line of input
maxlen=(n > maxlen) ? n : maxlen # keep track of longest series of tokens
NodeNbr=root # initiate our tree search
for (i=1 ; i<=n ; i++) { # loop through our list of tokens
token=b[i]
if (i==n) # if the last token for this line then create a "leaf" node and store the line number (aka "order")
tree[NodeNbr][token]["leaf"]=FNR
else
if ( tree[NodeNbr][token]["node"] ) # else if we already have a node at this point in the tree then grab its associated node number for the next level in the tree
NodeNbr=tree[NodeNbr][token]["node"]
else { # else create a new "node" node and populate with the next available node number
tree[NodeNbr][token]["node"]=++maxNodeNbr
NodeNbr=maxNodeNbr # use this as the next level in our tree traversal
}
}
maxrec=FNR # keep track of total number of replacement sets from map.txt (only used if we decide to print the contents of map[] to stdout
next
}
FNR==1 {
# Uncomment following to display the contents of the map[] array:
# for (i=1;i<=maxrec;i++)
# print "map:" i ":" map[i] ":"
#
# Uncomment following to display the contents of the tree[][][] array:
# fmt="%6s%8s%10s%10s%10s\n"
# fmt="%6s%8s%10s%10s%10s\n"
# printf "tree [Parent] [Token] [NodeType]\n\n"
# printf fmt, "Parent", "Token", "NodeType", "NodeVal", "MapTo"
# printf fmt, "======", "=====", "========", "=======", "====="
#
# for (NodeNbr=root ; NodeNbr<=maxNodeNbr ; NodeNbr++)
# for (token in tree[NodeNbr])
# for (NodeType in tree[NodeNbr][token]) { # ??
# NodeVal=tree[NodeNbr][token][NodeType]
# printf fmt, NodeNbr, token, NodeType, NodeVal, (NodeType=="leaf") ? "\"" map[NodeVal] "\"" : ""
# }
}
{ for (i=1 ; i<=NF ; i++) { # loop through tokens in current line from input.txt
maxToken++
buffer[ ((maxToken - 1) % maxlen) + 1 ] = $i
if ( (maxToken - minToken + 1) >= maxlen ) # if we have a "full" buffer then ...
replace() # look for replacement match
}
}
END { replace("flush") } # flush the rest of buffer[]
' map.txt input.txt
This generates:
X # "2 3 4" has precendence over "2 3"
112
&
9
12
\000
&
15
255
10
9
199
\000
13
If we switch the first 2 lines of map.txt like such:
==> map.txt <==
2 3,Y # "2 3" has precendence over ...
2 3 4,X # "2 3 4"
We now generate:
Y # "2 3" has precendence over "2 3 4" thus ...
4 # leaving "4" by itself
112
&
9
12
\000
&
15
255
10
9
199
\000
13

AWK to print min max values of unique value in column

I am trying to use awk to do the following:
Input file:
6:28866209 NA NA NA 8.51368e-06 Y
6:28856689 1 0.007828 1 1.50247e-06 X
6:28856740 2 0.007828 1 1.50247e-06 Y
6:28856889 3 7.51E-08 3 1.50247e-06 X
I want to:
Get min and max of column 5 for each independent value in column 6
Print the min max in the file for each column 5 at the end of the file
The file can have different N columns, but all have at least columns 1-8, which are the same in each of my files.
Output:
6:28866209 NA NA NA 8.51368e-06 Y 8.51368e-06 1.50247e-06
6:28856689 1 0.007828 1 1.50247e-06 X 1.50247e-061.50247e-06
6:28856740 2 0.007828 1 1.50247e-06 Y 8.51368e-06 1.50247e-06
6:28856889 3 7.51E-08 3 1.50247e-06 X 1.50247e-06 1.50247e-06
I have attempted this using the following awk command, but I am only getting back the first value in column 6...
awk 'BEGIN{OFS="\t";FS="\t"}{if (a[$6] == "") a[$6]=$5; if (a[$6] > $5) {a[$6]=$5}} {if (b[$6] == "") b[$6]=$5; if (b[$6] < $5) {b[$6]=$5}} END {if (i=$6) print $0,i,a[i],b[i]}' FILE
I believe the easiest to do is using a double pass of the file :
awk '(NR==FNR) && !($6 in min) { min[$6] = $5; max[$6] = $5; next }
(NR==FNR) { m=min[$6]; M=max[$6];
min[$6] = $5<m ? $5 : m;
max[$6] = $5>M ? $5 : M;
next; }
{print $0,min[$6],max[$6] }' <file> <file>
Your original code has the following flaw. The END statement is only executed when the end of the file is reached. You attempt to print the full file, but you did not store any lines in the parsing.
A correction to your original idea is :
awk 'BEGIN{OFS="\t";FS="\t"}
{if (a[$6] == "") a[$6]=$5;
if (a[$6] > $5) {a[$6]=$5}
}
{if (b[$6] == "") b[$6]=$5;
if (b[$6] < $5) {b[$6]=$5}
}
{ c[NR]=$0; d[NR]=$6 }
END { for (i=1;i<=NR;i++) print c[i],a[d[i]],b[d[i]] }' FILE
Here, I stored the full FILE in array c which is indexed by the line-number NR. I also store the index $6 in array d. At the end, I loop trough all lines I stored and print what is expected.
The downside of this approach is that you have to store the full file in memory.
The downside of my proposal, is that you have to read the full file twice from disk.
awk 'FNR<NR{$7=m[$6];$8=M[$6];print;next} (!M[$6])||$5>M[$6]{M[$6]=$5}(!m[$6])||$5<m[$6]{m[$6]=$5}' file file
with comment
awk '
# optional format
BEGIN { OFS=FS="\t"}
# for second pass (second file read)
FNR<NR{
# add a column 7 and 8 with value of min and max correponsing to column 5
$7=m[$6];$8=M[$6]
# print it and reda next line (don't go further in script)
print;next}
# this point is only reach by first file read
# if Max is unknow or value 5 bigger than max
(!M[$6])||$5>M[$6]{
# set new max
M[$6]=$5}
# do the same for min
(!m[$6])||$5<m[$6]{m[$6]=$5}
# read 2 times the same file (first to find min/max, second to print it)
' sample.txt sample.txt

Awk - Substring comparison

Working native bash code :
while read line
do
a=${line:112:7}
b=${line:123:7}
if [[ $a != "0000000" || $b != "0000000" ]]
then
echo "$line" >> FILE_OT_YHAV
else
echo "$line" >> FILE_OT_NHAV
fi
done <$FILE_IN
I have the following file (its a dummy), the substrings being checked are both on the 4th field, so nm the exact numbers.
AAAAAAAAAAAAAA XXXXXX BB CCCCCCC 12312312443430000000
BBBBBBB AXXXXXX CC DDDDDDD 10101010000000000000
CCCCCCCCCC C C QWEQWEE DDD AAAAAAA A12312312312312310000
I m trying to write an awk script that compares two specific substrings, if either one is not 000000 it outputs the line into File A, if both of them are 000000 it outputs the line into File B, this is the code i have so far :
# Before first line.
BEGIN {
print "Awk Started"
FILE_OT_YHAV="FILE_OT_YHAV.test"
FILE_OT_NHAV="FILE_OT_NHAV.test"
FS=""
}
# For each line of input.
{
fline=$0
# print "length = #" length($0) "#"
print "length = #" length(fline) "#"
print "##" substr($0,112,7) "##" substr($0,123,7) "##"
if ( (substr($0,112,7) != "0000000") || (substr($0,123,7) != "0000000") )
print $0 > FILE_OT_YHAV;
else
print $0 > FILE_OT_NHAV;
}
# After last line.
END {
print "Awk Ended"
}
The problem is that when i run it, it :
a) Treats every line as having a different length
b) Therefore the substrings are applied to different parts of it (that is why i added the print length stuff before the if, to check on it.
This is a sample output of the line length awk reads and the different substrings :
Awk Started
length = #130#
## ## ##
length = #136#
##0000000##22016 ##
length = #133#
##0000001##16 ##
length = #129#
##0010220## ##
length = #138#
##0000000##1022016##
length = #136#
##0000000##22016 ##
length = #134#
##0000000##016 ##
length = #137#
##0000000##022016 ##
Is there a reason why awk treats lines of the same length as having a different length? Does it have something to do with the spacing of the input file?
Thanks in advance for any help.
After the comments about cleaning the file up with sed, i got this output (and yes now the lines have a different size) :
1 0M-DM-EM-G M-A.M-E. #DEH M-SM-TM-OM-IM-WM-EM-IM-A M-DM-V/M-DM-T/M-TM-AM-P 01022016 $
2 110000080103M-CM-EM-QM-OM-MM-TM-A M-A. 6M-AM-HM-GM-MM-A 1055801001102 0000120000012001001142 19500000120 0100M-D000000000000000000000001022016 $
3 110000106302M-TM-AM-QM-EM-KM-KM-A 5M-AM-AM-HM-GM-MM-A 1043801001101 0000100000010001001361 19500000100M-IM-SM-O0100M-D000000000000000000000001022016 $
4 110000178902M-JM-AM-QM-AM-CM-IM-AM-MM-MM-G M-KM-EM-KM-AM-S 71M-AM-HM-GM-MM-A 1136101001101 0000130000013001006061 19500000130 0100M-D000000000000000000000001022016 $

How to append column to existing file in awk?

I have a file named bt.B.1.log that looks like this:
.
.
.
Time in seconds = 260.37
.
.
.
Compiled procs = 1
.
.
.
Time in seconds = 260.04
.
.
.
Compiled procs = 1
.
.
.
and so on for 40 records of Time in seconds and Compiled procs (dots represent useless lines).
How do I add a single column with the value of Compiled procs (which is 1) to the result of the following two commands:
This prints the average of Time in seconds values (thanks to dawg for this one)
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
Desired output:
260.20 1
This prints all of the ocurrences of Time in seconds, but there is a small problem with it; it is printing an extra blank line at the beginning of the list.
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
Desired output:
260.37 1
260.04
.
.
.
In both cases I need the value of Compiled procs to appear only once, preferrably in the first line, and no use of intermediate files.
What I managed to do so far prints all values of Time in seconds with the Compiled procs column appearing in every line and with a strange identation:
awk '/seconds/ {printf $5} {printf " "} /procs/ {print $4}' bt.B.1.log > t1.dat
Please help!
UPDATE
Contents of file bt.B.1.log:
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:40:51--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.37
Total processes = 1
Compiled procs = 1
Mop/s total = 2696.83
Mop/s/process = 2696.83
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb#nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 16:45:14--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Start in 16:58:50--25/12/2014
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of active processes: 1
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.7507984505732E-14
2 0.9933052259015E+02 0.9933052259015E+02 0.3147459568137E-14
3 0.3564602564454E+03 0.3564602564454E+03 0.4783990739472E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.2309751522921E-13
5 0.3270754125466E+04 0.3270754125466E+04 0.8481098651866E-14
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.2682819657265E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1989963674771E-15
3 0.1312257334221E+02 0.1312257334221E+02 0.4060995034457E-15
4 0.1200692532356E+02 0.1200692532356E+02 0.2958887128106E-15
5 0.1245957615104E+03 0.1245957615104E+03 0.2281113665977E-15
Verification Successful
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 260.04
Total processes = 1
Compiled procs = 1
Mop/s total = 2700.25
Mop/s/process = 2700.25
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 25 Dec 2014
Compile options:
MPIF77 = mpif77
FLINK = $(MPIF77)
FMPI_LIB = -L/usr/lib/openmpi/lib -lmpi -lopen-rte -lo...
FMPI_INC = -I/usr/lib/openmpi/include -I/usr/lib/openm...
FFLAGS = -O
FLINKFLAGS = -O
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb#nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
Finish in 17:03:12--25/12/2014
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
There are 40 entries in the log, but I've provided only 2 for abbreviation purposes.
To fix the first issue, replace:
awk -F= '/Time in seconds/ {s+=$2; c++} END {print s/c}' bt.B.1.log > t1avg.dat
with:
awk 'BEGIN { FS = "[ \t]*=[ \t]*" } /Time in seconds/ { s += $2; c++ } /Compiled procs/ { if (! CP) CP = $2 } END { print s/c, CP }' bt.B.1.log >t1avg.dat
A potential minor issue is that 260.205 1 might be output but the question does not address this as a weakness of the given script. Rounding with something like printf "%.2f %s\n", s/c, CP gives 260.21 1 though. To truncate the extra digit, use something like printf "%.2f %s\n", int (s/c * 100) / 100, CP.
To fix the second issue, replace:
awk 'BEGIN { FS = "Time in seconds =" } ; { printf $2 } {printf " "}' bt.B.1.log > t1.dat
with:
awk 'BEGIN { FS = "[ \t]*[=][ \t]" } /Time in seconds/ { printf "%s", $2 } /Compiled procs/ { if (CP) { printf "\n" } else { CP = $2; printf " %s\n", $2 } }' bt.B.1.log > t1.dat
BTW, the "strange indentation" is a result of failing to output a newline when using printf and failing to filter unwanted input lines from processing.

Awk space-delimited file content

I have a file whose lines I want to split using either space or "_".
Its format is
f 5.287102213 _10_ RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0
s 5.288000000 _0_ AGT --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18]
My awk script is as follows:
`#!/usr/bin/awk -f
BEGIN {FS="[[:space:]]|_"} # use posix space or underscore for FS
{
action = $1;
time = $2;
sta = $4 ; # shifted here because underscores are delimiters
dest = $6;
app = $10;
pkt_size = $11;
#print $1
#print $2
print $5
#print $4
#print $5
#print $6
#print $7
#print $8
#print $9
#print $10
if( action == "s" && dest == "MAC" && app == "cbr"){
startTime+=time ;
count++;
}
if( action == "r" && dest == "MAC" && app == "cbr"){
endTime+=time ;
receivedSize+=pkt_size ;
}
}`
As seen in the above script, from the above script I was expecting RTR to be in $4.
But I find that the output of $3 is as follows:
RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0
AGT --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18] 0 0
RTR --- 322 cbr 100 [0 0 0 0] ------- [0:0 2:0 32 0] [18] 0 0
What am I doing wrong? Am new to awk.
Change your FS value to [[:space:]_]+ to get the tokenization (splitting into fields) you want.
Test it with this statement to see the fields recognized:
awk -F'[[:space:]_]+' '{for(i=1;i<=NF;++i){print i ": " $i}}' \
<<<'f 5.287102213 _10_ RTR --- 312 cbr 120 [13a a 6 800] ------- [6:0 20:0 29 20] [15] 1 0'
The problem with your FS value, [[:space:]]|_, is that
it only recognizes 1 character at a time as the separator
it only recognizes either whitespace or _ as the separator.
Note that specifying an explicit FS value other than ' ' (a single space) causes awk to look for a single instance of that separator, and interprets multiple adjacent instances as separating multiple - and thus empty - fields.
Thus, in your case, the spans <space>_ and _<space> each represent not a single separator, but two separators abutting an empty field.
If you want spans (runs) of a given character or characters from a set to be interpreted as a single separator instance, use duplication symbol +.
However, the proposed FS value, [[:space:]_]+, may be too permissive, as it would recognize a run of any mix of whitespace and _ chars. as a separator.
To be more restrictive, you could use the following FS value:
[[:space:]]+_?|_?[[:space:]]+
That said, if the _ chars in your input function more like delimiters enclosing only one field, a better solution may be:
to use the DEFAULT value of FS, which recognizes runs of whitespace as delimiters
to strip the _ delimiters from field $3: gsub("^_|_$", "", $3)