Avoiding automatic round-off in awk
I have a CSV file and I want to add a column that takes values from other columns and performs some calculations. As a simplified version I'm trying this:
awk -F"," '{print $0, $1+1}' myFile.csv |head -1
The output is:
29.325172701023977,...other columns..., 30
The added column should be 30.325172701023977, but the output is rounded off.
I tried some options using printf, CONVFMT and OFMT but nothing worked.
How can I avoid the round off?
Assumptions:
the number of decimal places is not known beforehand
the number of decimal places can vary from line to line
Setup:
$ cat myfile.csv
29.325172701023977,...other columns...
15.12345,...other columns...
120.666777888,...other columns...
46,...other columns...
One awk idea: use the number of decimal places to dynamically generate the printf "%.*f" format:
awk '
BEGIN { FS=OFS="," }
{ split($1,arr,".") # split $1 on period
numdigits=length(arr[2]) # count number of decimal places
newNF=sprintf("%.*f",numdigits,$1+1) # calculate $1+1 and format with "numdigits" decimal places
print $0,newNF # print new line
}
' myfile.csv
NOTE: this assumes OP's locale uses a period to separate integer from fraction. For a locale that uses a comma as the decimal separator, it gets more complicated: it is impossible to distinguish a decimal comma from the field-delimiter comma without some changes to the file's format.
This generates:
29.325172701023977,...other columns...,30.325172701023977
15.12345,...other columns...,16.12345
120.666777888,...other columns...,121.666777888
46,...other columns...,47
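Since IEEE doubles carry at most about 17 significant digits, another option is simply raising OFMT to full precision; a sketch (trailing digits beyond the double's precision may differ slightly from the input text, so the dynamic-width approach above is safer when exact reproduction matters):

```shell
# OFMT controls how "print" renders numbers; the default "%.6g" is what
# rounds the value to six significant digits. "%.17g" keeps full double
# precision. Assumption: values fit in an IEEE double.
echo '29.325172701023977,...other columns...' |
awk 'BEGIN{FS=OFS=","; OFMT="%.17g"} {print $0, $1+1}'
```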
As long as you aren't dealing with numbers greater than 9E15 (the limit of exact integer representation in a double), there's no need to fudge any of CONVFMT, OFMT, or sprintf()/printf() at all:
{m,g}awk '$++NF = int((_=$!__) + sub("^[^.]*",__,_))_' FS=',' OFS=','
29.325172701023977,...other columns...,30.325172701023977
15.12345,...other columns...,16.12345
120.666777888,...other columns...,121.666777888
46,...other columns...,47
If mawk-1 is sending your numbers to scientific notation, do:
mawk '$++NF=int((_=$!!NF)+sub("^[^.]*",__,_))_' FS=',' OFS=',' CONVFMT='%.f'
When you scroll right you'll notice all input digits beyond the decimal point are fully preserved:
2929292929.32323232325151515151727272727270707070701010101010232323232397979797977,...other columns...,2929292930.32323232325151515151727272727270707070701010101010232323232397979797977
1515151515.121212121234343434345,...other columns...,1515151516.121212121234343434345
12121212120.66666666666767676767777777777788888888888,...other columns...,12121212121.66666666666767676767777777777788888888888
4646464646,...other columns...,4646464647
2929.32325151727270701010232397977,...other columns...,2930.32325151727270701010232397977
1515.121234345,...other columns...,1516.121234345
12120.66666767777788888,...other columns...,12121.66666767777788888
4646,...other columns...,4647
Change it to CONVFMT='%\47.f' (\47 is the octal escape for a single quote, giving the "%'d"-style digit-grouping flag), and you can even get mawk-1 to nicely comma-format them for you:
29292929292929.323232323232325151515151515172727272727272707070707070701010101010101023232323232323979797979797977,...other columns...,29,292,929,292,930.323232323232325151515151515172727272727272707070707070701010101010101023232323232323979797979797977
15151515151515.12121212121212343434343434345,...other columns...,15,151,515,151,516.12121212121212343434343434345
121212121212120.666666666666666767676767676777777777777777888888888888888,...other columns...,121,212,121,212,121.666666666666666767676767676777777777777777888888888888888
46464646464646,...other columns...,46,464,646,464,647
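For readers who find the one-liner above hard to parse, the same string-arithmetic idea can be sketched more plainly: add 1 to the integer part, then re-attach the fractional digits verbatim as text (valid here because "+1" can never change the digits after the decimal point):

```shell
# Split $1 on the period; p[1] is the integer part, p[2] the fraction.
# Append integer+1, then the untouched fraction (if any), as a new field.
printf '%s\n' '29.325172701023977,...other columns...' '46,...other columns...' |
awk 'BEGIN { FS=OFS="," }
{
    n = split($1, p, ".")
    $(NF+1) = (p[1] + 1) (n == 2 ? "." p[2] : "")
    print
}'
# → 29.325172701023977,...other columns...,30.325172701023977
#   46,...other columns...,47
```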
Related
AWK script - not showing data
I'm trying to create a variable to sum columns 26 to 30 and 32. So far I have this code, which prints the header and the output format like I want, but no data is being shown:

#! /usr/bin/awk -f
BEGIN { FS="," }
NR>1 {
    TotalPositiveStats = ($26+$27+$28+$29+$30+$32)
}
{
    printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, %s\n", EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
}
NR==1 {
    print "EndYear,Rk,G,Date,Years,Days,Age,Tm,HOme,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats"
}  # header

Input data:

EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3

Output expected:

EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5,35
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4,34
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9,54
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7,38
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2,29
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9,36
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3,51

This script will be called like gawk -f script.awk <filename>. Currently this is the output when calling it (it seems to be calculating the variable, but the rest of the fields are empty).
awk is well suited to summing columns:

awk 'NR>1{$(NF+1)=$26+$27+$28+$29+$30+$32}1' FS=, OFS=, input-file > tmp
mv tmp input-file

That doesn't add a field in the header line, so you might want something like:

awk '{$(NF+1) = NR>1 ? ($26+$27+$28+$29+$30+$32) : "TotalPositiveStats"}1' FS=, OFS=,
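A quick end-to-end check of that header-aware form, on a hypothetical three-column stand-in for the real 30+ column file (only two fields are summed, to keep it short):

```shell
# NR==1 gets the new header label; later rows get the computed sum.
printf 'a,b,c\n1,2,3\n4,5,6\n' |
awk '{$(NF+1) = NR>1 ? ($1+$3) : "sum"}1' FS=, OFS=,
# → a,b,c,sum
#   1,2,3,4
#   4,5,6,10
```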
An explanation of the issues with the current printf output is covered in the 2nd half of this answer (below).

It appears OP's objective is to reformat three of the current fields while also adding a new field on the end of each line. (NOTE: certain aspects of OP's code are not reflected in the expected output, so I'm not 100% sure what OP is looking to generate; regardless, OP should be able to tweak the provided code to generate the desired result.)

Using sprintf() to reformat the three fields, we can rewrite OP's current code as:

awk '
BEGIN { FS=OFS="," }
NR==1 { print $0, "TotalPositiveStats"; next }
      { TotalPositiveStats = ($26+$27+$28+$29+$30+$32)
        $17 = sprintf("%.3f",$17)                  # FG_PCT
        if ($20 != "") $20 = sprintf("%.3f",$20)   # 3P_PCT
        $23 = sprintf("%.3f",$23)                  # FT_PCT
        print $0, TotalPositiveStats
      }
' raw.dat

NOTE: while OP's printf shows a format of "%.2f %" for the 3 fields of interest ($17, $20, $23), the expected output shows that the fields are not actually being reformatted (eg, $17 remains %.3f, $20 is an empty string, $23 remains %.2f); I've opted to leave $20 blank and otherwise reformat all 3 fields as %.3f; OP can modify the sprintf() calls as needed.

This generates:

EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5,40
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1.000,3,2,5,5,2,1,3,4,21,19.4,37
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9,57
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1.000,2,2,4,5,3,1,6,5,25,14.7,44
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.750,3,2,5,5,1,1,2,4,17,13.2,31
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9,41
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.750,4,4,8,5,3,2,5,2,33,29.3,56

NOTE: in OP's expected output it appears the last/new field (TotalPositiveStats) does not include the value from $30, hence the mismatch between the expected results and this answer; again, OP can modify the assignment statement for TotalPositiveStats to include/exclude fields as needed.

Regarding the issues with the current printf ...

printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, %s\n", EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats

... is referencing (awk) variables that have never been assigned (eg, EndYear, Rk, G). [NOTE: the one exception is the very last variable in the list, TotalPositiveStats, which has in fact been defined earlier in the script.]

The default value for an unassigned variable is the empty string ("") or zero (0), depending on how the awk code references the variable, eg:

printf "%s", EndYear     => EndYear is treated as a string and the printed result is an empty string; with an output field delimiter of a comma (,) this empty string shows up as 2 commas next to each other (,,)
printf "%.2f %", FG_PCT  => FG_PCT is treated as a numeric (because of the %f format) and the printed result is 0.00 %

Where it gets a little interesting is when the variable name starts with a digit (eg, 3P): awk parses this as the number 3 concatenated with the (empty, unassigned) variable P, so the whole reference evaluates to "3", eg:

printf "%s", 3P          => the printed result is 3

This should explain the 5 static values (0.00 %, 3, 3, 3.00 % and 0.00 %) printed in all output lines, as well as the 'missing' values between the rest of the commas (eg, ,,,,). Obviously the last value in the line is an actual number, ie, the value of the awk variable TotalPositiveStats.
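The unassigned-variable behaviour described above is easy to reproduce in isolation (note %% for a literal percent sign; the question's bare trailing % draws a warning in gawk):

```shell
# EndYear and FG_PCT are never assigned; 3P parses as the number 3
# concatenated with the empty variable P, i.e. the string "3".
awk 'BEGIN { printf "[%s] [%.2f %%] [%s]\n", EndYear, FG_PCT, 3P }'
# → [] [0.00 %] [3]
```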
awk command or sed command
000Bxxxxx111118064085vxas - header
10000000001000000000053009-000000000053009-
10000000005000000000000000+000000000000000+
10000000030000000004025404-000000004025404-
10000000039000000000004930-000000000004930-
10000000088000005417665901-000005417665901-
90000060883328364801913 - trailer

In the above file we have a header, a trailer, and detail records, which start with 1. In the detail records, I want to sum the values starting from position 28 through position 44, including the sign, using an awk/sed command.
Here is sed, with help from bc to do the arithmetic:

sed -rn '
  /header|trailer/! {
    s/[[:digit:]]*[+-]([[:digit:]]+)([+-])$/\2\1/
    H
  }
  $ {
    x
    s/\n//gp
  }
' file | bc

I assume the +/- sign follows the number.
Using awk we can solve this problem making use of substr:

substr(s, m[, n]): Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.

This allows us to extract the string which represents the number. Here, I assumed that the sign before and after the number is the same, and is thus the sign of the number:

$ echo "10000000001000000000053009-000000000053009-" \
    | awk '{print length($0); print substr($0,27,43-27)}'
43
-000000000053009

Since awk implicitly converts strings to numbers when you do numeric operations on them, we can write the following awk code to achieve the requested result:

$ awk '/header|trailer/{next} {s+=substr($0,27,43-27)} END{print s}' file.dat
-5421749244

The above example just works on the sample file given by the OP. However, if you have a file containing multiple blocks with header and trailer, and you only want to use the text inside these blocks (excluding everything outside of them), then you should handle it a bit differently:

$ awk '/header/{s=0;c=1;next} /trailer/{S+=s;c=0;next} c{s+=substr($0,27,43-27)} END{print S}' file.dat

Here we do the following:

If a line with header is found, reset the block sum s to zero and set c=1, indicating that the following lines are taken into account.
If a line with trailer is found, add the block sum s to the overall sum S and set c=0, indicating that the following lines are ignored.
If c != 0, add the line's value to the block sum s.
At the END, print the total sum S.
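The core substr-and-sum step can be verified on a two-line stand-in (the first two detail records from the question, using the same 16-character slice starting at position 27):

```shell
# substr() pulls "-000000000053009" and "+000000000000000"; the implicit
# string-to-number conversion in "s +=" honours the leading sign.
printf '%s\n' \
  '10000000001000000000053009-000000000053009-' \
  '10000000005000000000000000+000000000000000+' |
awk '{ s += substr($0, 27, 16) } END { print s }'
# → -53009
```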
Comparing hexadecimal values with awk
I'm having trouble with awk and comparing values. Here's a minimal example:

echo "0000e149 0000e152" | awk '{print($1==$2)}'

which outputs 1 instead of 0. What am I doing wrong, and how should I compare such values?
To convert a string representing a hex number to a numerical value, you need 2 things: prefix the string with "0x" and use the strtonum() function. To demonstrate:

echo "0000e149 0000e152" | gawk '{
    print $1, $1+0
    print $2, $2+0
    n1 = strtonum("0x" $1)
    n2 = strtonum("0x" $2)
    print $1, n1
    print $2, n2
}'
0000e149 0
0000e152 0
0000e149 57673
0000e152 57682

We can see that when the strings are naively treated as numbers, awk thinks their value is 0. This is because the digits preceding the first non-digit happen to be only zeros.

Ref: https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html

Note that strtonum() is a GNU awk extension.
You need to convert $1 and $2 to strings in order to enforce an alphanumeric comparison. This can be done by simply appending an empty string to them:

echo "0000e149 0000e152" | awk '{print($1""==$2"")}'

Otherwise awk performs a numeric comparison, which requires converting both values to numbers. That conversion yields 0 for each: awk reads the longest prefix that looks like a number, and "0000e149" parses as 0000e149, i.e. a zero mantissa times a power of ten, which is 0 (the same holds for "0000e152"). You can verify that with the following command:

echo "0000e149 0000e152" | awk '{print $1+0; print $2+0}'
0
0
When using non-decimal data you just need to tell gawk that's what you're doing, and specify what base you're using in each number:

$ echo "0xe152 0x0000e152" | awk --non-decimal-data '{print($1==$2)}'
1
$ echo "0xE152 0x0000e152" | awk --non-decimal-data '{print($1==$2)}'
1
$ echo "0xe149 0x0000e152" | awk --non-decimal-data '{print($1==$2)}'
0

See http://www.gnu.org/software/gawk/manual/gawk.html#Nondecimal-Data
I think many forget the fact that the hex digits 0-9, A-F, a-f rank in order in ASCII. Instead of spending time performing the conversion, or risking a numeric precision shortfall, simply:

Trim the leading zeros, including the optional 0x / 0X. Depending on the input source, also trim delimiters such as ":" (e.g. IPv6, MAC address), "-" (e.g. UUID), "_" (e.g. "0xffff_ffff_ffff_ffff"), "%" (e.g. URL-encoding), etc. Be mindful of the need to pad in missing leading zeros for formats that are very flexible with delimiters, such as IPv6.

Compare their respective string length()s: if those differ, then one is already distinctly larger.

Otherwise, prefix both with something meaningless like "\1" to guarantee a string-compare operation, without risk of awk being too smart or of extreme edge cases like locale-specific peculiarities in collating order:

(("\1") toupper(hex_str_1)) == (("\1") toupper(hex_str_2))
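A sketch of that recipe as an awk function (the function name and the simple 0x/zero trimming are illustrative assumptions; delimiter stripping for IPv6/UUID-style inputs is left out for brevity):

```shell
echo "0000e149 0000e152" |
awk '
function hexcmp(a, b) {
    sub(/^0[xX]/, "", a); sub(/^0[xX]/, "", b)   # drop optional 0x/0X prefix
    sub(/^0+/, "", a);    sub(/^0+/, "", b)      # trim leading zeros
    a = toupper(a);       b = toupper(b)
    if (length(a) != length(b))                  # shorter string is smaller
        return length(a) < length(b) ? -1 : 1
    a = "\001" a; b = "\001" b                   # force a string comparison
    return a < b ? -1 : (a > b ? 1 : 0)
}
{ print hexcmp($1, $2) }'
# → -1
```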
setting default numeric format in awk
I wanted to do a simple parsing of two files with ids and some corresponding numerical values, and I didn't want awk to print numbers in scientific notation. The file looks like this:

someid-1 860025 50.0401 4.00022
someid-2 384319 22.3614 1.78758
someid-3 52096 3.03118 0.242314
someid-4 43770 2.54674 0.203587
someid-5 33747 1.96355 0.156967
someid-6 20281 1.18004 0.0943328
someid-7 12231 0.711655 0.0568899
someid-8 10936 0.636306 0.0508665
someid-9 10224.8 0.594925 0.0475585
someid-10 10188.8 0.59283 0.047391

When I use print instead of printf:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); print $1,k[2],k[3],k[4],$2,$3,$4}' OSCAo.txt dme_miRNA_PIWI_OSC.txt | sort -n -r -k 7 | head

I get this result:

dme-miR-iab-4-5p 0.333333 0.000016 0.000001 0.25 0.000605606 9.36543e-07
dme-miR-9c-5p 10987.300000 0.525413 0.048798 160.2 0.388072 0.000600137
dme-miR-9c-3p 731.986000 0.035003 0.003251 2.10714 0.00510439 7.89372e-06
dme-miR-9b-5p 30322.500000 1.450020 0.134670 595.067 1.4415 0.00222922
dme-miR-9b-3p 2628.280000 0.125684 0.011673 48 0.116276 0.000179816
dme-miR-9a-3p 10.365000 0.000496 0.000046 0.25 0.000605606 9.36543e-07
dme-miR-999-5p 103.433000 0.004946 0.000459 0.0769231 0.00018634 2.88167e-07
dme-miR-999-3p 1513.790000 0.072389 0.006723 28 0.0678278 0.000104893
dme-miR-998-5p 514.000000 0.024579 0.002283 73 0.176837 0.000273471
dme-miR-998-3p 3529.000000 0.168756 0.015673 42 0.101742 0.000157339

Notice the scientific notation in the last column. I understand that printf with an appropriate format modifier can do the job, but the code becomes very lengthy. I have to write something like this:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); printf "%s\t%3.6f\t%3.6f\t%3.6f\t%3.6f\t%3.6f\t%3.6f\n", $1,k[2],k[3],k[4],$2,$3,$4}' file1.txt file2.txt > fileout.txt

This becomes clumsy when I have to parse fileout with another similarly structured file.
Is there any way to specify default numerical output, such that any string will be printed like a string but all numbers follow a particular format.
I think you misinterpreted the meaning of %3.6f. The first number, before the decimal point, is the field width, not the "number of digits before the decimal point" (see printf(3)). So you should use %10.6f instead. It can be tested easily in bash:

$ printf "%3.6f\n%3.6f\n%3.6f\n" 123.456 12.345 1.234
123.456000
12.345000
1.234000
$ printf "%10.6f\n%10.6f\n%10.6f\n" 123.456 12.345 1.234
123.456000
 12.345000
  1.234000

You can see that the latter aligns on the decimal point properly.

As sidharth c nadhan mentioned, you can use the OFMT awk internal variable (see awk(1)). An example:

$ awk 'BEGIN{print 123.456; print 12.345; print 1.234}'
123.456
12.345
1.234
$ awk -vOFMT=%10.6f 'BEGIN{print 123.456; print 12.345; print 1.234}'
123.456000
 12.345000
  1.234000

As I see in your example, the number with the most digits can be 123456.1234567, so the format %15.7f would cover them all and show a nice-looking table. But unfortunately it will not work if the number has no decimal point in it, or even if it does but ends with .0:

$ awk -vOFMT=%15.7f 'BEGIN{print 123.456;print 123;print 123.0;print 0.0+123.0}'
    123.4560000
123
123
123

I even tried gawk's strtonum() function, but integers are still considered non-OFMT strings. See:

awk -vOFMT=%15.7f -vCONVFMT=%15.7f 'BEGIN{print 123.456; print strtonum(123); print strtonum(123.0)}'

It has the same output as before. So I think you have to use printf anyway. The script can be a little bit shorter and a bit more configurable:

awk -vf='\t'%15.7f 'NR==FNR{x[$1]=sprintf("%s"f f f,$1,$2,$3,$4);next}$1 in x{printf("%s"f f f"\n",x[$1],$2,$3,$4)}' file1.txt file2.txt

The script will not work properly if there are duplicated IDs in the first file. If that cannot happen, then the two conditions can be swapped and the ;next can be left off.
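If the real goal is "print strings as-is but give every numeric-looking field one fixed format", a generic per-field pass is another option; a sketch (the regex for "looks numeric" is a deliberate simplification, and the %.6f format is just an example width):

```shell
# Reformat any field that matches a simple decimal/scientific-notation
# pattern; leave everything else (ids, text) untouched.
printf 'someid-1\t9.36543e-07\t160.2\n' |
awk 'BEGIN { FS=OFS="\t" }
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^-?[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?$/)
            $i = sprintf("%.6f", $i)
    print
}'
# → someid-1	0.000001	160.200000
```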
awk 'NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); printf "%s\t%9s\t%9s\t%9s\t%9s\t%9s\t%9s\n", $1,k[2],k[3],k[4],$2,$3,$4}' file1.txt file2.txt > fileout.txt
formatted reading using awk
I am trying to read in a formatted file using awk. The content looks like the following:

1PS1 A1 1 11.197 5.497 7.783
1PS1 A1 1 11.189 5.846 7.700
.
.
.

Following C format, these lines are "%5d%5s%5s%5d%8.3f%8.3f%8.3f": the first 5 positions are an integer (1), the next 5 positions are characters (PS1), the next 5 positions are characters (A1), the next 5 positions are an integer (1), and the next 24 positions are divided into 3 columns of 8 positions each, holding floating-point numbers with 3 decimal places.

What I've been using is just referencing these lines by whitespace-separated columns, using $1, $2, $3, and so on. For example:

cat test.gro | awk 'BEGIN{i=0} {MolID[i]=$1; id[i]=$2; num[i]=$3; x[i]=$4; y[i]=$5; z[i]=$6; i++} END { ... }' >test1.gro

But I ran into some problems with this, and now I am trying to read these files in a formatted way, as discussed above. Any idea how I do this?
Looking at your sample input, it seems the format string is actually "%5d%-5s%5s%5d%8.3f%8.3f%8.3f", with the first string field being left-justified. It's too bad awk doesn't have a scanf() function, but you can get your data with a few substr() calls:

awk -v OFS=: '
{
    a=substr($0,1,5)
    b=substr($0,6,5)
    c=substr($0,11,5)
    d=substr($0,16,5)
    e=substr($0,21,8)
    f=substr($0,29,8)
    g=substr($0,37,8)
    print a,b,c,d,e,f,g
}
'

outputs

1:PS1 : A1: 1: 11.197: 5.497: 7.783
1:PS1 : A1: 1: 11.189: 5.846: 7.700

If you have GNU awk, you can use the FIELDWIDTHS variable like this:

gawk -v FIELDWIDTHS="5 5 5 5 8 8 8" -v OFS=: '{print $1, $2, $3, $4, $5, $6, $7}'

which also outputs

1:PS1 : A1: 1: 11.197: 5.497: 7.783
1:PS1 : A1: 1: 11.189: 5.846: 7.700
You never said exactly which fields you think should have what number, so I'd like to be clear about how awk thinks that works. (Your choice to explicitly call the whitespace in your output format string "fields" makes me worry a little; you might have a different idea about this than awk does.) From the manpage:

An input line is normally made up of fields separated by white space, or by regular expression FS. The fields are denoted $1, $2, ..., while $0 refers to the entire line. If FS is null, the input line is split into one field per character.

Take note that the whitespace in the input line does not get assigned a field number, and that sequential whitespace is treated as a single field separator. You can test this with something like:

echo "1 2 3 4" | awk '{print "1:" $1 "\t2:" $2 "\t3:" $3 "\t4:" $4}'

at the command line. All of this assumes that you have not diddled the FS variable, of course.