Get Ascii Code? - awk

To retrieve the ASCII code of every character in the 13th column of a file, I wrote this script:
awk -v ch="'" '{
for (i=1;i<=length(substr($13,6,length($13)));i++)
{cmd = printf \"%d\\n\" \"" ch substr(substr($13,6,length($13)),i,1) "\"" cmd | getline output close(cmd) ;
Number= Number " " output
}
print Number ; Number=""
}' ~/a.test
but it doesn't work correctly: it works fine for a while, then produces weird results.
For example, for this input (assume it is the 13th column):
CQ:Z:%8%%%%0%%%%9%%%%:%%%%%%%%%%%%%%%%%%
I expect to get this:
37 56 37 37 37 37 48 37 37 37 37 57 37 37 37 37 58 37 37 37 37 ...............
But I get this:
37 56 37 37 37 37 48 48 48 48 48 57 57 57 57 57 58 58 58 58 58 ...............
As you can see, the first miscalculation appears after the character "0" (48 in the result).
Do you know which part of my code is responsible for this error?

Try this:
awk '{
str = substr($13, 6)
for (i=1; i<=length(str); i++) {
cmd = "printf %d \42\47" substr(str, i, 1) "\42"
cmd | getline output
close(cmd)
Number= Number " " output
}
print Number
Number=""
}' ~/a.test
\42 is " and \47 is ', so for each character ${char} this runs printf %d "'${char}" in the shell. POSIX printf treats an argument that starts with a single or double quote specially: its value is the numeric value, in the underlying codeset, of the character following the quote (see the final bullet of the Extended Description in the POSIX printf specification).
N.B. The formatting matters!
Don't try to squeeze the code unless you know exactly what you're doing!
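The shell-side behaviour this relies on can be checked on its own, outside awk. A minimal sketch, assuming a POSIX-compatible sh whose printf implements the leading-quote rule; the sample characters are taken from the question's input:

```shell
# A leading quote in the argument makes printf print the code of the next character.
printf '%d\n' "'%"   # 37
printf '%d\n' "'8"   # 56
printf '%d\n' "'0"   # 48
```

Each awk loop iteration above builds exactly one such command and reads its single line of output back with getline.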
And a pure awk solution (I took the ord/chr functions directly from the manual):
printf '%s\n' 'CQ:Z:%8%%%%0%%%%9%%%%:%%%%%%%%%%%%%%%%%%'|
awk 'BEGIN { _ord_init() }
{
str = substr($0, 6)
for (i = 0; ++i <= length(str);)
printf "%s", (ord(substr(str, i, 1)) (i < length(str) ? OFS : ORS))
}
func _ord_init( low, high, i, t) {
low = sprintf("%c", 7) # BEL is ascii 7
if (low == "\a") { # regular ascii
low = 0
high = 127
}
else if (sprintf("%c", 128 + 7) == "\a") {
# ascii, mark parity
low = 128
high = 255
}
else { # ebcdic(!)
low = 0
high = 255
}
for (i = low; i <= high; i++) {
t = sprintf("%c", i)
_ord_[t] = i
}
}
func ord(str, c) {
# only first character is of interest
c = substr(str, 1, 1)
return _ord_[c]
}
func chr(c) {
# force c to be numeric by adding 0
return sprintf("%c", c + 0)
}'

This might work for you:
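awk -v SQ="'" -v DQ='"' '{
    args = format = space = ""          # reset for every input line
    n = split($13, a, "")               # empty separator: one element per character (gawk/mawk)
    for (i = 1; i <= n; i++) {
        args = args space DQ SQ a[i] DQ
        format = format space "%d"
        space = " "
    }
    system("printf " DQ format "\\n" DQ " " args)
}'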

decoding octal escape sequences in input with awk

Updated
Let's suppose that you got octal escape sequences in a stream:
backslash \134 is escaped as \134134
single quote ' and double quote \042
linefeed `\012` and carriage return `\015`
%s &
etc...
note: The escaped characters are limited to 0x01-0x1F 0x22 0x5C 0x7F
How can you revert those escape sequences back to their corresponding character with awk?
While awk understands them out of the box when they are used in a string literal or as a parameter argument, I can't find a way to leverage this capability when the escape sequence is part of the data. For now I'm using one gsub per escape sequence, but that doesn't feel efficient.
Here's the expected output for the given sample:
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
PS: While I have the additional constraint of unescaping each line into an awk variable before printing the result, it doesn't really matter.
Using GNU awk for strtonum() and lots of meaningfully-named variables to show what each step does:
$ cat tst.awk
function octs2chars(str, head,tail,oct,dec,char) {
head = ""
tail = str
while ( match(tail,/\\[0-7]{3}/) ) {
oct = substr(tail,RSTART+1,RLENGTH-1)
dec = strtonum(0 oct)
char = sprintf("%c", dec)
head = head substr(tail,1,RSTART-1) char
tail = substr(tail,RSTART+RLENGTH)
}
return head tail
}
{ print octs2chars($0) }
$ awk -f tst.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
If you don't have GNU awk then write a small function to convert octal to decimal, e.g. oct2dec() below, and then call that instead of strtonum():
$ cat tst2.awk
function oct2dec(oct, dec) {
dec = substr(oct,1,1) * 8 * 8
dec += substr(oct,2,1) * 8
dec += substr(oct,3,1)
return dec
}
function octs2chars(str, head,tail,oct,dec,char) {
head = ""
tail = str
while ( match(tail,/\\[0-7]{3}/) ) {
oct = substr(tail,RSTART+1,RLENGTH-1)
dec = oct2dec(oct) # replaced "strtonum(0 oct)"
char = sprintf("%c", dec)
head = head substr(tail,1,RSTART-1) char
tail = substr(tail,RSTART+RLENGTH)
}
return head tail
}
{ print octs2chars($0) }
$ awk -f tst2.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
The above assumes that, as discussed in comments, the only backslashes in the input will be in the context of the start of octal numbers as shown in the provided sample input.
With GNU awk, which supports the strtonum() function, you could try:
awk '{
    while (match($0, /\\[0-7]{1,3}/)) {
        printf("%s", substr($0, 1, RSTART - 1))     # print the substring before the match
        printf("%c", strtonum("0" substr($0, RSTART + 1, RLENGTH - 1)))   # convert the octal digits to a character
        $0 = substr($0, RSTART + RLENGTH)           # update $0 with the remaining substring
    }
    print
}' input_file
The while loop processes the matched substrings (the octal representations) one by one.
substr($0, RSTART + 1, RLENGTH - 1) skips the leading backslash and takes only the digits; RLENGTH counts the backslash too, hence the - 1.
"0" prepended to the digits makes an octal string.
strtonum() converts the octal string to its numeric value.
The final print outputs the remaining substring.
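To see which characters RSTART and RLENGTH cover after a successful match(), a small check in portable awk may help. This is only a sketch: the function name show_match and the sample string are made up, and the bracket expression is written out three times instead of using an interval so that awks without interval support accept it too. Since RLENGTH counts the backslash as well, the digits alone span RLENGTH - 1 characters.

```shell
# show_match: report what match() found for the first \ooo sequence on each line
show_match() {
    awk '{
        if (match($0, /\\[0-7][0-7][0-7]/)) {
            print "RSTART=" RSTART, "RLENGTH=" RLENGTH
            print "digits=" substr($0, RSTART + 1, RLENGTH - 1)
            print "rest=" substr($0, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' 'a\134134b' | show_match
# RSTART=2 RLENGTH=4
# digits=134
# rest=134b
```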
UPDATE :: about gawk's strtonum() in unicode mode :
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '
$++NF = "<( "(sprintf("%c", strtonum((_=_<_) substr($++_, ++_))))" )>"'
0000000 909522524 539507744 690009798 2622
\ 6 6 6 < ( ƶ ** ) > \n
134 066 066 066 040 074 050 040 306 266 040 051 076 012
\ 6 6 6 sp < ( sp ? ? sp ) > nl
92 54 54 54 32 60 40 32 198 182 32 41 62 10
5c 36 36 36 20 3c 28 20 c6 b6 20 29 3e 0a
0000016
By default, gawk in Unicode mode decodes this to a multi-byte character instead of the single byte \266 (0xB6). If you want to be sure of always getting a single byte out, even in gawk's Unicode mode, this should do the trick:
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '$++NF = sprintf("<( %c )>",
strtonum((_=_<_) substr($++_, ++_)) + _*++_^_++*_^++_)'
0000000 909522524 539507744 1042882742 10
\ 6 6 6 < ( 266 ) > \n
134 066 066 066 040 074 050 040 266 040 051 076 012
\ 6 6 6 sp < ( sp ? sp ) > nl
92 54 54 54 32 60 40 32 182 32 41 62 10
5c 36 36 36 20 3c 28 20 b6 20 29 3e 0a
0000015
Long story short: add 4^5 * 54 to the output of strtonum(). That value happens to be 0xD800, the starting point of the UTF-16 surrogates.
=================== =================== ===================
One quick note about @Gene's proposed perl-based solution:
echo 'abc \555 456' | perl -p -e 's/\\([0-7]{3})/chr(oct($1))/ge'
Wide character in print at -e line 1, <> line 1.
abc ŭ 456
octal codes wrap around, meaning \4xx = \0xx ; \6xx = \2xx etc :
printf '\n %s\n' $'\555'
m
so perl is incorrectly decoding these as multi-byte characters, when in fact \555, as confirmed by printf, is merely lowercase "m" (0x6D)
ps : my perl is version 5.34
I came up with my own POSIX awk solution, so I'm posting it here for reference.
The main idea is to build a hash that translates an octal escape sequence to its corresponding character. You can then use it while splitting the line during the search for escape sequences:
LANG=C awk '
BEGIN {
for ( i = 1; i <= 255; i++ )
tr[ sprintf("\\%03o",i) ] = sprintf("%c",i)
}
{
remainder = $0
while ( match(remainder, /\\[0-7]{3}/) ) {
printf("%s%s", \
substr(remainder, 1, RSTART-1), \
tr[ substr(remainder, RSTART, RLENGTH) ] \
)
remainder = substr(remainder, RSTART + RLENGTH)
}
print remainder
}
' input.txt
backslash `\`
single quote `'` and double quote `"`
linefeed `
` and carriage return `
%s &
etc...
this separate post is made specifically to showcase how to extend the octal lookup reference tables in gawk unicode-mode to all 256 bytes without external dependencies or warning messages:
ASCII bytes reside in table o2bL
8-bit bytes reside in table o2bH
# gawk profile, created Fri Sep 16 09:53:26 2022
'BEGIN {
1 makeOctalRefTables(PROCINFO["sorted_in"] = "#val_str_asc" \
(ORS = ""))
128 for (_ in o2bL) {
128 print o2bL[_]
}
128 for (_ in o2bH) {
128 print o2bH[_]
}
}
function makeOctalRefTables(_,__,___,____)
{
1 _=__=___=____=""
for (_ in o2bL) {
break
}
1 if (!(_ in o2bL)) {
1 ____=_+=((_+=_^=_<_)-+-++_)^_--
128 do { o2bL[sprintf("\\%o",_)] = \
sprintf("""%c",_)
} while (_--)
1 o2bL["\\" ((_+=(_+=_^=_<_)+_)*_--+_+_)] = "\\&"
1 ___=--_*_^_--*--_*++_^_*(_^=++_)^(! --_)
128 do { o2bH[sprintf("\\%o", +_)] = \
sprintf("%c",___+_)
} while (____<--_)
}
1 return length(o2bL) ":" length(o2bH)
}'
|
\0 \1 \2 \3 \4 \5 \6 \7 \10\11 \12
\13
\14
\16 \17
\20 \21 \22 \23 \24 \25 \26 \27 \30 \31 \32 \33 \34 \35 \36 \37
\40 \41 !\42 "\43 #\44 $\45 %\47 '\50 (\51 )\52 *\53 +\54 ,\55 -\56 .\57 /
\60 0\61 1\62 2\63 3\64 4\65 5\66 6\67 7\70 8\71 9\72 :\73 ;\74 <\75 =\76 >\77 ?
\100 @\101 A\102 B\103 C\104 D\105 E\106 F\107 G\110 H\111 I\112 J\113 K\114 L\115 M\116 N\117 O
\120 P\121 Q\122 R\123 S\124 T\125 U\126 V\127 W\130 X\131 Y\132 Z\133 [\134 \\46 \&\135 ]\136 ^\137 _
\140 `\141 a\142 b\143 c\144 d\145 e\146 f\147 g\150 h\151 i\152 j\153 k\154 l\155 m\156 n\157 o
\160 p\161 q\162 r\163 s\164 t\165 u\166 v\167 w\170 x\171 y\172 z\173 {\174 |\175 }\176 ~\177
\200 ?\201 ?\202 ?\203 ?\204 ?\205 ?\206 ?\207 ?\210 ?\211 ?\212 ?\213 ?\214 ?\215 ?\216 ?\217 ?
\220 ?\221 ?\222 ?\223 ?\224 ?\225 ?\226 ?\227 ?\230 ?\231 ?\232 ?\233 ?\234 ?\235 ?\236 ?\237 ?
\240 ?\241 ?\242 ?\243 ?\244 ?\245 ?\246 ?\247 ?\250 ?\251 ?\252 ?\253 ?\254 ?\255 ?\256 ?\257 ?
\260 ?\261 ?\262 ?\263 ?\264 ?\265 ?\266 ?\267 ?\270 ?\271 ?\272 ?\273 ?\274 ?\275 ?\276 ?\277 ?
\300 ?\301 ?\302 ?\303 ?\304 ?\305 ?\306 ?\307 ?\310 ?\311 ?\312 ?\313 ?\314 ?\315 ?\316 ?\317 ?
\320 ?\321 ?\322 ?\323 ?\324 ?\325 ?\326 ?\327 ?\330 ?\331 ?\332 ?\333 ?\334 ?\335 ?\336 ?\337 ?
\340 ?\341 ?\342 ?\343 ?\344 ?\345 ?\346 ?\347 ?\350 ?\351 ?\352 ?\353 ?\354 ?\355 ?\356 ?\357 ?
\360 ?\361 ?\362 ?\363 ?\364 ?\365 ?\366 ?\367 ?\370 ?\371 ?\372 ?\373 ?\374 ?\375 ?\376 ?\377 ?

AWK new line sorting

I have a script that sorts numbers:
{
if ($1 <= 9) xd++
else if ($1 > 9 && $1 <= 19) xd1++
else if ($1 > 19 && $1 <= 29) xd2++
else if ($1 > 29 && $1 <= 39) xd3++
else if ($1 > 39 && $1 <= 49) xd4++
else if ($1 > 49 && $1 <= 59) xd5++
else if ($1 > 59 && $1 <= 69) xd6++
else if ($1 > 69 && $1 <= 79) xd7++
else if ($1 > 79 && $1 <= 89) xd8++
else if ($1 > 89 && $1 <= 99) xd9++
else if ($1 == 100) xd10++
} END {
print "0-9 : "xd, "10-19 : " xd1, "20-29 : " xd2, "30-39 : " xd3, "40-49 : " xd4, "50-59 : " xd5, "60-69 : " xd6, "70-79 : " xd7, "80-89 : " xd8, "90-99 : " xd9, "100 : " xd10
}
output:
$ cat xd1 | awk -f script.awk
0-9 : 16 10-19 : 4 20-29 : 30-39 : 2 40-49 : 1 50-59 : 1 60-69 : 1 70-79 : 1 80-89 : 1 90-99 : 1 100 : 2
How can I make each range of ten print on its own line?
like this:
0-9 : 16
10-19 : 4
20-29 :
30-39 : 2
print with \n doesn't work
Additionally:
the first range contains 16 numbers. How can I also show each count as a row of "+" signs,
like this:
0-9 : 16 ++++++++++++++++
10-19 : 4 ++++
20-29 :
30-39 : 2 ++
thank you in advance
If we rewrite the current code to use an array to keep track of counts, we can then use a simple for loop to print the results on individual lines, eg:
{ if ($1 <= 9) xd[0]++
else if ($1 <= 19) xd[1]++
else if ($1 <= 29) xd[2]++
else if ($1 <= 39) xd[3]++
else if ($1 <= 49) xd[4]++
else if ($1 <= 59) xd[5]++
else if ($1 <= 69) xd[6]++
else if ($1 <= 79) xd[7]++
else if ($1 <= 89) xd[8]++
else if ($1 <= 99) xd[9]++
else xd[10]++
}
END { for (i=0;i<=9;i++)
print (i*10) "-" (i*10)+9, ":", xd[i]
print "100 :", xd[10]
}
At this point we could also replace the 1st part of the script with a comparable for loop, eg:
{ for (i=0;i<=9;i++)
if ($1 <= (i*10)+9) {
xd[i]++
next
}
xd[10]++
}
END { for (i=0;i<=9;i++)
print (i*10) "-" (i*10)+9, ":", xd[i]
print "100 :", xd[10]
}
As for the additional requirement to print a variable number of + on the end of each line we can add a function (prt()) to generate the variable number of +:
function prt(n ,x) {
x=""
if (n) {
x=sprintf("%*s",n," ")
gsub(/ /,"+",x)
}
return x
}
{ for (i=0;i<=9;i++)
if ($1 <= (i*10)+9) {
xd[i]++
next
}
xd[10]++
}
END { for (i=0;i<=9;i++)
print (i*10) "-" (i*10)+9, ":", xd[i], prt(xd[i])
print "100 :", xd[10], prt(xd[10])
}
How can I make each range of ten print on its own line?
Tell GNU AWK to use a newline as OFS (the output field separator). Consider the following simple example:
awk 'BEGIN{x=1;y=2;z=3}END{print "x is " x, "y is " y, "z is " z}' emptyfile
gives output
x is 1 y is 2 z is 3
whilst
awk 'BEGIN{OFS="\n";x=1;y=2;z=3}END{print "x is " x, "y is " y, "z is " z}' emptyfile
gives output
x is 1
y is 2
z is 3
Explanation: OFS value (default: space) is used for joining arguments of print. If you want to know more about OFS then read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
You don't need to hard-code the ten buckets like that:
jot -r 300 1 169 | mawk '
BEGIN { _+=(_+=_^=_<_)*_*_ } { ++___[_<(__=int(($!!_)/_))?_:__] }
END {
____ = sprintf("%*s", NR, _)
gsub(".","+",____)
for(__=_-_;__<=_;__++) {
printf(" [%3.f %-6s] : %5.f %.*s\n",__*_,+__==+_?"+ "\
: " , " __*_--+_++, ___[__], ___[__], ____) } }'
[ 0 , 9 ] : 16 ++++++++++++++++
[ 10 , 19 ] : 17 +++++++++++++++++
[ 20 , 29 ] : 16 ++++++++++++++++
[ 30 , 39 ] : 19 +++++++++++++++++++
[ 40 , 49 ] : 14 ++++++++++++++
[ 50 , 59 ] : 18 ++++++++++++++++++
[ 60 , 69 ] : 18 ++++++++++++++++++
[ 70 , 79 ] : 16 ++++++++++++++++
[ 80 , 89 ] : 20 ++++++++++++++++++++
[ 90 , 99 ] : 19 +++++++++++++++++++
[100 + ] : 127 ++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++
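For comparison, a plainer sketch of the same bucketing idea: the bucket index is computed as int($1 / 10) instead of being spelled out in an if/else chain. The names histogram and rep are made up for this example. It assumes non-negative numbers in the first field, puts everything from 100 upwards into the last bucket, and prints 0 for empty buckets rather than leaving them blank.

```shell
# histogram: count the numbers on stdin in buckets of ten and draw a "+" bar per bucket
histogram() {
    awk '
        # rep(n, ch): return ch repeated n times (empty string when n <= 0)
        function rep(n, ch,    s) {
            s = ""
            while (n-- > 0) s = s ch
            return s
        }
        {
            b = int($1 / 10)            # 0-9 -> 0, 10-19 -> 1, ...
            if (b > 10) b = 10          # assumption: 100 and above share the last bucket
            count[b]++
        }
        END {
            for (b = 0; b <= 10; b++) {
                label = (b < 10) ? ((b * 10) "-" (b * 10 + 9)) : "100+"
                printf "%-6s : %2d %s\n", label, count[b], rep(count[b], "+")
            }
        }'
}

printf '%s\n' 3 7 12 100 45 | histogram
```

Run against the question's file, this would be histogram < xd1.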

awk how to split and change blank by NA

I'm having trouble with an awk task. I want to split a file into 2 files. It mostly works, but I have one last issue.
This is one of my input files:
samplexxx EH Tred GangSTR
dijen006 nofile nofile nofile
dijen006_100 22,30 22,27 19,25
dijen006_75 25,27 29 NA
dijen017 nofile nofile nofile
dijen017_100 75,121 54 24,24
dijen017_75 74,131 72 19,19
dijen081 63,84 32 40,40
dijen081_100 70,115 78 25,41
dijen081_75 79,143 95 24,104
dijen082 47,51 38 15,34
dijen082_100 46,61 52 6,32
dijen082_75 NA 55 17,17
dijen083 30,53 30,40 38,38
dijen083_100 43,53 30,59 23,32
dijen083_75 43,60 18,74 23,71
dijen1013 30 30 20,30
dijen1013_100 30 30 9,19
dijen1013_75 21 33 20,20
dijen1014 9,30 9,30 9,30
dijen1014_100 9,28 9,43 9,11
dijen1014_75 9,28 9,36 9,29
dijen1015 23,30 23,30 23,29
dijen1015_100 23,30 NA 13,22
dijen1015_75 25,27 21,42 22,39
dijen402 25,31 25,31 25,31
dijen402_100 30 29,36 14,30
dijen402_75 25,26 22,39 22,39
I am using this code:
#!/bin/awk -f
#USAGE = awk -v my_var=$(basename $i .tsv) split_file_allelle.awk $i
BEGIN { FS=OFS="\t" }
NR == 1 {
str1 = str2 = $0
}
NR > 1 {
str1 = str2 = $1
for (i=2; i<=NF; i++) {
split($i,a,/,/)
str1 = str1 OFS a[1]
str2 = str2 OFS a[2]
}
}
{
print str1 > my_var"_all1.tsv"
print str2 > my_var"_all2.tsv"
}
and I get two files like the one below, split on the ",". Is there a way to get something like 'NA' instead of a blank in the second file wherever there is no number?
samplexxx EH Tred GangSTR
dijen006
dijen006_100 30 27 25
dijen006_75 27
dijen017
dijen017_100 121 24
dijen017_75 131 19
dijen081 84 40
dijen081_100 115 41
dijen081_75 143 104
dijen082 51 34
dijen082_100 61 32
dijen082_75 17
dijen083 53 40 38
dijen083_100 53 59 32
dijen083_75 60 74 71
dijen1013 30
dijen1013_100 19
dijen1013_75 20
dijen1014 30 30 30
dijen1014_100 28 43 11
dijen1014_75 28 36 29
dijen1015 30 30 29
dijen1015_100 30 22
dijen1015_75 27 42 39
dijen402 31 31 31
dijen402_100 36 30
dijen402_75 26 39 39
This is what I have, but I would like to have something like this:
samplexxx EH Tred GangSTR
dijen006 NA NA NA
dijen006_100 30 27 25
dijen006_75 27 NA NA
dijen017 NA NA NA
dijen017_100 121 NA 24
....
thanks for your help!
BEGIN {
FS = OFS = "\t"
all1 = my_var "_all1.tsv"
all2 = my_var "_all2.tsv"
}
NR == 1 {
str1 = str2 = $0
}
NR > 1 {
str1 = str2 = $1
for (i=2; i<=NF; i++) {
n = split($i,a,",")
str1 = str1 OFS a[1]
str2 = str2 OFS (n == 1 ? "NA" : a[2])
}
}
{
print str1 > all1
print str2 > all2
}
Changing print str1 > my_var"_all1.tsv" to print str1 > all1 wasn't necessary to solve the specific problem you asked about; the ternary that tests split()'s return value does that. BUT print str1 > my_var"_all1.tsv" is undefined behavior per POSIX, so it will fail in some awks. It needs to be written either with a variable, as I have, or with parens around the expression that generates the file name: print str1 > (my_var"_all1.tsv"). Using a variable is also more efficient, since the concatenation is done once in total instead of once per line.
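A small self-contained sketch of the parenthesised form together with the split() return value check. The prefix variable, the demo file names and the two input lines are made up for illustration, and the output goes to a scratch directory created with mktemp:

```shell
tmpdir=$(mktemp -d)      # scratch directory for the two output files

printf '%s\n' '22,30' '29' | awk -v prefix="$tmpdir/demo" '{
    n = split($0, part, ",")
    # Parentheses make the whole concatenation the redirection target.
    print part[1]                    > (prefix "_all1.tsv")
    print (n >= 2 ? part[2] : "NA")  > (prefix "_all2.tsv")
}'

cat "$tmpdir/demo_all2.tsv"
# 30
# NA
```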

compare file and print class

I have
file1:
id position
a1 21
a1 39
a1 77
b1 88
b1 122
c1 22
file2:
id class position1 position2
a1 Xfact 1 40
a1 Xred 41 66
a1 xbreak 69 89
b1 Xbreak 77 133
b1 Xred 140 199
c1 Xfact 1 15
c1 Xbreak 19 35
I want something like this
output:
id position class
a1 21 Xfact
a1 39 Xfact
a1 77 Xbreak
b1 88 Xbreak
b1 122 Xbreak
c1 22 Xbreak
I need a simple awk script that prints the id and position from file1 and compares that position with the positions in file2. If the position from file1 lies in the range between position1 and position2 in file2, it should print the corresponding class.
One way using awk. It's not a simple script. In short, the key is the variable 'all_ranges'. While it is unset (0), the script reads lines from the file of ranges and saves their data. Once it is set, that reading stops and the script takes the position from the 'id-position' file, checks it against the data in the array and prints the class if the position falls inside a range. I tried to avoid processing the file of ranges more than once and to read it in chunks instead, which made the script more complex.
EDIT: I assume the id field is sorted in both files. Otherwise this script will fail miserably and you will need another approach.
Content of script.awk:
BEGIN {
## Arguments:
## ARGV[0] = awk
## ARGV[1] = <first_input_argument>
## ARGV[2] = <second_input_argument>
## ARGC = 3
f2 = ARGV[ --ARGC ];
all_ranges = 0
## Read first line from file with ranges to get 'class' header.
getline line <f2
split( line, fields )
class_header = fields[2];
}
## Special case for the header.
FNR == 1 {
printf "%s\t%s\n", $0, class_header;
next;
}
## Data.
FNR > 1 {
while ( 1 ) {
if ( ! all_ranges ) {
## Read line from file with range positions.
ret = getline line <f2
## Check error.
if ( ret == -1 ) {
printf "%s\n", "ERROR: " ERRNO
close( f2 );
exit 1;
}
## Check end of file.
if ( ret == 0 ) {
break;
}
## Split line in spaces.
num = split( line, fields )
if ( num != 4 ) {
printf "%s\n", "ERROR: Bad format of file " f2;
exit 2;
}
range_id = fields[1];
if ( $1 == fields[1] ) {
ranges[ fields[3], fields[4] ] = fields[2];
continue;
}
else {
all_ranges = 1
}
}
if ( range_id == $1 ) {
delete ranges;
ranges[ fields[3], fields[4] ] = fields[2];
all_ranges = 0;
continue;
}
for ( range in ranges ) {
split( range, pos, SUBSEP )
if ( $2 >= pos[1] && $2 <= pos[2] ) {
printf "%s\t%s\n", $0, ranges[ range ];
break;
}
}
break;
}
}
END {
for ( range in ranges ) {
split( range, pos, SUBSEP )
if ( $2 >= pos[1] && $2 <= pos[2] ) {
printf "%s\t%s\n", $0, ranges[ range ];
break;
}
}
}
Run it like:
awk -f script.awk file1 file2 | column -t
With following result:
id position class
a1 21 Xfact
a1 39 Xfact
a1 77 xbreak
b1 88 Xbreak
b1 122 Xbreak
c1 22 Xbreak
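If the ranges file is small enough to hold in memory, a shorter approach is possible. This is a sketch and not the script above: it loads every range first, so it does not depend on the ids being sorted. The function name classify and the array names are made up, the header line is hard-coded, and the ranges file is passed first (the opposite order to script.awk). The sample data is the data from the question.

```shell
# classify RANGES POSITIONS: print id, position and the class whose range contains the position
classify() {
    awk '
        BEGIN { print "id", "position", "class" }
        FNR == 1 { next }                 # skip the header line of each file
        NR == FNR {                       # first file (ranges): id class position1 position2
            n = ++cnt[$1]
            cls[$1, n] = $2
            lo[$1, n] = $3
            hi[$1, n] = $4
            next
        }
        {                                 # second file (positions): id position
            for (i = 1; i <= cnt[$1]; i++)
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    print $1, $2, cls[$1, i]
                    break
                }
        }
    ' "$1" "$2"
}

# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'id position' 'a1 21' 'a1 39' 'a1 77' 'b1 88' 'b1 122' 'c1 22' > "$dir/file1"
printf '%s\n' 'id class position1 position2' 'a1 Xfact 1 40' 'a1 Xred 41 66' \
    'a1 xbreak 69 89' 'b1 Xbreak 77 133' 'b1 Xred 140 199' \
    'c1 Xfact 1 15' 'c1 Xbreak 19 35' > "$dir/file2"

classify "$dir/file2" "$dir/file1"
```

Positions that fall in no range are silently skipped; piping the result through column -t aligns the columns as in the answer above.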

calculate the difference from flat file

I have a text file and the last 2 lines look like this...
Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518
Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618
How do I find the difference between the two values of each parameter?
The expected output is:
61 -1 1063 10 0 0 0 0.1
In other words, I would like to subtract the earlier uptime value from the current one, and likewise find the difference for Threads, Questions and so on.
The purpose of this exercise is to watch this file and alert the user when a difference is out of range, e.g. if the slow queries are more than 500 or the "Questions" parameter is too low (<100).
(It is the MySQL status but has nothing to do with it, so mysql tag does not apply)
Just a slight variation on ghostdog74's (original) answer:
tail -2 file | awk ' {
gsub(/[a-zA-Z: ]+/," ")
m=split($0,a," ");
for (i=1;i<=m;i++)
if (NR==1) b[i]=a[i]; else print a[i] - b[i]
} '
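To get the differences on a single line, as in the expected output of the question, the same idea can be written with printf. A minimal sketch; the function name status_diff is made up, and it assumes every line carries the same number of numeric values in the same order:

```shell
# status_diff: for each line after the first, print the change in every number
status_diff() {
    awk '{
        gsub(/[^0-9.]+/, " ")            # blank out everything except digits and dots
        n = split($0, cur, " ")
        if (NR > 1)
            for (i = 1; i <= n; i++)
                printf "%s%s", cur[i] - prev[i], (i < n ? " " : "\n")
        for (i = 1; i <= n; i++)
            prev[i] = cur[i]
    }'
}

printf '%s\n' \
    'Uptime: 822832 Threads: 32 Questions: 13591705 Slow queries: 722 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.518' \
    'Uptime: 822893 Threads: 31 Questions: 13592768 Slow queries: 732 Opens: 81551 Flush tables: 59 Open tables: 64 Queries per second avg: 16.618' |
    status_diff
# 61 -1 1063 10 0 0 0 0.1
```

With the real file, this would be tail -2 file | status_diff.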
Here's one way. tail is used to get the last 2 lines, which is especially useful for efficiency if you have a big file.
tail -2 file | awk '
{
gsub(/[a-zA-Z: ]+/," ")
m=split($0,a," ")
if (f) {
for (i=1;i<=m;i++){
print -(b[i]-a[i])
}
# to check for Questions, slow queries etc
if ( -(b[3]-a[3]) < 100 ){
print "Questions parameter too low"
}else if ( -(b[4]-a[4]) > 500 ){
print "Slow queries more than 500"
}else if ( a[1] - b[1] < 0 ){
print "mysql ...... "
}
exit
}
for(i=1;i<=m;i++ ){ b[i]=a[i] ;f=1 }
} '
output
$ ./shell.sh
61
-1
1063
10
0
0
0
0.1
gawk:
BEGIN {
arr[1] = "0"
}
length(arr) > 1 {
print $2-arr[1], $4-arr[2], $6-arr[3], $9-arr[4], $11-arr[5], $14-arr[6], $17-arr[7], $22-arr[8]
}
{
arr[1] = $2
arr[2] = $4
arr[3] = $6
arr[4] = $9
arr[5] = $11
arr[6] = $14
arr[7] = $17
arr[8] = $22
}