I have a file containing a list of hexadecimal numbers, such as 0x12345678, one per line.
I want to make a calculation on them. For this, I thought of using awk. But while printing a hexadecimal number with awk is easy with the printf function, I haven't found a way to interpret the hexadecimal input other than as text (or as 0, since conversion to integer stops at the x).
awk '{ print $1; }' // 0x12345678
awk '{ printf("%x\n", $1)}' // 0
awk '{ printf("%x\n", $1+1)}' // 1 // DarkDust answer
awk '{ printf("%s: %x\n", $1, $1)}' // 0x12345678: 0
Is it possible to print, e.g. the value +1?
awk '{ printf("%x\n", ??????)}' // 0x12345679
Edit: One liners on other languages welcomed! (if reasonable length ;-) )
In the original nawk and mawk implementations the hexadecimal (and octal) numbers are recognised. gawk (which I guess you are using) has the feature/bug of not doing this. It has a command line switch to get the behaviour you want: --non-decimal-data.
echo 0x12345678 | mawk '{ printf "%s: %x\n", $1, $1 }'
0x12345678: 12345678
echo 0x12345678 | gawk '{ printf "%s: %x\n", $1, $1 }'
0x12345678: 0
echo 0x12345678 | gawk --non-decimal-data '{ printf "%s: %x\n", $1, $1 }'
0x12345678: 12345678
gawk has the strtonum function:
% echo 0x12345678 | gawk '{ printf "%s: %x - %x\n", $1, $1, strtonum($1) }'
0x12345678: 0 - 12345678
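Since strtonum is a gawk extension, one possible fallback for other awks is to parse the hex digits by hand. This is only a minimal sketch, assuming 0x-prefixed, non-negative input; the names hex2num and hex_plus_one are made up for illustration, and values beyond 2^53 will lose precision because awk numbers are doubles:

```shell
# hex_plus_one: read one hex number per line on stdin, print value + 1 in hex.
# Uses only POSIX awk features (no strtonum, no --non-decimal-data).
hex_plus_one() {
    awk '
    # hex2num: convert a string such as "0x1f" to a number; returns -1 on a bad digit.
    function hex2num(s,    i, n, d) {
        s = tolower(s)
        sub(/^0x/, "", s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            if (d < 0) return -1
            n = n * 16 + d
        }
        return n
    }
    { printf "0x%x\n", hex2num($1) + 1 }'
}

echo 0x12345678 | hex_plus_one
```

For the sample input this should print 0x12345679.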
Maybe you don't need awk at all, as string/number conversion is hairy. Bash versions 3 and 4 are very powerful. It is often simpler, clearer and more portable to stay in Bash, and maybe use grep and cut etc.
For example, in Bash hexadecimal numbers are converted naturally:
$ printf "%d" 0xDeadBeef
3735928559
$ x='0xE'; printf "%d %d %d" $x "$x" $((x + 1))
14 14 15
Hope this helps.
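Applied to the original question (a file with one hex number per line), the same shell arithmetic could be wrapped in a loop. A rough sketch, where hex_inc is a hypothetical helper name and the input is assumed to be well-formed 0x-prefixed numbers that fit in the shell's integer range:

```shell
# hex_inc: read 0x-prefixed hex numbers (one per line) on stdin,
# print each value plus one, again in hex.
hex_inc() {
    while IFS= read -r h; do
        [ -n "$h" ] || continue          # skip blank lines
        printf '0x%x\n' "$(( h + 1 ))"   # shell arithmetic understands the 0x prefix
    done
}

printf '0x12345678\n0xff\n' | hex_inc
```

This should print 0x12345679 and 0x100.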
Here is how the different awk variants and flags behave,
using this command:
'BEGIN { print 0xFACECAFEFEED^2, -0xFEEDCAFEFACEBEEFDEADFACEFEED7,
-"0xCAFECAFECAFECAFECAFECAFECAFECAFECAFECAFECAFECAFECAFECAFE" }'
gawk -e (GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1))
76046928626116243263483543552 -82729151009071240233065844435845120 0
gawk -P -e
00 0
-21377898657284658184582485743897013874545437686817998522919218577408
gawk -c -e
00 0 0
gawk -n -e
76046928626116243263483543552 -82729151009071240233065844435845120
-21377898657284658184582485743897013874545437686817998522919218577408
gawk -S -e
76046928626116243263483543552 -82729151009071240233065844435845120 0
gawk -M
76046928626116245157029816169
-82729151009071239007500567260950231 0
gawk -l mpfr
76046928626116243263483543552 -82729151009071240233065844435845120 0
nawk (macos awk version 20200816)
00 0 -2.13778986572846581845824857439e+67
mawk 1.3.4
00 0 -2.13779e+67
mawk2-beta (1.9.9.6)
00 0 0
In fact, if one has a custom awk-script library that works across multiple awk variants but also wants to take their idiosyncrasies into account, one approach would be to use the differences in output here to auto-detect the variant, with relatively few combinations left where a tie-breaker is needed.
*** this is only an extension of my comment following schot's response, strictly for proper formatting purposes.
echo FACEBEACEBEFACEEFFFACEEEFFFACEFACFACEB |
mawk '{ printf("%s\n%.f\n%x\n%.f\n",$0,$0,"0x"$0,"0x"$0) }'
FACEBEACEBEFACEEFFFACEEEFFFACEFACFACEB
0
ffffffff
5593196314036579851314282024549245003233230848
5593196314036579608368524797845507287542639851 #exact
Related
If I do have an input stream of decimal numbers, e.g.
100 2000 599 232
and I pass them to awk, how do I print them in hexadecimal notation?
for example
0x64 0x7D0 0x257 0xE8
starting script ...
echo "100 2000 599 232" | awk '{ print $1 }' #here print in hexa instead of decimal
You can use printf in awk with a format string to convert to hex:
awk '{ printf "%x\n", $1 }'
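To get all the numbers on the line in the 0x... form shown in the question, one could loop over the fields. A small sketch (to_hex is just an illustrative name; %X gives upper-case digits, use %x for lower-case):

```shell
# to_hex: print every whitespace-separated decimal field as 0x-prefixed hex,
# keeping the fields of each input line on one output line.
to_hex() {
    awk '{
        for (i = 1; i <= NF; i++)
            printf "0x%X%s", $i, (i < NF ? " " : "\n")
    }'
}

echo "100 2000 599 232" | to_hex
```

This should print 0x64 0x7D0 0x257 0xE8.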
quick caveat - mawk 1.3.4 has severe limitations when it comes to printing octal and hex codes :
$ gawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
$ nawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
$ mawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0000000000000000
$ mawk2 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
It's not even that large a value (-16777218), and mawk 1.3.4 completely bellyflops. On the flip side, it can directly decipher some hex constants (only gawk, when in neither posix nor traditional mode, can directly decipher octal constants):
$ mawk 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559
$ nawk 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559
$ mawk2 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
0
$ gawk --posix 'BEGIN{ OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559 <==== note the difference - posix mode only can decipher strings
the "+" in front is also necessary cuz gawk will just print
it as a string otherwise.
$ gawk -e 'BEGIN { OFMT="%.f"; print 0xDEADBEEF }'
3735928559 <==== standard mode only can decipher clear text ones
- mawk2 is the only one among those above that
even prints anything out with %p in printf(),
but still erroring out, as such
mawk2: line 1: invalid control character 'p'
in [s]printf format ("0x10f0099da
- both gawk and nawk properly print out %a
I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
The trouble is that instead of getting
HeaderUUIiewConsenFlagPSMessage
I get the match with everything from there to the last pipe of the string HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples, please try the following. Written and tested in GNU awk, this prints only the contents from msgcontent1= up to the first occurrence of |.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy, so .* always finds the longest possible match.
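To see the difference side by side, here is a small sketch that prints the whole matched text for both patterns using only RSTART and RLENGTH (so it does not need gawk's three-argument match). The function names greedy and bounded are made up for this illustration:

```shell
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# .* runs on to the LAST pipe in the line
greedy() {
    awk '{ if (match($0, /msgcontent1=.*[|]/)) print substr($0, RSTART, RLENGTH) }'
}

# [^|]* cannot cross a pipe, so the match stops before the first one
bounded() {
    awk '{ if (match($0, /msgcontent1=[^|]*/)) print substr($0, RSTART, RLENGTH) }'
}

echo "$s" | greedy
echo "$s" | bounded
```

The first call should print msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002| and the second only msgcontent1=HeaderUUIiewConsenFlagPSMessage.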
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage
The following two runs are different. How to make the first run the same as the second run (I still want print without any explicit arguments)? Is there a way to control the number of digits in $1 = 1/3?
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print }'
0.333333
$ gawk -v OFMT='%.20g' -e 'BEGIN { print 1/3}'
0.33333333333333331483
EDIT: The following comparison is also unexpected. Ideally, if there is just one field, print $1 and print should be just the same. I think it could be considered as a bug?
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print $1}'
0.33333333333333331483
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print}'
0.333333
There is a subtlety here. There are two variables, OFMT and CONVFMT. The variable OFMT is used to control how numbers are converted to strings in the print statement while the variable CONVFMT is used to define how numbers are converted to strings in general (outside of the print statement):
Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behaviour.
source: GNU awk manual
More detailed information about this reasoning can be found in the rationale section of the awk POSIX standard.
numeric value in print statement:
$ awk 'BEGIN{print 1/3}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; print 1/3 }'
0.33333333333333331483
$ awk 'BEGIN{CONVFMT="%.20g"; print 1/3 }'
0.333333
variable with a numeric value in print statement:
$ awk 'BEGIN{a=1/3; print a}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; a=1/3; print a }'
0.33333333333333331483
$ awk 'BEGIN{CONVFMT="%.20g"; a=1/3; print a }'
0.333333
variable with a numeric value converted to string in print statement:
$ awk 'BEGIN{a=1/3; a=a""; print a}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; a=1/3; a=a""; print a }'
0.333333
$ awk 'BEGIN{CONVFMT="%.20g"; a=1/3; a=a""; print a }'
0.33333333333333331483
I am not sure if it's a bug, but try setting a variable rather than the first field
gawk -v OFMT='%.20g' -e 'BEGIN { a = 1/3; print a}'
0.33333333333333331483
My GAWK version on RHEL is:
gawk-3.1.5-15.el5
I want to print a line if its first field consists entirely of digits (no special characters, not even a space)
Example:
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/) print $0}'
Output:
Nothing
Expected Output:
123456789012345,3
What is going wrong here ? Does my AWK version not understand the GNU character classes ? Kindly help
To match multiple digits with the [[:digit:]] character class, add a +, which means match one or more digits in $1.
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/) print $0}'
123456789012345,3
which satisfies your requirement.
A more idiomatic way (as suggested in the comments) would be to drop the explicit print and use the match directly as the pattern, so that matching lines are printed by default,
echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
123456789012345,3
Some more examples which demonstrate the same,
echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
(and)
echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
do NOT produce any output, as per the requirement.
Another POSIX compliant way to do strict length checking of digits can be achieved with something like below, where {3} denotes the match length.
echo "123,3" | awk --posix -F, '$1 ~ /^[0-9]{3}$/'
123,3
(and)
echo "12,3" | awk --posix -F, '$1 ~ /^[0-9]{3}$/'
does not produce any output.
If you are using a relatively recent version of the bash shell, it supports a native regex operator, =~, which accepts POSIX character classes as above, something like
#!/bin/bash
while IFS=',' read -r row1 row2
do
[[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
done < file
For an input file say file
$ cat file
122,12
a1,22
aa,12
The script produces,
$ bash script.sh
122,12
Although this works, bash regex matching can be slow. A relatively straightforward alternative using string manipulation would be something like
while IFS=',' read -r row1 row2
do
[[ -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
done < file
The "${row1//[0-9]/}" strips all the digits from the row and the condition becomes true only if there are no other characters left in the variable.
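If the script has to run under a plain POSIX sh as well, a case pattern is another option. This swaps in a different technique from the bash-only ${var//...} substitution above, and unlike that test it also rejects an empty field. A minimal sketch; is_digits is a made-up helper name:

```shell
# is_digits: succeed only if the argument is non-empty and contains nothing but 0-9
is_digits() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
        *)           return 0 ;;
    esac
}

is_digits 123456789012345 && echo "all digits"
```

Usage in the loop would then be: is_digits "$row1" && printf "%s,%s\n" "$row1" "$row2"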
Here you are printing every line that matches a pattern. This is exactly the purpose of grep. Since @Inian brilliantly told you what was wrong with your code, let me propose an alternative grep-based answer that does exactly the same as the awk command (albeit much faster):
grep -E '^[[:digit:]]+,'
Could you please try the following and let me know if this helps.
echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]*)$/) print $0}'
EDIT: The code above can also be shortened as follows.
echo "123456789012345,3" | awk -F, '($1 ~ /^[[:digit:]]*$/)'
Suppose I have a file like this:
$ cat a
hello this is a sentence
and this is another one
And I want to print the first two columns with some padding in between them. As this padding may change, I can for example use 7:
$ awk '{printf "%7-s%s\n", $1, $2}' a
hello  this
and    this
Or 17:
$ awk '{printf "%17-s%s\n", $1, $2}' a
hello            this
and              this
Or 25, or... you see the point: the number may vary.
Then a question popped: is it possible to assign a variable to this N, instead of hardcoding the integer in the %N-s format?
I tried these things without success:
$ awk '{n=7; printf "%{n}-s%s\n", $1, $2}' a
%{n}-shello
%{n}-sand
$ awk '{n=7; printf "%n-s%s\n", $1, $2}' a
%n-shello
%n-sand
Ideally I would like to know if it is possible to do this. If it is not, what would be the best workaround?
If you use * in your format string, it gets a number from the arguments
awk '{printf "%*-s%s\n", 17, $1, $2}' file
hello            this
and              this
awk '{printf "%*-s%s\n", 7, $1, $2}' file
hello  this
and    this
As read in The GNU Awk User’s Guide #5.5.3 Modifiers for printf Formats:
The C library printf’s dynamic width and prec capability (for example,
"%*.*s") is supported. Instead of supplying explicit width and/or prec
values in the format string, they are passed in the argument list. For
example:
w = 5
p = 3
s = "abcdefg"
printf "%*.*s\n", w, p, s
is exactly equivalent to:
s = "abcdefg"
printf "%5.3s\n", s
does this count?
idea is building the "dynamic" fmt, used for printf.
kent$ awk '{n=7;fmt="%"n"-s%s\n"; printf fmt, $1, $2}' f
hello  this
and    this
Using simple string concatenation.
Here "%", n and "-s%s\n" are concatenated into a single string for the format. Based on the example below, the format string produced is %7-s%s\n.
awk -v n=7 '{ printf "%" n "-s%s\n", $1, $2}' file
awk '{ n = 7; printf "%" n "-s%s\n", $1, $2}' file
Output:
hello  this
and    this
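The same concatenation idea can be wrapped so the width comes from the shell. A small sketch, where pad_cols is a hypothetical function name; it writes the left-justify flag before the width ("%-7s"), which is the conventional ordering for printf formats:

```shell
# pad_cols WIDTH: print the first two fields of each line,
# left-justifying the first one in a column WIDTH characters wide.
pad_cols() {
    awk -v n="$1" '{ printf "%-" n "s%s\n", $1, $2 }'
}

printf 'hello this is a sentence\nand this is another one\n' | pad_cols 7
```

With a width of 7 the two output lines should be "hello  this" and "and    this".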
you can use eval (maybe not the most beautiful with all the escape characters, but it works)
i=15
eval "awk '{printf \"%$i-s%s\\n\", \$1, \$2}' a"
output:
hello          this
and            this