Parse from String and convert to float, integer (Raku)

FAQ: In Raku, how do I parse a String and get a Number? For example:
xxx("42"); # 42 (Int)
xxx("0x42"); # 66 (Int)
xxx("42.123456789123456789"); # 42.123456789123456789 (Rat)
xxx("42.4e2"); # 4240 (Rat)
xxx("42.4e-2"); # 0.424 (Rat)

Just use the prefix +:
say +"42"; # 42 (Int)
say +"0x42"; # 66 (Int)
say +"42.123456789123456789"; # 42.123456789123456789 (Rat)
say +"42.4e2"; # 4240 (Num)
say +"42.4e-2"; # 0.424 (Num)
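For readers coming from other languages, here is a rough Python analogue of the same conversions. It is only a sketch: the helper name to_number is hypothetical, and it follows Python's own rules (int(s, 0) for base prefixes such as 0x, fractions.Fraction for plain decimals, float for exponent notation) rather than Raku's numeric grammar, so edge cases such as leading zeros may not behave like Raku.

```python
from fractions import Fraction


def to_number(text: str):
    """Hypothetical helper: convert a numeric string to int, Fraction or float."""
    s = text.strip()
    try:
        return int(s, 0)        # "42" -> 42, "0x42" -> 66 (base prefix honoured)
    except ValueError:
        pass
    if "e" not in s.lower():
        return Fraction(s)      # plain decimal -> exact rational, no rounding
    return float(s)             # exponent notation -> binary floating point


for sample in ["42", "0x42", "42.123456789123456789", "42.4e2", "42.4e-2"]:
    value = to_number(sample)
    print(sample, value, type(value).__name__)
```

Keeping plain decimals as Fraction avoids the silent rounding that float("42.123456789123456789") would introduce.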
Info
The val routine on a Str does exactly what you (I) want.
Beware that it returns an allomorph object (IntStr, RatStr, NumStr, ...). Use .Numeric or just the + prefix to convert it to a plain number.
Links:
Learning Raku: Number, Strings, and NumberString Allomorphs
Same question in Python, Perl
Rosetta Code: Determine if a string is numeric
Edited thanks to @Holli's comment.

my $defn-str = '9/5 * Kelvin';    # example input
my regex number {
    \S+                           # grab chars
    <?{ defined +"$/" }>          # assertion that coerces via '+' to Real
}
# strip a leading factor, e.g. 9/5 * Kelvin
if ( $defn-str ~~ s/( <number>? ) \s* \*? \s* ( .* )/$1/ ) {
    my $factor = $0;              # the numeric factor, e.g. 9/5
    #...
}
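The same "optional leading numeric factor" idea, sketched in Python for comparison. Assumptions: the name strip_factor is hypothetical, fractions.Fraction stands in for Raku's + coercion (it accepts forms like "9/5" and "1.8"), and unlike the Raku regex it does not backtrack to a shorter numeric prefix, so the factor must be followed by whitespace.

```python
import re
from fractions import Fraction


def strip_factor(defn: str):
    """Hypothetical helper: split '9/5 * Kelvin' into (Fraction(9, 5), 'Kelvin')."""
    m = re.match(r"(\S+)\s*\*?\s*(.*)", defn)
    if m:
        try:
            return Fraction(m.group(1)), m.group(2)   # first token is numeric
        except ValueError:
            pass                                     # first token is not a number
    return None, defn                                # no factor: keep whole string


print(strip_factor("9/5 * Kelvin"))
print(strip_factor("Kelvin"))
```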

Related

awk and gawk decimal integer comparison fails sometimes

I've written an AWK script to scan a check image log file (the ASCII characters sent to a laser or dot-matrix printer to print on preprinted check forms), which is kept with tee /tmp/$$.print for every check run. The goal is to add up the invoice and discount values on the check tab and compare the total with the dollars-and-cents check amount printed on the check body. The script works as desired, but it fails in some cases where I can see no reason for the failure. Out of 750 check images processed, 37 checks are unexpectedly included in the list of checks whose check-tab invoice sum does not equal the check amount, even though the two values match; only nine checks have actual differences:
37 /tmp/eq_check
Check # 62110 04/07/2022 Sum tab 2240.45 Check amount 2240.45
Check # 62131 04/07/2022 Sum tab 2099.22 Check amount 2099.22
Check # 62134 04/07/2022 Sum tab 5124.40 Check amount 5124.40
Check # 63143 04/14/2022 Sum tab 536.58 Check amount 536.58
Check # 63148 04/14/2022 Sum tab 2354.18 Check amount 2354.18
Check # 63155 04/28/2022 Sum tab 1276.55 Check amount 1276.55
...
Check # 75161 12/09/2022 Sum tab 614.41 Check amount 614.41
Check # 75172 12/09/2022 Sum tab 17445.24 Check amount 17445.24
Check # 75176 12/09/2022 Sum tab 1194.85 Check amount 1194.85
Check # 75179 12/09/2022 Sum tab 264.10 Check amount 264.10
9 /tmp/neq_check
Check # 62122 04/07/2022 Sum tab 366.24 Check amount 150.00
Check # 63199 05/10/2022 Sum tab 22310.65 Check amount 21274.66
Check # 63268 06/09/2022 Sum tab 36086.37 Check amount 35918.21
Check # 63310 06/30/2022 Sum tab 16841.02 Check amount 14652.00
Check # 63429 09/07/2022 Sum tab 5955.87 Check amount 5707.53
Check # 63449 09/12/2022 Sum tab 947268177.91 Check amount 28064.91
Check # 75010 09/26/2022 Sum tab 562.82 Check amount 314.48
Check # 75054 10/21/2022 Sum tab 10052.77 Check amount 9804.43
Check # 75113 11/10/2022 Sum tab 19821.61 Check amount 7381.69
After I composed this post, it occurred to me to try changing the inequality test to
if ( ( tab_total - pcheck_amt) != 0 ) to see if that works.
Nope: same 37 false positives and 9 true positives.
Here is a test check:
# cat check63282
99820989 20220616 326.10
Discount -3.26
63282 06/21/2022 MU
$322.84
Three Hundred Twenty Two Dollars and 84 Cents********************************
#
Here is the code that is failing:
# upper tab example
# 947897461 20221024 76.00 947992349 20221031 1161.30
# Discount -1.52 Discount -23.23
# 947897457. 20221024 6754.59 94793360 20221029 5731.54
# Discount -135.09 Discount -114.63
# SHIP & DEBIT 20221027 -25,866.38 947973361 20221029 1,386.00
# 947945737 20221027 28,325.70 Discount -27.72
# Discount -566.51 947973365 20221029 312.00
# 947945740 20221027 404.00 Discount -6.24
#
#
#
# Check body example
# 63449 09/12/2022 BM
#
#
#
# $28,064.91
# Twenty Eight Thousand Sixty Four Dollars and 91 Cents***********************
#
# index(s,t)
# Returns the position in string s where string t first
# occurs, or 0 if it does not occur at all.
BEGIN { tab_total = 0 }
{
    gsub(/,/, "")        # strip NN,NNN.NN -> NNNNN.NN
    gsub(/\$/, "")       # strip $NNN.NN -> NNN.NN
    gsub(/^M/, "")       # strip DOS line ending
    # Find a line with decimal point
    a = index($0, ".")
    if ( a > 0 && a < 50 ) {
        for ( i = 1; i <= NF; i++ ) {
            b = index($i, ".")
            if ( b > 0 ) {
                tab_total += ($i * 100)
            }
        }
    }
    # Find the date line
    c = index($0, "/")
    if ( c > 50 ) {
        check_num = $1
        check_date = $2
        if ( NF > 2 ) who_to = $3
    }
    # Find the printed check amount
    if ( a > 50 ) {
        pcheck_amt = ($0 + 0) * 100
    }
    if ( $0 ~ /Dollars and/ ) {
        # found check body.
        gsub(/\*\*$/, "", $0)
        if ( pcheck_amt != tab_total ) {
            printf "\n Check # %6d %s Sum tab %10.2f Check amount %7.2f\n %s\n",
                check_num, check_date, tab_total/100, pcheck_amt/100, $0
        }
        tab_total = 0
    }
}
With debugging added:
BEGIN { tab_total = pcheck_amt = 0 }
{
    gsub(/,/, "")
    gsub(/\$/, "")
    gsub(/^M/, "", $0)
    # Find a line with decimal point
    a = index($0, ".")
    c = index($0, "/")
    if ( a > 0 && a < 50 ) {
        #print "a = ",a," ", $0
        for ( i = 1; i <= NF; i++ ) {
            b = index($i, ".")
            print "b = ",b," ", $0
            if ( b > 0 ) {
                print "before Tab total= ", tab_total
                tab_total += ($i * 100)
                print "after Tab total= ", tab_total
            }
        }
    }
    # Find the date line
    if ( c > 50 ) {
        check_num = $1
        check_date = $2
        if ( NF > 2 ) who_to = $3
    }
    # Find the printed check amount
    if ( a > 50 ) {
        print "a= ",a," ",$1
        pcheck_amt = ($1 + 0) * 100
        print "$1 = ", $1," *100 = ", (($1 + 0) * 100 )
    }
    #print $0
    if ( $0 ~ /Dollars and/ ) {
        # found check body.
        gsub(/\*\*$/, "", $0)
        printf "RAW tab_total %d format %%d\n", tab_total
        printf "RAW pcheck_amt %d format %%d\n", pcheck_amt
        printf "RAW pcheck_amt %f format %%f\n", pcheck_amt
        printf "RAW pcheck_amt/100 %d format %%d\n", pcheck_amt/100
        printf "tab_total - pcheck_amt %f\n", tab_total - pcheck_amt
        printf "pcheck_amt - tab_total %f\n", pcheck_amt - tab_total
        if (( tab_total - pcheck_amt) == 0 ) print "true"
        if ( pcheck_amt != tab_total ) {
            printf "\n Check # %6d %s Sum tab %10.2f Check amount %7.2f\n %s\n",
                check_num, check_date, tab_total/100, pcheck_amt/100, $0
        }
        tab_total = 0
    }
}
And here is the output for the check above:
# cat check63282 | gawk -f bbprint_scan.awk
b = 0 99820989 20220616 326.10
b = 0 99820989 20220616 326.10
b = 4 99820989 20220616 326.10
before Tab total= 0
after Tab total= 32610
b = 0 Discount -3.26
b = 3 Discount -3.26
before Tab total= 32610
after Tab total= 32284
a= 74 322.84
$1 = 322.84 *100 = 32284
RAW tab_total 32284 format %d
RAW pcheck_amt 32283 format %d
RAW pcheck_amt 32284.000000 format %f
RAW pcheck_amt/100 322 format %d
tab_total - pcheck_amt 0.000000 format %f
pcheck_amt - tab_total -0.000000 format %f
tab_total - pcheck_amt 7.275958e-12 format %e
pcheck_amt - tab_total -7.275958e-12 format %e
Check # 63282 06/21/2022 Sum tab 322.84 Check amount 322.84
Three Hundred Twenty Two Dollars and 84 Cents******************************
#
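The debug output is consistent with ordinary binary floating-point rounding: awk (gawk included, by default) stores numbers as double-precision floats, and 326.10, -3.26 and 322.84 have no exact binary representation, so the scaled values can differ in the last bit (the 7.275958e-12 above), and %d truncates 32283.999... to 32283. The sketch below uses Python rather than awk only because its floats are the same IEEE 754 doubles and it is easy to test; it replays the arithmetic for check 63282 and shows the usual fix of rounding each scaled amount to whole cents before comparing. In awk the same idea could be written with sprintf("%.0f", $i * 100).

```python
# Replays the awk arithmetic for check 63282 using IEEE 754 doubles.
tab_fields = ["326.10", "-3.26"]   # fields containing "." on the tab lines
printed_amount = "322.84"          # amount printed on the check body

tab_total = 0.0
for field in tab_fields:
    tab_total += float(field) * 100          # same as: tab_total += ($i * 100)
pcheck_amt = float(printed_amount) * 100     # same as: pcheck_amt = ($1+0) * 100

print(tab_total == pcheck_amt)               # False
print(tab_total - pcheck_amt)                # 7.275957614183426e-12 (exactly 2**-37)
print(int(pcheck_amt))                       # 32283, truncated like printf "%d"

# Fix: round each scaled amount to an integer number of cents, then compare.
tab_cents = sum(round(float(field) * 100) for field in tab_fields)
pcheck_cents = round(float(printed_amount) * 100)
print(tab_cents == pcheck_cents)             # True
```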
Side note:
gsub(/^M/, "") # strip DOS line ending
^M is how some apps display the invisible carriage-return byte \r, but when you type those two characters directly into a regex, ^ is the start-of-line anchor and M is a literal letter, so what it actually does is strip a capital letter M whenever it is the first character of the line.
If you want to deal with line endings, do this instead:
BEGIN { RS = "\r?\n" } # preferred; needs an awk that accepts a regex RS, such as gawk
or
sub("\15$", "") # \15 is the octal escape for \r
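A quick way to convince yourself of this, sketched in Python, whose re module treats ^ and a literal M the same way an awk regex does here. The sample line is made up for illustration.

```python
import re

line = "Mailing label printed\r"   # hypothetical input line with a DOS (CRLF) ending

wrong = re.sub(r"^M", "", line)    # ^ anchors at line start, M is a literal letter
right = re.sub(r"\r$", "", line)   # a real carriage return anchored at the end

print(repr(wrong))   # the leading "M" is gone, the "\r" is still there
print(repr(right))   # the "\r" is gone, the text is intact
```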

Concatenate columns and add digits with awk

I have a csv file:
number1;number2;min_length;max_length
"40";"1801";8;8
"40";"182";8;8
"42";"32";6;8
"42";"4";6;6
"43";"691";9;9
I want the output to be:
4018010000;4018019999
4018200000;4018299999
42320000;42329999
423200000;423299999
4232000000;4232999999
42400000;42499999
43691000000;43691999999
So the new file will consist of:
column_1 = old_column_1 and old_column_2 concatenated, followed by a number of "0"s equal to (old_column_3 - length of old_column_2)
column_2 = old_column_1 and old_column_2 concatenated, followed by a number of "9"s equal to (old_column_3 - length of old_column_2)
That is when min_length = max_length. When min_length is not equal to max_length, I need to take all the possible lengths into account: for the line "42";"32";6;8 the lengths are 6, 7 and 8, giving one output line each.
Also, I need to delete the quotation marks everywhere.
I tried paste and cut like this:
paste -d ";" <(cut -f1,2 -d ";" < file1) > file2
to concatenate the first 2 columns, but I think it's easier with awk. However, I can't figure out how to do it. Any help is appreciated. Thanks!
Edit: added column 4 (max_length) to the input.
You may use this awk:
awk 'function padstr(ch, len,    s) {
         # build a string of len copies of ch
         s = sprintf("%*s", len, "")
         gsub(/ /, ch, s)
         return s
     }
     BEGIN {
         FS = OFS = ";"
     }
     {
         gsub(/"/, "")
         # one output line per length from min_length ($3) to max_length ($4)
         for (i = 0; i <= ($4 - $3); i++) {
             d = $3 - length($2) + i
             print $1 $2 padstr("0", d), $1 $2 padstr("9", d)
         }
     }' file
4018010000;4018019999
4018200000;4018299999
42320000;42329999
423200000;423299999
4232000000;4232999999
42400000;42499999
43691000000;43691999999
With awk:
awk '
BEGIN{FS = OFS = ";"} # set field separator and output field separator to be ";"
{
    $0 = gensub("\"", "", "g"); # Drop double quotes
    s = $1$2;                   # The range header number
    l = $3-length($2);          # Number of zeros or 9s to be appended
    l = 10^l;                   # Get 10 raised to that number
    print s*l, (s+1)*l-1;       # Appending n zeros is multiplication by 10^n
                                # Appending n nines gives s*10^n + (10^n - 1) = (s+1)*10^n - 1
}' input.txt
Explanation inline as comments. Note that this version only reads min_length ($3), so it prints one line per input record, and gensub() is specific to GNU awk.
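For comparison, here is the same expansion, including the min_length..max_length range from column 4, as a short Python sketch; the function name expand_ranges is hypothetical and the input is the sample from the question.

```python
def expand_ranges(lines):
    """Turn 'number1;number2;min_length;max_length' rows into 'low;high' rows."""
    out = []
    for line in lines:
        n1, n2, min_len, max_len = line.replace('"', "").split(";")
        prefix = n1 + n2
        for length in range(int(min_len), int(max_len) + 1):
            pad = length - len(n2)    # how many digits to append after number2
            out.append(f"{prefix}{'0' * pad};{prefix}{'9' * pad}")
    return out


sample = ['"40";"1801";8;8', '"40";"182";8;8', '"42";"32";6;8',
          '"42";"4";6;6', '"43";"691";9;9']
for row in expand_ranges(sample):
    print(row)
```

Building the digits as strings sidesteps any question of number size; the arithmetic form s*10^n and (s+1)*10^n - 1 used in the second answer yields the same values.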

How to use a variable in the format specifier statement?

I can use:
write (*, FMT = "(/, X, 17('-'), /, 2X, A, /, X, 17('-'))") "My Program Name"
to display the following lines on the console window:
-----------------
My Program Name
-----------------
Now, I want to show a pre-defined character instead of - in the above format. I tried this code with no success:
character, parameter :: Chr = Achar(6)
write (*, FMT = "(/, X, 17(<Chr>), /, 2X, A, /, X, 17(<Chr>))") "My Program Name"
Of course, there are other ways to display what I am trying to show by means of a variable in the format specifier statement. For instance:
character, parameter :: Chr = Achar(6)
integer :: i, iMax = 17
write (*, FMT = "(/, X, <iMax>A1, /, 2X, A, /, X, <iMax>A1)") (Chr, i = 1, iMax), &
"My Program Name", &
(Chr, i = 1, iMax)
However, I would like to know if there is any way to use a variable or invoke a function in the format specifier statement.
The code you are trying to use (<>) is not standard Fortran. It is an extension (variable format expressions) accepted by some compilers, such as Intel Fortran. Just build the format as a character string; note that the character still needs its quotes inside the format:
"(/, X, 17('" // Chr // "'), /, 2X, A, /, X, 17('" // Chr // "'))"
For the numeric case you have to prepare a string with the value first:
write(chMax, *) iMax
"(/, X, " // chMax // "A1, /, 2X, A, /, X, " // chMax // "A1)"
or you can use a conversion function, if you have one:
"(/, X, " // itoa(iMax) // "A1, /, 2X, A, /, X, " // itoa(iMax) // "A1)"
but it may still be preferable to call it once beforehand, to avoid multiple calls.
The function can look like:
function itoa(i) result(res)
    character(:), allocatable :: res
    integer, intent(in) :: i
    character(range(i)+2) :: tmp   ! room for every digit plus a minus sign
    write(tmp, '(i0)') i
    res = trim(tmp)
end function
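The underlying idea, assembling the format specification as an ordinary string at run time, carries over to other languages. As a rough Python illustration (the function banner and its defaults are hypothetical), the rule line is built from a variable character and count, mirroring (/, X, 17('-'), /, 2X, A, /, X, 17('-')):

```python
def banner(title: str, ch: str = "-", count: int = 17) -> str:
    """Hypothetical helper mimicking (/, X, count(ch), /, 2X, A, /, X, count(ch))."""
    rule = " " + ch * count                       # X, count('ch')
    fmt = "\n" + rule + "\n  {}\n" + rule         # /, rule, /, 2X, A, /, rule
    return fmt.format(title)                      # ch must not be "{" or "}"


print(banner("My Program Name"))
print(banner("My Program Name", ch="*", count=20))
```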

map integer values to chars in array

I've got a method that generates random strings:
def generate_letters(length)
  chars = 'ABCDEFGHJKLMNOPQRSTUVWXYZ'
  letters = ''
  length.times { |i| letters << chars[rand(chars.length)] }
  letters
end
I want to map the generated strings to numbers, with A = 1, B = 2, C = 3, and so on.
For example, if I generate ACB, it should equal 132. Any suggestions?
You can use this to concatenate the values (it maps A = 10, B = 11, ..., so every letter becomes exactly two digits):
s = 'ACB'
puts s.chars.map{ |c| c.ord - 'A'.ord + 10 }.join.to_i
# => 101211
and to sum them instead, use the Enumerable#inject method (see the docs, which have some nice examples):
s.chars.inject(0) { |r, c| r + (c.ord - 'A'.ord + 10) } # => 33
or Enumerable#sum, which is built into Ruby 2.4+ (older Rubies get it from Rails through ActiveSupport):
s.chars.sum { |c| c.ord - 'A'.ord + 10 } # => 33
How would you deal with the ambiguity for letters whose value is 10 or more (J onward)?
For example, how would you tell BKC = 2113 apart from BAAC = 2113?
Disregarding this problem, you can do this:
def string_to_funny_number(str)
  number = ''
  str.each_byte { |char_value| number << (1 + char_value - 'A'.ord).to_s }
  return number.to_i
end
This function will generate a correct int by concatenating each letter value (A=1, B=2, ...).
Beware that this function doesn't sanitize input, as I am assuming you are using it with output from the other function.
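The ambiguity raised above disappears when every letter maps to the same number of digits, which is what the + 10 offset in the first answer achieves (A = 10 ... Z = 35, always two digits). A Python sketch of both encodings, with a decoder for the fixed-width one; the function names are hypothetical and input is assumed to be uppercase A-Z.

```python
def variable_width(s: str) -> str:
    """A=1, B=2, ...: short, but 'BKC' and 'BAAC' collide."""
    return "".join(str(ord(c) - ord("A") + 1) for c in s)


def fixed_width(s: str) -> str:
    """A=10, B=11, ..., Z=35: every letter is exactly two digits."""
    return "".join(str(ord(c) - ord("A") + 10) for c in s)


def decode_fixed(digits: str) -> str:
    """Invert fixed_width by reading the digits two at a time."""
    return "".join(chr(int(digits[i:i + 2]) - 10 + ord("A"))
                   for i in range(0, len(digits), 2))


print(variable_width("ACB"))                                 # 132
print(variable_width("BKC"), variable_width("BAAC"))         # 2113 2113
print(fixed_width("ACB"), decode_fixed(fixed_width("ACB")))  # 101211 ACB
```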

How to format a number with padding in Erlang

I need to pad the output of an integer to a given length.
For example, with a length of 4 digits, the output of the integer 4 is "0004" instead of "4". How can I do this in Erlang?
Adding a bit of explanation to Zed's answer:
Erlang Format specification is: ~F.P.PadModC.
"~4..0B~n" translates to:
~F. = ~4. (Field width of 4)
P. = . (no Precision specified)
Pad = 0 (Pad with zeroes)
Mod = (no control sequence Modifier specified)
C = B (Control sequence B = integer in default base 10)
and ~n is new line.
io:format("~4..0B~n", [Num]).
string:right(integer_to_list(4), 4, $0).
The problem with io:format is that if your integer doesn't fit, you get asterisks:
> io:format("~4..0B~n", [1234]).
1234
> io:format("~4..0B~n", [12345]).
****
The problem with string:right is that it throws away the characters that don't fit:
> string:right(integer_to_list(1234), 4, $0).
"1234"
> string:right(integer_to_list(12345), 4, $0).
"2345"
I haven't found a library module that behaves as I would expect (i.e. print my number even if it doesn't fit into the padding), so I wrote my own formatting function:
%%------------------------------------------------------------------------------
%% @doc Format an integer with a padding of zeroes
%% @end
%%------------------------------------------------------------------------------
-spec format_with_padding(Number :: integer(),
                          Padding :: integer()) -> iodata().
format_with_padding(Number, Padding) when Number < 0 ->
    [$- | format_with_padding(-Number, Padding - 1)];
format_with_padding(Number, Padding) ->
    NumberStr = integer_to_list(Number),
    ZeroesNeeded = max(Padding - length(NumberStr), 0),
    [lists:duplicate(ZeroesNeeded, $0), NumberStr].
(You can use iolist_to_binary/1 to convert the result to binary, or you can use lists:flatten(io_lib:format("~s", [Result])) to convert it to a list.)
Eshell V12.0.3 (abort with ^G)
1> F = fun(Max, I)-> case Max - length(integer_to_list(I)) of X when X > 0 -> string:chars($0, X) ++ integer_to_list(I); _ -> I end end.
#Fun<erl_eval.43.40011524>
2> F(10, 22).
"0000000022"
3> F(3, 22345).
22345
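For comparison with the Erlang options above: Python's str.zfill pads with zeros but, like format_with_padding, never truncates a number that is wider than the field, and it keeps a leading minus sign in front of the zeros. A minimal sketch (pad_int is a hypothetical name):

```python
def pad_int(number: int, width: int) -> str:
    """Zero-pad number to at least width characters without truncating it."""
    return str(number).zfill(width)


print(pad_int(4, 4))        # 0004
print(pad_int(12345, 4))    # 12345 (too wide: kept whole, no asterisks)
print(pad_int(-5, 4))       # -005
print(f"{4:04d}")           # 0004, format-spec counterpart of "~4..0B"
```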