Raku operator for 2's complement arithmetic?

I sometimes use this:
$ perl -e "printf \"%d\", ((~18446744073709551592)+1)"
24
I can't seem to do it with Raku. The best I could get is:
$ raku -e "say +^18446744073709551592"
-18446744073709551593
So: how can I make Raku give me the same answer as Perl?

Gotta go with (my variant¹ of) Liz's custom op (in her comment below).
sub prefix:<²^>(uint $a) { (+^ $a) + 1 }
say ²^ 18446744073709551592; # 24
My original "semi-educated wild guess"² that turned out to be acceptable to @zentrunix and the basis for Liz's op:
say (+^ my uint $ = 18446744073709551592) + 1; # 24
\o/ It works!³
Footnotes
¹ I flipped the two character op because I wanted to follow the +^ form, have it sub-vocalize as "two's complement", and avoid it looking like ^2.
² One line of thinking was about the particular integer. I saw that 18446744073709551592 is close to 2**64. Another was that integers are limited precision in Perl unless you do something to make them otherwise, whereas in Raku they are arbitrary precision unless you do something to make them otherwise. A third line of thinking came from reading the doc for prefix +^ which says "converts the number to binary using as many bytes as needed" which I interpreted as meaning that the representation is somehow important. Hmm. What if I try an int variable? Overflow. (Of course.) uint? Bingo.
³ I've no idea if this solution is right for the wrong reasons. Or even worse. One thing that's concerning is that uint in Raku is defined to correspond to the largest native unsigned integer size supported by the Raku compiler used to compile the Raku code. (IIRC.) In practice today this means Rakudo and whatever underlying platform is being targeted, and I think that almost certainly means C's uint64_t in almost all cases. I imagine perl has some similar platform-dependent definition. So my solution, if it is a reasonable one, is presumably only portable to the degree that the Raku compiler (which in practice today means Rakudo) agrees with the perl binary (which in practice today means P5P's perl) when run on some platform. See also @p6steve's comment below.
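For comparison, the same fixed-width idea can be sketched in Python, whose integers are arbitrary precision like Raku's Int. This is only an illustration, assuming a 64-bit width is what is wanted; the names MASK64 and twos_complement_64 are made up for the example and are not part of any library.

```python
# Assumption: we want the result as an unsigned 64-bit value.
MASK64 = (1 << 64) - 1

def twos_complement_64(x):
    """Negate x modulo 2**64: flip all bits, add one, keep the low 64 bits."""
    return (~x + 1) & MASK64

x = 18446744073709551592

# Without a fixed width, bitwise NOT of a positive integer is just -x - 1.
print(~x)                     # -18446744073709551593

# Masking to 64 bits gives the value a 64-bit unsigned register would hold.
print(twos_complement_64(x))  # 24
```

The unmasked result matches what +^ prints on an arbitrary-precision Int, and the masked result matches the uint variant above.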

'Long-hand' answer:
raku -e 'put ( (18446744073709551592.base(2) - 0b1).comb.map({!$_.Int+0}).join.parse-base(2));'
OR
raku -e 'say 18446744073709551592.base(2).comb.map({!$_.Int+0}).join.parse-base(2) + 1;'
Sample Output: 24
The answers above (should?) implement "Two's-Complement" encoding directly. Neither uses Raku's +^ twos-complement operator. The first one subtracts one from the binary representation, then inverts. The second one inverts first, then adds one. Neither answer feels truly correct, yet the same answer as Perl5 is obtained (24).
Looking at the Raku Docs page, one would conclude that the "twos-complement" of a positive number would be negative, hence it's not clear what the Perl (and now Raku) answers represent. Hopefully the foregoing is somewhat useful.
https://docs.raku.org/routine/+$CIRCUMFLEX_ACCENT
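A rough Python rendering of the second long-hand version (invert the bit string, then add one) may help show what the 24 represents. It is a sketch only; negate_via_bit_string is a hypothetical helper name. The point it illustrates is that the result depends on how many bits the string happens to have: for an n-bit string the result is 2**n - x.

```python
def negate_via_bit_string(x):
    """Flip every character of x's binary string, then add one."""
    bits = format(x, "b")  # no leading zeros, so the width is x.bit_length()
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    return int(flipped, 2) + 1

x = 18446744073709551592
print(x.bit_length())            # 64
print(negate_via_bit_string(x))  # 24, i.e. 2**64 - x

# A small number uses fewer bits, so the "complement" is taken in that width:
print(negate_via_bit_string(5))  # 3, i.e. 2**3 - 5
```

So the long-hand versions agree with Perl here because this particular number needs exactly 64 bits.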

Related

Formatting in Raku

I have written a function that outputs a double, up to 25 decimal
places. I am trying to print it as a formatted output from Raku.
However, the output is incorrect and truncated.
See MWE:
my $var = 0.8144262510988963255087469;
say sprintf("The variable value is: %.25f", $var)
The above code gives The variable value is: 0.8144262510988963000000000 which is not what is expected.
Also, this seems weird:
my $var = 0.8144262510988963255087469;
say $var.Str.chars; # 29 wrong, expected 27
I tested the same in C:
#include <stdio.h>
int main() {
double var = 0.8144262510988963255087469;
printf("The variable value is: %.25lf \n", var);
return 0;
}
However, it works fine. Given the identical nature of sprintf and printf, I expected this C example to work in Raku too. Seems like %lf is not supported.
So is there a workaround to fix this?
I think this is actually a bug in how Rat literals are created. Or at least a WAT :-).
I actually sort of expect 0.8144262510988963255087469 to either give a compile time warning, or create a Num, as it exceeds the standard precision of a Rat:
raku -e 'say 0.8144262510988963255087469'
0.814426251098896400086204416
Note that these are not the same.
There is fortunately an easy workaround, by creating a FatRat
$ raku -e 'say 0.8144262510988963255087469.FatRat'
0.8144262510988963255087469
FWIW, I think this is worthy of creating an issue.
From your question:
I have written a function that outputs a double, upto 25 decimal places.
From google:
Double precision numbers are accurate up to sixteen decimal places
From the raku docs :
When constructing a Rat (i.e. when it is not a result of some mathematical expression), however, a larger denominator can be used
so if you go
my $v = 0.8144262510988963255087469;
say $v.raku;
#<8144262510988963255087469/10000000000000000000000000>
it works.
However, do a mathematical expression such as
my $b = $a/10000000000000000000000000;
and you get the Rat => Num degradation applied unless you explicitly declare FatRats. I visualise this as the math operation placing the result in a Num register in the CPU.
The docs also mention that .say and .put may be less faithful than .raku, presumably because they use math operations (or coercion) internally.
Sorry to be the bearer of bad news, but 10**25 > 2**64, so what you report as an issue is correct and (fairly) well-documented behaviour given the constraints of IEEE 754 double precision.

How to add a small bit of context in a grammar?

I am tasked to parse (and transform) a code of a computer language, that has a slight quirk in its rules, at least I see it this way. To be exact, the compiler treats new lines (as well as semicolons) as statement separators, but other than that (e.g. inside the statement) it treats them as spacers (whitespace).
As an example, this code:
try
local x = 5 / 0
catch (i)
print(i + "\n")
is proved to be equivalent to this:
try local x = 5 / 0 catch (i) print(i + "\n")
I don't see how I can express such a rule in EBNF, or specifically in the Lark EBNF dialect. I mean in a sensible way. I could probably define all possible newline positions inside all statements, but that would be cumbersome and error-prone.
I wish to find a way to treat newlines contextually. Is there a proven method for this, preferably within Python/Lark domain? If I have to modify the parser for that purpose, then where should I start?
Or if I misunderstood something in this language in particular or in machine language parsing in general, or my statement of the problem is wrong, I'd also be happy to get educated.
(As you may guess, the language in question has a well proven implementation, but no officially defined grammar. Also, it is Squirrel, for all that it matters.)
The relevant quote from the "specification" is this:
A squirrel program is a simple sequence of statements.:
stats := stat [';'|'\n'] stats
[...] Statements can be separated with a new line or ‘;’ (or with the keywords case or default if inside a switch/case statement), both symbols are not required if the statement is followed by ‘}’.
These are relatively complex rules, and in their totality they are not context-free if newlines can also be ignored everywhere else. Note, however, that in my understanding the text implies that ; or \n is required when none of the other cases applies. That would make your example illegal. That probably means that the BNF as written is correct, i.e. both ; and \n are optional everywhere. In that case you can (for Lark) just add an %ignore "\n" statement and it should work fine.
Also, lark should not complain if you both ignore the \n and use it in a rule: Where useful it will match it in a rule, otherwise it will just ignore it. Note however that this breaks if you use a Terminal that includes the \n (e.g. WS or /\s/). Just have \n as an extra case.
(For the future: You will probably get faster response for lark questions if you ask over on gitter or at least put a link to SO there.)
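To make concrete what ignoring "\n" amounts to, here is a minimal standard-library sketch. It is not Lark and not a Squirrel grammar, just a hypothetical regex tokenizer (the names TOKEN and tokenize are invented here) that throws newlines away together with other whitespace. Under that assumption the two forms of the example produce the same token stream, so any parser sitting on top of it cannot tell them apart.

```python
import re

# Hypothetical token set, just large enough for the example snippet.
TOKEN = re.compile(r'''
    (?P<ws>\s+)                      # spaces AND newlines: discarded
  | (?P<string>"(?:\\.|[^"\\])*")    # string literal, escapes kept verbatim
  | (?P<name>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>[=/+();{}])
''', re.VERBOSE)

def tokenize(src):
    """Return the list of token texts, dropping all whitespace tokens."""
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "ws":
            out.append(m.group())
        pos = m.end()
    return out

multi_line = 'try\nlocal x = 5 / 0\ncatch (i)\nprint(i + "\\n")'
one_line = 'try local x = 5 / 0 catch (i) print(i + "\\n")'

print(tokenize(multi_line) == tokenize(one_line))  # True
```

Note that the "\n" inside the string literal survives, because it is part of a string token rather than whitespace between tokens.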

How to set lang env to "en_US.UTF-8" in awk script?

I am using awkc to generate an executable file from an awk script. I have the following line in an awk script abc.awk:
BEGIN{printf "Value=%s\n",(3.13+3.26)}
I have generated an executable file (abc.exe), which I have executed on different systems. It gives different outputs in floating point operations.
On one local system it gives the output 6.39 but it gives the output 6 on another system located in a different time zone.
Searching various sites, I found advice to set the LANG environment variable, but how do I do that?
I'm not sure that this has to do with the locale settings on your different systems, but if you are looking for a floating-point number, you should use the "%f" format specifier. To get an answer to 2 decimal places, use "%.2f":
BEGIN{printf "Value=%.2f\n",(3.13+3.26)}
This should give the same result, regardless of the system it is run on.
Edit: based on the linked question, perhaps you should try using LC_NUMERIC=C to explicitly set the locale:
LC_NUMERIC=C awk 'BEGIN{printf "Value=%.2f\n",(3.13+3.26)}'
should work regardless of the system it is being run on.
Environment variables are typically something you do not control once you hand a program over to someone else. I would look into being more specific in your printf calls. You can control how numbers are formatted into strings in astonishing detail. You are probably looking for the float format (%f); you are currently using the string format (%s).
If you find yourself repeating the same formatting trick in your printf calls, you can control the number-to-string conversion with the CONVFMT variable, set in a BEGIN block. I was introduced to it through gawk, but it is apparently part of the POSIX standard. Whether awkc supports it, I don't know.
From the gawk manual:
This string controls conversion of numbers to strings (see
Conversion). It works by being passed, in effect, as the first
argument to the sprintf() function (see String Functions). Its default
value is "%.6g". CONVFMT was introduced by the POSIX standard.
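What "passed as the first argument to sprintf()" means can be imitated in Python, whose % operator uses the same conversion letters. This is only an analogy: to_awk_string is a hypothetical helper, and Python's % formatting does not consult the locale, so it says nothing about the locale issue itself.

```python
CONVFMT = "%.6g"  # the default value quoted from the gawk manual

def to_awk_string(value, fmt=CONVFMT):
    """Convert a number to a string the way a sprintf(fmt, value) call would."""
    return fmt % value

total = 3.13 + 3.26

print(to_awk_string(total))      # 6.39
print(to_awk_string(1 / 3))      # 0.333333
print(to_awk_string(1234567.0))  # 1.23457e+06

# An explicit format in the printf call does not depend on CONVFMT at all.
print("Value=%.2f" % total)      # Value=6.39
```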

Failure to read full line including embedded zero bytes

Lua script:
i=io.read()
print(i)
Command line:
echo -e "sala\x00m" | lua ll.lua
Output:
sala
I want it to print all characters from the input, similar to this:
salam
in HEX editor:
0000000: 7361 6c61 006d 0a sala.m.
How can I print all characters from the input?
You tripped over one of the few places where the Lua standard library is still not 8-bit-clean.
Specifically, file reading line-by-line is not embedded-0 proof.
The reason it isn't yet is an unfortunate combination of:
Only standard C90 or equally portable constructs are allowed for the core, which does not provide for efficient 0-clean text parsing.
Every solution discussed to date on the mailing list under that constraint has considerable overhead.
Embedded 0-bytes in text files are quite rare.
Workarounds:
Use a modified library, fixing these formats: "*l" "*L" for file:read(...)
Parse your raw data yourself. (Read a block using a number, or as much as possible using "*a".)
Badger the Lua developers/maintainers for a bugfix until they give in.
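The "parse your raw data yourself" workaround has the following general shape, shown here in Python rather than Lua as a sketch of the idea only. The helper name read_lines_binary and the block size are assumptions for the example. It reads fixed-size blocks of raw bytes and splits on the newline byte itself, so embedded zero bytes pass through untouched.

```python
import io

def read_lines_binary(stream, block_size=4096):
    """Read a binary stream block by block and split it into lines ourselves."""
    data = bytearray()
    while True:
        block = stream.read(block_size)
        if not block:  # empty read means end of input
            break
        data.extend(block)
    lines = bytes(data).split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()    # drop the empty piece after a trailing newline
    return lines

# Same bytes as the hex dump in the question: 73 61 6c 61 00 6d 0a
sample = io.BytesIO(b"sala\x00m\n")
lines = read_lines_binary(sample)
print(lines)  # [b'sala\x00m']
```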

In awk, is there a functional difference between putting conditions outside the brackets or inside?

As far as I can tell, this:
$2 > 50{print}
functions exactly the same way as this:
{if ($2 > 50) print}
When should I use one vs the other, or is it a matter of style?
The 2nd one is much more readable in my opinion (especially for non-awk gurus who may have to maintain your code).
It's mostly style. The first example is canonical AWK. There are, of course, times when an if statement is required.
Consider that:
$2 > 50 {foo; print}
takes 20 characters versus 27 for the following:
{if ($2 > 50) {foo; print}}
If you remove spaces, it would be 16 versus 22 characters.
So the former is more desirable if you are interested in economy (both in terms of actual characters and visually).
As potong points out, the if form requires curly braces if there is more than one statement (as I have shown in my second example above). If there is only one statement, they are optional. This is a source of subtle bugs if the code is written without braces and additional statements are later added without adding them. Also, for the reader of a script, the intent is much clearer if the braces are included.
Yes, they are the same. As with most syntax alternatives, there are trade offs.
I mostly agree with John3136 that the 2nd form has a higher likelihood of being understood by someone with a background in C, C++, C#, Java, and others.
And as Dennis Williamson points out, there are times you have to use the if (...) form.
For one-liners, I think the first form is useful, acceptable, and helps reinforce the primary design of awk, i.e. [pattern]-[action] (either being optional or providing a default behaviour).
If you look at comp.lang.awk, you'll see that the long-timers mostly prefer style 1.
Large blocks of code in style 1, to my eye, are difficult to read, but do have the look of lex code.
So.... depends ;-) .... are you writing code that will ultimately be maintained by people who are likely not awk experts? Then stick with style 2. Writing for yourself? Use both so you don't get rusty!
I hope this helps.
Yes they are the same.
On the first form, you can omit { print } because it's the default action. So, that works too:
$2 > 50