I'm currently studying bitwise arithmetic. Most of it is easy because I have some CS background, but there is one thing about the & operator that I don't understand.
For example:
variable3 = variableOne & 3;
or
variable3 &= 3;
Which of the two forms is used doesn't matter.
I don't understand how the bits end up being set to 0. And how can I work it out on paper?
Let’s say 5&3, four-bit width:
0101b = 5dec
0011b = 3dec
------------
0001b = 1dec
You just & the bits in the same column. And since the & operator only returns 1 when both arguments are 1, the higher bits from 5 not present in 3 are masked out.
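The column rule can be sketched in a few lines of Python (the language choice and the helper name and_by_columns are mine, purely for illustration):

```python
def and_by_columns(x: int, y: int, width: int = 4) -> str:
    """AND two non-negative integers one bit column at a time."""
    xs = format(x, f"0{width}b")  # zero-padded binary digits of x
    ys = format(y, f"0{width}b")  # zero-padded binary digits of y
    # A result bit is 1 only when both input bits in that column are 1.
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(xs, ys))


print(and_by_columns(5, 3))          # 0001
print(int(and_by_columns(5, 3), 2))  # 1, the same as 5 & 3
```

This is exactly what the built-in & does in one step; the loop only makes the paper method explicit.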
As for your example from the comments:
$ perl -E 'printf "%b\n", 0x76'
1110110
And now:
1110110 = 0x76
0000011 = 3dec
-------
0000010 = 2dec
…and just to validate:
$ perl -E 'say 0x76&3'
2
The schema is simple, you just & each column:
x
y
-
z
Where z is x&y.
Aha, judging by your comments in the neighbouring answer the problem is elsewhere. Numeric variables do not contain “hexadecimal values” in them. Numeric variables contain a bit pattern representing a number. “A number” is never binary, decimal or hexadecimal. When you say “three”, there’s no number system in play, three is a three no matter what.
When you say something like var x = 0x76 in the source code, the machine reads the hexadecimal representation of the number, creates a bit pattern representing this number and stores it in the memory corresponding to the variable. And when you then say something like x &= 3, the machine creates a bit pattern representing number three, combines that with the bit pattern stored in the variable and stores the result in the variable.
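A small Python sketch of the same point, assuming nothing beyond the built-in integer type: the notation in the source code disappears once the value is stored, and hex() or bin() only choose how to print it.

```python
x = 0x76                       # hexadecimal notation in the source code
print(x == 118 == 0b1110110)   # True: three notations, one number

x &= 3                         # the operation works on the stored bit pattern
print(x, hex(x), bin(x))       # 2 0x2 0b10
```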
I have written a function that outputs a double, up to 25 decimal
places. I am trying to print it as a formatted output from Raku.
However, the output is incorrect and truncated.
See MWE:
my $var = 0.8144262510988963255087469;
say sprintf("The variable value is: %.25f", $var)
The above code gives The variable value is: 0.8144262510988963000000000 which is not what is expected.
Also, this seems weird:
my $var = 0.8144262510988963255087469;
say $var.Str.chars; # 29 wrong, expected 27
I tested the same in C:
#include <stdio.h>
int main() {
double var = 0.8144262510988963255087469;
printf("The variable value is: %.25lf \n", var);
return 0;
}
However, it works fine. Given the identical nature of sprintf and printf, I expected this C example to work in Raku too. Seems like %lf is not supported.
So is there a workaround to fix this?
I think this is actually a bug in how Rat literals are created. Or at least a WAT :-).
I actually sort of expect 0.8144262510988963255087469 to either give a compile time warning, or create a Num, as it exceeds the standard precision of a Rat:
raku -e 'say 0.8144262510988963255087469'
0.814426251098896400086204416
Note that these are not the same.
There is fortunately an easy workaround, by creating a FatRat
$ raku -e 'say 0.8144262510988963255087469.FatRat'
0.8144262510988963255087469
FWIW, I think this is worthy of creating an issue
From your question:
I have written a function that outputs a double, up to 25 decimal places.
From google:
Double precision numbers are accurate up to sixteen decimal places
From the raku docs :
When constructing a Rat (i.e. when it is not a result of some mathematical expression), however, a larger denominator can be used
so if you go
my $v = 0.8144262510988963255087469;
say $v.raku;
#<8144262510988963255087469/10000000000000000000000000>
it works.
However, do a mathematical expression such as
my $b = $a/10000000000000000000000000;
and you get the Rat => Num degradation applied unless you explicitly declare FatRats. I visualise this as the math operation placing the result in a Num register in the CPU.
The docs also mention that .say and .put may be less faithful than .raku, presumably because they use math operations (or coercion) internally.
Sorry to be the bearer of bad news, but 10**25 > 2**64, and what you report as an issue is correct and (fairly) well documented behaviour given the constraints of IEEE 754 double precision.
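The same limits can be checked outside Raku. Here is a rough Python sketch, where Fraction stands in for an exact rational and float is an IEEE 754 double; it is an analogy, not a model of how Rakudo implements Rat:

```python
from fractions import Fraction

literal = "0.8144262510988963255087469"

exact = Fraction(literal)    # exact rational built from the decimal digits
as_double = float(literal)   # nearest IEEE 754 double to that value

print(exact.denominator == 10**25)   # True: the denominator is 10**25
print(10**25 > 2**64)                # True: too large for a 64-bit denominator
print(Fraction(as_double) == exact)  # False: the double is only an approximation
```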
from GNU gawk's page
https://www.gnu.org/software/gawk/manual/html_node/Checking-for-MPFR.html
they have a formula to check arbitrary precision
function adequate_math_precision(n) { return (1 != (1+(1/(2^(n-1))))) }
My question is : wouldn't it be more efficient by staying within integer math domain with a formula such as
( 2^abs(n) - 1 ) % 2 # note 2^(n-1) vs. 2^|n| - 1
Since any power of 2 must also be even, then subtracting 1 must always be odd, then its modulo (%) over 2 becomes indicator function for is_odd() for n >= 0, while the abs(n) handles the cases where it's negative.
Or does the modulo necessitate a cast to floating point, thus nullifying any gains?
Good question. Let's tackle it.
The proposed snippet aims at checking whether gawk was invoked with the -M option.
I'll attach some digression on that option at the bottom.
The argument n of the function is the floating point precision, in bits, needed for whatever operation you'll have to perform. Say your script sits in a library somewhere and gets called in an environment you have no control over. You'd run that function at the beginning of the script so it can fail early and bail out, warning that the end result would be wrong due to a lack of bits to store numbers.
Your code stays in the integer realm: a power of two of an integer is an integer. There is no need to use abs(n) here, because there is no point in specifying how many bits you'll need as a negative number in the first place.
Then you subtract one from an even integer. Unless n=0, in which case 2^0=1 and your code reads (1 - 1) % 2 = 0, your snippet will always return 1, because the remainder (%) of an odd number divided by two is 1.
Problem is: you are trying to calculate a potentially stupidly large number in a function that should check if you are able to do so in the first place.
Since any power of 2 must also be even, then subtracting 1 must always
be odd, then its modulo (%) over 2 becomes indicator function for
is_odd() for n >= 0, while the abs(n) handles the cases where it's
negative.
Except when n=0, as we discussed above, you are right. The snippet will tell you that any power of 2 is even and that any power of 2, minus 1, is odd. We were discussing another subject entirely, though.
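To see the parity argument in action, here is a minimal Python sketch. It assumes exact integer arithmetic (Python integers never overflow), and parity_check is a hypothetical name for the formula from the question:

```python
def parity_check(n: int) -> int:
    """The formula from the question, evaluated with exact integers."""
    return (2 ** abs(n) - 1) % 2


print([parity_check(n) for n in range(5)])  # [0, 1, 1, 1, 1]
print(parity_check(200))                    # 1
```

With exact integers the answer depends only on whether n is zero, so it carries no information about the available precision.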
Let's analyze the other function instead:
return (1 != (1+(1/(2^(n-1)))))
Remember that booleans in awk work like this: 0 is false and any non-zero value is true. Mathematically, 1+x is guaranteed to be != 1 when x is a very small positive number, typically the reciprocal of a large power of two (1/2^122 in the example page). In the digital world that's not the case. At some point the floating point computation hits its precision floor, the sum is rounded back down to 1, and x effectively becomes 0. At that point the function returns 0, i.e. false, because 1 compares equal to 1.
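The rounding floor is easy to observe. This is a Python transcription of the gawk function, on the assumption that Python floats (IEEE 754 doubles with a 53-bit significand) behave like gawk numbers without -M:

```python
def adequate_math_precision(n: int) -> bool:
    """True if 1 + 1/2^(n-1) is still distinguishable from 1."""
    return 1.0 != 1.0 + 1.0 / 2.0 ** (n - 1)


print(adequate_math_precision(53))   # True:  1 + 2^-52 is representable
print(adequate_math_precision(54))   # False: 1 + 2^-53 rounds back to 1
print(adequate_math_precision(123))  # False: the value used on the gawk page
```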
A larger discussion on types and data representation
The page you link explains precision for gawk invoked with the -M option. This sounds like technoblahblah, let's decipher it.
At one point, your OS architecture has to decide how to store data, how to represent it in memory so that it can be accessed again and displayed. Terms like Integer, Float, Double, Unsigned Integer are examples of data representation. We here are addressing Integer representation: how is an integer stored in memory?
A 32-bit system will use 4 bytes to represent an integer, which in turn determines how large the integer can be. The 32 bits are read from the most significant bit (MSB) to the least significant bit (LSB), and if the integer is signed, one bit represents the sign (typically the MSB, which halves the maximum magnitude of the integer).
If asked to compute a large number, a machine will try to fit it into the largest number it can represent. If the end result is larger than that, you have an overflow and end up with a wrong result or an error. Many online challenges ask you to write code for arbitrarily long loops or large sums, then test it with inputs that break the 64-bit barrier, to see whether you picked proper types for your indexes.
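Overflow can be simulated in Python, whose own integers never overflow. The helper wrap_int32 below is a hypothetical name, and it assumes a signed 32-bit two's-complement representation that silently wraps around:

```python
def wrap_int32(value: int) -> int:
    """Reduce an integer to what a signed 32-bit two's-complement slot would hold."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If the sign bit (MSB) is set, the pattern stands for a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


INT32_MAX = 2**31 - 1
print(wrap_int32(INT32_MAX))      # 2147483647
print(wrap_int32(INT32_MAX + 1))  # -2147483648: one past the maximum wraps around
```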
AWK is not a strongly typed language. Meaning, any variable can store data, regardless of the type. The data type can change and it is determined at runtime by the interpreter, so that the developer doesn't need to care. For instance:
$awk '{a="this is text"; print a; a=2; print a; print a+3.0*2}'
-| this is text
-| 2
-| 8
In the example, a is text, then is an integer and can be summed to a floating point number and printed as integer without any special type handling.
The Arbitrary Precision Page presents the following snippet:
$ gawk -M 'BEGIN {
> s = 2.0
> for (i = 1; i <= 7; i++)
> s = s * (s - 1) + 1
> print s
> }'
-| 113423713055421845118910464
There is some math voodoo behind, we will skip that. Since s is interpreted as a floating point number, the end result is computed as floating point.
Try to input that number on Windows calculator as decimal, and it will fail. Although you can compute it as a binary. You'll need the programmer setting and to add up to 53 bits to be able to fit it as unsigned integer.
53 is a magic number here: with the -M option, gawk uses arbitrary precision for numbers. In other words, it commandeers however many bits are necessary, tracks them, and breaks free of the native OS architecture. By default, gawk allocates 53 bits of precision for any such number. Fun fact: the actual result of that snippet is wrong, and it would take up to 100 bits to compute it correctly.
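The effect of the 53-bit limit on that loop can be reproduced in Python, assuming its floats (53-bit IEEE 754 doubles) as a stand-in for the default precision and its unbounded integers as a stand-in for "enough bits". The function name iterate is mine:

```python
def iterate(s, steps=7):
    """Apply s = s * (s - 1) + 1 the given number of times."""
    for _ in range(steps):
        s = s * (s - 1) + 1
    return s


exact = iterate(2)      # integer arithmetic, no rounding
rounded = iterate(2.0)  # double arithmetic, 53 significant bits

print(exact)                  # 113423713055421844361000443
print(exact.bit_length())     # well above 53
print(int(rounded) == exact)  # False: the last step no longer fits in 53 bits
```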
To implement arbitrary large numbers handling, gawk relies on an external library called MPFR. Provided with an arbitrary large number, MPFR will handle the memory allocation and bit requisition to store it. However, the interface between gawk and MPFR is not perfect, and gawk can't always control the type that MPFR will use. In case of integers, that's not an issue. For floating point numbers, that will result in rounding errors.
This brings us back to the snippet at the beginning: if gawk was called with the -M option, numbers up to 2^53 can be stored as integers. Floating points will be smaller than that (you'll need to make the comma disappear somehow, or rather represent it spending some of the bits allocated for that number, just like the sign). Following the example of the page, and asking an arbitrary precision larger than 32, the snippet will return TRUE only if the -M option was passed, otherwise 1/2^(n-1) will be rounded down to be 0.
say "1 10".split(" ")
returns (1,10)
When I use those 1 and 10 as arguments to the sequence operator [...]
say [...] "1 10".split(" ")
returns just (1), while it's supposed to return (1 2 3 4 5 6 7 8 9 10). I guess it's because the output of the split function is interpreted as strings.
How to solve that problem? Thank you.
If you want numeric behavior then coerce to numerics:
say [...] +<< "1 10".split(" "); # (1 2 3 4 5 6 7 8 9 10)
This uses a << hyperop to apply a numeric coercion (prefix +) to each element of the sequence generated by the split.
Regarding sequence and range behavior with string endpoints:
SO Why does the Perl 6 sequence 'A' … 'AA' have only one element?. What's described in the linked SO applies to the sequence you've specified, namely "1"..."10".
The open Rakudo issue Sequence operator with string endpoints and no explicit generator produces unintuitive/undocumented results.
SO Why are some of my ranges insane?.
What you've written is equivalent to:
put gist "1"..."10";
(say is equivalent to put gist.)
A gist of "1"..."10" is (1).
That's because the gist of List.new("1") is (1) just like the gist of List.new("a") is (a).
And "1"..."10" evaluates to a List.new("1").
Why? I'm not sure yet but I'm exploring the available info.
Let's start with the doc. The doc for the infix ... op says:
The default generator is *.succ or *.pred, depending on how the end points compare
Well:
say "1" cmp "10"; # Less
which presumably means the sequence starts calling *.succ.
And then:
say "1".succ; # 2
and:
say "2" cmp "10"; # More
It seems this results in the sequence immediately terminating after the "1" rather than including the "2" and continuing.
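The underlying trap, string comparison versus numeric comparison, is not specific to Raku. Here is a Python analogy (it shows only the ordering issue and makes no claim about how the Raku sequence operator is implemented):

```python
print("1" < "10")  # True: "1" is a prefix of "10"
print("2" > "10")  # True: strings compare character by character, and "2" > "1"
print(2 > 10)      # False: numeric comparison

# Converting the endpoints to numbers first gives the intended sequence.
start, end = (int(s) for s in "1 10".split(" "))
print(list(range(start, end + 1)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```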
I'm continuing to search the bug queues and examine the code around the area that #wamba++ linked in their answer to the above linked SO "Why does the Perl 6 sequence 'A' … 'AA' have only one element?".
As usual, raiph is giving the correct answer, but I find something missing about why it really does not work.
Main thing is that [] is a reduce operator, it's not applying whatever is inside it as an infix operator except as a side effect. For instance, this works:
say [+] <4 8>.words; # OUTPUT: «12»
But only because there are two components, and the reduce [] is applied to them, having the same effect. Ditto for ...
say [...] <4 8>.words; # OUTPUT: «(4 5 6 7 8)»
However that's not what you are looking for. You have two operands, a single operator, you want to call the operator itself. Which you can of course do by using its fully qualified name
say infix:<...>( | <3 5>.words ); # OUTPUT: «(3 4 5)»
as long as, of course, you flatten (with | ) its arguments to make it match the signature of an infix operator.
As usual, TIMTOWTDI. So do whatever suits you the best.
Actual task: I want to print a matrix (of my own implementation) in a human-readable format. As a prerequisite, I figured I need to be able to specify "fit the number representation into X characters". I found #printShowingDecimalPlaces: and #printPaddedWith:to: in the Float and Integer classes (the first method is in the more general Number class). Individually they work, but the former acts on the fractional part only and the latter on the part before the fractional point, e.g.:
10.3 printPaddedWith: Character space to: 5.
"' 10.3'"
-10.3 printPaddedWith: Character space to: 5.
"' -10.3'"
10.3 printShowingDecimalPlaces: 3.
"'10.300'"
Also, their behaviour on very large (or very small) numbers written in scientific form is not ideal:
12.3e9 printShowingDecimalPlaces: 3.
"'12300000000.000'"
12.3e9 printPaddedWith: Character space to: 5.
"' 1.23e10'"
So, I would like something like Common Lisp's (FORMAT T "~10g" 12.3d9) or C's printf("%10g", 12.3e9), that (a) restricts the whole width to 10 characters and (b) chooses the most suitable format depending on the size of the number. Is there something like this in Pharo?
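For reference, this is how the C-style %g conversion mentioned above behaves, shown with Python's format(), which follows the same rules. Note that the width is a minimum rather than a hard limit:

```python
print(repr(format(12.3e9, "10g")))             # '  1.23e+10'
print(repr(format(10.3, "10g")))               # '      10.3'
print(repr(format(123456789012345.0, "10g")))  # '1.23457e+14' (11 characters)
```

So %g picks fixed or scientific notation by magnitude and pads to the requested width, but a number that needs more room will overflow the column.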
For versatile printing options, I suggest loading NumberPrinter package from
http://ss3.gemstone.com/ss/NumberPrinter/
(FloatPrinter fixed) digitCount: 2; print: 10.3.
-> '10.30'
I did not try it in recent Pharo versions though.
EDIT:
Ah, but I see no format for handling exponents multiple of 3, maybe you would have to create a subclass for such format.
EDIT:
Or did I misunderstand: you don't want it to print as '12.3e9' but rather as '1.23e10'? Note that apart from the significand digitCount, you need extra room for, at worst, 1 for the sign + 1 for the fraction separator + 1 for the exponent letter + 1 for the exponent sign + 3 for the exponent (worst case for double precision floating point).
The more or less equivalent to g format would be something like this:
(FloatPrinter freeFormat)
totalWidth: 13; "size of the generated string"
digitCount: 6; "number of significant figures"
print: -12.3e-205.
->' -1.23e-204'
In a given file record, I need to read the first two integer elements first, and then the rest of the line (a large number of real elements), because the assignment depends on the first two. Suppose the format of the first two integer elements is not really well defined.
The best way to solve the problem could be something like:
read(unitfile, "(I0,I0)", advance='no') ii, jj
read(unitfile,*) aa(ii,jj,:)
But it seems to me the "(I0)" specification is not allowed in gfortran.
Basically the file read in unitfile could be something like:
0 0 <floats>
0 10 <floats>
10 0 <floats>
100 0 <floats>
100 100 <floats>
which is hard to read with any Fortran-like fixed-field format specification.
Is there any other way to get around this, apparently trivial, problem?
This applies string manipulations to get the individual components, separated by blanks ' ' and/or tabs (char(9)):
program test
implicit none
character(len=256) :: string, substring
integer :: ii, jj, unitfile, stat, posBT(2), pos
real, allocatable :: a(:)
open(file='in.txt', newunit=unitfile, status='old' )
read(unitfile,'(a)') string
! Crop whitespaces
string = adjustl(trim(string))
! Get first part:
posBT(1) = index(string,' ') ! Blank
posBT(2) = index(string,char(9)) ! Tab
pos = minval( posBT, posBT > 0 )
substring = string(1:pos)
string = adjustl(string(pos+1:))
read(substring,*) ii
! Get second part:
posBT(1) = index(string,' ') ! Blank
posBT(2) = index(string,char(9)) ! Tab
pos = minval( posBT, posBT > 0 )
substring = string(1:pos)
string = adjustl(string(pos+1:))
read(substring,*) jj
! Do stuff
allocate( a(ii+jj), stat=stat )
if (stat/=0) stop 'Cannot allocate memory'
read(string,*) a
print *,a
! Clean-up
close(unitfile)
deallocate(a)
end program
For a file in.txt like:
1 2 3.0 4.0 5.0
This results in
./a.out
3.00000000 4.00000000 5.00000000
NOTE: This is just a quick&dirty example, adjust it to your needs.
[This answer has been significantly revised: the original was unsafe. Thanks to IanH for pointing that out.]
I generally try to avoid doing formatted input which isn't list-directed, when I can afford it. There's already an answer with string parsing for great generality, but I'll offer some suggestions for a simpler setting.
When you are relaxed about trusting the input, such as when it's just the formatting that's a bit tricky (or you're happy leaving it to your compiler's bounds checking), you can approach your example case with
read(unitfile, *) ii, jj, aa(ii, jj, :)
Alternatively, if the array section is more complicated than what the first two columns give directly, it can be specified by an expression, or even by functions
read(unitfile, *) ii, jj, aa(fi(ii,jj), fj(ii,jj), :fn(ii,jj))
with pure integer function fi(ii,jj) etc. There is even some possibility of having range validation in those functions (returning a size 0 section, for example).
In a more general case, but staying list-directed, one could use a buffer for the real variables
read(unitfile, *) ii, jj, buffer(:) ! Or ... buffer(:fn(ii,jj))
! Validate ii and jj before attempting to access aa with them
aa(.., .., :) = buffer
where buffer is of suitable size.
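The read-into-a-buffer-then-validate idea, expressed as a language-neutral sketch in Python. The function name parse_record, the zero-based bounds check, and the shape argument are all assumptions made for the illustration, not part of the Fortran code above:

```python
def parse_record(line, shape):
    """Split one record into two validated indices and a buffer of reals."""
    fields = line.split()  # splits on any run of blanks or tabs
    if len(fields) < 2:
        raise ValueError("record needs at least two integer fields")
    ii, jj = int(fields[0]), int(fields[1])
    # Validate the indices before they are used to address the target array.
    if not (0 <= ii < shape[0] and 0 <= jj < shape[1]):
        raise ValueError(f"indices ({ii}, {jj}) outside {shape}")
    buffer = [float(f) for f in fields[2:]]
    return ii, jj, buffer


print(parse_record("100\t0   1.5 2.5 3.5", shape=(101, 101)))
# (100, 0, [1.5, 2.5, 3.5])
```

Only after parse_record returns would the buffer be copied into the slot selected by ii and jj.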
Your first considered approach suggests you have some reasonable idea of the structure of the lines, including their length, but when the number of reals can't be determined from ii and jj, or when the type is not known (and polymorphic reading isn't allowed), things do indeed get tricky. Also, if one is very sensitive about validating input, or about providing meaningful, detailed user feedback on error, this is not optimal.
Finally, iostat helps.