Do array indices need to be native ints? - raku

I am trying to obtain the umptieth element of an array:
my @lazy-array = lazy 1, 11, 121 ... 10**100;
say @lazy-array[10**50];
This yields
Cannot unbox 167 bit wide bigint into native integer
I get the same problem if I assign the index to a variable. This does not seem to be reflected in the documentation, and I wonder whether it's a feature or a bug. Also, what would be the correct way of accessing those positions (other than iterating)?

In the current implementation in Raku, which is based on NQP, array indexes have a maximum of 63 bits (at least on 64bit builds).
use nqp;
my $l := nqp::list;
dd nqp::atpos($l,0x7fff_ffff_ffff_ffff); # Mu
dd nqp::atpos($l,0x7fff_ffff_ffff_ffff + 1);
# Cannot unbox 64 bit wide bigint into native integer
I would not consider it a feature or a bug, but a limitation of the current implementation.
Please note that you could use Array::Sparse if you want to use larger indexes.

Related

How can I reduce the size of this file to 83 bytes or less?

Task description:
Create the divisibility function, which expects an array of integers and its size as a parameter. The function returns how many numbers divisible by two are in the array!
We want to save on the available storage space, so our source code can be a maximum of 83 bytes!
My current code:
int oszthatosag(int a[], int s){int r=0,i;for (i = 0; i < s; ++i)if(a[i] % 2 == 0)++r;return r;}
has a size of 96 bytes. I deleted all the unnecessary whitespace and reduced the lengths of the variable names to a minimum, but it still doesn't seem to be enough.
You can remove the remaining spaces, use a branch-less strategy to drop the if, move the initialization and increment of i out of the loop header, and count down from s so that the ==0 comparison disappears. The resulting code should be not only shorter but also faster. This assumes s>=0 (otherwise r is smaller than expected).
Here is the final result:
int oszthatosag(int a[],int s){int r=s,i=0;while(i<s)r-=a[i++]&1;return r;}
As pointed out by @Brendan, here is an even shorter version (still assuming s>=0):
int oszthatosag(int a[],int s){int r=s;while(s)r-=a[--s]&1;return r;}
Note that in C89 the default return type is int, so you can omit it (implicit int was removed in C99, and the return type is required in C++). This is generally not a good idea (and causes compiler warnings), but it is shorter:
oszthatosag(int a[],int s){int r=s;while(s)r-=a[--s]&1;return r;}
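To check that the shortened loop still counts the same thing, the readable and the golfed versions can be compared side by side. The following is only a sketch: the names count_even_reference, count_even_short and versions_agree are made up for illustration, and the `& 1` test on negative numbers assumes a two's complement int, which holds on mainstream platforms.

```c
#include <assert.h>

/* Readable version from the question, used as the reference. */
static int count_even_reference(const int a[], int s)
{
    int r = 0;
    for (int i = 0; i < s; ++i)
        if (a[i] % 2 == 0)
            ++r;
    return r;
}

/* Branch-less version from the answer: start at s and subtract 1
   for every odd element (lowest bit set). */
static int count_even_short(const int a[], int s)
{
    int r = s;
    while (s)
        r -= a[--s] & 1;
    return r;
}

/* Hypothetical helper: returns 1 when both versions agree on the input. */
static int versions_agree(const int a[], int s)
{
    assert(s >= 0); /* the short version relies on a non-negative size */
    return count_even_reference(a, s) == count_even_short(a, s);
}
```

Calling versions_agree on a few arrays that mix positive, negative and zero values is a cheap way to catch a golfing mistake before counting bytes.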

Kotlin Primitives: How to reinterpret ByteArrays as the bits of primitives in Common Multiplatform code?

TL;DR is there an equivalent of C++'s reinterpret_cast<[primitive]>(bytes) for Kotlin Multiplatform?
Basically, what I am looking for is the following functionality:
You have a ByteArray with length, let's say, 4 bytes. The contents are
00 00 00 2A
Then there should be some function or operator to reinterpret these 4 bytes as an Int:
asInt(byteArrayOf(0x00, 0x00, 0x00, 0x2A))
Ideally, there would also be a way to control the Endian-ness of this operation. And most importantly, I'd want this to work on all available platforms (JVM, JS, Native). The question is: Is there such an operation?
Currently, I am doing the following:
For the integral types (Byte, Short, Int, Long), I use SHL / OR to construct the actual primitive. But this is of course not as efficient as just reinterpreting (maybe also copying) the value as the primitive value, since the bits are already in the right configuration
For the floating-point values, I'm still struggling: So far, I have not found a platform-independent solution, so I use expect / actual. I've found solutions for JVM and JS already (although not that satisfying), but I still don't know how to do it on Native. If it turns out that there is no common solution to the whole problem, I'd be very thankful if someone could point me towards a solution just for Float / Double converting on Native.
Thank you very much!
Your existing solution for integral types is likely to be ideal for multiplatform. For floating-point values, you can convert them to and from Int/Long with Float.fromBits / Double.fromBits and Float.toRawBits / Double.toRawBits, and then use your existing solution for integral types.

Handling magic constants during 64-bit migration

I confess I did something dumb and it now bites me. I used a magic number constant defined as NSUIntegerMax to define a special case index. The value is normally used as index to access selected item in NSArray. In the special case, denoted by the magic number I get the value from elsewhere, instead of from the array.
This index value is serialized in User Defaults as NSNumber.
With Xcode 5.1 my iOS app gets compiled with standard architecture that now also includes arm64. This changed the value of NSUIntegerMax, so now after deserialization I get 32-bit value of NSUIntegerMax, which no longer matches in comparisons with the magic number, whose value is now 64-bit NSUIntegerMax. And it results in NSRangeException with reason: -[__NSArrayI objectAtIndex:]: index 4294967295 beyond bounds [0 .. 10].
It is a minor issue in my code, given the normal range of that array is small, I may just get away with redefining my magic number as 4294967295. But it doesn't feel right. How should I have handled this issue properly?
I guess avoiding the magic number altogether would be the most robust approach?
Note
I think the problem with my magic number is roughly equivalent to what happened to NSNotFound constant. Apple's 64-bit Transition Guide for Cocoa Touch says in section about Common Type-Conversion Problems in Cocoa Touch:
Working with constants defined in the framework as NSInteger. Of particular note is the NSNotFound constant. In the 64-bit runtime, its value is larger than the maximum range of an int type, so truncating its value often causes errors in your app.
… but it does not say what should be done, except to be careful ;-)
If you use NSInteger/NSUInteger, it is 4 bytes on a 32-bit OS and 8 bytes on a 64-bit OS.
If you want the same integer size on both, consider using int (4 bytes), long long (8 bytes), or int32_t/int64_t. To get the maximum int value you can use a cast:
(int)INT_MAX
//or LONG_MAX
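A fixed-width sentinel could look roughly like the sketch below, written in plain C, which Objective-C accepts. SPECIAL_INDEX and item_for_stored_index are hypothetical names, and the sketch assumes the index is stored as an int32_t, so values already serialized as 32-bit NSUIntegerMax would still need a one-time mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: int32_t has the same width on 32-bit and 64-bit
   builds, so the serialized value compares equal on both. */
#define SPECIAL_INDEX INT32_MAX

/* Returns items[stored] for a normal index, `special` for the sentinel,
   and `fallback` for anything out of range (instead of raising). */
static int item_for_stored_index(int32_t stored, const int *items, size_t count,
                                 int special, int fallback)
{
    assert(items != NULL || count == 0);
    if (stored == SPECIAL_INDEX)
        return special;
    if (stored < 0 || (size_t)stored >= count)
        return fallback;
    return items[stored];
}
```

Keeping the sentinel check and the range check in one place also means a stale or corrupted stored value ends in the fallback path rather than in an out-of-bounds access.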

Trie Implementation Question

I'm implementing a trie for predictive text entry in VB.NET - basically autocompletion as far as the use of the trie is concerned. I've made my trie a recursive data structure based on the generic dictionary class.
It's basically:
class WordTree Inherits Dictionary(of Char, WordTree)
Each letter in a word (all upper cased) is used as a key to a new WordTree. A null character on a leaf indicates the termination of a word. To find a word starting with a prefix I walk the trie as far as my prefix goes, then collect all children words.
My question is basically on the implementation of the trie itself. I'm using the dictionary hash function to branch my tree. I could use a list and do a linear search over the list, or do something else. What's the smooth move here? Is this a reasonable way to do my branching?
Thanks.
Update:
Just to clarify, I'm basically asking if the dictionary branching approach is obviously inferior to some other alternative. The application in which I'm using this data structure only uses upper case letters, so maybe the array approach is the best. I might use the same data structure for a more complex typeahead situation in the future (more characters). In that case, it sounds like the dictionary is the right approach - up to the point where I need to use something more complex in general.
If it's just the 26 letters, use a 26-entry array. Then lookup is by index. It probably uses less space than the Dictionary if the bucket list is longer than 26.
If you are worried about space, you can use bitmap compression on the valid byte transitions, assuming the 26-character limit.
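#include <cctype>
#include <vector>

class State // could be struct or whatever
{
    int valid = 0; // can handle 32 transitions -- each bit set is valid
    std::vector<State> transitions; // one entry per set bit, in bit order

    // Assumes the transition for ch exists, i.e. its bit is set in valid.
    State getNextState( int ch )
    {
        int index;
        int mask = ( 1 << ( toupper( ch ) - 'A' )) - 1; // bits below ch
        int bitsToCount = valid & mask;
        // count the set bits below ch; that count is the vector index
        for( index = 0; bitsToCount; bitsToCount >>= 1 )
        {
            index += bitsToCount & 1;
        }
        return transitions.at( index );
    }
};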
There are other ways to do the bit counting. Here, the index into the vector is the number of set bits in the valid bitset below the character's bit. The other alternative is a directly indexed array of states:
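class State
{
    // A class cannot contain an array of itself, so store pointers.
    State* transitions[ 26 ] = {}; // use the letter's offset from 'A' as the index.
    State* getNextState( int ch )
    {
        return transitions[ toupper( ch ) - 'A' ]; // null if there is no transition
    }
};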
A good data structure that's efficient in space and potentially gives sub-linear prefix lookups is the ternary search tree. Peter Kankowski has a fantastic article about it. He uses C, but it's straightforward code once you understand the data structure. As he mentioned, this is the structure ispell uses for spelling correction.
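For readers who have not met the structure, a ternary search tree can be sketched in C roughly as follows. This is a minimal illustration and not the code from the article mentioned above; TstNode, tst_insert, tst_contains and tst_has_prefix are names chosen here, and error handling is reduced to aborting when allocation fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Each node holds one character and three links:
   lo (smaller char), eq (next char of the word), hi (larger char). */
typedef struct TstNode {
    char ch;
    int is_word; /* 1 if a word ends at this node */
    struct TstNode *lo, *eq, *hi;
} TstNode;

/* Inserts a non-empty word and returns the (possibly new) subtree root. */
static TstNode *tst_insert(TstNode *node, const char *word)
{
    assert(word != NULL && *word != '\0');
    if (node == NULL) {
        node = calloc(1, sizeof *node);
        if (node == NULL)
            abort();
        node->ch = *word;
    }
    if (*word < node->ch)
        node->lo = tst_insert(node->lo, word);
    else if (*word > node->ch)
        node->hi = tst_insert(node->hi, word);
    else if (word[1] != '\0')
        node->eq = tst_insert(node->eq, word + 1);
    else
        node->is_word = 1;
    return node;
}

/* Walks the tree along s; returns the node matching its last character,
   or NULL if the tree has no such path. */
static const TstNode *tst_find(const TstNode *node, const char *s)
{
    if (s == NULL || *s == '\0')
        return NULL;
    while (node != NULL) {
        if (*s < node->ch)
            node = node->lo;
        else if (*s > node->ch)
            node = node->hi;
        else if (s[1] == '\0')
            return node;
        else {
            node = node->eq;
            ++s;
        }
    }
    return NULL;
}

static int tst_contains(const TstNode *root, const char *word)
{
    const TstNode *n = tst_find(root, word);
    return n != NULL && n->is_word;
}

/* True if at least one stored word starts with the non-empty prefix. */
static int tst_has_prefix(const TstNode *root, const char *prefix)
{
    return tst_find(root, prefix) != NULL;
}
```

For autocompletion, the completions of a prefix are the words found under the eq link of the node that tst_find returns, plus the prefix itself when that node is marked as a word.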
I have done this (a trie implementation) in C with 8 bit chars, and simply used the array version (as alluded to by the "26 chars" answer).
HOWEVER, I am guessing that you want full unicode support (since a .NET char is unicode, among other reasons). Assuming you have to have support for unicode, the hash/map/dictionary lookup is probably your best bet, as a 64K entry array in each node won't really work very well.
About the only hack up I could think of on this is to store entire strings (suffixes or possibly "in-fixes") on branches that do not yet split, depending on how sparse the tree, er, trie, is. That adds a lot of logic to detect the multi-char strings, though, and to split them up when an alternate path is introduced.
What is the read vs update pattern?
---- update jul 2013 ---
If .NET strings have a function like java to get the bytes for a string (as UTF-8), then having an array in each node to represent the current position's byte value is probably a good way to go. You could even make the arrays variable size, with first/last bounds indicators in each node, since MANY nodes will have only lower case ASCII letters anyway, or only upper case letters or the digits 0-9 in some cases.
I've found burst tries to be very space efficient. I wrote my own burst trie in Scala that also re-uses some ideas that I found in GWT's trie implementation. I used it in Stripe's Capture the Flag contest on a problem that was multi-node with a small amount of RAM.

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
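The read/write/execute example can be written down in C as a small sketch. The constant and function names (PERM_READ, grant, revoke, has_all) are invented for illustration.

```c
#include <assert.h>

/* One bit per privilege: execute = 2^0, write = 2^1, read = 2^2. */
enum {
    PERM_EXECUTE = 1 << 0,
    PERM_WRITE   = 1 << 1,
    PERM_READ    = 1 << 2,
    PERM_ALL     = PERM_EXECUTE | PERM_WRITE | PERM_READ
};

/* Turns the given privilege bits on. */
static unsigned grant(unsigned mask, unsigned perm)
{
    assert((perm & ~(unsigned)PERM_ALL) == 0); /* only known bits allowed */
    return mask | perm;
}

/* Turns the given privilege bits off. */
static unsigned revoke(unsigned mask, unsigned perm)
{
    return mask & ~perm;
}

/* True if every bit in perm is set in mask. */
static int has_all(unsigned mask, unsigned perm)
{
    return (mask & perm) == perm;
}
```

Using OR instead of addition to combine the values means granting a privilege twice leaves the mask unchanged.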
I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in is whether there is a practical limit to the number of values you can store like this. For example, if the number was over 64, you couldn't use (64-bit) integers any more. If this was the case, what would you use? How would it affect your program logic (i.e. could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid (offset is unsigned, so only the upper bound matters)
    if(offset >= (num_bytes<<3)) { return -1; }
    // set the right bit: byte index is offset/8, bit index is offset%8
    bytes[offset >> 3] |= (1 << (offset & 0x7));
    return 0; // success
}
// gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid
    if(offset >= (num_bytes<<3)) { return -1; }
    // get the right bit
    return bytes[offset >> 3] & (1 << (offset & 0x7));
}
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. Think of it like an "array of booleans"
(journalling masks in flash memory, if you want to know).
Many compilers know how to do this for you. Add a bit of OO code to have types that operate sensibly, and then your code starts looking like its intent, not some bit-banging.
My 2 cents.
With a 64-bit integer, you can store values up to 2^64-1, and 64 is only 2^6. So yes, there is a limit, but if you need more than 64 bits' worth of flags, I'd be very interested to know what they were all doing :)
How many states do you need to potentially think about? If you have 64 potential states, the number of combinations they can exist in is the full size of a 64-bit integer.
If you need to worry about 128 flags, then a pair of bit vectors would suffice (two 64-bit words, 128 bits in total).
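A pair of 64-bit words used as one 128-flag set might look like this in C. It is a sketch under the assumption that flags are addressed by number; Flags128, flag_set, flag_clear and flag_test are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_COUNT 128

/* Two 64-bit words treated as one 128-bit flag set. */
typedef struct {
    uint64_t words[FLAG_COUNT / 64];
} Flags128;

/* Flag n lives in word n/64 at bit position n%64. */
static void flag_set(Flags128 *f, unsigned n)
{
    assert(n < FLAG_COUNT);
    f->words[n / 64] |= (uint64_t)1 << (n % 64);
}

static void flag_clear(Flags128 *f, unsigned n)
{
    assert(n < FLAG_COUNT);
    f->words[n / 64] &= ~((uint64_t)1 << (n % 64));
}

/* Returns 1 if flag n is set, 0 otherwise. */
static int flag_test(const Flags128 *f, unsigned n)
{
    assert(n < FLAG_COUNT);
    return (int)((f->words[n / 64] >> (n % 64)) & 1u);
}
```

Bitwise comparisons still work, but they are applied word by word: a test against a multi-flag mask has to AND each word with the corresponding word of the mask.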
Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding used 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.
Some languages (I believe Perl does, but I'm not sure) permit bitwise arithmetic on strings, giving you a much greater effective range (strlen * 8 bits, with 8-bit chars).
However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space efficiency reasons, but for practical development reasons.
(PHP uses this system to control its error messages, and I have already found that it's a bit over the top when you have to define values where PHP's constants are not resident and you have to generate the integer by hand. To be honest, if chmod didn't support the 'ugo+rwx' style syntax I'd never want to use it, because I can never remember the magic numbers.)
The instant you have to crack open a constants table to debug code you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays which we have packed in 32 bigint fields (SQL Server not supporting UInt32). Bitwise operations work fine, until your table starts to grow and you realize the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example, .NET uses an array of integers as the internal storage for its BitArray class.
Practically, there's no other way around it.
That being said, in SQL you will need more than one column (or a BLOB) to store all the states.
You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.