Type-infer string buffer - type-inference

Would it be possible to infer from the usage of a variable if a string buffer would be preferred over an immutable string, or a rope even?
Example, destructive update (thanks, dmbaturin):
a[2] = 'b'; // Used as mutable, better use buffer
The point is to get higher-performance string operations without pushing the details onto the programmer.

Related

Is it possible to reference a byte?

In C, it is possible to create an array and have a pointer pointing to a specific byte of that array, like this:
char array[] = "This is not a question.";
char *ptr = strchr(array, ' '); // points to the first space
This is extremely useful when parsing, both for performance and for reducing memory usage: sometimes I create data structures that just point to different bytes of the same buffer. I wonder whether it is possible, and convenient, to do the same in Kotlin.
The equivalent in Java and Kotlin is simply to store an index into the array (or String).
Remember that the JVM has very powerful dynamic compilation and optimisation, so while in C that would be less efficient, on the JVM it generally won't be. (The difference generally wouldn't be significant in most applications, anyway.)
Also note that Kotlin uses Unicode, so a character is not the same as a byte. A Char is a 16-bit UTF-16 code unit. (Characters outside the Basic Multilingual Plane are stored as a surrogate pair.)
So the equivalent would be:
val string = "This is not a question."
val i = string.indexOf(' ') // = 4, index of the first space
or
val array = byteArrayOf(1, 2, 3, 4, 5)
val i2 = array.indexOf(3) // = 2, index of the first occurrence of 3
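For comparison, here is a minimal C sketch of the two styles discussed above: a pointer into the buffer (as in the question) and an offset/length pair that refers to the same bytes without copying them (as in the answer). The names slice and split_words are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* A view into an existing buffer: no bytes are copied. */
struct slice {
    size_t offset; /* index of the first byte of the token */
    size_t length; /* number of bytes in the token */
};

/* Splits text on spaces and records up to max_out tokens as
 * offset/length pairs. Returns the number of tokens stored. */
size_t split_words(const char *text, struct slice *out, size_t max_out)
{
    size_t count = 0;
    const char *start = text;

    while (*start != '\0' && count < max_out) {
        /* Pointer style: space points at a byte inside text. */
        const char *space = strchr(start, ' ');
        size_t len = space ? (size_t)(space - start) : strlen(start);

        if (len > 0) {
            /* Index style: store where the token is, not a copy of it. */
            out[count].offset = (size_t)(start - text);
            out[count].length = len;
            count++;
        }
        if (space == NULL)
            break;
        start = space + 1;
    }
    return count;
}
```

Both forms cost a few machine words per token; the index form is what carries over directly to Kotlin or Java.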

Kotlin - Unique Characters in String

My function should return a boolean indicating whether the input String contains all unique characters.
e.g.
"abc" returns true, "abca" returns false
fun uniqueCharacters(s: String): Boolean = s.groupBy { it }
    .values
    .stream()
    .allMatch { it.size == 1 }
Is there a more efficient way of solving this problem? If I were solving this in a non-functional way, I would store the characters in a Map whose values are the count of each character so far; as soon as a count exceeded one, I would break and return false.
Not sure how best to translate this into a functional Kotlin piece of code.
You can use the all function with Set::add as its predicate:
fun main() {
    println("abc".allUnique()) // true
    println("abca".allUnique()) // false
}
fun String.allUnique(): Boolean = all(hashSetOf<Char>()::add)
It short-circuits: the function returns as soon as it finds the first duplicate.
Perhaps the simplest way is to create a Set of the characters, and check its size:
fun String.isUniqueCharacters() = toSet().size == length
(Since this function depends only on the contents of the string, it seems logical to make it an extension function; that makes it easier to call, too.)
As for performance, it's effectively creating a hash table of the characters, and then checking the number of entries, which is the number of unique characters. So that's not trivial. But I can't think of a way which is significantly better.
Other approaches might include:
Copying the characters to an array, sorting it in-place, and then scanning it comparing adjacent elements.  That would save some memory allocation, but needs more processing.
As above, but using a hand-coded sort algorithm that spots duplicates and returns early. That would reduce the processing in cases where there are duplicates, but at the cost of much more coding.  (And the hand-coded sort would probably be slower than a library sort when there aren't duplicates.)
Creating an array of 65536 booleans (one for every possible Char value*), all initialised to false, and then scanning through each character in the string checking the corresponding array value (returning false if it was already set, else setting it).  That would probably be the fastest approach, but takes a lot of memory.  (And the cost of initialising the array could be significant.)
As always, it comes down to trading off memory, processing, and coding effort.
(* There are of course many more characters than this in Unicode, but Kotlin uses UTF-16 internally, so 65536 is all we need.)
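The lookup-table approach from the list above can be sketched in C as follows. This is only an illustration under stated assumptions: the text is assumed to be available as an array of 16-bit UTF-16 code units, the function name all_unique_units is hypothetical, and the 65536 flags are packed into bits (8 KiB) rather than stored as one boolean each, to soften the memory cost mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if no 16-bit code unit occurs twice in units[0..n). */
bool all_unique_units(const uint16_t *units, size_t n)
{
    /* One bit per possible 16-bit value, all initially clear. */
    uint8_t seen[65536 / 8] = {0};

    /* More units than distinct values means a repeat is guaranteed. */
    if (n > 65536)
        return false;

    for (size_t i = 0; i < n; i++) {
        size_t byte = units[i] / 8;
        uint8_t mask = (uint8_t)(1u << (units[i] % 8));

        if (seen[byte] & mask)
            return false; /* already set: first duplicate found */
        seen[byte] |= mask;
    }
    return true;
}
```

Like the Set::add version, it stops at the first duplicate; the fixed cost is zeroing the table on every call.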
Yet another approach:
fun String.uniqueCharacters(): Boolean = this.toCharArray().distinct().size == length

RenderScript Variable types and Element types, simple example

I clearly need to deepen my knowledge of RenderScript memory allocation and data types. (I'm still confused by the sheer number of data types and by finding the correct corresponding types on either side, allocations and elements, or when to refer the forEach to the input, to the output, or to both, etc.) Therefore I will read and re-read the documentation, which is really not bad, but it takes some time to get the necessary "intuition" for how to use it correctly. But for now, please help me with this basic one (and I will return later with hopefully less stupid questions...). I need a very simple kernel that takes an ARGB color Bitmap and returns an integer array of gray values. My attempt was the following:
#pragma version(1)
#pragma rs java_package_name(com.example.xxxx)
#pragma rs_fp_relaxed
uint __attribute__((kernel)) grauInt(uchar4 in) {
    uint gr = (uint) (0.2125*in.r + 0.7154*in.g + 0.0721*in.b);
    return gr;
}
and Java side:
int[] data1 = new int[width*height];
ScriptC_gray graysc;
graysc=new ScriptC_gray(rs);
Type.Builder TypeOut = new Type.Builder(rs, Element.U8(rs));
TypeOut.setX(width).setY(height);
Allocation outAlloc = Allocation.createTyped(rs, TypeOut.create());
Allocation inAlloc = Allocation.createFromBitmap(rs, bmpfoto1,
Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
graysc.forEach_grauInt(inAlloc, outAlloc);
outAlloc.copyTo(data1);
This crashed with the message cannot locate symbol "convert_uint". What's wrong with this conversion? Is the code otherwise correct?
UPDATE: Isn't that ridiculous? I can't get this "easy one" to run, even after 2 hours of trying. I still struggle with the different Element and variable types. Let's recap: input is a Bitmap, output is an int[] array. So why doesn't it work when I use U8 in the Java-side out-allocation, createFromBitmap in the Java-side in-allocation, uchar4 as the kernel input and uint as the kernel output (RSRuntimeException: Type mismatch with U32)?
There is no convert_uint() function. How about simple casting? Other than that, the code looks alright (assuming width and height have correct values).
UPDATE: I have just noticed that you allocate Element.I32 (i.e. signed integer type), but return uint from the kernel. These should match. And in any case, unless you need more than 8-bit precision, you should be able to fit your result in U8.
UPDATE: If you are changing the output type, make sure you change it in all places, e.g. if the kernel returns a uint, the allocation should use U32. If the kernel returns a char, the allocation should use I8. And so on...
You can't use a Uint[] directly because the input Bitmap is actually 2-dimensional. Can you create the output Allocation with a proper width/height and try that? You should still be able to extract the values into a Java array when you are finished.
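To make the arithmetic and the type matching concrete, here is a plain C sketch (not RenderScript) of the same per-pixel computation, producing one 8-bit value per pixel, which is what a U8 allocation holds. The function name rgba_to_gray and the R, G, B, A byte order are assumptions of this sketch, and it rounds to the nearest integer where the kernel above truncates.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts width*height RGBA pixels (4 bytes each, assumed to be in
 * R, G, B, A order) to one 8-bit gray value per pixel, using the same
 * weights as the kernel in the question. */
void rgba_to_gray(const uint8_t *rgba, uint8_t *gray, size_t width, size_t height)
{
    for (size_t i = 0; i < width * height; i++) {
        const uint8_t *px = rgba + 4 * i;
        float g = 0.2125f * px[0] + 0.7154f * px[1] + 0.0721f * px[2];

        /* The weights sum to 1, so the rounded result fits in 0..255. */
        gray[i] = (uint8_t)(g + 0.5f);
    }
}
```

Because the weighted sum never exceeds 255, an 8-bit output element is sufficient; a 32-bit element is only needed if the destination array on the Java side must be an int[].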

Create Managed Array with long/size_t length

Jumping straight to code, this is what I would like to do:
size_t len = obj->someLengthFunctionThatReturnsTypeSizeT();
array<int>^ a = gcnew array<int>(len);
When I try this, I get the error
conversion from size_t to int, possible loss of data
Is there a way I can get this code to compile without explicitly casting to int? I find it odd that I can't initialize an array to this size, especially because there is a LongLength property (and how could you get a length as a long - bigger than int - if you can only initialize a length as an int?).
Thanks!
P.S.: I did find this article that says that it may be impractical to allocate an array that is truly size_t, but I don't think that is a concern. The point is that the length I would like to initialize to is stored in a size_t variable.
Managed arrays use Int32 indices; there is no way around that. You cannot allocate arrays larger than Int32.MaxValue.
You could use the static method Array::CreateInstance (the overload that takes a Type and an array of Int64), and then cast the resulting System::Array to the appropriate actual array type (e.g. array<int>^). Note that the passed values must not be larger than Int32.MaxValue. And you would still need to cast.
So you have at least two options. Either casting:
// Would truncate the value if it is too large
array<int>^ a = gcnew array<int>((int)len);
or this (no need to cast len, but the result of CreateInstance):
// Throws an ArgumentOutOfRangeException if len is too large
array<int>^ a = (array<int>^)Array::CreateInstance(int::typeid, len);
Personally, I find the first better. You still might want to check the actual size of len so that you don't run into any of the mentioned errors.
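The range check suggested above could look like the following sketch. It is written in plain C rather than C++/CLI, and the helper name size_to_int is hypothetical; the same comparison against the largest int value would be done before the cast in the gcnew expression.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores n in *out and returns true if n fits in an int;
 * returns false and leaves *out untouched otherwise. */
bool size_to_int(size_t n, int *out)
{
    if (n > (size_t)INT_MAX)
        return false; /* the cast would lose data */
    *out = (int)n;
    return true;
}
```

With a check like this the explicit cast can no longer truncate silently, and the caller decides what to do with an oversized length.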

passing primitive or struct type as function argument

I'm trying to write some reasonably generic networking code. I have several kinds of packets, each represented by a different struct. The function where all my sending occurs looks like:
- (void)sendUpdatePacket:(MyPacketType)packet{
    for(NSNetService *service in _services)
        for(NSData *address in [service addresses])
            sendto(_socket, &packet, sizeof(packet), 0, [address bytes], [address length]);
}
I would really like to be able to send this function ANY kind of packet, not just MyPacketType packets.
I thought maybe if the function def was:
- (void)sendUpdatePacket:(void*)packetRef
I could pass in any kind of pointer to a packet. But without knowing the type of the packet, I can't dereference the pointer.
How do I write a function to accept any kind of primitive/struct as its argument?
What you are trying to achieve is polymorphism, which is an OO concept.
So while this would be quite easy to implement in C++ (or other OO languages), it's a bit more challenging in C.
One way around it is to create a generic "packet" structure such as this:
typedef struct {
    void (*messageHandler)(int messageLength, int* messageData); // callback for this message type
    int messageLength;
    int* messageData;
} packet;
Where the messageHandler member is a function pointer to a callback routine which can process the message type, and the messageLength and messageData members are fairly self-explanatory.
The idea is that the method you pass the packet struct to would use the Tell, Don't Ask principle to invoke the specific message handler pointed to by messageHandler, passing in the messageLength and messageData without interpreting them.
The dispatch function (pointed to by messageHandler) would be message-specific and will be able to cast the messageData to the appropriate meaningful type, and then the meaningful fields can be extracted from it and processed, etc.
Of course, this is all much easier and more elegant in C++ with inheritance, virtual methods and the like.
Edit:
In response to the comment:
I'm a little unclear how "able to cast the messageData to the appropriate meaningful type, and then the meaningful fields can be extracted from it and processed, etc." would be accomplished.
You would implement a handler for a specific message type, and set the messageHandler member to be a function pointer to this handler. For example:
void messageAlphaHandler(int messageLength, int* messageData)
{
    MessageAlpha* myMessage = (MessageAlpha*)messageData;
    // Can now use MessageAlpha members...
    int messageField = myMessage->field1;
    // etc...
}
You would define messageAlphaHandler() in such a way to allow any class to get a function pointer to it easily. You could do this on startup of the application so that the message handlers are registered from the beginning.
Note that for this system to work, all message handlers would need to share the same function signature (i.e. return type and parameters).
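Put together, the handler-based design described above might look like this minimal C sketch. The names message_handler, message_alpha_handler and dispatch, and the two-field MessageAlpha layout, are hypothetical. Unlike the handler shown earlier, this one returns an int (so the effect is observable), takes the payload as const void*, and copies it with memcpy instead of casting the pointer, which sidesteps alignment concerns.

```c
#include <stddef.h>
#include <string.h>

/* Every handler shares this signature, as required above. */
typedef int (*message_handler)(size_t length, const void *data);

typedef struct {
    message_handler handler; /* callback that understands this payload */
    size_t length;           /* number of encoded bytes */
    const void *data;        /* start of the encoded bytes */
} packet;

/* Hypothetical message type, used only for this illustration. */
typedef struct {
    int field1;
    int field2;
} MessageAlpha;

/* Decodes the payload as a MessageAlpha and returns field1 + field2,
 * or -1 if the payload does not have the expected size. */
int message_alpha_handler(size_t length, const void *data)
{
    MessageAlpha msg;

    if (length != sizeof msg)
        return -1;
    memcpy(&msg, data, sizeof msg); /* raw bytes -> typed struct */
    return msg.field1 + msg.field2;
}

/* The dispatcher never inspects the payload; it only forwards it. */
int dispatch(const packet *p)
{
    return p->handler(p->length, p->data);
}
```

A caller fills in a packet with the handler, the size and the address of its struct, then calls dispatch; adding a new message type means adding a new handler, not changing the dispatcher.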
Or for that matter, how messageData would be created in the first place from my struct.
How are you getting your packet data? Are you creating it manually, or reading it off a socket? Either way, you need to encode it somewhere as a string of bytes. The int* member (messageData) is merely a pointer to the start of the encoded data. The messageLength member is the length of this encoded data.
In your message handler callback, you probably don't want to continue to manipulate the data as raw binary/hex data, but instead interpret the information in a meaningful fashion according to the message type.
Casting it to a struct essentially maps the raw binary information on to a meaningful set of attributes matching to the protocol of the message you are processing.
The key is that you must realize that everything in a computer is just an array of bytes (or, words, or double words).
ZEN MASTER MUSTARD is sitting at his desk, staring at a complex pattern of seemingly random characters on his monitor. A STUDENT approaches.
Student: Master? May I interrupt?
Zen Master Mustard: You have answered your own inquiry, my son.
S: What?
ZMM: By asking your question about interrupting me, you have interrupted me.
S: Oh, sorry. I have a question about moving structures of varying size from place to place.
ZMM: If that is true, then you should consult a master who excels at such things. I suggest you pay a visit to Master DotPuft, who has great knowledge in moving large metal structures, such as tracking radars, from place to place. Master DotPuft can also cause the slightest elements of a feather-weight strain gage to move with the force of a dove's breath. Turn right, then turn left when you reach the door of the hi-bay. There dwells Master DotPuft.
S: No, I mean moving large structures of varying sizes from place to place in the memory of a computer.
ZMM: I may assist you in that endeavor, if you wish. Describe your problem.
S: Specifically, I have a C function that I want to accept several different types of structs (they will be representing different types of packets). So my struct packets will be passed to my function as void*. But without knowing the type, I can't cast them, or really do much of anything. I know this is a solvable problem, because sendto() from socket.h does exactly that:
ssize_t sendto(int socket, const void *message, size_t length, int flags, const struct sockaddr *dest_addr, socklen_t dest_len);
where sendto would be called like:
sendto(socketAddress, &myPacket, sizeof(myPacket), Other args....);
ZMM: Did you describe your problem to Zen Master MANTAR! ?
S: Yeah, he said, "It's just a pointer. Everything in C is a pointer." When I asked him to explain, he said, "Bok, bok, get the hell out of my office."
ZMM: Truly, you have spoken to the master. Did this not help you?
S: Um, er, no. Then I asked Zen Master Max.
ZMM: Wise is he. Was his advice to you useful?
S: No. When I asked him about sendto(), he just swirled his fists in the air and said, "It's just an array of bytes."
ZMM: Indeed, Zen Master Max has tau.
S: Yeah, he has tau, but how do I deal with function arguments of type void*?
ZMM: To learn, you must first unlearn. The key is that you must realize that everything in a computer is just an array of bytes (or words, or double words). Once you have a pointer to the beginning of a buffer, and the length of the buffer, you can send it anywhere without needing to know the type of data placed in the buffer.
S: OK.
ZMM: Consider a string of man-readable text. "You plan a tower that will pierce the clouds? Lay first the foundation of humility." It is 83 bytes long. Or, perhaps, 166 if the evil Unicode is used. Guard yourself against the lies of Unicode! I can submit this text to sendto() by providing a pointer to the beginning of the buffer that contains the string, and the length of the buffer, like so:
char characterBuffer[300]; // 300 bytes
strcpy(characterBuffer, "You plan a tower that will pierce the clouds? Lay first the foundation of humility.");
// note that sizeof(characterBuffer) evaluates to 300 bytes.
sendto(socketAddress, &characterBuffer, sizeof(characterBuffer));
ZMM: Note well that the number of bytes in the character buffer is calculated automatically by the compiler. The result of sizeof has a type called "size_t". It is an unsigned integer type, likely equivalent to "unsigned long" or "unsigned int", but it is compiler dependent.
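A small sketch of the difference between the size of a buffer and the size of the text stored in it, since sizeof reports the former. The helper name text_size is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes needed to transmit a C string, including its
 * terminating NUL, as opposed to the size of the buffer holding it. */
size_t text_size(const char *text)
{
    return strlen(text) + 1;
}
```

For the 300-byte characterBuffer above, sizeof(characterBuffer) is 300, while text_size(characterBuffer) is 84 (83 characters plus the terminator), so passing the latter as the length would send only the bytes that carry the text.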
S: Well, what if I want to send a struct?
ZMM: Let us send a struct, then.
struct myStruct
{
    int integerField; // 4 bytes
    char characterField[300]; // 300 bytes
    float floatField; // 4 bytes
} myStruct;
myStruct.integerField = 8765309;
strcpy(myStruct.characterField, "Jenny, I got your number.");
myStruct.floatField = 876.5309f;
// sizeof(myStruct) evaluates to 4 + 300 + 4 = 308 bytes (assuming 4-byte int and float)
sendto(socketAddress, &myStruct, sizeof(myStruct));
S: Yeah, that's great at transmitting things over TCP/IP sockets. But what about the poor receiving function? How can it tell if I am sending a character array or a struct?
ZMM: One way is to enumerate the different types of data that may be sent, and then send the type of data along with the data. Zen Masters refer to this as "metadata", that is to say, "data about the data". Your receiving function must examine the metadata to determine what kind of data (struct, float, character array) is being sent, and then use this information to cast the data back into its original type. First, consider the transmitting function:
typedef enum
{
    INTEGER_IN_THE_PACKET = 0,
    STRING_IN_THE_PACKET = 1,
    STRUCT_IN_THE_PACKET = 2
} typeBeingSent;
typedef struct
{
    typeBeingSent dataType;
    char data[4096];
} Packet_struct;
Packet_struct myPacket;
myPacket.dataType = STRING_IN_THE_PACKET;
strcpy(myPacket.data, "Nothing great is ever achieved without much enduring.");
sendto(socketAddress, &myPacket, sizeof(Packet_struct));
myPacket.dataType = STRUCT_IN_THE_PACKET;
memcpy(myPacket.data, &myStruct, sizeof(myStruct));
sendto(socketAddress, &myPacket, sizeof(Packet_struct));
S: All right.
ZMM: Now, let us walk along with the receiving function. It must query the type of the data that was sent and then copy the data into a variable declared of that type. Forgive me, but I forget the exact form of the recvfrom() function.
char receivedString[300];
struct myStruct receivedStruct;
recvfrom(socketDescriptor, &myPacket, sizeof(myPacket));
switch(myPacket.dataType)
{
case STRING_IN_THE_PACKET:
    // note the cast of the raw data into type "character pointer"
    strncpy(receivedString, (char*)myPacket.data, sizeof(receivedString) - 1);
    receivedString[sizeof(receivedString) - 1] = '\0';
    printf("The string in the packet was \"%s\".\n", receivedString);
    break;
case STRUCT_IN_THE_PACKET:
    // note the cast of the raw data into type "pointer to myStruct"
    memcpy(&receivedStruct, (struct myStruct *)myPacket.data, sizeof(receivedStruct));
    break;
}
ZMM: Have you achieved enlightenment? First, one asks the compiler for the size of the data (a.k.a. the number of bytes) to be submitted to sendto(). The type of the original data is sent along as well. The receiver then queries for the type of the original data, and uses it to choose the correct cast from "pointer to void" (a generic pointer) over to the type of the original data (int, char[], a struct, etc.).
S: Well, I'll give it a try.
ZMM: Go in peace.
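The whole exchange can be condensed into a small, self-contained C sketch of the tag-plus-payload idea. It is an illustration only: the names tagged_packet, pack_int, pack_string and unpack are hypothetical, the 64-byte payload size is arbitrary, and a plain byte array stands in for the socket, so no sendto() or recvfrom() call is involved. It also assumes sender and receiver share the same struct layout and byte order, which raw struct copies always require.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    INTEGER_IN_THE_PACKET = 0,
    STRING_IN_THE_PACKET = 1
} data_type;

typedef struct {
    data_type type; /* metadata: says what the payload holds */
    char data[64];  /* payload bytes (size chosen arbitrarily here) */
} tagged_packet;

/* Sender side: wrap an int together with its type tag. */
tagged_packet pack_int(int value)
{
    tagged_packet p;
    memset(&p, 0, sizeof p);
    p.type = INTEGER_IN_THE_PACKET;
    memcpy(p.data, &value, sizeof value);
    return p;
}

/* Sender side: wrap a string (truncated if it does not fit). */
tagged_packet pack_string(const char *text)
{
    tagged_packet p;
    memset(&p, 0, sizeof p);
    p.type = STRING_IN_THE_PACKET;
    strncpy(p.data, text, sizeof p.data - 1); /* memset keeps the final NUL */
    return p;
}

/* Receiver side: gets only raw bytes and a length. Reads the tag, then
 * copies the payload into the matching output. Returns the tag, or -1
 * if the length or the tag is not recognised. */
int unpack(const void *bytes, size_t length,
           int *int_out, char *str_out, size_t str_size)
{
    tagged_packet p;

    if (length != sizeof p)
        return -1;
    memcpy(&p, bytes, sizeof p);

    switch (p.type) {
    case INTEGER_IN_THE_PACKET:
        memcpy(int_out, p.data, sizeof *int_out);
        return INTEGER_IN_THE_PACKET;
    case STRING_IN_THE_PACKET:
        if (str_size == 0)
            return -1;
        p.data[sizeof p.data - 1] = '\0'; /* never trust the sender's NUL */
        strncpy(str_out, p.data, str_size - 1);
        str_out[str_size - 1] = '\0';
        return STRING_IN_THE_PACKET;
    default:
        return -1;
    }
}
```

The sending function itself never needs the type: it takes the address of a tagged_packet and sizeof(tagged_packet), exactly as sendto() does with its const void* and size_t arguments.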