Determine whether a String is a compile-time constant - jvm

Given a reference to any String, is it possible to programmatically determine whether this is a reference to a compile time constant?
Or if it's not, then whether it's stored in the intern pool without doing s.intern() == s?
isConst("foo") -> true
isConst("foo" + "bar") -> true // 2 literals, 1 compile time string
isConst(SomeClass.SOME_CONST_STRING) -> true
isConst(readFromFile()) -> false
isConst(readFromFile().intern()) -> false // true would be acceptable too
(context for comments below: the question originally asked about literals)

To clarify the original question, every string literal is a compile-time constant, but not every compile-time constant has to originate from a string literal.
At runtime, there is no difference between a String object that has been constructed for a compile-time constant and one constructed by other means. Strings constructed for compile-time constants are automatically added to a pool, but other strings may be added to the same pool manually via intern(). Since strings are constructed and added lazily, it is even possible to construct and add a string manually, so that compile-time constants with the same value get resolved to that string later on. This answer exploits this possibility to detect when the String instance for a compile-time constant is actually resolved.
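To see that lazy resolution in action, here is a minimal sketch (not part of the linked answer; it assumes HotSpot's lazy resolution of string constants, so other JVMs may behave differently):
public class ResolveDemo {
    public static void main(String[] args) {
        // Build "demo" without writing a literal anywhere, then add it to the pool manually.
        String manual = new String(new char[] {'d', 'e', 'm', 'o'}).intern();
        // The constant below is resolved only when this line first executes; on HotSpot
        // it resolves to the instance that is already in the pool.
        String constant = "demo";
        System.out.println(manual == constant); // true on typical HotSpot JVMs
    }
}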
It’s possible to derive from that answer a method to simply detect whether a string is in the pool or not:
public static boolean isInPool(String s) {
    return s == new String(s.toCharArray()).intern();
}
new String(s.toCharArray()) constructs a string with the same contents, which is not in the pool, and calling intern() on it must resolve to the same reference as s if s refers to an instance in the pool. Otherwise, intern() may resolve to another existing object, or add our string or a newly constructed string and return a reference to it, depending on the implementation, but in either case, the returned reference will be different from s.
Note that this method has the side effect of adding a string to the pool if it wasn’t there before, which will stay there at least to the next garbage collection cycle, perhaps up to the next full gc, depending on the implementation.
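For illustration, this is how the test behaves for the cases from the question (a small, hedged driver; the class name and the runtime-built example are made up, and readFromFile() is omitted because it is hypothetical):
public class PoolCheckDemo {
    public static boolean isInPool(String s) {
        return s == new String(s.toCharArray()).intern();
    }
    public static void main(String[] args) {
        System.out.println(isInPool("foo"));             // true: string literal
        System.out.println(isInPool("foo" + "bar"));     // true: folded into one compile-time constant
        System.out.println(isInPool(new String("foo"))); // false: a runtime copy, not the pooled instance
        String runtime = String.valueOf(new char[] {'x', 'y', 'z'});
        System.out.println(isInPool(runtime));           // false: built at runtime, not the pooled instance
        System.out.println(isInPool(runtime.intern()));  // true: intern() returns the pooled instance
    }
}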
The test method might be nice for debugging or satisfying curiosity, but there is no point in ever using it in production code. Application code should not depend on that property and the use case proposed in a comment, enforcing pooled strings in performance critical code, is not a good idea.
Besides the point that the test itself is expensive and counteracting the purpose of performance improvement, the underlying assumption that pooled strings are better than non-pooled is flawed. Not being in the pool doesn’t imply that the application will perform an expensive reconstruction every time it invokes the performance critical code. It may simply hold a reference in a variable or use a HashMap, both approaches way more efficient than calling intern(). In fact, even temporary strings can be the most efficient solution in some cases.

Related

Fortran Functions with a pointer result in a normal assignment

After some discussion on the question found here Correct execution of Final routine in Fortran
I thought it would be useful to know when a function with a pointer result is appropriate to use with a normal or a pointer assignment. For example, given this simple function
function pointer_result(this)
    implicit none
    type(test_type), intent(in), pointer :: this
    type(test_type), pointer :: pointer_result
    allocate(pointer_result)
end function
I would normally do test=>pointer_result(test), where test has been declared with the pointer attribute. While the normal assignment test=pointer_result(test) is legal, it means something different.
What does the normal assignment imply compared to the pointer assignment?
When does it make sense to use one or the other assignment?
A normal assignment
test = pointer_result(test)
means that the value of the current target of test will be overwritten by the value pointed to by the resulting pointer. If test points to some invalid address (is undefined or null) the program will crash or produce undefined results. The anonymous target allocated by the function will have no pointer to it any more and the memory will be leaked.
There is hardly any legitimate use for this, but it is likely to happen when one makes a typo and writes = instead of =>. It is a very easy mistake to make, and several style guides recommend never using pointer functions.

Int::class.javaPrimitiveType.kotlin reference not equal to Int::class.javaObjectType.kotlin

I think that CASE 2 should also return true. Is this behavior correct?
// CASE 1
Int::class.javaPrimitiveType!!.kotlin == Int::class.javaObjectType.kotlin // true
// CASE 2
Int::class.javaPrimitiveType!!.kotlin === Int::class.javaObjectType.kotlin // false
This behavior is correct. KClass instances for a primitive type and the corresponding object type are equal (==), however they're created from different java.lang.Class instances and since .java always returns the original Class instance the KClass was constructed from, it wouldn't be possible for them to also be identical (===).
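For context, the two properties unwrap to the distinct Class objects int.class and Integer.class, which is why the wrapping KClass instances cannot be identical. In plain Java terms (an illustrative sketch, not taken from the question):
public class ClassIdentityDemo {
    public static void main(String[] args) {
        Class<?> primitive = int.class;     // what Int::class.javaPrimitiveType wraps
        Class<?> wrapper = Integer.class;   // what Int::class.javaObjectType wraps
        System.out.println(primitive == wrapper); // false: two distinct Class instances
        System.out.println(primitive.getName());  // "int"
        System.out.println(wrapper.getName());    // "java.lang.Integer"
    }
}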
Short answer: yes.
Long answer: of course it’s hard to tell what the intended behaviour should be, as none of us was involved in making that decision or writing that code. However, I don’t think that it’s really a requirement that these two objects are in fact the same object; equality is sufficient, reference equality is not required here.

Do repeated objective-c string literals use more run time memory

Putting aside good programming practices; I'll give context after.
With respect to Objective-C string literals @"foobar"
Does this structure...
NSString *kFoobar = #"foobar";
[thing1 setValue:xyz forKey:kFoobar];
[thing2 setValue:abc forKey:kFoobar];
[thing3 setValue:def forKey:kFoobar];
[thing4 setValue:ghi forKey:kFoobar];
Use more runtime memory than this structure...
[thing1 setValue:xyz forKey:@"foobar"];
[thing2 setValue:abc forKey:@"foobar"];
[thing3 setValue:def forKey:@"foobar"];
[thing4 setValue:ghi forKey:@"foobar"];
Or does the compiler sort things out and merge all instances of @"foobar" into a single reference in the TEXT section?
Context...
I have inherited a large amount of source code in which most keys are expressed as string literals rather than string constants. It's not mine, and the owner isn't going to pay for nice-to-haves. Is there any point in spending time on constantifying the strings, from a runtime point of view?
I did pass the exe through strings and it appears as if the compiler does the heavy lifting but I'm not sure.
The two are, for all intents and purposes, identical. Only one instance of a given literal string is created per compilation unit. (And, in fact, in some cases even less, since the system will attempt to combine them.)
The var kFoobar used in the first example would, if a local var, be a temporary which may never be more than a register. At most it would occupy 8 bytes in the stack frame that goes away on method exit. And the compiler would likely load a temp to point to the literal anyway, for the second case. So the code for the two examples could actually be identical.
If kFoobar were some sort of instance or global var then the pointer var itself would of course occupy instance or global space, but it would have no other effect.
And the NSMutableDictionary does not need to make a local copy of the string (when it's used as a key) because NSString is immutable. The single copy is shared by all referencing objects.

Is it bad practice to use a function in an if statement?

I have a function in another class file that gets information about the battery. In a form I have the following code:
If BatteryClass.getStatus_Battery_Charging = True Then
It appears Visual Studio accepts this. However, would it be better if I used the following code, which also works:
Dim val As Boolean = BatteryClass.getStatus_Battery_Charging
If val = True Then
Is there a difference between these two methods?
What you're asking in general is which approach is idiomatic.
The technical rule is not to invoke a method multiple times - unless you're specifically checking a volatile value for change - when its result can be preserved in a locally scoped variable. That's not what you're asking, but it's important to understand that multiple calls should typically be bound to a variable.
That being said, it's better to produce fewer lines of code from a maintenance perspective, as long as doing so improves the readability of your code. If you do have to declare a locally scoped variable to hold the return value of a method, make sure to give the variable a meaningful name.
Prefer this [idiomatic VB.NET] one liner:
If BatteryClass.getStatus_Battery_Charging Then
over this:
Dim isBatteryCharging As Boolean = BatteryClass.getStatus_Battery_Charging
If isBatteryCharging Then
Another point you should concern yourself with is methods that, when invoked, create a side effect affecting the state of your program. In most circumstances it is undesirable to have a side-effect-causing method invoked multiple times - when possible, rewrite such side-effecting methods so that they do not cause any side effects. To limit the number of times a side effect will occur, use the same local variable scoping rule instead of performing multiple invocations of the method.
No real difference.
The second is better if you need the value again of course. It's also marginally easier to debug when you have a value stored in a variable.
Personally I tend to use the first because I'm an old school C programmer at heart!

Most appropriate data structure for dynamic languages field access

I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).
For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switched over all possible field string values and performed the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.
By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than it would to simply test for equality against them all. (This is not tested, just theoretical.)
The first thing I did was, at compile time, to create a red-black tree for each anonymous object declaration and lay it out onto an array so that the object can perform the lookup in a very optimized way.
I am still divided, though, on whether that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually working with a struct of 2 longs.
Those two longs will be encoded to support 10 chars (A-Za-z0-9_) each, which is mostly a good prediction of the size of the fields. For fields larger than this, a special function (slower) receiving a string will also be provided.
The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.
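As a rough illustration of that encoding idea, here is a hedged sketch (in Java rather than C#, with an assumed alphabet ordering and 6-bit packing that are not necessarily the poster's actual scheme):
public final class FieldKey {
    // Pack up to 10 identifier characters (A-Z, a-z, 0-9, _) into a single long,
    // 6 bits per character, so that comparing field names is one long comparison.
    static long pack(String name) {
        if (name.length() > 10) {
            throw new IllegalArgumentException("too long, fall back to the slower string-based path");
        }
        long packed = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            int code;
            if (c >= 'A' && c <= 'Z') code = c - 'A' + 1;       // 1..26
            else if (c >= 'a' && c <= 'z') code = c - 'a' + 27; // 27..52
            else if (c >= '0' && c <= '9') code = c - '0' + 53; // 53..62
            else if (c == '_') code = 63;                       // 0 is reserved for "no character"
            else throw new IllegalArgumentException("unsupported character: " + c);
            packed = (packed << 6) | code;
        }
        return packed;
    }
    public static void main(String[] args) {
        System.out.println(pack("foo") == pack("foo")); // true: equality is a single long comparison
        System.out.println(pack("foo") == pack("bar")); // false
    }
}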
Anyway, it's a little hard to find good information about this kind of optimization, since it is normally considered at the VM level, not in a compiler targeting a static language.
Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?
Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.
I don't know if this is helpful, but I'll chuck it out in case;
If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads
// dynamic
myObject.foo = "some value";
myObject.bar = 32;
then during the parse, your symbol table can build an int for each field name;
// parsing code
symbols[0] == "foo"
symbols[1] == "bar"
then generate code using arrays or lists;
// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar
and build up reflection as a separate array;
runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] === 0; // Dictionary<string, int>
As I say, thrown out in the hope it'll be useful. No idea if it will!
Since you are likely to be using the same field and method names repeatedly, something like string interning would work well to quickly generate keys for your hash tables. It would also make string equality comparisons constant-time.
For such a small data set (expected upper bounds of 15) I think almost any hashing will be more expensive than a tree or even a list lookup, but that is really dependent on your hashing algorithm.
If you want to use a dictionary/hash then you'll need to make sure the objects you use for the key return a hash code quickly (perhaps a single constant hash code that's built once). If you can prevent collisions inside of an object (sounds pretty doable) then you'll gain the speed and scalability (well for any realistic object/class size) of a hash table.
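To illustrate the "constant hash code that's built once" idea, a minimal sketch (hypothetical class; note that Java's String already caches its hash code, so this pattern mostly matters for richer key objects):
public final class FieldName {
    private final String name;
    private final int hash; // computed once at construction
    public FieldName(String name) {
        this.name = name;
        this.hash = name.hashCode();
    }
    @Override
    public int hashCode() {
        return hash; // no recomputation on every lookup
    }
    @Override
    public boolean equals(Object o) {
        return o instanceof FieldName && ((FieldName) o).name.equals(name);
    }
}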
Something that comes to mind is Ruby's symbols and message passing. I believe Ruby's symbols act as constants that are essentially just memory references. So comparison is constant-time, they are very lightweight, and you can use symbols like variables (I'm a little hazy on this and don't have a Ruby interpreter on this machine). Ruby's method "calling" really turns into message passing. Something like: obj.func(arg) turns into obj.send(:func, arg) (":func" is the symbol). I would imagine the symbol makes looking up the message handler (as I'll call it) inside the object pretty efficient, since its hash code most likely doesn't need to be calculated the way most objects' do.
Perhaps something similar could be done in .NET.