How does one deal with multiple pointer levels (like char**) in Squeak FFI? - smalltalk

I want to deal with a structure like this: struct foo {char *name; char **fields; size_t nfields};
If I define the corresponding structure in Squeak
ExternalStructure subclass: #Foo
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'FFI-Tests'.
and define the fields naively with
Foo class>>fields
    ^#(
        (name 'char*')
        (fields 'char**')
        (nfields 'unsigned long')
    )
then generate the accessors with Foo defineFields, I get those undifferentiated types for name and fields:
Foo>>name
    ^ExternalData fromHandle: (handle pointerAt: 1) type: ExternalType char asPointerType
Foo>>fields
    ^ExternalData fromHandle: (handle pointerAt: 5) type: ExternalType char asPointerType
That is troubling: the second indirection is missing for the fields accessor.
How should I specify the fields accessor in the spec?
If not possible, how do I define it manually?
And I have the same problem for this HDF5 function prototype: int H5Tget_array_dims(hid_t tid, hsize_t *dims[])
The following syntax is not accepted:
H5Tget_array_dims: tid with: dims
<cdecl: long 'H5Tget_array_dims'(Hid_t Hsize_t * * )>
The compiler barks argument expected -> before the second *...
I had to resort to void * instead, which totally bypasses type checking. Less than ideal...
Any idea how to deal correctly with such a prototype?

Since Compiler-mt.435, the parser will not complain anymore but call back to ExternalType>>asPointerToPointerType. See source.squeak.org/trunk/Compiler-mt.435.diff and source.squeak.org/FFI/FFI-Kernel-mt.96.diff
At the time of writing this, such a pointer-to-pointer type is treated as a regular pointer type. So, you lose the information that the external type actually points to an array of pointers.
When would one need that information?
When coercing arguments in the FFI plugin during the call
When constructing the returned object in the FFI plugin during the call
When interpreting instances of ExternalData from struct fields and FFI call return values
In tools such as the object explorer
There are already several kinds of RawBitsArray in Squeak. Adding String and ExternalStructure (incl. packed or union) to the mix, we have all kinds of objects in Squeak to map the inner-most dimension (i.e., int*, char*, void*). ExternalData can represent the other levels of the multi-dimensional array (i.e., int**, char**, void** and so on).
So, there are remaining tasks here:
Store that pointer dimension information maybe in a new external type to be found via ExternalType>>referencedType. We may want to put new information into compiledSpec. See http://forum.world.st/FFI-Plugin-Question-about-multi-dimensional-arrays-e-g-char-int-void-td5118484.html
Update value reading in ExternalArray to unwrap one pointer after the other; and let the code generator for struct-field accessors generate code in a similar fashion.
Extend argument coercing in the plugin to accept arrays of the already supported arrays (i.e. String etc.)
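To make the "unwrap one pointer after the other" idea concrete, here is a minimal sketch in Python's ctypes rather than Squeak FFI. It assumes the struct foo layout from the question; the sample strings and the helper read_fields are made up for illustration. The only point is that the field type keeps both indirections, so indexing the outer pointer yields a char* that is then read as a string.

```python
import ctypes


class Foo(ctypes.Structure):
    # Mirrors: struct foo {char *name; char **fields; size_t nfields};
    _fields_ = [
        ("name", ctypes.c_char_p),                    # char*
        ("fields", ctypes.POINTER(ctypes.c_char_p)),  # char**
        ("nfields", ctypes.c_size_t),
    ]


# Hypothetical sample data: a C array of three char* values.
values = [b"id", b"title", b"date"]
array = (ctypes.c_char_p * len(values))(*values)

foo = Foo(
    b"record",
    ctypes.cast(array, ctypes.POINTER(ctypes.c_char_p)),
    len(values),
)


def read_fields(struct):
    """Unwrap the outer pointer element by element; each element is a char*."""
    return [struct.fields[i] for i in range(struct.nfields)]


print(foo.name, read_fields(foo))
```

Because the declared type is a pointer to char* and not a bare pointer, the second level of indirection is resolved by the type itself instead of by hand-written address arithmetic.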

Related

What is ${type}Var in Kotlin/Native?

This documentation is very unclear to me when it tries to say what ${type}Var is.
...for Kotlin enums it is named ${type}Var
wat?! What is Kotlin enums? Regular Kotlin enums?
enum class MyEnum {
FIRST, SECOND
}
I don't think that is what is implied.
Okay, let's look at the examples in this documentation:
struct S* is mapped to CPointer<S> , int8_t* is mapped to CPointer<int_8tVar>
Okay, it's clear
char** is mapped to CPointer<CPointerVar<ByteVar>>
Why is char** mapped to CPointer<CPointerVar<ByteVar>> but not to CPointer<CPointer<Byte>>?
So finally the question is: what is IntVar, LongVar, CPointerVar<T> and other things like ${type}Var?
You should read the whole paragraph again carefully.
All the supported C types have corresponding representations in Kotlin:
Enums can be mapped to Kotlin enum
Also, in C there are lvalues and rvalues (in C++ the equivalent is Type & for lvalues and Type for rvalues). The main distinction is that lvalues can be set to some value, while rvalues can't be changed after initialization. So for each type in C you need its own Kotlin type for the lvalue and for the rvalue.
In the topic
All the supported C types have corresponding representations in Kotlin:
only rvalues are considered.
But for lvalues the only thing you need to do is add Var to the end of the type name. The only exception is
For structs (and typedefs to structs) this representation is the main one and has the same name as the struct itself
Now let's return to enums. The regular Kotlin enums are mapped to the regular C enums. So actually FIRST and SECOND have type MyEnum in both languages. But what if you want to create a variable containing MyEnum for example:
// This is C Code
MyEnum a = FIRST;
a has type MyEnum in C, but it's an lvalue (in C++ that's MyEnum &), so in Kotlin a will have type MyEnumVar because that's exactly what is said in the documentation: ${type}Var, where ${type} = MyEnum.
To the next questions:
The type argument T of CPointer must be one of the "lvalue" types
So for struct S* it should be CPointer<SVar>, but remember that structs are exceptions and we shouldn't add Var, so that's just CPointer<S>.
int8_t* is CPointer<int_8tVar> - no exception here.
char* is CPointer<ByteVar> - again no exception (only lvalue types, except for structs).
char** is CPointer<CPointerVar<ByteVar>> as we need lvalue for CPointer<ByteVar> and that's exactly CPointerVar<ByteVar>.
Finally:
IntVar, LongVar, CPointerVar<T> and other things are lvalues of types int, long, CPointer. That may be needed if you want to change the object in the function. Something like Ref<${type}> in Java.
what is IntVar, LongVar, CPointerVar<T> and other things like ${type}Var?
That's in the beginning of the sentence the end of which you quoted:
the Kotlin type representing the lvalue of this type, i.e., the value located in memory rather than a simple immutable self-contained value
"located in memory" means that you can take their address (using & operator in C, or .ptr in Kotlin).
wat?! What is Kotlin enums? Regular Kotlin enums?
Yes, so when Kotlin/Native sees MyEnum, it also generates MyEnumVar.
Why is char** mapped to CPointer<CPointerVar<ByteVar>> but not to CPointer<CPointer<Byte>>?
CPointer<CPointer<Byte>> is illegal: CPointer's type parameter must extend CPointed, and Byte and CPointer<T> don't. And the reason they need to extend CPointed is because dereferencing a pointer gives an lvalue: something that has an address!
See https://learn.microsoft.com/en-us/cpp/c-language/l-value-and-r-value-expressions or https://eli.thegreenplace.net/2011/12/15/understanding-lvalues-and-rvalues-in-c-and-c/ for more about lvalues in C (and C++).
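As a loose analogy only (Python's ctypes, not Kotlin/Native), the same split between a self-contained value and a value located in memory shows up when you try to take an address. In this sketch a plain Python int plays the role of Byte, ctypes.c_byte plays the role of ByteVar, and a ctypes pointer object, which itself occupies memory, plays the role of CPointerVar<ByteVar>. The variable names are mine.

```python
import ctypes

# A plain Python int has no C storage, so there is nothing to point at.
try:
    ctypes.pointer(5)
    plain_value_has_address = True
except TypeError:
    plain_value_has_address = False

# c_byte is a value located in memory, so it can be pointed at.
byte_var = ctypes.c_byte(65)

# A pointer object is itself stored in memory, so it can be pointed at too.
ptr = ctypes.pointer(byte_var)    # roughly char*
ptr_to_ptr = ctypes.pointer(ptr)  # roughly char**

# Dereference twice and write through to the original storage.
ptr_to_ptr.contents.contents.value = 66

print(plain_value_has_address, byte_var.value)
```

Each dereference hands back something that still has an address, which is the property the CPointed bound enforces on the Kotlin side.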

Smalltalk: How are primitives implemented?

I know that everything is an object and you send messages to objects in Smalltalk to do almost everything.
Now how can we implement an object (memory representation and basic operations) to represent a primitive data type? For example, how is + for integers implemented?
I looked at the source code for Smalltalk and found this in SmallInt.st. Can someone explain this piece of code?
+ arg [
    "Sum the receiver and arg and answer another Number"
    <category: 'built ins'>
    <primitive: VMpr_SmallInteger_plus>
    ^self generality == arg generality
        ifFalse: [self retrySumCoercing: arg]
        ifTrue: [(LargeInteger fromInteger: self) + (LargeInteger fromInteger: arg)]
]
Here is the link of above code: https://github.com/gnu-smalltalk/smalltalk/blob/62dab58e5231909c7286f1e61e26c9f503b2b3df/kernel/SmallInt.st
Conceptually speaking primitive methods are pieces of behavior (routines) implemented by the Virtual Machine (VM), not by regular Smalltalk code.
When the Smalltalk compiler finds the statement <primitive: ...> it interprets this as a special type of method whose argument (in your case VMpr_SmallInteger_plus) indicates the integer index of the target routine within the VM.
In this sense a primitive is a global routine not bound to the MethodDictionary of any particular class. The primitive logic is intended for a receiver and arguments of certain classes, which is why it must check that the receiver and the arguments (if any) conform to its requirements. If they do not, the primitive fails and control flows to the Smalltalk code that follows the <primitive: ...> statement. Otherwise the primitive succeeds and the Smalltalk code below is not executed. Note also that the compiler will not allow any Smalltalk code other than temporary declarations to occur above the <primitive: ...> statement.
In your example, if the argument arg is not of the expected class (presumably a SmallInteger) the routine gives up trying to add it to the receiver and delegates the resolution of the operation to the Smalltalk code.
If the argument happens to be a SmallInteger, the primitive will compute the result (using the routine held in the VM) and answer with it.
I haven't seen the code of this primitive but it could also happen that the primitive fails if the result of the sum does not fit in a SmallInteger, in which case both the receiver and the argument would be cast to LargeIntegers and the addition would take place in the #+ method of the appropriate class (LargePositiveInteger or LargeNegativeInteger).
The other branch of the Smalltalk code allows for the implementation of a polymorphic sum between a SmallInteger and any other type of object. For instance this part of the Smalltalk code would take place if you evaluate 3 + 4.0 because in this case the argument is a Float. Something similar happens if you evaluate 3 + (4 / 3), etc.
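The control flow described above can be sketched in Python. This is only a model of the primitive-then-fallback pattern, not the GNU Smalltalk VM code: the 62-bit SmallInteger range, the PrimitiveFailed exception, and both function names are assumptions made for the illustration.

```python
# Assumed SmallInteger range (illustrative; the real range depends on the VM).
SMALL_MIN = -(2 ** 62)
SMALL_MAX = 2 ** 62 - 1


class PrimitiveFailed(Exception):
    """Signals that the 'VM routine' refused to handle the operands."""


def primitive_small_plus(receiver, arg):
    # The primitive first checks that the argument is what it expects.
    if type(arg) is not int or not (SMALL_MIN <= arg <= SMALL_MAX):
        raise PrimitiveFailed
    result = receiver + arg
    # It also fails if the sum no longer fits in a small integer.
    if not (SMALL_MIN <= result <= SMALL_MAX):
        raise PrimitiveFailed
    return result


def small_plus(receiver, arg):
    try:
        return ("primitive", primitive_small_plus(receiver, arg))
    except PrimitiveFailed:
        pass
    # Fallback code, reached only when the primitive failed.
    if type(arg) is int:
        # Same generality: redo the sum with arbitrary-precision integers.
        return ("large-integer fallback", receiver + arg)
    # Different generality (e.g. a float): coerce and retry.
    return ("coercion fallback", receiver + arg)


print(small_plus(3, 4))
print(small_plus(SMALL_MAX, 1))
print(small_plus(3, 4.0))
```

The tag in each result shows which path produced the answer: the fast path for two small integers, and one of the two fallback branches otherwise.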

Does Fortran permit inline operations on the return value of a function?

I am trying to design a data structure composed of objects which contain, as instance variables, objects of another type.
I'd like to be able to do something like this:
CALL type1_object%get_nested_type2_object()%some_type2_method()
Notice I am trying to immediately use the getter, get_nested_type2_object() and then act on its return value to call a method in the returned type2 object.
As it stands, gfortran v4.8.2 does not accept this syntax and thinks get_nested_type2_object() is an array reference, not a function call. Is there any syntax that I can use to clarify this or does the standard not allow this?
To give a more concrete example, here is some code illustrating this:
furniture_class.F95:
MODULE furniture_class
  IMPLICIT NONE
  TYPE furniture_object
    INTEGER :: length
    INTEGER :: width
    INTEGER :: height
  CONTAINS
    PROCEDURE :: get_length
  END TYPE furniture_object
CONTAINS
  FUNCTION get_length(self)
    IMPLICIT NONE
    CLASS(furniture_object) :: self
    INTEGER :: get_length
    get_length = self%length
  END FUNCTION
END MODULE furniture_class
Now a room object may contain one or more furniture objects.
room_class.F95:
MODULE room_class
  USE furniture_class
  IMPLICIT NONE
  TYPE :: room_object
    CLASS(furniture_object), POINTER :: furniture
  CONTAINS
    PROCEDURE :: get_furniture
  END TYPE room_object
CONTAINS
  FUNCTION get_furniture(self)
    USE furniture_class
    IMPLICIT NONE
    CLASS(room_object) :: self
    CLASS(furniture_object), POINTER :: get_furniture
    get_furniture => self%furniture
  END FUNCTION get_furniture
END MODULE room_class
Finally, here is a program where I attempt to access the furniture object inside the room (but the compiler won't let me):
room_test.F95
PROGRAM room_test
  USE room_class
  USE furniture_class
  IMPLICIT NONE
  CLASS(room_object), POINTER :: room_pointer
  CLASS(furniture_object), POINTER :: furniture_pointer
  ALLOCATE(room_pointer)
  ALLOCATE(furniture_pointer)
  room_pointer%furniture => furniture_pointer
  furniture_pointer%length = 10
  ! WRITE(*,*) 'The length of furniture in the room is', room_pointer%furniture%get_length() - This works.
  WRITE(*,*) 'The length of furniture in the room is', room_pointer%get_furniture()%get_length() ! This line fails to compile
END PROGRAM room_test
I can of course directly access the furniture object if I don't use a getter to return the nested object, but this ruins the encapsulation and can become problematic in production code that is much more complex than what I show here.
Is what I am trying to do not supported by the Fortran standard or do I just need a more compliant compiler?
What you want to do is not supported by the syntax of the standard language.
(Variations on the general syntax (not necessarily this specific case) that might apply for "dereferencing" a function result could be ambiguous - consider things like substrings, whole array references, array sections, etc.)
Typically you [pointer] assign the result of the first function call to a [pointer] variable of the appropriate type, and then apply the binding for the second function to that variable.
Alternatively, if you want to apply an operation to a primary in an expression (such as a function reference) to give another value, then you could use an operator.
Some, perhaps rather subjective, comments:
Your room object doesn't really contain a furniture object - it holds a reference to a furniture object. Perhaps you use that reference in a manner that implies the parent object "containing" it, but that's not what the component definition naturally suggests.
(Use of a pointer component suggests that you want the room to point at (i.e. reference) some furniture. In terms of the language, the object referenced by a pointer component is not usually considered part of the value of the parent object of the component - consider how intrinsic assignment works, restrictions around modifying INTENT(IN) arguments, etc.
A non-pointer component suggests to me that the furniture is part of the room. In a Fortran language sense, an object that is a non-pointer component is always part of the value of the parent object of the component.
To highlight - pointer components in different rooms could potentially point at the same piece of furniture; a non-pointer furniture object is only ever directly part of one room.)
You need to be very careful using functions with pointer results. In the general case, is it:
p = some_ptr_function(args)
(and perhaps I accidentally leak memory) or
p => some_ptr_function(args)
Only one little character difference, both valid syntax, quite different semantics. If the second case is what is intended, then why not just pass the pointer back via a subroutine argument? An inconsequential difference in typing and it is much safer.
A general reminder applicable to some of the above - in the context of an expression, evaluation of a function reference yields a value. Values are not variables and hence you are not permitted to vary [modify] them.

How does Julia recognize values as singleton types?

It is a cool feature of Julia that values can be used as types, at least as type parameters. For example, one can assert that arrays are of a particular dimensionality, such as x :: Array{Int,2}. My question is: how does Julia do that and how do users of Julia get access to that power? I assume that 2 is being converted to or interpreted as some sort of singleton type of 2. I am curious to know what function does that conversion. I tried to assert 2 :: Type{2} and isa(2, Type{2}), but that only asserts a singleton if 2 is replaced by an actual type.
You cannot define your own immutables and use them as singleton types (yet).
Currently, anything that makes static int valid_type_param(jl_value_t *v), defined in jltypes.c, return true can be used as a type parameter. There is a TODO to add more types, and you'll probably just need a compelling use case to get help changing the behaviour.
Update:
See also the manual documentation on types: Both abstract and concrete types can be parameterized by other types and by certain other values (currently integers, symbols, bools, and tuples thereof). Type parameters may be completely omitted when they do not need to be referenced or restricted.

Does static typing mean that you have to cast a variable if you want to change its type?

Are there any other ways of changing a variable's type in a statically typed language like Java and C++, except 'casting'?
I'm trying to figure out what the main difference is in practical terms between dynamic and static typing and keep finding very academic definitions. I'm wondering what it means in terms of what my code looks like.
Make sure you don't get static vs. dynamic typing confused with strong vs. weak typing.
Static typing: Each variable, method parameter, return type etc. has a type known at compile time, either declared or inferred.
Dynamic typing: types are ignored/don't exist at compile time
Strong typing: each object at runtime has a specific type, and you can only perform those operations on it that are defined for that type.
Weak typing: runtime objects either don't have an explicit type, or the system attempts to automatically convert types wherever necessary.
These two opposites can be combined freely:
Java is statically and strongly typed
C is statically and weakly typed (pointer arithmetic!)
Ruby is dynamically and strongly typed
JavaScript is dynamically and weakly typed
Generally, static typing means that a lot of errors are caught by the compiler which are runtime errors in a dynamically typed language - but it also means that you spend a lot of time worrying about types, in many cases unnecessarily (see interfaces vs. duck typing).
Strong typing means that any conversion between types must be explicit, either through a cast or through the use of conversion methods (e.g. parsing a string into an integer). This means more typing work, but has the advantage of keeping you in control of things, whereas weak typing often results in confusion when the system does some obscure implicit conversion that leaves you with a completely wrong variable value that causes havoc ten method calls down the line.
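A short sketch of the "dynamic and strong" combination, using Python as a stand-in (my choice, not a language named in the list above). The variable names are illustrative. The same name may be rebound to a value of another type, but mixing types in one operation needs an explicit conversion.

```python
x = "5"                        # x currently refers to a str
seen_types = [type(x).__name__]

x = int(x) * 5                 # explicit conversion, then x is rebound to an int
seen_types.append(type(x).__name__)

# Strong typing: no implicit str/int conversion is attempted here.
try:
    "5" + 5
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(seen_types, x, mixed_ok)
```

Compare this with the JavaScript session further down, where "5" * 5 is converted implicitly and silently yields a number.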
In C++/Java you can't change the type of a variable.
Static typing: A variable has one type assigned at compile time and that does not change.
Dynamic typing: A variable's type can change at runtime, e.g. in JavaScript:
js> x="5" <-- String
5
js> x=x*5 <-- Number
25
The main difference is that in dynamically typed languages you don't know until you go to use a method at runtime whether that method exists. In statically typed languages the check is made at compile time and the compilation fails if the method doesn't exist.
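A minimal Python sketch of that point; the class and function names are invented for the example. Nothing checks whether quack exists until the call actually runs, so the bad call only fails at runtime.

```python
class Duck:
    def quack(self):
        return "Quack"


def make_noise(animal):
    # No check happens here until this line is executed.
    return animal.quack()


first = make_noise(Duck())

try:
    make_noise(42)             # int has no quack method
    failed_at_runtime = False
except AttributeError:
    failed_at_runtime = True

print(first, failed_at_runtime)
```

In a statically typed language the equivalent of make_noise(42) would be rejected by the compiler before the program could run.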
I'm wondering what it means in terms of what my code looks like.
The type system does not necessarily have any impact on what code looks like, e.g. languages with static typing, type inference and implicit conversion (like Scala for instance) look a lot like dynamically typed languages. See also: What To Know Before Debating Type Systems.
You don't need explicit casting. In many cases implicit casting works.
For example:
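int i = 42;
float f = i; // implicit int -> float conversion; f == 42.0f
int b = f; // implicit float -> int conversion; b == 42
class Base {
public:
    virtual ~Base() {} // dynamic_cast requires a polymorphic base class
};
class Subclass : public Base {
};
Subclass *subclass = new Subclass();
Base *base = subclass; // Legal: implicit derived-to-base conversion
Subclass *s = dynamic_cast<Subclass *>(base); // == subclass. Performs type checking. If base isn't a Subclass, NULL is returned instead. (This is type-safe explicit casting.)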
You cannot, however, change the type of a variable. You can use unions in C++, though, to achieve some sort of dynamic typing.
Let's look at Java for the statically typed language and JavaScript for the dynamic one. In Java, for objects, the variable is a reference to an object. The object has a runtime type and the reference has a type. The type of the reference must be the type of the runtime object or one of its ancestors. This is how polymorphism works. You have to cast to go down the hierarchy (from an ancestor type to a more specific type), but not up. The compiler ensures that these conditions are met. In a language like JavaScript, your variable is just that, a variable. You can have it point to whatever object you want, and you don't know the type of it until you check.
For conversions, though, there are lots of methods in Java, such as Integer.parseInt and Float.parseFloat, that do a conversion and generate a value of a new type with the same relative value. In JavaScript there are also conversion methods, and they generate new values too.
Your code should actually not look very much different, regardless of whether you are using a statically typed language or not. Just because you can change the data type of a variable in a dynamically typed language doesn't mean that it is a good idea to do so.
In VBScript, for example, Hungarian notation is often used to specify the preferred data type of a variable. That way you can easily spot if the code is mixing types. (This was not the original use of Hungarian notation, but it's pretty useful.)
By keeping to the same data type, you avoid situations where it's hard to tell what the code actually does, and situations where the code simply doesn't work properly. For example:
Dim id
id = Request.QueryString("id") ' this variable is now a string
If id = "42" Then
id = 142 ' sometimes turned into a number
End If
If id > 100 Then ' will not work properly for strings
Using Hungarian notation you can spot code that is mixing types, like:
lngId = Request.QueryString("id") ' putting a string in a numeric variable
strId = 42 ' putting a number in a string variable