OCaml module types and separate compilation - module

I am reading through OCaml lead designer's 1994 paper on modules, types, and separate compilation. (kindly pointed to me by Norman Ramsey in another question ). I understand that the paper discusses the origins of OCaml's present module type / signature system. It it, the author proposes opaque interpretation of type declarations in signatures (to allow separate compilation) together with manifest type declarations (for expressiveness). Attempting to put together some examples of my own to demonstrate the kind of problems the OCaml module signature notation is trying to tackle I wrote the following code in two files:
In file ordering.ml (or .mli — I've tried both) (file A):
module type ORDERING = sig
type t
val isLess : t -> t -> bool
end
and in file useOrdering.ml (file B):
open Ordering
module StringOrdering : ORDERING
let main () =
Printf.printf "%b" StringOrdering.isLess "a" "b"
main ()
The idea being to expect the compiler to complain (when compiling the second file) that not enough type information is available on module StringOrdering to typecheck the StringOrdering.isLess application (and thus motivate the need for the with type syntax).
However, although file A compiles as expected, file B causes the 3.11.2 ocamlc to complain for a syntax error. I understood that signatures were meant to allow someone to write code based on the module signature, without access to the implementation (the module structure).
I confess that I am not sure about the syntax: module A : B which I encountered in this rather old paper on separate compilation but it makes me wonder whether such or similar syntax exists (without involving functors) to allow someone to write code based only on the module type, with the actual module structure provided at linking time, similar to how one can use *.h and *.c files in C/C++. Without such an ability it would seem to be that module types / signatures are basically for sealing / hiding the internals of modules or more explicit type checking / annotations but not for separate / independent compilation.
Actually, looking at the OCaml manual section on modules and separate compilation it seems that my analogy with C compilation units is broken because the OCaml manual defines the OCaml compilation unit to be the A.ml and A.mli duo, whereas in C/C++ the .h files are pasted to the compilation unit of any importing .c file.

The right way to do such a thing is to do the following:
In ordering.mli write:
(* This define the signature *)
module type ORDERING = sig
type t
val isLess : t -> t -> bool
end
(* This define a module having ORDERING as signature *)
module StringOrdering : ORDERING
Compile the file: ocamlc -c ordering.mli
In another file, refer to the compiled signature:
open Ordering
let main () =
Printf.printf "%b" (StringOrdering.isLess "a" "b")
let () = main ()
When you compile the file, you get the expected type error (ie. string is not compatible with Ordering.StringOrdering.t). If you want to remove the type error, you should add the with type t = string constraint to the definition of StringOrdering in ordering.mli.
So answer to you second question: yes, in bytecode mode the compiler just needs to know about the interfaces your are depending on, and you can choose which implementation to use at link time. By default, that's not true for native code compilation (because of inter-module optimizations) but you can disable it.

You are probably just confused by the relation between explicit module and signature definitions, and the implicit definition of modules through .ml/.mli files.
Basically, if you have a file a.ml and use it inside some other file, then it is as if you had written
module A =
struct
(* content of file a.ml *)
end
If you also have a.mli, then it is as if you had written
module A :
sig
(* content of file a.mli *)
end =
struct
(* content of file a.ml *)
end
Note that this only defines a module named A, not a module type. A's signature cannot be given a name through this mechanism.
Another file using A can be compiled against a.mli alone, without providing a.ml at all. However, you want to make sure that all type information is made transparent where needed. For example, suppose you are to define a map over integers:
(* intMap.mli *)
type key = int
type 'a map
val empty : 'a map
val add : key -> 'a -> 'a map -> 'a map
val lookup : key -> 'a map -> 'a option
...
Here, key is made transparent, because any client code (of the module IntMap that this signature describes) needs to know what it is to be able to add something to the map. The map type itself, however, can (and should) be kept abstract, because a client shouldn't mess with its implementation details.
The relation to C header files is that those basically only allow transparent types. In Ocaml, you have the choice.

module StringOrdering : ORDERING is a module declaration. You can use this in a signature, to say that the signature contains a module field called StringOrdering and having the signature ORDERING. It doesn't make sense in a module.
You need to define a module somewhere that implements the operations you need. The module definition can be something like
module StringOrderingImplementation = struct
type t = string
let isLess x y = x <= y
end
If you want to hide the definition of the type t, you need to make a different module where the definition is abstract. The operation to make a new module out of an old one is called sealing, and is expressed through the : operator.
module StringOrderingAbstract = (StringOrdering : ORDERING)
Then StringOrderingImplementation.isLess "a" "b" is well-typed, whereas StringOrderingAbstract.isLess "a" "b" cannot be typed since StringOrderingAbstract.t is an abstract type, which is not compatible with string or any other preexisting type. In fact, it's impossible to build a value of type StringOrderingAbstract.t, since the module does not include any constructor.
When you have a compilation unit foo.ml, it is a module Foo, and the signature of this module is given by the interface file foo.mli. That is, the files foo.ml and foo.mli are equivalent to the module definition
module Foo = (struct (*…contents of foo.ml…*) end :
sig (*…contents of foo.mli…*) end)
When compiling a module that uses Foo, the compiler only looks at foo.mli (or rather the result of its compilation: foo.cmi), not at foo.ml¹. This is how interfaces and separate compilation fit together. C needs #include <foo.h> because it lacks any form of namespace; in OCaml, Foo.bar automatically refers to a bar defined in the compilation unit foo if there is no other module called Foo in scope.
¹ Actually, the native code compiler looks at the implementation of Foo to perform optimizations (inlining). The type checker never looks at anything but what is in the interface.

Related

Separating operator definitions for a class to other files and using them

I have 4 files all in the same directory: main.rakumod, infix_ops.rakumod, prefix_ops.rakumod and script.raku:
main module has a class definition (class A)
*_ops modules have some operator routine definitions to write, e.g., $a1 + $a2 in an overloaded way.
script.raku tries to instantaniate A object(s) and use those user-defined operators.
Why 3 files not 1? Since class definition might be long and separating overloaded operator definitions in files seemed like a good idea for writing tidier code (easier to manage).
e.g.,
# main.rakumod
class A {
has $.x is rw;
}
# prefix_ops.rakumod
use lib ".";
use main;
multi prefix:<++>(A:D $obj) {
++$obj.x;
$obj;
}
and similar routines in infix_ops.rakumod. Now, in script.raku, my aim is to import main module only and see the overloaded operators also available:
# script.raku
use lib ".";
use main;
my $a = A.new(x => -1);
++$a;
but it naturally doesn't see ++ multi for A objects because main.rakumod doesn't know the *_ops.rakumod files as it stands. Is there a way I can achieve this? If I use prefix_ops in main.rakumod, it says 'use lib' may not be pre-compiled perhaps because of circular dependentness
it says 'use lib' may not be pre-compiled
The word "may" is ambiguous. Actually it cannot be precompiled.
The message would be better if it said something to the effect of "Don't put use lib in a module."
This has now been fixed per #codesections++'s comment below.
perhaps because of circular dependentness
No. use lib can only be used by the main program file, the one directly run by Rakudo.
Is there a way I can achieve this?
Here's one way.
We introduce a new file that's used by the other packages to eliminate the circularity. So now we have four files (I've rationalized the naming to stick to A or variants of it for the packages that contribute to the type A):
A-sawn.rakumod that's a role or class or similar:
unit role A-sawn;
Other packages that are to be separated out into their own files use the new "sawn" package and does or is it as appropriate:
use A-sawn;
unit class A-Ops does A-sawn;
multi prefix:<++>(A-sawn:D $obj) is export { ++($obj.x) }
multi postfix:<++>(A-sawn:D $obj) is export { ($obj.x)++ }
The A.rakumod file for the A type does the same thing. It also uses whatever other packages are to be pulled into the same A namespace; this will import symbols from it according to Raku's standard importing rules. And then relevant symbols are explicitly exported:
use A-sawn;
use A-Ops;
sub EXPORT { Map.new: OUTER:: .grep: /'fix:<'/ }
unit class A does A-sawn;
has $.x is rw;
Finally, with this setup in place, the main program can just use A;:
use lib '.';
use A;
my $a = A.new(x => -1);
say $a++; # A.new(x => -1)
say ++$a; # A.new(x => 1)
say ++$a; # A.new(x => 2)
The two main things here are:
Introducing an (empty) A-sawn package
This type eliminates circularity using the technique shown in #codesection's answer to Best Way to Resolve Circular Module Loading.
Raku culture has a fun generic term/meme for techniques that cut through circular problems: "circular saws". So I've used a -sawn suffix of the "sawn" typename as a convention when using this technique.[1]
Importing symbols into a package and then re-exporting them
This is done via sub EXPORT { Map.new: ... }.[2] See the doc for sub EXPORT.
The Map must contain a list of symbols (Pairs). For this case I've grepped through keys from the OUTER:: pseudopackage that refers to the symbol table of the lexical scope immediately outside the sub EXPORT the OUTER:: appears in. This is of course the lexical scope into which some symbols (for operators) have just been imported by the use Ops; statement. I then grep that symbol table for keys containing fix:<; this will catch all symbol keys with that string in their name (so infix:<..., prefix:<... etc.). Alter this code as needed to suit your needs.[3]
Footnotes
[1] As things stands this technique means coming up with a new name that's different from the one used by the consumer of the new type, one that won't conflict with any other packages. This suggests a suffix. I think -sawn is a reasonable choice for an unusual and distinctive and mnemonic suffix. That said, I imagine someone will eventually package this process up into a new language construct that does the work behind the scenes, generating the name and automating away the manual changes one has to make to packages with the shown technique.
[2] A critically important point is that, if a sub EXPORT is to do what you want, it must be placed outside the package definition to which it applies. And that in turn means it must be before a unit package declaration. And that in turn means any use statement relied on by that sub EXPORT must appear within the same or outer lexical scope. (This is explained in the doc but I think it bears summarizing here to try head off much head scratching because there's no error message if it's in the wrong place.)
[3] As with the circularity saw aspect discussed in footnote 1, I imagine someone will also eventually package up this import-and-export mechanism into a new construct, or, perhaps even better, an enhancement of Raku's built in use statement.
Hi #hanselmann here is how I would write this (in 3 files / same dir):
Define my class(es):
# MyClass.rakumod
unit module MyClass;
class A is export {
has $.x is rw;
}
Define my operators:
# Prefix_Ops.rakumod
unit module Prefix_Ops;
use MyClass;
multi prefix:<++>(A:D $obj) is export {
++$obj.x;
$obj;
}
Run my code:
# script.raku
use lib ".";
use MyClass;
use Prefix_Ops;
my $a = A.new(x => -1);
++$a;
say $a.x; #0
Taking my cue from the Module docs there are a couple of things I am doing different:
Avoiding the use of main (or Main, or MAIN) --- I am wary that MAIN is a reserved name and just want to keep clear of engaging any of that (cool) machinery
Bringing in the unit module declaration at the top of each 'rakumod' file ... it may be possible to use bare files in Raku ... but I have never tried this and would say that it is not obvious from the docs that it is even possible, or supported
Now since I wanted this to work first time you will note that I use the same file name and module name ... again it may be possible to do that differently (multiple modules in one file and so on) ... but I have not tried that either
Using the 'is export' trait where I want my script to be able to use these definitions ... as you will know from close study of the docs ;-) is that each module has it's own namespace (the "stash") and we need export to shove the exported definitions into the namespace of the script
As #raiph mentions you only need the script to define the module library location
Since you want your prefix multi to "know" about class A then you also need to use MyClass in the Prefix_Ops module
Anyway, all-in-all, I think that the raku module system exemplifies the unique combination of "easy things easy and hard thinks doable" ... all I had to do with your code (which was very close) was tweak a few filenames and sprinkle in some concise concepts like 'unit module' and 'is export' and it really does not look much different since raku keeps all the import/export machinery under the surface like the swan gliding over the river...

How to get a module type from an interface?

I would like to have my own implementation of an existing module but to keep a compatible interface with the existing module. I don't have a module type for the existing module, only an interface. So I can't use include Original_module in my interface. Is there a way to get a module type from an interface?
An example could be with the List module from the stdlib. I create a My_list module with exactly the same signature than List. I could copy list.mli to my_list.mli, but it does not seem very nice.
In some cases, you should use
include module type of struct include M end (* I call it OCaml keyword mantra *)
rather than
include module type of M
since the latter drops the equalities of data types with their originals defined in M.
The difference can be observed by ocamlc -i xxx.mli:
include module type of struct include Complex end
has the following type definition:
type t = Complex.t = { re : float; im : float; }
which means t is an alias of the original Complex.t.
On the other hand,
include module type of Complex
has
type t = { re : float; im : float; }
Without the relation with Complex.t, it becomes a different type from Complex.t: you cannot mix code using the original module and your extended version without the include hack. This is not what you want usually.
You can look at RWO : if you want to include the type of a module (like List.mli) in another mli file :
include (module type of List)

PETSC header #include'd in a module

I have a module which holds global variables. To declare some global variables, I need to use HDF5. I am also using a library, so I also need to include a header file. So the preamble of global_variable.F90 looks like this.
module global_variables
use HDF5
#include "finclude/petscsys.h"
#include "finclude/petscvec.h"
integer(HID_T) id_file
integer(HID_T) id_plist
Vec M, C, K
...
end module
Vec is a data type defined in the header file and HID_T is a data type defined in HDF5 module.
Now, I have a file which holds subroutines for I/O. This file also uses HDF5 and the same library used in global_variables.F90. So IO.F90 looks like this.
module io
use global_varibles
contains
subroutine read_input_file( vector )
Vec vector
integer HDF5err
call H5open_f( HDF5err )
...
end subroutine
end module
Question 1: compiler returns error when compiling IO.F90, saying that Vec is undefined data type. But it does not complain about HID_T. I thought global_variables module already contains both HDF5 modules and header files, using global_variables module in IO.F90 will handle every data type declaration but it seems not. Could you please help me understand what I am understanding wrong?
Question 2: Is there a way to restrict the effect of #include to the module where it is declared?
PS. If I include #include "finclude/petscvec.h" in IO.F90, which declares Vec, then it compiles well.
The syntax
Vec vector
is completely alien to Fortran. It works only because Vec is a C pre-processor (CPP) macro defined in the header file "finclude/petscvec.h" as
#define Vec PetscFortranAddr
That means that you must include the header file in every Fortran file in which you use the above syntax with Vec. The macro cannot be inherited using the Fortran use because it is not a part of Fortran.
The PetscFortranAddr is in the end defined in "finclude/petscdef.h" as an integer with 4 or 8 bytes depending on your system.
There is probably nothing you can do except reverse engineering what it in the end the pre-processor makes to be, but I wouldn't go that way, it may be unportable.

Including a module more than once

Suppose I have a module which defines some basic constants such as
integer, parameter :: i8 = selected_int_kind(8)
If I include this in my main program and I also include a module which does some other things (call this module functions) but functions also uses constants, then am I essentially including constants twice in my main program?
If so, is this bad? Can it be dangerous at all to include a module too many times in a program?
No, it is fine to do this. All you are doing with the use statement is providing access to the variables and functions defined in your module via use association. It is not like declaring variables each time they are use'd (they are in fact redeclared however).
The only thing to be wary of are circular dependencies, where module A uses module B and module B uses module A. This is not allowed.
Edit: From Metcalf et al. Fortran 95/2003 explained, pg. 72:
A module may contain use statements that access other modules. It must not access itself directly or indirectly through a chain of use statements, for example a accessing b and b accessing a.
Whilst this quote doesn't directly answer your question, it reiterates that really the only thing you can't do is have a circular dependency. So the following is perfectly valid:
module one_def
implicit none
integer, parameter :: one=1
end module one_def
module two_def
use one_def, only : one
implicit none
integer, parameter :: two=one+one
end module two_def
program test
use one_def, only : one
use two_def, only : two
implicit none
print*, two == one+one ! This prints .True.
end program

Understanding functors in OCaml

I'm quite stuck with the following functor problem in OCaml. I paste some of the code just to let you understand. Basically
I defined these two modules in pctl.ml:
module type ProbPA = sig
include Hashtbl.HashedType
val next: t -> (t * float) list
val print: t -> float -> unit
end
module type M = sig
type s
val set_error: float -> unit
val check: s -> formula -> bool
val check_path: s -> path_formula -> float
val check_suite: s -> suite -> unit
end
and the following functor:
module Make(P: ProbPA): (M with type s = P.t) = struct
type s = P.t
(* implementation *)
end
Then to actually use these modules I defined a new module directly in a file called prism.ml:
type state = value array
type t = state
type value =
| VBOOL of bool
| VINT of int
| VFLOAT of float
| VUNSET
(* all the functions required *)
From a third source (formulas.ml) I used the functor with Prism module:
module PrismPctl = Pctl.Make(Prism)
open PrismPctl
And finally from main.ml
open Formulas.PrismPctl
(* code to prepare the object *)
PrismPctl.check_suite s.sys_state suite (* error here *)
and compiles gives the following error
Error: This expression has type Prism.state = Prism.value array
but an expression was expected of type Formulas.PrismPctl.s
From what I can understand there a sort of bad aliasing of the names, they are the same (since value array is the type defined as t and it's used M with type s = P.t in the functor) but the type checker doesn't consider them the same.
I really don't understand where is the problem, can anyone help me?
Thanks in advance
(You post non-compilable code. That's a bad idea because it may make it harder for people to help you, and because reducing your problem down to a simple example is sometimes enough to solve it. But I think I see your difficulty anyway.)
Inside formulas.ml, Ocaml can see that PrismPctl.s = Pctl.Make(Prism).t = Prism.t; the first equality is from the definition of PrismPctl, and the second equality is from the signature of Pctl.Make (specifically the with type s = P.t bit).
If you don't write an mli file for Formulas, your code should compile. So the problem must be that the .mli file you wrote doesn't mention the right equality. You don't show your .mli files (you should, they're part of the problem), but presumably you wrote
module PrismPctl : Pctl.M
That's not enough: when the compiler compiles main.ml, it won't know anything about PrismPctl that's not specified in formulas.mli. You need to specify either
module PrismPctl : Pctl.M with type s = Prism.t
or, assuming you included with type s = P.t in the signature of Make in pctl.mli
module PrismPctl : Pctl.M with type s = Pctl.Make(Prism).s
This is a problem I ran into as well when learning more about these. When you create the functor you expose the signature of the functor, in this case M. It contains an abstract type s, parameterized by the functor, and anything more specific is not exposed to the outside. Thus, accessing any record element of s (as in sys_state) will result in a type error, as you've encountered.
The rest looks alright. It is definitely hard to get into using functors properly, but remember that you can only manipulate instances of the type parameterized by the functor through the interface/signature being exposed by the functor.