I would like to design two modules, A and B, which both have their own functions, for instance A.compare: A.t -> A.t -> bool and B.compare: B.t -> B.t -> bool. The elements of A and B are convertible, so I would also need functions a_of_b : B.t -> A.t and b_of_a : A.t -> B.t. My question is: where should I define these functions? Inside the structure of A, inside the structure of B, or somewhere else?
Could anyone help?
Edit1: just amended some errors based on the first comment
This is a classic design problem. In OOP languages, it is hard to resolve elegantly because a class encapsulates both a type definition and the methods related to that type. Thus, as soon as you have a function such as a_of_b, which concerns two types equally, there is no clear place for it.
OCaml rightly provides distinct language mechanisms for these distinct needs: type definitions are introduced with the keyword type, and related functions are collected together in a module. This gives you greater flexibility in designing your API, but it does not solve the problem automatically.
One possibility is to define modules A and B, both with their respective types and compare functions. Then, the question remaining is where to put a_of_b and b_of_a. You could arbitrarily give preference to module A, and define the functions A.to_b and A.of_b. This is what the Standard Library did when it put to_list and of_list in Array. This lacks symmetry; there is no reason not to have put these functions in B instead.
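A minimal sketch of this asymmetric layout, assuming (purely for illustration) that A.t is an int list and B.t is an int array, which mirrors Array.to_list and Array.of_list. B is defined first and knows nothing about A; A owns both conversions.

```ocaml
(* B is self-contained and knows nothing about A. *)
module B = struct
  type t = int array
  let compare (x : t) (y : t) : bool = x = y
end

(* A depends on B and owns both conversions. *)
module A = struct
  type t = int list
  let compare (x : t) (y : t) : bool = x = y
  let to_b (a : t) : B.t = Array.of_list a
  let of_b (b : B.t) : t = Array.to_list b
end
```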
Instead, you could standardize on either of_ functions or to_ functions. Let's say you prefer to_. Then you would define the functions A.to_b and B.to_a. The problem now is that modules A and B are mutually dependent, which is only possible if you define them in the same file (as recursive modules).
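A sketch of the same-file solution using OCaml's recursive modules (module rec ... and ...), which require explicit signatures for each module. The representations A.t = int list and B.t = int array are illustrative assumptions.

```ocaml
(* Recursive modules: each signature may mention the other module. *)
module rec A : sig
  type t = int list
  val compare : t -> t -> bool
  val to_b : t -> B.t
end = struct
  type t = int list
  let compare (x : t) (y : t) = x = y
  let to_b (a : t) : B.t = Array.of_list a
end

and B : sig
  type t = int array
  val compare : t -> t -> bool
  val to_a : t -> A.t
end = struct
  type t = int array
  let compare (x : t) (y : t) = x = y
  let to_a (b : t) : A.t = Array.to_list b
end
```

Both modules only contain functions, so the compiler can initialise them safely despite the cycle.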
If you will have lots of functions that deal with values of type A.t and B.t, then it may be worth defining a module AB, and putting all these functions in there. If you will only need two, then an extra module is perhaps overkill.
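A sketch of the third-module layout, again with the illustrative assumption that A.t is an int list and B.t is an int array. A and B stay independent of each other, and only the hypothetical module AB depends on both.

```ocaml
module A = struct
  type t = int list
  let compare (x : t) (y : t) : bool = x = y
end

module B = struct
  type t = int array
  let compare (x : t) (y : t) : bool = x = y
end

(* Functions that concern both types equally live in a third module,
   so neither A nor B has to know about the other. *)
module AB = struct
  let a_of_b (b : B.t) : A.t = Array.to_list b
  let b_of_a (a : A.t) : B.t = Array.of_list a
end
```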
On the other hand, if the total number of functions regarding A's and B's is small, you could create only the module AB, with type a, type b, and all related methods. However, this does not follow the OCaml community's convention of naming a type t within its own module, and it will be harder to apply the Set and Map functors to these types.
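A sketch of the single-module alternative; the type names a and b and the int list / int array representations are assumptions. Because Set.Make expects a module with a type t and an int-valued compare, each type needs a small adapter module before the functor can be applied, which illustrates the inconvenience mentioned above.

```ocaml
(* One module holds both types and every function relating them. *)
module AB = struct
  type a = int list
  type b = int array
  let compare_a (x : a) (y : a) : bool = x = y
  let compare_b (x : b) (y : b) : bool = x = y
  let a_of_b (x : b) : a = Array.to_list x
  let b_of_a (x : a) : b = Array.of_list x
end

(* Set.Make needs a module with [type t] and [compare : t -> t -> int],
   so an adapter is written for type a. *)
module ASet = Set.Make (struct
  type t = AB.a
  let compare = Stdlib.compare
end)
```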
You probably mean A.compare: A.t -> A.t -> bool, because type names are lowercase in OCaml.
You can have a single module AB which contains both the type for A and the type for B.
You can have a single module AB containing both A & B as sub-modules.
You might also use recursive modules & functors.
I'm trying to understand List.sum from Jane Street's Core. I got it to work on a simple list of integers, but I don't understand the concepts behind Core's containers, and I find the API documentation too terse to understand. Here's some code that works:
#require "core";;
open Core;;
List.sum (module Int) [1;2;3] ~f:ident;;
- : int = 6
#show List.sum;;
val sum :
(module Base__.Container_intf.Summable with type t = 'sum) ->
'a list -> f:('a -> 'sum) -> 'sum
Why do I have to use module Int and the identity function? [1;2;3] already has the type int list. Is there any good information about the design ideas behind Core?
The module provides the means of summing the values in question. The ~f argument provides a transformation function from the type of the elements in the list to the type of the values you want to sum.
If all you want to do is sum the integers in a list, then the summation function desired is in the Int module (thus we need module Int) and the transformation function is just ident (because we needn't transform the values at all).
However, what if you wanted to obtain a sum of integers, but starting with a list of strings representing integers? Then we would have
utop # List.sum (module Int) ["1";"2";"3";"4"];;
- : f:(string -> int) -> int = <fun>
i.e., if we want to sum using the module Int over a list of strings, then we'll first need a function that will convert each value of type string to a value of type int. Thus:
utop # List.sum (module Int) ["1";"2";"3";"4"] ~f:Int.of_string;;
- : int = 10
This is pretty verbose, but it gives us a lot of flexibility! Imagine trying to sum using a different commutative operation, perhaps over a particular field in a record.
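To make the mechanism concrete without depending on Core, here is a stdlib-only sketch of the same pattern. The module type SUMMABLE, the function sum, the modules Int_sum and Float_sum, and the record type item are hypothetical stand-ins, loosely modelled on the idea behind Base's Summable interface rather than copied from it: the first-class module carries a zero and an addition, and ~f maps each element into that type. The last function sums a field of a record, as suggested above.

```ocaml
(* Hypothetical, simplified analogue of a "summable" interface. *)
module type SUMMABLE = sig
  type t
  val zero : t
  val ( + ) : t -> t -> t
end

(* Transform each element with [f], then add the results using the
   [zero] and [( + )] carried by the first-class module [M]. *)
let sum (type s) (module M : SUMMABLE with type t = s) (l : 'a list)
    ~(f : 'a -> s) : s =
  List.fold_left (fun acc x -> M.( + ) acc (f x)) M.zero l

(* Stdlib-only stand-ins for Core's Int and Float modules. *)
module Int_sum = struct
  type t = int
  let zero = 0
  let ( + ) = ( + )
end

module Float_sum = struct
  type t = float
  let zero = 0.0
  let ( + ) = ( +. )
end

let six = sum (module Int_sum) [1; 2; 3] ~f:(fun x -> x)
let ten = sum (module Int_sum) ["1"; "2"; "3"; "4"] ~f:int_of_string

(* Summing one field of a record with a different addition. *)
type item = { name : string; price : float }
let total_price (items : item list) : float =
  sum (module Float_sum) items ~f:(fun it -> it.price)
```

As with Core's version, leaving out ~f gives a partially applied function still waiting for the transformation.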
However, this is not the idiomatic way to sum a list of integers in OCaml. List.sum is a specific function which the List module "inherits" by virtue of satisfying a container interface used in the library design of Base (which provides the basic functionality of Core). The reason this function is relatively complex to use is that it is the result of a highly generalized design over algebraic structures (in this case, over collections of elements which can be transformed into elements that have a commutative operation defined over them).
For mundane integer summation, OCamlers just use a simple fold:
utop # List.fold [1;2;3;4] ~init:0 ~f:(+);;
- : int = 10
One good place to look for some insight into the design decisions behind Core is https://dev.realworldocaml.org/ . Another good resource is the Jane Street tech blog. You might also consult the Base repo (https://github.com/janestreet/base) or post a question asking for more specific details on the design philosophy at https://discuss.ocaml.org/
Jane Street's libraries have been notoriously opaque to newcomers, but they are getting a lot better, and the community will be happy to help you learn.
Though the documentation is terse, it is very expressive. In particular, it tends to rely on the types to carry much of the weight, which means the code is largely self-documenting. It takes some practice to learn to read the types well, but this is well worth the effort, imo, and carries its own rewards!
I can extend a program by adding a module file in which I extend originally defined derived types, e.g.:
module mod1
  implicit none
  type type1
    real :: x
  end type
end module
module mod2
  use mod1
  implicit none
  ! type2 extends type1 and adds the type-bound procedure g
  type, extends(type1) :: type2
  contains
    procedure, pass :: g
  end type
contains
  real function g(y, e)
    class(type2), intent(in) :: y
    real, intent(in) :: e
    g = y%x + e
  end function
end module
program test
  use mod2
  implicit none
  type(type2) :: a   ! must be declared as type2 (not type1) to call a%g
  a%x = 3e0
  write(*,*) a%g(5e0)
end program
But with this solution I need to change the declaration of 'a' (type1->type2) in the calling program each time I add another module. So my question is whether there is a way around this, i.e. whether I can add a type-bound procedure to a derived type in another module without changing the original name of the type.
I understand that this might not work, since I could then declare a variable and extend its type later, which sounds problematic to me. So I thought about deferred bindings. But this isn't really what I want, since first I have to add the binding to the original definition, and second I need to provide an interface, and thus need to know the arguments of the later function (here g) in advance. However, maybe someone has a nice solution for this.
All this is of course meant to bring more structure into the program; especially when several people work on one program at the same time, such a way to split work packages seems rather useful.
You can rename entities that are use associated by using the renaming capability of the USE statement.
MODULE m2
USE m1, chicken => type1
TYPE, EXTENDS(chicken) :: type1
...
type1 in module m2 is a different type to type1 in module m1.
You could also do this renaming in the USE statement in your main program, or via some intermediate module.
If both type1 names are accessible in another scope and you reference the name type1, then your compiler will complain.
If you use this trick and other programmers read your code, then they might complain.
To some extent, submodules would help you, but they were not yet implemented in the most widely used compilers when this was written. You could defer the implementation of the procedure to a submodule, but you would still have to specify the interface.
It isn't possible in any other way as far as I know.
I'd like to optimize the readability of my code in Fortran by using OOP.
I therefore use derived types. What is the best practice for naming types and derived types?
For example, is it better to:
type a
real :: var
end type
type(a) :: mya
Or should I always begin type names with type_, as in type_a? I like this one, but maybe there are better ideas.
Also, is it better to use short names, which are less readable, or longer names, which end up quite difficult to read if the type has too many "levels", and why? For example, in a%b%c%d%e, if a, b, c, d and e are 8 or more letters long, as in country%hospital%service%patient%name, then once again readability seems to be a concern.
Advice from experts is very welcome.
This is not specific to Fortran. You can use coding recommendations for other languages.
Usually, type names are not marked by any prefix or suffix. In many languages, class names start with a capital letter. You can use this convention in Fortran too, even though Fortran is not case sensitive. Just be sure not to reuse the same name in lowercase as a variable name.
One example of a good coding guideline is this, and you can adapt it for Fortran very easily. Also, have a look at some Fortran examples in books like MRC or RXX. OS can also be useful.
I would recommend not using very short component names, unless the letter is the same as the one used in the written equation; in that case it is fine.
Use the associate construct or pointers to make aliases to nested names like country%hospital%service%patient%name.
In my experience, naming issues come up in OO Fortran more than other languages (e.g. C++) because of Fortran's named modules, lack of namespaces, and case-insensitivity, among other things. Case-insensitivity hurts because if you name a type Foo, you cannot have a variable named foo or you will get a compiler error (tested with gfortran 4.9).
The rules I have settled on are:
Each Fortran module provides a single, primary class named Namespace_Foo.
The class Namespace_Foo can be located in your source tree as Namespace/Foo_M.f90.
Class variables are nouns with descriptive, lower case names like bar or bar_baz.
Class methods are verbs with descriptive (but short if possible) names and use a rename search => Namespace_Foo_search.
Instances of class Namespace_Foo can be named foo (without namespace) when there is no easy alternative.
These rules make it particularly easy to mirror a C/C++ class Namespace::Foo in Fortran or bind (using BIND(C)) a C++ class to Fortran. They also avoid all of the common name collisions I've run into.
Here's a working example (tested with gfortran 4.9).
module Namespace_Foo_M
  implicit none
  type :: Namespace_Foo
    integer :: bar
    real :: bar_baz
  contains
    procedure, pass(this) :: search => Namespace_Foo_search
  end type
contains
  function Namespace_Foo_search(this, offset) result(index)
    class(Namespace_Foo) :: this
    integer,intent(in) :: offset !input
    integer :: index !return value
    index = this%bar + int(this%bar_baz) + offset
  end function
end module
program main
  use Namespace_Foo_M !src/Namespace/Foo_M.f90
  type(Namespace_Foo) :: foo
  foo % bar = 1
  foo % bar_baz = 7.3
  print *, foo % search(3) !should print 11
end program
Note that for the purpose of running the example, you can copy/paste everything above into a single file.
Final Thoughts
I have found the lack of namespaces extremely frustrating in Fortran, and the only workaround is to include the namespace in the names themselves. We have some nested "namespaces", e.g. Utils::IO::PrettyPrinter in C++ and Utils_IO_PrettyPrinter in Fortran. One reason I use CamelCase for classes, e.g. PrettyPrinter instead of Pretty_Printer, is to disambiguate what is a namespace. It does not really matter to me whether namespaces are upper or lower case, but the same case should be used in the name and the file path, e.g. class utils_io_PrettyPrinter should live at utils/io/PrettyPrinter_M.f90. In large or unfamiliar projects, you will spend a lot of time searching the source tree for where specific modules live, and a convention linking module names to file paths can be a major time saver.
Is currying for functional programming the same as overloading for OO programming? If not, why? (with examples if possible)
Currying is not specific to functional programming, and overloading is not specific to object-oriented programming.
"Currying" is the use of functions to which you can pass fewer arguments than required to obtain a function of the remaining arguments. That is, if we have a function plus which takes two integer arguments and returns their sum, then we can pass the single argument 1 to plus and the result is a function for adding 1 to things.
In Haskellish syntax (with function application by adjacency):
plusOne = plusCurried 1
three = plusOne 2
four = plusCurried 2 2
five = plusUncurried 2 3
In vaguely Cish syntax (with function application by parentheses):
plusOne = plusCurried(1)
three = plusOne(2)
four = plusCurried(2)(2)
five = plusUncurried(2, 3)
You can see in both of these examples that plusCurried is invoked on only 1 argument, and the result is something that can be bound to a variable and then invoked on another argument. The reason that you're thinking of currying as a functional-programming concept is that it sees the most use in functional languages whose syntax has application by adjacency, because in that syntax currying becomes very natural. The applications of plusCurried and plusUncurried to define four and five in the Haskellish syntax merge to become completely indistinguishable, so you can just have all functions be fully curried always (i.e. have every function be a function of exactly one argument; some of them simply return other functions that can then be applied to more arguments). In the Cish syntax with application by parenthesised argument lists, by contrast, the definitions of four and five look completely different, so you need to distinguish between plusCurried and plusUncurried.
Also, the imperative languages that led to today's object-oriented languages never had the ability to bind functions to variables or pass them to other functions (this is known as having first-class functions), and without that facility there's nothing you can actually do with a curried function other than invoke it on all arguments, and so no point in having them. Some of today's OO languages still don't have first-class functions, or only gained them recently.
The term currying also refers to the process of turning a function of multiple arguments into one that takes a single argument and returns another function (which takes a single argument, and may return another function which ...), and "uncurrying" can refer to the process of doing the reverse conversion.
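The same distinction in OCaml, where functions of several arguments are curried by default and application is by adjacency. This is a short sketch; the names are adapted from the pseudo-code above.

```ocaml
(* Curried: takes x, returns a function waiting for y. *)
let plus_curried x y = x + y

(* Uncurried: takes a single argument that is a pair. *)
let plus_uncurried (x, y) = x + y

let plus_one = plus_curried 1      (* partial application: int -> int *)
let three = plus_one 2
let four = plus_curried 2 2
let five = plus_uncurried (2, 3)   (* must be given the whole pair *)
```

Note that plus_uncurried cannot be partially applied: it needs the whole pair at once.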
Overloading is an entirely unrelated concept. Overloading a name means giving multiple definitions with different characteristics (argument types, number of arguments, return type, etc.), and having the compiler resolve which definition is meant by a given appearance of the name from the context in which it appears.
A fairly obvious example of this is that we could define plus to add integers, but also use the same name plus for adding floating point numbers, and we could potentially use it for concatenating strings, arrays, lists, etc, or to add vectors or matrices. All of these have very different implementations that have nothing to do with each other as far as the language implementation is concerned, but we just happened to give them the same name. The compiler is then responsible for figuring out that plus stringA stringB should call the string plus (and return a string), while plus intX intY should call the integer plus (and return an integer).
Again, there is no inherent reason why this concept is an "OO concept" rather than a functional programming concept. It simply happened that it fit quite naturally in statically typed object-oriented languages that were developed; if you're already resolving which method to call by the object that the method is invoked on, then it's a small stretch to allow more general overloading. Completely ad-hoc overloading (where you do nothing more than define the same name multiple times and trust the compiler to figure it out) doesn't fit as nicely in languages with first-class functions, because when you pass the overloaded name as a function itself you don't have the calling context to help you figure out which definition is intended (and programmers may get confused if what they really wanted was to pass all the overloaded definitions). Haskell developed type classes as a more principled way of using overloading; these effectively do allow you to pass all the overloaded definitions at once, and also allow the type system to express types a bit like "any type for which the functions f and g are defined".
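OCaml, for its part, has no ad-hoc overloading at all: integer, float and string "addition" use distinct operators. The sketch below contrasts that with passing "all the overloaded definitions at once" explicitly as a record of functions, which is roughly how Haskell type classes are commonly compiled (dictionary passing). The type plus_dict and the function double are hypothetical names for illustration.

```ocaml
(* No overloading: each kind of "plus" has its own name. *)
let three_int = 1 + 2
let three_float = 1.0 +. 2.0
let abcd = "ab" ^ "cd"

(* A record of operations plays the role of a type-class dictionary. *)
type 'a plus_dict = { plus : 'a -> 'a -> 'a }

let int_plus = { plus = ( + ) }
let float_plus = { plus = ( +. ) }
let string_plus = { plus = ( ^ ) }

(* Generic over any type that has a [plus]; the caller supplies the
   dictionary explicitly instead of the compiler choosing it. *)
let double (d : 'a plus_dict) (x : 'a) : 'a = d.plus x x
```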
In summary:
currying and overloading are completely unrelated
currying is about applying functions to fewer arguments than they require in order to get a function of the remaining arguments
overloading is about providing multiple definitions for the same name and having the compiler select which definition is used each time the name is used
neither currying nor overloading is specific to either functional programming or object-oriented programming; they each simply happen to be more widespread in historical languages of one kind or another because of the way the languages developed, causing them to be more useful or more obvious in one kind of language
No, they are entirely unrelated and dissimilar.
Overloading is a technique for allowing the same code to be used at different types -- often known in functional programming as polymorphism (of various forms).
A polymorphic function:
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
Here, map is a function that operates on any list. It is polymorphic -- it works just as well with a list of Int as a list of trees of hashtables. It also is higher-order, in that it is a function that takes a function as an argument.
Currying is the transformation of a function that takes a structure of n arguments into a chain of functions, each taking one argument.
In curried languages, you can apply any function to some of its arguments, yielding a function that takes the rest of the arguments. The partially-applied function is a closure.
And you can transform a curried function into an uncurried one (and vice-versa) by applying the transformation invented by Curry and Schonfinkel.
curry :: ((a, b) -> c) -> a -> b -> c
-- curry converts an uncurried function to a curried function.
uncurry :: (a -> b -> c) -> (a, b) -> c
-- uncurry converts a curried function to a function on pairs.
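The same two functions written out in OCaml. This is a sketch; they are defined here rather than assumed to come from a library, and add_pair and add5 are illustrative names.

```ocaml
(* [curry f] turns a function on pairs into a curried function. *)
let curry (f : 'a * 'b -> 'c) : 'a -> 'b -> 'c = fun x y -> f (x, y)

(* [uncurry f] turns a curried function into a function on pairs. *)
let uncurry (f : 'a -> 'b -> 'c) : 'a * 'b -> 'c = fun (x, y) -> f x y

let add_pair (x, y) = x + y
let add_curried = curry add_pair
let add5 = add_curried 5   (* now partial application is possible *)
```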
Overloading is having multiple functions with the same name, having different parameters.
Currying is where you can take a function of multiple parameters and selectively set some of them, so that you may be left with, for example, a function of just one variable.
So, if you have a graphing function in 3 dimensions, you may have:
justgraphit(double[] x, double[] y, double[] z), and you want to graph it.
By currying you could have:
var fx = justgraphit(xlist), where fx is now a function of the two remaining variables, y and z.
Then, later on, the user picks another axis (date) and you set y, so now you have:
var fy = fx(ylist)
Then, later, you graph the information by just looping over some data and calling fy(z), where the only thing that varies is the z parameter.
This makes complicated functions simpler, as you don't have to keep passing what are largely fixed variables, so readability increases.
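In OCaml, partial application gives exactly this workflow. In the sketch below, justgraphit is a hypothetical stand-in that, instead of drawing anything, pairs up the i-th values of the three axes; the data are made up for illustration.

```ocaml
(* Hypothetical stand-in for a plotting routine: returns the points
   it would draw instead of drawing them. *)
let justgraphit (x : float list) (y : float list) (z : float list) =
  List.map2 (fun (a, b) c -> (a, b, c)) (List.combine x y) z

let xlist = [1.0; 2.0]
let ylist = [10.0; 20.0]

let fx = justgraphit xlist   (* y and z still missing *)
let fy = fx ylist            (* only z still missing *)

(* Later, only z varies. *)
let plots = List.map fy [[0.1; 0.2]; [0.3; 0.4]]
```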
I am wondering whether there are already naming conventions for OCaml, especially for names of constructors, names of variables, names of functions, and names of record labels.
For instance, if I want to define a type condition, would you suggest annotating its constructors explicitly (for example Condition_None) so as to know directly that it is a constructor of condition?
Also, how would you name a variable of this type: c or a_condition? I always hesitate to use a, an, or the.
When declaring a function, is it necessary to give it a name from which the types of its arguments can be inferred, for example remove_condition_from_list: condition -> condition list -> condition list?
In addition, I use records a lot in my programs. How do you name a record so that it looks different from a normal variable?
There are really thousands of ways to name something; I would like to find a conventional one in good taste and stick to it, so that I do not need to think before naming. This is an open discussion, and any suggestion will be welcome. Thank you!
You may be interested in the Caml programming guidelines. They cover variable naming, but do not answer your precise questions.
Regarding constructor namespacing: in theory, you should be able to use modules as namespaces rather than adding prefixes to your constructor names. You could have, say, a Constructor module and use Constructor.None to avoid confusion with the standard None constructor of the option type. You could then use open or the local open syntax of OCaml 3.12, or use module aliasing (module C = Constructor, then C.None) when useful, to avoid long names.
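A sketch of constructor namespacing; the module name Cond and its constructors are hypothetical. Cond.None does not clash with the option type's None, and a module alias or a local open keeps call sites short.

```ocaml
(* The module acts as a namespace for the constructors. *)
module Cond = struct
  type t = None | Always | Equal of int
end

let c1 = Cond.None               (* qualified: no clash with option *)
module C = Cond                  (* module alias *)
let c2 = C.Equal 3
let c3 = Cond.(Always)           (* local open *)
let no_value : int option = None (* unqualified None is still option's *)

let describe (c : Cond.t) : string =
  match c with
  | Cond.None -> "none"
  | Cond.Always -> "always"
  | Cond.Equal n -> "equal " ^ string_of_int n
```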
In practice, people still tend to use a short prefix, such as the first letter of the type name capitalized (CNone), to avoid any confusion when you manipulate two modules with the same constructor names. This often happens, for example, when you are writing a compiler and have several passes manipulating different AST types with similar constructors: an after-parsing Let form, an after-typing Let form, etc.
Regarding your second question, I would favor concision. Type inference means the type information can stay implicit most of the time; you don't need to enforce explicit annotations in your naming conventions. It will often be obvious from the context (or unimportant) which types are manipulated, e.g. remove cond (l1 # l2). It's even less useful if your remove value is defined inside a Condition submodule.
Edit: record labels have the same scoping behavior as sum type constructors. If you have defined a {x: int; y : int} record in a Coord submodule, you access fields with foo.Coord.x outside the module, or with an alias foo.C.x, or Coord.(foo.x) using the "local open" feature of 3.12. That's basically the same thing as sum constructors.
Before 3.12, you had to write the module name on each field of a record, e.g. {Coord.x = 2; Coord.y = 3}. Since 3.12, you can just qualify the first field: {Coord.x = 2; y = 3}. This also works in pattern position.
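A sketch of these scoping rules for record labels, using the Coord example from the text; the values and the function name sum_coords are made up for illustration.

```ocaml
module Coord = struct
  type t = { x : int; y : int }
end

(* Only the first label needs qualifying. *)
let p = { Coord.x = 2; y = 3 }

let px = p.Coord.x          (* qualified field access *)
let py = Coord.(p.y)        (* local open *)
module C = Coord            (* module alias *)
let px' = p.C.x

(* Qualification also works in pattern position. *)
let sum_coords { Coord.x = a; y = b } = a + b
```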
If you want naming convention suggestions, look at the standard library. Beyond that, you'll find many people with their own naming conventions, and it's up to you to decide whom to trust (just be consistent, i.e. pick one, not many). The standard library is the only thing that's shared by all OCaml programmers.
Often you would define a single type, or a single bunch of closely related types, in a module. So rather than having a type called condition, you'd have a module called Condition with a type t. (You should give your module some other name, though, because there is already a module called Condition in the standard library!) A function to remove a condition from a list would be Condition.remove_from_list or ConditionList.remove. See, for example, the modules List, Array, Hashtbl, Map.Make, etc. in the standard library.
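A sketch of the module-with-a-type-t convention. The module name Cond is a hypothetical choice that avoids the standard library's Condition, and the constructors are invented for illustration.

```ocaml
(* One module, one primary type named t, functions named after
   what they do rather than after the types they take. *)
module Cond = struct
  type t = Always | Never | Equal of int

  let equal (a : t) (b : t) : bool = a = b

  (* Remove every occurrence of [c] from [l]. *)
  let remove_from_list (c : t) (l : t list) : t list =
    List.filter (fun x -> not (equal x c)) l
end
```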
For an example of a module that defines many types, look at Unix. This is a bit of a special case because the names are mostly taken from the preexisting C API. Many constructors have a short prefix, e.g. O_ for open_flag, SEEK_ for seek_command, etc.; this is a reasonable convention.
There's no reason to encode the type of a variable in its name. The compiler won't use the name to deduce the type. If the type of a variable isn't clear to a casual reader from the context, put a type annotation when you define it; that way the information provided to the reader is validated by the compiler.