Separate Namespaces for Functions and Variables in Common Lisp versus Scheme - variables

Scheme uses a single namespace for all variables, regardless of whether they are bound to functions or other types of values. Common Lisp separates the two, such that the identifier "hello" may refer to a function in one context, and a string in another.
(Note 1: This question needs an example of the above; feel free to edit it and add one, or e-mail the original author with it and I will do so.)
However, in some contexts, such as passing functions as parameters to other functions, the programmer must explicitly distinguish that he's specifying a function variable, rather than a non-function variable, by using #', as in:
(sort (list '(9 A) '(3 B) '(4 C)) #'< :key #'first)
I have always considered this to be a bit of a wart, but I've recently run across an argument that this is actually a feature:
...the
important distinction actually lies in the syntax of forms, not in the
type of objects. Without knowing anything about the runtime values
involved, it is quite clear that the first element of a function form
must be a function. CL takes this fact and makes it a part of the
language, along with macro and special forms which also can (and must)
be determined statically. So my question is: why would you want the
names of functions and the names of variables to be in the same
namespace, when the primary use of function names is to appear where a
variable name would rarely want to appear?
Consider the case of class names: why should a class named FOO prevent
the use of variables named FOO? The only time I would be referring the
class by the name FOO is in contexts which expect a class name. If, on
the rare occasion I need to get the class object which is bound to the
class name FOO, there is FIND-CLASS.
This argument does make some sense to me from experience; there is a similar case in Haskell with field names, which are also functions used to access the fields. This is a bit awkward:
data Point = Point { x, y :: Double {- lots of other fields as well --} }
isOrigin p = (x p == 0) && (y p == 0)
This is solved by a bit of extra syntax, made especially nice by the NamedFieldPuns extension:
isOrigin2 Point{x,y} = (x == 0) && (y == 0)
So, to the question, beyond consistency, what are the advantages and disadvantages, both for Common Lisp vs. Scheme and in general, of a single namespace for all values versus separate ones for functions and non-function values?

The two different approaches have names: Lisp-1 and Lisp-2. A Lisp-1 has a single namespace for both variables and functions (as in Scheme) while a Lisp-2 has separate namespaces for variables and functions (as in Common Lisp). I mention this because you may not be aware of the terminology since you didn't refer to it in your question.
Wikipedia refers to this debate:
Whether a separate namespace for functions is an advantage is a source of contention in the Lisp community. It is usually referred to as the Lisp-1 vs. Lisp-2 debate. Lisp-1 refers to Scheme's model and Lisp-2 refers to Common Lisp's model. These names were coined in a 1988 paper by Richard P. Gabriel and Kent Pitman, which extensively compares the two approaches.
Gabriel and Pitman's paper titled Technical Issues of Separation in Function Cells and Value Cells addresses this very issue.

Actually, as outlined in the paper by Richard Gabriel and Kent Pitman, the debate is about Lisp-5 against Lisp-6, since there are several other namespaces already there, in the paper are mentioned type names, tag names, block names, and declaration names. edit: this seems to be incorrect, as Rainer points out in the comment: Scheme actually seems to be a Lisp-1. The following is largely unaffected by this error, though.
Whether a symbol denotes something to be executed or something to be referred to is always clear from the context. Throwing functions and variables into the same namespace is primarily a restriction: the programmer cannot use the same name for a thing and an action. What a Lisp-5 gets out of this is just that some syntactic overhead for referencing something from a different namespace than what the current context implies is avoided. edit: this is not the whole picture, just the surface.
I know that Lisp-5 proponents like the fact that functions are data, and that this is expressed in the language core. I like the fact that I can call a list "list" and a car "car" without confusing my compiler, and functions are a fundamentally special kind of data anyway. edit: this is my main point: separate namespaces are not a wart at all.
I also liked what Pascal Constanza had to say about this.

I've met a similar distinction in Python (unified namespace) vs Ruby (distinct namespaces for methods vs non-methods). In that context, I prefer Python's approach -- for example, with that approach, if I want to make a list of things, some of which are functions while others aren't, I don't have to do anything different with their names, depending on their "function-ness", for example. Similar considerations apply to all cases in which function objects are to be bandied around rather than called (arguments to, and return values from, higher-order functions, etc, etc).
Non-functions can be called, too (if their classes define __call__, in the case of Python -- a special case of "operator overloading") so the "contextual distinction" isn't necessarily clear, either.
However, my "lisp-oid" experience is/was mostly with Scheme rather than Common Lisp, so I may be subconsciously biased by the familiarity with the uniform namespace that in the end comes from that experience.

The name of a function in Scheme is just a variable with the function as its value. Whether I do (define x (y) (z y)) or (let ((x (lambda (y) (z y)))), I'm defining a function that I can call. So the idea that "a variable name would rarely want to appear there" is kind of specious as far as Scheme is concerned.
Scheme is a characteristically functional language, so treating functions as data is one of its tenets. Having functions be a type of their own that's stored like all other data is a way of carrying on the idea.

The biggest downside I see, at least for Common Lisp, is understandability. We can all agree that it uses different namespaces for variables and functions, but how many does it have? In PAIP, Norvig showed that it has "at least seven" namespaces.
When one of the language's classic books, written by a highly respected programmer, can't even say for certain in a published book, I think there's a problem. I don't have a problem with multiple namespaces, but I wish the language was, at the least, simple enough that somebody could understand this aspect of it entirely.
I'm comfortable using the same symbol for a variable and for a function, but in the more obscure areas I resort to using different names out of fear (colliding namespaces can be really hard to debug!), and that really should never be the case.

There's good things to both approaches. However, I find that when it matters, I prefer having both a function LIST and a a variable LIST than having to spell one of them incorrectly.

Related

Why do people put "my" in their variable names?

I have seen an incredibly large number of variables in our codebase at work that are along the lines of myCounter or myClassVariable. Why?
I don't just see this at work, either. I see this in tutorials online, github, blogs, etc. I would understand if it's just a placeholder for an example, but otherwise I can't imagine it being a standard of any kind.
Is this a holdover from some old standard or is it just a bad practice that snuck in before code reviews were common place? Was it an ancestor to people's usage of the underscore to indicate a _ClassName?
Starting a variable name with my indicates that it is defined by the programmer rather than a predefined variable. In a statically typed language (with explicit types) it also prevents a name clash with the type name, for instance
Collection myCollection;
A my variable name should only be used in generic examples where the variable has no other interpretation.
It may also be worth mentioning that in the programming language Perl (since version 5 from 1994) the keyword my is used to declare a local variable, for instance
my $message = "hello there";
https://perldoc.perl.org/functions/my

Elm syntax for generating variables and functions: how to tell the difference?

Forgive me if this is a kind of silly question, but I've been going through "Programming Elm" and one thing struck me as a little odd: in the text he shows an example of creating a record,
dog = { name = "Tucker", age = 11 }
and then right after that he shows a function that returns a record
haveBirthday d = { name = d.name, age = d.age + 1 }
To me, the syntax for both seems remarkably similar. How does the compiler know which is which? By the + on the right hand side of the function, that implies change, so it has to be a function? By the fact that there's an argument, d? Or is it that the difference between generating a record and a function is quite obvious, and it's just in this case that they seem so alike? Or is it that in some subtle way that I don't yet have the Zen to grasp, they are in fact the same thing? (That is, something like "everything is a function"?)
I've looked at https://elm-lang.org/docs/syntax#functions -- the docs are very user-friendly, but brief. Are there any other resources that give a more didactic view of the syntax (like this book does for Haskell)?
Thanks for any help along the way.
In an imperative language where "side-effects" are the norm, the term "function" is often used to describe what's more appropriately called a procedure or sub-routine. A set of instruction to be executed when called, and where order of execution and re-evaluation is essential because mutation and other side-effects can change anything from anywhere at any time.
In functional programming, however, the notion of a function is closer to the mathematical sense of the term, where its return value is computed entirely based on its arguments. This is especially true for a "pure" functional language like Elm, which normally does not allow "side-effects". That is, effects that interact with the "outside world" without going through the input arguments or return value. In a pure functional language it does not make sense to have a function that does not take any arguments, because it would always do the same thing, and computing the same value again and again is just wasteful. A function with no arguments is effectively just a value. And a function definition and value binding can therefore be distinguished solely based on whether or not it has any arguments.
But there are also many hybrid programming languages. Most functional languages are hybrids in fact, that allow side-effects but still stick close to the mathematical sense of a function. These languages also typically don't have functions without arguments, but use a special type called unit, or (), which has only one value, also called unit or (), which is used to denote a function that takes no significant input, or which returns nothing significant. Since unit has only one value, it carries no significant information.
Many functional languages don't even have functions that take multiple arguments either. In Elm and many other languages, a function takes exactly one argument. No more and no less, ever. You might have seen Elm code which appears to have multiple arguments, but that's all an illusion. Or syntax sugar as it's called in language theoretic lingo.
When you see a function definition like this:
add a b = a + b
that actual translates to this:
add = \a -> \b -> a + b
A function that takes an argument a, then returns another function which takes an argument b, which does the actual computation and returns the result. This is called currying.
Why do this? Because it makes it very convenient to partially apply functions. You can just leave out the last, or last few, arguments, then instead of an error you get a function back which you can fully apply later to get the result. This enables you to do some really handy things.
Let's look at an example. To fully apple add from above we'd just do:
add 2 3
The compiler actually parses this as (add 2) 3, so we've kind of done partial application already, but then immediately applied to another value. But what if we want to add 2 to a whole bunch of things and don't want write add 2 everywhere, because DRY and such? We write a function:
add2ToThings thing =
add 2 thing
(Ok, maybe a little bit contrived, but stay with me)
Partial application allows us to make this even shorter!
add2ToThings =
add 2
You see how that works? add 2 returns a function, and we just give that a name. There have been numerous books written about this marvellous idea in OOP, but they call it "dependency injection" and it's usually slightly more verbose when implemented with OOP techniques.
Anyway, say we have a list of "thing"s, we can get a new list with 2 added to everything by mapping over it like this:
List.map add2ToThings things
But we can do even better! Since add 2 is actually shorter than the name we gave it, we might as well just use it directly:
List.map (add 2) things
Ok, but then say we want to filter out every value that is exactly 5. We can actually partially apply infix operators too, but we have to surround the operator in parentheses to make it behave like an ordinary function:
List.filter ((/=) 5) (List.map (add 2) things)
This is starting to look a bit convoluted though, and reads backwards since we filter after we map. Fortunately we can use Elm's pipe operator |> to clean it up a bit:
things
|> List.map (add 2)
|> List.filter ((/=) 5)
The pipe operator was "discovered" because of partial application. Without that it couldn't have been implemented as an ordinary operator, but would have to be implemented as a special syntax rule in the parser. It's implementation is (essentially) just:
x |> f = f x
It takes an arbitrary argument on its left side and a function on its right side, then applies the function to the argument. And because of partial application we can conveniently get a function to pass in on the right side.
So in three lines of ordinary idiomatic Elm code we've used partial application four times. Without that, and currying, we'd have to write something like:
List.filter (\thing -> thing /= 5) (List.map (\thing -> add 2 thing) things)
Or we might want to write it with some variable bindings to make it more readable:
let
add2ToThings thing =
add 2 thing
thingsWith2Added =
List.map add2ToThings things
thingsWith2AddedAndWithout5 =
List.filter (\thing -> thing /= 5) thingWith2Added
in
thingsWith2AddedAndWithout5
And so that's why functional programming is awesome.

Ocaml naming convention

I am wondering if there exists already some naming conventions for Ocaml, especially for names of constructors, names of variables, names of functions, and names for labels of record.
For instance, if I want to define a type condition, do you suggest to annote its constructors explicitly (for example Condition_None) so as to know directly it is a constructor of condition?
Also how would you name a variable of this type? c or a_condition? I always hesitate to use a, an or the.
To declare a function, is it necessary to give it a name which allows to infer the types of arguments from its name, for example remove_condition_from_list: condition -> condition list -> condition list?
In addition, I use record a lot in my programs. How do you name a record so that it looks different from a normal variable?
There are really thousands of ways to name something, I would like to find a conventional one with a good taste, stick to it, so that I do not need to think before naming. This is an open discussion, any suggestion will be welcome. Thank you!
You may be interested in the Caml programming guidelines. They cover variable naming, but do not answer your precise questions.
Regarding constructor namespacing : in theory, you should be able to use modules as namespaces rather than adding prefixes to your constructor names. You could have, say, a Constructor module and use Constructor.None to avoid confusion with the standard None constructor of the option type. You could then use open or the local open syntax of ocaml 3.12, or use module aliasing module C = Constructor then C.None when useful, to avoid long names.
In practice, people still tend to use a short prefix, such as the first letter of the type name capitalized, CNone, to avoid any confusion when you manipulate two modules with the same constructor names; this often happen, for example, when you are writing a compiler and have several passes manipulating different AST types with similar types: after-parsing Let form, after-typing Let form, etc.
Regarding your second question, I would favor concision. Inference mean the type information can most of the time stay implicit, you don't need to enforce explicit annotation in your naming conventions. It will often be obvious from the context -- or unimportant -- what types are manipulated, eg. remove cond (l1 # l2). It's even less useful if your remove value is defined inside a Condition submodule.
Edit: record labels have the same scoping behavior than sum type constructors. If you have defined a {x: int; y : int} record in a Coord submodule, you access fields with foo.Coord.x outside the module, or with an alias foo.C.x, or Coord.(foo.x) using the "local open" feature of 3.12. That's basically the same thing as sum constructors.
Before 3.12, you had to write that module on each field of a record, eg. {Coord.x = 2; Coord.y = 3}. Since 3.12 you can just qualify the first field: {Coord.x = 2; y = 3}. This also works in pattern position.
If you want naming convention suggestions, look at the standard library. Beyond that you'll find many people with their own naming conventions, and it's up to you to decide who to trust (just be consistent, i.e. pick one, not many). The standard library is the only thing that's shared by all Ocaml programmers.
Often you would define a single type, or a single bunch of closely related types, in a module. So rather than having a type called condition, you'd have a module called Condition with a type t. (You should give your module some other name though, because there is already a module called Condition in the standard library!). A function to remove a condition from a list would be Condition.remove_from_list or ConditionList.remove. See for example the modules List, Array, Hashtbl,Map.Make`, etc. in the standard library.
For an example of a module that defines many types, look at Unix. This is a bit of a special case because the names are mostly taken from the preexisting C API. Many constructors have a short prefix, e.g. O_ for open_flag, SEEK_ for seek_command, etc.; this is a reasonable convention.
There's no reason to encode the type of a variable in its name. The compiler won't use the name to deduce the type. If the type of a variable isn't clear to a casual reader from the context, put a type annotation when you define it; that way the information provided to the reader is validated by the compiler.

How is the `*var-name*` naming-convention used in clojure?

As a non-lisper coming to clojure how should I best understand the naming convention where vars get a name like *var-name*?
This appears to be a lisp convention indicating a global variable. But in clojure such vars appear in namespaces as far as I can tell.
I would really appreciate a brief explanation of what I should expect when an author has used such vars in their code, ideally with a example of how and why such a var would be used and changed in a clojure library.
It's a convention used in other Lisps, such as Common Lisp, to distinguish between special variables, as distinct from lexical variables. A special or dynamic variable has its binding stored in a dynamic environment, meaning that its current value as visible to any point in the code depends upon how it may have been bound higher up the call stack, as opposed to being dependent only on the most local lexical binding form (such as let or defn).
Note that in his book Let Over Lambda, Doug Hoyte argues against the "earmuffs" asterix convention for naming special variables. He uses an unusual macro style that makes reference to free variables, and he prefers not to commit to or distinguish whether those symbols will eventually refer to lexical or dynamic variables.
Though targeted specifically at Common Lisp, you might enjoy Ron Garret's essay The Idiot's Guide to Special Variables. Much of it can still apply to Clojure.
Functional programming is all about safe predictable functions. Infact some of us are afraid of that spooky "action at a distance" thing. When people call a function they get a warm fuzzy satisfaction that the function will always give them the same result if they call the function or read the value again. the *un-warm-and-fuzzy* bristly things exist to warn programmers that this variable is less cuddly than some of the others.
Some references I found in the Clojure newsgroups:
Re: making code readable
John D. Hume
Tue, 30 Dec 2008 08:30:57 -0800
On Mon, Dec 29, 2008 at 4:10 PM, Chouser wrote:
I believe the idiom for global values like this is to place asterisks
around the name.
I thought the asterisk convention was for variables intended for
dynamic binding. It took me a minute to figure out where I got that
idea. "Programming Clojure" suggests it (without quite saying it) in
chapter 6, section 3.
"Vars intended for dynamic binding are sometimes called special vari-
ables. It is good style to name them
with leading and trailing asterisks."
Obviously the book's a work in progress, but that does sound
reasonable. A special convention for variables whose values change (or
that my code's welcome to rebind) seems more useful to me than one for
"globals" (though I'm not sure I'd consider something like grid-size
for a given application a global). Based on ants.clj it appears Rich
doesn't feel there needs to be a special naming convention for that
sort of value.
and...
I believe the idiom for global values like this is to place asterisks
around the name. Underscores (and CamelCase) should only be used when
required for Java interop:
(def *grid-size* 10)
(def *height* 600)
(def *margin* 50)
(def *x-index* 0)
(def *y-index* 1)

Is there a concise catalog of variable naming-conventions?

There are many different styles of variable names that I've come across over the years.
The current wikipedia entry on naming conventions is fairly light...
I'd love to see a concise catalog of variable naming-conventions, identifying it by a name/description, and some examples.
If a convention is particularly favored by a certain platform community, that would be worth noting, too.
I'm turning this into a community wiki, so please create an answer for each convention, and edit as needed.
The best naming convention set that I've seen is in the book "Code Complete" Steve McConnell has a great section in there about naming conventions and lots of examples. His examples run through a number of "best practices" for different languages, but ultimately leave it up to the developer, dev manager, or architect to decide the specific action.
PEP8 is the style guide for Python code that is part of the standard library which is relatively influential in the Python community. For bonus points, in addition to covering the naming conventions used by the Python standard library, it provides an overview of naming conventions in general.
According to PEP8: variables, instance variables, functions and methods in the standard library are supposed to use lowercase words separated by underscores for readability:
foo
parse_foo
a_long_descriptive_name
To distinguish a name from a reserved word, append an underscore:
in_
and_
or_
Names that are weakly private (not imported by 'from M import *') start with a single underscore. Names whose privacy is enforced with name mangling start with double underscores.:
_some_what_private
__a_slightly_more_private_name
Names that are Python 'special' methods start and end with double underscores:
__hash__ # hash(o) = o.__hash__()
__str__ # str(o) = o.__str__()
Sun published a list of variable name conventions for Java here. The list includes conventions for naming packages, classes, interfaces, methods, variables, and constants.
The section on variables states
Except for variables, all instance, class, and class constants are in mixed case with a lowercase first letter. Internal words start with capital letters. Variable names should not start with underscore _ or dollar sign $ characters, even though both are allowed.
Variable names should be short yet meaningful. The choice of a variable name should be mnemonic- that is, designed to indicate to the casual observer the intent of its use. One-character variable names should be avoided except for temporary "throwaway" variables. Common names for temporary variables are i, j, k, m, and n for integers; c, d, and e for characters.
Some examples include
int i;
char c;
float myWidth;
Whatever your naming convention is, it usually do not allow you to easily distinguish:
instance (usually private) variable
local variables (used as parameters or local to a function)
I use a naming convention based on a the usage of indefinite article (a, an, some) for local variables, and no article for instance variables, as explained in this question about variable prefix.
So a prefix (either my way, or any other prefix listed in that question) can be one way of refining variable naming convention.
I know your question is not limited on variables, I just wanted to point out that part of naming convention.
There is, the "Java Programmers Phrasebook" by Einar Hoest.
He analysed the grammatical structure of method names together with the structure of the method bodies, and collected information such as:
add-[noun]-* These methods often have parameters and create objects.
They very rarely return field values.
The phrase appears in practically all
applications.
etcetera ... collected from hundreds of open-source projects.
For more background information, please refer to his SLE08 paper.